2 RTGWG S. Ning 3 Internet-Draft Tata Communications 4 Intended status: Informational A. Malis 5 Expires: August 8, 2013 D. McDysan 6 Verizon 7 L. Yong 8 Huawei USA 9 C.
Villamizar 10 Outer Cape Cod Network 11 Consulting 12 February 4, 2013 14 Composite Link Use Cases and Design Considerations 15 draft-ietf-rtgwg-cl-use-cases-02 17 Abstract 19 This document provides a set of use cases and design considerations 20 for composite links. 22 Composite link is a formalization of multipath techniques currently 23 in use in IP and MPLS networks and a set of extensions to multipath 24 techniques. 26 Status of this Memo 28 This Internet-Draft is submitted in full conformance with the 29 provisions of BCP 78 and BCP 79. 31 Internet-Drafts are working documents of the Internet Engineering 32 Task Force (IETF). Note that other groups may also distribute 33 working documents as Internet-Drafts. The list of current Internet- 34 Drafts is at http://datatracker.ietf.org/drafts/current/. 36 Internet-Drafts are draft documents valid for a maximum of six months 37 and may be updated, replaced, or obsoleted by other documents at any 38 time. It is inappropriate to use Internet-Drafts as reference 39 material or to cite them other than as "work in progress." 41 This Internet-Draft will expire on August 8, 2013. 43 Copyright Notice 45 Copyright (c) 2013 IETF Trust and the persons identified as the 46 document authors. All rights reserved. 48 This document is subject to BCP 78 and the IETF Trust's Legal 49 Provisions Relating to IETF Documents 50 (http://trustee.ietf.org/license-info) in effect on the date of 51 publication of this document. Please review these documents 52 carefully, as they describe your rights and restrictions with respect 53 to this document. Code Components extracted from this document must 54 include Simplified BSD License text as described in Section 4.e of 55 the Trust Legal Provisions and are provided without warranty as 56 described in the Simplified BSD License. 58 Table of Contents 60 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 61 2. Conventions used in this document . . . . . . . . . . . . . . 
3 62 2.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3 63 3. Composite Link Foundation Use Cases . . . . . . . . . . . . . 4 64 4. Delay Sensitive Applications . . . . . . . . . . . . . . . . . 7 65 5. Large Volume of IP and LDP Traffic . . . . . . . . . . . . . . 7 66 6. Composite Link and Packet Ordering . . . . . . . . . . . . . . 8 67 6.1. MPLS-TP in network edges only . . . . . . . . . . . . . . 10 68 6.2. Composite Link at core LSP ingress/egress . . . . . . . . 11 69 6.3. MPLS-TP as an MPLS client . . . . . . . . . . . . . . . . 12 70 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12 71 8. Security Considerations . . . . . . . . . . . . . . . . . . . 12 72 9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 13 73 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 13 74 10.1. Normative References . . . . . . . . . . . . . . . . . . . 13 75 10.2. Informative References . . . . . . . . . . . . . . . . . . 13 76 Appendix A. More Details on Existing Network Operator 77 Practices and Protocol Usage . . . . . . . . . . . . 15 78 Appendix B. Existing Multipath Standards and Techniques . . . . . 17 79 B.1. Common Multipath Load Splitting Techniques . . . . . . . . 18 80 B.2. Static and Dynamic Load Balancing Multipath . . . . . . . 19 81 B.3. Traffic Split over Parallel Links . . . . . . . . . . . . 20 82 B.4. Traffic Split over Multiple Paths . . . . . . . . . . . . 20 83 Appendix C. Characteristics of Transport in Core Networks . . . . 20 84 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 22 86 1. Introduction 88 Composite link requirements are specified in 89 [I-D.ietf-rtgwg-cl-requirement]. A composite link framework is 90 defined in [I-D.ietf-rtgwg-cl-framework]. 92 Multipath techniques have been widely used in IP networks for over 93 two decades. The use of MPLS began more than a decade ago.
94 Multipath has been widely used in IP/MPLS networks for over a decade 95 with very little protocol support dedicated to effective use of 96 multipath. 98 The state of the art in multipath prior to composite links is 99 documented in Appendix B. 101 Both Ethernet Link Aggregation [IEEE-802.1AX] and MPLS link bundling 102 [RFC4201] have been widely used in today's MPLS networks. Composite 103 link differs in the following characteristics. 105 1. A composite link allows bundling of non-homogeneous links together 106 as a single logical link. 108 2. A composite link provides more information in the TE-LSDB and 109 supports more explicit control over placement of LSP. 111 2. Conventions used in this document 113 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 114 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 115 document are to be interpreted as described in RFC 2119 [RFC2119]. 117 2.1. Terminology 119 Terminology defined in [I-D.ietf-rtgwg-cl-requirement] is used in 120 this document. 122 In addition, the following terms are used: 124 classic multipath: 125 Classic multipath refers to the most common current practice in 126 implementation and deployment of multipath (see Appendix A). The 127 most common current practice makes use of a hash on the MPLS 128 label stack and, if IPv4 or IPv6 is indicated below the label 129 stack, makes use of the IP source and destination addresses 130 [RFC4385] [RFC4928]. 132 classic link bundling: 133 Classic link bundling refers to the use of [RFC4201] where the 134 "all ones" component is not used. Where the "all ones" component 135 is used, link bundling behaves as classic multipath does. 136 Classic link bundling selects a single component link on which to 137 put any given LSP. 139 Among the important distinctions between classic multipath or classic 140 link bundling and Composite Link are: 142 1. Classic multipath has no provision to retain order among flows 143 within a subset of LSP.
Classic link bundling retains order 144 among all flows but as a result does a poor job of splitting load 145 among components and therefore is rarely (if ever) deployed. 146 Composite Link allows per LSP control of load split 147 characteristics. 149 2. Classic multipath and classic link bundling do not provide a 150 means to put some LSP on component links with lower delay. 151 Composite Link does. 153 3. Classic multipath will provide a load balance for IP and LDP 154 traffic. Classic link bundling will not. Neither classic 155 multipath nor classic link bundling will measure IP and LDP 156 traffic and reduce the advertised "Available Bandwidth" as a 157 result of that measurement. Composite Link better supports 158 RSVP-TE used with significant traffic levels of native IP and 159 native LDP. 161 4. Classic link bundling cannot support an LSP that is greater in 162 capacity than any single component link. Classic multipath and 163 Composite Link support this capability but will reorder traffic 164 on such an LSP. Composite Link can retain order of an LSP that 165 is carried within an LSP that is greater in capacity than any 166 single component link if the contained LSP has such a 167 requirement. 169 None of these techniques, classic multipath, classic link bundling, 170 or Composite Link, will reorder traffic among IP microflows. None of 171 these techniques will reorder traffic among PW, if a PWE3 Control 172 Word is used [RFC4385]. 174 3. Composite Link Foundation Use Cases 176 A simple composite link composed entirely of physical links is 177 illustrated in Figure 1, where a composite link is configured between 178 LSR1 and LSR2. This composite link has three component links. 180 Individual component links in a composite link may be supported by 181 different transport technologies such as wavelength or Ethernet VLAN.
182 Even if the transport technology implementing the component links is 183 identical, the characteristics (e.g., bandwidth, latency) of the 184 component links may differ. 186 The composite link in Figure 1 may carry LSP traffic flows and 187 control plane packets. Control plane packets may appear as IP 188 packets or may be carried within a generic associated channel (G-ACh) 189 [RFC5586]. An LSP may be established over the link by either RSVP-TE 190 [RFC3209] or LDP [RFC5036] signaling protocols. All component links 191 in a composite link are summarized in the same forwarding adjacency 192 LSP (FA-LSP) routing advertisement [RFC3945]. The composite link is 193 summarized as one TE-Link advertised into the IGP by the composite 194 link end points. This information is used in path computation when a 195 full MPLS control plane is in use. The individual component links or 196 groups of component links may optionally be advertised into the IGP 197 as sub-TLV of the composite link advertisement to indicate capacity 198 available with various characteristics, such as a delay range. 200 Management Plane 201 Configuration and Measurement <------------+ 202 ^ | 203 | | 204 +-------+-+ +-+-------+ 205 | | | | | | 206 CP Packets V | | V CP Packets 207 | V | | Component Link 1 | | ^ | 208 | | |=|===========================|=| | | 209 | +----| | Component Link 2 | |----+ | 210 | |=|===========================|=| | 211 Aggregated LSPs | | | | | 212 ~|~~~~~~>| | Component Link 3 | |~~~~>~~|~~ 213 | |=|===========================|=| | 214 | | | | | | 215 | LSR1 | | LSR2 | 216 +---------+ +---------+ 217 ! ! 218 ! ! 219 !<------ Composite Link ------->! 221 Figure 1: A composite link constructed with multiple physical links 222 between two LSR 224 [I-D.ietf-rtgwg-cl-requirement] specifies that component links may 225 themselves be composite links. Figure 2 shows three forms of 226 component links which may be deployed in a network. 228 +-------+ 1.
Physical Link +-------+ 229 | |-|----------------------------------------------|-| | 230 | | | | | | 231 | | | +------+ +------+ | | | 232 | | | | MPLS | 2. Logical Link | MPLS | | | | 233 | |.|.... |......|.....................|......|....|.| | 234 | | |-----| LSR3 |---------------------| LSR4 |----| | | 235 | | | +------+ +------+ | | | 236 | | | | | | 237 | | | | | | 238 | | | +------+ +------+ | | | 239 | | | |GMPLS | 3. Logical Link |GMPLS | | | | 240 | |.|. ...|......|.....................|......|....|.| | 241 | | |-----| LSR5 |---------------------| LSR6 |----| | | 242 | | +------+ +------+ | | 243 | LSR1 | | LSR2 | 244 +-------+ +-------+ 245 |<------------- Composite Link ------------------->| 247 Figure 2: Illustration of Various Component Link Types 249 The three forms of component link shown in Figure 2 are: 251 1. The first component link is configured with direct physical 252 media. 254 2. The second component link is a TE tunnel that traverses LSR3 and 255 LSR4, where LSR3 and LSR4 are nodes supporting MPLS but with 256 few or no GMPLS extensions. 258 3. The third component link is formed by a lower layer network that 259 has GMPLS enabled. In this case, LSR5 and LSR6 are not 260 controlled by MPLS but provide the connectivity for the 261 component link. 263 A composite link forms one logical link between connected LSR and is 264 used to carry aggregated traffic [I-D.ietf-rtgwg-cl-requirement]. 265 Composite link relies on its component links to carry the traffic 266 over the composite link. The endpoints of the composite link map 267 incoming traffic onto component links. 269 For example, LSR1 in Figure 1 distributes the set of traffic flows 270 including control plane packets among the set of component links. 271 LSR2 in Figure 1 receives the packets from its component links and 272 sends them to the MPLS forwarding engine with no attempt to reorder 273 packets arriving on different component links.
The traffic in the 274 opposite direction, from LSR2 to LSR1, is distributed across the set 275 of component links by LSR2. 277 These three forms of component link are only examples. Many other 278 examples are possible. A component link may itself be a composite 279 link. A segment of an LSP (single hop for that LSP) may be a 280 composite link. 282 4. Delay Sensitive Applications 284 Most applications benefit from lower delay. Some types of 285 applications are far more sensitive than others. For example, real 286 time bidirectional applications such as voice communication or two 287 way video conferencing are far more sensitive to delay than 288 unidirectional streaming audio or video. Non-interactive bulk 289 transfer is almost insensitive to delay if a large enough TCP window 290 is used. 292 Some applications are sensitive to delay but unwilling to pay extra 293 to ensure lower delay. For example, many SIP end users are willing 294 to accept the delay offered to best effort services as long as call 295 quality is good most of the time. 297 Other applications are sensitive to delay and willing to pay extra to 298 ensure lower delay. For example, financial trading applications are 299 extremely sensitive to delay and with a lot at stake are willing to 300 go to great lengths to reduce delay. 302 Among the requirements of Composite Link are requirements to 303 advertise capacity available within configured ranges of delay within 304 a given composite link and to support the ability to place an LSP 305 only on component links that meet that LSP's delay requirements. 307 The Composite Link requirements to accommodate delay sensitive 308 applications are analogous to diffserv requirements to accommodate 309 applications requiring higher quality of service on the same 310 infrastructure as applications with less demanding requirements.
The 311 ability to share capacity with less demanding applications, with best 312 effort applications being the least demanding, can greatly reduce the 313 cost of delivering service to the more demanding applications. 315 5. Large Volume of IP and LDP Traffic 317 IP and LDP do not support traffic engineering. Both make use of a 318 shortest (lowest routing metric) path, with an option to use equal 319 cost multipath (ECMP). Note that though ECMP is prohibited in LDP 320 specifications, it is widely implemented. Where implemented for LDP, 321 ECMP is generally disabled by default for standards compliance, but 322 often enabled in LDP deployments. 324 Without traffic engineering capability, there must be sufficient 325 capacity to accommodate the IP and LDP traffic. If not, persistent 326 queuing delay and loss will occur. Unlike with RSVP-TE, a subset of 327 traffic cannot be routed using constraint based routing to avoid a 328 congested portion of an infrastructure. 330 In existing networks which accommodate IP and/or LDP with RSVP-TE, 331 either the IP and LDP can be carried over RSVP-TE, or where the 332 traffic contribution of IP and LDP is small, IP and LDP can be 333 carried native and the effect on RSVP-TE can be ignored. Ignoring 334 the traffic contribution of IP is certainly valid on high capacity 335 networks where native IP is used primarily for control and network 336 management and customer IP is carried within RSVP-TE. 338 Where it is desirable to carry native IP and/or LDP and IP and/or 339 LDP traffic volumes are not negligible, RSVP-TE needs improvement. 340 The enhancement offered by Composite Link is an ability to measure 341 the IP and LDP traffic, filter the measurements, and reduce the capacity 342 available to RSVP-TE to avoid congestion.
The treatment given to the 343 IP or LDP traffic is similar to the treatment when using the 344 "auto-bandwidth" feature in some RSVP-TE implementations on that same 345 traffic, and giving a higher priority (numerically lower setup 346 priority and holding priority value) to the "auto-bandwidth" LSP. 347 The difference is that the measurement is made at each hop and the 348 reduction in advertised bandwidth is made more directly. 350 6. Composite Link and Packet Ordering 352 A strong motivation for Composite Link is the need to provide LSP 353 capacity in IP backbones that exceeds the capacity of single 354 wavelengths provided by transport equipment and exceeds the practical 355 capacity limits achievable through inverse multiplexing. Appendix C 356 describes characteristics and limitations of transport systems today. 357 Section 2 defines the terms "classic multipath" and "classic link 358 bundling" used in this section. 360 For purposes of discussion, consider two very large cities, city A and 361 city Z. For example, in the US high traffic cities might be New York 362 and Los Angeles and in Europe high traffic cities might be London and 363 Amsterdam. Two other high volume cities, city B and city Y, may share 364 common provider core network infrastructure. Using the same 365 examples, cities B and Y may be Washington DC and San Francisco or 366 Paris and Stockholm. In the US, the common infrastructure may span 367 Denver, Chicago, Detroit, and Cleveland. Other major traffic 368 contributors on either US coast include Boston and northern Virginia on 369 the east coast, and Seattle and San Diego on the west coast. The 370 capacity of IP/MPLS links within the shared infrastructure, for 371 example city to city links in the Denver, Chicago, Detroit, and 372 Cleveland path in the US example, for most of the 373 2000s decade greatly exceeded the capacity of single circuits available in 374 transport networks.
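As a concrete reference point for this section, the classic multipath load split defined in Section 2 (a hash on the MPLS label stack and, when IP is indicated below the label stack, the IP source and destination addresses) can be sketched as follows. This is an illustrative sketch only; the flow-key layout and the CRC-based hash are assumptions made for the example, not any particular router's implementation.

```python
import zlib

def select_component_link(label_stack, ip_src, ip_dst, num_links):
    """Classic multipath sketch: hash the MPLS label stack and, when
    IPv4/IPv6 is indicated below the label stack, the IP source and
    destination addresses, then pick a component link by modulo.
    Packets of one microflow always yield the same key, so no
    microflow is reordered, but the flows of a single LSP are spread
    over many component links."""
    key = b"".join(label.to_bytes(4, "big") for label in label_stack)
    if ip_src is not None and ip_dst is not None:
        key += ip_src.encode() + ip_dst.encode()
    return zlib.crc32(key) % num_links

# All packets of one microflow map to the same component link.
flow = ([16010, 299776], "192.0.2.1", "198.51.100.7")
assert select_component_link(*flow, 3) == select_component_link(*flow, 3)
```

Because the split is per flow rather than per LSP, an LSP whose OAM requires in-order delivery over one path (such as MPLS-TP) cannot safely cross such a split, which is the conflict the rest of this section explores.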
376 For a case with four large traffic sources on either side of the 377 shared infrastructure, up to sixteen core city to core city traffic 378 flows in excess of transport circuit capacity may be accommodated on 379 the shared infrastructure. 381 Today the most common IP/MPLS core network design makes use of very 382 large links which consist of many smaller component links, but uses 383 classic multipath techniques rather than classic link bundling or 384 Composite Link. A component link typically corresponds to the 385 largest circuit that the transport system is capable of providing (or 386 the largest cost effective circuit). IP source and destination 387 address hashing is used to distribute flows across the set of 388 component links as described in Appendix B.3. 390 Classic multipath can handle large LSP up to the total capacity of 391 the multipath (within limits, see Appendix B.2). A disadvantage of 392 classic multipath is the reordering among traffic within a given core 393 city to core city LSP. While there is no reordering within any 394 microflow and therefore no customer visible issue, MPLS-TP cannot be 395 used across an infrastructure where classic multipath is in use, 396 except within pseudowires. 398 These capacity issues force the use of classic multipath today. 399 Classic multipath excludes a direct use of MPLS-TP. The desire for 400 OAM, offered by MPLS-TP, is in conflict with the use of classic 401 multipath. There are a number of alternatives that satisfy both 402 requirements. Some alternatives are described below. 404 MPLS-TP in network edges only 406 A simple approach which requires no change to the core is to 407 disallow MPLS-TP across the core unless carried within a 408 pseudowire (PW). MPLS-TP may be used within edge domains where 409 classic multipath is not used. PW may be signaled end to end 410 using single segment PW (SS-PW), or stitched across domains using 411 multisegment PW (MS-PW).
The PW and anything carried within the 412 PW may use OAM as long as fat-PW [RFC6391] load splitting is not 413 used by the PW. 415 Composite Link at core LSP ingress/egress 417 The interior of the core network may use classic link bundling, 418 with the limitation that no LSP can exceed the capacity of a 419 single circuit. Larger non-MPLS-TP LSP can be configured using 420 multiple ingress to egress component MPLS-TP LSP. This can be 421 accomplished using existing IP source and destination address 422 hashing configured at LSP ingress and egress, or using Composite 423 Link configured at ingress and egress. Each component LSP, if 424 constrained to be no larger than the capacity of a single 425 circuit, can make use of MPLS-TP and offer OAM for all top level 426 LSP across the core. 428 MPLS-TP as an MPLS client 430 A third approach involves modifying the behavior of LSR in the 431 interior of the network core, such that MPLS-TP can be used on a 432 subset of LSP, where the capacity of any one LSP within that 433 MPLS-TP subset of LSP is not larger than the capacity of a single 434 circuit. This requirement is accommodated through a combination 435 of signaling to indicate LSP for which traffic splitting needs to 436 be constrained, the ability to constrain the depth of the label 437 stack over which traffic splitting can be applied on a per LSP 438 basis, and the ability to constrain the use of IP addresses below 439 the label stack for traffic splitting also on a per LSP basis. 441 The above list of alternatives allows packet ordering within an LSP to 442 be maintained in some circumstances and allows very large LSP 443 capacities. Each of these alternatives is discussed further in the 444 following subsections. 446 6.1. MPLS-TP in network edges only 448 Classic MPLS link bundling is defined in [RFC4201] and has existed 449 since early in the 2000s decade. Classic MPLS link bundling places 450 any given LSP entirely on a single component link.
Classic MPLS link 451 bundling is not in widespread use as the means to accommodate large 452 link capacities in core networks due to the simplicity and better 453 multiplexing gain, and therefore lower network cost, of classic 454 multipath. 456 If MPLS-TP OAM capability in the IP/MPLS network core LSP is not 457 required, then there is no need to change existing network designs 458 which use classic multipath and both label stack and IP source and 459 destination address based hashing as a basis for load splitting. 461 If MPLS-TP is needed for a subset of LSP, then those LSP can be 462 carried within pseudowires. The pseudowire adds a thin layer of 463 encapsulation and therefore a small overhead. If only a subset of 464 LSP need MPLS-TP OAM, then some LSP must make use of the pseudowires 465 and other LSP avoid them. A straightforward way to accomplish this is 466 with administrative attributes [RFC3209]. 468 6.2. Composite Link at core LSP ingress/egress 470 Composite Link can be configured only for large LSP that are made of 471 smaller MPLS-TP component LSP. This approach is capable of 472 supporting MPLS-TP OAM over the entire set of component link LSP and 473 therefore the entire set of top level LSP traversing the core. 475 There are two primary disadvantages of this approach. One is that the 476 number of top level LSP traversing the core can be dramatically 477 increased. The other disadvantage is the loss of multiplexing gain 478 that results from use of classic link bundling within the interior of 479 the core network. 481 If component LSP use MPLS-TP, then no component LSP can exceed the 482 capacity of a single circuit. For a given composite LSP there can 483 either be a number of equal capacity component LSP or some number of 484 full capacity component LSP plus one LSP carrying the excess. For 485 example, a 350 Gb/s composite LSP over a 100 Gb/s infrastructure may 486 use five 70 Gb/s component LSP or three 100 Gb/s LSP plus one 50 Gb/s 487 LSP. Classic MPLS link bundling is needed to support MPLS-TP and 488 suffers from a bin packing problem even if LSP traffic is completely 489 predictable, which it never is in practice. 491 The common means of setting composite link bandwidth parameters uses 492 long term statistical measures. For example, many providers base 493 their LSP bandwidth parameters on the 95th percentile of carried 494 traffic as measured over a one week period. It is common to add 495 10-30% to the 95th percentile value measured over the prior week and 496 adjust bandwidth parameters of LSP weekly. It is also possible to 497 measure traffic flow at the LSR and adjust bandwidth parameters 498 somewhat more dynamically. This is less common in deployments and, 499 where deployed, makes use of filtering to track very long term trends 500 in traffic levels. In either case, short term variation of traffic 501 levels relative to signaled LSP capacity is common. Allowing a 502 large overallocation of LSP bandwidth parameters (i.e., adding 30% or 503 more) avoids overutilization of any given LSP, but increases unused 504 network capacity and increases network cost. Allowing a small 505 overallocation of LSP bandwidth parameters (i.e., 10-20% or less) 506 results in both underutilization and overutilization but 507 statistically results in a total utilization within the core that is 508 under capacity most or all of the time. 510 The classic multipath solution accommodates the situation in which 511 some composite LSP are underutilizing their signaled capacity and 512 others are overutilizing their capacity with the need for far less 513 unused network capacity to accommodate variation in actual traffic 514 levels. If the actual traffic levels of LSP can be described by a 515 probability distribution, the variation of the sum of LSP is less 516 than the variation of any given LSP for all but a constant traffic 517 level (where the variation of the sum and the components are both 518 zero).
520 There are two situations which can motivate the use of this approach. 521 This design is favored if the provider values MPLS-TP OAM across the 522 core more than efficiency (or is unaware of the efficiency issue). 523 This design can also make sense if transport equipment or very low 524 cost core LSR are available which support only classic link bundling 525 and, regardless of loss of multiplexing gain, are more cost effective 526 at carrying transit traffic than using equipment which supports IP 527 source and destination address hashing. 529 6.3. MPLS-TP as an MPLS client 531 Accommodating MPLS-TP as an MPLS client requires a small change to 532 forwarding behavior and is therefore most applicable to major network 533 overbuilds or new deployments. The change to forwarding is an 534 ability to limit the depth of MPLS labels used in hashing on the 535 label stack on a per LSP basis. Some existing hardware, particularly 536 microprogrammed hardware, may be able to accommodate this forwarding 537 change. Providing support in new hardware is not difficult, a much 538 smaller change than, for example, changes required to disable PHP in 539 an environment where LSP hierarchy is used. 541 The advantage of this approach is an ability to accommodate MPLS-TP 542 as a client LSP but retain the high multiplexing gain and therefore 543 efficiency and low network cost of a pure MPLS deployment. The 544 disadvantage is the need for a small change in forwarding. 546 7. IANA Considerations 548 This memo includes no request to IANA. 550 8. Security Considerations 552 This document is a use cases document. Existing protocols such as 553 MPLS are referenced. Existing techniques such as MPLS link 554 bundling and multipath techniques are referenced. These protocols 555 and techniques are documented elsewhere and contain security 556 considerations which are unchanged by this document. 558 This document also describes use cases for Composite Link, which is a 559 work-in-progress.
Composite Link requirements are defined in 560 [I-D.ietf-rtgwg-cl-requirement]. [I-D.ietf-rtgwg-cl-framework] 561 defines a framework for Composite Link. Composite Link bears many 562 similarities to MPLS link bundling and multipath techniques used with 563 MPLS. Additional security considerations, if any, beyond those 564 already identified for MPLS, MPLS link bundling and multipath 565 techniques, will be documented in the framework document if specific 566 to the overall framework of Composite Link, or in protocol extensions 567 if specific to a given protocol extension defined later to support 568 Composite Link. 570 9. Acknowledgments 572 The authors would like to thank [ no one so far ] for their reviews and 573 great suggestions. 575 In the interest of full disclosure of affiliation and in the interest 576 of acknowledging sponsorship, past affiliations of authors are noted. 577 Much of the work done by Ning So occurred while Ning was at Verizon. 578 Much of the work done by Curtis Villamizar occurred while at 579 Infinera. Infinera continues to sponsor this work on a consulting 580 basis. 582 10. References 584 10.1. Normative References 586 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 587 Requirement Levels", BCP 14, RFC 2119, March 1997. 589 10.2. Informative References 591 [I-D.ietf-rtgwg-cl-framework] 592 Ning, S., McDysan, D., Osborne, E., Yong, L., and C. 593 Villamizar, "Composite Link Framework in Multi Protocol 594 Label Switching (MPLS)", draft-ietf-rtgwg-cl-framework-00 595 (work in progress), August 2012. 597 [I-D.ietf-rtgwg-cl-requirement] 598 Villamizar, C., McDysan, D., Ning, S., Malis, A., and L. 599 Yong, "Requirements for MPLS Over a Composite Link", 600 draft-ietf-rtgwg-cl-requirement-07 (work in progress), 601 June 2012. 603 [IEEE-802.1AX] 604 IEEE Standards Association, "IEEE Std 802.1AX-2008 IEEE 605 Standard for Local and Metropolitan Area Networks - Link 606 Aggregation", 2006.
609 [ITU-T.G.694.2] 610 ITU-T, "Spectral grids for WDM applications: CWDM 611 wavelength grid", 2003. 614 [ITU-T.G.800] 615 ITU-T, "Unified functional architecture of transport 616 networks", 2007. 619 [ITU-T.Y.1540] 620 ITU-T, "Internet protocol data communication service - IP 621 packet transfer and availability performance parameters", 622 2007. 624 [ITU-T.Y.1541] 625 ITU-T, "Network performance objectives for IP-based 626 services", 2006. 628 [RFC1717] Sklower, K., Lloyd, B., McGregor, G., and D. Carr, "The 629 PPP Multilink Protocol (MP)", RFC 1717, November 1994. 631 [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., 632 and W. Weiss, "An Architecture for Differentiated 633 Services", RFC 2475, December 1998. 635 [RFC2597] Heinanen, J., Baker, F., Weiss, W., and J. Wroclawski, 636 "Assured Forwarding PHB Group", RFC 2597, June 1999. 638 [RFC2615] Malis, A. and W. Simpson, "PPP over SONET/SDH", RFC 2615, 639 June 1999. 641 [RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and 642 Multicast Next-Hop Selection", RFC 2991, November 2000. 644 [RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path 645 Algorithm", RFC 2992, November 2000. 647 [RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V., 648 and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP 649 Tunnels", RFC 3209, December 2001. 651 [RFC3260] Grossman, D., "New Terminology and Clarifications for 652 Diffserv", RFC 3260, April 2002. 654 [RFC3809] Nagarajan, A., "Generic Requirements for Provider 655 Provisioned Virtual Private Networks (PPVPN)", RFC 3809, 656 June 2004. 658 [RFC3945] Mannie, E., "Generalized Multi-Protocol Label Switching 659 (GMPLS) Architecture", RFC 3945, October 2004. 661 [RFC4201] Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling 662 in MPLS Traffic Engineering (TE)", RFC 4201, October 2005. 664 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the 665 Internet Protocol", RFC 4301, December 2005.
[RFC4385]  Bryant, S., Swallow, G., Martini, L., and D. McPherson, "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for Use over an MPLS PSN", RFC 4385, February 2006.

[RFC4928]  Swallow, G., Bryant, S., and L. Andersson, "Avoiding Equal Cost Multipath Treatment in MPLS Networks", BCP 128, RFC 4928, June 2007.

[RFC5036]  Andersson, L., Minei, I., and B. Thomas, "LDP Specification", RFC 5036, October 2007.

[RFC5586]  Bocci, M., Vigoureux, M., and S. Bryant, "MPLS Generic Associated Channel", RFC 5586, June 2009.

[RFC6391]  Bryant, S., Filsfils, C., Drafz, U., Kompella, V., Regan, J., and S. Amante, "Flow-Aware Transport of Pseudowires over an MPLS Packet Switched Network", RFC 6391, November 2011.

Appendix A.  More Details on Existing Network Operator Practices and Protocol Usage

Network operators often have a contractual Service Level Agreement (SLA) with customers for services, comprising numerical values for performance measures, principally availability, latency, and delay variation. Additionally, network operators may have a Service Level Specification (SLS) for internal use by the operator. See [ITU-T.Y.1540], [ITU-T.Y.1541], and Section 4.9 of [RFC3809] for examples of the form of such SLA and SLS specifications. In this document we use the term Network Performance Objective (NPO), as defined in Section 5 of [ITU-T.Y.1541], since the SLA and SLS measures have network operator and service specific implications. Note that the numerical NPO values of Y.1540 and Y.1541 span multiple networks and may be looser than network operator SLA or SLS objectives. Applications and acceptable user experience have an important relationship to these performance parameters.

Consider latency as an example. In some cases, minimizing latency relates directly to the best customer experience (e.g., in TCP closer is faster).
In other cases, user experience is relatively insensitive to latency, up to a specific limit at which point user perception of quality degrades significantly (e.g., interactive human voice and multimedia conferencing). A number of NPOs have a bound on point-to-point latency; as long as this bound is met, the NPO is met -- decreasing the latency further is not necessary. In some NPOs, if the specified latency is not met, the user considers the service unavailable. An unprotected LSP can be manually provisioned on a set of links to meet this type of NPO, but this lowers availability since an alternate route that meets the latency NPO cannot be determined.

Historically, when an IP/MPLS network was operated over a lower layer circuit switched network (e.g., SONET rings), a change in latency caused by the lower layer network (e.g., due to a maintenance action or failure) was not known to the MPLS network. This resulted in latency affecting end user experience, sometimes violating NPOs or resulting in user complaints.

A response to this problem was to provision IP/MPLS networks over unprotected circuits and set the metric and/or TE-metric proportional to latency. This resulted in traffic being directed over the least latency path, even if this was not needed to meet an NPO or meet user experience objectives, reducing flexibility and increasing cost for network operators. Using lower layer networks to provide restoration and grooming is expected to be more efficient, but the inability to communicate performance parameters, in particular latency, from the lower layer network to the higher layer network is an important problem to be solved before this can be done.

Latency NPOs for point-to-point services are often tied closely to geographic locations, while latency for multipoint services may be based upon a worst case within a region.
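The practice described above of setting an IGP metric or TE-metric proportional to circuit latency can be sketched as follows. This is an illustrative helper, not from any standard; the function name, the scaling factor, and the 16-bit metric cap are assumptions.

```python
def latency_to_metric(latency_us, scale=0.1, max_metric=65535):
    """Map a measured one-way circuit latency (in microseconds) to a
    routing metric proportional to latency.

    Assumptions (illustrative only): a linear scale factor, a minimum
    metric of 1, and a 16-bit cap as used for OSPF interface cost.
    With metrics set this way, shortest-path routing directs traffic
    over the least-latency path.
    """
    metric = max(1, round(latency_us * scale))
    return min(metric, max_metric)
```

For example, a 10 ms (10000 us) circuit would receive a metric of 1000 under these assumed parameters, while a very long path saturates at the metric cap.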
Section 7 of [ITU-T.Y.1540] defines availability for an IP service in terms of loss exceeding a threshold for a period on the order of 5 minutes. However, the timeframes for restoration (i.e., as implemented by pre-determined protection, convergence of routing protocols, and/or signaling) range from on the order of 100 ms or less (e.g., for VPWS to emulate classical SDH/SONET protection switching) to several minutes (e.g., to allow BGP to reconverge for L3VPN), and may differ among the set of customers within a single service.

The presence of only three Traffic Class (TC) bits (previously known as EXP bits) in the MPLS shim header is limiting when a network operator needs to support QoS classes for multiple services (e.g., L2VPN VPWS, VPLS, L3VPN, and Internet), each of which has a set of QoS classes that need to be supported. In some cases one bit is used to indicate conformance to some ingress traffic classification, leaving only two bits for indicating the service QoS classes. The approach that has been taken is to aggregate these QoS classes into similar sets on LER-LSR and LSR-LSR links.

Labeled LSPs and use of link layer encapsulation have been standardized in order to provide a means to meet these needs.

The IP DSCP cannot be used for flow identification since Section 5.5 of [RFC4301] requires Diffserv transparency, and in general network operators do not rely on the DSCP of Internet packets. In addition, the use of the IP DSCP for flow identification is incompatible with Assured Forwarding services [RFC2597] or any other service which may use more than one DSCP code point to carry traffic for a given microflow.

Pushing a label onto Internet packets when they are carried along with L2/L3VPN packets on the same link or lower layer network provides a means to distinguish the QoS class of these packets.
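The aggregation of per-service QoS classes onto the three TC bits described above can be illustrated with a minimal sketch. The class names and the exact bit assignment are hypothetical, chosen only to show one bit marking conformance and the remaining two bits carrying four aggregated QoS classes.

```python
# Hypothetical aggregation of service QoS classes onto the 3-bit MPLS
# TC field: bit 2 marks out-of-conformance traffic (drop precedence),
# leaving bits 0-1 for four aggregated QoS classes shared across
# services (L2VPN VPWS, VPLS, L3VPN, Internet, ...).
AGGREGATED_CLASS = {  # service QoS class -> 2-bit aggregate class
    "voice": 3,
    "video": 2,
    "business-data": 1,
    "best-effort": 0,
}

def tc_value(qos_class, out_of_conformance=False):
    """Return the 3-bit TC value (0..7) for an aggregated QoS class."""
    tc = AGGREGATED_CLASS[qos_class]
    if out_of_conformance:
        tc |= 0b100  # set the conformance/drop-precedence bit
    return tc
```

The point of the sketch is only that many per-service QoS classes must collapse into at most four aggregates once a conformance bit is reserved.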
Operating an MPLS-TE network involves a different paradigm from operating an IGP metric-based LDP signaled MPLS network. The multipoint-to-point LDP signaled MPLS LSPs occur automatically, and balancing across parallel links occurs if the IGP metrics are set "equally" (with equality a locally definable relation).

Traffic is typically comprised of a few large (some very large) flows and many small flows. In some cases, separate LSPs are established for very large flows. This can occur even if the IP header information is inspected by an LSR, for example for an IPsec tunnel that carries a large amount of traffic. An important example of large flows is that of an L2/L3 VPN customer who has an access line bandwidth comparable to a client-client composite link bandwidth -- there could be flows on the order of the access line bandwidth.

Appendix B.  Existing Multipath Standards and Techniques

Today the requirement to handle large aggregations of traffic, much larger than a single component link, can be met by a number of techniques which we collectively call multipath. Multipath applied to parallel links between the same set of nodes includes Ethernet Link Aggregation [IEEE-802.1AX], link bundling [RFC4201], and other aggregation techniques, some of which may be vendor specific. Multipath applied to diverse paths rather than parallel links includes Equal Cost MultiPath (ECMP) as applied to OSPF, ISIS, or even BGP, and equal cost LSP, as described in Appendix B.4. The various multipath techniques have strengths and weaknesses.

The term Composite Link is more general than terms such as Link Aggregation, which is generally considered to be specific to Ethernet, and its use here is consistent with the broad definition in [ITU-T.G.800].
The term multipath excludes inverse multiplexing and refers to techniques which only solve the problem of large aggregations of traffic, without addressing the other requirements outlined in this document, particularly those described in Section 4 and Section 5.

B.1.  Common Multipath Load Splitting Techniques

Identical load balancing techniques are used for multipath both over parallel links and over diverse paths.

Large aggregates of IP traffic do not provide explicit signaling to indicate the expected traffic loads. Large aggregates of MPLS traffic are carried in MPLS tunnels supported by MPLS LSP. LSP which are signaled using RSVP-TE extensions do provide explicit signaling which includes the expected traffic load for the aggregate. LSP which are signaled using LDP do not provide an expected traffic load.

MPLS LSP may contain other MPLS LSP arranged hierarchically. When an MPLS LSR serves as a midpoint LSR in an LSP carrying other LSP as payload, there is no signaling associated with these inner LSP. Therefore, even when RSVP-TE signaling is used, signaling alone may provide insufficient information to adequately distribute the load.

Generally a set of label stack entries that is unique across the ordered set of label numbers in the label stack can safely be assumed to contain a group of flows. The reordering of traffic can therefore be considered to be acceptable unless reordering occurs within traffic containing a common unique set of label stack entries. Existing load splitting techniques take advantage of this property, in addition to looking beyond the bottom of the label stack and determining whether the payload is IPv4 or IPv6 to load balance traffic accordingly.

MPLS-TP OAM violates the assumption that it is safe to reorder traffic within an LSP.
If MPLS-TP OAM is to be accommodated, then existing multipath techniques must be modified. Such modifications are outside the scope of this document.

For example, a large aggregate of IP traffic may be subdivided into a large number of groups of flows using a hash on the IP source and destination addresses. This is as described in [RFC2475] and clarified in [RFC3260]. For MPLS traffic carrying IP, a similar hash can be performed on the set of labels in the label stack. These techniques are both examples of means to subdivide traffic into groups of flows for the purpose of load balancing traffic across aggregated link capacity. The means of identifying a set of flows should not be confused with the definition of a flow.

Discussion of whether a hash based approach provides a sufficiently even load balance using any particular hashing algorithm or method of distributing traffic across a set of component links is outside the scope of this document.

The current load balancing techniques are referenced in [RFC4385] and [RFC4928]. The use of hash based approaches is described in [RFC2991] and [RFC2992]. A mechanism to identify flows within PW is described in [RFC6391]. The use of hash based approaches is mentioned as an example of an existing set of techniques to distribute traffic over a set of component links. Other techniques are not precluded.

B.2.  Static and Dynamic Load Balancing Multipath

Static multipath generally relies on the mathematical probability that, given a very large number of small microflows, these microflows will tend to be distributed evenly across a hash space. Early static multipath implementations assumed that all component links are of equal capacity and performed a modulo operation across the hashed value.
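A minimal sketch of the hash-and-modulo subdivision described above, assuming equal-capacity component links; the choice of CRC-32 as the hash is illustrative only, and real implementations vary.

```python
import zlib

def select_component_link(src_ip, dst_ip, n_links):
    """Pick a component link by hashing the IP source and destination
    addresses and taking the result modulo the number of equal-capacity
    component links.  Every packet of a given microflow hashes to the
    same link, so per-flow ordering is preserved."""
    key = f"{src_ip}|{dst_ip}".encode()
    return zlib.crc32(key) % n_links

def select_by_label_stack(labels, n_links):
    """The analogous hash for MPLS traffic carrying IP: hash over the
    ordered set of label values in the label stack."""
    key = b"".join(label.to_bytes(4, "big") for label in labels)
    return zlib.crc32(key) % n_links
```

Because the mapping is deterministic per flow key, all traffic sharing a source/destination pair (or label stack) stays on one component link, which is exactly why a few very large flows are handled poorly, as discussed in Appendix B.2.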
An alternate static multipath technique uses a table, generally with a power of two size, and distributes the table entries proportionally among component links according to the capacity of each component link.

Static load balancing works well if there are a very large number of small microflows (i.e., the microflow rate is much less than the component link capacity). However, the case where there are even a few large microflows is not handled well by static load balancing.

A dynamic load balancing multipath technique is one where the traffic bound to each component link is measured and the load split is adjusted accordingly. As long as the adjustment is done within a single network element, no protocol extensions are required and there are no interoperability issues.

Note that if the load balancing algorithm and/or its parameters are adjusted, then packets in some flows may be briefly delivered out of sequence; however, in practice such adjustments can be made very infrequently.

B.3.  Traffic Split over Parallel Links

The load splitting techniques defined in Appendix B.1 and Appendix B.2 are both used in splitting traffic over parallel links between the same pair of nodes. The best known technique, though far from being the first, is Ethernet Link Aggregation [IEEE-802.1AX]. This same technique had been applied much earlier using OSPF or ISIS Equal Cost MultiPath (ECMP) over parallel links between the same nodes. Multilink PPP [RFC1717] uses a technique that provides inverse multiplexing; a number of vendors had provided proprietary extensions to PPP over SONET/SDH [RFC2615] that predated Ethernet Link Aggregation but are no longer used.

Link bundling [RFC4201] provides yet another means of handling parallel LSP. RFC 4201 explicitly allows a special value of all ones to indicate a split across all members of the bundle.
This "all ones" component link is signaled in the MPLS RESV to indicate that the link bundle is making use of classic multipath techniques.

B.4.  Traffic Split over Multiple Paths

OSPF or ISIS Equal Cost MultiPath (ECMP) is a well known form of traffic split over multiple paths that may traverse intermediate nodes. ECMP is often incorrectly equated to only this case, and multipath over multiple diverse paths is often incorrectly equated to ECMP.

Many implementations are able to create more than one LSP between a pair of nodes, where these LSP are routed diversely to better make use of available capacity. The load on these LSP can be distributed proportionally to the reserved bandwidth of the LSP. These multiple LSP may be advertised as a single PSC FA, and any LSP making use of the FA may be split over these multiple LSP.

Link bundling [RFC4201] component links may themselves be LSP. When this technique is used, any LSP which specifies the link bundle may be split across the multiple paths of the LSP that comprise the bundle.

Appendix C.  Characteristics of Transport in Core Networks

The characteristics of primary interest are the capacity of a single circuit and the use of wave division multiplexing (WDM) to provide a large number of parallel circuits.

Wave division multiplexing (WDM) supports multiple independent channels (independent ignoring crosstalk noise) at slightly different wavelengths of light, multiplexed onto a single fiber. Typical in the early 2000s was 40 wavelengths of 10 Gb/s capacity per wavelength. These wavelengths are in the C-band range, which is about 1530-1565 nm, though some work has been done using the L-band, 1565-1625 nm.

The C-band has been carved up using a 100 GHz spacing from 191.7 THz to 196.1 THz by [ITU-T.G.694.2]. This yields 44 channels.
If the outermost channels are not used, due to poorer transmission characteristics, then typically 40 are used. For practical reasons, a 50 GHz or 25 GHz spacing is used by more recent equipment, yielding 80 or 160 channels in practice.

The early optical modulation techniques used within a single channel yielded 2.5 Gb/s and 10 Gb/s capacity per channel. As modulation techniques have improved, 40 Gb/s and 100 Gb/s per channel have been achieved.

The 40 channels of 10 Gb/s common in the mid 2000s yield a total of 400 Gb/s. Tighter spacing and better modulations are yielding up to 8 Tb/s or more in more recent systems.

Over the optical is an electrical encoding. In the 1990s this was typically Synchronous Optical Networking (SONET) or Synchronous Digital Hierarchy (SDH), with a maximum defined circuit capacity of 40 Gb/s (OC-768), though the 10 Gb/s OC-192 is more common. More recently the low level electrical encoding has been Optical Transport Network (OTN) defined by ITU-T. OTN currently defines circuit capacities up to a nominal 100 Gb/s (ODU4). Both SONET/SDH and OTN make use of time division multiplexing (TDM), where a higher capacity circuit such as a 100 Gb/s ODU4 in OTN may be subdivided into lower fixed capacity circuits such as ten 10 Gb/s ODU2.

In the 1990s, all IP and later IP/MPLS networks either used a fraction of the maximum circuit capacity or, toward the end of the decade, at most the full circuit capacity, when full circuit capacity was 2.5 Gb/s or 10 Gb/s. Beyond 2000, the TDM circuit multiplexing capability of SONET/SDH or OTN was rarely used.

Early in the 2000s both transport equipment and core LSR offered 40 Gb/s SONET OC-768.
However, 10 Gb/s transport equipment was predominantly deployed throughout the decade, partially because LSR 10GbE ports were far more cost effective than either OC-192 or OC-768 and became practical in the second half of the decade.

Entering the 2010 decade, LSR 40GbE and 100GbE are expected to become widely available and cost effective. Slightly preceding this, transport equipment making use of 40 Gb/s and 100 Gb/s modulations is becoming available. This transport equipment is capable of carrying 40 Gb/s ODU3 and 100 Gb/s ODU4 circuits.

Early in the 2000s decade IP/MPLS core networks were making use of single 10 Gb/s circuits. Capacity grew quickly in the first half of the decade, but most IP/MPLS core networks had only a small number of IP/MPLS links requiring 4-8 parallel 10 Gb/s circuits. However, the use of multipath was necessary, was deemed the simplest and most cost effective alternative, and became thoroughly entrenched. By the end of the 2000s decade nearly all major IP/MPLS core service provider networks and a few content provider networks had IP/MPLS links which exceeded 100 Gb/s, long before 40GbE was available and 40 Gb/s transport in widespread use.

It is less clear when IP/MPLS LSP exceeded 10 Gb/s, 40 Gb/s, and 100 Gb/s. By 2010, many service providers have LSP in excess of 100 Gb/s, but few are willing to disclose how many LSP have reached this capacity.

At the time of writing, 40GbE and 100GbE LSR products are being evaluated by service providers and content providers and are in use in network trials. The cost of components required to deliver 100GbE products remains high, making these products less cost effective. This is expected to change within years.

The important point is that IP/MPLS core network links long ago exceeded 100 Gb/s and a small number of IP/MPLS LSP exceed 100 Gb/s.
By the time 100 Gb/s circuits are widely deployed, IP/MPLS core network links are likely to exceed 1 Tb/s and many IP/MPLS LSP capacities are likely to exceed 100 Gb/s. Therefore multipath techniques are likely here to stay.

Authors' Addresses

So Ning
Tata Communications

Email: ning.so@tatacommunications.com

Andrew Malis
Verizon
60 Sylvan Road
Waltham, MA 02451

Phone: +1 781-466-2362
Email: andrew.g.malis@verizon.com

Dave McDysan
Verizon
22001 Loudoun County PKWY
Ashburn, VA 20147

Email: dave.mcdysan@verizon.com

Lucy Yong
Huawei USA
5340 Legacy Dr.
Plano, TX 75025

Phone: +1 469-277-5837
Email: lucy.yong@huawei.com

Curtis Villamizar
Outer Cape Cod Network Consulting

Email: curtis@occnc.com