RTGWG                                                              N. So
Internet-Draft                                                  A. Malis
Intended status: Informational                                D. McDysan
Expires: August 25, 2012                                         Verizon
                                                                 L. Yong
                                                              Huawei USA
                                                           C. Villamizar
                                                  Outer Cape Cod Network
                                                              Consulting
                                                       February 22, 2012


          Composite Link Use Cases and Design Considerations
                  draft-symmvo-rtgwg-cl-use-cases-00

Abstract

   This document provides a set of use cases and design considerations
   for composite links.

   Composite link is a formalization of multipath techniques currently
   in use in IP and MPLS networks and a set of extensions to multipath
   techniques.

   Note: symmvo in the draft name is the initials of the set of
   authors: So, Yong, McDysan, Malis, Villamizar, Osborne.  This
   paragraph will be removed when/if this document is adopted as a WG
   item.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 25, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.
   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions used in this document
       2.1.  Terminology
   3.  Composite Link Foundation Use Cases
   4.  Delay Sensitive Applications
   5.  Large Volume of IP and LDP Traffic
   6.  Composite Link and Packet Ordering
       6.1.  MPLS-TP in network edges only
       6.2.  Composite Link at core LSP ingress/egress
       6.3.  MPLS-TP as an MPLS client
   7.  Security Considerations
   8.  Acknowledgments
   9.  References
       9.1.  Normative References
       9.2.  Informative References
   Appendix A.  More Details on Existing Network Operator Practices
                and Protocol Usage
   Appendix B.  Existing Multipath Standards and Techniques
       B.1.  Common Multipath Load Splitting Techniques
       B.2.  Simple and Adaptive Load Balancing Multipath
       B.3.  Traffic Split over Parallel Links
       B.4.  Traffic Split over Multiple Paths
   Appendix C.  Characteristics of Transport in Core Networks
   Authors' Addresses

1.  Introduction

   Composite link requirements are specified in
   [I-D.ietf-rtgwg-cl-requirement].  A composite link framework is
   defined in [I-D.so-yong-rtgwg-cl-framework].

   Multipath techniques have been widely used in IP networks for over
   two decades.  The use of MPLS began more than a decade ago.
   Multipath has been widely used in IP/MPLS networks for over a
   decade, with very little protocol support dedicated to effective
   use of multipath.

   The state of the art in multipath prior to composite links is
   documented in Appendix B.

   Both Ethernet Link Aggregation [IEEE-802.1AX] and MPLS link bundling
   [RFC4201] have been widely used in today's MPLS networks.  Composite
   link differs in the following characteristics:

   1.  A composite link allows bundling of non-homogeneous links
       together as a single logical link.

   2.  A composite link provides more information in the TE-LSDB and
       supports more explicit control over placement of LSP.

2.  Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].
2.1.  Terminology

   Terminology defined in [I-D.ietf-rtgwg-cl-requirement] is used in
   this document.

   In addition, the following terms are used:

   classic multipath:
      Classic multipath refers to the most common current practice in
      implementation and deployment of multipath (see Appendix A).
      The most common current practice makes use of a hash on the MPLS
      label stack and, if IPv4 or IPv6 is indicated below the label
      stack, also makes use of the IP source and destination addresses
      [RFC4385] [RFC4928].

   classic link bundling:
      Classic link bundling refers to the use of [RFC4201] where the
      "all ones" component is not used.  Where the "all ones" component
      is used, link bundling behaves as classic multipath does.
      Classic link bundling selects a single component link on which to
      put any given LSP.

   Among the important distinctions between classic multipath or
   classic link bundling and Composite Link are:

   1.  Classic multipath has no provision to retain order among flows
       within a subset of LSP.  Classic link bundling retains order
       among all flows, but as a result does a poor job of splitting
       load among components and therefore is rarely (if ever)
       deployed.  Composite Link allows per LSP control of load split
       characteristics.

   2.  Classic multipath and classic link bundling do not provide a
       means to put some LSP on component links with lower delay.
       Composite Link does.

   3.  Classic multipath will provide a load balance for IP and LDP
       traffic.  Classic link bundling will not.  Neither classic
       multipath nor classic link bundling will measure IP and LDP
       traffic and reduce the advertised "Available Bandwidth" as a
       result of that measurement.  Composite Link better supports
       RSVP-TE used with significant traffic levels of native IP and
       native LDP.

   4.  Classic link bundling cannot support an LSP that is greater in
       capacity than any single component link.  Classic multipath and
       Composite Link support this capability, but will reorder traffic
       on such an LSP.  Composite Link can retain the order of an LSP
       that is carried within an LSP that is greater in capacity than
       any single component link, if the contained LSP has such a
       requirement.

   None of these techniques, classic multipath, classic link bundling,
   or Composite Link, will reorder traffic among IP microflows.  None
   of these techniques will reorder traffic among PW, if a PWE3 Control
   Word is used [RFC4385].

3.  Composite Link Foundation Use Cases

   A simple composite link composed entirely of physical links is
   illustrated in Figure 1, where a composite link is configured
   between LSR1 and LSR2.  This composite link has three component
   links.

   Individual component links in a composite link may be supported by
   different transport technologies, such as wavelength or Ethernet
   VLAN.  Even if the transport technology implementing the component
   links is identical, the characteristics (e.g., bandwidth, latency)
   of the component links may differ.

   The composite link in Figure 1 may carry LSP traffic flows and
   control plane packets.  Control plane packets may appear as IP
   packets or may be carried within a generic associated channel
   (G-ACh) [RFC5586].  An LSP may be established over the link by
   either the RSVP-TE [RFC3209] or LDP [RFC5036] signaling protocols.
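   As a purely illustrative sketch (not part of any protocol
   specification; all names here are hypothetical), the following
   Python fragment models the relationship just described: a composite
   link whose component links have heterogeneous capacity and delay,
   yet is presented as a single logical link:

      # Illustrative model of a composite link; names are hypothetical.
      from dataclasses import dataclass

      @dataclass
      class ComponentLink:
          name: str
          capacity_gbps: float   # e.g., 10, 40, or 100 Gb/s
          delay_ms: float        # one-way delay of this component

      @dataclass
      class CompositeLink:
          components: list

          def total_capacity(self):
              # Summarized to the IGP as one TE link.
              return sum(c.capacity_gbps for c in self.components)

      cl = CompositeLink([ComponentLink("component1", 100.0, 5.0),
                          ComponentLink("component2", 40.0, 5.2),
                          ComponentLink("component3", 40.0, 9.0)])
      print(cl.total_capacity())   # 180.0, advertised as one TE-Link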
   All component links in a composite link are summarized in the same
   forwarding adjacency LSP (FA-LSP) routing advertisement [RFC3945].
   The composite link is summarized as one TE-Link advertised into the
   IGP by the composite link end points.  This information is used in
   path computation when a full MPLS control plane is in use.  The
   individual component links, or groups of component links, may
   optionally be advertised into the IGP as sub-TLVs of the composite
   link advertisement to indicate capacity available with various
   characteristics, such as a delay range.

                      Management Plane
             Configuration and Measurement <-------------+
                      ^                                  |
                      |                                  |
                 +----+----+                        +----+----+
      CP Packets |         |                        |         | CP Packets
        ~~~>~~~~>|         |   Component Link 1     |         |~~~>~~~
                 |         |========================|         |
                 |         |   Component Link 2     |         |
                 |         |========================|         |
      Aggregated |         |   Component Link 3     |         | Aggregated
      LSPs ~~~~~>|         |========================|         |~~~~~> LSPs
                 |  LSR1   |                        |  LSR2   |
                 +---------+                        +---------+
                 !                                            !
                 !<-------------- Composite Link ------------>!

       Figure 1: a composite link constructed with multiple physical
                          links between two LSR

   [I-D.ietf-rtgwg-cl-requirement] specifies that component links may
   themselves be composite links.  Figure 2 shows three forms of
   component links which may be deployed in a network.

   +-------+             1. Physical Link                +-------+
   |     |-|---------------------------------------------|-|     |
   |       |                                             |       |
   |       |    +------+                     +------+    |       |
   |       |    | MPLS |   2. Logical Link   | MPLS |    |       |
   |     |.|....|......|.....................|......|....|.|     |
   |     | |----| LSR3 |---------------------| LSR4 |----| |     |
   |       |    +------+                     +------+    |       |
   |       |                                             |       |
   |       |    +------+                     +------+    |       |
   |       |    |GMPLS |   3. Logical Link   |GMPLS |    |       |
   |     |.|....|......|.....................|......|....|.|     |
   |     | |----| LSR5 |---------------------| LSR6 |----| |     |
   |       |    +------+                     +------+    |       |
   | LSR1  |                                             |  LSR2 |
   +-------+                                             +-------+
   |<---------------- Composite Link ------------------->|

          Figure 2: Illustration of Various Component Link Types

   The three forms of component link shown in Figure 2 are:

   1.  The first component link is configured with direct physical
       media.

   2.  The second component link is a TE tunnel that traverses LSR3 and
       LSR4, where LSR3 and LSR4 are nodes supporting MPLS but
       supporting few or no GMPLS extensions.

   3.  The third component link is formed by a lower layer network that
       has GMPLS enabled.  In this case, LSR5 and LSR6 are not
       controlled by MPLS but provide the connectivity for the
       component link.

   A composite link forms one logical link between the connected LSR
   and is used to carry aggregated traffic
   [I-D.ietf-rtgwg-cl-requirement].  A composite link relies on its
   component links to carry the traffic over the composite link.  The
   endpoints of the composite link map incoming traffic onto component
   links.

   For example, LSR1 in Figure 1 distributes the set of traffic flows,
   including control plane packets, among the set of component links.
   LSR2 in Figure 1 receives the packets from its component links and
   sends them to the MPLS forwarding engine with no attempt to reorder
   packets arriving on different component links.  The traffic in the
   opposite direction, from LSR2 to LSR1, is distributed across the set
   of component links by LSR2.

   These three forms of component link are only examples.  Many other
   examples are possible.  A component link may itself be a composite
   link.  A segment of an LSP (a single hop for that LSP) may be a
   composite link.
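   As a non-normative illustration of how the per-component delay range
   information mentioned above might be used (anticipating the delay
   sensitive applications of Section 4; all names and structures here
   are hypothetical), a composite link endpoint could filter component
   links by an LSP's delay and bandwidth needs:

      # Illustrative only: select component links able to carry an LSP
      # with a given bandwidth and maximum acceptable delay.
      def eligible_components(components, lsp_bw_gbps, lsp_max_delay_ms):
          """components: iterable of (name, unreserved_gbps, delay_ms)."""
          return [name for name, bw, delay in components
                  if bw >= lsp_bw_gbps and delay <= lsp_max_delay_ms]

      components = [("component1", 60.0, 5.0),
                    ("component2", 10.0, 5.2),
                    ("component3", 40.0, 9.0)]

      # A 20 Gb/s LSP that tolerates at most 6 ms can only use
      # component1: component2 lacks capacity, and component3 exceeds
      # the delay bound.
      print(eligible_components(components, 20.0, 6.0))  # ['component1']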
4.  Delay Sensitive Applications

   Most applications benefit from lower delay.  Some types of
   applications are far more sensitive than others.  For example, real
   time bidirectional applications such as voice communication or two
   way video conferencing are far more sensitive to delay than
   unidirectional streaming audio or video.  Non-interactive bulk
   transfer is almost insensitive to delay if a large enough TCP window
   is used.

   Some applications are sensitive to delay but unwilling to pay extra
   to ensure lower delay.  For example, many SIP end users are willing
   to accept the delay offered to best effort services as long as call
   quality is good most of the time.

   Other applications are sensitive to delay and willing to pay extra
   to ensure lower delay.  For example, financial trading applications
   are extremely sensitive to delay and, with a lot at stake, are
   willing to go to great lengths to reduce delay.

   Among the requirements of Composite Link are requirements to
   advertise the capacity available within configured ranges of delay
   within a given composite link, and to support the ability to place
   an LSP only on component links that meet that LSP's delay
   requirements.

   The Composite Link requirements to accommodate delay sensitive
   applications are analogous to the diffserv requirements to
   accommodate applications requiring higher quality of service on the
   same infrastructure as applications with less demanding
   requirements.  The ability to share capacity with less demanding
   applications, with best effort applications being the least
   demanding, can greatly reduce the cost of delivering service to the
   more demanding applications.

5.  Large Volume of IP and LDP Traffic

   IP and LDP do not support traffic engineering.  Both make use of a
   shortest (lowest routing metric) path, with an option to use equal
   cost multipath (ECMP).  Note that though ECMP is prohibited in LDP
   specifications, it is widely implemented.  Where implemented for
   LDP, ECMP is generally disabled by default for standards compliance,
   but often enabled in LDP deployments.

   Without traffic engineering capability, there must be sufficient
   capacity to accommodate the IP and LDP traffic.  If not, persistent
   queuing delay and loss will occur.  Unlike RSVP-TE, a subset of
   traffic cannot be routed using constraint based routing to avoid a
   congested portion of an infrastructure.

   In existing networks which accommodate IP and/or LDP along with
   RSVP-TE, either the IP and LDP can be carried over RSVP-TE, or,
   where the traffic contribution of IP and LDP is small, IP and LDP
   can be carried native and the effect on RSVP-TE can be ignored.
   Ignoring the traffic contribution of IP is certainly valid on high
   capacity networks where native IP is used primarily for control and
   network management and customer IP is carried within RSVP-TE.

   Where it is desirable to carry native IP and/or LDP and the IP
   and/or LDP traffic volumes are not negligible, RSVP-TE needs
   improvement.  The enhancement offered by Composite Link is an
   ability to measure the IP and LDP traffic, filter the measurements,
   and reduce the capacity available to RSVP-TE to avoid congestion.
   The treatment given to the IP or LDP traffic is similar to the
   treatment when using the "auto-bandwidth" feature available in some
   RSVP-TE implementations on that same traffic and giving a higher
   priority (numerically lower setup priority and holding priority
   value) to the "auto-bandwidth" LSP.  The difference is that the
   measurement is made at each hop and the reduction in advertised
   bandwidth is made more directly.
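   A minimal sketch of this behavior follows (illustrative only; the
   filter constant, sampling intervals, and names are hypothetical and
   not taken from any implementation):

      # Illustrative: measure IP+LDP load, low-pass filter it, and
      # subtract it from the capacity advertised to RSVP-TE.
      def filter_load(filtered_gbps, measured_gbps, alpha=0.1):
          # Exponentially weighted moving average of measured load.
          return (1.0 - alpha) * filtered_gbps + alpha * measured_gbps

      capacity_gbps = 100.0
      filtered = 0.0
      for measured in [20.0, 22.0, 19.0, 30.0]:   # per-interval samples
          filtered = filter_load(filtered, measured)
          available_bw = capacity_gbps - filtered
          # available_bw is what would be advertised as "Available
          # Bandwidth" for RSVP-TE admission at this hop.
          print(round(available_bw, 2))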
6.  Composite Link and Packet Ordering

   A strong motivation for Composite Link is the need to provide LSP
   capacity in IP backbones that exceeds the capacity of single
   wavelengths provided by transport equipment and exceeds the
   practical capacity limits achievable through inverse multiplexing.
   Appendix C describes characteristics and limitations of transport
   systems today.  Section 2 defines the terms "classic multipath" and
   "classic link bundling" used in this section.

   For purposes of discussion, consider two very large cities, city A
   and city Z.  For example, in the US high traffic cities might be
   New York and Los Angeles, and in Europe high traffic cities might be
   London and Amsterdam.  Two other high volume cities, city B and city
   Y, may share common provider core network infrastructure.  Using the
   same examples, cities B and Y may be Washington DC and San
   Francisco, or Paris and Stockholm.  In the US, the common
   infrastructure may span Denver, Chicago, Detroit, and Cleveland.
   Other major traffic contributors on either US coast include Boston
   and northern Virginia on the east coast, and Seattle and San Diego
   on the west coast.  The IP/MPLS links within the shared
   infrastructure, for example the city to city links in the Denver,
   Chicago, Detroit, and Cleveland path in the US example, for most of
   the 2000s decade had capacities that greatly exceeded the single
   circuits available in transport networks.

   For a case with four large traffic sources on either side of the
   shared infrastructure, up to sixteen core city to core city traffic
   flows in excess of transport circuit capacity may be accommodated on
   the shared infrastructure.

   Today the most common IP/MPLS core network design makes use of very
   large links which consist of many smaller component links, but uses
   classic multipath techniques rather than classic link bundling or
   Composite Link.  A component link typically corresponds to the
   largest circuit that the transport system is capable of providing
   (or the largest cost effective circuit).  IP source and destination
   address hashing is used to distribute flows across the set of
   component links, as described in Appendix B.3.

   Classic multipath can handle large LSP, up to the total capacity of
   the multipath (within limits, see Appendix B.2).  A disadvantage of
   classic multipath is the reordering of traffic within a given core
   city to core city LSP.  While there is no reordering within any
   microflow, and therefore no customer visible issue, MPLS-TP cannot
   be used across an infrastructure where classic multipath is in use,
   except within pseudowires.
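   The reordering issue can be made concrete with a small sketch
   (illustrative only; the hash and field layout are gross
   simplifications of real implementations):

      # Two IP flows carried in the SAME LSP (identical label stack)
      # may hash to different component links under classic multipath.
      import zlib

      def component_for(labels, src, dst, n_components):
          key = ",".join(map(str, labels)) + src + dst
          return zlib.crc32(key.encode()) % n_components

      labels = [16001]   # one and the same LSP
      print(component_for(labels, "192.0.2.1", "198.51.100.1", 3))
      print(component_for(labels, "192.0.2.2", "198.51.100.9", 3))

   Each microflow stays on one component link and remains in order, but
   the LSP's aggregate traffic may be split across component links, so
   ordering across the LSP as a whole is not preserved.  This is what
   prevents direct use of MPLS-TP OAM over classic multipath.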
   These capacity issues force the use of classic multipath today.
   Classic multipath excludes a direct use of MPLS-TP.  The desire for
   OAM, offered by MPLS-TP, is in conflict with the use of classic
   multipath.  There are a number of alternatives that satisfy both
   requirements.  Some alternatives are described below.

   MPLS-TP in network edges only

      A simple approach which requires no change to the core is to
      disallow MPLS-TP across the core unless carried within a
      pseudowire (PW).  MPLS-TP may be used within edge domains where
      classic multipath is not used.  PW may be signaled end to end
      using single segment PW (SS-PW), or stitched across domains using
      multisegment PW (MS-PW).  The PW and anything carried within the
      PW may use OAM as long as fat-PW [RFC6391] load splitting is not
      used by the PW.

   Composite Link at core LSP ingress/egress

      The interior of the core network may use classic link bundling,
      with the limitation that no LSP can exceed the capacity of a
      single circuit.  Larger non-MPLS-TP LSP can be configured using
      multiple ingress to egress component MPLS-TP LSP.  This can be
      accomplished using existing IP source and destination address
      hashing configured at LSP ingress and egress, or using Composite
      Link configured at ingress and egress.  Each component LSP, if
      constrained to be no larger than the capacity of a single
      circuit, can make use of MPLS-TP and offer OAM for all top level
      LSP across the core.

   MPLS-TP as an MPLS client

      A third approach involves modifying the behavior of LSR in the
      interior of the network core, such that MPLS-TP can be used on a
      subset of LSP, where the capacity of any one LSP within that
      MPLS-TP subset of LSP is not larger than the capacity of a single
      circuit.  This requirement is accommodated through a combination
      of signaling to indicate the LSP for which traffic splitting
      needs to be constrained, the ability to constrain the depth of
      the label stack over which traffic splitting can be applied on a
      per LSP basis, and the ability to constrain the use of IP
      addresses below the label stack for traffic splitting, also on a
      per LSP basis.

   The above alternatives allow packet ordering within an LSP to be
   maintained in some circumstances while allowing very large LSP
   capacities.  Each of these alternatives is discussed further in the
   following subsections.

6.1.  MPLS-TP in network edges only

   Classic MPLS link bundling is defined in [RFC4201] and has existed
   since early in the 2000s decade.  Classic MPLS link bundling places
   any given LSP entirely on a single component link.  Classic MPLS
   link bundling is not in widespread use as the means to accommodate
   large link capacities in core networks, due to the simplicity,
   better multiplexing gain, and therefore lower network cost of
   classic multipath.

   If MPLS-TP OAM capability is not required for IP/MPLS network core
   LSP, then there is no need to change existing network designs which
   use classic multipath and both label stack and IP source and
   destination address based hashing as a basis for load splitting.

   If MPLS-TP is needed for a subset of LSP, then those LSP can be
   carried within pseudowires.  The pseudowire adds a thin layer of
   encapsulation and therefore a small overhead.  If only a subset of
   LSP need MPLS-TP OAM, then some LSP must make use of the pseudowires
   while other LSP avoid them.  A straightforward way to accomplish
   this is with administrative attributes [RFC3209].
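   A minimal sketch of how administrative attributes (resource
   affinities) could steer LSP in this way follows (illustrative only;
   the bit assignment is hypothetical, though the include/exclude test
   follows the resource affinity rules of [RFC3209]):

      # Hypothetical admin group bit set on links that traverse the PW.
      PW_PATH = 0x01

      def link_admissible(link_attrs, exclude_any, include_any,
                          include_all):
          if link_attrs & exclude_any:
              return False
          if include_any and not (link_attrs & include_any):
              return False
          return (link_attrs & include_all) == include_all

      # An MPLS-TP LSP requires the PW path; an ordinary LSP avoids it.
      print(link_admissible(PW_PATH, 0, 0, PW_PATH))   # True
      print(link_admissible(PW_PATH, PW_PATH, 0, 0))   # False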
6.2.  Composite Link at core LSP ingress/egress

   Composite Link can be configured only for large LSP that are made up
   of smaller MPLS-TP component LSP.  This approach is capable of
   supporting MPLS-TP OAM over the entire set of component link LSP,
   and therefore the entire set of top level LSP traversing the core.

   There are two primary disadvantages to this approach.  One is that
   the number of top level LSP traversing the core can be dramatically
   increased.  The other is the loss of multiplexing gain that results
   from the use of classic link bundling within the interior of the
   core network.

   If component LSP use MPLS-TP, then no component LSP can exceed the
   capacity of a single circuit.  For a given composite LSP there can
   either be a number of equal capacity component LSP, or some number
   of full capacity component LSP plus one LSP carrying the excess.
   For example, a 350 Gb/s composite LSP over a 100 Gb/s infrastructure
   may use five 70 Gb/s component LSP, or three 100 Gb/s LSP plus one
   50 Gb/s LSP.  Classic MPLS link bundling is needed to support
   MPLS-TP and suffers from a bin packing problem even if LSP traffic
   is completely predictable, which it never is in practice.

   The common means of setting composite link bandwidth parameters uses
   long term statistical measures.  For example, many providers base
   their LSP bandwidth parameters on the 95th percentile of carried
   traffic as measured over a one week period.  It is common to add
   10-30% to the 95th percentile value measured over the prior week and
   adjust the bandwidth parameters of LSP weekly.  It is also possible
   to measure traffic flow at the LSR and adjust bandwidth parameters
   somewhat more dynamically.  This is less common in deployments and,
   where deployed, makes use of filtering to track very long term
   trends in traffic levels.  In either case, short term variation of
   traffic levels relative to signaled LSP capacity is common.
   Allowing a large overallocation of LSP bandwidth parameters (i.e.,
   adding 30% or more) avoids overutilization of any given LSP, but
   increases unused network capacity and increases network cost.
   Allowing a small overallocation of LSP bandwidth parameters (i.e.,
   10-20% or less) results in both underutilization and overutilization
   but statistically results in a total utilization within the core
   that is under capacity most or all of the time.

   The classic multipath solution accommodates the situation in which
   some composite LSP are underutilizing their signaled capacity and
   others are overutilizing theirs, while needing far less unused
   network capacity to accommodate variation in actual traffic levels.
   If the actual traffic levels of independent LSP can be described by
   a probability distribution, the relative variation (variation as a
   fraction of the mean) of the sum of the LSP is less than the
   relative variation of any given LSP, for all but a constant traffic
   level (where the variation of the sum and of the components are both
   zero).

   There are two situations which can motivate the use of this
   approach.  This design is favored if the provider values MPLS-TP OAM
   across the core more than efficiency (or is unaware of the
   efficiency issue).  This design can also make sense if transport
   equipment or very low cost core LSR are available which support only
   classic link bundling and, regardless of the loss of multiplexing
   gain, are more cost effective at carrying transit traffic than
   equipment which supports IP source and destination address hashing.
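   This statistical multiplexing effect can be demonstrated with a
   short simulation (illustrative only; the load distribution and all
   parameters are hypothetical):

      # Relative variation (stddev/mean) of a sum of independent LSP
      # loads is smaller than that of any individual LSP.
      import random, statistics

      random.seed(1)
      n_lsp, n_samples = 16, 1000
      # Each LSP's load varies independently around a 10 Gb/s mean.
      loads = [[random.gauss(10.0, 3.0) for _ in range(n_samples)]
               for _ in range(n_lsp)]

      def rel_var(xs):
          return statistics.pstdev(xs) / statistics.fmean(xs)

      print(rel_var(loads[0]))                      # one LSP alone
      print(rel_var([sum(t) for t in zip(*loads)])) # the aggregate
      # The aggregate's relative variation is roughly 1/sqrt(16) of a
      # single LSP's, which is why less spare capacity is needed.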
6.3.  MPLS-TP as an MPLS client

   Accommodating MPLS-TP as an MPLS client requires a small change to
   forwarding behavior and is therefore most applicable to major
   network overbuilds or new deployments.  The change to forwarding is
   an ability to limit, on a per LSP basis, the depth of the MPLS label
   stack used in hashing.  Some existing hardware, particularly
   microprogrammed hardware, may be able to accommodate this forwarding
   change.  Providing support in new hardware is not difficult; it is a
   much smaller change than, for example, the changes required to
   disable PHP in an environment where LSP hierarchy is used.

   The advantage of this approach is an ability to accommodate MPLS-TP
   as a client LSP while retaining the high multiplexing gain, and
   therefore the efficiency and low network cost, of a pure MPLS
   deployment.  The disadvantage is the need for a small change in
   forwarding.

7.  Security Considerations

   This document is a use cases document.  Existing protocols such as
   MPLS are referenced.  Existing techniques such as MPLS link bundling
   and multipath techniques are referenced.  These protocols and
   techniques are documented elsewhere and contain security
   considerations which are unchanged by this document.

   This document also describes use cases for Composite Link, which is
   a work-in-progress.  Composite Link requirements are defined in
   [I-D.ietf-rtgwg-cl-requirement].  [I-D.so-yong-rtgwg-cl-framework]
   defines a framework for Composite Link.  Composite Link bears many
   similarities to MPLS link bundling and multipath techniques used
   with MPLS.  Additional security considerations, if any, beyond those
   already identified for MPLS, MPLS link bundling, and multipath
   techniques will be documented in the framework document if specific
   to the overall framework of Composite Link, or in protocol
   extensions if specific to a given protocol extension defined later
   to support Composite Link.

8.  Acknowledgments

   Authors would like to thank [ no one so far ] for their reviews and
   great suggestions.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2.  Informative References

   [I-D.ietf-rtgwg-cl-requirement]
              Villamizar, C., McDysan, D., Ning, S., Malis, A., and L.
              Yong, "Requirements for MPLS Over a Composite Link",
              draft-ietf-rtgwg-cl-requirement-04 (work in progress),
              March 2011.

   [I-D.so-yong-rtgwg-cl-framework]
              So, N., Malis, A., McDysan, D., Yong, L., Villamizar, C.,
              and T. Li, "Composite Link Framework in Multi Protocol
              Label Switching (MPLS)",
              draft-so-yong-rtgwg-cl-framework-04 (work in progress),
              June 2011.

   [IEEE-802.1AX]
              IEEE Standards Association, "IEEE Std 802.1AX-2008, IEEE
              Standard for Local and Metropolitan Area Networks - Link
              Aggregation", 2008.

   [ITU-T.G.694.2]
              ITU-T, "Spectral grids for WDM applications: CWDM
              wavelength grid", 2003.

   [ITU-T.G.800]
              ITU-T, "Unified functional architecture of transport
              networks", 2007.
   [ITU-T.Y.1540]
              ITU-T, "Internet protocol data communication service - IP
              packet transfer and availability performance parameters",
              2007.

   [ITU-T.Y.1541]
              ITU-T, "Network performance objectives for IP-based
              services", 2006.

   [RFC1717]  Sklower, K., Lloyd, B., McGregor, G., and D. Carr, "The
              PPP Multilink Protocol (MP)", RFC 1717, November 1994.

   [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
              and W. Weiss, "An Architecture for Differentiated
              Services", RFC 2475, December 1998.

   [RFC2597]  Heinanen, J., Baker, F., Weiss, W., and J. Wroclawski,
              "Assured Forwarding PHB Group", RFC 2597, June 1999.

   [RFC2615]  Malis, A. and W. Simpson, "PPP over SONET/SDH", RFC 2615,
              June 1999.

   [RFC2991]  Thaler, D. and C. Hopps, "Multipath Issues in Unicast and
              Multicast Next-Hop Selection", RFC 2991, November 2000.

   [RFC2992]  Hopps, C., "Analysis of an Equal-Cost Multi-Path
              Algorithm", RFC 2992, November 2000.

   [RFC3209]  Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
              and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
              Tunnels", RFC 3209, December 2001.

   [RFC3260]  Grossman, D., "New Terminology and Clarifications for
              Diffserv", RFC 3260, April 2002.

   [RFC3809]  Nagarajan, A., "Generic Requirements for Provider
              Provisioned Virtual Private Networks (PPVPN)", RFC 3809,
              June 2004.

   [RFC3945]  Mannie, E., "Generalized Multi-Protocol Label Switching
              (GMPLS) Architecture", RFC 3945, October 2004.

   [RFC4201]  Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling
              in MPLS Traffic Engineering (TE)", RFC 4201, October
              2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4385]  Bryant, S., Swallow, G., Martini, L., and D. McPherson,
              "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word
              for Use over an MPLS PSN", RFC 4385, February 2006.

   [RFC4928]  Swallow, G., Bryant, S., and L. Andersson, "Avoiding
              Equal Cost Multipath Treatment in MPLS Networks",
              BCP 128, RFC 4928, June 2007.

   [RFC5036]  Andersson, L., Minei, I., and B. Thomas, "LDP
              Specification", RFC 5036, October 2007.

   [RFC5586]  Bocci, M., Vigoureux, M., and S. Bryant, "MPLS Generic
              Associated Channel", RFC 5586, June 2009.

   [RFC6391]  Bryant, S., Filsfils, C., Drafz, U., Kompella, V., Regan,
              J., and S. Amante, "Flow-Aware Transport of Pseudowires
              over an MPLS Packet Switched Network", RFC 6391,
              November 2011.

Appendix A.  More Details on Existing Network Operator Practices and
             Protocol Usage

   Often, network operators have a contractual Service Level Agreement
   (SLA) with customers for services that is comprised of numerical
   values for performance measures, principally availability, latency,
   and delay variation.  Additionally, network operators may have a
   Service Level Specification (SLS) that is for internal use by the
   operator.  See [ITU-T.Y.1540], [ITU-T.Y.1541], and Section 4.9 of
   [RFC3809] for examples of the form of such SLA and SLS
   specifications.  In this document we use the term Network
   Performance Objective (NPO), as defined in Section 5 of
   [ITU-T.Y.1541], since the SLA and SLS measures have network operator
   and service specific implications.  Note that the numerical NPO
   values of Y.1540 and Y.1541 span multiple networks and may be looser
   than network operator SLA or SLS objectives.
   Applications and acceptable user experience have an important
   relationship to these performance parameters.

   Consider latency as an example.  In some cases, minimizing latency
   relates directly to the best customer experience (e.g., in TCP,
   closer is faster).  In other cases, user experience is relatively
   insensitive to latency up to a specific limit, at which point user
   perception of quality degrades significantly (e.g., interactive
   human voice and multimedia conferencing).  A number of NPOs have a
   bound on point-to-point latency, and as long as this bound is met,
   the NPO is met -- decreasing the latency is not necessary.  In some
   NPOs, if the specified latency is not met, the user considers the
   service as unavailable.  An unprotected LSP can be manually
   provisioned on a set of links to meet this type of NPO, but this
   lowers availability, since an alternate route that meets the latency
   NPO cannot be determined.

   Historically, when an IP/MPLS network was operated over a lower
   layer circuit switched network (e.g., SONET rings), a change in
   latency caused by the lower layer network (e.g., due to a
   maintenance action or failure) was not known to the MPLS network.
   This resulted in latency affecting end user experience, sometimes
   violating NPOs or resulting in user complaints.

   A response to this problem was to provision IP/MPLS networks over
   unprotected circuits and set the metric and/or TE-metric
   proportional to latency.  This resulted in traffic being directed
   over the lowest latency path, even if this was not needed to meet an
   NPO or to meet user experience objectives.  This results in reduced
   flexibility and increased cost for network operators.  Using lower
   layer networks to provide restoration and grooming is expected to be
   more efficient, but the inability to communicate performance
   parameters, in particular latency, from the lower layer network to
   the higher layer network is an important problem to be solved before
   this can be done.

   Latency NPOs for point-to-point services are often tied closely to
   geographic locations, while latency for multipoint services may be
   based upon a worst case within a region.

   Section 7 of [ITU-T.Y.1540] defines availability for an IP service
   in terms of loss exceeding a threshold for a period on the order of
   5 minutes.  However, the timeframes for restoration (i.e., as
   implemented by pre-determined protection, convergence of routing
   protocols, and/or signaling) for services range from on the order of
   100 ms or less (e.g., for VPWS to emulate classical SDH/SONET
   protection switching) to several minutes (e.g., to allow BGP to
   reconverge for L3VPN), and may differ among the set of customers
   within a single service.

   The presence of only three Traffic Class (TC) bits (previously known
   as EXP bits) in the MPLS shim header is limiting when a network
   operator needs to support QoS classes for multiple services (e.g.,
   L2VPN VPWS, VPLS, L3VPN, and Internet), each of which has a set of
   QoS classes that need to be supported.  In some cases one bit is
   used to indicate conformance to some ingress traffic classification,
   leaving only two bits for indicating the service QoS classes.  The
   approach that has been taken is to aggregate these QoS classes into
   similar sets on LER-LSR and LSR-LSR links.
   Labeled LSPs and the use of link layer encapsulation have been
   standardized in order to provide a means to meet these needs.

   The IP DSCP cannot be used for flow identification, since Section
   5.5 of [RFC4301] requires Diffserv transparency, and in general
   network operators do not rely on the DSCP of Internet packets.  In
   addition, the use of the IP DSCP for flow identification is
   incompatible with Assured Forwarding services [RFC2597] or any other
   service which may use more than one DSCP code point to carry traffic
   for a given microflow.

   A label is pushed onto Internet packets when they are carried along
   with L2/L3VPN packets on the same link or lower layer network; this
   provides a means to distinguish the QoS classes for these packets.

   Operating an MPLS-TE network involves a different paradigm from
   operating an IGP metric-based LDP signaled MPLS network.  The
   multipoint-to-point LDP signaled MPLS LSPs occur automatically, and
   balancing across parallel links occurs if the IGP metrics are set
   "equally" (with equality a locally definable relation).

   Traffic is typically composed of a few large (some very large) flows
   and many small flows.  In some cases, separate LSPs are established
   for very large flows.  This can occur even if the IP header
   information is inspected by an LSR, for example for an IPsec tunnel
   that carries a large amount of traffic.  An important example of
   large flows is that of an L2/L3 VPN customer who has an access line
   bandwidth comparable to a client-client composite link bandwidth --
   there could be flows that are on the order of the access line
   bandwidth.

Appendix B.  Existing Multipath Standards and Techniques

   Today the need to handle large aggregations of traffic, much larger
   than a single component link, is met by a number of techniques which
   we will collectively call multipath.  Multipath applied to parallel
   links between the same set of nodes includes Ethernet Link
   Aggregation [IEEE-802.1AX], link bundling [RFC4201], and other
   aggregation techniques, some of which may be vendor specific.
   Multipath applied to diverse paths rather than parallel links
   includes Equal Cost MultiPath (ECMP) as applied to OSPF, ISIS, or
   even BGP, and equal cost LSP, as described in Appendix B.4.  Various
   multipath techniques have strengths and weaknesses.

   The term Composite Link is more general than terms such as Link
   Aggregation, which is generally considered to be specific to
   Ethernet, and its use here is consistent with the broad definition
   in [ITU-T.G.800].  The term multipath excludes inverse multiplexing
   and refers to techniques which only solve the problem of large
   aggregations of traffic, without addressing the other requirements
   outlined in this document, particularly those described in Section 4
   and Section 5.

B.1.  Common Multipath Load Splitting Techniques

   Identical load balancing techniques are used for multipath both over
   parallel links and over diverse paths.

   Large aggregates of IP traffic do not provide explicit signaling to
   indicate the expected traffic loads.  Large aggregates of MPLS
   traffic are carried in MPLS tunnels supported by MPLS LSP.  LSP
   which are signaled using RSVP-TE extensions do provide explicit
   signaling which includes the expected traffic load for the
   aggregate.  LSP which are signaled using LDP do not provide an
   expected traffic load.
   MPLS LSP may contain other MPLS LSP arranged hierarchically.  When
   an MPLS LSR serves as a midpoint LSR in an LSP carrying other LSP as
   payload, there is no signaling associated with these inner LSP.
   Therefore, even when using RSVP-TE signaling, there may be
   insufficient information provided by signaling to adequately
   distribute load based solely on signaling.

   Generally, a set of label stack entries that is unique across the
   ordered set of label numbers in the label stack can safely be
   assumed to contain a group of flows.  The reordering of traffic can
   therefore be considered to be acceptable, unless reordering occurs
   within traffic containing a common unique set of label stack
   entries.  Existing load splitting techniques take advantage of this
   property, in addition to looking beyond the bottom of the label
   stack and determining if the payload is IPv4 or IPv6 to load balance
   traffic accordingly.

   MPLS-TP OAM violates the assumption that it is safe to reorder
   traffic within an LSP.  If MPLS-TP OAM is to be accommodated, then
   existing multipath techniques must be modified.  Such modifications
   are outside the scope of this document.

   For example, a large aggregate of IP traffic may be subdivided into
   a large number of groups of flows using a hash on the IP source and
   destination addresses.  This is as described in [RFC2475] and
   clarified in [RFC3260].  For MPLS traffic carrying IP, a similar
   hash can be performed on the set of labels in the label stack.
   These techniques are both examples of means to subdivide traffic
   into groups of flows for the purpose of load balancing traffic
   across aggregated link capacity.  The means of identifying a set of
   flows should not be confused with the definition of a flow.

   Discussion of whether a hash based approach provides a sufficiently
   even load balance, using any particular hashing algorithm or method
   of distributing traffic across a set of component links, is outside
   of the scope of this document.

   The current load balancing techniques are referenced in [RFC4385]
   and [RFC4928].  The use of these hash based approaches is described
   in [RFC2991] and [RFC2992].  A mechanism to identify flows within PW
   is described in [RFC6391].  The use of hash based approaches is
   mentioned as an example of an existing set of techniques to
   distribute traffic over a set of component links.  Other techniques
   are not precluded.

B.2.  Simple and Adaptive Load Balancing Multipath

   Simple multipath generally relies on the mathematical probability
   that, given a very large number of small microflows, these
   microflows will tend to be distributed evenly across a hash space.
   Early, very simple multipath implementations assumed that all
   component links are of equal capacity and performed a modulo
   operation on the hashed value.  An alternate simple multipath
   technique uses a table, generally with a power of two size, and
   distributes the table entries proportionally among component links
   according to the capacity of each component link.

   Simple load balancing works well if there are a very large number of
   small microflows (i.e., each microflow's rate is much less than the
   component link capacity).  However, the case where there are even a
   few large microflows is not handled well by simple load balancing.
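   A minimal sketch of the simple techniques described above follows
   (illustrative only; the hash, table size, and names are hypothetical
   simplifications).  It also shows the kind of per LSP label stack
   depth limit discussed in Section 6.3:

      # Hash on the label stack (optionally depth-limited) plus IP
      # addresses selects an entry in a table whose entries are
      # distributed among component links in proportion to capacity.
      import zlib

      TABLE_SIZE = 256   # generally a power of two

      def build_table(components):
          """components: list of (name, capacity); proportional fill."""
          total = sum(cap for _, cap in components)
          table, acc = [], 0.0
          for name, cap in components:
              acc += cap
              while len(table) < round(TABLE_SIZE * acc / total):
                  table.append(name)
          return table

      def select(table, labels, src, dst, max_label_depth=None):
          if max_label_depth is not None:
              labels = labels[:max_label_depth]  # per LSP depth limit
          key = ",".join(map(str, labels)) + src + dst
          return table[zlib.crc32(key.encode()) % len(table)]

      table = build_table([("component1", 100.0), ("component2", 40.0),
                           ("component3", 40.0)])
      print(select(table, [16001, 299776], "192.0.2.1", "198.51.100.1"))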
   An adaptive load balancing multipath technique is one where the
   traffic bound to each component link is measured and the load split
   is adjusted accordingly.  As long as the adjustment is done within a
   single network element, no protocol extensions are required and
   there are no interoperability issues.

   Note that if the load balancing algorithm and/or its parameters are
   adjusted, then packets in some flows may briefly be delivered out of
   sequence; in practice, however, such adjustments can be made very
   infrequently.

B.3.  Traffic Split over Parallel Links

   The load splitting techniques defined in Appendix B.1 and
   Appendix B.2 are both used in splitting traffic over parallel links
   between the same pair of nodes.  The best known technique, though
   far from being the first, is Ethernet Link Aggregation
   [IEEE-802.1AX].  This same technique had been applied much earlier
   using OSPF or ISIS Equal Cost MultiPath (ECMP) over parallel links
   between the same nodes.  Multilink PPP [RFC1717] uses a technique
   that provides inverse multiplexing; in addition, a number of vendors
   provided proprietary extensions to PPP over SONET/SDH [RFC2615] that
   predated Ethernet Link Aggregation but are no longer used.

   Link bundling [RFC4201] provides yet another means of handling
   parallel LSP.  [RFC4201] explicitly allows a special value of all
   ones to indicate a split across all members of the bundle.  This
   "all ones" component link is signaled in the RSVP-TE Resv to
   indicate that the link bundle is making use of classic multipath
   techniques.

B.4.  Traffic Split over Multiple Paths

   OSPF or ISIS Equal Cost MultiPath (ECMP) is a well known form of
   traffic split over multiple paths that may traverse intermediate
   nodes.  ECMP is often incorrectly equated to only this case, and
   multipath over multiple diverse paths is often incorrectly equated
   to ECMP.

   Many implementations are able to create more than one LSP between a
   pair of nodes, where these LSP are routed diversely to better make
   use of available capacity.  The load on these LSP can be distributed
   proportionally to the reserved bandwidth of the LSP.  These multiple
   LSP may be advertised as a single PSC FA, and any LSP making use of
   the FA may be split over these multiple LSP.

   Link bundling [RFC4201] component links may themselves be LSP.  When
   this technique is used, any LSP which specifies the link bundle may
   be split across the multiple paths of the LSP that comprise the
   bundle.

Appendix C.  Characteristics of Transport in Core Networks

   The characteristics of primary interest are the capacity of a single
   circuit and the use of wavelength division multiplexing (WDM) to
   provide a large number of parallel circuits.

   Wavelength division multiplexing (WDM) supports multiple independent
   channels (independent ignoring crosstalk noise) at slightly
   different wavelengths of light, multiplexed onto a single fiber.
   Typical in the early 2000s was 40 wavelengths of 10 Gb/s capacity
   per wavelength.  These wavelengths are in the C-band range, which is
   about 1530-1565 nm, though some work has been done using the L-band,
   1565-1625 nm.

   The C-band has been carved up using a 100 GHz spacing from 191.7 THz
   to 196.1 THz by [ITU-T.G.694.2].  This yields 44 channels.  If the
   outermost channels are not used, due to poorer transmission
   characteristics, then typically 40 are used.
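   The channel counts follow directly from the grid arithmetic, as the
   short check below shows for the 100 GHz spacing just described and
   the tighter spacings mentioned next (the usable counts are lower
   because the outermost channels are avoided):

      # Number of grid slots in the 191.7-196.1 THz span at each
      # channel spacing.
      span_hz = 196.1e12 - 191.7e12      # 4.4 THz
      for spacing_hz in (100e9, 50e9, 25e9):
          print(round(span_hz / spacing_hz))   # 44, 88, 176 slots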
   For practical reasons, a 50 GHz or 25 GHz spacing is used by more
   recent equipment, yielding 80 or 160 channels in practice.

   The early optical modulation techniques used within a single channel
   yielded 2.5 Gb/s and 10 Gb/s capacity per channel.  As modulation
   techniques have improved, 40 Gb/s and 100 Gb/s per channel have been
   achieved.

   The 40 channels of 10 Gb/s common in the mid 2000s yield a total of
   400 Gb/s.  Tighter spacing and better modulations are yielding up to
   8 Tb/s or more in more recent systems.

   Over the optical layer is an electrical encoding.  In the 1990s this
   was typically Synchronous Optical Networking (SONET) or Synchronous
   Digital Hierarchy (SDH), with a maximum defined circuit capacity of
   40 Gb/s (OC-768), though the 10 Gb/s OC-192 is more common.  More
   recently, the low level electrical encoding has been Optical
   Transport Network (OTN), defined by ITU-T.  OTN currently defines
   circuit capacities up to a nominal 100 Gb/s (ODU4).  Both SONET/SDH
   and OTN make use of time division multiplexing (TDM), where a higher
   capacity circuit, such as a 100 Gb/s ODU4 in OTN, may be subdivided
   into lower fixed capacity circuits, such as ten 10 Gb/s ODU2.

   In the 1990s, all IP and later IP/MPLS networks either used a
   fraction of the maximum circuit capacity, or at most the full
   circuit capacity toward the end of the decade, when full circuit
   capacity was 2.5 Gb/s or 10 Gb/s.  Beyond 2000, the TDM circuit
   multiplexing capability of SONET/SDH or OTN was rarely used.

   Early in the 2000s, both transport equipment and core LSR offered 40
   Gb/s SONET OC-768.  However, 10 Gb/s transport equipment was
   predominantly deployed throughout the decade, partially because LSR
   10GbE ports were far more cost effective than either OC-192 or
   OC-768 and became practical in the second half of the decade.

   Entering the 2010 decade, LSR 40GbE and 100GbE are expected to
   become widely available and cost effective.  Slightly preceding
   this, transport equipment making use of 40 Gb/s and 100 Gb/s
   modulations is becoming available.  This transport equipment is
   capable of carrying 40 Gb/s ODU3 and 100 Gb/s ODU4 circuits.

   Early in the 2000s decade, IP/MPLS core networks were making use of
   single 10 Gb/s circuits.  Capacity grew quickly in the first half of
   the decade, but most IP/MPLS core networks had only a small number
   of IP/MPLS links requiring 4-8 parallel 10 Gb/s circuits.  However,
   the use of multipath was necessary, was deemed the simplest and most
   cost effective alternative, and became thoroughly entrenched.  By
   the end of the 2000s decade, nearly all major IP/MPLS core service
   provider networks and a few content provider networks had IP/MPLS
   links which exceeded 100 Gb/s, long before 40GbE was available and
   before 40 Gb/s transport was in widespread use.

   It is less clear when IP/MPLS LSP exceeded 10 Gb/s, 40 Gb/s, and 100
   Gb/s.  By 2010, many service providers had LSP in excess of 100
   Gb/s, but few are willing to disclose how many LSP have reached this
   capacity.

   At the time of writing, 40GbE and 100GbE LSR products are being
   evaluated by service providers and content providers and are in use
   in network trials.  The cost of components required to deliver 100
   GbE products remains high, making these products less cost
   effective.  This is expected to change within a few years.
   The important point is that IP/MPLS core network links have long ago
   exceeded 100 Gb/s and a small number of IP/MPLS LSP exceed 100 Gb/s.
   By the time 100 Gb/s circuits are widely deployed, IP/MPLS core
   network links are likely to exceed 1 Tb/s and many IP/MPLS LSP
   capacities are likely to exceed 100 Gb/s.  Therefore multipath
   techniques are likely here to stay.

Authors' Addresses

   So Ning
   Verizon
   2400 N. Glenville Ave.
   Richardson, TX  75082

   Phone: +1 972-729-7905
   Email: ning.so@verizonbusiness.com

   Andrew Malis
   Verizon
   117 West St.
   Waltham, MA  02451

   Phone: +1 781-466-2362
   Email: andrew.g.malis@verizon.com

   Dave McDysan
   Verizon
   22001 Loudoun County PKWY
   Ashburn, VA  20147

   Email: dave.mcdysan@verizon.com

   Lucy Yong
   Huawei USA
   1700 Alma Dr. Suite 500
   Plano, TX  75075

   Phone: +1 469-229-5387
   Email: lucyyong@huawei.com

   Curtis Villamizar
   Outer Cape Cod Network Consulting

   Email: curtis@occnc.com