RTGWG                                                            S. Ning
Internet-Draft                                       Tata Communications
Intended status: Informational                                  A. Malis
Expires: December 22, 2012                                    D. McDysan
                                                                 Verizon
                                                                 L. Yong
                                                              Huawei USA
                                                           C. Villamizar
                                                  Outer Cape Cod Network
                                                              Consulting
                                                           June 20, 2012

          Composite Link Use Cases and Design Considerations
                  draft-symmvo-rtgwg-cl-use-cases-01

Abstract

   This document provides a set of use cases and design considerations
   for composite links.

   Composite link is a formalization of multipath techniques currently
   in use in IP and MPLS networks and a set of extensions to multipath
   techniques.

   Note: "symmvo" in the draft name is the initials of the set of
   authors: So, Yong, McDysan, Malis, Villamizar, Osborne.  This
   paragraph will be removed when/if this document is adopted as a WG
   item.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 22, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.
   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions used in this document
       2.1.  Terminology
   3.  Composite Link Foundation Use Cases
   4.  Delay Sensitive Applications
   5.  Large Volume of IP and LDP Traffic
   6.  Composite Link and Packet Ordering
       6.1.  MPLS-TP in network edges only
       6.2.  Composite Link at core LSP ingress/egress
       6.3.  MPLS-TP as an MPLS client
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgments
   10. References
       10.1.  Normative References
       10.2.  Informative References
   Appendix A.  More Details on Existing Network Operator Practices
                and Protocol Usage
   Appendix B.  Existing Multipath Standards and Techniques
       B.1.  Common Multipath Load Splitting Techniques
       B.2.  Simple and Adaptive Load Balancing Multipath
       B.3.  Traffic Split over Parallel Links
       B.4.  Traffic Split over Multiple Paths
   Appendix C.  Characteristics of Transport in Core Networks
   Authors' Addresses

1.  Introduction

   Composite link requirements are specified in
   [I-D.ietf-rtgwg-cl-requirement].  A composite link framework is
   defined in [I-D.so-yong-rtgwg-cl-framework].

   Multipath techniques have been widely used in IP networks for over
   two decades.  The use of MPLS began more than a decade ago.
   Multipath has been widely used in IP/MPLS networks for over a
   decade with very little protocol support dedicated to the effective
   use of multipath.

   The state of the art in multipath prior to composite links is
   documented in Appendix B.

   Both Ethernet Link Aggregation [IEEE-802.1AX] and MPLS link
   bundling [RFC4201] have been widely used in today's MPLS networks.
   Composite link differs in the following characteristics:

   1.  A composite link allows bundling of non-homogeneous links
       together as a single logical link.

   2.  A composite link provides more information in the TE-LSDB and
       supports more explicit control over placement of LSP.

2.  Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].
2.1.  Terminology

   Terminology defined in [I-D.ietf-rtgwg-cl-requirement] is used in
   this document.

   In addition, the following terms are used:

   classic multipath:
      Classic multipath refers to the most common current practice in
      implementation and deployment of multipath (see Appendix A).
      The most common current practice makes use of a hash on the MPLS
      label stack and, if IPv4 or IPv6 is indicated below the label
      stack, of the IP source and destination addresses [RFC4385]
      [RFC4928].

   classic link bundling:
      Classic link bundling refers to the use of [RFC4201] where the
      "all ones" component is not used.  Where the "all ones"
      component is used, link bundling behaves as classic multipath
      does.  Classic link bundling selects a single component link on
      which to put any given LSP.

   Among the important distinctions between classic multipath or
   classic link bundling and Composite Link are:

   1.  Classic multipath has no provision to retain order among flows
       within a subset of LSP.  Classic link bundling retains order
       among all flows but as a result does a poor job of splitting
       load among components and therefore is rarely (if ever)
       deployed.  Composite Link allows per LSP control of load split
       characteristics.

   2.  Classic multipath and classic link bundling do not provide a
       means to put some LSP on component links with lower delay.
       Composite Link does.

   3.  Classic multipath will provide a load balance for IP and LDP
       traffic.  Classic link bundling will not.  Neither classic
       multipath nor classic link bundling will measure IP and LDP
       traffic and reduce the advertised "Available Bandwidth" as a
       result of that measurement.  Composite Link better supports
       RSVP-TE used with significant traffic levels of native IP and
       native LDP.

   4.  Classic link bundling cannot support an LSP that is greater in
       capacity than any single component link.  Classic multipath and
       Composite Link support this capability but will reorder traffic
       on such an LSP.  Composite Link can retain the order of an LSP
       that is carried within an LSP that is greater in capacity than
       any single component link, if the contained LSP has such a
       requirement.

   None of these techniques, classic multipath, classic link bundling,
   or Composite Link, will reorder traffic among IP microflows.  None
   of these techniques will reorder traffic among PW, if a PWE3
   Control Word is used [RFC4385].
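   As an illustration of the flow grouping performed by classic
   multipath, the following sketch (in Python, for exposition only;
   real implementations do this in forwarding hardware) hashes the
   label stack and, if an IP version nibble is found below the bottom
   of stack, the IP addresses.  All names are hypothetical, and the
   hash function and field offsets are illustrative, not normative.

      # Illustrative only: flow grouping as done by classic multipath.
      import hashlib

      def flow_key(label_stack, payload):
          # payload: bytes beginning immediately below the label stack
          key = b"".join(label.to_bytes(4, "big") for label in label_stack)
          version = payload[0] >> 4       # first nibble below the stack
          if version == 4:                # looks like IPv4
              key += payload[12:20]       # IPv4 source and destination
          elif version == 6:              # looks like IPv6
              key += payload[8:40]        # IPv6 source and destination
          return int.from_bytes(hashlib.sha1(key).digest()[:4], "big")

      def select_component(key, component_links):
          # Modulo split across equal capacity component links.
          return component_links[key % len(component_links)]

   The guess based on the first nibble is the reason the PWE3 Control
   Word [RFC4385] begins with a zero nibble: it prevents PW payloads
   from being mistaken for IPv4 or IPv6 and split across component
   links.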
3.  Composite Link Foundation Use Cases

   A simple composite link composed entirely of physical links is
   illustrated in Figure 1, where a composite link is configured
   between LSR1 and LSR2.  This composite link has three component
   links.

   Individual component links in a composite link may be supported by
   different transport technologies, such as wavelength or Ethernet
   VLAN.  Even if the transport technology implementing the component
   links is identical, the characteristics (e.g., bandwidth, latency)
   of the component links may differ.

   The composite link in Figure 1 may carry LSP traffic flows and
   control plane packets.  Control plane packets may appear as IP
   packets or may be carried within a generic associated channel
   (G-ACh) [RFC5586].  An LSP may be established over the link by
   either the RSVP-TE [RFC3209] or LDP [RFC5036] signaling protocols.
   All component links in a composite link are summarized in the same
   forwarding adjacency LSP (FA-LSP) routing advertisement [RFC3945].
   The composite link is summarized as one TE link advertised into the
   IGP by the composite link end points.  This information is used in
   path computation when a full MPLS control plane is in use.  The
   individual component links or groups of component links may
   optionally be advertised into the IGP as sub-TLVs of the composite
   link advertisement to indicate capacity available with various
   characteristics, such as a delay range.

                     Management Plane
          Configuration and Measurement <------------+
                     ^                               |
                     |                               |
             +-------+-+                         +-+-------+
             |       | |                         | |       |
  CP Packets V       | |                         V | CP Packets
   |         V       | |   Component Link 1     | | ^         |
   |         |       |=|=========================|=| |         |
   |         +------>| |   Component Link 2     | |-----+     |
   |                 |=|=========================|=|     |     |
 Aggregated LSPs     | |                         | |
 ~~|~~~~~~~~~~~~~~~~>| |   Component Link 3     | |~~~~~~>~~|~~
   |                 |=|=========================|=|         |
                     | |                         | |
             |  LSR1   |                         |  LSR2   |
             +---------+                         +---------+
                  !                                   !
                  !<-------- Composite Link --------->!

    Figure 1: A composite link constructed with multiple physical
                         links between two LSR

   [I-D.ietf-rtgwg-cl-requirement] specifies that component links may
   themselves be composite links.  Figure 2 shows three forms of
   component links which may be deployed in a network.

   +-------+     1. Physical Link                      +-------+
   |       |-|----------------------------------------|-|     |
   |       | |                                        | |     |
   |       | |  +------+                  +------+    | |     |
   |       | |  | MPLS |  2. Logical Link | MPLS |    | |     |
   |       |.|..|......|..................|......|....|.|     |
   |       | |--| LSR3 |------------------| LSR4 |----| |     |
   |       | |  +------+                  +------+    | |     |
   |       | |                                        | |     |
   |       | |  +------+                  +------+    | |     |
   |       | |  |GMPLS |  3. Logical Link |GMPLS |    | |     |
   |       |.|..|......|..................|......|....|.|     |
   |       | |--| LSR5 |------------------| LSR6 |----| |     |
   |       |    +------+                  +------+      |     |
   |  LSR1 |                                          |  LSR2 |
   +-------+                                          +-------+
   |<---------------- Composite Link ---------------->|

          Figure 2: Illustration of Various Component Link Types

   The three forms of component link shown in Figure 2 are:

   1.  The first component link is configured with direct physical
       media.

   2.  The second component link is a TE tunnel that traverses LSR3
       and LSR4, where LSR3 and LSR4 support MPLS but few or no GMPLS
       extensions.

   3.  The third component link is formed by a lower layer network
       that has GMPLS enabled.  In this case, LSR5 and LSR6 are not
       controlled by MPLS but provide the connectivity for the
       component link.

   A composite link forms one logical link between connected LSR and
   is used to carry aggregated traffic
   [I-D.ietf-rtgwg-cl-requirement].  A composite link relies on its
   component links to carry the traffic over the composite link.  The
   endpoints of the composite link map incoming traffic onto component
   links.

   For example, LSR1 in Figure 1 distributes the set of traffic flows,
   including control plane packets, among the set of component links.
   LSR2 in Figure 1 receives the packets from its component links and
   sends them to the MPLS forwarding engine with no attempt to reorder
   packets arriving on different component links.  The traffic in the
   opposite direction, from LSR2 to LSR1, is distributed across the
   set of component links by LSR2.

   These three forms of component link are only examples.  Many other
   examples are possible.  A component link may itself be a composite
   link.  A segment of an LSP (a single hop for that LSP) may be a
   composite link.

4.  Delay Sensitive Applications

   Most applications benefit from lower delay.  Some types of
   applications are far more sensitive than others.  For example, real
   time bidirectional applications such as voice communication or two
   way video conferencing are far more sensitive to delay than
   unidirectional streaming audio or video.  Non-interactive bulk
   transfer is almost insensitive to delay if a large enough TCP
   window is used.

   Some applications are sensitive to delay but unwilling to pay extra
   to ensure lower delay.  For example, many SIP end users are willing
   to accept the delay offered to best effort services as long as call
   quality is good most of the time.

   Other applications are sensitive to delay and willing to pay extra
   to ensure lower delay.  For example, financial trading applications
   are extremely sensitive to delay and, with a lot at stake, are
   willing to go to great lengths to reduce delay.

   Among the requirements of Composite Link are requirements to
   advertise the capacity available within configured delay ranges
   within a given composite link, and to support the ability to place
   an LSP only on component links that meet that LSP's delay
   requirements.

   The Composite Link requirements to accommodate delay sensitive
   applications are analogous to diffserv requirements to accommodate
   applications requiring higher quality of service on the same
   infrastructure as applications with less demanding requirements.
   The ability to share capacity with less demanding applications,
   with best effort applications being the least demanding, can
   greatly reduce the cost of delivering service to the more demanding
   applications.
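   To make the delay placement requirement concrete, the following
   sketch (Python, illustrative only; all data structures and values
   are hypothetical) selects the component links on which an LSP with
   a delay bound may be placed.

      # Illustrative only: per LSP delay constrained placement.
      from dataclasses import dataclass

      @dataclass
      class ComponentLink:
          name: str
          delay_ms: float        # measured or configured one-way delay
          available_bw: float    # unreserved capacity, Gb/s

      def eligible_components(links, lsp_bw, lsp_max_delay_ms=None):
          """Component links with enough capacity that also satisfy
          the LSP's delay bound, if the LSP carries one."""
          return [l for l in links
                  if l.available_bw >= lsp_bw
                  and (lsp_max_delay_ms is None
                       or l.delay_ms <= lsp_max_delay_ms)]

      links = [ComponentLink("ckt1", 21.0, 40.0),
               ComponentLink("ckt2", 21.5, 40.0),
               ComponentLink("ckt3", 36.0, 100.0)]  # longer path
      print([l.name for l in
             eligible_components(links, lsp_bw=10.0,
                                 lsp_max_delay_ms=25.0)])
      # -> ['ckt1', 'ckt2']: the delay sensitive LSP avoids ckt3,
      # while best effort LSP may still use its capacity.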
5.  Large Volume of IP and LDP Traffic

   IP and LDP do not support traffic engineering.  Both make use of a
   shortest (lowest routing metric) path, with an option to use equal
   cost multipath (ECMP).  Note that though ECMP is prohibited by the
   LDP specification, it is widely implemented.  Where implemented for
   LDP, ECMP is generally disabled by default for standards
   compliance, but often enabled in LDP deployments.

   Without traffic engineering capability, there must be sufficient
   capacity to accommodate the IP and LDP traffic.  If there is not,
   persistent queuing delay and loss will occur.  Unlike RSVP-TE, a
   subset of traffic cannot be routed using constraint based routing
   to avoid a congested portion of the infrastructure.

   In existing networks which accommodate IP and/or LDP along with
   RSVP-TE, either the IP and LDP traffic can be carried over RSVP-TE,
   or, where the traffic contribution of IP and LDP is small, IP and
   LDP can be carried natively and the effect on RSVP-TE can be
   ignored.  Ignoring the traffic contribution of IP is certainly
   valid on high capacity networks where native IP is used primarily
   for control and network management and customer IP is carried
   within RSVP-TE.

   Where it is desirable to carry native IP and/or LDP and the IP
   and/or LDP traffic volumes are not negligible, RSVP-TE needs
   improvement.  The enhancement offered by Composite Link is an
   ability to measure the IP and LDP traffic, filter the measurements,
   and reduce the capacity available to RSVP-TE to avoid congestion.
   The treatment given to the IP or LDP traffic is similar to the
   treatment obtained by using the "auto-bandwidth" feature of some
   RSVP-TE implementations on that same traffic and giving a higher
   priority (numerically lower setup priority and holding priority
   values) to the "auto-bandwidth" LSP.  The difference is that the
   measurement is made at each hop and the reduction in advertised
   bandwidth is made more directly.
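   The following sketch (Python, illustrative only) shows one
   plausible form of this enhancement, assuming a simple exponentially
   weighted moving average as the measurement filter.  Composite Link
   does not mandate any particular filter or parameters; those shown
   here are invented for exposition.

      # Illustrative only: per hop reduction of bandwidth advertised
      # to RSVP-TE based on filtered IP/LDP load measurements.
      def advertised_bandwidth(link_capacity, measured_ip_ldp,
                               state, alpha=0.1):
          """Update the filtered IP/LDP load (EWMA) and return the
          bandwidth to advertise as available to RSVP-TE."""
          state["filtered"] += alpha * (measured_ip_ldp
                                        - state["filtered"])
          return max(0.0, link_capacity - state["filtered"])

      state = {"filtered": 0.0}
      for sample in (12.0, 15.0, 14.0, 40.0):   # Gb/s, per interval
          avail = advertised_bandwidth(100.0, sample, state)
      # The short burst (40.0) moves the advertisement only
      # gradually; filtering avoids flooding the IGP with updates
      # on every transient change in native traffic.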
6.  Composite Link and Packet Ordering

   A strong motivation for Composite Link is the need to provide LSP
   capacity in IP backbones that exceeds the capacity of single
   wavelengths provided by transport equipment and exceeds the
   practical capacity limits achievable through inverse multiplexing.
   Appendix C describes characteristics and limitations of transport
   systems today.  Section 2 defines the terms "classic multipath" and
   "classic link bundling" used in this section.

   For purposes of discussion, consider two very large cities, city A
   and city Z.  For example, in the US high traffic cities might be
   New York and Los Angeles, and in Europe high traffic cities might
   be London and Amsterdam.  Two other high volume cities, city B and
   city Y, may share common provider core network infrastructure.
   Using the same examples, cities B and Y may be Washington DC and
   San Francisco, or Paris and Stockholm.  In the US, the common
   infrastructure may span Denver, Chicago, Detroit, and Cleveland.
   Other major US traffic contributors include Boston and northern
   Virginia on the east coast, and Seattle and San Diego on the west
   coast.  The capacity of IP/MPLS links within the shared
   infrastructure, for example the city to city links in the Denver,
   Chicago, Detroit, and Cleveland path in the US example, for most of
   the 2000s decade greatly exceeded the single circuits available in
   transport networks.

   For a case with four large traffic sources on either side of the
   shared infrastructure, up to sixteen core city to core city traffic
   flows in excess of transport circuit capacity may be accommodated
   on the shared infrastructure.

   Today the most common IP/MPLS core network design makes use of very
   large links which consist of many smaller component links, but uses
   classic multipath techniques rather than classic link bundling or
   Composite Link.  A component link typically corresponds to the
   largest circuit that the transport system is capable of providing
   (or the largest cost effective circuit).  IP source and destination
   address hashing is used to distribute flows across the set of
   component links as described in Appendix B.3.

   Classic multipath can handle large LSP up to the total capacity of
   the multipath (within limits, see Appendix B.2).  A disadvantage of
   classic multipath is the reordering of traffic within a given core
   city to core city LSP.  While there is no reordering within any
   microflow, and therefore no customer visible issue, MPLS-TP cannot
   be used across an infrastructure where classic multipath is in use,
   except within pseudowires.

   These capacity issues force the use of classic multipath today.
   Classic multipath excludes a direct use of MPLS-TP.  The desire for
   OAM, offered by MPLS-TP, is therefore in conflict with the use of
   classic multipath.  There are a number of alternatives that satisfy
   both requirements.  Some alternatives are described below.

   MPLS-TP in network edges only

      A simple approach which requires no change to the core is to
      disallow MPLS-TP across the core unless carried within a
      pseudowire (PW).  MPLS-TP may be used within edge domains where
      classic multipath is not used.  PW may be signaled end to end
      using single segment PW (SS-PW), or stitched across domains
      using multisegment PW (MS-PW).  The PW and anything carried
      within the PW may use OAM as long as fat-PW [RFC6391] load
      splitting is not used by the PW.

   Composite Link at core LSP ingress/egress

      The interior of the core network may use classic link bundling,
      with the limitation that no LSP can exceed the capacity of a
      single circuit.  Larger non-MPLS-TP LSP can be configured using
      multiple ingress to egress component MPLS-TP LSP.  This can be
      accomplished using existing IP source and destination address
      hashing configured at LSP ingress and egress, or using Composite
      Link configured at ingress and egress.  Each component LSP, if
      constrained to be no larger than the capacity of a single
      circuit, can make use of MPLS-TP and offer OAM for all top level
      LSP across the core.

   MPLS-TP as an MPLS client

      A third approach involves modifying the behavior of LSR in the
      interior of the network core, such that MPLS-TP can be used on a
      subset of LSP, where the capacity of any one LSP within that
      MPLS-TP subset is no larger than the capacity of a single
      circuit.  This is accommodated through a combination of
      signaling to indicate the LSP for which traffic splitting needs
      to be constrained, the ability to constrain the depth of the
      label stack over which traffic splitting can be applied on a per
      LSP basis, and the ability to constrain the use of IP addresses
      below the label stack for traffic splitting, also on a per LSP
      basis.

   The above list of alternatives allows packet ordering within an LSP
   to be maintained in some circumstances while allowing very large
   LSP capacities.  Each of these alternatives is discussed further in
   the following subsections.
6.1.  MPLS-TP in network edges only

   Classic MPLS link bundling is defined in [RFC4201] and has existed
   since early in the 2000s decade.  Classic MPLS link bundling places
   any given LSP entirely on a single component link.  Classic MPLS
   link bundling is not in widespread use as the means to accommodate
   large link capacities in core networks, due to the simplicity,
   better multiplexing gain, and therefore lower network cost of
   classic multipath.

   If MPLS-TP OAM capability is not required for core LSP in the
   IP/MPLS network, then there is no need to change existing network
   designs, which use classic multipath with both label stack and IP
   source and destination address based hashing as the basis for load
   splitting.

   If MPLS-TP is needed for a subset of LSP, then those LSP can be
   carried within pseudowires.  The pseudowires add a thin layer of
   encapsulation and therefore a small overhead.  If only a subset of
   LSP need MPLS-TP OAM, then some LSP must make use of the
   pseudowires and other LSP must avoid them.  A straightforward way
   to accomplish this is with administrative attributes [RFC3209].
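   A sketch of this selection follows, assuming an operator assigned
   administrative attribute bit; the bit value and names are
   hypothetical, since [RFC3209] defines only the attribute mechanism
   and its assignment is operator specific.

      # Illustrative only: steer MPLS-TP LSP into PW across the core.
      NEEDS_TP_OAM = 0x01    # hypothetical admin attribute bit

      def core_encapsulation(lsp_admin_attrs):
          """LSP flagged as needing MPLS-TP OAM cross the multipath
          core inside a PW; all other LSP cross the core natively."""
          if lsp_admin_attrs & NEEDS_TP_OAM:
              return "SS-PW or MS-PW (without fat-PW load splitting)"
          return "native LSP over classic multipath core"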
6.2.  Composite Link at core LSP ingress/egress

   Composite Link can be configured only for large LSP that are made
   up of smaller MPLS-TP component LSP.  This approach is capable of
   supporting MPLS-TP OAM over the entire set of component link LSP
   and therefore the entire set of top level LSP traversing the core.

   There are two primary disadvantages to this approach.  One is that
   the number of top level LSP traversing the core can be dramatically
   increased.  The other is the loss of multiplexing gain that results
   from the use of classic link bundling within the interior of the
   core network.

   If component LSP use MPLS-TP, then no component LSP can exceed the
   capacity of a single circuit.  For a given composite LSP, there can
   either be a number of equal capacity component LSP, or some number
   of full capacity component LSP plus one LSP carrying the excess.
   For example, a 350 Gb/s composite LSP over a 100 Gb/s
   infrastructure may use five 70 Gb/s component LSP, or three
   100 Gb/s LSP plus one 50 Gb/s LSP.  Classic MPLS link bundling is
   needed to support MPLS-TP and suffers from a bin packing problem
   even if LSP traffic is completely predictable, which it never is in
   practice.

   The common means of setting composite link bandwidth parameters
   uses long term statistical measures.  For example, many providers
   base their LSP bandwidth parameters on the 95th percentile of
   carried traffic as measured over a one week period.  It is common
   to add 10-30% to the 95th percentile value measured over the prior
   week and adjust the bandwidth parameters of LSP weekly.  It is also
   possible to measure traffic flow at the LSR and adjust bandwidth
   parameters somewhat more dynamically.  This is less common in
   deployments and, where deployed, makes use of filtering to track
   very long term trends in traffic levels.  In either case, short
   term variations of traffic levels relative to signaled LSP capacity
   are common.  Allowing a large overallocation of LSP bandwidth
   parameters (i.e., adding 30% or more) avoids overutilization of any
   given LSP, but increases unused network capacity and increases
   network cost.  Allowing a small overallocation of LSP bandwidth
   parameters (i.e., 10-20% or less) results in both underutilization
   and overutilization, but statistically results in a total
   utilization within the core that is under capacity most or all of
   the time.
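   The following sketch (Python, illustrative only) shows the
   computation described above, using 20% headroom as one point within
   the 10-30% range mentioned; the sample interval is an assumption.

      # Illustrative only: weekly LSP bandwidth parameter adjustment.
      def weekly_lsp_bandwidth(samples, headroom=0.20):
          """samples: one week of carried traffic measurements
          (e.g., 5-minute samples: 7 * 24 * 12 = 2016 values).
          Returns the 95th percentile plus the configured headroom."""
          ranked = sorted(samples)
          p95 = ranked[min(len(ranked) - 1, int(0.95 * len(ranked)))]
          return p95 * (1.0 + headroom)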
   The classic multipath solution accommodates the situation in which
   some composite LSP are underutilizing their signaled capacity and
   others are overutilizing theirs, with the need for far less unused
   network capacity to accommodate variation in actual traffic levels.
   If the actual traffic levels of LSP can be described by a
   probability distribution, the variation of the sum of LSP is less
   than the variation of any given LSP for all but a constant traffic
   level (where the variation of the sum and of the components are
   both zero).

   There are two situations which can motivate the use of this
   approach.  This design is favored if the provider values MPLS-TP
   OAM across the core more than efficiency (or is unaware of the
   efficiency issue).  This design can also make sense if transport
   equipment or very low cost core LSR are available which support
   only classic link bundling and which, regardless of the loss of
   multiplexing gain, are more cost effective at carrying transit
   traffic than equipment which supports IP source and destination
   address hashing.

6.3.  MPLS-TP as an MPLS client

   Accommodating MPLS-TP as an MPLS client requires a small change to
   forwarding behavior and is therefore most applicable to major
   network overbuilds or new deployments.  The change to forwarding is
   an ability to limit the depth of MPLS labels used in hashing on the
   label stack on a per LSP basis.  Some existing hardware,
   particularly microprogrammed hardware, may be able to accommodate
   this forwarding change.  Providing support in new hardware is not
   difficult; it is a much smaller change than, for example, the
   changes required to disable PHP in an environment where LSP
   hierarchy is used.

   The advantage of this approach is an ability to accommodate MPLS-TP
   as a client LSP while retaining the high multiplexing gain, and
   therefore efficiency and low network cost, of a pure MPLS
   deployment.  The disadvantage is the need for a small change in
   forwarding.
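   A sketch of the modified forwarding behavior follows, with
   hypothetical parameter names; only the constraint mechanism, not
   any particular encoding, is implied by this section.

      # Illustrative only: per LSP constraint on the hash inputs.
      def constrained_flow_key(label_stack, ip_key,
                               max_label_depth, may_use_ip):
          """max_label_depth=0 with may_use_ip=False yields a constant
          key, disabling splitting entirely for this LSP and thereby
          preserving MPLS-TP packet ordering; larger depths restore
          the multiplexing gain of hashing for LSP that permit it."""
          key = tuple(label_stack[:max_label_depth])
          if may_use_ip:
              key += tuple(ip_key)   # IP addresses below the stack
          return hash(key)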
7.  IANA Considerations

   This document includes no request to IANA.

8.  Security Considerations

   This document is a use cases document.  Existing protocols such as
   MPLS are referenced.  Existing techniques such as MPLS link
   bundling and multipath techniques are referenced.  These protocols
   and techniques are documented elsewhere and contain security
   considerations which are unchanged by this document.

   This document also describes use cases for Composite Link, which is
   a work in progress.  Composite Link requirements are defined in
   [I-D.ietf-rtgwg-cl-requirement].  [I-D.so-yong-rtgwg-cl-framework]
   defines a framework for Composite Link.  Composite Link bears many
   similarities to MPLS link bundling and the multipath techniques
   used with MPLS.  Additional security considerations, if any, beyond
   those already identified for MPLS, MPLS link bundling, and
   multipath techniques will be documented in the framework document
   if specific to the overall framework of Composite Link, or in
   protocol extensions if specific to a given protocol extension
   defined later to support Composite Link.

9.  Acknowledgments

   The authors would like to thank [ no one so far ] for their reviews
   and great suggestions.

   In the interest of full disclosure of affiliation and of
   acknowledging sponsorship, past affiliations of authors are noted.
   Much of the work done by Ning So occurred while Ning was at
   Verizon.  Much of the work done by Curtis Villamizar occurred while
   at Infinera.  Infinera continues to sponsor this work on a
   consulting basis.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2.  Informative References

   [I-D.ietf-rtgwg-cl-requirement]
              Villamizar, C., McDysan, D., Ning, S., Malis, A., and
              L. Yong, "Requirements for MPLS Over a Composite Link",
              draft-ietf-rtgwg-cl-requirement-04 (work in progress),
              March 2011.

   [I-D.so-yong-rtgwg-cl-framework]
              So, N., Malis, A., McDysan, D., Yong, L., Villamizar,
              C., and T. Li, "Composite Link Framework in Multi
              Protocol Label Switching (MPLS)",
              draft-so-yong-rtgwg-cl-framework-04 (work in progress),
              June 2011.

   [IEEE-802.1AX]
              IEEE Standards Association, "IEEE Std 802.1AX-2008, IEEE
              Standard for Local and Metropolitan Area Networks - Link
              Aggregation", 2008.

   [ITU-T.G.694.1]
              ITU-T, "Spectral grids for WDM applications: DWDM
              frequency grid", 2002.

   [ITU-T.G.800]
              ITU-T, "Unified functional architecture of transport
              networks", 2007.

   [ITU-T.Y.1540]
              ITU-T, "Internet protocol data communication service -
              IP packet transfer and availability performance
              parameters", 2007.

   [ITU-T.Y.1541]
              ITU-T, "Network performance objectives for IP-based
              services", 2006.

   [RFC1717]  Sklower, K., Lloyd, B., McGregor, G., and D. Carr, "The
              PPP Multilink Protocol (MP)", RFC 1717, November 1994.

   [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
              and W. Weiss, "An Architecture for Differentiated
              Services", RFC 2475, December 1998.

   [RFC2597]  Heinanen, J., Baker, F., Weiss, W., and J. Wroclawski,
              "Assured Forwarding PHB Group", RFC 2597, June 1999.

   [RFC2615]  Malis, A. and W. Simpson, "PPP over SONET/SDH",
              RFC 2615, June 1999.

   [RFC2991]  Thaler, D. and C. Hopps, "Multipath Issues in Unicast
              and Multicast Next-Hop Selection", RFC 2991,
              November 2000.

   [RFC2992]  Hopps, C., "Analysis of an Equal-Cost Multi-Path
              Algorithm", RFC 2992, November 2000.

   [RFC3209]  Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan,
              V., and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
              Tunnels", RFC 3209, December 2001.

   [RFC3260]  Grossman, D., "New Terminology and Clarifications for
              Diffserv", RFC 3260, April 2002.

   [RFC3809]  Nagarajan, A., "Generic Requirements for Provider
              Provisioned Virtual Private Networks (PPVPN)", RFC 3809,
              June 2004.

   [RFC3945]  Mannie, E., "Generalized Multi-Protocol Label Switching
              (GMPLS) Architecture", RFC 3945, October 2004.

   [RFC4201]  Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling
              in MPLS Traffic Engineering (TE)", RFC 4201,
              October 2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4385]  Bryant, S., Swallow, G., Martini, L., and D. McPherson,
              "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word
              for Use over an MPLS PSN", RFC 4385, February 2006.

   [RFC4928]  Swallow, G., Bryant, S., and L. Andersson, "Avoiding
              Equal Cost Multipath Treatment in MPLS Networks",
              BCP 128, RFC 4928, June 2007.

   [RFC5036]  Andersson, L., Minei, I., and B. Thomas, "LDP
              Specification", RFC 5036, October 2007.

   [RFC5586]  Bocci, M., Vigoureux, M., and S. Bryant, "MPLS Generic
              Associated Channel", RFC 5586, June 2009.

   [RFC6391]  Bryant, S., Filsfils, C., Drafz, U., Kompella, V.,
              Regan, J., and S. Amante, "Flow-Aware Transport of
              Pseudowires over an MPLS Packet Switched Network",
              RFC 6391, November 2011.

Appendix A.  More Details on Existing Network Operator Practices and
             Protocol Usage

   Often, network operators have a contractual Service Level Agreement
   (SLA) with customers for services, comprising numerical values for
   performance measures, principally availability, latency, and delay
   variation.  Additionally, network operators may have a Service
   Level Specification (SLS) that is for internal use by the operator.
   See [ITU-T.Y.1540], [ITU-T.Y.1541], and Section 4.9 of [RFC3809]
   for examples of the form of such SLA and SLS specifications.
   In this document we use the term Network Performance Objective
   (NPO) as defined in Section 5 of [ITU-T.Y.1541], since the SLA and
   SLS measures have network operator and service specific
   implications.  Note that the numerical NPO values of Y.1540 and
   Y.1541 span multiple networks and may be looser than network
   operator SLA or SLS objectives.  Applications and acceptable user
   experience have an important relationship to these performance
   parameters.

   Consider latency as an example.  In some cases, minimizing latency
   relates directly to the best customer experience (e.g., in TCP
   closer is faster).  In other cases, user experience is relatively
   insensitive to latency up to a specific limit, at which point user
   perception of quality degrades significantly (e.g., interactive
   human voice and multimedia conferencing).  A number of NPOs have a
   bound on point-to-point latency, and as long as this bound is met,
   the NPO is met; decreasing the latency further is not necessary.
   In some NPOs, if the specified latency is not met, the user
   considers the service to be unavailable.  An unprotected LSP can be
   manually provisioned on a set of links to meet this type of NPO,
   but this lowers availability since an alternate route that meets
   the latency NPO cannot be determined.

   Historically, when an IP/MPLS network was operated over a lower
   layer circuit switched network (e.g., SONET rings), a change in
   latency caused by the lower layer network (e.g., due to a
   maintenance action or failure) was not known to the MPLS network.
   This resulted in latency affecting end user experience, sometimes
   violating NPOs or resulting in user complaints.

   A response to this problem was to provision IP/MPLS networks over
   unprotected circuits and to set the metric and/or TE-metric
   proportional to latency.  This resulted in traffic being directed
   over the least latency path even when this was not needed to meet
   an NPO or user experience objectives, which reduces flexibility and
   increases cost for network operators.  Using lower layer networks
   to provide restoration and grooming is expected to be more
   efficient, but the inability to communicate performance parameters,
   in particular latency, from the lower layer network to the higher
   layer network is an important problem to be solved before this can
   be done.

   Latency NPOs for point-to-point services are often tied closely to
   geographic locations, while latency for multipoint services may be
   based upon a worst case within a region.

   Section 7 of [ITU-T.Y.1540] defines availability for an IP service
   in terms of loss exceeding a threshold for a period on the order of
   5 minutes.  However, the timeframes for restoration (i.e., as
   implemented by pre-determined protection, convergence of routing
   protocols, and/or signaling) range from on the order of 100 ms or
   less (e.g., for VPWS to emulate classical SDH/SONET protection
   switching) to several minutes (e.g., to allow BGP to reconverge for
   L3VPN), and may differ among the set of customers within a single
   service.

   The presence of only three Traffic Class (TC) bits (previously
   known as EXP bits) in the MPLS shim header is limiting when a
   network operator needs to support QoS classes for multiple services
   (e.g., L2VPN VPWS, VPLS, L3VPN, and Internet), each of which has
   its own set of QoS classes that need to be supported.  In some
   cases one bit is used to indicate conformance to some ingress
   traffic classification, leaving only two bits for indicating the
   service QoS classes.  The approach that has been taken is to
   aggregate these QoS classes into similar sets on LER-LSR and
   LSR-LSR links.
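   As an illustration, a hypothetical aggregation of per service QoS
   classes onto TC values might look like the following.  The mapping
   shown is invented for exposition; actual mappings are operator
   specific.

      # Illustrative only: (service, class) aggregated onto 3 TC
      # bits, with one bit reserved for ingress conformance marking.
      TC_MAP = {
          ("L3VPN",    "voice"):                    0b110,
          ("VPWS",     "tdm"):                      0b110,
          ("L3VPN",    "business in-contract"):     0b100,
          ("L3VPN",    "business out-of-contract"): 0b101,
          ("VPLS",     "standard"):                 0b010,
          ("Internet", "best-effort"):              0b000,
      }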
   Labeled LSPs and the use of link layer encapsulation have been
   standardized in order to provide a means to meet these needs.

   The IP DSCP cannot be used for flow identification since
   Section 5.5 of [RFC4301] requires Diffserv transparency, and in
   general network operators do not rely on the DSCP of Internet
   packets.  In addition, the use of the IP DSCP for flow
   identification is incompatible with Assured Forwarding services
   [RFC2597] or any other service which may use more than one DSCP
   code point to carry traffic for a given microflow.

   A label pushed onto Internet packets when they are carried along
   with L2/L3VPN packets on the same link or lower layer network
   provides a means to distinguish the QoS class for these packets.

   Operating an MPLS-TE network involves a different paradigm from
   operating an IGP metric-based, LDP signaled MPLS network.  The
   multipoint-to-point LDP signaled MPLS LSPs occur automatically, and
   balancing across parallel links occurs if the IGP metrics are set
   "equally" (with equality a locally definable relation).

   Traffic is typically comprised of a few large (some very large)
   flows and many small flows.  In some cases, separate LSPs are
   established for very large flows.  This can occur even if the IP
   header information is inspected by an LSR, for example for an IPsec
   tunnel that carries a large amount of traffic.  An important
   example of large flows is that of an L2/L3 VPN customer who has an
   access line bandwidth comparable to a client-client composite link
   bandwidth; there could be flows that are on the order of the access
   line bandwidth.

Appendix B.  Existing Multipath Standards and Techniques

   Today the requirement to handle large aggregations of traffic, much
   larger than a single component link, can be handled by a number of
   techniques which we will collectively call multipath.  Multipath
   applied to parallel links between the same set of nodes includes
   Ethernet Link Aggregation [IEEE-802.1AX], link bundling [RFC4201],
   and other aggregation techniques, some of which may be vendor
   specific.  Multipath applied to diverse paths rather than parallel
   links includes Equal Cost MultiPath (ECMP) as applied to OSPF,
   ISIS, or even BGP, and equal cost LSP, as described in
   Appendix B.4.  The various multipath techniques have strengths and
   weaknesses.

   The term Composite Link is more general than terms such as Link
   Aggregation, which is generally considered to be specific to
   Ethernet, and its use here is consistent with the broad definition
   in [ITU-T.G.800].  The term multipath excludes inverse multiplexing
   and refers to techniques which only solve the problem of large
   aggregations of traffic, without addressing the other requirements
   outlined in this document, particularly those described in
   Section 4 and Section 5.
B.1.  Common Multipath Load Splitting Techniques

   Identical load balancing techniques are used for multipath over
   parallel links and over diverse paths.

   Large aggregates of IP traffic do not provide explicit signaling to
   indicate the expected traffic loads.  Large aggregates of MPLS
   traffic are carried in MPLS tunnels supported by MPLS LSP.  LSP
   which are signaled using RSVP-TE extensions do provide explicit
   signaling which includes the expected traffic load for the
   aggregate.  LSP which are signaled using LDP do not provide an
   expected traffic load.

   MPLS LSP may contain other MPLS LSP arranged hierarchically.  When
   an MPLS LSR serves as a midpoint LSR in an LSP carrying other LSP
   as payload, there is no signaling associated with these inner LSP.
   Therefore, even when RSVP-TE signaling is in use, signaling may
   provide insufficient information to adequately distribute load.

   Generally a set of label stack entries that is unique across the
   ordered set of label numbers in the label stack can safely be
   assumed to contain a group of flows.  The reordering of traffic can
   therefore be considered acceptable unless reordering occurs within
   traffic containing a common unique set of label stack entries.
   Existing load splitting techniques take advantage of this property,
   in addition to looking beyond the bottom of the label stack and
   determining whether the payload is IPv4 or IPv6, to load balance
   traffic accordingly.

   MPLS-TP OAM violates the assumption that it is safe to reorder
   traffic within an LSP.  If MPLS-TP OAM is to be accommodated, then
   existing multipath techniques must be modified.  Such modifications
   are outside the scope of this document.

   For example, a large aggregate of IP traffic may be subdivided into
   a large number of groups of flows using a hash on the IP source and
   destination addresses.  This is as described in [RFC2475] and
   clarified in [RFC3260].  For MPLS traffic carrying IP, a similar
   hash can be performed on the set of labels in the label stack.
   These techniques are both examples of means to subdivide traffic
   into groups of flows for the purpose of load balancing traffic
   across aggregated link capacity.  The means of identifying a set of
   flows should not be confused with the definition of a flow.

   Discussion of whether a hash based approach provides a sufficiently
   even load balance using any particular hashing algorithm or method
   of distributing traffic across a set of component links is outside
   the scope of this document.

   The current load balancing techniques are referenced in [RFC4385]
   and [RFC4928].  Hash based approaches are described in [RFC2991]
   and [RFC2992].  A mechanism to identify flows within PW is
   described in [RFC6391].  The use of hash based approaches is
   mentioned as an example of an existing set of techniques to
   distribute traffic over a set of component links.  Other techniques
   are not precluded.

B.2.  Simple and Adaptive Load Balancing Multipath

   Simple multipath generally relies on the mathematical probability
   that, given a very large number of small microflows, these
   microflows will tend to be distributed evenly across a hash space.
   Early and very simple multipath implementations assumed that all
   component links are of equal capacity and performed a modulo
   operation across the hashed value.  An alternate simple multipath
   technique uses a table, generally with a power of two size, and
   distributes the table entries proportionally among component links
   according to the capacity of each component link.

   Simple load balancing works well if there are a very large number
   of small microflows (i.e., if the microflow rate is much less than
   the component link capacity).  However, the case where there are
   even a few large microflows is not handled well by simple load
   balancing.

   An adaptive load balancing multipath technique is one where the
   traffic bound to each component link is measured and the load split
   is adjusted accordingly.  As long as the adjustment is done within
   a single network element, no protocol extensions are required and
   there are no interoperability issues.

   Note that if the load balancing algorithm and/or its parameters are
   adjusted, then packets in some flows may be briefly delivered out
   of sequence; however, in practice such adjustments can be made very
   infrequently.
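   The following sketch (Python, illustrative only) shows a table
   based split with entries distributed proportionally to component
   link capacity, together with a crude adaptive rebalancing step.
   Real implementations differ in table size, hash function, and
   rebalancing policy; everything shown here is an assumption for
   exposition.

      # Illustrative only: proportional table split plus adaptation.
      def build_table(capacities, table_size=256):  # power of two
          total = sum(capacities)
          table, acc = [], 0.0
          for i, cap in enumerate(capacities):
              acc += cap / total * table_size
              while len(table) < round(acc):
                  table.append(i)          # component link index
          return table

      def select(table, flow_hash):
          return table[flow_hash & (len(table) - 1)]  # 2^n mask

      def rebalance(table, measured, capacities):
          """Adaptive step: move one table entry from the most to the
          least utilized component.  Flows whose entries move may be
          briefly reordered, so this is done infrequently."""
          util = [m / c for m, c in zip(measured, capacities)]
          hot = max(range(len(util)), key=util.__getitem__)
          cold = min(range(len(util)), key=util.__getitem__)
          if hot != cold and hot in table:
              table[table.index(hot)] = cold
          return table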
B.3.  Traffic Split over Parallel Links

   The load splitting techniques described in Appendix B.1 and
   Appendix B.2 are both used in splitting traffic over parallel links
   between the same pair of nodes.  The best known technique, though
   far from the first, is Ethernet Link Aggregation [IEEE-802.1AX].
   The same technique had been applied much earlier using OSPF or ISIS
   Equal Cost MultiPath (ECMP) over parallel links between the same
   nodes.  Multilink PPP [RFC1717] uses a technique that provides
   inverse multiplexing; a number of vendors provided proprietary
   extensions to PPP over SONET/SDH [RFC2615] that predated Ethernet
   Link Aggregation but are no longer used.

   Link bundling [RFC4201] provides yet another means of handling
   parallel LSP.  [RFC4201] explicitly allows a special value of all
   ones to indicate a split across all members of the bundle.  This
   "all ones" component link is signaled in the RSVP-TE Resv to
   indicate that the link bundle is making use of classic multipath
   techniques.

B.4.  Traffic Split over Multiple Paths

   OSPF or ISIS Equal Cost MultiPath (ECMP) is a well known form of
   traffic split over multiple paths that may traverse intermediate
   nodes.  ECMP is often incorrectly equated to only this case, and
   multipath over multiple diverse paths is often incorrectly equated
   to ECMP.

   Many implementations are able to create more than one LSP between a
   pair of nodes, where these LSP are routed diversely to better make
   use of available capacity.  The load on these LSP can be
   distributed proportionally to the reserved bandwidth of the LSP.
   These multiple LSP may be advertised as a single PSC FA, and any
   LSP making use of the FA may be split over these multiple LSP.

   Link bundling [RFC4201] component links may themselves be LSP.
   When this technique is used, any LSP which specifies the link
   bundle may be split across the multiple paths of the LSP that
   comprise the bundle.
Appendix C.  Characteristics of Transport in Core Networks

   The characteristics of primary interest are the capacity of a
   single circuit and the use of wavelength division multiplexing
   (WDM) to provide a large number of parallel circuits.

   Wavelength division multiplexing (WDM) supports multiple
   independent channels (independent ignoring crosstalk noise) at
   slightly different wavelengths of light, multiplexed onto a single
   fiber.  Typical in the early 2000s was 40 wavelengths of 10 Gb/s
   capacity per wavelength.  These wavelengths are in the C-band
   range, which is about 1530-1565 nm, though some work has been done
   using the L-band, 1565-1625 nm.

   The C-band has been carved up using a 100 GHz spacing from
   191.7 THz to 196.1 THz by [ITU-T.G.694.1].  This yields 44
   channels.  If the outermost channels are not used, due to poorer
   transmission characteristics, then typically 40 are used.  For
   practical reasons, a 50 GHz or 25 GHz spacing is used by more
   recent equipment, yielding 80 or 160 channels in practice.

   The early optical modulation techniques used within a single
   channel yielded 2.5 Gb/s and 10 Gb/s capacity per channel.  As
   modulation techniques have improved, 40 Gb/s and 100 Gb/s per
   channel have been achieved.

   The 40 channels of 10 Gb/s common in the mid 2000s yield a total of
   400 Gb/s.  Tighter spacing and better modulations are yielding up
   to 8 Tb/s or more in more recent systems.

   Over the optical layer is an electrical encoding.  In the 1990s
   this was typically Synchronous Optical Networking (SONET) or
   Synchronous Digital Hierarchy (SDH), with a maximum defined circuit
   capacity of 40 Gb/s (OC-768), though the 10 Gb/s OC-192 is more
   common.  More recently the low level electrical encoding has been
   the Optical Transport Network (OTN) defined by ITU-T.  OTN
   currently defines circuit capacities up to a nominal 100 Gb/s
   (ODU4).  Both SONET/SDH and OTN make use of time division
   multiplexing (TDM), where a higher capacity circuit such as a
   100 Gb/s ODU4 in OTN may be subdivided into lower fixed capacity
   circuits such as ten 10 Gb/s ODU2.

   In the 1990s, IP and later IP/MPLS networks either used a fraction
   of the maximum circuit capacity or, toward the end of the decade,
   at most the full circuit capacity, when full circuit capacity was
   2.5 Gb/s or 10 Gb/s.  Beyond 2000, the TDM circuit multiplexing
   capability of SONET/SDH or OTN was rarely used.

   Early in the 2000s both transport equipment and core LSR offered
   40 Gb/s SONET OC-768.  However, 10 Gb/s transport equipment was
   predominantly deployed throughout the decade, partially because LSR
   10GbE ports were far more cost effective than either OC-192 or
   OC-768 and became practical in the second half of the decade.

   Entering the 2010 decade, LSR 40GbE and 100GbE interfaces are
   expected to become widely available and cost effective.  Slightly
   preceding this, transport equipment making use of 40 Gb/s and
   100 Gb/s modulations has become available.  This transport
   equipment is capable of carrying 40 Gb/s ODU3 and 100 Gb/s ODU4
   circuits.

   Early in the 2000s decade, IP/MPLS core networks were making use of
   single 10 Gb/s circuits.  Capacity grew quickly in the first half
   of the decade, but most IP/MPLS core networks had only a small
   number of IP/MPLS links requiring 4-8 parallel 10 Gb/s circuits.
   However, the use of multipath was necessary, was deemed the
   simplest and most cost effective alternative, and became thoroughly
   entrenched.  By the end of the 2000s decade nearly all major
   IP/MPLS core service provider networks and a few content provider
   networks had IP/MPLS links which exceeded 100 Gb/s, long before
   40GbE was available and before 40 Gb/s transport was in widespread
   use.

   It is less clear when IP/MPLS LSP exceeded 10 Gb/s, 40 Gb/s, and
   100 Gb/s.  By 2010, many service providers had LSP in excess of
   100 Gb/s, but few are willing to disclose how many LSP have reached
   this capacity.
   At the time of writing, 40GbE and 100GbE LSR products are being
   evaluated by service providers and content providers and are in use
   in network trials.  The cost of components required to deliver
   100GbE products remains high, making these products less cost
   effective.  This is expected to change within a few years.

   The important point is that IP/MPLS core network links long ago
   exceeded 100 Gb/s and a small number of IP/MPLS LSP exceed
   100 Gb/s.  By the time 100 Gb/s circuits are widely deployed,
   IP/MPLS core network links are likely to exceed 1 Tb/s and many
   IP/MPLS LSP capacities are likely to exceed 100 Gb/s.  Therefore
   multipath techniques are likely here to stay.

Authors' Addresses

   So Ning
   Tata Communications

   Email: ning.so@tatacommunications.com

   Andrew Malis
   Verizon
   117 West St.
   Waltham, MA  02451

   Phone: +1 781-466-2362
   Email: andrew.g.malis@verizon.com

   Dave McDysan
   Verizon
   22001 Loudoun County PKWY
   Ashburn, VA  20147

   Email: dave.mcdysan@verizon.com

   Lucy Yong
   Huawei USA
   5340 Legacy Dr.
   Plano, TX  75025

   Phone: +1 469-277-5837
   Email: lucy.yong@huawei.com

   Curtis Villamizar
   Outer Cape Cod Network Consulting

   Email: curtis@occnc.com