RTGWG                                                  C. Villamizar, Ed.
Internet-Draft                                       Infinera Corporation
Intended status: Informational                            D. McDysan, Ed.
Expires: January 9, 2011                                          S. Ning
                                                                 A. Malis
                                                                  Verizon
                                                                  L. Yong
                                                               Huawei USA
                                                             July 8, 2010

             Requirements for MPLS Over a Composite Link
                  draft-ietf-rtgwg-cl-requirement-01

Abstract

   There is often a need to provide large aggregates of bandwidth that
   are best provided using parallel links between routers or MPLS LSRs.
   In core networks there is often no alternative, since the aggregate
   capacities of core networks today far exceed the capacity of a
   single physical link or single packet processing element.
   Furthermore, links may be composed of network elements operating
   across multiple layers.

   The presence of parallel links, potentially comprised of multiple
   layers, has resulted in additional requirements.  Certain services
   may benefit from being restricted to a subset of a composite link's
   component links or to a specific component link, where component
   link characteristics, such as latency, differ.  Certain services
   require that an LSP be treated as atomic and avoid reordering.
   Other services will continue to require only that reordering not
   occur within a microflow, as is current practice.

   Current practice related to multipath is described briefly in an
   appendix.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 9, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.
   All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Assumptions
   3.  Definitions
   4.  Network Operator Functional Requirements
     4.1.  Availability, Stability and Transient Response
     4.2.  Component Links Provided by Lower Layer Networks
     4.3.  Parallel Component Links with Different Characteristics
   5.  Derived Requirements
   6.  Acknowledgements
   7.  IANA Considerations
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
     9.3.  Appendix References
   Appendix A.  More Details on Existing Network Operator Practices
                and Protocol Usage
   Appendix B.  Existing Multipath Standards and Techniques
     B.1.  Common Multipath Load Splitting Techniques
     B.2.  Simple and Adaptive Load Balancing Multipath
     B.3.  Traffic Split over Parallel Links
     B.4.  Traffic Split over Multiple Paths
   Appendix C.  ITU-T G.800 Composite Link Definitions and Terminology
   Authors' Addresses

1.  Introduction

   The purpose of this document is to describe why network operators
   require certain functions in order to solve certain business
   problems (Section 2).  The intent is to first describe why things
   need to be done in terms of functional requirements that are as
   independent as possible of protocol specifications (Section 4).
   For certain functional requirements this document describes a set
   of derived protocol requirements (Section 5).  Three appendices
   provide supporting details: a summary of existing and prior
   operator approaches (Appendix A), a summary of implementation
   techniques and relevant protocol standards (Appendix B), and a
   summary of the G.800 terminology used to define the concept of a
   composite link (Appendix C).

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

2.  Assumptions

   The services supported include L3VPN, L2VPN (VPWS and VPLS),
   Internet traffic encapsulated by at least one MPLS label, and
   dynamically signaled MPLS-TP LSPs and pseudowires.  The MPLS LSPs
   supporting these services may be pt-pt, pt-mpt, or mpt-mpt.

   The locations in a network where these requirements apply are a
   Label Edge Router (LER) or a Label Switch Router (LSR) as defined
   in RFC 3031 [RFC3031].
   The IP DSCP cannot be used for flow identification since L3VPN
   requires Diffserv transparency (see Section 5.5.2 of RFC 4031
   [RFC4031]), and in general network operators do not rely on the
   DSCP of Internet packets.

3.  Definitions

   Composite Link:
      Section 6.9.2 of ITU-T G.800 [ITU-T.G.800] defines composite
      link as summarized in Appendix C.  The following definitions map
      the ITU-T G.800 terminology into the IETF terminology used in
      this document.

   Multiple parallel links:  The case where multiple parallel
      component links exist between one LER/LSR and another LER/LSR.

   Multi-layer Component Link:  A component link that is formed by
      other network elements at other layers.

   Component Link:  A physical link (e.g., Lambda, Ethernet PHY,
      SONET/SDH, OTN, etc.) with packet transport capability, or a
      logical link (e.g., MPLS LSP, Ethernet VLAN, MPLS-TP LSP, etc.).

   Flow:  A sequence of packets that must be transferred on one
      component link.

   Flow identification:  The label stack and other information that
      uniquely identifies a flow.  Other information in flow
      identification may include an IP header, PW control word,
      Ethernet MAC address, etc.  Note that an LSP may contain one or
      more flows, or an LSP may be equivalent to a flow.  Flow
      identification is used to locally select a component link, or a
      path through the network toward the destination.

4.  Network Operator Functional Requirements

   The functional requirements in this section are grouped in
   subsections starting with the highest priority.

4.1.  Availability, Stability and Transient Response

   Limiting the period of unavailability in response to failures or
   transient events is extremely important, as is maintaining
   stability.  The transient period between some service disrupting
   event and the convergence of the routing and/or signaling protocols
   MUST occur within a time frame specified by SLA objectives.
   The timeframes range from rapid restoration, on the order of 100 ms
   or less (e.g., for VPWS), to several minutes (e.g., for L3VPN), and
   may differ among the set of customers within a single service.

   FR#1  The solution SHALL provide a means to summarize routing
         advertisements regarding the characteristics of a composite
         link such that routing protocol convergence occurs within the
         timeframe needed to meet the SLA objective.

   FR#2  The solution SHALL provide a means for aggregating signaling
         such that, in response to a failure in the worst case cross
         section of the network, MPLS LSPs are restored within the
         timeframe needed to meet the SLA objective.

   FR#3  The solution SHALL provide a means to select a path for a
         flow across a network that contains a number of paths
         comprised of pairs of nodes connected by composite links in
         such a way as to automatically distribute the load over the
         network nodes connected by composite links while meeting all
         of the other mandatory requirements stated above.  The
         solution SHOULD work in a manner similar to that when the
         characteristics of the individual component links are
         advertised.

   FR#4  If extensions to existing protocols are specified and/or new
         protocols are defined, then the solution SHOULD provide a
         means for a network operator to migrate an existing
         deployment in a minimally disruptive manner.

   FR#5  Any automatic LSP routing and/or load balancing solutions
         MUST NOT oscillate such that performance observed by users
         changes such that an SLA is violated.  Since oscillation may
         cause reordering, there MUST be means to control the
         frequency of changing the component link over which a flow is
         placed.

   FR#6  Management and diagnostic protocols MUST be able to operate
         over composite links.

4.2.  Component Links Provided by Lower Layer Networks

   Case 3 as defined in [ITU-T.G.800] involves a component link
   supporting an MPLS layer network over another lower layer network
   (e.g., a circuit switched network or another MPLS network, such as
   MPLS-TP).  The lower layer network may change the latency (and/or
   other performance parameters) seen by the MPLS layer network.
   Network operators have SLAs, some components of which are based on
   performance parameters.  Currently, there is no protocol for the
   lower layer network to inform the higher layer network of a change
   in a performance parameter.  Communication of the latency
   performance parameter is a very important requirement.
   Communication of other performance parameters (e.g., delay
   variation) is desirable.

   FR#7  In order to support network SLAs and provide acceptable user
         experience, the solution SHALL specify a protocol means to
         allow a lower layer server network to communicate latency to
         the higher layer client network.

   FR#8  The precision of latency reporting SHOULD be at least 10% of
         the one way latency for latency of 1 ms or more.

   FR#9  The solution SHALL provide a means to limit the latency on a
         per LSP basis between nodes within a network to meet an SLA
         target when the path between these nodes contains one or more
         pairs of nodes connected via a composite link.

         The SLAs differ across the services, and some services have
         different SLAs for different QoS classes; for example, one
         QoS class may have a much larger latency bound than another.
         Overload can occur, which would violate an SLA parameter
         (e.g., loss), and some remedy is needed to handle this case
         for a composite link.

   FR#10 If the total demand offered by traffic flows exceeds the
         capacity of the composite link, the solution SHOULD define a
         means to cause the LSPs for some traffic flows to move to
         some other point in the network that is not congested.
   These "preempted LSPs" may not be restored if there is no
   uncongested path in the network.

4.3.  Parallel Component Links with Different Characteristics

   Corresponding to Case 1 of [ITU-T.G.800], as one means to provide
   high availability, network operators deploy a topology in the MPLS
   network using lower layer networks that have a certain degree of
   diversity at the lower layer(s).  Many techniques have been
   developed to balance the distribution of flows across component
   links that connect the same pair of nodes (see Appendix B.3).  When
   the path for a flow can be chosen from a set of candidate nodes
   connected via composite links, other techniques have been developed
   (see Appendix B.4).

   FR#11 The solution SHALL measure traffic on a labeled traffic flow
         and dynamically select the component link on which to place
         this flow in order to balance the load so that no component
         link in the composite link between a pair of nodes is
         overloaded.

   FR#12 When a traffic flow is moved from one component link to
         another in the same composite link between a set of nodes (or
         sites), it MUST be done in a minimally disruptive manner.

         When a flow is moved from a current link to a target link
         with different latency, reordering can occur if the target
         link latency is less than that of the current link, or
         clumping can occur if the target link latency is greater than
         that of the current link.  Some flows (e.g., timing
         distribution, PW circuit emulation) are quite sensitive to
         these effects, which may be specified in an SLA or may need
         to be controlled to meet a user experience objective (e.g.,
         jitter buffer under/overrun).

   FR#13 The solution SHALL provide a means to identify flows whose
         rearrangement frequency needs to be bounded by a configured
         value.
   FR#14 The solution SHALL provide a means that communicates whether
         the flows within an LSP can be split across multiple
         component links.  The solution SHOULD provide a means to
         indicate the flow identification field(s) that can be used
         along the flow path to perform this function.

   FR#15 The solution SHALL provide a means to indicate that a traffic
         flow shall select a component link with the minimum latency
         value.

   FR#16 The solution SHALL provide a means to indicate that a traffic
         flow shall select a component link with a maximum acceptable
         latency value as specified by protocol.

   FR#17 The solution SHALL provide a means to indicate that a traffic
         flow shall select a component link with a maximum acceptable
         delay variation value as specified by protocol.

   FR#18 The solution SHALL provide a local means by which a node
         automatically distributes flows across the component links in
         the composite link that connects to the other node such that
         SLA objectives are met.

   FR#19 The solution SHALL provide a means to distribute flows from a
         single LSP across multiple component links to handle at least
         the case where the traffic carried in an LSP exceeds that of
         any component link in the composite link.

5.  Derived Requirements

   This section takes the next step and derives high-level
   requirements on protocol specification from the functional
   requirements.

   DR#1  The solution SHOULD attempt to extend existing protocols
         wherever possible, developing a new protocol only if this
         adds a significant set of capabilities.

         The vast majority of network operators have provisioned L3VPN
         services over LDP.  Many have deployed L2VPN services over
         LDP as well.  TE extensions to IGP and RSVP-TE are viewed as
         being overly complex by some operators.
   DR#2  A solution SHOULD extend LDP capabilities to meet the
         functional requirements (without using TE methods, as decided
         in [RFC3468]).

   DR#3  Coexistence of LDP and RSVP-TE signaled LSPs MUST be
         supported on a composite link.  Other functional requirements
         should be supported as independently of the signaling
         protocol as possible.

   DR#4  When the nodes connected via a composite link are in the same
         MPLS network topology, the solution MAY define extensions to
         the IGP.

   DR#5  When the nodes connected via a composite link are in
         different MPLS network topologies, the solution SHALL NOT
         rely on extensions to the IGP.

   DR#6  When a worst case failure scenario occurs, the resulting
         number of links advertised in the IGP causes IGP convergence
         to occur, causing a period of unavailability as perceived by
         users.  The convergence time of the solution MUST meet the
         SLA objective for the duration of unavailability.

   DR#7  The solution SHALL summarize the characteristics of the
         component links in a composite link IGP advertisement that
         results in convergence time better than that of advertising
         the individual component links.  This summary SHALL be
         designed so that it represents the range of capabilities of
         the individual component links such that the functional
         requirements are met, and also minimizes the frequency of
         advertisement updates which may cause IGP convergence to
         occur.  Examples of advertisement update triggering events to
         be considered include: LSP establishment/release, changes in
         component link characteristics (e.g., latency, up/down
         state), and/or bandwidth utilization.

   DR#8  When a worst case failure scenario occurs, the resulting
         number of links advertised in the IGP causes IGP convergence
         to occur, causing a period of unavailability as perceived by
         users.
         The convergence time of the solution MUST meet the SLA
         objective for the duration of unavailability.

   DR#9  When a worst case failure scenario occurs, the number of
         RSVP-TE LSPs to be resignaled will cause a period of
         unavailability as perceived by users.  The resignaling time
         of the solution MUST meet the SLA objective for the duration
         of unavailability.  The resignaling time of the solution MUST
         NOT increase significantly as compared with current methods.

6.  Acknowledgements

   Frederic Jounay of France Telecom and Yuji Kamite of NTT
   Communications Corporation co-authored a version of this document.

   A rewrite of this document occurred after the IETF 77 meeting.
   Dimitri Papadimitriou, Lou Berger, Tony Li, the WG chairs John
   Scudder and Alex Zinin, and others provided valuable guidance prior
   to and at the IETF 77 RTGWG meeting.

   Tony Li and John Drake have made numerous valuable comments on the
   RTGWG mailing list that are reflected in versions following the
   IETF 77 meeting.

7.  IANA Considerations

   This memo includes no request to IANA.

8.  Security Considerations

   This document specifies a set of requirements.  The requirements
   themselves do not pose a security threat.  If these requirements
   are met using MPLS signaling as commonly practiced today, with
   authenticated but unencrypted OSPF-TE, ISIS-TE, and RSVP-TE or LDP,
   then the requirement to provide additional information in this
   communication presents additional information that could
   conceivably be gathered in a man-in-the-middle confidentiality
   breach.  Such an attack would require a capability to monitor this
   signaling either through a provider breach or through access to
   provider physical transmission infrastructure.  A provider breach
   already poses a threat of numerous types of attacks which are of
   far more serious consequence.
   Encryption of the signaling can prevent, or render more difficult,
   any confidentiality breach that otherwise might occur by means of
   access to provider physical transmission infrastructure.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2.  Informative References

   [ITU-T.G.800]
              ITU-T, "Unified functional architecture of transport
              networks", 2007.

   [RFC2702]  Awduche, D., Malcolm, J., Agogbua, J., O'Dell, M., and
              J. McManus, "Requirements for Traffic Engineering Over
              MPLS", RFC 2702, September 1999.

   [RFC3031]  Rosen, E., Viswanathan, A., and R. Callon,
              "Multiprotocol Label Switching Architecture", RFC 3031,
              January 2001.

   [RFC3468]  Andersson, L. and G. Swallow, "The Multiprotocol Label
              Switching (MPLS) Working Group decision on MPLS
              signaling protocols", RFC 3468, February 2003.

   [RFC3809]  Nagarajan, A., "Generic Requirements for Provider
              Provisioned Virtual Private Networks (PPVPN)", RFC 3809,
              June 2004.

   [RFC4031]  Carugi, M. and D. McDysan, "Service Requirements for
              Layer 3 Provider Provisioned Virtual Private Networks
              (PPVPNs)", RFC 4031, April 2005.

   [RFC4665]  Augustyn, W. and Y. Serbest, "Service Requirements for
              Layer 2 Provider-Provisioned Virtual Private Networks",
              RFC 4665, September 2006.

   [RFC5254]  Bitar, N., Bocci, M., and L. Martini, "Requirements for
              Multi-Segment Pseudowire Emulation Edge-to-Edge (PWE3)",
              RFC 5254, October 2008.

9.3.  Appendix References

   [I-D.ietf-pwe3-fat-pw]
              Bryant, S., Filsfils, C., Drafz, U., Kompella, V.,
              Regan, J., and S. Amante, "Flow Aware Transport of
              Pseudowires over an MPLS PSN",
              draft-ietf-pwe3-fat-pw-03 (work in progress),
              January 2010.
   [IEEE-802.1AX]
              IEEE Standards Association, "IEEE Std 802.1AX-2008, IEEE
              Standard for Local and Metropolitan Area Networks - Link
              Aggregation", 2008.

   [ITU-T.Y.1541]
              ITU-T, "Network performance objectives for IP-based
              services", 2006.

   [RFC1717]  Sklower, K., Lloyd, B., McGregor, G., and D. Carr, "The
              PPP Multilink Protocol (MP)", RFC 1717, November 1994.

   [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
              and W. Weiss, "An Architecture for Differentiated
              Services", RFC 2475, December 1998.

   [RFC2615]  Malis, A. and W. Simpson, "PPP over SONET/SDH",
              RFC 2615, June 1999.

   [RFC2991]  Thaler, D. and C. Hopps, "Multipath Issues in Unicast
              and Multicast Next-Hop Selection", RFC 2991,
              November 2000.

   [RFC2992]  Hopps, C., "Analysis of an Equal-Cost Multi-Path
              Algorithm", RFC 2992, November 2000.

   [RFC3260]  Grossman, D., "New Terminology and Clarifications for
              Diffserv", RFC 3260, April 2002.

   [RFC4201]  Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling
              in MPLS Traffic Engineering (TE)", RFC 4201,
              October 2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4385]  Bryant, S., Swallow, G., Martini, L., and D. McPherson,
              "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word
              for Use over an MPLS PSN", RFC 4385, February 2006.

   [RFC4928]  Swallow, G., Bryant, S., and L. Andersson, "Avoiding
              Equal Cost Multipath Treatment in MPLS Networks",
              BCP 128, RFC 4928, June 2007.

Appendix A.  More Details on Existing Network Operator Practices and
             Protocol Usage

   Network operators have SLAs for services that are comprised of
   numerical values for performance measures, principally
   availability, latency, and delay variation.  See [ITU-T.Y.1541] and
   Section 4.9 of RFC 3809 [RFC3809] for examples of the form of such
   SLAs.
   Note that the numerical values of Y.1541 span multiple networks and
   may be looser than network operator SLAs.  Applications and
   acceptable user experience have a relationship to these performance
   parameters.

   Consider latency as an example.  In some cases, minimizing latency
   relates directly to the best customer experience (e.g., in TCP,
   closer is faster).  In other cases, user experience is relatively
   insensitive to latency up to a specific limit, at which point user
   perception of quality degrades significantly (e.g., interactive
   human voice and multimedia conferencing).  A number of SLAs have a
   bound on point-to-point latency, and as long as this bound is met,
   the SLA is met -- decreasing the latency is not necessary.  In some
   SLAs, if the specified latency is not met, the user considers the
   service as unavailable.  An unprotected LSP can be manually
   provisioned on a set of links to meet this type of SLA, but this
   lowers availability since an alternate route that meets the latency
   SLA cannot be determined.

   Historically, when an IP/MPLS network was operated over a lower
   layer circuit switched network (e.g., SONET rings), a change in
   latency caused by the lower layer network (e.g., due to a
   maintenance action or failure) was not known to the MPLS network.
   This resulted in latency affecting end user experience, sometimes
   violating SLAs or resulting in user complaints.

   A response to this problem was to provision IP/MPLS networks over
   unprotected circuits and set the metric and/or TE-metric
   proportional to latency.  This resulted in traffic being directed
   over the least latency path, even if this was not needed to meet an
   SLA or to meet user experience objectives.  This results in reduced
   flexibility and increased cost for network operators.
   Using lower layer networks to provide restoration and grooming is
   expected to be more efficient, but the inability to communicate
   performance parameters, in particular latency, from the lower layer
   network to the higher layer network is an important problem to be
   solved before this can be done.

   Latency SLAs for pt-pt services are often tied closely to
   geographic locations, while latency for multipoint services may be
   based upon a worst case within a region.

   The presence of only three Traffic Class (TC) bits (previously
   known as EXP bits) in the MPLS shim header is limiting when a
   network operator needs to support QoS classes for multiple services
   (e.g., L2VPN VPWS, VPLS, L3VPN, and Internet), each of which has a
   set of QoS classes that need to be supported.  In some cases one
   bit is used to indicate conformance to some ingress traffic
   classification, leaving only two bits for indicating the service
   QoS classes.  The approach that has been taken is to aggregate
   these QoS classes into similar sets on LER-LSR and LSR-LSR links.

   Labeled LSPs and the use of link layer encapsulation have been
   standardized in order to provide a means to meet these needs.

   The IP DSCP cannot be used for flow identification since L3VPN
   requires Diffserv transparency (see Section 5.5.2 of RFC 4031
   [RFC4031]), and in general network operators do not rely on the
   DSCP of Internet packets.

   Pushing a label onto Internet packets when they are carried along
   with L2/L3VPN packets on the same link or lower layer network
   provides a means to distinguish the QoS classes for these packets.

   Operating an MPLS-TE network involves a different paradigm from
   operating an IGP metric-based LDP signaled MPLS network.  The
   mpt-pt LDP signaled MPLS LSPs are established automatically, and
   balancing across parallel links occurs if the IGP metrics are set
   "equally" (with equality a locally definable relation).
   Traffic is typically comprised of a few large flows (some very
   large) and many small flows.  In some cases, separate LSPs are
   established for very large flows.  This can occur even if the IP
   header information is inspected by a router, for example for an
   IPsec tunnel that carries a large amount of traffic.  An important
   example of large flows is that of an L2/L3 VPN customer who has an
   access line bandwidth comparable to a client-client composite link
   bandwidth -- there could be flows that are on the order of the
   access line bandwidth.

Appendix B.  Existing Multipath Standards and Techniques

   Today the requirement to handle large aggregations of traffic, much
   larger than a single component link, can be handled by a number of
   techniques which we will collectively call multipath.  Multipath
   applied to parallel links between the same set of nodes includes
   Ethernet Link Aggregation [IEEE-802.1AX], link bundling [RFC4201],
   or other aggregation techniques, some of which may be vendor
   specific.  Multipath applied to diverse paths rather than parallel
   links includes Equal Cost MultiPath (ECMP) as applied to OSPF,
   ISIS, or even BGP, and equal cost LSPs, as described in
   Appendix B.4.  The various multipath techniques have strengths and
   weaknesses.

   The term composite link is more general than terms such as link
   aggregate, which is generally considered to be specific to
   Ethernet, and its use here is consistent with the broad definition
   in [ITU-T.G.800].  The term multipath excludes inverse multiplexing
   and refers to techniques which only solve the problem of large
   aggregations of traffic, without addressing the other requirements
   outlined in this document.

B.1.  Common Multipath Load Splitting Techniques

   Identical load balancing techniques are used for multipath both
   over parallel links and over diverse paths.
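   As a concrete illustration of this common splitting step, the
   sketch below hashes a flow identification onto a set of members,
   which could equally be parallel component links or diverse paths.
   The CRC-32 hash, member names, and label values are illustrative
   assumptions, not taken from any standard or implementation.

```python
import zlib

def select_member(label_stack, members, ip_pair=None):
    """Map a flow identification to one member (component link or
    path).  Packets with the same flow identification always hash to
    the same member, so packet order within a flow is preserved."""
    key = b"".join(label.to_bytes(4, "big") for label in label_stack)
    if ip_pair is not None:
        # Optionally look beyond the label stack at the IP payload.
        src, dst = ip_pair
        key += src.encode() + dst.encode()
    return members[zlib.crc32(key) % len(members)]

links = ["member-1", "member-2", "member-3"]  # hypothetical members
a = select_member([16001, 299792], links, ("10.0.0.1", "10.0.0.2"))
b = select_member([16001, 299792], links, ("10.0.0.1", "10.0.0.2"))
assert a == b and a in links  # deterministic per-flow selection
```

   Because the selection depends only on the flow identification, the
   same function applies whether the members are component links of
   one composite link or diverse equal cost paths.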
615 Large aggregates of IP traffic do not provide explicit signaling to 616 indicate the expected traffic loads. Large aggregates of MPLS 617 traffic are carried in MPLS tunnels supported by MPLS LSP. LSP which 618 are signaled using RSVP-TE extensions do provide explicit signaling 619 which includes the expected traffic load for the aggregate. LSP 620 which are signaled using LDP do not provide an expected traffic load. 622 MPLS LSP may contain other MPLS LSP arranged hierarchically. When an 623 MPLS LSR serves as a midpoint LSR in an LSP carrying other LSP as 624 payload, there is no signaling associated with these inner LSP. 625 Therefore even when using RSVP-TE signaling there may be insufficient 626 information provided by signaling to adequately distribute load 627 across a composite link. 629 Generally a set of label stack entries that is unique across the 630 ordered set of label numbers can safely be assumed to contain a group 631 of flows. The reordering of traffic can therefore be considered to 632 be acceptable unless reordering occurs within traffic containing a 633 common unique set of label stack entries. Existing load splitting 634 techniques take advantage of this property in addition to looking 635 beyond the bottom of the label stack and determining if the payload 636 is IPv4 or IPv6 to load balance traffic accordingly. 638 MPLS-TP OAM violates the assumption that it is safe to reorder 639 traffic within an LSP. If MPLS-TP OAM is to be accommodated, then 640 existing multipath techniques must be modified. Such modifications 641 are outside the scope of this document. 643 For example, a large aggregate of IP traffic may be subdivided into a 644 large number of groups of flows using a hash on the IP source and 645 destination addresses. This is as described in [RFC2475] and 646 clarified in [RFC3260]. For MPLS traffic carrying IP, a similar hash 647 can be performed on the set of labels in the label stack.
These 648 techniques are both examples of means to subdivide traffic into 649 groups of flows for the purpose of load balancing traffic across 650 aggregated link capacity. The means of identifying a flow should not 651 be confused with the definition of a flow. 653 Discussion of whether a hash based approach provides a sufficiently 654 even load balance using any particular hashing algorithm or method of 655 distributing traffic across a set of component links is outside of 656 the scope of this document. 658 The current load balancing techniques are referenced in [RFC4385] and 659 [RFC4928]. The use of three hash based approaches is described in 660 [RFC2991] and [RFC2992]. A mechanism to identify flows within PW is 661 described in [I-D.ietf-pwe3-fat-pw]. The use of hash based 662 approaches is mentioned as an example of an existing set of 663 techniques to distribute traffic over a set of component links. 664 Other techniques are not precluded. 666 B.2. Simple and Adaptive Load Balancing Multipath 668 Simple multipath generally relies on the mathematical probability 669 that given a very large number of small microflows, these microflows 670 will tend to be distributed evenly across a hash space. A common 671 simple multipath implementation assumes that all members (component 672 links) are of equal capacity and performs a modulo operation on 673 the hashed value. An alternate simple multipath technique uses a 674 table, generally with a power of two size, and distributes the table 675 entries proportionally among members according to the capacity of 676 each member. 678 Simple load balancing works well if there are a very large number of 679 small microflows (i.e., microflow rate is much less than component 680 link capacity). However, the case where there are even a few large 681 microflows is not handled well by simple load balancing.
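The two simple multipath variants described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: real routers use cheap hardware hash functions rather than SHA-256, and the field names and table size here are arbitrary choices for the example.

```python
import hashlib

def flow_hash(src, dst, label_stack=()):
    """Hash flow identification fields (e.g., IP source/destination
    addresses, or the MPLS label stack) into a stable integer.
    SHA-256 is used here only for illustration."""
    key = repr((src, dst, tuple(label_stack))).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")

def modulo_member(h, n_members):
    """Simple multipath: modulo over equal-capacity members."""
    return h % n_members

def build_table(capacities, size=256):
    """Alternate technique: a power-of-two-sized table whose entries
    are distributed among members in proportion to capacity,
    accommodating members of unequal capacity."""
    total = sum(capacities)
    table = []
    for member, cap in enumerate(capacities):
        table.extend([member] * round(size * cap / total))
    # Pad or trim rounding error to exactly `size` entries.
    return (table + [len(capacities) - 1] * size)[:size]

def table_member(h, table):
    """Select a member by indexing the table with the flow hash."""
    return table[h % len(table)]
```

For example, `build_table([10, 10, 40])` assigns roughly four times as many table entries to the third member as to each of the first two, so a member with four times the capacity receives roughly four times the flow groups.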
683 An adaptive multipath technique is one where the traffic bound to 684 each member (component link) is measured and the load split is 685 adjusted accordingly. As long as the adjustment is done within a 686 single network element, then no protocol extensions are required and 687 there are no interoperability issues. 689 Note that if the load balancing algorithm and/or its parameters are 690 adjusted, then packets in some flows may be delivered out of 691 sequence. 693 B.3. Traffic Split over Parallel Links 695 The load splitting techniques defined in Appendix B.1 and Appendix B.2 696 are both used in splitting traffic over parallel links between the 697 same pair of nodes. The best known technique, though far from being 698 the first, is Ethernet Link Aggregation [IEEE-802.1AX]. This same 699 technique had been applied much earlier using OSPF or ISIS Equal Cost 700 MultiPath (ECMP) over parallel links between the same nodes. 701 Multilink PPP [RFC1717] uses a technique that provides inverse 702 multiplexing; however, a number of vendors had provided proprietary 703 extensions to PPP over SONET/SDH [RFC2615] that predated Ethernet 704 Link Aggregation but are no longer used. 706 Link bundling [RFC4201] provides yet another means of handling 707 parallel LSP. RFC4201 explicitly allows a special value of all ones 708 to indicate a split across all members of the bundle. 710 B.4. Traffic Split over Multiple Paths 712 OSPF or ISIS Equal Cost MultiPath (ECMP) is a well known form of 713 traffic split over multiple paths that may traverse intermediate 714 nodes. ECMP is often incorrectly equated to only this case, and 715 multipath over multiple diverse paths is often incorrectly equated to 716 ECMP. 718 Many implementations are able to create more than one LSP between a 719 pair of nodes, where these LSP are routed diversely to better make 720 use of available capacity. The load on these LSP can be distributed 721 proportionally to the reserved bandwidth of the LSP.
These multiple 722 LSP may be advertised as a single PSC FA and any LSP making use of 723 the FA may be split over these multiple LSP. 725 Link bundling [RFC4201] component links may themselves be LSP. When 726 this technique is used, any LSP which specifies the link bundle may 727 be split across the multiple paths of the LSP that comprise the 728 bundle. 730 Appendix C. ITU-T G.800 Composite Link Definitions and Terminology 732 Composite Link: 733 Section 6.9.2 of ITU-T-G.800 [ITU-T.G.800] defines composite link 734 in terms of three cases, of which the following two are relevant 735 (the one describing inverse (TDM) multiplexing does not apply). 736 Note that these case definitions are taken verbatim from section 737 6.9, "Layer Relationships". 739 Case 1: "Multiple parallel links between the same subnetworks 740 can be bundled together into a single composite link. Each 741 component of the composite link is independent in the sense 742 that each component link is supported by a separate server 743 layer trail. The composite link conveys communication 744 information using different server layer trails thus the 745 sequence of symbols crossing this link may not be preserved. 746 This is illustrated in Figure 14." 748 Case 3: "A link can also be constructed by a concatenation of 749 component links and configured channel forwarding 750 relationships. The forwarding relationships must have a 1:1 751 correspondence to the link connections that will be provided 752 by the client link. In this case, it is not possible to 753 fully infer the status of the link by observing the server 754 layer trails visible at the ends of the link. This is 755 illustrated in Figure 16." 757 Subnetwork: A set of one or more nodes (i.e., LER or LSR) and links. 758 As a special case it can represent a site comprised of multiple 759 nodes. 761 Forwarding Relationship: Configured forwarding between ports on a 762 subnetwork. 
It may be connectionless (e.g., IP, not considered 763 in this draft), or connection oriented (e.g., MPLS signaled or 764 configured). 766 Component Link: A topolological relationship between subnetworks 767 (i.e., a connection between nodes), which may be a wavelength, 768 circuit, virtual circuit or an MPLS LSP. 770 Authors' Addresses 772 Curtis Villamizar (editor) 773 Infinera Corporation 774 169 W. Java Drive 775 Sunnyvale, CA 94089 777 Email: cvillamizar@infinera.com 779 Dave McDysan (editor) 780 Verizon 781 22001 Loudoun County PKWY 782 Ashburn, VA 20147 784 Email: dave.mcdysan@verizon.com 786 So Ning 787 Verizon 788 2400 N. Glenville Ave. 789 Richardson, TX 75082 791 Phone: +1 972-729-7905 792 Email: ning.so@verizonbusiness.com 793 Andrew Malis 794 Verizon 795 117 West St. 796 Waltham, MA 02451 798 Phone: +1 781-466-2362 799 Email: andrew.g.malis@verizon.com 801 Lucy Yong 802 Huawei USA 803 1700 Alma Dr. Suite 500 804 Plano, TX 75075 806 Phone: +1 469-229-5387 807 Email: lucyyong@huawei.com