2 RTGWG C. Villamizar, Ed. 3 Internet-Draft Infinera Corporation 4 Intended status: Informational D. McDysan, Ed. 5 Expires: April 14, 2011 S. Ning 6 A. Malis 7 Verizon 8 L. Yong 9 Huawei USA 10 October 11, 2010 12 Requirements for MPLS Over a Composite Link 13 draft-ietf-rtgwg-cl-requirement-02 15 Abstract 17 There is often a need to provide large aggregates of bandwidth that 18 are best provided using parallel links between routers or MPLS LSR. 19 In core networks there is often no alternative since the aggregate 20 capacities of core networks today far exceed the capacity of a single 21 physical link or single packet processing element. 23 The presence of parallel links, with each link potentially comprised 24 of multiple layers has resulted in additional requirements. Certain 25 services may benefit from being restricted to a subset of the 26 component links or a specific component link, where component link 27 characteristics, such as latency, differ. Certain services require 28 that an LSP be treated as atomic and avoid reordering. Other 29 services will continue to require only that reordering not occur 30 within a microflow as is current practice. 32 Current practice related to multipath is described briefly in an 33 appendix. 35 Status of this Memo 37 This Internet-Draft is submitted in full conformance with the 38 provisions of BCP 78 and BCP 79. 40 Internet-Drafts are working documents of the Internet Engineering 41 Task Force (IETF). Note that other groups may also distribute 42 working documents as Internet-Drafts. The list of current Internet- 43 Drafts is at http://datatracker.ietf.org/drafts/current/. 45 Internet-Drafts are draft documents valid for a maximum of six months 46 and may be updated, replaced, or obsoleted by other documents at any 47 time.
It is inappropriate to use Internet-Drafts as reference 48 material or to cite them other than as "work in progress." 49 This Internet-Draft will expire on April 14, 2011. 51 Copyright Notice 53 Copyright (c) 2010 IETF Trust and the persons identified as the 54 document authors. All rights reserved. 56 This document is subject to BCP 78 and the IETF Trust's Legal 57 Provisions Relating to IETF Documents 58 (http://trustee.ietf.org/license-info) in effect on the date of 59 publication of this document. Please review these documents 60 carefully, as they describe your rights and restrictions with respect 61 to this document. Code Components extracted from this document must 62 include Simplified BSD License text as described in Section 4.e of 63 the Trust Legal Provisions and are provided without warranty as 64 described in the Simplified BSD License. 66 Table of Contents 68 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 69 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4 70 2. Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . 4 71 3. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4 72 4. Network Operator Functional Requirements . . . . . . . . . . . 5 73 4.1. Availability, Stability and Transient Response . . . . . . 5 74 4.2. Component Links Provided by Lower Layer Networks . . . . . 6 75 4.3. Parallel Component Links with Different Characteristics . 7 76 5. Derived Requirements . . . . . . . . . . . . . . . . . . . . . 9 77 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10 78 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 79 8. Security Considerations . . . . . . . . . . . . . . . . . . . 10 80 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 81 9.1. Normative References . . . . . . . . . . . . . . . . . . . 11 82 9.2. Informative References . . . . . . . . . . . . . . . . . . 11 83 9.3. Appendix References . . . . . . . . . . . . . . . . . . 
. 12 84 Appendix A. More Details on Existing Network Operator 85 Practices and Protocol Usage . . . . . . . . . . . . 13 86 Appendix B. Existing Multipath Standards and Techniques . . . . . 15 87 B.1. Common Multipath Load Splitting Techniques . . . . . . . . 16 88 B.2. Simple and Adaptive Load Balancing Multipath . . . . . . . 17 89 B.3. Traffic Split over Parallel Links . . . . . . . . . . . . 18 90 B.4. Traffic Split over Multiple Paths . . . . . . . . . . . . 18 91 Appendix C. ITU-T G.800 Composite Link Definitions and 92 Terminology . . . . . . . . . . . . . . . . . . . . . 18 93 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 19 95 1. Introduction 97 The purpose of this document is to describe why network operators 98 require certain functions in order to solve certain business problems 99 (Section 2). The intent is to first describe why things need to be 100 done in terms of functional requirements that are as independent as 101 possible of protocol specifications (Section 4). For certain 102 functional requirements this document describes a set of derived 103 protocol requirements (Section 5). Three appendices provide 104 supporting details as a summary of existing/prior operator approaches 105 (Appendix A), a summary of implementation techniques and relevant 106 protocol standards (Appendix B), and a summary of G.800 terminology 107 used to define a composite link (Appendix C). 109 1.1. Requirements Language 111 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 112 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 113 document are to be interpreted as described in RFC 2119 [RFC2119]. 115 2.
Assumptions 117 The services supported include L3VPN RFC 4364 [RFC4364], RFC 4797 118 [RFC4797], L2VPN RFC 4664 [RFC4664] (VPWS, VPLS (RFC 4761 [RFC4761], 119 RFC 4762 [RFC4762]), and VPMS (VPMS Framework 120 [I-D.ietf-l2vpn-vpms-frmwk-requirements])), Internet traffic 121 encapsulated by at least one MPLS label, and dynamically signaled 122 MPLS or MPLS-TP LSPs and pseudowires. The MPLS LSPs supporting these 123 services may be pt-pt, pt-mpt, or mpt-mpt. 125 The locations in a network where these requirements apply are a Label 126 Edge Router (LER) or a Label Switch Router (LSR) as defined in RFC 127 3031 [RFC3031]. 129 The IP DSCP cannot be used for flow identification since L3VPN 130 requires Diffserv transparency (see Section 5.5.2 of RFC 4031 131 [RFC4031]), and in general network operators do not rely on the DSCP 132 of Internet packets. 134 3. Definitions 136 ITU-T G.800 Based Composite and Component Link Definitions: 137 Section 6.9.2 of ITU-T-G.800 [ITU-T.G.800] defines composite and 138 component links as summarized in Appendix C. The following 139 definitions for composite and component links are derived from 140 and intended to be consistent with the cited ITU-T G.800 141 terminology. 143 Composite Link: A composite link is a logical link composed of a 144 set of parallel point-to-point component links, where all 145 links in the set share the same endpoints. A composite link 146 may itself be a component of another composite link, but only 147 a strict hierarchy of links is allowed. 149 Component Link: A point-to-point physical or logical link that 150 preserves ordering in the steady state. A component link may 151 have transient out of order events, but such events must not 152 exceed the network's specific NPO. Examples of a physical 153 link are: Lambda, Ethernet PHY, and OTN. Examples of a 154 logical link are: MPLS LSP, Ethernet VLAN, and MPLS-TP LSP. 156 Flow: A sequence of packets that must be transferred in order.
158 Flow identification: The label stack and other information that 159 uniquely identifies a flow. Other information in flow 160 identification may include an IP header, PW control word, 161 Ethernet MAC address, etc. Note that an LSP may contain one or 162 more Flows, or an LSP may be equivalent to a Flow. Flow 163 identification is used to locally select a component link, or a 164 path through the network toward the destination. 166 Network Performance Objective (NPO): Numerical values for 167 performance measures, principally availability, latency, and 168 delay variation. See Appendix A for more details. 170 4. Network Operator Functional Requirements 172 The Functional Requirements in this section are grouped in 173 subsections starting with the highest priority. 175 4.1. Availability, Stability and Transient Response 177 Limiting the period of unavailability in response to failures or 178 transient events, as well as maintaining stability, is extremely 179 important. The transient period between some service disrupting 180 event and the convergence of the routing and/or signaling protocols 181 MUST fall within a time frame specified by NPO values. Appendix A 182 provides references and a summary of service types requiring a range 183 of restoration times. 185 FR#1 The solution SHALL provide a means to summarize routing 186 advertisements regarding the characteristics of a composite 187 link such that the routing protocol converges within the 188 timeframe needed to meet the network performance objective. 190 FR#2 The solution SHALL ensure that all possible restoration 191 operations happen within the timeframe needed to meet the NPO. 192 The solution may need to specify a means for aggregating 193 signaling to meet this requirement.
195 FR#3 The solution SHALL provide a mechanism to select a path for a 196 flow across a network that contains a number of paths comprised 197 of pairs of nodes connected by composite links in such a way as 198 to automatically distribute the load over the network nodes 199 connected by composite links while meeting all of the other 200 mandatory requirements stated above. The solution SHOULD work 201 in a manner similar to that of current networks without any 202 composite link protocol enhancements when the characteristics 203 of the individual component links are advertised. 205 FR#4 If extensions to existing protocols are specified and/or new 206 protocols are defined, then the solution SHOULD provide a means 207 for a network operator to migrate an existing deployment in a 208 minimally disruptive manner. 210 FR#5 Any automatic LSP routing and/or load balancing solutions MUST 211 NOT oscillate such that the performance observed by users 212 changes to the point that an NPO is violated. Since oscillation 213 may cause reordering, there MUST be means to control the frequency 214 of changing the component link over which a flow is placed. 216 FR#6 Management and diagnostic protocols MUST be able to operate 217 over composite links. 219 4.2. Component Links Provided by Lower Layer Networks 221 Case 3 as defined in [ITU-T.G.800] involves a component link 222 supporting an MPLS layer network over another lower layer network 223 (e.g., circuit switched or another MPLS network (e.g., MPLS-TP)). 224 The lower layer network may change the latency (and/or other 225 performance parameters) seen by the MPLS layer network. Network 226 Operators have NPOs of which some components are based on performance 227 parameters. Currently, there is no protocol for the lower layer 228 network to inform the higher layer network of a change in a 229 performance parameter. Communication of the latency performance 230 parameter is a very important requirement.
Communication of other 231 performance parameters (e.g., delay variation) is desirable. 233 FR#7 In order to support network NPOs and provide acceptable user 234 experience, the solution SHALL specify a protocol means to 235 allow a lower layer server network to communicate latency to 236 the higher layer client network. 238 FR#8 The precision of latency reporting SHOULD be at least 10% of 239 the one-way latency, for latencies of 1 ms or more. 241 FR#9 The solution SHALL provide a means to limit the latency on a 242 per LSP basis between nodes within a network to meet an NPO 243 target when the path between these nodes contains one or more 244 pairs of nodes connected via a composite link. 246 The NPOs differ across the services, and some services have 247 different NPOs for different QoS classes; for example, one QoS 248 class may have a much larger latency bound than another. 249 Overload can occur which would violate an NPO parameter (e.g., 250 loss), and some remedy to handle this case for a composite link 251 is required. 253 FR#10 If the total demand offered by traffic flows exceeds the 254 capacity of the composite link, the solution SHOULD define a 255 means to cause the LSPs for some traffic flows to move to some 256 other point in the network that is not congested. These 257 "preempted LSPs" may not be restored if there is no 258 uncongested path in the network.
When the path for 268 a flow can be chosen from a set of candidate nodes connected via 269 composite links, other techniques have been developed (See 270 Appendix B.4). 272 FR#11 The solution SHALL measure traffic on a labeled traffic flow 273 and dynamically select the component link on which to place 274 this flow in order to balance the load so that no component 275 link in the composite link between a pair of nodes is 276 overloaded. 278 FR#12 When a traffic flow is moved from one component link to 279 another in the same composite link between a set of nodes (or 280 sites), it MUST be done in a minimally disruptive manner. 282 When a flow is moved from a current link to a target link with 283 different latency, reordering can occur if the target link 284 latency is less than that of the current link, or clumping can 285 occur if the target link latency is greater than that of the 286 current link. Some flows (e.g., timing distribution, PW circuit 287 emulation) are quite sensitive to these effects, which may be 288 specified in an NPO or are needed to meet a user experience 289 objective (e.g. jitter buffer under/overrun). 291 FR#13 The solution SHALL provide a means to identify flows whose 292 rearrangement frequency needs to be bounded by a configured 293 value. 295 FR#14 The solution SHALL provide a means that communicates whether 296 the flows within an LSP can be split across multiple component 297 links. The solution SHOULD provide a means to indicate the 298 flow identification field(s) that can be used along the flow 299 path to perform this function. 301 FR#15 The solution SHALL provide a means to indicate that a traffic 302 flow shall select a component link with the minimum latency 303 value. 305 FR#16 The solution SHALL provide a means to indicate that a traffic 306 flow shall select a component link with a maximum acceptable 307 latency value as specified by protocol.
309 FR#17 The solution SHALL provide a means to indicate that a traffic 310 flow shall select a component link with a maximum acceptable 311 delay variation value as specified by protocol. 313 FR#18 The solution SHALL provide a means local to a node that 314 automatically distributes flows across the component links in 315 the composite link such that NPOs are met. 317 FR#19 The solution SHALL provide a means to distribute flows from a 318 single LSP across multiple component links to handle at least 319 the case where the traffic carried in an LSP exceeds that of 320 any component link in the composite link. As defined in 321 section 3, a flow is a sequence of packets that must be 322 transferred in order, and hence kept on one component link. 324 FR#20 The solution SHOULD support the use case where a composite 325 link itself is a component link for a higher order composite 326 link. For example, a composite link comprised of MPLS-TP 327 bidirectional tunnels viewed as logical links could then be used 328 as a component link in yet another composite link that 329 connects MPLS routers. 331 5. Derived Requirements 333 This section takes the next step and derives high-level requirements 334 on protocol specification from the functional requirements. 336 DR#1 The solution SHOULD attempt to extend existing protocols 337 wherever possible, developing a new protocol only if this adds 338 a significant set of capabilities. 340 The vast majority of network operators have provisioned L3VPN 341 services over LDP. Many have deployed L2VPN services over LDP 342 as well. TE extensions to IGP and RSVP-TE are viewed as being 343 overly complex by some operators. 345 DR#2 A solution SHOULD extend LDP capabilities to meet functional 346 requirements (without using TE methods as decided in 347 [RFC3468]). 349 DR#3 Coexistence of LDP and RSVP-TE signaled LSPs MUST be supported 350 on a composite link.
Other functional requirements should be 351 supported as independently of signaling protocol as possible. 353 DR#4 When the nodes connected via a composite link are in the same 354 MPLS network topology, the solution MAY define extensions to 355 the IGP. 357 DR#5 When the nodes connected via a composite link are in 358 different MPLS network topologies, the solution SHALL NOT rely 359 on extensions to the IGP. 361 DR#6 The solution SHALL support composite link IGP advertisement 362 that results in convergence time better than that of 363 advertising the individual component links. The solution SHALL 364 be designed so that it represents the range of capabilities of 365 the individual component links such that functional 366 requirements are met, and also minimizes the frequency of 367 advertisement updates which may cause IGP convergence to occur. 369 One solution approach is to summarize the characteristics of 370 the component links in IGP advertisements; however, the intent 371 of the above requirement is not to specify the form of a 372 solution. Examples of advertisement update triggering events 373 to be considered include: LSP establishment/release, changes in 374 component link characteristics (e.g., latency, up/down state), 375 and/or bandwidth utilization. 377 DR#7 When a worst case failure scenario occurs, the resulting number 378 of links advertised in the IGP causes IGP convergence to occur, 379 causing a period of unavailability as perceived by users. The 380 convergence time of the solution MUST meet the SLA objective 381 for the duration of unavailability. 383 DR#8 When a worst case failure scenario occurs, the number of 384 RSVP-TE LSPs to be resignaled will cause a period of 385 unavailability as perceived by users. The resignaling time of 386 the solution MUST meet the NPO objective for the duration of 387 unavailability. The resignaling time of the solution MUST NOT 388 increase significantly as compared with current methods. 390 6.
Acknowledgements 392 Frederic Jounay of France Telecom and Yuji Kamite of NTT 393 Communications Corporation co-authored a version of this document. 395 A rewrite of this document occurred after the IETF77 meeting. 396 Dimitri Papadimitriou, Lou Berger, Tony Li, the WG chairs John Scudder 397 and Alex Zinin, and others provided valuable guidance prior to and at 398 the IETF77 RTGWG meeting. 400 Tony Li and John Drake have made numerous valuable comments on the 401 RTGWG mailing list that are reflected in versions following the 402 IETF77 meeting. 404 7. IANA Considerations 406 This memo includes no request to IANA. 408 8. Security Considerations 410 This document specifies a set of requirements. The requirements 411 themselves do not pose a security threat. If these requirements are 412 met using MPLS signaling as commonly practiced today with 413 authenticated but unencrypted OSPF-TE, ISIS-TE, and RSVP-TE or LDP, 414 then the additional information carried in this 415 communication could conceivably 416 be gathered in a man-in-the-middle confidentiality breach. Such an 417 attack would require a capability to monitor this signaling either 418 through a provider breach or access to provider physical transmission 419 infrastructure. A provider breach already poses a threat of numerous 420 types of attacks which are of far more serious consequence. Encryption 421 of the signaling can prevent or render more difficult any 422 confidentiality breach that otherwise might occur by means of access 423 to provider physical transmission infrastructure. 425 9. References 427 9.1. Normative References 429 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 430 Requirement Levels", BCP 14, RFC 2119, March 1997. 432 9.2. Informative References 434 [I-D.ietf-l2vpn-vpms-frmwk-requirements] 435 Kamite, Y., JOUNAY, F., Niven-Jenkins, B., Brungard, D., 436 and L.
Jin, "Framework and Requirements for Virtual 437 Private Multicast Service (VPMS)", 438 draft-ietf-l2vpn-vpms-frmwk-requirements-03 (work in 439 progress), July 2010. 441 [ITU-T.G.800] 442 ITU-T, "Unified functional architecture of transport 443 networks", 2007, . 446 [RFC2702] Awduche, D., Malcolm, J., Agogbua, J., O'Dell, M., and J. 447 McManus, "Requirements for Traffic Engineering Over MPLS", 448 RFC 2702, September 1999. 450 [RFC3031] Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol 451 Label Switching Architecture", RFC 3031, January 2001. 453 [RFC3468] Andersson, L. and G. Swallow, "The Multiprotocol Label 454 Switching (MPLS) Working Group decision on MPLS signaling 455 protocols", RFC 3468, February 2003. 457 [RFC3809] Nagarajan, A., "Generic Requirements for Provider 458 Provisioned Virtual Private Networks (PPVPN)", RFC 3809, 459 June 2004. 461 [RFC4031] Carugi, M. and D. McDysan, "Service Requirements for Layer 462 3 Provider Provisioned Virtual Private Networks (PPVPNs)", 463 RFC 4031, April 2005. 465 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 466 Networks (VPNs)", RFC 4364, February 2006. 468 [RFC4664] Andersson, L. and E. Rosen, "Framework for Layer 2 Virtual 469 Private Networks (L2VPNs)", RFC 4664, September 2006. 471 [RFC4665] Augustyn, W. and Y. Serbest, "Service Requirements for 472 Layer 2 Provider-Provisioned Virtual Private Networks", 473 RFC 4665, September 2006. 475 [RFC4761] Kompella, K. and Y. Rekhter, "Virtual Private LAN Service 476 (VPLS) Using BGP for Auto-Discovery and Signaling", 477 RFC 4761, January 2007. 479 [RFC4762] Lasserre, M. and V. Kompella, "Virtual Private LAN Service 480 (VPLS) Using Label Distribution Protocol (LDP) Signaling", 481 RFC 4762, January 2007. 483 [RFC4797] Rekhter, Y., Bonica, R., and E. Rosen, "Use of Provider 484 Edge to Provider Edge (PE-PE) Generic Routing 485 Encapsulation (GRE) or IP in BGP/MPLS IP Virtual Private 486 Networks", RFC 4797, January 2007. 
488 [RFC5254] Bitar, N., Bocci, M., and L. Martini, "Requirements for 489 Multi-Segment Pseudowire Emulation Edge-to-Edge (PWE3)", 490 RFC 5254, October 2008. 492 9.3. Appendix References 494 [I-D.ietf-pwe3-fat-pw] 495 Bryant, S., Filsfils, C., Drafz, U., Kompella, V., Regan, 496 J., and S. Amante, "Flow Aware Transport of Pseudowires 497 over an MPLS PSN", draft-ietf-pwe3-fat-pw-03 (work in 498 progress), January 2010. 500 [IEEE-802.1AX] 501 IEEE Standards Association, "IEEE Std 802.1AX-2008 IEEE 502 Standard for Local and Metropolitan Area Networks - Link 503 Aggregation", 2006, . 506 [ITU-T.Y.1540] 507 ITU-T, "Internet protocol data communication service - IP 508 packet transfer and availability performance parameters", 509 2007, . 511 [ITU-T.Y.1541] 512 ITU-T, "Network performance objectives for IP-based 513 services", 2006, . 515 [RFC1717] Sklower, K., Lloyd, B., McGregor, G., and D. Carr, "The 516 PPP Multilink Protocol (MP)", RFC 1717, November 1994. 518 [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., 519 and W. Weiss, "An Architecture for Differentiated 520 Services", RFC 2475, December 1998. 522 [RFC2615] Malis, A. and W. Simpson, "PPP over SONET/SDH", RFC 2615, 523 June 1999. 525 [RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and 526 Multicast Next-Hop Selection", RFC 2991, November 2000. 528 [RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path 529 Algorithm", RFC 2992, November 2000. 531 [RFC3260] Grossman, D., "New Terminology and Clarifications for 532 Diffserv", RFC 3260, April 2002. 534 [RFC4201] Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling 535 in MPLS Traffic Engineering (TE)", RFC 4201, October 2005. 537 [RFC4301] Kent, S. and K. Seo, "Security Architecture for the 538 Internet Protocol", RFC 4301, December 2005. 540 [RFC4385] Bryant, S., Swallow, G., Martini, L., and D. McPherson, 541 "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for 542 Use over an MPLS PSN", RFC 4385, February 2006. 
544 [RFC4928] Swallow, G., Bryant, S., and L. Andersson, "Avoiding Equal 545 Cost Multipath Treatment in MPLS Networks", BCP 128, 546 RFC 4928, June 2007. 548 Appendix A. More Details on Existing Network Operator Practices and 549 Protocol Usage 551 Often, network operators have a contractual Service Level Agreement 552 (SLA) with customers for services that are comprised of numerical 553 values for performance measures, principally availability, latency, 554 and delay variation. Additionally, network operators may have a Service 555 Level Specification (SLS) that is for internal use by the operator. 556 See [ITU-T.Y.1540], [ITU-T.Y.1541], and RFC 3809, Section 4.9 [RFC3809] 557 for examples of the form of such SLA and SLS specifications. In this 558 document we use the term Network Performance Objective (NPO) as 559 defined in section 5 of [ITU-T.Y.1541] since the SLA and SLS measures 560 have network operator and service specific implications. Note that 561 the numerical NPO values of Y.1540 and Y.1541 span multiple networks 562 and may be looser than network operator SLA or SLS objectives. 563 Applications and acceptable user experience have an important 564 relationship to these performance parameters. 566 Consider latency as an example. In some cases, minimizing latency 567 relates directly to the best customer experience (e.g., in TCP closer 568 is faster). In other cases, user experience is relatively insensitive 569 to latency, up to a specific limit at which point user perception of 570 quality degrades significantly (e.g., interactive human voice and 571 multimedia conferencing). A number of NPOs have a bound on point-to- 572 point latency, and as long as this bound is met, the NPO is met -- 573 decreasing the latency is not necessary. In some NPOs, if the 574 specified latency is not met, the user considers the service as 575 unavailable.
An unprotected LSP can be manually provisioned on a set 576 of links to meet this type of NPO, but this lowers availability since an 577 alternate route that meets the latency NPO cannot be determined. 579 Historically, when an IP/MPLS network was operated over a lower layer 580 circuit switched network (e.g., SONET rings), a change in latency 581 caused by the lower layer network (e.g., due to a maintenance action 582 or failure) was not known to the MPLS network. This resulted in 583 latency affecting end user experience, sometimes violating NPOs or 584 resulting in user complaints. 586 A response to this problem was to provision IP/MPLS networks over 587 unprotected circuits and set the metric and/or TE-metric proportional 588 to latency. This resulted in traffic being directed over the least 589 latency path, even if this was not needed to meet an NPO or meet user 590 experience objectives. This resulted in reduced flexibility and 591 increased cost for network operators. Using lower layer networks to 592 provide restoration and grooming is expected to be more efficient, 593 but the inability to communicate performance parameters, in 594 particular latency, from the lower layer network to the higher layer 595 network is an important problem to be solved before this can be done. 597 Latency NPOs for pt-pt services are often tied closely to geographic 598 locations, while latency for multipoint services may be based upon a 599 worst case within a region. 601 Section 7 of [ITU-T.Y.1540] defines availability for an IP service in 602 terms of loss exceeding a threshold for a period on the order of 5 603 minutes.
However, the timeframes for restoration (i.e., as 604 implemented by pre-determined protection, convergence of routing 605 protocols and/or signaling) for services range from on the order of 606 100 ms or less (e.g., for VPWS to emulate classical SDH/SONET 607 protection switching), to several minutes (e.g., to allow BGP to 608 reconverge for L3VPN) and may differ among the set of customers 609 within a single service. 611 The presence of only three Traffic Class (TC) bits (previously known 612 as EXP bits) in the MPLS shim header is limiting when a network 613 operator needs to support QoS classes for multiple services (e.g., 614 L2VPN VPWS, VPLS, L3VPN and Internet), each of which has a set of QoS 615 classes that need to be supported. In some cases one bit is used to 616 indicate conformance to some ingress traffic classification, leaving 617 only two bits for indicating the service QoS classes. The approach 618 that has been taken is to aggregate these QoS classes into similar 619 sets on LER-LSR and LSR-LSR links. 621 Labeled LSPs and use of link layer encapsulation have been 622 standardized in order to provide a means to meet these needs. 624 The IP DSCP cannot be used for flow identification since RFC 4301 625 Section 5.5 [RFC4301] requires Diffserv transparency, and in general 626 network operators do not rely on the DSCP of Internet packets. 628 A label pushed onto Internet packets when they are carried along 629 with L2/L3VPN packets on the same link or lower layer network 630 provides a means to distinguish between the QoS classes for these 631 packets. 633 Operating an MPLS-TE network involves a different paradigm from 634 operating an IGP metric-based LDP signaled MPLS network. The mpt-pt 635 LDP signaled MPLS LSPs occur automatically, and balancing across 636 parallel links occurs if the IGP metrics are set "equally" (with 637 equality a locally definable relation).
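As a rough illustration (not part of the draft), the equal-metric balancing just described can be sketched as follows. The link names, metrics, and the use of SHA-256 as the flow hash are illustrative assumptions; real implementations use vendor-specific hashes over packet header fields.

```python
import hashlib

def ecmp_next_hops(links):
    """Return the parallel links sharing the minimum IGP metric,
    i.e., the links whose metrics are set "equally"."""
    best = min(metric for _name, metric in links)
    return [name for name, metric in links if metric == best]

def pick_link(flow_key, candidates):
    """Hash an opaque flow key and pick one equal-cost link.  The
    hash is stable, so every packet of a flow takes the same link
    and per-flow packet order is preserved."""
    digest = hashlib.sha256(flow_key.encode()).digest()
    return candidates[int.from_bytes(digest[:4], "big") % len(candidates)]

# Hypothetical parallel links with IGP metrics; only the two
# metric-10 links are eligible for balancing.
links = [("ge-0/0/1", 10), ("ge-0/0/2", 10), ("ge-0/0/3", 20)]
eligible = ecmp_next_hops(links)
```

Because the per-flow hash is deterministic, no per-packet state is needed, but a single large flow still lands entirely on one link.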
639 Traffic is typically comprised of a few large (some very large) flows 640 and many small flows. In some cases, separate LSPs are established 641 for very large flows. This can occur even if the IP header 642 information is inspected by a router, for example an IPsec tunnel 643 that carries a large amount of traffic. An important example of 644 large flows is that of a L2/L3 VPN customer who has an access line 645 bandwidth comparable to a client-client composite link bandwidth -- 646 there could be flows that are on the order of the access line 647 bandwidth. 649 Appendix B. Existing Multipath Standards and Techniques 651 Today the requirement to handle large aggregations of traffic, much 652 larger than a single component link, can be handled by a number of 653 techniques which we will collectively call multipath. Multipath 654 applied to parallel links between the same set of nodes includes 655 Ethernet Link Aggregation [IEEE-802.1AX], link bundling [RFC4201], or 656 other aggregation techniques some of which may be vendor specific. 657 Multipath applied to diverse paths rather than parallel links 658 includes Equal Cost MultiPath (ECMP) as applied to OSPF, ISIS, or 659 even BGP, and equal cost LSP, as described in Appendix B.4. Various 660 multipath techniques have strengths and weaknesses. 662 The term composite link is more general than terms such as link 663 aggregate, which is generally considered to be specific to Ethernet, 664 and its use here is consistent with the broad definition in 665 [ITU-T.G.800]. The term multipath excludes inverse multiplexing and 666 refers to techniques which only solve the problem of large 667 aggregations of traffic, without addressing the other requirements 668 outlined in this document. 670 B.1. Common Multipath Load Splitting Techniques 672 Identical load balancing techniques are used for multipath both over 673 parallel links and over diverse paths.
675 Large aggregates of IP traffic do not provide explicit signaling to 676 indicate the expected traffic loads. Large aggregates of MPLS 677 traffic are carried in MPLS tunnels supported by MPLS LSP. LSP which 678 are signaled using RSVP-TE extensions do provide explicit signaling 679 which includes the expected traffic load for the aggregate. LSP 680 which are signaled using LDP do not provide an expected traffic load. 682 MPLS LSP may contain other MPLS LSP arranged hierarchically. When an 683 MPLS LSR serves as a midpoint LSR in an LSP carrying other LSP as 684 payload, there is no signaling associated with these inner LSP. 685 Therefore, even when using RSVP-TE signaling, there may be insufficient 686 information provided by signaling to adequately distribute load 687 across a composite link. 689 Generally a set of label stack entries that is unique across the 690 ordered set of label numbers can safely be assumed to contain a group 691 of flows. The reordering of traffic can therefore be considered to 692 be acceptable unless reordering occurs within traffic containing a 693 common unique set of label stack entries. Existing load splitting 694 techniques take advantage of this property in addition to looking 695 beyond the bottom of the label stack and determining if the payload 696 is IPv4 or IPv6 to load balance traffic accordingly. 698 MPLS-TP OAM violates the assumption that it is safe to reorder 699 traffic within an LSP. If MPLS-TP OAM is to be accommodated, then 700 existing multipath techniques must be modified. Such modifications 701 are outside the scope of this document. 703 For example a large aggregate of IP traffic may be subdivided into a 704 large number of groups of flows using a hash on the IP source and 705 destination addresses. This is as described in [RFC2475] and 706 clarified in [RFC3260]. For MPLS traffic carrying IP, a similar hash 707 can be performed on the set of labels in the label stack.
These 708 techniques are both examples of means to subdivide traffic into 709 groups of flows for the purpose of load balancing traffic across 710 aggregated link capacity. The means of identifying a flow should not 711 be confused with the definition of a flow. 713 Discussion of whether a hash based approach provides a sufficiently 714 even load balance using any particular hashing algorithm or method of 715 distributing traffic across a set of component links is outside of 716 the scope of this document. 718 The current load balancing techniques are referenced in [RFC4385] and 719 [RFC4928]. The use of three hash based approaches is described in 720 [RFC2991] and [RFC2992]. A mechanism to identify flows within PW is 721 described in [I-D.ietf-pwe3-fat-pw]. The use of hash based 722 approaches is mentioned as an example of an existing set of 723 techniques to distribute traffic over a set of component links. 724 Other techniques are not precluded. 726 B.2. Simple and Adaptive Load Balancing Multipath 728 Simple multipath generally relies on the mathematical probability 729 that given a very large number of small microflows, these microflows 730 will tend to be distributed evenly across a hash space. A common 731 simple multipath implementation assumes that all members (component 732 links) are of equal capacity and performs a modulo operation on 733 the hashed value. An alternate simple multipath technique uses a 734 table, generally with a power of two size, and distributes the table 735 entries proportionally among members according to the capacity of 736 each member. 738 Simple load balancing works well if there are a very large number of 739 small microflows (i.e., microflow rate is much less than component 740 link capacity). However, the case where there are even a few large 741 microflows is not handled well by simple load balancing.
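The table-based technique described above can be sketched as follows. This is an illustrative example only, not an implementation from the draft; the member names, capacities, table size of 256, and SHA-256 flow hash are all assumptions.

```python
import hashlib

def build_table(members, size=256):
    """Build a load balancing table (power-of-two size) whose
    entries are distributed proportionally to each member link's
    capacity, as in the simple table-based technique."""
    total = sum(capacity for _name, capacity in members)
    table = []
    for i, (name, capacity) in enumerate(members):
        if i == len(members) - 1:
            share = size - len(table)  # last member absorbs rounding error
        else:
            share = round(size * capacity / total)
        table.extend([name] * share)
    return table

def select_member(flow_key, table):
    """Hash the flow key and index the table; a flow always maps to
    the same member, so packets within a flow stay in order."""
    digest = hashlib.sha256(flow_key.encode()).digest()
    return table[int.from_bytes(digest[:4], "big") % len(table)]

# Hypothetical members with 10, 10 and 40 units of capacity; the
# third link receives roughly two thirds of the table entries.
members = [("link-a", 10), ("link-b", 10), ("link-c", 40)]
table = build_table(members)
```

With a uniform hash, each member's share of traffic tracks its share of table entries, which illustrates why a few large microflows break the scheme: one heavy flow occupies a single table entry and cannot be subdivided.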
743 An adaptive multipath technique is one where the traffic bound to 744 each member (component link) is measured and the load split is 745 adjusted accordingly. As long as the adjustment is done within a 746 single network element, then no protocol extensions are required and 747 there are no interoperability issues. 749 Note that if the load balancing algorithm and/or its parameters are 750 adjusted, then packets in some flows may be delivered out of 751 sequence. 753 B.3. Traffic Split over Parallel Links 755 The load splitting techniques defined in Appendix B.1 and Appendix B.2 756 are both used in splitting traffic over parallel links between the 757 same pair of nodes. The best known technique, though far from being 758 the first, is Ethernet Link Aggregation [IEEE-802.1AX]. This same 759 technique had been applied much earlier using OSPF or ISIS Equal Cost 760 MultiPath (ECMP) over parallel links between the same nodes. 761 Multilink PPP [RFC1717] uses a technique that provides inverse 762 multiplexing; however, a number of vendors had provided proprietary 763 extensions to PPP over SONET/SDH [RFC2615] that predated Ethernet 764 Link Aggregation but are no longer used. 766 Link bundling [RFC4201] provides yet another means of handling 767 parallel LSP. RFC4201 explicitly allows a special value of all ones 768 to indicate a split across all members of the bundle. 770 B.4. Traffic Split over Multiple Paths 772 OSPF or ISIS Equal Cost MultiPath (ECMP) is a well known form of 773 traffic split over multiple paths that may traverse intermediate 774 nodes. ECMP is often incorrectly equated to only this case, and 775 multipath over multiple diverse paths is often incorrectly equated to 776 ECMP. 778 Many implementations are able to create more than one LSP between a 779 pair of nodes, where these LSP are routed diversely to better make 780 use of available capacity. The load on these LSP can be distributed 781 proportionally to the reserved bandwidth of the LSP.
These multiple 782 LSP may be advertised as a single PSC FA and any LSP making use of 783 the FA may be split over these multiple LSP. 785 Link bundling [RFC4201] component links may themselves be LSP. When 786 this technique is used, any LSP which specifies the link bundle may 787 be split across the multiple paths of the LSP that comprise the 788 bundle. 790 Appendix C. ITU-T G.800 Composite Link Definitions and Terminology 791 Composite Link: 792 Section 6.9.2 of ITU-T-G.800 [ITU-T.G.800] defines composite link 793 in terms of three cases, of which the following two are relevant 794 (the one describing inverse (TDM) multiplexing does not apply). 795 Note that these case definitions are taken verbatim from section 796 6.9, "Layer Relationships". 798 Case 1: "Multiple parallel links between the same subnetworks 799 can be bundled together into a single composite link. Each 800 component of the composite link is independent in the sense 801 that each component link is supported by a separate server 802 layer trail. The composite link conveys communication 803 information using different server layer trails thus the 804 sequence of symbols crossing this link may not be preserved. 805 This is illustrated in Figure 14." 807 Case 3: "A link can also be constructed by a concatenation of 808 component links and configured channel forwarding 809 relationships. The forwarding relationships must have a 1:1 810 correspondence to the link connections that will be provided 811 by the client link. In this case, it is not possible to 812 fully infer the status of the link by observing the server 813 layer trails visible at the ends of the link. This is 814 illustrated in Figure 16." 816 Subnetwork: A set of one or more nodes (i.e., LER or LSR) and links. 817 As a special case it can represent a site comprised of multiple 818 nodes. 820 Forwarding Relationship: Configured forwarding between ports on a 821 subnetwork. 
It may be connectionless (e.g., IP, not considered 822 in this draft), or connection oriented (e.g., MPLS signaled or 823 configured). 825 Component Link: A topological relationship between subnetworks 826 (i.e., a connection between nodes), which may be a wavelength, 827 circuit, virtual circuit or an MPLS LSP. 829 Authors' Addresses 831 Curtis Villamizar (editor) 832 Infinera Corporation 833 169 W. Java Drive 834 Sunnyvale, CA 94089 836 Email: cvillamizar@infinera.com 837 Dave McDysan (editor) 838 Verizon 839 22001 Loudoun County PKWY 840 Ashburn, VA 20147 842 Email: dave.mcdysan@verizon.com 844 So Ning 845 Verizon 846 2400 N. Glenville Ave. 847 Richardson, TX 75082 849 Phone: +1 972-729-7905 850 Email: ning.so@verizonbusiness.com 852 Andrew Malis 853 Verizon 854 117 West St. 855 Waltham, MA 02451 857 Phone: +1 781-466-2362 858 Email: andrew.g.malis@verizon.com 860 Lucy Yong 861 Huawei USA 862 1700 Alma Dr. Suite 500 863 Plano, TX 75075 865 Phone: +1 469-229-5387 866 Email: lucyyong@huawei.com