idnits 2.17.1 draft-ietf-mpls-entropy-label-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- -- The draft header indicates that this document updates RFC5036, but the abstract doesn't seem to mention this, which it should. -- The draft header indicates that this document updates RFC3031, but the abstract doesn't seem to mention this, which it should. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year (Using the creation date from RFC3031, updated by this document, for RFC5378 checks: 1998-03-17) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (May 9, 2012) is 4370 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'L' is mentioned on line 702, but not defined == Missing Reference: 'E' is mentioned on line 702, but not defined == Missing Reference: 'TL4' is mentioned on line 796, but not defined -- Looks like a reference, but probably isn't: '1' on line 821 == Missing Reference: 'TL3' is mentioned on line 797, but not defined == Missing Reference: 'TL1' is mentioned on line 799, but not defined == Missing Reference: 'TL0' is mentioned on line 753, but not defined -- Looks like a reference, but probably isn't: '3' on line 823 == Missing Reference: 'AL' is mentioned on line 801, but not defined == Missing Reference: 'L4' is mentioned on line 821, but not defined == Missing Reference: 'L3' is mentioned on line 821, but not defined == Missing Reference: 'Rn' is mentioned on line 822, but not defined -- Looks like a reference, but probably isn't: '0' on line 823 == Unused Reference: 'RFC4364' is defined on line 944, but no explicit reference was found in the text == Unused Reference: 'RFC4761' is defined on line 955, but no explicit reference was found in the text == Unused Reference: 'RFC4762' is defined on line 959, but no explicit reference was found in the text ** Obsolete normative reference: RFC 3107 (Obsoleted by RFC 8277) -- Obsolete informational reference (is this intentional?): RFC 4379 (Obsoleted by RFC 8029) -- Obsolete informational reference (is this intentional?): RFC 4447 (Obsoleted by RFC 8077) Summary: 1 error (**), 0 flaws (~~), 14 warnings (==), 9 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group K. Kompella 3 Internet-Draft J. 
Drake 4 Updates: 3031, 5036 (if approved) Juniper Networks 5 Intended status: Standards Track S. Amante 6 Expires: November 10, 2012 Level 3 Communications, LLC 7 W. Henderickx 8 Alcatel-Lucent 9 L. Yong 10 Huawei USA 11 May 9, 2012 13 The Use of Entropy Labels in MPLS Forwarding 14 draft-ietf-mpls-entropy-label-03 16 Abstract 18 Load balancing is a powerful tool for engineering traffic across a 19 network. This memo suggests ways of improving load balancing across 20 MPLS networks using the concept of "entropy labels". It defines the 21 concept, describes why entropy labels are useful, enumerates 22 properties of entropy labels that allow maximal benefit, and shows 23 how they can be signaled and used for various applications. 25 Status of this Memo 27 This Internet-Draft is submitted in full conformance with the 28 provisions of BCP 78 and BCP 79. 30 Internet-Drafts are working documents of the Internet Engineering 31 Task Force (IETF). Note that other groups may also distribute 32 working documents as Internet-Drafts. The list of current Internet- 33 Drafts is at http://datatracker.ietf.org/drafts/current/. 35 Internet-Drafts are draft documents valid for a maximum of six months 36 and may be updated, replaced, or obsoleted by other documents at any 37 time. It is inappropriate to use Internet-Drafts as reference 38 material or to cite them other than as "work in progress." 40 This Internet-Draft will expire on November 10, 2012. 42 Copyright Notice 44 Copyright (c) 2012 IETF Trust and the persons identified as the 45 document authors. All rights reserved. 47 This document is subject to BCP 78 and the IETF Trust's Legal 48 Provisions Relating to IETF Documents 49 (http://trustee.ietf.org/license-info) in effect on the date of 50 publication of this document. Please review these documents 51 carefully, as they describe your rights and restrictions with respect 52 to this document. 
Code Components extracted from this document must 53 include Simplified BSD License text as described in Section 4.e of 54 the Trust Legal Provisions and are provided without warranty as 55 described in the Simplified BSD License.
57 Table of Contents
59 1. Introduction
60 1.1. Conventions used
61 1.2. Motivation
62 2. Approaches
63 3. Entropy Labels and Their Structure
64 4. Data Plane Processing of Entropy Labels
65 4.1. Egress LSR
66 4.2. Ingress LSR
67 4.3. Transit LSR
68 4.4. Penultimate Hop LSR
69 5. Signaling for Entropy Labels
70 5.1. LDP Signaling
71 5.1.1. Processing the ELC TLV
72 5.2. BGP Signaling
73 5.3. RSVP-TE Signaling
74 5.4. Multicast LSPs and Entropy Labels
75 6. Operations, Administration, and Maintenance (OAM) and 76 Entropy Labels
77 7. MPLS-TP and Entropy Labels
78 8. Entropy Labels in Various Scenarios
79 8.1. LDP Tunnel
80 8.2. LDP Over RSVP-TE
81 8.3. MPLS Applications
82 9. Security Considerations
83 10. IANA Considerations
84 10.1. Reserved Label for ELI
85 10.2. LDP Entropy Label Capability TLV
86 10.3. BGP Entropy Label Capability Attribute
87 10.4. RSVP-TE Entropy Label Capability flag
88 11. Acknowledgments
89 12. References
90 12.1. Normative References
91 12.2. Informative References
92 Appendix A. Applicability of LDP Entropy Label Capability TLV
93 Authors' Addresses
95 1. Introduction
97 Load balancing, or multi-pathing, is an attempt to balance traffic 98 across a network by allowing the traffic to use multiple paths. Load 99 balancing has several benefits: it eases capacity planning; it can 100 help absorb traffic surges by spreading them across multiple paths; 101 it allows better resilience by offering alternate paths in the event 102 of a link or node failure.
104 As providers scale their networks, they use several techniques to 105 achieve greater bandwidth between nodes. Two widely used techniques 106 are: Link Aggregation Group (LAG) and Equal-Cost Multi-Path (ECMP). 107 LAG is used to bond together several physical circuits between two 108 adjacent nodes so they appear to higher-layer protocols as a single, 109 higher bandwidth 'virtual' pipe. ECMP is used between two nodes 110 separated by one or more hops, to allow load balancing over several 111 shortest paths in the network. This is typically obtained by 112 arranging IGP metrics such that there are several equal cost paths 113 between source-destination pairs. Both of these techniques may, and 114 often do, co-exist in various parts of a given provider's network, 115 depending on various choices made by the provider.
117 A very important requirement when load balancing is that packets 118 belonging to a given 'flow' must be mapped to the same path, i.e., 119 the same exact sequence of links across the network. This is to 120 avoid jitter, latency and re-ordering issues for the flow. What 121 constitutes a flow varies considerably. A common example of a flow 122 is a TCP session. Other examples are an L2TP session corresponding 123 to a given broadband user, or traffic within an ATM virtual circuit. 125 To meet this requirement, a node uses certain fields, termed 'keys', 126 within a packet's header as input to a load balancing function 127 (typically a hash function) that selects the path for all packets in 128 a given flow. The keys chosen for the load balancing function depend 129 on the packet type; a typical set (for IP packets) is the IP source 130 and destination addresses, the protocol type, and (for TCP and UDP 131 traffic) the source and destination port numbers. An overly 132 conservative choice of fields may lead to many flows mapping to the 133 same hash value (and consequently poorer load balancing); an overly 134 aggressive choice may map a flow to multiple values, potentially 135 violating the above requirement. 137 For MPLS networks, most of the same principles (and benefits) apply. 138 However, finding useful keys in a packet for the purpose of load 139 balancing can be more of a challenge. In many cases, MPLS 140 encapsulation may require fairly deep inspection of packets to find 141 these keys at transit LSRs. 143 One way to eliminate the need for this deep inspection is to have the 144 ingress LSR of an MPLS Label Switched Path extract the appropriate 145 keys from a given packet, input them to its load balancing function, 146 and place the result in an additional label, termed the 'entropy 147 label', as part of the MPLS label stack it pushes onto that packet. 
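The idea in the preceding paragraph can be illustrated with a minimal sketch of how an ingress LSR might turn a flow's keys into an entropy label value. This is not the method prescribed by the memo: the function name entropy_label, the choice of SHA-256 as the load balancing function, and the key layout are all assumptions made for illustration. The only constraints taken from the memo are that the result must fit in the 20-bit label field and (per Section 3) must not fall in the reserved label space (0-15).

```python
import hashlib
import ipaddress
import struct

LABEL_BITS = 20      # width of the label field in a label stack entry
RESERVED_MAX = 15    # label values 0-15 are reserved and must not be used


def entropy_label(src_ip, dst_ip, protocol, src_port=0, dst_port=0):
    """Map a flow's keys to a 20-bit label value outside the reserved range.

    Hypothetical helper: the hash (SHA-256) and key layout are assumptions,
    chosen only so that the same flow always yields the same label.
    """
    keys = (
        ipaddress.ip_address(src_ip).packed
        + ipaddress.ip_address(dst_ip).packed
        + struct.pack("!BHH", protocol, src_port, dst_port)
    )
    value = int.from_bytes(hashlib.sha256(keys).digest()[:4], "big")
    usable = (1 << LABEL_BITS) - (RESERVED_MAX + 1)
    # Shift past the reserved values so the result lies in 16..2^20-1.
    return RESERVED_MAX + 1 + value % usable
```

Because the function is deterministic, every packet of a given flow receives the same label, which is what keeps the flow on a single path at transit LSRs.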
149 The packet's entire MPLS label stack can then be used by transit LSRs 150 to perform load balancing, as the entropy label introduces the right 151 level of "entropy" into the label stack. 153 There are five key reasons why this is beneficial: 155 1. at the ingress LSR, MPLS encapsulation hasn't yet occurred, so 156 deep inspection is not necessary; 158 2. the ingress LSR has more context and information about incoming 159 packets than transit LSRs; 161 3. ingress LSRs usually operate at lower bandwidths than transit 162 LSRs, allowing them to do more work per packet; 164 4. transit LSRs do not need to perform deep packet inspection and 165 can load balance effectively using only a packet's MPLS label 166 stack; and 168 5. transit LSRs, not having the full context that an ingress LSR 169 does, have the hard choice between potentially misinterpreting 170 fields in a packet as valid keys for load balancing (causing 171 packet ordering problems) or adopting a conservative approach 172 (giving rise to sub-optimal load balancing). Entropy labels 173 relieve them of making this choice. 175 This memo describes why entropy labels are needed and defines the 176 properties of entropy labels; in particular how they are generated 177 and received, and the expected behavior of transit LSRs. Finally, it 178 describes in general how signaling works and what needs to be 179 signaled, as well as specifics for the signaling of entropy labels 180 for LDP ([RFC5036]), BGP ([RFC3107]), and RSVP-TE ([RFC3209]). 182 1.1. Conventions used 184 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 185 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 186 document are to be interpreted as described in [RFC2119]. 
188 The following acronyms are used: 190 BoS: Bottom of Stack 192 CE: Customer Edge device 194 ECMP: Equal Cost Multi-Path 196 EL: Entropy Label 198 ELC: Entropy Label Capability 200 ELI: Entropy Label Indicator 202 FEC: Forwarding Equivalence Class 204 LAG: Link Aggregation Group 206 LER: Label Edge Router 208 LSR: Label Switching Router 210 PE: Provider Edge Router 212 PHP: Penultimate Hop Popping 214 TC: Traffic Class 216 TTL: Time-to-Live 218 UHP: Ultimate Hop Popping 220 VPLS: Virtual Private LAN (Local Area Network) Service 222 VPN: Virtual Private Network 224 The term ingress (or egress) LSR is used interchangeably with ingress 225 (or egress) LER. The term application throughout the text refers to 226 an MPLS application (such as a VPN or VPLS). 228 A label stack (say of three labels) is denoted by <L1, L2, L3>, where 229 L1 is the "outermost" label and L3 the innermost (closest to the 230 payload). Packet flows are depicted left to right, and signaling is 231 shown right to left (unless otherwise indicated). 233 The term 'label' is used both for the entire 32-bit label and the 20- 234 bit label field within a label. It should be clear from the context 235 which is meant. 237 1.2. Motivation 239 MPLS is a very successful generic forwarding substrate that transports 240 several dozen types of protocols, most notably: IP, PWE3, VPLS and IP 241 VPNs. Within each type of protocol, there typically exist several 242 variants, each with a different set of load balancing keys, e.g., for 243 IP: IPv4, IPv6, IPv6 in IPv4, etc.; for PWE3: Ethernet, ATM, Frame- 244 Relay, etc. There are also several different types of Ethernet over 245 PW encapsulation, ATM over PW encapsulation, etc. as well. Finally, 246 given the popularity of MPLS, it is likely that it will continue to 247 be extended to transport new protocols. 
249 Currently, each transit LSR along the path of a given LSP has to try 250 to infer the underlying protocol within an MPLS packet in order to 251 extract appropriate keys for load balancing. Unfortunately, if the 252 transit LSR is unable to infer the MPLS packet's protocol (as is 253 often the case), it will typically use the topmost (or all) MPLS 254 labels in the label stack as keys for the load balancing function. 255 The result may be an extremely inequitable distribution of traffic 256 across equal-cost paths exiting that LSR. This is because MPLS 257 labels are generally fairly coarse-grained forwarding labels that 258 typically describe a next-hop, or provide some form of demultiplexing 259 and/or forwarding function, and do not describe the packet's 260 underlying protocol. 262 On the other hand, an ingress LSR (e.g., a PE router) has detailed 263 knowledge of a packet's contents, typically through a priori 264 configuration of the encapsulation(s) that are expected at a given 265 PE-CE interface (e.g., IPv4, IPv6, VPLS, etc.). Ingress LSRs also have more 266 flexible forwarding hardware. PE routers need this information and 267 these capabilities to: 269 a) apply the required services for the CE; 271 b) discern the packet's CoS forwarding treatment; 273 c) apply filters to forward or block traffic to/from the CE; 275 d) forward routing/control traffic to an onboard management 276 processor; and, 278 e) load-balance the traffic on its uplinks to transit LSRs (e.g., 279 P routers). 281 By knowing the expected encapsulation types, an ingress LSR 282 can apply a more specific set of payload parsing routines to extract 283 the keys appropriate for a given protocol. This allows for 284 significantly improved accuracy in determining the appropriate load 285 balancing behavior for each protocol. 
287 If the ingress LSR were to capture the flow information so gathered 288 in a convenient form for downstream transit LSRs, transit LSRs could 289 remain completely oblivious to the contents of each MPLS packet, and 290 use only the captured flow information to perform load balancing. In 291 particular, there will be no reason to duplicate an ingress LSR's 292 complex packet/payload parsing functionality in a transit LSR. This 293 will result in less complex transit LSRs, enabling them to more 294 easily scale to higher forwarding rates, larger port density, lower 295 power consumption, etc. The idea in this memo is to capture this 296 flow information as a label, the so-called entropy label. 298 Ingress LSRs can also adapt more readily to new protocols and extract 299 the appropriate keys to use for load balancing packets of those 300 protocols. This means that deploying new protocols or services in 301 edge devices requires fewer concomitant changes in the core, 302 resulting in higher edge service velocity and at the same time more 303 stable core networks. 305 2. Approaches 307 There are two main approaches to encoding load balancing information 308 in the label stack. The first allocates multiple labels for a 309 particular Forwarding Equivalence Class (FEC). These labels are 310 equivalent in terms of forwarding semantics, but having multiple 311 labels allows flexibility in assigning labels to flows belonging to 312 the same FEC. This approach has the advantage that the label stack 313 has the same depth whether or not one uses label-based load 314 balancing; and so, consequently, there is no change to forwarding 315 operations on transit and egress LSRs. However, it has a major 316 drawback in that there is a significant increase in both signaling 317 and forwarding state. 319 The other approach encodes the load balancing information as an 320 additional label in the label stack, thus increasing the depth of the 321 label stack by one. 
With this approach, there is minimal change to 322 signaling state for a FEC; also, there is no change in forwarding 323 operations in transit LSRs, and no increase of forwarding state in 324 any LSR. The only purpose of the additional label is to increase the 325 entropy in the label stack, so this is called an "entropy label". 326 This memo focuses solely on this approach. 328 This latter approach uses upstream generated entropy labels, which 329 may conflict with downstream allocated application labels. There are 330 a few approaches to deal with this: 1) allocate a pair of labels for 331 each FEC, one that must have an entropy label below it, and one that 332 must not; 2) use a label (the "Entropy Label Indicator") to indicate 333 that the next label is an entropy label; and 3) allow entropy labels 334 only where there is no possible confusion. The first doubles control 335 and data plane state in the network; the last is too restrictive. 336 The approach taken here is the second. In making both the above 337 choices, the trade-off is to increase label stack depth rather than 338 control and data plane state in the network. 340 Finally, one may choose to associate ELs with MPLS tunnels (LSPs), or 341 with MPLS applications (e.g., VPNs). (What this entails is described 342 in later sections.) We take the former approach, for the following 343 reasons: 345 1. There are a small number of tunneling protocols for MPLS, but a 346 large and growing number of applications. Defining ELs on a 347 tunnel basis means simpler standards, lower development, 348 interoperability and testing efforts. 350 2. As a consequence, there will be much less churn in the network as 351 new applications (services) are defined and deployed. 353 3. Processing application labels in the data plane is more complex 354 than processing tunnel labels. Thus, it is preferable to burden 355 the latter rather than the former with EL processing. 357 4. 
Associating ELs with tunnels makes it simpler to deal with 358 hierarchy, be it LDP-over-RSVP-TE or Carrier's Carrier VPNs. 359 Each layer in the hierarchy can choose independently whether or 360 not they want ELs. 362 The cost of this approach is that ELIs will be mandatory; again, the 363 trade-off is the size of the label stack. To summarize, the net 364 increase in the label stack to use entropy labels is two: one 365 reserved label for the ELI, and the entropy label itself. 367 3. Entropy Labels and Their Structure 369 An entropy label (as used here) is a label: 371 1. that is not used for forwarding; 373 2. that is not signaled; and 375 3. whose only purpose in the label stack is to provide 'entropy' to 376 improve load balancing. 378 Entropy labels are generated by an ingress LSR, based entirely on 379 load balancing information. However, they MUST NOT have values in 380 the reserved label space (0-15) [IANA MPLS Label Values]. To ensure 381 that they are not used inadvertently for forwarding, entropy labels 382 SHOULD have a TTL of 0. The CoS field of an entropy label can be set 383 to any value deemed appropriate. 385 Since entropy labels are generated by an ingress LSR, an egress LSR 386 MUST be able to distinguish unambiguously between entropy labels and 387 application labels. This is accomplished by REQUIRING that the label 388 immediately preceding an entropy label (EL) in the MPLS label stack 389 be an 'entropy label indicator' (ELI). The ELI is a reserved label 390 with value (TBD by IANA). An ELI MUST have 'Bottom of Stack' (BoS) 391 bit = 0 ([RFC3032]). The TTL SHOULD be set to whatever value the 392 label above it in the stack has. The CoS field can be set to any 393 value deemed appropriate; typically, this will be the value in the 394 label above the ELI in the label stack. 396 Entropy labels are useful for pseudowires ([RFC4447]). 
[RFC6391] 397 explains how entropy labels can be used for RFC 4447-style 398 pseudowires, and thus is complementary to this memo, which focuses on 399 how entropy labels can be used for tunnels, and thus for all other 400 MPLS applications. 402 4. Data Plane Processing of Entropy Labels 404 4.1. Egress LSR 406 Suppose egress LSR Y is capable of processing entropy labels for a 407 tunnel. Y indicates this to all ingresses via signaling (see 408 Section 5). Y MUST be prepared to deal both with packets with an 409 imposed EL and those without; the ELI will distinguish these cases. 410 If a particular ingress chooses not to impose an EL, Y's processing 411 of the received label stack (which might be empty) is as if Y chose 412 not to accept ELs. 414 If an ingress X chooses to impose an EL, then Y will receive a tunnel 415 termination packet with label stack <TL, ELI, EL> <remaining 416 packet header>. Y recognizes TL as the label it distributed to its 417 upstreams for the tunnel, and pops it. (Note that TL may be the 418 implicit null label, in which case it doesn't appear in the label 419 stack.) Y then recognizes the ELI and pops two labels: the ELI and 420 the EL. Y then processes the remaining packet header as normal; this 421 may require further processing of tunnel termination, perhaps with 422 further ELI+EL pairs. When processing the final tunnel termination, 423 Y MAY enqueue the packet based on that tunnel TL's or ELI's TC value, 424 and MAY use the tunnel TL's or ELI's TTL to compute the TTL of the 425 remaining packet header. The EL's TTL MUST be ignored. 427 If any ELI processed by Y has BoS bit set, Y MUST discard the packet, 428 and MAY log an error. The EL's BoS bit will indicate whether or not 429 there are more labels in the stack. 431 4.2. Ingress LSR 433 If an egress LSR Y indicates via signaling that it can process ELs on 434 a particular tunnel, an ingress LSR X can choose whether or not to 435 insert ELs for packets going into that tunnel. Y MUST handle both 436 cases. 
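The egress processing described in Section 4.1 can be sketched as follows, for a single level of tunnel termination. This is an illustrative sketch only: the Entry class, the egress_pop function, and the DiscardPacket exception are hypothetical names, and the value 7 for the ELI is a placeholder, since this memo leaves the ELI value to be assigned by IANA. Treating an ELI with nothing after it as an error is likewise an assumption.

```python
from dataclasses import dataclass

ELI_PLACEHOLDER = 7  # placeholder only; the memo leaves the ELI value TBD by IANA


@dataclass
class Entry:
    """One label stack entry (hypothetical in-memory representation)."""
    label: int
    tc: int = 0
    bos: bool = False
    ttl: int = 0


class DiscardPacket(Exception):
    """Raised when the egress must discard the packet."""


def egress_pop(stack, tunnel_label):
    """Return the entries left after Y terminates one tunnel.

    stack is ordered outermost label first. TL may be implicit null,
    in which case it is simply absent from the stack.
    """
    rest = list(stack)
    if rest and rest[0].label == tunnel_label:
        rest.pop(0)                      # pop TL
    if rest and rest[0].label == ELI_PLACEHOLDER:
        # An ELI with BoS set is an error; a missing EL is assumed to be one too.
        if rest[0].bos or len(rest) < 2:
            raise DiscardPacket("ELI without a following entropy label")
        rest = rest[2:]                  # pop ELI and EL; the EL's TTL is never read
    return rest
```

A packet without an ELI takes the same path through the function and simply loses TL, which matches the requirement that Y handle both cases.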
438 The steps that X performs to insert ELs are as follows: 440 1. On an incoming packet, identify the application to which the 441 packet belongs, and thereby pick the fields to input to the load 442 balancing function; call the output LB. 444 2. Determine the application label AL (if any). Push <AL> onto the 445 packet. 447 3. Based on the application, the load balancing output LB and other 448 factors, determine the egress LSR Y, the tunnel to Y, the 449 specific interface to the next hop, and thus the tunnel label TL. 450 Use LB to generate the entropy label EL. 452 4. If, for the chosen tunnel, Y has not indicated that it can 453 process ELs, push <TL> onto the packet. If Y has indicated that 454 it can process ELs for the tunnel, push <TL, ELI, EL> onto the 455 packet. X SHOULD put the same TTL and TC fields for the ELI as 456 it does for TL. The TTL for the EL MUST be zero. The TC for the 457 EL may be any value. 459 5. X then determines whether further tunnel hierarchy is needed; if 460 so, X goes back to step 3, possibly with a new egress Y for the 461 new tunnel. Otherwise, X is done, and sends out the packet. 463 Notes: 465 a. X computes load balancing information and generates the EL based 466 on the incoming application packet, even though the signaling of 467 EL capability is associated with tunnels. 469 b. X MAY insert several entropy labels in the stack (each, of 470 course, preceded by an ELI), potentially one for each 471 hierarchical tunnel, provided that the egress for that tunnel has 472 indicated that it can process ELs for that tunnel. 474 c. X MUST NOT include an entropy label for a given tunnel unless the 475 egress LSR Y has indicated that it can process entropy labels for 476 that tunnel. 478 d. The signaling and use of entropy labels in one direction 479 (signaling from Y to X, and data path from X to Y) is completely 480 independent of the signaling and use of entropy labels in the 481 reverse direction (signaling from X to Y, and data path from Y to 482 X). 
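Steps 2 through 4 above can be sketched as follows for a single tunnel level; the hierarchy loop of step 5 is omitted. The function names encode_entry and build_stack are hypothetical, the ELI value 7 is a placeholder (the memo leaves it TBD by IANA), and giving the application label the same TC and TTL as TL is an assumption. The 32-bit entry layout (20-bit label, 3-bit TC, 1-bit BoS, 8-bit TTL) follows [RFC3032].

```python
import struct

ELI_PLACEHOLDER = 7  # placeholder only; the memo leaves the ELI value TBD by IANA


def encode_entry(label, tc, bos, ttl):
    """Pack one label stack entry: label(20) | TC(3) | BoS(1) | TTL(8)."""
    return struct.pack("!I", (label << 12) | (tc << 9) | (int(bos) << 8) | ttl)


def build_stack(tl, tc, ttl, el=None, al=None, egress_elc=False):
    """Return the encoded stack X pushes, outermost label first.

    egress_elc says whether Y signaled that it can process ELs for the tunnel;
    without that signal no ELI/EL pair is ever inserted (note c).
    """
    entries = [(tl, tc, ttl)]
    if egress_elc and el is not None:
        entries.append((ELI_PLACEHOLDER, tc, ttl))  # same TC and TTL as TL
        entries.append((el, 0, 0))                  # EL TTL is zero; TC may be anything
    if al is not None:
        entries.append((al, tc, ttl))               # assumption: AL reuses TL's TC/TTL
    last = len(entries) - 1
    return b"".join(
        encode_entry(label, c, i == last, t)
        for i, (label, c, t) in enumerate(entries)
    )
```

Only the last entry carries the BoS bit, so the ELI never has it set, and the EL carries it only when there is no application label beneath it.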
484 4.3. Transit LSR 486 Transit LSRs MAY operate with no change in forwarding behavior. The 487 following are suggestions for optimizations that improve load 488 balancing, reduce the amount of packet data processed, and/or enhance 489 backward compatibility. 491 If a transit LSR recognizes the ELI, it MAY choose to load balance 492 solely on the following label (the EL); otherwise, it SHOULD use as 493 much of the whole label stack as feasible as keys for the load 494 balancing function, with the exception that reserved labels MUST NOT 495 be used. 497 Some transit LSRs look beyond the label stack for better load 498 balancing information. This is a simple, backward compatible 499 approach in networks where some ingress LSRs impose ELs and others 500 don't. However, this is of limited incremental value if an EL is 501 indeed present, and requires more packet processing from the LSR. A 502 transit LSR MAY choose to parse the label stack for the presence of 503 the ELI, and look beyond the label stack only if it does not find it, 504 thus retaining the old behavior when needed, yet avoiding unnecessary 505 work otherwise. 507 4.4. Penultimate Hop LSR 509 No change is needed at penultimate hop LSRs. 511 5. Signaling for Entropy Labels 513 An egress LSR Y can signal to ingress LSR(s) its ability to process 514 entropy labels (henceforth called "Entropy Label Capability" or ELC) 515 on a given tunnel. Note that Entropy Label Capability may be 516 asymmetric: if LSRs X and Y are at opposite ends of a tunnel, X may 517 be able to process entropy labels, whereas Y may not. The signaling 518 extensions below allow for this asymmetry. 520 For an illustration of signaling and forwarding with entropy labels, 521 see Section 8. 523 5.1. LDP Signaling 525 A new LDP TLV ([RFC5036]) is defined to signal an egress's ability to 526 process entropy labels. This is called the ELC TLV, and may appear 527 as an Optional Parameter of the Label Mapping Message TLV. 
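Returning to the data plane for a moment, the transit LSR key selection suggested in Section 4.3 can be sketched as follows. This is an illustration under stated assumptions, not a required behavior: the names load_balance_keys and pick_path are hypothetical, the ELI value 7 is a placeholder (TBD by IANA in this memo), CRC-32 stands in for whatever load balancing function an LSR actually uses, and using the outermost ELI when several are present is an arbitrary choice.

```python
import zlib

ELI_PLACEHOLDER = 7  # placeholder only; the memo leaves the ELI value TBD by IANA
RESERVED_MAX = 15    # label values 0-15 are reserved


def load_balance_keys(labels):
    """Choose hash keys from a list of label values, outermost first."""
    for i, label in enumerate(labels[:-1]):
        if label == ELI_PLACEHOLDER:
            return [labels[i + 1]]       # ELI found: use only the EL after it
    # No ELI: use the whole stack, but never the reserved labels.
    return [label for label in labels if label > RESERVED_MAX]


def pick_path(labels, n_paths):
    """Map a label stack onto one of n_paths equal-cost paths."""
    data = b"".join(k.to_bytes(3, "big") for k in load_balance_keys(labels))
    return zlib.crc32(data) % n_paths
```

Since the keys depend only on label values, two packets of the same flow (same EL) always select the same path, whatever payload follows the label stack.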
529 The presence of the ELC TLV in a Label Mapping Message indicates to 530 ingress LSRs that the egress LSR can process entropy labels for the 531 associated LDP tunnel. The ELC TLV has Type (TBD by IANA) and Length 532 0. 534 The structure of the ELC TLV is shown below.
536     0                   1                   2                   3
537     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
538    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
539    |U|F|        Type (TBD)         |           Length (0)          |
540    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
542 Figure 1: Entropy Label Capability TLV
544 where: 546 U: Unknown bit. This bit MUST be set to 1. If the ELC TLV is not 547 understood by the receiver, then it MUST be ignored. 549 F: Forward bit. This bit MUST be set to 1. Since the ELC 550 TLV is going to be propagated hop-by-hop, it should be forwarded 551 even by nodes that may not understand it. 553 Type: Type field. To be assigned by IANA. 555 Length: Length field. This field specifies the total length in 556 octets of the ELC TLV, and is currently defined to be 0. 558 5.1.1. Processing the ELC TLV 560 An LSR that receives a Label Mapping with the ELC TLV but does not 561 understand it MUST propagate it intact to its neighbors and MUST NOT 562 send a notification to the sender (following the meaning of the U- 563 and F-bits). 565 An LSR X may receive multiple Label Mappings for a given FEC F from 566 its neighbors. In its turn, X may advertise a Label Mapping for F to 567 its neighbors. If X understands the ELC TLV, and if any of the 568 advertisements it received for FEC F does not include the ELC TLV, X 569 MUST NOT include the ELC TLV in its own advertisements of F. If all 570 the advertised Mappings for F include the ELC TLV, then X MUST 571 advertise its Mapping for F with the ELC TLV. 
If any of X's 572 neighbors resends its Mapping, sends a new Mapping or Withdraws a 573 previously advertised Mapping for F, X MUST re-evaluate the status of 574 ELC for FEC F, and, if there is a change, X MUST re-advertise its 575 Mapping for F with the updated status of ELC. 577 5.2. BGP Signaling 579 When BGP [RFC4271] is used for distributing Network Layer 580 Reachability Information (NLRI) as described in, for example, 581 [RFC3107], the BGP UPDATE message may include the ELC attribute as 582 part of the Path Attributes. This is an optional, transitive BGP 583 attribute of type (to be assigned by IANA). The inclusion of this 584 attribute with an NLRI indicates that the advertising BGP router can 585 process entropy labels as an egress LSR for all routes in that NLRI. 587 A BGP speaker S that originates an UPDATE should include the ELC 588 attribute only if both of the following are true: 590 A1: S sets the BGP NEXT_HOP attribute to itself; AND 592 A2: S can process entropy labels. 594 Suppose a BGP speaker T receives an UPDATE U with the ELC attribute. 595 T has two choices. T can simply re-advertise U with the ELC 596 attribute if either of the following is true: 598 B1: T does not change the NEXT_HOP attribute; OR 600 B2: T simply swaps labels without popping the entire label stack and 601 processing the payload below. 603 An example of the use of B1 is Route Reflectors. 605 However, if T changes the NEXT_HOP attribute for U and in the data 606 plane pops the entire label stack to process the payload, T MAY 607 include an ELC attribute for UPDATE U' if both of the following are 608 true: 610 C1: T sets the NEXT_HOP attribute of U' to itself; AND 612 C2: T can process entropy labels. 614 Otherwise, T MUST remove the ELC attribute. 616 5.3. RSVP-TE Signaling 618 Entropy Label support is signaled in RSVP-TE [RFC3209] using the 619 Entropy Label Capability (ELC) flag in the Attribute Flags TLV of the 620 LSP_ATTRIBUTES object [RFC5420]. 
The presence of the ELC flag in a
621 Path message indicates that the ingress can process entropy labels in
622 the upstream direction; this only makes sense for a bidirectional LSP
623 and MUST be ignored otherwise. The presence of the ELC flag in a
624 Resv message indicates that the egress can process entropy labels in
625 the downstream direction.

627 The bit number for the ELC flag is to be assigned by IANA.

629 5.4. Multicast LSPs and Entropy Labels

631 Multicast LSPs [RFC4875], [RFC6388] typically do not use ECMP for
632 load balancing, as the combination of replication and multipathing
633 can lead to duplicate traffic delivery. However, these LSPs can
634 traverse bundled links [RFC4201] and LAGs. In both these cases, load
635 balancing is useful, and hence entropy labels can be of value for
636 multicast LSPs.

638 The methodology defined for entropy labels here will be used for
639 multicast LSPs; however, the details of signaling and processing ELs
640 for multicast LSPs will be specified in a companion document.

642 6. Operations, Administration, and Maintenance (OAM) and Entropy Labels

644 Generally, OAM comprises a set of functions operating in the data
645 plane to allow a network operator to monitor its network
646 infrastructure and to implement mechanisms in order to enhance the
647 general behavior and the level of performance of its network, e.g.,
648 the efficient and automatic detection, localization, diagnosis and
649 handling of defects.

651 Currently defined OAM mechanisms for MPLS include LSP Ping/Traceroute
652 [RFC4379] and Bidirectional Forwarding Detection (BFD) for MPLS
653 [RFC5884]. The latter provides connectivity verification between the
654 endpoints of an LSP, and recommends establishing a separate BFD
655 session for every path between the endpoints.

657 The LSP traceroute procedures of [RFC4379] allow an ingress LSR to
658 obtain label ranges that can be used to send packets on every path to
659 the egress LSR.
It works by having the ingress LSR sequentially ask the
660 transit LSRs along a particular path to a given egress LSR to return
661 a label range such that the inclusion of a label in that range in a
662 packet will cause the replying transit LSR to send that packet out
663 the egress interface for that path. The ingress provides the label
664 range returned by transit LSR N to transit LSR N + 1, which returns a
665 label range which is less than or equal in span to the range provided
666 to it. This process iterates until the penultimate transit LSR
667 replies to the ingress LSR with a label range that is acceptable to
668 it and to all LSRs along the path preceding it for forwarding a packet
669 along the path.

671 However, the LSP traceroute procedures do not specify where in the
672 label stack the value from the label range is to be placed, whether
673 deep packet inspection is allowed and if so, which keys and key
674 values are to be used.

676 This memo updates LSP traceroute by specifying that the value from
677 the label range is to be placed in the entropy label. Deep packet
678 inspection is thus not necessary, although an LSR may use it,
679 provided it does so consistently, i.e., if the label range to go to a
680 given downstream LSR is computed with deep packet inspection, then
681 the data path should use the same approach and the same keys.

683 In order to have a BFD session on a given path, a value from the
684 label range for that path should be used as the EL value for BFD
685 packets sent on that path.

687 7. MPLS-TP and Entropy Labels

689 Since MPLS-TP does not use ECMP, entropy labels are not applicable to
690 an MPLS-TP deployment.

692 8. Entropy Labels in Various Scenarios

694 This section describes the use of entropy labels in various
695 scenarios.

697 In the figures below, the following conventions are used to depict
698 processing between X and Y.
Note that control plane signaling goes
699 right to left, whereas data plane processing goes left to right.

701     Protocols
702     Y: <--- [L, E]          Y signals L to X
703     X ------------- Y
704     LS:                     Label stack
705     X: +                    X pushes
706     Y: -                    Y pops

708 This means that Y signals to X label L for an LDP tunnel. E can be
709 one of:

711 0: meaning egress is NOT entropy label capable, or

713 1: meaning egress is entropy label capable.

715 The line with LS: shows the label stack on the wire. Below that is
716 the operation that each LSR does in the data plane, where + means
717 push the following label stack, - means pop the following label
718 stack, L~L' means swap L with L', and * means that the operation is
719 not depicted.

721 8.1. LDP Tunnel

723 The following illustrates several simple intra-AS LDP tunnels. The
724 first diagram shows ultimate hop popping (UHP) with ingress inserting
725 an EL, the second UHP with no ELs, the third PHP with ELs, and
726 finally, PHP with no ELs, but also with an application label AL
727 (which could, for example, be a VPN label).

729 Note that, in all the cases below, the MPLS application does not
730 matter; it may be that X pushes some more labels (perhaps for a VPN
731 or VPLS) below the ones shown, and Y pops them.

733     A: <--- [TL4, 1]
734     B: <-- [TL3, 1]
735     ...
736     W: <-- [TL1, 1]
737     Y: <-- [TL0, 1]
738     X --------------- A --------- B ... W ---------- Y
739     LS:
740     X: +
741     A: TL4~TL3
742     B: TL3~TL2
743     ...
744     W: TL1~TL0
745     Y: -

747 LDP with UHP; ingress inserts ELs

749     A: <--- [TL4, 1]
750     B: <-- [TL3, 1]
751     ...
752     W: <-- [TL1, 1]
753     Y: <-- [TL0, 1]
754     X --------------- A --------- B ... W ---------- Y
755     LS:
756     X: +
757     A: TL4~TL3
758     B: TL3~TL2
759     ...
760     W: TL1~TL0
761     Y: -

763 LDP with UHP; ingress does not insert ELs

765     A: <--- [TL4, 1]
766     B: <-- [TL3, 1]
767     ...
768     W: <-- [TL1, 1]
769     Y: <-- [3, 1]
770     X --------------- A --------- B ... W ---------- Y
771     X: +
772     A: TL4~TL3
773     B: TL3~TL2
774     ...
775     W: -TL1
776     Y: -

778 LDP with PHP; ingress inserts ELs

780     A: <--- [TL4, 1]
781     B: <-- [TL3, 1]
782     ...
783     W: <-- [TL1, 1]
784     Y: <-- [3, 1]
785     VPN: <------------------------------------------ [AL]
786     X --------------- A --------- B ... W ---------- Y
787     LS:
788     X: +
789     A: TL4~TL3
790     B: TL3~TL2
791     ...
792     W: -TL1
793     Y: -

794 LDP with PHP + VPN; ingress does not insert ELs

796     A: <--- [TL4, 1]
797     B: <-- [TL3, 1]
798     ...
799     W: <-- [TL1, 1]
800     Y: <-- [3, 1]
801     VPN: <--------------------------------------------- [AL]
802     X --------------- A ------------ B ... W ---------- Y
803     LS:
804     X: +
805     A: TL4~TL3
806     B: TL3~TL2
807     ...
808     W: -TL1
809     Y: -

811 LDP with PHP + VPN; ingress inserts ELs

813 8.2. LDP Over RSVP-TE

815 The following illustrates "LDP over RSVP-TE" tunnels. X and Y are
816 the ingress and egress (respectively) of the LDP tunnel; A and W are
817 the ingress and egress of the RSVP-TE tunnel. It is assumed that
818 both the LDP and RSVP-TE tunnels have PHP.

820     LDP with ELs, RSVP-TE without ELs
821     LDP: <--- [L4, 1] <------- [L3, 1] <--- [3, 1]
822     RSVP-TE: <-- [Rn, 0]
823     <-- [3, 0]
824     X --------------- A --------- B ... W ---------- Y
825     LS: ...
826     DP: + L4~ * -L1 -

828 Figure 2: LDP over RSVP-TE Tunnels

830 8.3. MPLS Applications

832 An ingress LSR X must keep state per unicast tunnel as to whether the
833 egress for that tunnel can process entropy labels. X does not have
834 to keep state per application running over that tunnel. However, an
835 ingress PE can choose on a per-application basis whether or not to
836 insert ELs. For example, X may have an application for which it does
837 not wish to use ECMP (e.g., circuit emulation), or for which it does
838 not know which keys to use for load balancing (e.g., AppleTalk over a
839 pseudowire). In either of those cases, X may choose not to insert
840 entropy labels, but may choose to insert entropy labels for an IP VPN
841 over the same tunnel.

843 9.
Security Considerations

845 This document describes advertisement of the capability to support
846 receipt of entropy labels which an ingress LSR may insert in MPLS
847 packets in order to allow transit LSRs to attain better load
848 balancing across LAG and/or ECMP paths in the network.

850 This document does not introduce new security vulnerabilities to LDP,
851 BGP or RSVP-TE. Please refer to the Security Considerations section
852 of these protocols ([RFC5036], [RFC4271] and [RFC3209]) for security
853 mechanisms applicable to each.

855 Given that there is no end-user control over the values used for
856 entropy labels, there is little risk of Entropy Label forgery which
857 could cause uneven load-balancing in the network.

859 If Entropy Label Capability is not signaled from an egress PE to an
860 ingress PE, due to, for example, malicious configuration activity on
861 the egress PE, then the ingress PE will fall back to not using entropy labels
862 for load-balancing traffic over LAG or ECMP paths, which is in general
863 no worse than the behavior observed in current production networks.
864 That said, it is recommended that operators monitor changes to PE
865 configurations and, more importantly, the fairness of load
866 distribution over LAG or ECMP paths. If the fairness of load
867 distribution over a set of paths changes, that could indicate a
868 misconfiguration, bug or other non-optimal behavior on their PEs, and
869 they should take corrective action.

871 10. IANA Considerations

873 10.1. Reserved Label for ELI

875 IANA is requested to allocate a reserved label for the Entropy Label
876 Indicator (ELI) from the "Multiprotocol Label Switching Architecture
877 (MPLS) Label Values" Registry.

879 10.2. LDP Entropy Label Capability TLV

881 IANA is requested to allocate the next available value from the IETF
882 Consensus range in the LDP TLV Type Name Space Registry as the
883 "Entropy Label Capability TLV".

885 10.3.
BGP Entropy Label Capability Attribute

887 IANA is requested to allocate the next available Path Attribute Type
888 Code from the "BGP Path Attributes" registry as the "BGP Entropy
889 Label Capability Attribute".

891 10.4. RSVP-TE Entropy Label Capability flag

893 IANA is requested to allocate a new bit from the "Attribute Flags"
894 sub-registry of the "RSVP TE Parameters" registry.

896    Bit | Name                     | Attribute  | Attribute  | RRO
897    No  |                          | Flags Path | Flags Resv |
898    ----+--------------------------+------------+------------+-----
899    TBD   Entropy Label Capability   Yes          Yes          No

901 11. Acknowledgments

903 We wish to thank Ulrich Drafz for his contributions, as well as the
904 entire 'hash label' team for their valuable comments and discussion.

906 Sincere thanks to Nischal Sheth for his many suggestions and
907 comments, and his careful reading of the document, especially with
908 regard to data plane processing of entropy labels.

910 12. References

912 12.1. Normative References

914 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
915           Requirement Levels", BCP 14, RFC 2119, March 1997.

917 [RFC3032] Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
918           Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
919           Encoding", RFC 3032, January 2001.

921 [RFC3107] Rekhter, Y. and E. Rosen, "Carrying Label Information in
922           BGP-4", RFC 3107, May 2001.

924 [RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
925           and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
926           Tunnels", RFC 3209, December 2001.

928 [RFC5036] Andersson, L., Minei, I., and B. Thomas, "LDP
929           Specification", RFC 5036, October 2007.

931 [RFC5420] Farrel, A., Papadimitriou, D., Vasseur, JP., and A.
932           Ayyangar, "Encoding of Attributes for MPLS LSP
933           Establishment Using Resource Reservation Protocol Traffic
934           Engineering (RSVP-TE)", RFC 5420, February 2009.

936 12.2. Informative References

938 [RFC4201] Kompella, K., Rekhter, Y., and L.
Berger, "Link Bundling
939           in MPLS Traffic Engineering (TE)", RFC 4201, October 2005.

941 [RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
942           Protocol 4 (BGP-4)", RFC 4271, January 2006.

944 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
945           Networks (VPNs)", RFC 4364, February 2006.

947 [RFC4379] Kompella, K. and G. Swallow, "Detecting Multi-Protocol
948           Label Switched (MPLS) Data Plane Failures", RFC 4379,
949           February 2006.

951 [RFC4447] Martini, L., Rosen, E., El-Aawar, N., Smith, T., and G.
952           Heron, "Pseudowire Setup and Maintenance Using the Label
953           Distribution Protocol (LDP)", RFC 4447, April 2006.

955 [RFC4761] Kompella, K. and Y. Rekhter, "Virtual Private LAN Service
956           (VPLS) Using BGP for Auto-Discovery and Signaling",
957           RFC 4761, January 2007.

959 [RFC4762] Lasserre, M. and V. Kompella, "Virtual Private LAN Service
960           (VPLS) Using Label Distribution Protocol (LDP) Signaling",
961           RFC 4762, January 2007.

963 [RFC4875] Aggarwal, R., Papadimitriou, D., and S. Yasukawa,
964           "Extensions to Resource Reservation Protocol - Traffic
965           Engineering (RSVP-TE) for Point-to-Multipoint TE Label
966           Switched Paths (LSPs)", RFC 4875, May 2007.

968 [RFC5884] Aggarwal, R., Kompella, K., Nadeau, T., and G. Swallow,
969           "Bidirectional Forwarding Detection (BFD) for MPLS Label
970           Switched Paths (LSPs)", RFC 5884, June 2010.

972 [RFC6388] Wijnands, IJ., Minei, I., Kompella, K., and B. Thomas,
973           "Label Distribution Protocol Extensions for Point-to-
974           Multipoint and Multipoint-to-Multipoint Label Switched
975           Paths", RFC 6388, November 2011.

977 [RFC6391] Bryant, S., Filsfils, C., Drafz, U., Kompella, V., Regan,
978           J., and S. Amante, "Flow-Aware Transport of Pseudowires
979           over an MPLS Packet Switched Network", RFC 6391,
980           November 2011.

982 Appendix A.
Applicability of LDP Entropy Label Capability TLV

984 In the case of unlabeled IPv4 (Internet) traffic, the Best Current
985 Practice is for an egress LSR to propagate eBGP-learned routes within
986 a SP's Autonomous System after resetting the BGP next-hop attribute
987 to one of its Loopback IP addresses. That Loopback IP address is
988 injected into the Service Provider's IGP and, concurrently, a label
989 is assigned to it via LDP. Thus, when an ingress LSR is performing a
990 forwarding lookup for a BGP destination, it recursively resolves the
991 associated next-hop to a Loopback IP address and associated LDP label
992 of the egress LSR.

994 Thus, in the context of unlabeled IPv4 traffic, the LDP Entropy Label
995 Capability TLV will typically be applied only to the FEC for the
996 Loopback IP address of the egress LSR, and the egress LSR need not
997 announce an entropy label capability for the eBGP-learned route.

999 Authors' Addresses

1001 Kireeti Kompella
1002 Juniper Networks
1003 1194 N. Mathilda Ave.
1004 Sunnyvale, CA 94089
1005 US

1007 Email: kireeti@juniper.net

1009 John Drake
1010 Juniper Networks
1011 1194 N. Mathilda Ave.
1012 Sunnyvale, CA 94089
1013 US

1015 Email: jdrake@juniper.net

1016 Shane Amante
1017 Level 3 Communications, LLC
1018 1025 Eldorado Blvd
1019 Broomfield, CO 80021
1020 US

1022 Email: shane@level3.net

1024 Wim Henderickx
1025 Alcatel-Lucent
1026 Copernicuslaan 50
1027 2018 Antwerp
1028 Belgium

1030 Email: wim.henderickx@alcatel-lucent.com

1032 Lucy Yong
1033 Huawei USA
1034 5340 Legacy Dr.
1035 Plano, TX 75024
1036 US

1038 Email: lucy.yong@huawei.com