idnits 2.17.1 draft-ietf-mpls-entropy-label-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year (Using the creation date from RFC3031, updated by this document, for RFC5378 checks: 1998-03-17) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (September 6, 2012) is 4243 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'L' is mentioned on line 723, but not defined == Missing Reference: 'E' is mentioned on line 723, but not defined == Missing Reference: 'TL4' is mentioned on line 818, but not defined -- Looks like a reference, but probably isn't: '1' on line 843 == Missing Reference: 'TL3' is mentioned on line 819, but not defined == Missing Reference: 'TL1' is mentioned on line 821, but not defined == Missing Reference: 'TL0' is mentioned on line 774, but not defined -- Looks like a reference, but probably isn't: '3' on line 845 == Missing Reference: 'AL' is mentioned on line 823, but not defined == Missing Reference: 'L4' is mentioned on line 843, but not defined == Missing Reference: 'L3' is mentioned on line 843, but not defined == Missing Reference: 'Rn' is mentioned on line 844, but not defined -- Looks like a reference, but probably isn't: '0' on line 845 ** Obsolete normative reference: RFC 3107 (Obsoleted by RFC 8277) -- Obsolete informational reference (is this intentional?): RFC 4379 (Obsoleted by RFC 8029) -- Obsolete informational reference (is this intentional?): RFC 4447 (Obsoleted by RFC 8077) Summary: 1 error (**), 0 flaws (~~), 11 warnings (==), 7 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group K. Kompella 3 Internet-Draft J. Drake 4 Updates: 3031, 3107, 3209, 5036 Juniper Networks 5 (if approved) S. Amante 6 Intended status: Standards Track Level 3 Communications, LLC 7 Expires: March 10, 2013 W. Henderickx 8 Alcatel-Lucent 9 L. 
Yong 10 Huawei USA 11 September 6, 2012 13 The Use of Entropy Labels in MPLS Forwarding 14 draft-ietf-mpls-entropy-label-06 16 Abstract 18 Load balancing is a powerful tool for engineering traffic across a 19 network. This memo suggests ways of improving load balancing across 20 MPLS networks using the concept of "entropy labels". It defines the 21 concept, describes why entropy labels are useful, enumerates 22 properties of entropy labels that allow maximal benefit, and shows 23 how they can be signaled and used for various applications. This 24 document updates RFCs 3031, 3107, 3209 and 5036. 26 Status of this Memo 28 This Internet-Draft is submitted in full conformance with the 29 provisions of BCP 78 and BCP 79. 31 Internet-Drafts are working documents of the Internet Engineering 32 Task Force (IETF). Note that other groups may also distribute 33 working documents as Internet-Drafts. The list of current Internet- 34 Drafts is at http://datatracker.ietf.org/drafts/current/. 36 Internet-Drafts are draft documents valid for a maximum of six months 37 and may be updated, replaced, or obsoleted by other documents at any 38 time. It is inappropriate to use Internet-Drafts as reference 39 material or to cite them other than as "work in progress." 41 This Internet-Draft will expire on March 10, 2013. 43 Copyright Notice 45 Copyright (c) 2012 IETF Trust and the persons identified as the 46 document authors. All rights reserved. 48 This document is subject to BCP 78 and the IETF Trust's Legal 49 Provisions Relating to IETF Documents 50 (http://trustee.ietf.org/license-info) in effect on the date of 51 publication of this document. Please review these documents 52 carefully, as they describe your rights and restrictions with respect 53 to this document. 
Code Components extracted from this document must 54 include Simplified BSD License text as described in Section 4.e of 55 the Trust Legal Provisions and are provided without warranty as 56 described in the Simplified BSD License. 58 Table of Contents 60 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 61 1.1. Conventions used . . . . . . . . . . . . . . . . . . . . . 4 62 1.2. Motivation . . . . . . . . . . . . . . . . . . . . . . . . 6 63 2. Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . 7 64 3. Entropy Labels and Their Structure . . . . . . . . . . . . . . 8 65 4. Data Plane Processing of Entropy Labels . . . . . . . . . . . 9 66 4.1. Egress LSR . . . . . . . . . . . . . . . . . . . . . . . . 9 67 4.2. Ingress LSR . . . . . . . . . . . . . . . . . . . . . . . 10 68 4.3. Transit LSR . . . . . . . . . . . . . . . . . . . . . . . 11 69 4.4. Penultimate Hop LSR . . . . . . . . . . . . . . . . . . . 12 70 5. Signaling for Entropy Labels . . . . . . . . . . . . . . . . . 12 71 5.1. LDP Signaling . . . . . . . . . . . . . . . . . . . . . . 12 72 5.1.1. Processing the ELC TLV . . . . . . . . . . . . . . . . 13 73 5.2. BGP Signaling . . . . . . . . . . . . . . . . . . . . . . 13 74 5.3. RSVP-TE Signaling . . . . . . . . . . . . . . . . . . . . 14 75 5.4. Multicast LSPs and Entropy Labels . . . . . . . . . . . . 14 76 6. Operations, Administration, and Maintenance (OAM) and 77 Entropy Labels . . . . . . . . . . . . . . . . . . . . . . . . 15 78 7. MPLS-TP and Entropy Labels . . . . . . . . . . . . . . . . . . 16 79 8. Entropy Labels in Various Scenarios . . . . . . . . . . . . . 16 80 8.1. LDP Tunnel . . . . . . . . . . . . . . . . . . . . . . . . 16 81 8.2. LDP Over RSVP-TE . . . . . . . . . . . . . . . . . . . . . 18 82 8.3. MPLS Applications . . . . . . . . . . . . . . . . . . . . 19 83 9. Security Considerations . . . . . . . . . . . . . . . . . . . 19 84 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 20 85 10.1. 
Reserved Label for ELI . . . . . . . . . . . . . . . . . . 20 86 10.2. LDP Entropy Label Capability TLV . . . . . . . . . . . . . 20 87 10.3. BGP Entropy Label Capability Attribute . . . . . . . . . . 20 88 10.4. RSVP-TE Entropy Label Capability flag . . . . . . . . . . 20 89 11. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 20 90 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 21 91 12.1. Normative References . . . . . . . . . . . . . . . . . . . 21 92 12.2. Informative References . . . . . . . . . . . . . . . . . . 21 93 Appendix A. Applicability of LDP Entropy Label Capability TLV . . 22 94 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 22 96 1. Introduction 98 Load balancing, or multi-pathing, is an attempt to balance traffic 99 across a network by allowing the traffic to use multiple paths. Load 100 balancing has several benefits: it eases capacity planning; it can 101 help absorb traffic surges by spreading them across multiple paths; 102 it allows better resilience by offering alternate paths in the event 103 of a link or node failure. 105 As providers scale their networks, they use several techniques to 106 achieve greater bandwidth between nodes. Two widely used techniques 107 are: Link Aggregation Group (LAG) and Equal-Cost Multi-Path (ECMP). 108 LAG is used to bond together several physical circuits between two 109 adjacent nodes so they appear to higher-layer protocols as a single, 110 higher bandwidth 'virtual' pipe. ECMP is used between two nodes 111 separated by one or more hops, to allow load balancing over several 112 shortest paths in the network. This is typically obtained by 113 arranging IGP metrics such that there are several equal cost paths 114 between source-destination pairs. Both of these techniques may, and 115 often do, co-exist in various parts of a given provider's network, 116 depending on various choices made by the provider. 
118 A very important requirement when load balancing is that packets 119 belonging to a given 'flow' must be mapped to the same path, i.e., 120 the same exact sequence of links across the network. This is to 121 avoid jitter, latency and re-ordering issues for the flow. What 122 constitutes a flow varies considerably. A common example of a flow 123 is a TCP session. Other examples are an L2TP session corresponding 124 to a given broadband user, or traffic within an ATM virtual circuit. 126 To meet this requirement, a node uses certain fields, termed 'keys', 127 within a packet's header as input to a load balancing function 128 (typically a hash function) that selects the path for all packets in 129 a given flow. The keys chosen for the load balancing function depend 130 on the packet type; a typical set (for IP packets) is the IP source 131 and destination addresses, the protocol type, and (for TCP and UDP 132 traffic) the source and destination port numbers. An overly 133 conservative choice of fields may lead to many flows mapping to the 134 same hash value (and consequently poorer load balancing); an overly 135 aggressive choice may map a flow to multiple values, potentially 136 violating the above requirement. 138 For MPLS networks, most of the same principles (and benefits) apply. 139 However, finding useful keys in a packet for the purpose of load 140 balancing can be more of a challenge. In many cases, MPLS 141 encapsulation may require fairly deep inspection of packets to find 142 these keys at transit Label Switching Routers (LSRs). 144 One way to eliminate the need for this deep inspection is to have the 145 ingress LSR of an MPLS Label Switched Path extract the appropriate 146 keys from a given packet, input them to its load balancing function, 147 and place the result in an additional label, termed the 'entropy 148 label', as part of the MPLS label stack it pushes onto that packet. 
150 The packet's entire MPLS label stack can then be used by transit LSRs 151 to perform load balancing, as the entropy label introduces the right 152 level of "entropy" into the label stack. 154 There are five key reasons why this is beneficial: 156 1. at the ingress LSR, MPLS encapsulation hasn't yet occurred, so 157 deep inspection is not necessary; 159 2. the ingress LSR has more context and information about incoming 160 packets than transit LSRs; 162 3. ingress LSRs usually operate at lower bandwidths than transit 163 LSRs, allowing them to do more work per packet; 165 4. transit LSRs do not need to perform deep packet inspection and 166 can load balance effectively using only a packet's MPLS label 167 stack; and 169 5. transit LSRs, not having the full context that an ingress LSR 170 does, have the hard choice between potentially misinterpreting 171 fields in a packet as valid keys for load balancing (causing 172 packet ordering problems) or adopting a conservative approach 173 (giving rise to sub-optimal load balancing). Entropy labels 174 relieve them of making this choice. 176 This memo describes why entropy labels are needed and defines the 177 properties of entropy labels; in particular how they are generated 178 and received, and the expected behavior of transit LSRs. Finally, it 179 describes in general how signaling works and what needs to be 180 signaled, as well as specifics for the signaling of entropy labels 181 for LDP ([RFC5036]), BGP ([RFC3107]), and RSVP-TE ([RFC3209]). 183 1.1. Conventions used 185 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 186 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 187 document are to be interpreted as described in [RFC2119]. 
189 The following acronyms are used: 191 BoS: Bottom of Stack 193 CE: Customer Edge device 195 ECMP: Equal Cost Multi-Path 197 EL: Entropy Label 199 ELC: Entropy Label Capability 201 ELI: Entropy Label Indicator 203 FEC: Forwarding Equivalence Class 205 LAG: Link Aggregation Group 207 LER: Label Edge Router 209 LSP: Label Switched Path 211 LSR: Label Switching Router 213 PE: Provider Edge Router 215 PW: Pseudowire 217 PHP: Penultimate Hop Popping 219 TC: Traffic Class 221 TTL: Time-to-Live 223 UHP: Ultimate Hop Popping 225 VPLS: Virtual Private LAN (Local Area Network) Service 227 VPN: Virtual Private Network 229 The term ingress (or egress) LSR is used interchangeably with ingress 230 (or egress) LER. The term application throughout the text refers to 231 an MPLS application (such as a VPN or VPLS). 233 A label stack (say of three labels) is denoted by <L1, L2, L3>, where 234 L1 is the "outermost" label and L3 the innermost (closest to the 235 payload). Packet flows are depicted left to right, and signaling is 236 shown right to left (unless otherwise indicated). 238 The term 'label' is used both for the entire 32-bit label stack entry 239 and the 20-bit label field within a label stack entry. It should be 240 clear from the context which is meant. 242 1.2. Motivation 244 MPLS is a very successful generic forwarding substrate that 245 transports several dozen types of protocols, most notably: IP, PWs, 246 VPLS and IP VPNs. Within each type of protocol, there typically 247 exist several variants, each with a different set of load balancing 248 keys, e.g., for IP: IPv4, IPv6, IPv6 in IPv4, etc.; for PWs: 249 Ethernet, ATM, Frame-Relay, etc. There are also several different 250 types of Ethernet over PW encapsulation, ATM over PW encapsulation, 251 etc. Finally, given the popularity of MPLS, it is likely 252 that it will continue to be extended to transport new protocols. 
254 Currently, each transit LSR along the path of a given LSP has to try 255 to infer the underlying protocol within an MPLS packet in order to 256 extract appropriate keys for load balancing. Unfortunately, if the 257 transit LSR is unable to infer the MPLS packet's protocol (as is 258 often the case), it will typically use the topmost (or all) MPLS 259 labels in the label stack as keys for the load balancing function. 260 The result may be an extremely inequitable distribution of traffic 261 across equal-cost paths exiting that LSR. This is because MPLS 262 labels are generally fairly coarse-grained forwarding labels that 263 typically describe a next-hop, or provide some form of demultiplexing 264 and/or forwarding function, and do not describe the packet's 265 underlying protocol. 267 On the other hand, an ingress LSR (e.g., a PE router) has detailed 268 knowledge of a packet's contents, typically through a priori 269 configuration of the encapsulation(s) that are expected at a given 270 PE-CE interface (e.g., IPv4, IPv6, VPLS, etc.). Ingress LSRs also have more 271 flexible forwarding hardware. PE routers need this information and 272 these capabilities to: 274 a) apply the required services for the CE; 276 b) discern the packet's CoS forwarding treatment; 278 c) apply filters to forward or block traffic to/from the CE; 280 d) forward routing/control traffic to an onboard management 281 processor; and, 283 e) load-balance the traffic on their uplinks to transit LSRs (e.g., 284 P routers). 286 By knowing the expected encapsulation types, an ingress LSR 287 can apply a more specific set of payload parsing routines to extract 288 the keys appropriate for a given protocol. This allows for 289 significantly improved accuracy in determining the appropriate load 290 balancing behavior for each protocol. 
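As an illustration only, the per-encapsulation key selection at an ingress LSR might be sketched as below. Everything named here is an assumption made for the example: the KEY_NAMES table, the field names, and the use of CRC-32 as the load balancing function are hypothetical, since this memo does not mandate any particular hash or key set. The sketch also maps the hash output into the 20-bit label space while avoiding the reserved values 0-15, as Section 3 requires of entropy labels.

```python
import zlib

RESERVED_MAX = 15        # label values 0-15 are reserved (Section 3)
LABEL_SPACE = 1 << 20    # the label field is 20 bits wide

# Hypothetical per-encapsulation key table; a real PE would derive this
# from the encapsulation configured on each PE-CE interface.
KEY_NAMES = {
    "ipv4": ("src", "dst", "proto", "sport", "dport"),
    "ipv6": ("src", "dst", "proto", "sport", "dport"),
    "ethernet": ("src_mac", "dst_mac"),
}


def flow_keys(encap, fields):
    """Select the load balancing keys for the configured encapsulation."""
    # Missing fields (e.g. ports on a non-TCP/UDP packet) become None.
    return tuple(fields.get(name) for name in KEY_NAMES[encap])


def entropy_label(encap, fields):
    """Map a flow's keys to a 20-bit label value outside the reserved range."""
    # CRC-32 is an assumed stand-in for the ingress load balancing function.
    digest = zlib.crc32(repr(flow_keys(encap, fields)).encode())
    usable = LABEL_SPACE - (RESERVED_MAX + 1)
    return RESERVED_MAX + 1 + digest % usable
```

Because the value depends only on the selected keys, every packet of a given flow receives the same value, which is what keeps the flow on a single path.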
292 If the ingress LSR were to capture the flow information so gathered 293 in a convenient form for downstream transit LSRs, transit LSRs could 294 remain completely oblivious to the contents of each MPLS packet, and 295 use only the captured flow information to perform load balancing. In 296 particular, there will be no reason to duplicate an ingress LSR's 297 complex packet/payload parsing functionality in a transit LSR. This 298 will result in less complex transit LSRs, enabling them to more 299 easily scale to higher forwarding rates, larger port density, lower 300 power consumption, etc. The idea in this memo is to capture this 301 flow information as a label, the so-called entropy label. 303 Ingress LSRs can also adapt more readily to new protocols and extract 304 the appropriate keys to use for load balancing packets of those 305 protocols. This means that deploying new protocols or services in 306 edge devices requires fewer concomitant changes in the core, 307 resulting in higher edge service velocity and at the same time more 308 stable core networks. 310 2. Approaches 312 There are two main approaches to encoding load balancing information 313 in the label stack. The first allocates multiple labels for a 314 particular Forwarding Equivalence Class (FEC). These labels are 315 equivalent in terms of forwarding semantics, but having multiple 316 labels allows flexibility in assigning labels to flows belonging to 317 the same FEC. This approach has the advantage that the label stack 318 has the same depth whether or not one uses label-based load 319 balancing; and so, consequently, there is no change to forwarding 320 operations on transit and egress LSRs. However, it has a major 321 drawback in that there is a significant increase in both signaling 322 and forwarding state. 324 The other approach encodes the load balancing information as an 325 additional label in the label stack, thus increasing the depth of the 326 label stack by one. 
With this approach, there is minimal change to 327 signaling state for a FEC; also, there is no change in forwarding 328 operations in transit LSRs, and no increase of forwarding state in 329 any LSR. The only purpose of the additional label is to increase the 330 entropy in the label stack, so this is called an "entropy label". 331 This memo focuses solely on this approach. 333 This latter approach uses upstream generated entropy labels, which 334 may conflict with downstream allocated application labels. There are 335 a few approaches to deal with this: 1) allocate a pair of labels for 336 each FEC, one that must have an entropy label below it, and one that 337 must not; 2) use a label (the "Entropy Label Indicator") to indicate 338 that the next label is an entropy label; and 3) allow entropy labels 339 only where there is no possible confusion. The first doubles control 340 and data plane state in the network; the last is too restrictive. 341 The approach taken here is the second. In making both the above 342 choices, the trade-off is to increase label stack depth rather than 343 control and data plane state in the network. 345 Finally, one may choose to associate ELs with MPLS tunnels (LSPs), or 346 with MPLS applications (e.g., VPNs). (What this entails is described 347 in later sections.) We take the former approach, for the following 348 reasons: 350 1. There are a small number of tunneling protocols for MPLS, but a 351 large and growing number of applications. Defining ELs on a 352 tunnel basis means simpler standards, lower development, 353 interoperability and testing efforts. 355 2. As a consequence, there will be much less churn in the network as 356 new applications (services) are defined and deployed. 358 3. Processing application labels in the data plane is more complex 359 than processing tunnel labels. Thus, it is preferable to burden 360 the latter rather than the former with EL processing. 362 4. 
Associating ELs with tunnels makes it simpler to deal with 363 hierarchy, be it LDP-over-RSVP-TE or Carrier's Carrier VPNs. 364 Each layer in the hierarchy can choose independently whether or 365 not they want ELs. 367 The cost of this approach is that ELIs will be mandatory; again, the 368 trade-off is the size of the label stack. To summarize, the net 369 increase in the label stack to use entropy labels is two: one 370 reserved label for the ELI, and the entropy label itself. 372 3. Entropy Labels and Their Structure 374 An entropy label (as used here) is a label: 376 1. that is not used for forwarding; 378 2. that is not signaled; and 379 3. whose only purpose in the label stack is to provide 'entropy' to 380 improve load balancing. 382 Entropy labels are generated by an ingress LSR, based entirely on 383 load balancing information. However, they MUST NOT have values in 384 the reserved label space (0-15) [IANA MPLS Label Values]. 386 Since entropy labels are generated by an ingress LSR, an egress LSR 387 MUST be able to distinguish unambiguously between entropy labels and 388 application labels. To accomplish this, it is REQUIRED that the 389 label immediately preceding an entropy label (EL) in the MPLS label 390 stack be an 'entropy label indicator' (ELI), where preceding means 391 closer to the top of the label stack (farther from bottom of stack 392 indication). The ELI is a reserved label with value (TBD by IANA). 393 How to set values of the TTL, TC and 'Bottom of Stack' (BoS) fields 394 ([RFC3032]) for the ELI and for ELs is discussed in Section 4.2. 396 Entropy labels are useful for pseudowires ([RFC4447]). [RFC6391] 397 explains how entropy labels can be used for RFC 4447-style 398 pseudowires, and thus is complementary to this memo, which focuses on 399 how entropy labels can be used for tunnels, and thus for all other 400 MPLS applications. 402 4. Data Plane Processing of Entropy Labels 404 4.1. 
Egress LSR 406 Suppose egress LSR Y is capable of processing entropy labels for a 407 tunnel. Y indicates this to all ingresses via signaling (see 408 Section 5). Y MUST be prepared to deal both with packets with an 409 imposed EL and those without; the ELI will distinguish these cases. 410 If a particular ingress chooses not to impose an EL, Y's processing 411 of the received label stack (which might be empty) is as if Y chose 412 not to accept ELs. 414 If an ingress X chooses to impose an EL, then Y will receive a tunnel 415 termination packet with label stack <TL, ELI, EL> <remaining packet 416 header>. Y recognizes TL as the label it distributed to its 417 upstreams for the tunnel, and pops it. (Note that TL may be the 418 implicit null label, in which case it doesn't appear in the label 419 stack.) Y then recognizes the ELI and pops two labels: the ELI and 420 the EL. Y then processes the remaining packet header as normal; this 421 may require further processing of tunnel termination, perhaps with 422 further ELI+EL pairs. When processing the final tunnel termination, 423 Y MAY enqueue the packet based on that tunnel TL's or ELI's TC value, 424 and MAY use the tunnel TL's or ELI's TTL to compute the TTL of the 425 remaining packet header. The EL's TTL MUST be ignored. 427 If any ELI processed by Y has its BoS bit set, Y MUST discard the packet, 428 and MAY log an error. The EL's BoS bit will indicate whether or not 429 there are more labels in the stack. 431 4.2. Ingress LSR 433 If an egress LSR Y indicates via signaling that it can process ELs on 434 a particular tunnel, an ingress LSR X can choose whether or not to 435 insert ELs for packets going into that tunnel. Y MUST handle both 436 cases. 438 The steps that X performs to insert ELs are as follows: 440 1. 
On an incoming packet, identify the application to which the 441 packet belongs; based on this, pick appropriate fields as input 442 to the load balancing function; apply the load balancing function 443 to these input fields, and let LB be the output. 445 2. Determine the application label AL (if any). Push <AL> onto the 446 packet. 448 3. Based on the application, the load balancing output LB and other 449 factors, determine the egress LSR Y, the tunnel to Y, the 450 specific interface to the next hop, and thus the tunnel label TL. 451 Use LB to generate the entropy label EL. 453 4. If, for the chosen tunnel, Y has not indicated that it can 454 process ELs, push <TL> onto the packet. If Y has indicated that 455 it can process ELs for the tunnel, push <TL, ELI, EL> onto the 456 packet. X SHOULD put the same TTL and TC fields for the ELI as 457 it does for TL. X MAY choose different values for the TTL and TC 458 fields if it is known that the ELI will not be exposed as the top 459 label at any point along the LSP (as may happen in cases where 460 PHP is used and the ELI and EL are not stripped at the 461 penultimate hop; see Section 4.4). The BoS bit for the ELI MUST 462 be zero. The TTL for the EL MUST be zero to ensure that it is 463 not used inadvertently for forwarding. The TC for the EL may be 464 any value. The BoS bit for the EL depends on whether or not 465 there are more labels in the label stack. 467 5. X then determines whether further tunnel hierarchy is needed; if 468 so, X goes back to step 3, possibly with a new egress Y for the 469 new tunnel. Otherwise, X is done, and sends out the packet. 471 Notes: 473 a. X computes load balancing information and generates the EL based 474 on the incoming application packet, even though the signaling of 475 EL capability is associated with tunnels. 477 b. 
X MAY insert several entropy labels in the stack (each, of 478 course, preceded by an ELI), potentially one for each 479 hierarchical tunnel, provided that the egress for that tunnel has 480 indicated that it can process ELs for that tunnel. 482 c. X MUST NOT include an entropy label for a given tunnel unless the 483 egress LSR Y has indicated that it can process entropy labels for 484 that tunnel. 486 d. The signaling and use of entropy labels in one direction 487 (signaling from Y to X, and data path from X to Y) is completely 488 independent of the signaling and use of entropy labels in the 489 reverse direction (signaling from X to Y, and data path from Y to 490 X). 492 4.3. Transit LSR 494 Transit LSRs MAY operate with no change in forwarding behavior. The 495 following are suggestions for optimizations that improve load 496 balancing, reduce the amount of packet data processed, and/or enhance 497 backward compatibility. 499 If a transit LSR recognizes the ELI, it MAY choose to load balance 500 solely on the following label (the EL); otherwise, it SHOULD use as 501 much of the whole label stack as feasible as keys for the load 502 balancing function. In any case, reserved labels MUST NOT be used as 503 keys for the load balancing function. 505 Some transit LSRs look beyond the label stack for better load 506 balancing information. This is a simple, backward compatible 507 approach in networks where some ingress LSRs impose ELs and others 508 don't. However, this is of limited incremental value if an EL is 509 indeed present, and requires more packet processing from the LSR. A 510 transit LSR MAY choose to parse the label stack for the presence of 511 the ELI, and look beyond the label stack only if it does not find it, 512 thus retaining the old behavior when needed, yet avoiding unnecessary 513 work if not needed. 
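The transit LSR options described above can be sketched roughly as follows. This is a non-normative illustration under stated assumptions: the function names are hypothetical, the ELI value is passed in as a parameter because this memo leaves it to be assigned by IANA, CRC-32 stands in for an unspecified load balancing function, and parse_payload is an assumed callback representing whatever "look beyond the label stack" logic an implementation already has.

```python
import zlib

RESERVED_MAX = 15  # reserved labels (0-15) are never used as keys


def load_balancing_keys(labels, eli, parse_payload=None):
    """Choose hash keys from a label stack (20-bit values, top label first).

    eli           -- the ELI label value (a parameter; TBD in this memo)
    parse_payload -- optional callable returning payload keys; it is only
                     invoked when no ELI is present in the stack
    """
    # An ELI followed by at least one more label: that label is the EL,
    # and the LSR may load balance on it alone.
    for i in range(len(labels) - 1):
        if labels[i] == eli:
            return (labels[i + 1],)
    # No ELI found: use the label stack, skipping reserved labels.
    keys = tuple(label for label in labels if label > RESERVED_MAX)
    # Retain the old behavior (looking beyond the stack) only when needed.
    if parse_payload is not None:
        keys += tuple(parse_payload())
    return keys


def pick_path(keys, n_paths):
    """Map the chosen keys to one of n_paths equal-cost next hops."""
    return zlib.crc32(repr(keys).encode()) % n_paths
```

Passing the payload parser as a callable reflects the point made above: when an EL is present, the extra packet processing is skipped entirely.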
515 As stated in Section 4.1 and Section 5, an egress LSR that signals 516 both ELC and implicit null MUST pop the ELI and the next label if it 517 encounters a packet with the ELI as the topmost label. Any other LSR 518 (including PHP LSRs) MUST drop such packets, as per section 3.18 of 519 [RFC3031]. 521 4.4. Penultimate Hop LSR 523 No change is needed at penultimate hop LSRs. However, a PHP LSR that 524 recognizes the ELI MAY choose to pop the ELI and following label 525 (which should be an entropy label) in addition to popping the tunnel 526 label, provided that doing so doesn't diminish its ability to load 527 balance on the next hop. 529 5. Signaling for Entropy Labels 531 An egress LSR Y can signal to ingress LSR(s) its ability to process 532 entropy labels (henceforth called "Entropy Label Capability" or ELC) 533 on a given tunnel. In particular, even if Y signals an implicit null 534 label, indicating that PHP is to be performed, Y MUST be prepared to 535 pop the ELI and EL. 537 Note that Entropy Label Capability may be asymmetric: if LSRs X and Y 538 are at opposite ends of a tunnel, X may be able to process entropy 539 labels, whereas Y may not. The signaling extensions below allow for 540 this asymmetry. 542 For an illustration of signaling and forwarding with entropy labels, 543 see Section 8. 545 5.1. LDP Signaling 547 A new LDP TLV ([RFC5036]) is defined to signal an egress's ability to 548 process entropy labels. This is called the ELC TLV, and may appear 549 as an Optional Parameter of the Label Mapping Message TLV. 551 The presence of the ELC TLV in a Label Mapping Message indicates to 552 ingress LSRs that the egress LSR can process entropy labels for the 553 associated LDP tunnel. The ELC TLV has Type (TBD by IANA) and Length 554 0. 556 The structure of the ELC TLV is shown below. 
558  0                   1                   2                   3
559  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
560 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
561 |U|F|        Type (TBD)         |          Length (0)           |
562 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
564 Figure 1: Entropy Label Capability TLV 566 where: 568 U: Unknown bit. This bit MUST be set to 1. If the ELC TLV is not 569 understood by the receiver, then it MUST be ignored. 571 F: Forward bit. This bit MUST be set to 1. Since the ELC 572 TLV is going to be propagated hop-by-hop, it should be forwarded 573 even by nodes that may not understand it. 575 Type: Type field. To be assigned by IANA. 577 Length: Length field. This field specifies the total length in 578 octets of the ELC TLV, and is currently defined to be 0. 580 5.1.1. Processing the ELC TLV 582 An LSR that receives a Label Mapping with the ELC TLV but does not 583 understand it MUST propagate it intact to its neighbors and MUST NOT 584 send a notification to the sender (following the meaning of the U- 585 and F-bits). 587 An LSR X may receive multiple Label Mappings for a given FEC F from 588 its neighbors. In its turn, X may advertise a Label Mapping for F to 589 its neighbors. If X understands the ELC TLV, and if any of the 590 advertisements it received for FEC F does not include the ELC TLV, X 591 MUST NOT include the ELC TLV in its own advertisements of F. If all 592 the advertised Mappings for F include the ELC TLV, then X MUST 593 advertise its Mapping for F with the ELC TLV. If any of X's 594 neighbors resends its Mapping, sends a new Mapping or Withdraws a 595 previously advertised Mapping for F, X MUST re-evaluate the status of 596 ELC for FEC F, and, if there is a change, X MUST re-advertise its 597 Mapping for F with the updated status of ELC. 599 5.2. 
BGP Signaling 601 When BGP [RFC4271] is used for distributing Network Layer 602 Reachability Information (NLRI) as described in, for example, 603 [RFC3107], the BGP UPDATE message may include the ELC attribute as 604 part of the Path Attributes. This is an optional, transitive BGP 605 attribute of type (to be assigned by IANA). The inclusion of this 606 attribute with an NLRI indicates that the advertising BGP router can 607 process entropy labels as an egress LSR for all routes in that NLRI. 609 A BGP speaker S that originates an UPDATE should include the ELC 610 attribute only if both of the following are true: 612 A1: S sets the BGP NEXT_HOP attribute to itself; AND 613 A2: S can process entropy labels. 615 Suppose a BGP speaker T receives an UPDATE U with the ELC attribute. 616 T has two choices. T can simply re-advertise U with the ELC 617 attribute if either of the following is true: 619 B1: T does not change the NEXT_HOP attribute; OR 621 B2: T simply swaps labels without popping the entire label stack and 622 processing the payload below. 624 An example of the use of B1 is Route Reflectors. 626 However, if T changes the NEXT_HOP attribute for U and in the data 627 plane pops the entire label stack to process the payload, T MAY 628 include an ELC attribute for UPDATE U' if both of the following are 629 true: 631 C1: T sets the NEXT_HOP attribute of U' to itself; AND 633 C2: T can process entropy labels. 635 Otherwise, T MUST remove the ELC attribute. 637 5.3. RSVP-TE Signaling 639 Entropy Label support is signaled in RSVP-TE [RFC3209] using the 640 Entropy Label Capability (ELC) flag in the Attribute Flags TLV of the 641 LSP_ATTRIBUTES object [RFC5420]. The presence of the ELC flag in a 642 Path message indicates that the ingress can process entropy labels in 643 the upstream direction; this only makes sense for a bidirectional LSP 644 and MUST be ignored otherwise. 
The presence of the ELC flag in a 645 Resv message indicates that the egress can process entropy labels in 646 the downstream direction. 648 The bit number for the ELC flag is to be assigned by IANA. 650 5.4. Multicast LSPs and Entropy Labels 652 Multicast LSPs [RFC4875], [RFC6388] typically do not use ECMP for 653 load balancing, as the combination of replication and multipathing 654 can lead to duplicate traffic delivery. However, these LSPs can 655 traverse bundled links [RFC4201] and LAGs. In both these cases, load 656 balancing is useful, and hence entropy labels can be of value for 657 multicast LSPs. 659 The methodology defined for entropy labels here will be used for 660 multicast LSPs; however, the details of signaling and processing ELs 661 for multicast LSPs will be specified in a companion document. 663 6. Operations, Administration, and Maintenance (OAM) and Entropy Labels 665 Generally OAM comprises a set of functions operating in the data 666 plane to allow a network operator to monitor its network 667 infrastructure and to implement mechanisms in order to enhance the 668 general behavior and the level of performance of its network, e.g., 669 the efficient and automatic detection, localization, diagnosis and 670 handling of defects. 672 Currently defined OAM mechanisms for MPLS include LSP Ping/Traceroute 673 [RFC4379] and Bidirectional Failure Detection (BFD) for MPLS 674 [RFC5884]. The latter provides connectivity verification between the 675 endpoints of an LSP, and recommends establishing a separate BFD 676 session for every path between the endpoints. 678 The LSP traceroute procedures of [RFC4379] allow an ingress LSR to 679 obtain label ranges that can be used to send packets on every path to 680 the egress LSR. 
It works by having ingress LSR sequentially ask the 681 transit LSRs along a particular path to a given egress LSR to return 682 a label range such that the inclusion of a label in that range in a 683 packet will cause the replying transit LSR to send that packet out 684 the egress interface for that path. The ingress provides the label 685 range returned by transit LSR N to transit LSR N + 1, which returns a 686 label range which is less than or equal in span to the range provided 687 to it. This process iterates until the penultimate transit LSR 688 replies to the ingress LSR with a label range that is acceptable to 689 it and to all LSRs along path preceding it for forwarding a packet 690 along the path. 692 However, the LSP traceroute procedures do not specify where in the 693 label stack the value from the label range is to be placed, whether 694 deep packet inspection is allowed and if so, which keys and key 695 values are to be used. 697 This memo updates LSP traceroute by specifying that the value from 698 the label range is to be placed in the entropy label. Deep packet 699 inspection is thus not necessary, although an LSR may use it, 700 provided it do so consistently, i.e., if the label range to go to a 701 given downstream LSR is computed with deep packet inspection, then 702 the data path should use the same approach and the same keys. 704 In order to have a BFD session on a given path, a value from the 705 label range for that path should be used as the EL value for BFD 706 packets sent on that path. 708 7. MPLS-TP and Entropy Labels 710 Since MPLS-TP does not use ECMP, entropy labels are not applicable to 711 an MPLS-TP deployment. 713 8. Entropy Labels in Various Scenarios 715 This section describes the use of entropy labels in various 716 scenarios. 718 In the figures below, the following conventions used to depict 719 processing between X and Y. 
Note that control plane signaling goes 720 right to left, whereas data plane processing goes left to right. 722 Protocols 723 Y: <--- [L, E] Y signals L to X 724 X ------------- Y 725 LS: Label stack 726 X: + X pushes 727 Y: - Y pops 729 This means that Y signals to X label L for an LDP tunnel. E can be 730 one of: 732 0: meaning egress is NOT entropy label capable, or 734 1: meaning egress is entropy label capable. 736 The line with LS: shows the label stack on the wire. Below that is 737 the operation that each LSR does in the data plane, where + means 738 push the following label stack, - means pop the following label 739 stack, L~L' means swap L with L', and * means that the operation is 740 not depicted. 742 8.1. LDP Tunnel 744 The following illustrates several simple intra-AS LDP tunnels. The 745 first diagram shows ultimate hop popping (UHP) with ingress inserting 746 an EL, the second UHP with no ELs, the third PHP with ELs, and 747 finally, PHP with no ELs, but also with an application label AL 748 (which could, for example, be a VPN label). 750 Note that, in all the cases below, the MPLS application does not 751 matter; it may be that X pushes some more labels (perhaps for a VPN 752 or VPLS) below the ones shown, and Y pops them. 754 A: <--- [TL4, 1] 755 B: <-- [TL3, 1] 756 ... 757 W: <-- [TL1, 1] 758 Y: <-- [TL0, 1] 759 X --------------- A --------- B ... W ---------- Y 760 LS: 761 X: + 762 A: TL4~TL3 763 B: TL3~TL2 764 ... 765 W: TL1~TL0 766 Y: - 768 LDP with UHP; ingress inserts ELs 770 A: <--- [TL4, 1] 771 B: <-- [TL3, 1] 772 ... 773 W: <-- [TL1, 1] 774 Y: <-- [TL0, 1] 775 X --------------- A --------- B ... W ---------- Y 776 LS: 777 X: + 778 A: TL4~TL3 779 B: TL3~TL2 780 ... 781 W: TL1~TL0 782 Y: - 784 LDP with UHP; ingress does not insert ELs 786 A: <--- [TL4, 1] 787 B: <-- [TL3, 1] 788 ... 789 W: <-- [TL1, 1] 790 Y: <-- [3, 1] 791 X --------------- A --------- B ... W ---------- Y 792 X: + 793 A: TL4~TL3 794 B: TL3~TL2 795 ... 
796      W:                                  -TL1
797      Y:                                               -

799                    LDP with PHP; ingress inserts ELs

801      A:  <--- [TL4, 1]
802      B:                 <-- [TL3, 1]
803                                      ...
804      W:                       <-- [TL1, 1]
805      Y:                                   <-- [3, 1]
806      VPN: <------------------------------------------ [AL]
807      X --------------- A --------- B ... W ---------- Y
808      LS:
809      X: +
810      A:                TL4~TL3
811      B:                            TL3~TL2
812                                      ...
813      W:                                  -TL1
814      Y:                                               -

816             LDP with PHP + VPN; ingress does not insert ELs

818      A:  <--- [TL4, 1]
819      B:                 <-- [TL3, 1]
820                                         ...
821      W:                          <-- [TL1, 1]
822      Y:                                      <-- [3, 1]
823      VPN: <--------------------------------------------- [AL]
824      X --------------- A ------------ B ... W ---------- Y
825      LS:
826      X: +
827      A:                TL4~TL3
828      B:                               TL3~TL2
829                                         ...
830      W:                                     -TL1
831      Y:                                                  -

833                 LDP with PHP + VPN; ingress inserts ELs

835   8.2.  LDP Over RSVP-TE

837      The following illustrates "LDP over RSVP-TE" tunnels.  X and Y are
838      the ingress and egress (respectively) of the LDP tunnel; A and W are
839      the ingress and egress of the RSVP-TE tunnel.  It is assumed that
840      both the LDP and RSVP-TE tunnels have PHP.

842                    LDP with ELs, RSVP-TE without ELs
843      LDP:     <--- [L4, 1] <------- [L3, 1] <--- [3, 1]
844      RSVP-TE:              <-- [Rn, 0]
845                                  <-- [3, 0]
846      X --------------- A --------- B ... W ---------- Y
847      LS:                             ...
848      DP: +             L4~         *     -L1          -

850                    Figure 2: LDP over RSVP-TE Tunnels

852   8.3.  MPLS Applications

854      An ingress LSR X must keep state per unicast tunnel as to whether the
855      egress for that tunnel can process entropy labels.  X does not have
856      to keep state per application running over that tunnel.  However, an
857      ingress PE can choose on a per-application basis whether or not to
858      insert ELs.  For example, X may have an application for which it does
859      not wish to use ECMP (e.g., circuit emulation), or for which it does
860      not know which keys to use for load balancing (e.g., AppleTalk over a
861      pseudowire).  In either of those cases, X may choose not to insert
862      entropy labels, but may choose to insert entropy labels for an IP VPN
863      over the same tunnel.

865   9.  Security Considerations

867      This document describes advertisement of the capability to support
868      receipt of entropy labels which an ingress LSR may insert in MPLS
869      packets in order to allow transit LSRs to attain better load
870      balancing across LAG and/or ECMP paths in the network.

872      This document does not introduce new security vulnerabilities to LDP,
873      BGP or RSVP-TE.  Please refer to the Security Considerations section
874      of these protocols ([RFC5036], [RFC4271] and [RFC3209]) for security
875      mechanisms applicable to each.

877      Given that there is no end-user control over the values used for
878      entropy labels, there is little risk of Entropy Label forgery which
879      could cause uneven load-balancing in the network.

881      If Entropy Label Capability is not signaled from an egress PE to an
882      ingress PE, due to, for example, malicious configuration activity on
883      the egress PE, then the PE will fall back to not using entropy labels
884      for load-balancing traffic over LAG or ECMP paths which is in general
885      no worse than the behavior observed in current production networks.
886      That said, it is recommended that operators monitor changes to PE
887      configurations and, more importantly, the fairness of load
888      distribution over LAG or ECMP paths.  If the fairness of load
889      distribution over a set of paths changes, that could indicate a
890      misconfiguration, bug or other non-optimal behavior on their PEs and
891      they should take corrective action.

893   10.  IANA Considerations

895   10.1.  Reserved Label for ELI

897      IANA is requested to allocate a reserved label for the Entropy Label
898      Indicator (ELI) from the "Multiprotocol Label Switching Architecture
899      (MPLS) Label Values" Registry.

901   10.2.  LDP Entropy Label Capability TLV

903      IANA is requested to allocate the next available value from the IETF
904      Consensus range (0x0001-0x07FF) in the LDP TLV Type Name Space
905      Registry as the "Entropy Label Capability TLV".

907   10.3.  BGP Entropy Label Capability Attribute

909      IANA is requested to allocate the next available Path Attribute Type
910      Code from the "BGP Path Attributes" registry as the "BGP Entropy
911      Label Capability Attribute".

913   10.4.  RSVP-TE Entropy Label Capability flag

915      IANA is requested to allocate a new bit from the "Attribute Flags"
916      sub-registry of the "RSVP TE Parameters" registry.

918      Bit | Name                     | Attribute  | Attribute  | RRO
919      No  |                          | Flags Path | Flags Resv |
920      ----+--------------------------+------------+------------+-----
921      TBD   Entropy Label Capability   Yes          Yes          No

923   11.  Acknowledgments

925      We wish to thank Ulrich Drafz for his contributions, as well as the
926      entire 'hash label' team for their valuable comments and discussion.

928      Sincere thanks to Nischal Sheth for his many suggestions and
929      comments, and his careful reading of the document, especially with
930      regard to data plane processing of entropy labels.

932   12.  References

933   12.1.  Normative References

935      [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
936                 Requirement Levels", BCP 14, RFC 2119, March 1997.

938      [RFC3031]  Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol
939                 Label Switching Architecture", RFC 3031, January 2001.

941      [RFC3032]  Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
942                 Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
943                 Encoding", RFC 3032, January 2001.

945      [RFC3107]  Rekhter, Y. and E. Rosen, "Carrying Label Information in
946                 BGP-4", RFC 3107, May 2001.

948      [RFC3209]  Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
949                 and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
950                 Tunnels", RFC 3209, December 2001.

952      [RFC5036]  Andersson, L., Minei, I., and B. Thomas, "LDP
953                 Specification", RFC 5036, October 2007.

955      [RFC5420]  Farrel, A., Papadimitriou, D., Vasseur, JP., and A.
956                 Ayyangarps, "Encoding of Attributes for MPLS LSP
957                 Establishment Using Resource Reservation Protocol Traffic
958                 Engineering (RSVP-TE)", RFC 5420, February 2009.

960   12.2.  Informative References

962      [RFC4201]  Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling
963                 in MPLS Traffic Engineering (TE)", RFC 4201, October 2005.

965      [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
966                 Protocol 4 (BGP-4)", RFC 4271, January 2006.

968      [RFC4379]  Kompella, K. and G. Swallow, "Detecting Multi-Protocol
969                 Label Switched (MPLS) Data Plane Failures", RFC 4379,
970                 February 2006.

972      [RFC4447]  Martini, L., Rosen, E., El-Aawar, N., Smith, T., and G.
973                 Heron, "Pseudowire Setup and Maintenance Using the Label
974                 Distribution Protocol (LDP)", RFC 4447, April 2006.

976      [RFC4875]  Aggarwal, R., Papadimitriou, D., and S. Yasukawa,
977                 "Extensions to Resource Reservation Protocol - Traffic
978                 Engineering (RSVP-TE) for Point-to-Multipoint TE Label
979                 Switched Paths (LSPs)", RFC 4875, May 2007.

981      [RFC5884]  Aggarwal, R., Kompella, K., Nadeau, T., and G. Swallow,
982                 "Bidirectional Forwarding Detection (BFD) for MPLS Label
983                 Switched Paths (LSPs)", RFC 5884, June 2010.

985      [RFC6388]  Wijnands, IJ., Minei, I., Kompella, K., and B. Thomas,
986                 "Label Distribution Protocol Extensions for Point-to-
987                 Multipoint and Multipoint-to-Multipoint Label Switched
988                 Paths", RFC 6388, November 2011.

990      [RFC6391]  Bryant, S., Filsfils, C., Drafz, U., Kompella, V., Regan,
991                 J., and S. Amante, "Flow-Aware Transport of Pseudowires
992                 over an MPLS Packet Switched Network", RFC 6391,
993                 November 2011.

995   Appendix A.  Applicability of LDP Entropy Label Capability TLV

997      In the case of unlabeled IPv4 (Internet) traffic, the Best Current
998      Practice is for an egress LSR to propagate eBGP learned routes within
999      a SP's Autonomous System after resetting the BGP next-hop attribute
1000     to one of its Loopback IP addresses.  That Loopback IP address is
1001     injected into the Service Provider's IGP and, concurrently, a label
1002     assigned to it via LDP.  Thus, when an ingress LSR is performing a
1003     forwarding lookup for a BGP destination it recursively resolves the
1004     associated next-hop to a Loopback IP address and associated LDP label
1005     of the egress LSR.

1007     Thus, in the context of unlabeled IPv4 traffic, the LDP Entropy Label
1008     Capability TLV will typically be applied only to the FEC for the
1009     Loopback IP address of the egress LSR and the egress LSR need not
1010     announce an entropy label capability for the eBGP learned route.

1012  Authors' Addresses

1014     Kireeti Kompella
1015     Juniper Networks
1016     1194 N. Mathilda Ave.
1017     Sunnyvale, CA  94089
1018     US

1020     Email: kireeti.kompella@gmail.com

1021     John Drake
1022     Juniper Networks
1023     1194 N. Mathilda Ave.
1024     Sunnyvale, CA  94089
1025     US

1027     Email: jdrake@juniper.net

1029     Shane Amante
1030     Level 3 Communications, LLC
1031     1025 Eldorado Blvd
1032     Broomfield, CO  80021
1033     US

1035     Email: shane@level3.net

1037     Wim Henderickx
1038     Alcatel-Lucent
1039     Copernicuslaan 50
1040     2018 Antwerp
1041     Belgium

1043     Email: wim.henderickx@alcatel-lucent.com

1045     Lucy Yong
1046     Huawei USA
1047     5340 Legacy Dr.
1048     Plano, TX  75024
1049     US

1051     Email: lucy.yong@huawei.com