idnits 2.17.1 draft-ietf-mpls-entropy-label-04.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- -- The draft header indicates that this document updates RFC5036, but the abstract doesn't seem to mention this, which it should. -- The draft header indicates that this document updates RFC3031, but the abstract doesn't seem to mention this, which it should. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year (Using the creation date from RFC3031, updated by this document, for RFC5378 checks: 1998-03-17) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (July 10, 2012) is 4306 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'L' is mentioned on line 704, but not defined == Missing Reference: 'E' is mentioned on line 704, but not defined == Missing Reference: 'TL4' is mentioned on line 798, but not defined -- Looks like a reference, but probably isn't: '1' on line 823 == Missing Reference: 'TL3' is mentioned on line 799, but not defined == Missing Reference: 'TL1' is mentioned on line 801, but not defined == Missing Reference: 'TL0' is mentioned on line 755, but not defined -- Looks like a reference, but probably isn't: '3' on line 825 == Missing Reference: 'AL' is mentioned on line 803, but not defined == Missing Reference: 'L4' is mentioned on line 823, but not defined == Missing Reference: 'L3' is mentioned on line 823, but not defined == Missing Reference: 'Rn' is mentioned on line 824, but not defined -- Looks like a reference, but probably isn't: '0' on line 825 == Unused Reference: 'RFC4364' is defined on line 946, but no explicit reference was found in the text == Unused Reference: 'RFC4761' is defined on line 957, but no explicit reference was found in the text == Unused Reference: 'RFC4762' is defined on line 961, but no explicit reference was found in the text ** Obsolete normative reference: RFC 3107 (Obsoleted by RFC 8277) -- Obsolete informational reference (is this intentional?): RFC 4379 (Obsoleted by RFC 8029) -- Obsolete informational reference (is this intentional?): RFC 4447 (Obsoleted by RFC 8077) Summary: 1 error (**), 0 flaws (~~), 14 warnings (==), 9 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group K. Kompella 3 Internet-Draft J. 
Drake 4 Updates: 3031, 5036 (if approved) Juniper Networks 5 Intended status: Standards Track S. Amante 6 Expires: January 11, 2013 Level 3 Communications, LLC 7 W. Henderickx 8 Alcatel-Lucent 9 L. Yong 10 Huawei USA 11 July 10, 2012 13 The Use of Entropy Labels in MPLS Forwarding 14 draft-ietf-mpls-entropy-label-04 16 Abstract 18 Load balancing is a powerful tool for engineering traffic across a 19 network. This memo suggests ways of improving load balancing across 20 MPLS networks using the concept of "entropy labels". It defines the 21 concept, describes why entropy labels are useful, enumerates 22 properties of entropy labels that allow maximal benefit, and shows 23 how they can be signaled and used for various applications. 25 Status of this Memo 27 This Internet-Draft is submitted in full conformance with the 28 provisions of BCP 78 and BCP 79. 30 Internet-Drafts are working documents of the Internet Engineering 31 Task Force (IETF). Note that other groups may also distribute 32 working documents as Internet-Drafts. The list of current Internet- 33 Drafts is at http://datatracker.ietf.org/drafts/current/. 35 Internet-Drafts are draft documents valid for a maximum of six months 36 and may be updated, replaced, or obsoleted by other documents at any 37 time. It is inappropriate to use Internet-Drafts as reference 38 material or to cite them other than as "work in progress." 40 This Internet-Draft will expire on January 11, 2013. 42 Copyright Notice 44 Copyright (c) 2012 IETF Trust and the persons identified as the 45 document authors. All rights reserved. 47 This document is subject to BCP 78 and the IETF Trust's Legal 48 Provisions Relating to IETF Documents 49 (http://trustee.ietf.org/license-info) in effect on the date of 50 publication of this document. Please review these documents 51 carefully, as they describe your rights and restrictions with respect 52 to this document. 
Code Components extracted from this document must 53 include Simplified BSD License text as described in Section 4.e of 54 the Trust Legal Provisions and are provided without warranty as 55 described in the Simplified BSD License. 57 Table of Contents 59 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 60 1.1. Conventions used . . . . . . . . . . . . . . . . . . . . . 4 61 1.2. Motivation . . . . . . . . . . . . . . . . . . . . . . . . 6 62 2. Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . 7 63 3. Entropy Labels and Their Structure . . . . . . . . . . . . . . 8 64 4. Data Plane Processing of Entropy Labels . . . . . . . . . . . 9 65 4.1. Egress LSR . . . . . . . . . . . . . . . . . . . . . . . . 9 66 4.2. Ingress LSR . . . . . . . . . . . . . . . . . . . . . . . 10 67 4.3. Transit LSR . . . . . . . . . . . . . . . . . . . . . . . 11 68 4.4. Penultimate Hop LSR . . . . . . . . . . . . . . . . . . . 11 69 5. Signaling for Entropy Labels . . . . . . . . . . . . . . . . . 11 70 5.1. LDP Signaling . . . . . . . . . . . . . . . . . . . . . . 12 71 5.1.1. Processing the ELC TLV . . . . . . . . . . . . . . . . 12 72 5.2. BGP Signaling . . . . . . . . . . . . . . . . . . . . . . 13 73 5.3. RSVP-TE Signaling . . . . . . . . . . . . . . . . . . . . 14 74 5.4. Multicast LSPs and Entropy Labels . . . . . . . . . . . . 14 75 6. Operations, Administration, and Maintenance (OAM) and 76 Entropy Labels . . . . . . . . . . . . . . . . . . . . . . . . 14 77 7. MPLS-TP and Entropy Labels . . . . . . . . . . . . . . . . . . 15 78 8. Entropy Labels in Various Scenarios . . . . . . . . . . . . . 15 79 8.1. LDP Tunnel . . . . . . . . . . . . . . . . . . . . . . . . 16 80 8.2. LDP Over RSVP-TE . . . . . . . . . . . . . . . . . . . . . 18 81 8.3. MPLS Applications . . . . . . . . . . . . . . . . . . . . 18 82 9. Security Considerations . . . . . . . . . . . . . . . . . . . 19 83 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 84 10.1. 
Reserved Label for ELI . . . . . . . . . . . . . . . . . . 19 85 10.2. LDP Entropy Label Capability TLV . . . . . . . . . . . . . 19 86 10.3. BGP Entropy Label Capability Attribute . . . . . . . . . . 20 87 10.4. RSVP-TE Entropy Label Capability flag . . . . . . . . . . 20 88 11. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 20 89 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 20 90 12.1. Normative References . . . . . . . . . . . . . . . . . . . 20 91 12.2. Informative References . . . . . . . . . . . . . . . . . . 21 92 Appendix A. Applicability of LDP Entropy Label Capability TLV . . 22 93 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 22 95 1. Introduction 97 Load balancing, or multi-pathing, is an attempt to balance traffic 98 across a network by allowing the traffic to use multiple paths. Load 99 balancing has several benefits: it eases capacity planning; it can 100 help absorb traffic surges by spreading them across multiple paths; 101 it allows better resilience by offering alternate paths in the event 102 of a link or node failure. 104 As providers scale their networks, they use several techniques to 105 achieve greater bandwidth between nodes. Two widely used techniques 106 are: Link Aggregation Group (LAG) and Equal-Cost Multi-Path (ECMP). 107 LAG is used to bond together several physical circuits between two 108 adjacent nodes so they appear to higher-layer protocols as a single, 109 higher bandwidth 'virtual' pipe. ECMP is used between two nodes 110 separated by one or more hops, to allow load balancing over several 111 shortest paths in the network. This is typically obtained by 112 arranging IGP metrics such that there are several equal cost paths 113 between source-destination pairs. Both of these techniques may, and 114 often do, co-exist in various parts of a given provider's network, 115 depending on various choices made by the provider. 
117 A very important requirement when load balancing is that packets 118 belonging to a given 'flow' must be mapped to the same path, i.e., 119 the same exact sequence of links across the network. This is to 120 avoid jitter, latency and re-ordering issues for the flow. What 121 constitutes a flow varies considerably. A common example of a flow 122 is a TCP session. Other examples are an L2TP session corresponding 123 to a given broadband user, or traffic within an ATM virtual circuit. 125 To meet this requirement, a node uses certain fields, termed 'keys', 126 within a packet's header as input to a load balancing function 127 (typically a hash function) that selects the path for all packets in 128 a given flow. The keys chosen for the load balancing function depend 129 on the packet type; a typical set (for IP packets) is the IP source 130 and destination addresses, the protocol type, and (for TCP and UDP 131 traffic) the source and destination port numbers. An overly 132 conservative choice of fields may lead to many flows mapping to the 133 same hash value (and consequently poorer load balancing); an overly 134 aggressive choice may map a flow to multiple values, potentially 135 violating the above requirement. 137 For MPLS networks, most of the same principles (and benefits) apply. 138 However, finding useful keys in a packet for the purpose of load 139 balancing can be more of a challenge. In many cases, MPLS 140 encapsulation may require fairly deep inspection of packets to find 141 these keys at transit LSRs. 143 One way to eliminate the need for this deep inspection is to have the 144 ingress LSR of an MPLS Label Switched Path extract the appropriate 145 keys from a given packet, input them to its load balancing function, 146 and place the result in an additional label, termed the 'entropy 147 label', as part of the MPLS label stack it pushes onto that packet. 
149 The packet's entire MPLS label stack can then be used by transit LSRs 150 to perform load balancing, as the entropy label introduces the right 151 level of "entropy" into the label stack. 153 There are five key reasons why this is beneficial: 155 1. at the ingress LSR, MPLS encapsulation hasn't yet occurred, so 156 deep inspection is not necessary; 158 2. the ingress LSR has more context and information about incoming 159 packets than transit LSRs; 161 3. ingress LSRs usually operate at lower bandwidths than transit 162 LSRs, allowing them to do more work per packet; 164 4. transit LSRs do not need to perform deep packet inspection and 165 can load balance effectively using only a packet's MPLS label 166 stack; and 168 5. transit LSRs, not having the full context that an ingress LSR 169 does, have the hard choice between potentially misinterpreting 170 fields in a packet as valid keys for load balancing (causing 171 packet ordering problems) or adopting a conservative approach 172 (giving rise to sub-optimal load balancing). Entropy labels 173 relieve them of making this choice. 175 This memo describes why entropy labels are needed and defines the 176 properties of entropy labels, in particular how they are generated 177 and received, and the expected behavior of transit LSRs. Finally, it 178 describes in general how signaling works and what needs to be 179 signaled, as well as specifics for the signaling of entropy labels 180 for LDP ([RFC5036]), BGP ([RFC3107]), and RSVP-TE ([RFC3209]). 182 1.1. Conventions used 184 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 185 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 186 document are to be interpreted as described in [RFC2119]. 
188 The following acronyms are used: 190 BoS: Bottom of Stack 192 CE: Customer Edge device 194 ECMP: Equal Cost Multi-Path 196 EL: Entropy Label 198 ELC: Entropy Label Capability 200 ELI: Entropy Label Indicator 202 FEC: Forwarding Equivalence Class 204 LAG: Link Aggregation Group 206 LER: Label Edge Router 208 LSR: Label Switching Router 210 PE: Provider Edge Router 212 PHP: Penultimate Hop Popping 214 TC: Traffic Class 216 TTL: Time-to-Live 218 UHP: Ultimate Hop Popping 220 VPLS: Virtual Private LAN (Local Area Network) Service 222 VPN: Virtual Private Network 224 The term ingress (or egress) LSR is used interchangeably with ingress 225 (or egress) LER. The term application throughout the text refers to 226 an MPLS application (such as a VPN or VPLS). 228 A label stack (say of three labels) is denoted by <L1, L2, L3>, where 229 L1 is the "outermost" label and L3 the innermost (closest to the 230 payload). Packet flows are depicted left to right, and signaling is 231 shown right to left (unless otherwise indicated). 233 The term 'label' is used both for the entire 32-bit label stack entry 234 and the 20-bit label field within a label stack entry. It should be 235 clear from the context which is meant. 237 1.2. Motivation 239 MPLS is a very successful generic forwarding substrate that transports 240 several dozen types of protocols, most notably: IP, PWE3, VPLS and IP 241 VPNs. Within each type of protocol, there typically exist several 242 variants, each with a different set of load balancing keys, e.g., for 243 IP: IPv4, IPv6, IPv6 in IPv4, etc.; for PWE3: Ethernet, ATM, 244 Frame-Relay, etc. There are also several different types of Ethernet over 245 PW encapsulation, ATM over PW encapsulation, etc. Finally, 246 given the popularity of MPLS, it is likely that it will continue to 247 be extended to transport new protocols. 
249 Currently, each transit LSR along the path of a given LSP has to try 250 to infer the underlying protocol within an MPLS packet in order to 251 extract appropriate keys for load balancing. Unfortunately, if the 252 transit LSR is unable to infer the MPLS packet's protocol (as is 253 often the case), it will typically use the topmost (or all) MPLS 254 labels in the label stack as keys for the load balancing function. 255 The result may be an extremely inequitable distribution of traffic 256 across equal-cost paths exiting that LSR. This is because MPLS 257 labels are generally fairly coarse-grained forwarding labels that 258 typically describe a next-hop, or provide some form of demultiplexing 259 and/or forwarding function, and do not describe the packet's 260 underlying protocol. 262 On the other hand, an ingress LSR (e.g., a PE router) has detailed 263 knowledge of a packet's contents, typically through a priori 264 configuration of the encapsulation(s) that are expected at a given 265 PE-CE interface (e.g., IPv4, IPv6, VPLS). Ingress LSRs also have more 266 flexible forwarding hardware. PE routers need this information and 267 these capabilities to: 269 a) apply the required services for the CE; 271 b) discern the packet's CoS forwarding treatment; 273 c) apply filters to forward or block traffic to/from the CE; 275 d) forward routing/control traffic to an onboard management 276 processor; and 278 e) load-balance the traffic on their uplinks to transit LSRs (e.g., 279 P routers). 281 By knowing the expected encapsulation types, an ingress LSR 282 can apply a more specific set of payload parsing routines to extract 283 the keys appropriate for a given protocol. This allows for 284 significantly improved accuracy in determining the appropriate load 285 balancing behavior for each protocol. 
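As a hedged illustration of the protocol-specific key extraction described above, the following Python sketch pulls the usual IP keys out of a raw IPv4 packet and reduces them to a single load balancing value. The function names, the use of CRC-32 as the hash, and the choice to ignore fragmentation are assumptions of this sketch; this memo does not mandate any particular hash or parsing routine.

```python
import struct
import zlib


def ipv4_flow_keys(packet: bytes) -> tuple:
    """Return (src, dst, proto, sport, dport) from a raw IPv4 packet.

    Hypothetical helper: assumes an unfragmented packet whose first
    byte carries the IPv4 version and header length.
    """
    ihl = (packet[0] & 0x0F) * 4            # IPv4 header length in octets
    proto = packet[9]                       # protocol type
    src, dst = packet[12:16], packet[16:20]
    sport = dport = 0
    # Ports are only meaningful for TCP (6) and UDP (17).
    if proto in (6, 17) and len(packet) >= ihl + 4:
        sport, dport = struct.unpack_from("!HH", packet, ihl)
    return (src, dst, proto, sport, dport)


def load_balance_value(keys: tuple) -> int:
    """Hash the keys into one integer (CRC-32 is an arbitrary choice)."""
    src, dst, proto, sport, dport = keys
    data = src + dst + struct.pack("!BHH", proto, sport, dport)
    return zlib.crc32(data)
```

All packets of one TCP or UDP session yield the same keys and therefore the same value, which is the property the flow-to-path requirement depends on. A PE configured for a different encapsulation on a given PE-CE interface would substitute a different extraction routine and keep the hashing step unchanged.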
287 If the ingress LSR were to capture the flow information so gathered 288 in a convenient form for downstream transit LSRs, transit LSRs could 289 remain completely oblivious to the contents of each MPLS packet, and 290 use only the captured flow information to perform load balancing. In 291 particular, there will be no reason to duplicate an ingress LSR's 292 complex packet/payload parsing functionality in a transit LSR. This 293 will result in less complex transit LSRs, enabling them to more 294 easily scale to higher forwarding rates, larger port density, lower 295 power consumption, etc. The idea in this memo is to capture this 296 flow information as a label, the so-called entropy label. 298 Ingress LSRs can also adapt more readily to new protocols and extract 299 the appropriate keys to use for load balancing packets of those 300 protocols. This means that deploying new protocols or services in 301 edge devices requires fewer concomitant changes in the core, 302 resulting in higher edge service velocity and at the same time more 303 stable core networks. 305 2. Approaches 307 There are two main approaches to encoding load balancing information 308 in the label stack. The first allocates multiple labels for a 309 particular Forwarding Equivalence Class (FEC). These labels are 310 equivalent in terms of forwarding semantics, but having multiple 311 labels allows flexibility in assigning labels to flows belonging to 312 the same FEC. This approach has the advantage that the label stack 313 has the same depth whether or not one uses label-based load 314 balancing; and so, consequently, there is no change to forwarding 315 operations on transit and egress LSRs. However, it has a major 316 drawback in that there is a significant increase in both signaling 317 and forwarding state. 319 The other approach encodes the load balancing information as an 320 additional label in the label stack, thus increasing the depth of the 321 label stack by one. 
With this approach, there is minimal change to 322 signaling state for a FEC; also, there is no change in forwarding 323 operations in transit LSRs, and no increase of forwarding state in 324 any LSR. The only purpose of the additional label is to increase the 325 entropy in the label stack, so this is called an "entropy label". 326 This memo focuses solely on this approach. 328 This latter approach uses upstream generated entropy labels, which 329 may conflict with downstream allocated application labels. There are 330 a few approaches to deal with this: 1) allocate a pair of labels for 331 each FEC, one that must have an entropy label below it, and one that 332 must not; 2) use a label (the "Entropy Label Indicator") to indicate 333 that the next label is an entropy label; and 3) allow entropy labels 334 only where there is no possible confusion. The first doubles control 335 and data plane state in the network; the last is too restrictive. 336 The approach taken here is the second. In making both the above 337 choices, the trade-off is to increase label stack depth rather than 338 control and data plane state in the network. 340 Finally, one may choose to associate ELs with MPLS tunnels (LSPs), or 341 with MPLS applications (e.g., VPNs). (What this entails is described 342 in later sections.) We take the former approach, for the following 343 reasons: 345 1. There are a small number of tunneling protocols for MPLS, but a 346 large and growing number of applications. Defining ELs on a 347 tunnel basis means simpler standards, lower development, 348 interoperability and testing efforts. 350 2. As a consequence, there will be much less churn in the network as 351 new applications (services) are defined and deployed. 353 3. Processing application labels in the data plane is more complex 354 than processing tunnel labels. Thus, it is preferable to burden 355 the latter rather than the former with EL processing. 357 4. 
Associating ELs with tunnels makes it simpler to deal with 358 hierarchy, be it LDP-over-RSVP-TE or Carrier's Carrier VPNs. 359 Each layer in the hierarchy can choose independently whether or 360 not they want ELs. 362 The cost of this approach is that ELIs will be mandatory; again, the 363 trade-off is the size of the label stack. To summarize, the net 364 increase in the label stack to use entropy labels is two: one 365 reserved label for the ELI, and the entropy label itself. 367 3. Entropy Labels and Their Structure 369 An entropy label (as used here) is a label: 371 1. that is not used for forwarding; 373 2. that is not signaled; and 375 3. whose only purpose in the label stack is to provide 'entropy' to 376 improve load balancing. 378 Entropy labels are generated by an ingress LSR, based entirely on 379 load balancing information. However, they MUST NOT have values in 380 the reserved label space (0-15) [IANA MPLS Label Values]. To ensure 381 that they are not used inadvertently for forwarding, entropy labels 382 SHOULD have a TTL of 0. The CoS field of an entropy label can be set 383 to any value deemed appropriate. 385 Since entropy labels are generated by an ingress LSR, an egress LSR 386 MUST be able to distinguish unambiguously between entropy labels and 387 application labels. This is accomplished by REQUIRING that the label 388 immediately preceding an entropy label (EL) in the MPLS label stack 389 be an 'entropy label indicator' (ELI), where preceding means closer 390 to the top of the label stack (farther from bottom of stack 391 indication). The ELI is a reserved label with value (TBD by IANA). 392 An ELI MUST have 'Bottom of Stack' (BoS) bit = 0 ([RFC3032]). The 393 TTL SHOULD be set to whatever value the label above it in the stack 394 has. The CoS field can be set to any value deemed appropriate; 395 typically, this will be the value in the label above the ELI in the 396 label stack. 398 Entropy labels are useful for pseudowires ([RFC4447]). 
[RFC6391] 399 explains how entropy labels can be used for RFC 4447-style 400 pseudowires, and thus is complementary to this memo, which focuses on 401 how entropy labels can be used for tunnels, and thus for all other 402 MPLS applications. 404 4. Data Plane Processing of Entropy Labels 406 4.1. Egress LSR 408 Suppose egress LSR Y is capable of processing entropy labels for a 409 tunnel. Y indicates this to all ingresses via signaling (see 410 Section 5). Y MUST be prepared to deal both with packets with an 411 imposed EL and those without; the ELI will distinguish these cases. 412 If a particular ingress chooses not to impose an EL, Y's processing 413 of the received label stack (which might be empty) is as if Y chose 414 not to accept ELs. 416 If an ingress X chooses to impose an EL, then Y will receive a tunnel 417 termination packet with label stack <TL, ELI, EL> <remaining packet 418 header>. Y recognizes TL as the label it distributed to its 419 upstream LSRs for the tunnel, and pops it. (Note that TL may be the 420 implicit null label, in which case it doesn't appear in the label 421 stack.) Y then recognizes the ELI and pops two labels: the ELI and 422 the EL. Y then processes the remaining packet header as normal; this 423 may require further processing of tunnel termination, perhaps with 424 further ELI+EL pairs. When processing the final tunnel termination, 425 Y MAY enqueue the packet based on that tunnel TL's or ELI's TC value, 426 and MAY use the tunnel TL's or ELI's TTL to compute the TTL of the 427 remaining packet header. The EL's TTL MUST be ignored. 429 If any ELI processed by Y has the BoS bit set, Y MUST discard the packet, 430 and MAY log an error. The EL's BoS bit will indicate whether or not 431 there are more labels in the stack. 433 4.2. Ingress LSR 435 If an egress LSR Y indicates via signaling that it can process ELs on 436 a particular tunnel, an ingress LSR X can choose whether or not to 437 insert ELs for packets going into that tunnel. Y MUST handle both 438 cases. 
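How Y handles both cases, following the egress rules of Section 4.1, can be sketched in Python as below. This is a minimal illustration under stated assumptions: the `LSE` record, the `DiscardPacket` exception and the `egress_pop` function are hypothetical names; `eli` is passed in by the caller because this draft leaves the ELI value to IANA; `tunnel_label=None` models the implicit null case where TL does not appear in the stack; and only one level of tunnel termination is handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LSE:
    """One MPLS label stack entry (hypothetical in-memory form)."""
    label: int
    tc: int = 0
    bos: bool = False
    ttl: int = 0


class DiscardPacket(Exception):
    """Raised when the egress must drop the packet."""


def egress_pop(stack: List[LSE], tunnel_label: Optional[int], eli: int) -> List[LSE]:
    """Pop TL (if present) and an ELI+EL pair (if present).

    Returns the remaining label stack, outermost entry first.
    """
    rest = list(stack)                      # leave the caller's list intact
    if tunnel_label is not None:            # None: TL was implicit null
        if not rest or rest[0].label != tunnel_label:
            raise DiscardPacket("top label is not the tunnel label")
        rest.pop(0)
    if rest and rest[0].label == eli:
        # An ELI must never be the bottom of stack: an EL has to follow.
        if rest[0].bos or len(rest) < 2:
            raise DiscardPacket("ELI at bottom of stack")
        del rest[:2]                        # pop ELI and EL; EL's TTL is ignored
    return rest
```

A packet from an ingress that imposed no EL takes the same path through the function and simply skips the second branch, so the result is identical to that of an egress that never advertised the capability.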
440 The steps that X performs to insert ELs are as follows: 442 1. On an incoming packet, identify the application to which the 443 packet belongs, and thereby pick the fields to input to the load 444 balancing function; call the output LB. 446 2. Determine the application label AL (if any). Push <AL> onto the 447 packet. 449 3. Based on the application, the load balancing output LB and other 450 factors, determine the egress LSR Y, the tunnel to Y, the 451 specific interface to the next hop, and thus the tunnel label TL. 452 Use LB to generate the entropy label EL. 454 4. If, for the chosen tunnel, Y has not indicated that it can 455 process ELs, push <TL> onto the packet. If Y has indicated that 456 it can process ELs for the tunnel, push <TL, ELI, EL> onto the 457 packet. X SHOULD set the TTL and TC fields of the ELI to the same 458 values it uses for TL. The TTL for the EL MUST be zero. The TC for the 459 EL may be any value. 461 5. X then determines whether further tunnel hierarchy is needed; if 462 so, X goes back to step 3, possibly with a new egress Y for the 463 new tunnel. Otherwise, X is done, and sends out the packet. 465 Notes: 467 a. X computes load balancing information and generates the EL based 468 on the incoming application packet, even though the signaling of 469 EL capability is associated with tunnels. 471 b. X MAY insert several entropy labels in the stack (each, of 472 course, preceded by an ELI), potentially one for each 473 hierarchical tunnel, provided that the egress for that tunnel has 474 indicated that it can process ELs for that tunnel. 476 c. X MUST NOT include an entropy label for a given tunnel unless the 477 egress LSR Y has indicated that it can process entropy labels for 478 that tunnel. 480 d. The signaling and use of entropy labels in one direction 481 (signaling from Y to X, and data path from X to Y) is completely 482 independent of the signaling and use of entropy labels in the 483 reverse direction (signaling from X to Y, and data path from Y to 484 X). 
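The imposition steps above can be sketched in Python for a single tunnel level, as follows. This is an illustration under stated assumptions, not a definitive implementation: `entropy_label`, `lse` and `impose` are hypothetical names; the modulo fold of LB into the label space is one possible mapping among many; `eli` is a caller-supplied parameter because this draft leaves the ELI value to IANA; and the 32-bit entry layout (20-bit label, 3-bit TC, 1-bit BoS, 8-bit TTL) follows [RFC3032].

```python
import struct
from typing import Optional


def entropy_label(lb: int) -> int:
    """Fold LB into the 20-bit label space, avoiding reserved values 0-15."""
    return 16 + (lb % ((1 << 20) - 16))


def lse(label: int, tc: int, bos: bool, ttl: int) -> int:
    """Pack one label stack entry: label(20) | TC(3) | BoS(1) | TTL(8)."""
    return (label << 12) | (tc << 9) | (int(bos) << 8) | ttl


def impose(al: Optional[int], tl: int, lb: int, egress_elc: bool,
           eli: int, ttl: int = 64, tc: int = 0) -> bytes:
    """Return the encoded label stack, outermost entry first."""
    entries = [(tl, tc, ttl)]                       # tunnel label TL
    if egress_elc:                                  # only if Y signaled ELC
        entries.append((eli, tc, ttl))              # ELI: same TTL and TC as TL
        entries.append((entropy_label(lb), 0, 0))   # EL: TTL is zero
    if al is not None:
        entries.append((al, tc, ttl))               # application label, innermost
    last = len(entries) - 1
    # Only the innermost entry carries the BoS bit, so the ELI never does.
    return b"".join(
        struct.pack("!I", lse(label, c, i == last, t))
        for i, (label, c, t) in enumerate(entries)
    )
```

With `egress_elc=True` and an application label the result is the stack <TL, ELI, EL, AL>; with `egress_elc=False` it is <TL, AL>, which matches note c above. Hierarchical tunnels (step 5) would repeat the TL/ELI/EL portion once per tunnel level and are left out to keep the sketch short.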
486 4.3. Transit LSR 488 Transit LSRs MAY operate with no change in forwarding behavior. The 489 following are suggestions for optimizations that improve load 490 balancing, reduce the amount of packet data processed, and/or enhance 491 backward compatibility. 493 If a transit LSR recognizes the ELI, it MAY choose to load balance 494 solely on the following label (the EL); otherwise, it SHOULD use as 495 much of the whole label stack as feasible as keys for the load 496 balancing function, with the exception that reserved labels MUST NOT 497 be used. 499 Some transit LSRs look beyond the label stack for better load 500 balancing information. This is a simple, backward compatible 501 approach in networks where some ingress LSRs impose ELs and others 502 don't. However, this is of limited incremental value if an EL is 503 indeed present, and requires more packet processing from the LSR. A 504 transit LSR MAY choose to parse the label stack for the presence of 505 the ELI, and look beyond the label stack only if it does not find it, 506 thus retaining the old behavior when needed, yet avoiding unnecessary 507 work otherwise. 509 4.4. Penultimate Hop LSR 511 No change is needed at penultimate hop LSRs. 513 5. Signaling for Entropy Labels 515 An egress LSR Y can signal to ingress LSR(s) its ability to process 516 entropy labels (henceforth called "Entropy Label Capability" or ELC) 517 on a given tunnel. Note that Entropy Label Capability may be 518 asymmetric: if LSRs X and Y are at opposite ends of a tunnel, X may 519 be able to process entropy labels, whereas Y may not. The signaling 520 extensions below allow for this asymmetry. 522 For an illustration of signaling and forwarding with entropy labels, 523 see Section 8. 525 5.1. LDP Signaling 527 A new LDP TLV ([RFC5036]) is defined to signal an egress's ability to 528 process entropy labels. This is called the ELC TLV, and may appear 529 as an Optional Parameter of the Label Mapping Message TLV. 
531 The presence of the ELC TLV in a Label Mapping Message indicates to 532 ingress LSRs that the egress LSR can process entropy labels for the 533 associated LDP tunnel. The ELC TLV has Type (TBD by IANA) and Length 534 0. 536 The structure of the ELC TLV is shown below.
538      0                   1                   2                   3
539      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
540     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
541     |U|F|        Type (TBD)         |           Length (0)          |
542     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
544 Figure 1: Entropy Label Capability TLV 546 where: 548 U: Unknown bit. This bit MUST be set to 1. If the ELC TLV is not 549 understood by the receiver, then it MUST be ignored. 551 F: Forward bit. This bit MUST be set to 1. Since the ELC 552 TLV is going to be propagated hop-by-hop, it should be forwarded 553 even by nodes that may not understand it. 555 Type: Type field. To be assigned by IANA. 557 Length: Length field. This field specifies the total length in 558 octets of the ELC TLV, and is currently defined to be 0. 560 5.1.1. Processing the ELC TLV 562 An LSR that receives a Label Mapping with the ELC TLV but does not 563 understand it MUST propagate it intact to its neighbors and MUST NOT 564 send a notification to the sender (following the meaning of the U- 565 and F-bits). 567 An LSR X may receive multiple Label Mappings for a given FEC F from 568 its neighbors. In its turn, X may advertise a Label Mapping for F to 569 its neighbors. If X understands the ELC TLV, and if any of the 570 advertisements it received for FEC F does not include the ELC TLV, X 571 MUST NOT include the ELC TLV in its own advertisements of F. If all 572 the advertised Mappings for F include the ELC TLV, then X MUST 573 advertise its Mapping for F with the ELC TLV. 
        If any of X's
574     neighbors resends its Mapping, sends a new Mapping or Withdraws a
575     previously advertised Mapping for F, X MUST re-evaluate the status of
576     ELC for FEC F, and, if there is a change, X MUST re-advertise its
577     Mapping for F with the updated status of ELC.

579  5.2. BGP Signaling

581     When BGP [RFC4271] is used for distributing Network Layer
582     Reachability Information (NLRI) as described in, for example,
583     [RFC3107], the BGP UPDATE message may include the ELC attribute as
584     part of the Path Attributes. This is an optional, transitive BGP
585     attribute of type (to be assigned by IANA). The inclusion of this
586     attribute with an NLRI indicates that the advertising BGP router can
587     process entropy labels as an egress LSR for all routes in that NLRI.

589     A BGP speaker S that originates an UPDATE should include the ELC
590     attribute only if both of the following are true:

592     A1: S sets the BGP NEXT_HOP attribute to itself; AND

594     A2: S can process entropy labels.

596     Suppose a BGP speaker T receives an UPDATE U with the ELC attribute.
597     T has two choices. T can simply re-advertise U with the ELC
598     attribute if either of the following is true:

600     B1: T does not change the NEXT_HOP attribute; OR

602     B2: T simply swaps labels without popping the entire label stack and
603         processing the payload below.

605     An example of the use of B1 is Route Reflectors.

607     However, if T changes the NEXT_HOP attribute for U and in the data
608     plane pops the entire label stack to process the payload, T MAY
609     include an ELC attribute for UPDATE U' if both of the following are
610     true:

612     C1: T sets the NEXT_HOP attribute of U' to itself; AND

614     C2: T can process entropy labels.

616     Otherwise, T MUST remove the ELC attribute.

618  5.3. RSVP-TE Signaling

620     Entropy Label support is signaled in RSVP-TE [RFC3209] using the
621     Entropy Label Capability (ELC) flag in the Attribute Flags TLV of the
622     LSP_ATTRIBUTES object [RFC5420]. The presence of the ELC flag in a
623     Path message indicates that the ingress can process entropy labels in
624     the upstream direction; this only makes sense for a bidirectional LSP
625     and MUST be ignored otherwise. The presence of the ELC flag in a
626     Resv message indicates that the egress can process entropy labels in
627     the downstream direction.

629     The bit number for the ELC flag is to be assigned by IANA.

631  5.4. Multicast LSPs and Entropy Labels

633     Multicast LSPs [RFC4875], [RFC6388] typically do not use ECMP for
634     load balancing, as the combination of replication and multipathing
635     can lead to duplicate traffic delivery. However, these LSPs can
636     traverse bundled links [RFC4201] and LAGs. In both these cases, load
637     balancing is useful, and hence entropy labels can be of value for
638     multicast LSPs.

640     The methodology defined for entropy labels here will be used for
641     multicast LSPs; however, the details of signaling and processing ELs
642     for multicast LSPs will be specified in a companion document.

644  6. Operations, Administration, and Maintenance (OAM) and Entropy Labels

646     Generally OAM comprises a set of functions operating in the data
647     plane to allow a network operator to monitor its network
648     infrastructure and to implement mechanisms in order to enhance the
649     general behavior and the level of performance of its network, e.g.,
650     the efficient and automatic detection, localization, diagnosis and
651     handling of defects.

653     Currently defined OAM mechanisms for MPLS include LSP Ping/Traceroute
654     [RFC4379] and Bidirectional Forwarding Detection (BFD) for MPLS
655     [RFC5884]. The latter provides connectivity verification between the
656     endpoints of an LSP, and recommends establishing a separate BFD
657     session for every path between the endpoints.

659     The LSP traceroute procedures of [RFC4379] allow an ingress LSR to
660     obtain label ranges that can be used to send packets on every path to
661     the egress LSR. It works by having the ingress LSR sequentially ask
662     the transit LSRs along a particular path to a given egress LSR to
663     return a label range such that the inclusion of a label in that range
664     in a packet will cause the replying transit LSR to send that packet
665     out the egress interface for that path. The ingress provides the
666     label range returned by transit LSR N to transit LSR N + 1, which
667     returns a label range which is less than or equal in span to the
668     range provided to it. This process iterates until the penultimate
669     transit LSR replies to the ingress LSR with a label range that is
670     acceptable to it and to all LSRs along the path preceding it for
671     forwarding a packet along the path.

673     However, the LSP traceroute procedures do not specify where in the
674     label stack the value from the label range is to be placed, whether
675     deep packet inspection is allowed and if so, which keys and key
676     values are to be used.

678     This memo updates LSP traceroute by specifying that the value from
679     the label range is to be placed in the entropy label. Deep packet
680     inspection is thus not necessary, although an LSR may use it,
681     provided it does so consistently, i.e., if the label range to go to a
682     given downstream LSR is computed with deep packet inspection, then
683     the data path should use the same approach and the same keys.

685     In order to have a BFD session on a given path, a value from the
686     label range for that path should be used as the EL value for BFD
687     packets sent on that path.

689  7. MPLS-TP and Entropy Labels

691     Since MPLS-TP does not use ECMP, entropy labels are not applicable to
692     an MPLS-TP deployment.

694  8. Entropy Labels in Various Scenarios

696     This section describes the use of entropy labels in various
697     scenarios.

699     In the figures below, the following conventions are used to depict
700     processing between X and Y.
        Note that control plane signaling goes
701     right to left, whereas data plane processing goes left to right.

703     Protocols
704         Y: <--- [L, E]                  Y signals L to X
705      X ------------- Y
706     LS:                                 Label stack
707     X: +                                X pushes
708     Y: -                                Y pops

710     This means that Y signals to X label L for an LDP tunnel. E can be
711     one of:

713     0: meaning egress is NOT entropy label capable, or

715     1: meaning egress is entropy label capable.

717     The line with LS: shows the label stack on the wire. Below that is
718     the operation that each LSR does in the data plane, where + means
719     push the following label stack, - means pop the following label
720     stack, L~L' means swap L with L', and * means that the operation is
721     not depicted.

723  8.1. LDP Tunnel

725     The following illustrates several simple intra-AS LDP tunnels. The
726     first diagram shows ultimate hop popping (UHP) with ingress inserting
727     an EL, the second UHP with no ELs, the third PHP with ELs, and
728     finally, PHP with no ELs, but also with an application label AL
729     (which could, for example, be a VPN label).

731     Note that, in all the cases below, the MPLS application does not
732     matter; it may be that X pushes some more labels (perhaps for a VPN
733     or VPLS) below the ones shown, and Y pops them.

735         A: <--- [TL4, 1]
736         B:         <-- [TL3, 1]
737                            ...
738         W:                     <-- [TL1, 1]
739         Y:                            <-- [TL0, 1]
740      X --------------- A --------- B ... W ---------- Y
741     LS:
742     X: +
743     A: TL4~TL3
744     B: TL3~TL2
745        ...
746     W: TL1~TL0
747     Y: -

749                  LDP with UHP; ingress inserts ELs

751         A: <--- [TL4, 1]
752         B:         <-- [TL3, 1]
753                            ...
754         W:                     <-- [TL1, 1]
755         Y:                            <-- [TL0, 1]
756      X --------------- A --------- B ... W ---------- Y
757     LS:
758     X: +
759     A: TL4~TL3
760     B: TL3~TL2
761        ...
762     W: TL1~TL0
763     Y: -

765              LDP with UHP; ingress does not insert ELs

767         A: <--- [TL4, 1]
768         B:         <-- [TL3, 1]
769                            ...
770         W:                     <-- [TL1, 1]
771         Y:                            <-- [3, 1]
772      X --------------- A --------- B ... W ---------- Y
773     X: +
774     A: TL4~TL3
775     B: TL3~TL2
776        ...
777     W: -TL1
778     Y: -

780                  LDP with PHP; ingress inserts ELs

782         A: <--- [TL4, 1]
783         B:         <-- [TL3, 1]
784                            ...
785         W:                     <-- [TL1, 1]
786         Y:                            <-- [3, 1]
787     VPN: <------------------------------------------ [AL]
788      X --------------- A --------- B ... W ---------- Y
789     LS:
790     X: +
791     A: TL4~TL3
792     B: TL3~TL2
793        ...
794     W: -TL1
795     Y: -

796           LDP with PHP + VPN; ingress does not insert ELs

798         A: <--- [TL4, 1]
799         B:            <-- [TL3, 1]
800                               ...
801         W:                        <-- [TL1, 1]
802         Y:                               <-- [3, 1]
803     VPN: <--------------------------------------------- [AL]
804      X --------------- A ------------ B ... W ---------- Y
805     LS:
806     X: +
807     A: TL4~TL3
808     B: TL3~TL2
809        ...
810     W: -TL1
811     Y: -

813               LDP with PHP + VPN; ingress inserts ELs

815  8.2. LDP Over RSVP-TE

817     The following illustrates "LDP over RSVP-TE" tunnels. X and Y are
818     the ingress and egress (respectively) of the LDP tunnel; A and W are
819     the ingress and egress of the RSVP-TE tunnel. It is assumed that
820     both the LDP and RSVP-TE tunnels have PHP.

822                  LDP with ELs, RSVP-TE without ELs
823     LDP: <--- [L4, 1]   <------- [L3, 1]     <--- [3, 1]
824     RSVP-TE:                  <-- [Rn, 0]
825                                      <-- [3, 0]
826     X --------------- A --------- B ... W ---------- Y
827     LS:               ...
828     DP: +             L4~         *     -L1          -

830                  Figure 2: LDP over RSVP-TE Tunnels

832  8.3. MPLS Applications

834     An ingress LSR X must keep state per unicast tunnel as to whether the
835     egress for that tunnel can process entropy labels. X does not have
836     to keep state per application running over that tunnel. However, an
837     ingress PE can choose on a per-application basis whether or not to
838     insert ELs. For example, X may have an application for which it does
839     not wish to use ECMP (e.g., circuit emulation), or for which it does
840     not know which keys to use for load balancing (e.g., AppleTalk over a
841     pseudowire). In either of those cases, X may choose not to insert
842     entropy labels, but may choose to insert entropy labels for an IP VPN
843     over the same tunnel.

845  9. Security Considerations

847     This document describes advertisement of the capability to support
848     receipt of entropy labels which an ingress LSR may insert in MPLS
849     packets in order to allow transit LSRs to attain better load
850     balancing across LAG and/or ECMP paths in the network.

852     This document does not introduce new security vulnerabilities to LDP,
853     BGP or RSVP-TE. Please refer to the Security Considerations section
854     of these protocols ([RFC5036], [RFC4271] and [RFC3209]) for security
855     mechanisms applicable to each.

857     Given that there is no end-user control over the values used for
858     entropy labels, there is little risk of Entropy Label forgery which
859     could cause uneven load-balancing in the network.

861     If Entropy Label Capability is not signaled from an egress PE to an
862     ingress PE, due to, for example, malicious configuration activity on
863     the egress PE, then the ingress PE will fall back to not using entropy
864     labels for load-balancing traffic over LAG or ECMP paths, which is in
865     general no worse than the behavior observed in current production
866     networks. That said, it is recommended that operators monitor changes
867     to PE configurations and, more importantly, the fairness of load
868     distribution over LAG or ECMP paths. If the fairness of load
869     distribution over a set of paths changes, that could indicate a
870     misconfiguration, bug or other non-optimal behavior on their PEs, and
871     they should take corrective action.

873  10. IANA Considerations

875  10.1. Reserved Label for ELI

877     IANA is requested to allocate a reserved label for the Entropy Label
878     Indicator (ELI) from the "Multiprotocol Label Switching Architecture
879     (MPLS) Label Values" Registry.

881  10.2. LDP Entropy Label Capability TLV

883     IANA is requested to allocate the next available value from the IETF
884     Consensus range in the LDP TLV Type Name Space Registry as the
885     "Entropy Label Capability TLV".

887  10.3. BGP Entropy Label Capability Attribute

889     IANA is requested to allocate the next available Path Attribute Type
890     Code from the "BGP Path Attributes" registry as the "BGP Entropy
891     Label Capability Attribute".

893  10.4. RSVP-TE Entropy Label Capability flag

895     IANA is requested to allocate a new bit from the "Attribute Flags"
896     sub-registry of the "RSVP TE Parameters" registry.

898     Bit | Name                     | Attribute  | Attribute  | RRO
899     No  |                          | Flags Path | Flags Resv |
900     ----+--------------------------+------------+------------+-----
901     TBD   Entropy Label Capability   Yes          Yes          No

903  11. Acknowledgments

905     We wish to thank Ulrich Drafz for his contributions, as well as the
906     entire 'hash label' team for their valuable comments and discussion.

908     Sincere thanks to Nischal Sheth for his many suggestions and
909     comments, and his careful reading of the document, especially with
910     regard to data plane processing of entropy labels.

912  12. References

914  12.1. Normative References

916     [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
917               Requirement Levels", BCP 14, RFC 2119, March 1997.

919     [RFC3032] Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
920               Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
921               Encoding", RFC 3032, January 2001.

923     [RFC3107] Rekhter, Y. and E. Rosen, "Carrying Label Information in
924               BGP-4", RFC 3107, May 2001.

926     [RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
927               and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
928               Tunnels", RFC 3209, December 2001.

930     [RFC5036] Andersson, L., Minei, I., and B. Thomas, "LDP
931               Specification", RFC 5036, October 2007.

933     [RFC5420] Farrel, A., Papadimitriou, D., Vasseur, JP., and A.
934               Ayyangar, "Encoding of Attributes for MPLS LSP
935               Establishment Using Resource Reservation Protocol Traffic
936               Engineering (RSVP-TE)", RFC 5420, February 2009.

938  12.2. Informative References

940     [RFC4201] Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling
941               in MPLS Traffic Engineering (TE)", RFC 4201, October 2005.

943     [RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
944               Protocol 4 (BGP-4)", RFC 4271, January 2006.

946     [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
947               Networks (VPNs)", RFC 4364, February 2006.

949     [RFC4379] Kompella, K. and G. Swallow, "Detecting Multi-Protocol
950               Label Switched (MPLS) Data Plane Failures", RFC 4379,
951               February 2006.

953     [RFC4447] Martini, L., Rosen, E., El-Aawar, N., Smith, T., and G.
954               Heron, "Pseudowire Setup and Maintenance Using the Label
955               Distribution Protocol (LDP)", RFC 4447, April 2006.

957     [RFC4761] Kompella, K. and Y. Rekhter, "Virtual Private LAN Service
958               (VPLS) Using BGP for Auto-Discovery and Signaling",
959               RFC 4761, January 2007.

961     [RFC4762] Lasserre, M. and V. Kompella, "Virtual Private LAN Service
962               (VPLS) Using Label Distribution Protocol (LDP) Signaling",
963               RFC 4762, January 2007.

965     [RFC4875] Aggarwal, R., Papadimitriou, D., and S. Yasukawa,
966               "Extensions to Resource Reservation Protocol - Traffic
967               Engineering (RSVP-TE) for Point-to-Multipoint TE Label
968               Switched Paths (LSPs)", RFC 4875, May 2007.

970     [RFC5884] Aggarwal, R., Kompella, K., Nadeau, T., and G. Swallow,
971               "Bidirectional Forwarding Detection (BFD) for MPLS Label
972               Switched Paths (LSPs)", RFC 5884, June 2010.

974     [RFC6388] Wijnands, IJ., Minei, I., Kompella, K., and B. Thomas,
975               "Label Distribution Protocol Extensions for Point-to-
976               Multipoint and Multipoint-to-Multipoint Label Switched
977               Paths", RFC 6388, November 2011.

979     [RFC6391] Bryant, S., Filsfils, C., Drafz, U., Kompella, V., Regan,
980               J., and S. Amante, "Flow-Aware Transport of Pseudowires
981               over an MPLS Packet Switched Network", RFC 6391,
982               November 2011.

984  Appendix A. Applicability of LDP Entropy Label Capability TLV

986     In the case of unlabeled IPv4 (Internet) traffic, the Best Current
987     Practice is for an egress LSR to propagate eBGP learned routes within
988     an SP's Autonomous System after resetting the BGP next-hop attribute
989     to one of its Loopback IP addresses. That Loopback IP address is
990     injected into the Service Provider's IGP and, concurrently, a label
991     is assigned to it via LDP. Thus, when an ingress LSR is performing a
992     forwarding lookup for a BGP destination, it recursively resolves the
993     associated next-hop to a Loopback IP address and associated LDP label
994     of the egress LSR.

996     Thus, in the context of unlabeled IPv4 traffic, the LDP Entropy Label
997     Capability TLV will typically be applied only to the FEC for the
998     Loopback IP address of the egress LSR and the egress LSR need not
999     announce an entropy label capability for the eBGP learned route.

1001 Authors' Addresses

1003    Kireeti Kompella
1004    Juniper Networks
1005    1194 N. Mathilda Ave.
1006    Sunnyvale, CA 94089
1007    US

1009    Email: kireeti@juniper.net

1011    John Drake
1012    Juniper Networks
1013    1194 N. Mathilda Ave.
1014    Sunnyvale, CA 94089
1015    US

1017    Email: jdrake@juniper.net

1018    Shane Amante
1019    Level 3 Communications, LLC
1020    1025 Eldorado Blvd
1021    Broomfield, CO 80021
1022    US

1024    Email: shane@level3.net

1026    Wim Henderickx
1027    Alcatel-Lucent
1028    Copernicuslaan 50
1029    2018 Antwerp
1030    Belgium

1032    Email: wim.henderickx@alcatel-lucent.com

1034    Lucy Yong
1035    Huawei USA
1036    5340 Legacy Dr.
1037    Plano, TX 75024
1038    US

1040    Email: lucy.yong@huawei.com