idnits 2.17.1 draft-ietf-mpls-entropy-label-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- -- The draft header indicates that this document updates RFC3031, but the abstract doesn't seem to mention this, which it should. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year (Using the creation date from RFC3031, updated by this document, for RFC5378 checks: 1998-03-17) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (May 7, 2012) is 4366 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'L' is mentioned on line 702, but not defined == Missing Reference: 'E' is mentioned on line 702, but not defined == Missing Reference: 'TL4' is mentioned on line 779, but not defined -- Looks like a reference, but probably isn't: '1' on line 803 == Missing Reference: 'TL3' is mentioned on line 780, but not defined == Missing Reference: 'TL1' is mentioned on line 782, but not defined == Missing Reference: 'TL0' is mentioned on line 752, but not defined -- Looks like a reference, but probably isn't: '3' on line 805 == Missing Reference: 'AL' is mentioned on line 784, but not defined == Missing Reference: 'L4' is mentioned on line 803, but not defined == Missing Reference: 'L3' is mentioned on line 803, but not defined == Missing Reference: 'Rn' is mentioned on line 804, but not defined -- Looks like a reference, but probably isn't: '0' on line 805 == Unused Reference: 'RFC4364' is defined on line 929, but no explicit reference was found in the text == Unused Reference: 'RFC4761' is defined on line 940, but no explicit reference was found in the text == Unused Reference: 'RFC4762' is defined on line 944, but no explicit reference was found in the text == Unused Reference: 'RFC5586' is defined on line 956, but no explicit reference was found in the text ** Obsolete normative reference: RFC 3107 (Obsoleted by RFC 8277) -- Obsolete informational reference (is this intentional?): RFC 4379 (Obsoleted by RFC 8029) -- Obsolete informational reference (is this intentional?): RFC 4447 (Obsoleted by RFC 8077) Summary: 1 error (**), 0 flaws (~~), 15 warnings (==), 8 comments (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 2 Network Working Group K. Kompella 3 Internet-Draft J. Drake 4 Updates: 3031 (if approved) Juniper Networks 5 Intended status: Standards Track S. Amante 6 Expires: November 8, 2012 Level 3 Communications, LLC 7 W. Henderickx 8 Alcatel-Lucent 9 L. Yong 10 Huawei USA 11 May 7, 2012 13 The Use of Entropy Labels in MPLS Forwarding 14 draft-ietf-mpls-entropy-label-02 16 Abstract 18 Load balancing is a powerful tool for engineering traffic across a 19 network. This memo suggests ways of improving load balancing across 20 MPLS networks using the concept of "entropy labels". It defines the 21 concept, describes why entropy labels are useful, enumerates 22 properties of entropy labels that allow maximal benefit, and shows 23 how they can be signaled and used for various applications. 25 Status of this Memo 27 This Internet-Draft is submitted in full conformance with the 28 provisions of BCP 78 and BCP 79. 30 Internet-Drafts are working documents of the Internet Engineering 31 Task Force (IETF). Note that other groups may also distribute 32 working documents as Internet-Drafts. The list of current Internet- 33 Drafts is at http://datatracker.ietf.org/drafts/current/. 35 Internet-Drafts are draft documents valid for a maximum of six months 36 and may be updated, replaced, or obsoleted by other documents at any 37 time. It is inappropriate to use Internet-Drafts as reference 38 material or to cite them other than as "work in progress." 40 This Internet-Draft will expire on November 8, 2012. 42 Copyright Notice 44 Copyright (c) 2012 IETF Trust and the persons identified as the 45 document authors. All rights reserved. 47 This document is subject to BCP 78 and the IETF Trust's Legal 48 Provisions Relating to IETF Documents 49 (http://trustee.ietf.org/license-info) in effect on the date of 50 publication of this document. 
Please review these documents 51 carefully, as they describe your rights and restrictions with respect 52 to this document. Code Components extracted from this document must 53 include Simplified BSD License text as described in Section 4.e of 54 the Trust Legal Provisions and are provided without warranty as 55 described in the Simplified BSD License.
57 Table of Contents
59 1. Introduction
60   1.1. Conventions used
61   1.2. Motivation
62 2. Approaches
63 3. Entropy Labels and Their Structure
64 4. Data Plane Processing of Entropy Labels
65   4.1. Egress LSR
66   4.2. Ingress LSR
67   4.3. Transit LSR
68   4.4. Penultimate Hop LSR
69 5. Signaling for Entropy Labels
70   5.1. LDP Signaling
71   5.2. BGP Signaling
72   5.3. RSVP-TE Signaling
73 6. Operations, Administration, and Maintenance (OAM) and
74    Entropy Labels
75 7. MPLS-TP and Entropy Labels
76 8. Point-to-Multipoint LSPs and Entropy Labels
77 9. Entropy Labels in Various Scenarios
78   9.1. LDP Tunnel
79   9.2. LDP Over RSVP-TE
80   9.3. MPLS Applications
81 10. Security Considerations
82 11. IANA Considerations
83   11.1. Reserved Label for ELI
84   11.2. LDP Entropy Label Capability TLV
85   11.3. BGP Entropy Label Capability Attribute
86   11.4. RSVP-TE Entropy Label Capability flag
87 12. Acknowledgments
88 13. References
89   13.1. Normative References
90   13.2. Informative References
91 Appendix A. Applicability of LDP Entropy Label Capability TLV
92 Authors' Addresses
94 1. Introduction 96 Load balancing, or multi-pathing, is an attempt to balance traffic 97 across a network by allowing the traffic to use multiple paths. Load 98 balancing has several benefits: it eases capacity planning; it can 99 help absorb traffic surges by spreading them across multiple paths; 100 it allows better resilience by offering alternate paths in the event 101 of a link or node failure. 103 As providers scale their networks, they use several techniques to 104 achieve greater bandwidth between nodes. Two widely used techniques 105 are: Link Aggregation Group (LAG) and Equal-Cost Multi-Path (ECMP). 106 LAG is used to bond together several physical circuits between two 107 adjacent nodes so they appear to higher-layer protocols as a single, 108 higher bandwidth 'virtual' pipe. ECMP is used between two nodes 109 separated by one or more hops, to allow load balancing over several 110 shortest paths in the network. This is typically obtained by 111 arranging IGP metrics such that there are several equal cost paths 112 between source-destination pairs.
Both of these techniques may, and 113 often do, co-exist in various parts of a given provider's network, 114 depending on various choices made by the provider. 116 A very important requirement when load balancing is that packets 117 belonging to a given 'flow' must be mapped to the same path, i.e., 118 the same exact sequence of links across the network. This is to 119 avoid jitter, latency and re-ordering issues for the flow. What 120 constitutes a flow varies considerably. A common example of a flow 121 is a TCP session. Other examples are an L2TP session corresponding 122 to a given broadband user, or traffic within an ATM virtual circuit. 124 To meet this requirement, a node uses certain fields, termed 'keys', 125 within a packet's header as input to a load balancing function 126 (typically a hash function) that selects the path for all packets in 127 a given flow. The keys chosen for the load balancing function depend 128 on the packet type; a typical set (for IP packets) is the IP source 129 and destination addresses, the protocol type, and (for TCP and UDP 130 traffic) the source and destination port numbers. An overly 131 conservative choice of fields may lead to many flows mapping to the 132 same hash value (and consequently poorer load balancing); an overly 133 aggressive choice may map a flow to multiple values, potentially 134 violating the above requirement. 136 For MPLS networks, most of the same principles (and benefits) apply. 137 However, finding useful keys in a packet for the purpose of load 138 balancing can be more of a challenge. In many cases, MPLS 139 encapsulation may require fairly deep inspection of packets to find 140 these keys at transit LSRs. 
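As an illustration of the load balancing function described in this section, the following minimal Python sketch hashes the typical IP key set (source and destination addresses, protocol type, and port numbers) and maps the result to one of several equal-cost paths. The names FlowKey and select_path, and the choice of SHA-256 as the hash, are assumptions made for this example only; they are not taken from this memo, and real forwarding planes typically use far cheaper hash functions.

```python
import hashlib
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical container for the typical IP load balancing keys."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def select_path(key: FlowKey, num_paths: int) -> int:
    """Map a flow to one of num_paths equal-cost paths.

    The mapping is deterministic, so every packet of a given flow
    is sent along the same path.
    """
    if num_paths < 1:
        raise ValueError("num_paths must be at least 1")
    digest = hashlib.sha256(repr(tuple(key)).encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % num_paths


flow = FlowKey("192.0.2.1", "198.51.100.7", 6, 49152, 443)
path = select_path(flow, 4)
```

Because the path index depends only on the keys, two packets with the same keys always take the same path, which is the flow-to-path requirement stated above. Dropping the port numbers from FlowKey would be an example of the overly conservative key choice mentioned in the text.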
142 One way to eliminate the need for this deep inspection is to have the 143 ingress LSR of an MPLS Label Switched Path extract the appropriate 144 keys from a given packet, input them to its load balancing function, 145 and place the result in an additional label, termed the 'entropy 146 label', as part of the MPLS label stack it pushes onto that packet. 148 The packet's entire MPLS label stack can then be used by transit LSRs 149 to perform load balancing, as the entropy label introduces the right 150 level of "entropy" into the label stack. 152 There are five key reasons why this is beneficial: 154 1. at the ingress LSR, MPLS encapsulation hasn't yet occurred, so 155 deep inspection is not necessary; 157 2. the ingress LSR has more context and information about incoming 158 packets than transit LSRs; 160 3. ingress LSRs usually operate at lower bandwidths than transit 161 LSRs, allowing them to do more work per packet; 163 4. transit LSRs do not need to perform deep packet inspection and 164 can load balance effectively using only a packet's MPLS label 165 stack; and 167 5. transit LSRs, not having the full context that an ingress LSR 168 does, have the hard choice between potentially misinterpreting 169 fields in a packet as valid keys for load balancing (causing 170 packet ordering problems) or adopting a conservative approach 171 (giving rise to sub-optimal load balancing). Entropy labels 172 relieve them of making this choice. 174 This memo describes why entropy labels are needed and defines the 175 properties of entropy labels; in particular how they are generated 176 and received, and the expected behavior of transit LSRs. Finally, it 177 describes in general how signaling works and what needs to be 178 signaled, as well as specifics for the signaling of entropy labels 179 for LDP ([RFC5036]), BGP ([RFC3107]), and RSVP-TE ([RFC3209]). 181 1.1.
Conventions used 183 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 184 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 185 document are to be interpreted as described in [RFC2119]. 187 The following acronyms are used: 189 BoS: Bottom of Stack 191 CE: Customer Edge device 193 ECMP: Equal Cost Multi-Path 195 EL: Entropy Label 197 ELC: Entropy Label Capability 199 ELI: Entropy Label Indicator 201 FEC: Forwarding Equivalence Class 203 LAG: Link Aggregation Group 205 LER: Label Edge Router 207 LSR: Label Switching Router 209 PE: Provider Edge Router 211 PHP: Penultimate Hop Popping 213 TC: Traffic Class 215 TTL: Time-to-Live 217 UHP: Ultimate Hop Popping 219 VPLS: Virtual Private LAN (Local Area Network) Service 221 VPN: Virtual Private Network 223 The term ingress (or egress) LSR is used interchangeably with ingress 224 (or egress) LER. The term application throughout the text refers to 225 an MPLS application (such as a VPN or VPLS). 227 A label stack (say of three labels) is denoted by <L1, L2, L3>, where 228 L1 is the "outermost" label and L3 the innermost (closest to the 229 payload). Packet flows are depicted left to right, and signaling is 230 shown right to left (unless otherwise indicated). 232 The term 'label' is used both for the entire 32-bit label and the 233 20-bit label field within a label. It should be clear from the context 234 which is meant. 236 1.2. Motivation 238 MPLS is a very successful generic forwarding substrate that transports 239 several dozen types of protocols, most notably: IP, PWE3, VPLS and IP 240 VPNs. Within each type of protocol, there typically exist several 241 variants, each with a different set of load balancing keys, e.g., for 242 IP: IPv4, IPv6, IPv6 in IPv4, etc.; for PWE3: Ethernet, ATM, 243 Frame-Relay, etc. There are also several different types of Ethernet over 244 PW encapsulation, ATM over PW encapsulation, etc.
Finally, 245 given the popularity of MPLS, it is likely that it will continue to 246 be extended to transport new protocols. 248 Currently, each transit LSR along the path of a given LSP has to try 249 to infer the underlying protocol within an MPLS packet in order to 250 extract appropriate keys for load balancing. Unfortunately, if the 251 transit LSR is unable to infer the MPLS packet's protocol (as is 252 often the case), it will typically use the topmost (or all) MPLS 253 labels in the label stack as keys for the load balancing function. 254 The result may be an extremely inequitable distribution of traffic 255 across equal-cost paths exiting that LSR. This is because MPLS 256 labels are generally fairly coarse-grained forwarding labels that 257 typically describe a next-hop, or provide some demultiplexing 258 and/or forwarding function, and do not describe the packet's 259 underlying protocol. 261 On the other hand, an ingress LSR (e.g., a PE router) has detailed 262 knowledge of a packet's contents, typically through a priori 263 configuration of the encapsulation(s) that are expected at a given 264 PE-CE interface (e.g., IPv4, IPv6, VPLS, etc.). Ingress LSRs also have more 265 flexible forwarding hardware. PE routers need this information and 266 these capabilities to: 268 a) apply the required services for the CE; 270 b) discern the packet's CoS forwarding treatment; 272 c) apply filters to forward or block traffic to/from the CE; 274 d) forward routing/control traffic to an onboard management 275 processor; and, 277 e) load-balance the traffic on their uplinks to transit LSRs (e.g., 278 P routers). 280 By knowing the expected encapsulation types, an ingress LSR 281 can apply a more specific set of payload parsing routines to extract 282 the keys appropriate for a given protocol. This allows for 283 significantly improved accuracy in determining the appropriate load 284 balancing behavior for each protocol.
286 If the ingress LSR were to capture the flow information so gathered 287 in a convenient form for downstream transit LSRs, transit LSRs could 288 remain completely oblivious to the contents of each MPLS packet, and 289 use only the captured flow information to perform load balancing. In 290 particular, there will be no reason to duplicate an ingress LSR's 291 complex packet/payload parsing functionality in a transit LSR. This 292 will result in less complex transit LSRs, enabling them to more 293 easily scale to higher forwarding rates, larger port density, lower 294 power consumption, etc. The idea in this memo is to capture this 295 flow information as a label, the so-called entropy label. 297 Ingress LSRs can also adapt more readily to new protocols and extract 298 the appropriate keys to use for load balancing packets of those 299 protocols. This means that deploying new protocols or services in 300 edge devices requires fewer concomitant changes in the core, 301 resulting in higher edge service velocity and at the same time more 302 stable core networks. 304 2. Approaches 306 There are two main approaches to encoding load balancing information 307 in the label stack. The first allocates multiple labels for a 308 particular Forwarding Equivalence Class (FEC). These labels are 309 equivalent in terms of forwarding semantics, but having multiple 310 labels allows flexibility in assigning labels to flows belonging to 311 the same FEC. This approach has the advantage that the label stack 312 has the same depth whether or not one uses label-based load 313 balancing; and so, consequently, there is no change to forwarding 314 operations on transit and egress LSRs. However, it has a major 315 drawback in that there is a significant increase in both signaling 316 and forwarding state. 318 The other approach encodes the load balancing information as an 319 additional label in the label stack, thus increasing the depth of the 320 label stack by one. 
With this approach, there is minimal change to 321 signaling state for a FEC; also, there is no change in forwarding 322 operations in transit LSRs, and no increase of forwarding state in 323 any LSR. The only purpose of the additional label is to increase the 324 entropy in the label stack, so this is called an "entropy label". 325 This memo focuses solely on this approach. 327 This latter approach uses upstream generated entropy labels, which 328 may conflict with downstream allocated application labels. There are 329 a few approaches to deal with this: 1) allocate a pair of labels for 330 each FEC, one that must have an entropy label below it, and one that 331 must not; 2) use a label (the "Entropy Label Indicator") to indicate 332 that the next label is an entropy label; and 3) allow entropy labels 333 only where there is no possible confusion. The first doubles control 334 and data plane state in the network; the last is too restrictive. 335 The approach taken here is the second. In making both the above 336 choices, the trade-off is to increase label stack depth rather than 337 control and data plane state in the network. 339 Finally, one may choose to associate ELs with MPLS tunnels (LSPs), or 340 with MPLS applications (e.g., VPNs). (What this entails is described 341 in later sections.) We take the former approach, for the following 342 reasons: 344 1. There are a small number of tunneling protocols for MPLS, but a 345 large and growing number of applications. Defining ELs on a 346 tunnel basis means simpler standards, lower development, 347 interoperability and testing efforts. 349 2. As a consequence, there will be much less churn in the network as 350 new applications (services) are defined and deployed. 352 3. Processing application labels in the data plane is more complex 353 than processing tunnel labels. Thus, it is preferable to burden 354 the latter rather than the former with EL processing. 356 4. 
Associating ELs with tunnels makes it simpler to deal with 357 hierarchy, be it LDP-over-RSVP-TE or Carrier's Carrier VPNs. 358 Each layer in the hierarchy can choose independently whether or 359 not they want ELs. 361 The cost of this approach is that ELIs will be mandatory; again, the 362 trade-off is the size of the label stack. To summarize, the net 363 increase in the label stack to use entropy labels is two: one 364 reserved label for the ELI, and the entropy label itself. 366 3. Entropy Labels and Their Structure 368 An entropy label (as used here) is a label: 370 1. that is not used for forwarding; 372 2. that is not signaled; and 374 3. whose only purpose in the label stack is to provide 'entropy' to 375 improve load balancing. 377 Entropy labels are generated by an ingress LSR, based entirely on 378 load balancing information. However, they MUST NOT have values in 379 the reserved label space (0-15) [IANA MPLS Label Values]. To ensure 380 that they are not used inadvertently for forwarding, entropy labels 381 SHOULD have a TTL of 0. The CoS field of an entropy label can be set 382 to any value deemed appropriate. 384 Since entropy labels are generated by an ingress LSR, an egress LSR 385 MUST be able to distinguish unambiguously between entropy labels and 386 application labels. This is accomplished by REQUIRING that the label 387 immediately preceding an entropy label (EL) in the MPLS label stack 388 be an 'entropy label indicator' (ELI). The ELI is a reserved label 389 with value (TBD by IANA). An ELI MUST have 'Bottom of Stack' (BoS) 390 bit = 0 ([RFC3032]). The TTL SHOULD be set to whatever value the 391 label above it in the stack has. The CoS field can be set to any 392 value deemed appropriate; typically, this will be the value in the 393 label above the ELI in the label stack. 395 Entropy labels are useful for pseudowires ([RFC4447]). 
396 [I-D.ietf-pwe3-fat-pw] explains how entropy labels can be used for 397 RFC 4447-style pseudowires, and thus is complementary to this memo, 398 which focuses on how entropy labels can be used for tunnels, and thus 399 for all other MPLS applications. 401 4. Data Plane Processing of Entropy Labels 403 4.1. Egress LSR 405 Suppose egress LSR Y is capable of processing entropy labels for a 406 tunnel. Y indicates this to all ingresses via signaling (see 407 Section 5). Y MUST be prepared to deal both with packets with an 408 imposed EL and those without; the ELI will distinguish these cases. 409 If a particular ingress chooses not to impose an EL, Y's processing 410 of the received label stack (which might be empty) is as if Y chose 411 not to accept ELs. 413 If an ingress X chooses to impose an EL, then Y will receive a tunnel 414 termination packet with label stack <TL, ELI, EL> <remaining packet 415 header>. Y recognizes TL as the label it distributed to its 416 upstreams for the tunnel, and pops it. (Note that TL may be the 417 implicit null label, in which case it doesn't appear in the label 418 stack.) Y then recognizes the ELI and pops two labels: the ELI and 419 the EL. Y then processes the remaining packet header as normal; this 420 may require further processing of tunnel termination, perhaps with 421 further ELI+EL pairs. When processing the final tunnel termination, 422 Y MAY enqueue the packet based on that tunnel TL's or ELI's TC value, 423 and MAY use the tunnel TL's or ELI's TTL to compute the TTL of the 424 remaining packet header. The EL's TTL MUST be ignored. 426 If any ELI processed by Y has its BoS bit set, Y MUST discard the packet, 427 and MAY log an error. The EL's BoS bit will indicate whether or not 428 there are more labels in the stack. 430 4.2. Ingress LSR 432 If an egress LSR Y indicates via signaling that it can process ELs on 433 a particular tunnel, an ingress LSR X can choose whether or not to 434 insert ELs for packets going into that tunnel. Y MUST handle both 435 cases.
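The way an egress LSR handles both cases (Section 4.1) can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the LabelEntry type, the egress_pop function, and the MalformedStack exception are hypothetical names, the label stack is modeled as a list with the outermost label first, and ELI_LABEL is only a placeholder because this memo leaves the ELI value to be assigned by IANA.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Placeholder only: the memo leaves the ELI value "TBD by IANA".
ELI_LABEL = 7


@dataclass
class LabelEntry:
    """Hypothetical model of one 32-bit label stack entry."""
    label: int
    tc: int = 0
    bos: bool = False
    ttl: int = 64


class MalformedStack(Exception):
    """Raised when the egress must discard the packet."""


def egress_pop(
    stack: List[LabelEntry], tunnel_label: Optional[int] = None
) -> Tuple[List[LabelEntry], Optional[int]]:
    """Terminate one tunnel level at the egress.

    tunnel_label is None when TL is the implicit null label and so
    does not appear in the stack. Returns the remaining labels and
    the EL value, or None if the ingress imposed no EL.
    """
    rest = list(stack)
    if tunnel_label is not None:
        if not rest or rest[0].label != tunnel_label:
            raise MalformedStack("expected tunnel label at top of stack")
        tl = rest.pop(0)
        if tl.bos:
            # TL was the last label, so no ELI can follow it.
            return [], None
    if rest and rest[0].label == ELI_LABEL:
        eli = rest.pop(0)
        if eli.bos or not rest:
            # An ELI with the BoS bit set means the packet is discarded.
            raise MalformedStack("ELI must not be at the bottom of the stack")
        el = rest.pop(0)  # the EL's TTL is ignored
        return rest, el.label
    return rest, None
```

Calling egress_pop again on the returned labels models the "further ELI+EL pairs" case for hierarchical tunnels.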
437 The steps that X performs to insert ELs are as follows: 439 1. On an incoming packet, identify the application to which the 440 packet belongs, and thereby pick the fields to input to the load 441 balancing function; call the output LB. 443 2. Determine the application label AL (if any). Push <AL> onto the 444 packet. 446 3. Based on the application, the load balancing output LB and other 447 factors, determine the egress LSR Y, the tunnel to Y, the 448 specific interface to the next hop, and thus the tunnel label TL. 449 Use LB to generate the entropy label EL. 451 4. If, for the chosen tunnel, Y has not indicated that it can 452 process ELs, push <TL> onto the packet. If Y has indicated that 453 it can process ELs for the tunnel, push <TL, ELI, EL> onto the 454 packet. X SHOULD put the same TTL and TC fields for the ELI as 455 it does for TL. The TTL for the EL MUST be zero. The TC for the 456 EL may be any value. 458 5. X then determines whether further tunnel hierarchy is needed; if 459 so, X goes back to step 3, possibly with a new egress Y for the 460 new tunnel. Otherwise, X is done, and sends out the packet. 462 Notes: 464 a. X computes load balancing information and generates the EL based 465 on the incoming application packet, even though the signaling of 466 EL capability is associated with tunnels. 468 b. X MAY insert several entropy labels in the stack (each, of 469 course, preceded by an ELI), potentially one for each 470 hierarchical tunnel, provided that the egress for that tunnel has 471 indicated that it can process ELs for that tunnel. 473 c. X MUST NOT include an entropy label for a given tunnel unless the 474 egress LSR Y has indicated that it can process entropy labels for 475 that tunnel. 477 d. The signaling and use of entropy labels in one direction 478 (signaling from Y to X, and data path from X to Y) is completely 479 independent of the signaling and use of entropy labels in the 480 reverse direction (signaling from X to Y, and data path from Y to 481 X).
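The ingress steps above, for a single tunnel level, can be sketched in Python as shown below. This is an illustrative sketch only. The function names, the way LB is folded into a 20-bit label outside the reserved range 0-15, and the ELI_LABEL placeholder (the memo leaves the ELI value to IANA) are all assumptions of this example. The 32-bit entry layout (20-bit label, 3-bit TC, 1-bit BoS, 8-bit TTL) follows [RFC3032].

```python
import struct
from typing import Optional

# Placeholder only: the memo leaves the ELI value "TBD by IANA".
ELI_LABEL = 7


def entropy_label_from(lb: int) -> int:
    """Fold a load balancing output into a 20-bit label above 0-15."""
    return 16 + lb % ((1 << 20) - 16)


def encode_entry(label: int, tc: int, bos: bool, ttl: int) -> bytes:
    """Encode one label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    word = (label << 12) | (tc << 9) | (int(bos) << 8) | ttl
    return struct.pack("!I", word)


def build_stack(
    tl: int,
    lb: int,
    egress_elc: bool,
    al: Optional[int] = None,
    tc: int = 0,
    ttl: int = 64,
) -> bytes:
    """Return the encoded stack <TL, ELI, EL, AL> or <TL, AL>.

    egress_elc says whether egress Y signaled that it can process ELs
    for this tunnel; without that, no ELI or EL is pushed.
    """
    entries = [(tl, tc, ttl)]
    if egress_elc:
        entries.append((ELI_LABEL, tc, ttl))            # same TC and TTL as TL
        entries.append((entropy_label_from(lb), 0, 0))  # EL TTL is zero
    if al is not None:
        entries.append((al, tc, ttl))
    last = len(entries) - 1
    # Only the final entry carries the BoS bit, so the ELI never does.
    return b"".join(
        encode_entry(label, c, i == last, t)
        for i, (label, c, t) in enumerate(entries)
    )


packet_labels = build_stack(tl=100, lb=0xABCDE, egress_elc=True, al=200)
```

For hierarchical tunnels (step 5), the same construction would be repeated for each additional tunnel whose egress has signaled ELC, with the new labels placed in front of the existing stack.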
483 4.3. Transit LSR 485 Transit LSRs MAY operate with no change in forwarding behavior. The 486 following are suggestions for optimizations that improve load 487 balancing, reduce the amount of packet data processed, and/or enhance 488 backward compatibility. 490 If a transit LSR recognizes the ELI, it MAY choose to load balance 491 solely on the following label (the EL); otherwise, it SHOULD use as 492 much of the whole label stack as feasible as keys for the load 493 balancing function, with the exception that reserved labels MUST NOT 494 be used. 496 Some transit LSRs look beyond the label stack for better load 497 balancing information. This is a simple, backward compatible 498 approach in networks where some ingress LSRs impose ELs and others 499 don't. However, this is of limited incremental value if an EL is 500 indeed present, and requires more packet processing from the LSR. A 501 transit LSR MAY choose to parse the label stack for the presence of 502 the ELI, and look beyond the label stack only if it does not find it, 503 thus retaining the old behavior when needed, yet avoiding unnecessary 504 work otherwise. 506 4.4. Penultimate Hop LSR 508 No change is needed at penultimate hop LSRs. 510 5. Signaling for Entropy Labels 512 An egress LSR Y can signal to ingress LSR(s) its ability to process 513 entropy labels (henceforth called "Entropy Label Capability" or ELC) 514 on a given tunnel. Note that Entropy Label Capability may be 515 asymmetric: if LSRs X and Y are at opposite ends of a tunnel, X may 516 be able to process entropy labels, whereas Y may not. The signaling 517 extensions below allow for this asymmetry. 519 For an illustration of signaling and forwarding with entropy labels, 520 see Section 9. 522 5.1. LDP Signaling 524 A new LDP TLV ([RFC5036]) is defined to signal an egress's ability to 525 process entropy labels. This is called the ELC TLV, and may appear 526 as an Optional Parameter of the Label Mapping Message.
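As a rough illustration of how small the ELC TLV is on the wire, the sketch below encodes and recognizes it using the general LDP TLV header layout (U bit, F bit, 14-bit Type, 16-bit Length) with the U and F bits set to 1 and a Length of 0, as specified in this section. The function names are hypothetical, and because the Type is still to be assigned by IANA, it is passed in as a parameter; the value 0x3F00 used in the usage line is an arbitrary placeholder, not an assigned code point.

```python
import struct


def encode_elc_tlv(tlv_type: int) -> bytes:
    """Encode an ELC TLV header with U=1, F=1, and Length=0.

    tlv_type is the 14-bit Type value; the memo leaves it TBD by IANA.
    """
    if not 0 <= tlv_type < (1 << 14):
        raise ValueError("TLV type must fit in 14 bits")
    first_half = (1 << 15) | (1 << 14) | tlv_type  # U bit, F bit, Type
    return struct.pack("!HH", first_half, 0)       # Length is 0


def is_elc_tlv(data: bytes, tlv_type: int) -> bool:
    """Return True if data starts with an ELC TLV of the given Type."""
    if len(data) < 4:
        return False
    first_half, length = struct.unpack("!HH", data[:4])
    return (first_half & 0x3FFF) == tlv_type and length == 0


PLACEHOLDER_TYPE = 0x3F00  # arbitrary stand-in for the unassigned Type
elc = encode_elc_tlv(PLACEHOLDER_TYPE)
```

Since the TLV carries no value, its mere presence in a Label Mapping Message is the signal; is_elc_tlv therefore only checks the Type and the zero Length.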
528 The presence of the ELC TLV in a Label Mapping Message indicates to 529 ingress LSRs that the egress LSR can process entropy labels for the 530 associated LDP tunnel. The ELC TLV has Type (TBD by IANA) and Length 531 0. 533 The structure of the ELC TLV is shown below.
535   0                   1                   2                   3
536   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
537  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
538  |U|F|        Type (TBD)         |          Length (0)           |
539  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
541 Figure 1: Entropy Label Capability TLV
543 where: 545 U: Unknown bit. This bit MUST be set to 1. If the Entropy Label 546 Capability TLV is not understood by the 547 receiver, then it MUST be ignored. 549 F: Forward bit. This bit MUST be set to 1. Since this 550 Capability TLV is going to be propagated hop-by-hop, the TLV 551 should be forwarded even by nodes that may not understand it. 553 Type: Type field. To be assigned by IANA. 555 Length: Length field. This field specifies the length in 556 octets of the Value field of the ELC TLV, and is currently defined to be 0. 558 5.2. BGP Signaling 560 When BGP [RFC4271] is used for distributing Network Layer 561 Reachability Information (NLRI) as described in, for example, 562 [RFC3107], the BGP UPDATE message may include the ELC attribute as 563 part of the Path Attributes. This is an optional, transitive BGP 564 attribute of type (to be assigned by IANA). The inclusion of this 565 attribute with an NLRI indicates that the advertising BGP router can 566 process entropy labels as an egress LSR for all routes in that NLRI. 568 A BGP speaker S that originates an UPDATE should include the ELC 569 attribute only if both of the following are true: 571 A1: S sets the BGP NEXT_HOP attribute to itself; AND 573 A2: S can process entropy labels. 575 Suppose a BGP speaker T receives an UPDATE U with the ELC attribute. 576 T has two choices.
T can simply re-advertise U with the ELC 577 attribute if either of the following is true: 579 B1: T does not change the NEXT_HOP attribute; OR 581 B2: T simply swaps labels without popping the entire label stack and 582 processing the payload below. 584 Route Reflectors are an example of the use of B1. 586 However, if T changes the NEXT_HOP attribute for U and in the data 587 plane pops the entire label stack to process the payload, T MAY 588 include an ELC attribute for UPDATE U' if both of the following are 589 true: 591 C1: T sets the NEXT_HOP attribute of U' to itself; AND 593 C2: T can process entropy labels. 595 Otherwise, T MUST remove the ELC attribute. 597 5.3. RSVP-TE Signaling 599 Entropy Label support is signaled in RSVP-TE [RFC3209] using the 600 Entropy Label Capability (ELC) flag in the Attribute Flags TLV of the 601 LSP_ATTRIBUTES object [RFC5420]. The presence of the ELC flag in a 602 Path message indicates that the ingress can process entropy labels in 603 the upstream direction; this only makes sense for a bidirectional LSP 604 and MUST be ignored otherwise. The presence of the ELC flag in a 605 Resv message indicates that the egress can process entropy labels in 606 the downstream direction. 608 The bit number for the ELC flag is to be assigned by IANA. 610 6. Operations, Administration, and Maintenance (OAM) and Entropy Labels 612 Generally, OAM comprises a set of functions operating in the data 613 plane to allow a network operator to monitor its network 614 infrastructure and to implement mechanisms in order to enhance the 615 general behavior and the level of performance of its network, e.g., 616 the efficient and automatic detection, localization, diagnosis and 617 handling of defects. 619 Currently defined OAM mechanisms for MPLS include LSP Ping/Traceroute 620 [RFC4379] and Bidirectional Forwarding Detection (BFD) for MPLS 621 [RFC5884].
The latter provides connectivity verification between the 622 endpoints of an LSP, and recommends establishing a separate BFD 623 session for every path between the endpoints. 625 The LSP traceroute procedures of [RFC4379] allow an ingress LSR to 626 obtain label ranges that can be used to send packets on every path to 627 the egress LSR. It works by having the ingress LSR sequentially ask the 628 transit LSRs along a particular path to a given egress LSR to return 629 a label range such that the inclusion of a label in that range in a 630 packet will cause the replying transit LSR to send that packet out 631 the egress interface for that path. The ingress provides the label 632 range returned by transit LSR N to transit LSR N + 1, which returns a 633 label range that is less than or equal in span to the range provided 634 to it. This process iterates until the penultimate transit LSR 635 replies to the ingress LSR with a label range that is acceptable to 636 it and to all LSRs along the path preceding it for forwarding a packet 637 along the path. 639 However, the LSP traceroute procedures do not specify where in the 640 label stack the value from the label range is to be placed, whether 641 deep packet inspection is allowed, and if so, which keys and key 642 values are to be used. 644 This memo updates LSP traceroute by specifying that the value from 645 the label range is to be placed in the entropy label. Deep packet 646 inspection is thus not necessary, although an LSR may use it, 647 provided it does so consistently, i.e., if the label range to go to a 648 given downstream LSR is computed with deep packet inspection, then 649 the data path should use the same approach and the same keys. 651 In order to have a BFD session on a given path, a value from the 652 label range for that path should be used as the EL value for BFD 653 packets sent on that path. 655 7.
MPLS-TP and Entropy Labels 657 Since MPLS-TP does not use ECMP, entropy labels are not applicable to 658 an MPLS-TP deployment. 660 8. Point-to-Multipoint LSPs and Entropy Labels 662 Point-to-Multipoint (P2MP) LSPs [RFC4875] typically do not use ECMP 663 for load balancing, as the combination of replication and 664 multipathing can lead to duplicate traffic delivery. However, P2MP 665 LSPs can traverse bundled links [RFC4201] and LAGs. In both these 666 cases, load balancing is useful, and hence entropy labels can be of 667 value for P2MP LSPs. 669 There is a potential complication with the use of entropy labels in 670 the context of P2MP LSPs, a consequence of the fact that the entire 671 label stack below the P2MP label must be the same for all egress 672 LSRs. This is that all egress LSRs must be willing to receive 673 entropy labels; if even one egress LSR is not willing, then entropy 674 labels MUST NOT be used for this P2MP LSP. 676 In this regard, the ingress LSR MUST keep track of the ability of 677 each egress LSR to process entropy labels, especially since the set 678 of egress LSRs of a given P2MP LSP may change over time. Whenever an 679 existing egress LSR leaves, or a new egress LSR joins the P2MP LSP, 680 the ingress MUST re-evaluate whether or not to include entropy labels 681 for the P2MP LSP. 683 In some cases, it may be feasible to deploy two P2MP LSPs, one to ELC 684 egress LSRs, and the other to the remaining non-ELC egress LSRs. 685 However, this requires more state in the network, more bandwidth, and 686 more operational overhead (tracking ELC LSRs, and provisioning P2MP 687 LSPs accordingly). Alternatively, an ingress LSR may choose to 688 signal two separate P2MP LSPs, one to ELC egresses, the other to non- 689 ELC egresses, trading off implementation complexity for operational 690 complexity. 692 9. Entropy Labels in Various Scenarios 694 This section describes the use of entropy labels in various 695 scenarios. 
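The P2MP rule stated in Section 8 (entropy labels may be used only if every egress LSR of the P2MP LSP is willing to receive them, with the decision re-evaluated whenever an egress joins or leaves) can be sketched as follows. This is a minimal illustration, not part of the specification: the class name P2mpLsp, its methods, and the egress identifiers are hypothetical, and the choice to disable entropy labels when no egress is attached is an assumption made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class P2mpLsp:
    """Hypothetical ingress-side state for a single P2MP LSP."""

    # Maps an egress LSR identifier to whether that egress signaled ELC.
    egress_elc: Dict[str, bool] = field(default_factory=dict)

    def use_entropy_labels(self) -> bool:
        # Entropy labels are allowed only if every current egress is ELC.
        # Assumption of this sketch: with no egress attached, do not use them.
        return bool(self.egress_elc) and all(self.egress_elc.values())

    def join(self, egress: str, elc: bool) -> bool:
        # A new egress joins; record its capability and re-evaluate.
        self.egress_elc[egress] = elc
        return self.use_entropy_labels()

    def leave(self, egress: str) -> bool:
        # An existing egress leaves; forget it and re-evaluate.
        self.egress_elc.pop(egress, None)
        return self.use_entropy_labels()
```

In this sketch, a single non-ELC egress joining the LSP turns entropy labels off for the whole LSP, and they can be turned back on once that egress leaves.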
697 In the figures below, the following conventions are used to depict 698 processing between X and Y. Note that control plane signaling goes 699 right to left, whereas data plane processing goes left to right. 701 Protocols 702 Y: <--- [L, E] Y signals L to X 703 X ------------- Y 704 LS: Label stack 705 X: + X pushes 706 Y: - Y pops 707 This means that Y signals to X label L for an LDP tunnel. E can be 708 one of: 710 0: meaning egress is NOT entropy label capable, or 712 1: meaning egress is entropy label capable. 714 The line with LS: shows the label stack on the wire. Below that is 715 the operation that each LSR does in the data plane, where + means 716 push the following label stack, - means pop the following label 717 stack, L~L' means swap L with L', and * means that the operation is 718 not depicted. 720 9.1. LDP Tunnel 722 The following illustrates several simple intra-AS LDP tunnels. The 723 first diagram shows ultimate hop popping (UHP) with ingress inserting 724 an EL, the second UHP with no ELs, the third PHP with ELs, and 725 finally, PHP with no ELs, but also with an application label AL 726 (which could, for example, be a VPN label). 728 Note that, in all the cases below, the MPLS application does not 729 matter; it may be that X pushes some more labels (perhaps for a VPN 730 or VPLS) below the ones shown, and Y pops them. 732 A: <--- [TL4, 1] 733 B: <-- [TL3, 1] 734 ... 735 W: <-- [TL1, 1] 736 Y: <-- [TL0, 1] 737 X --------------- A --------- B ... W ---------- Y 738 LS: 739 X: + 740 A: TL4~TL3 741 B: TL3~TL2 742 ... 743 W: TL1~TL0 744 Y: - 746 LDP with UHP; ingress inserts ELs 748 A: <--- [TL4, 1] 749 B: <-- [TL3, 1] 750 ... 751 W: <-- [TL1, 1] 752 Y: <-- [TL0, 1] 753 X --------------- A --------- B ... W ---------- Y 754 LS: 755 X: + 756 A: TL4~TL3 757 B: TL3~TL2 758 ... 759 W: TL1~TL0 760 Y: - 762 LDP with UHP; ingress does not insert ELs 764 A: <--- [TL4, 1] 765 B: <-- [TL3, 1] 766 ...
767 W: <-- [TL1, 1] 768 Y: <-- [3, 1] 769 X --------------- A --------- B ... W ---------- Y 770 X: + 771 A: TL4~TL3 772 B: TL3~TL2 773 ... 774 W: -TL1 775 Y: - 777 LDP with PHP; ingress inserts ELs 779 A: <--- [TL4, 1] 780 B: <-- [TL3, 1] 781 ... 782 W: <-- [TL1, 1] 783 Y: <-- [3, 1] 784 VPN: <------------------------------------------ [AL] 785 X --------------- A --------- B ... W ---------- Y 786 LS: 787 X: + 788 A: TL4~TL3 789 B: TL3~TL2 790 ... 791 W: -TL1 792 Y: - 793 LDP with PHP + VPN; ingress does not insert ELs 795 9.2. LDP Over RSVP-TE 797 The following illustrates "LDP over RSVP-TE" tunnels. X and Y are 798 the ingress and egress (respectively) of the LDP tunnel; A and W are 799 the ingress and egress of the RSVP-TE tunnel. It is assumed that 800 both the LDP and RSVP-TE tunnels have PHP. 802 LDP with ELs, RSVP-TE without ELs 803 LDP: <--- [L4, 1] <------- [L3, 1] <--- [3, 1] 804 RSVP-TE: <-- [Rn, 0] 805 <-- [3, 0] 806 X --------------- A --------- B ... W ---------- Y 807 LS: ... 808 DP: + L4~ * -L1 - 810 Figure 2: LDP over RSVP-TE Tunnels 812 9.3. MPLS Applications 814 An ingress LSR X must keep state per unicast tunnel as to whether the 815 egress for that tunnel can process entropy labels. X does not have 816 to keep state per application running over that tunnel. However, an 817 ingress PE can choose on a per-application basis whether or not to 818 insert ELs. For example, X may have an application for which it does 819 not wish to use ECMP (e.g., circuit emulation), or for which it does 820 not know which keys to use for load balancing (e.g., Appletalk over a 821 pseudowire). In either of those cases, X may choose not to insert 822 entropy labels, but may choose to insert entropy labels for an IP VPN 823 over the same tunnel. 825 10. 
Security Considerations 827 This document describes advertisement of the capability to support 828 receipt of entropy labels which an ingress LSR may insert in MPLS 829 packets in order to allow transit LSRs to attain better load 830 balancing across LAG and/or ECMP paths in the network. 832 This document does not introduce new security vulnerabilities to LDP, 833 BGP or RSVP-TE. Please refer to the Security Considerations section 834 of these protocols ([RFC5036], [RFC4271] and [RFC3209]) for security 835 mechanisms applicable to each. 837 Given that there is no end-user control over the values used for 838 entropy labels, there is little risk of Entropy Label forgery which 839 could cause uneven load-balancing in the network. 841 If Entropy Label Capability is not signaled from an egress PE to an 842 ingress PE, due to, for example, malicious configuration activity on 843 the egress PE, then the ingress PE will fall back to not using entropy labels 844 for load-balancing traffic over LAG or ECMP paths, which is in general 845 no worse than the behavior observed in current production networks. 846 That said, it is recommended that operators monitor changes to PE 847 configurations and, more importantly, the fairness of load 848 distribution over LAG or ECMP paths. If the fairness of load 849 distribution over a set of paths changes, that could indicate a 850 misconfiguration, bug or other non-optimal behavior on their PEs, and 851 they should take corrective action. 853 11. IANA Considerations 855 11.1. Reserved Label for ELI 857 IANA is requested to allocate a reserved label for the Entropy Label 858 Indicator (ELI) from the "Multiprotocol Label Switching Architecture 859 (MPLS) Label Values" Registry. 861 11.2. LDP Entropy Label Capability TLV 863 IANA is requested to allocate the next available value from the IETF 864 Consensus range in the LDP TLV Type Name Space Registry as the 865 "Entropy Label Capability TLV". 867 11.3.
BGP Entropy Label Capability Attribute 869 IANA is requested to allocate the next available Path Attribute Type 870 Code from the "BGP Path Attributes" registry as the "BGP Entropy 871 Label Capability Attribute". 873 11.4. RSVP-TE Entropy Label Capability flag 875 IANA is requested to allocate a new bit from the "Attribute Flags" 876 sub-registry of the "RSVP TE Parameters" registry. 878 Bit | Name | Attribute | Attribute | RRO 879 No | | Flags Path | Flags Resv | 880 ----+--------------------------+------------+------------+----- 881 TBD Entropy Label Capability Yes Yes No 883 12. Acknowledgments 885 We wish to thank Ulrich Drafz for his contributions, as well as the 886 entire 'hash label' team for their valuable comments and discussion. 888 Sincere thanks to Nischal Sheth for his many suggestions and 889 comments, and his careful reading of the document, especially with 890 regard to data plane processing of entropy labels. 892 13. References 894 13.1. Normative References 896 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 897 Requirement Levels", BCP 14, RFC 2119, March 1997. 899 [RFC3032] Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y., 900 Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack 901 Encoding", RFC 3032, January 2001. 903 [RFC3107] Rekhter, Y. and E. Rosen, "Carrying Label Information in 904 BGP-4", RFC 3107, May 2001. 906 [RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V., 907 and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP 908 Tunnels", RFC 3209, December 2001. 910 [RFC5420] Farrel, A., Papadimitriou, D., Vasseur, JP., and A. 911 Ayyangarps, "Encoding of Attributes for MPLS LSP 912 Establishment Using Resource Reservation Protocol Traffic 913 Engineering (RSVP-TE)", RFC 5420, February 2009. 915 13.2. Informative References 917 [I-D.ietf-pwe3-fat-pw] 918 Bryant, S., Filsfils, C., Drafz, U., Kompella, V., Regan, 919 J., and S. 
Amante, "Flow Aware Transport of Pseudowires 920 over an MPLS Packet Switched Network", 921 draft-ietf-pwe3-fat-pw-07 (work in progress), July 2011. 923 [RFC4201] Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling 924 in MPLS Traffic Engineering (TE)", RFC 4201, October 2005. 926 [RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway 927 Protocol 4 (BGP-4)", RFC 4271, January 2006. 929 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 930 Networks (VPNs)", RFC 4364, February 2006. 932 [RFC4379] Kompella, K. and G. Swallow, "Detecting Multi-Protocol 933 Label Switched (MPLS) Data Plane Failures", RFC 4379, 934 February 2006. 936 [RFC4447] Martini, L., Rosen, E., El-Aawar, N., Smith, T., and G. 937 Heron, "Pseudowire Setup and Maintenance Using the Label 938 Distribution Protocol (LDP)", RFC 4447, April 2006. 940 [RFC4761] Kompella, K. and Y. Rekhter, "Virtual Private LAN Service 941 (VPLS) Using BGP for Auto-Discovery and Signaling", 942 RFC 4761, January 2007. 944 [RFC4762] Lasserre, M. and V. Kompella, "Virtual Private LAN Service 945 (VPLS) Using Label Distribution Protocol (LDP) Signaling", 946 RFC 4762, January 2007. 948 [RFC4875] Aggarwal, R., Papadimitriou, D., and S. Yasukawa, 949 "Extensions to Resource Reservation Protocol - Traffic 950 Engineering (RSVP-TE) for Point-to-Multipoint TE Label 951 Switched Paths (LSPs)", RFC 4875, May 2007. 953 [RFC5036] Andersson, L., Minei, I., and B. Thomas, "LDP 954 Specification", RFC 5036, October 2007. 956 [RFC5586] Bocci, M., Vigoureux, M., and S. Bryant, "MPLS Generic 957 Associated Channel", RFC 5586, June 2009. 959 [RFC5884] Aggarwal, R., Kompella, K., Nadeau, T., and G. Swallow, 960 "Bidirectional Forwarding Detection (BFD) for MPLS Label 961 Switched Paths (LSPs)", RFC 5884, June 2010. 963 Appendix A. 
Applicability of LDP Entropy Label Capability TLV 965 In the case of unlabeled IPv4 (Internet) traffic, the Best Current 966 Practice is for an egress LSR to propagate eBGP learned routes within 967 a SP's Autonomous System after resetting the BGP next-hop attribute 968 to one of its Loopback IP addresses. That Loopback IP address is 969 injected into the Service Provider's IGP and, concurrently, a label 970 assigned to it via LDP. Thus, when an ingress LSR is performing a 971 forwarding lookup for a BGP destination it recursively resolves the 972 associated next-hop to a Loopback IP address and associated LDP label 973 of the egress LSR. 975 Thus, in the context of unlabeled IPv4 traffic, the LDP Entropy Label 976 Capability TLV will typically be applied only to the FEC for the 977 Loopback IP address of the egress LSR and the egress LSR need not 978 announce an entropy label capability for the eBGP learned route. 980 Authors' Addresses 982 Kireeti Kompella 983 Juniper Networks 984 1194 N. Mathilda Ave. 985 Sunnyvale, CA 94089 986 US 988 Email: kireeti@juniper.net 990 John Drake 991 Juniper Networks 992 1194 N. Mathilda Ave. 993 Sunnyvale, CA 94089 994 US 996 Email: jdrake@juniper.net 998 Shane Amante 999 Level 3 Communications, LLC 1000 1025 Eldorado Blvd 1001 Broomfield, CO 80021 1002 US 1004 Email: shane@level3.net 1006 Wim Henderickx 1007 Alcatel-Lucent 1008 Copernicuslaan 50 1009 2018 Antwerp 1010 Belgium 1012 Email: wim.henderickx@alcatel-lucent.com 1013 Lucy Yong 1014 Huawei USA 1015 5340 Legacy Dr. 1016 Plano, TX 75024 1017 US 1019 Email: lucy.yong@huawei.com