Network Working Group                                            S. Kini
Internet-Draft
Intended status: Standards Track                             K. Kompella
Expires: November 24, 2018                                       Juniper
                                                            S. Sivabalan
                                                                   Cisco
                                                            S. Litkowski
                                                                  Orange
                                                               R. Shakir
                                                                  Google
                                                             J. Tantsura
                                                            May 23, 2018


                    Entropy label for SPRING tunnels
                draft-ietf-mpls-spring-entropy-label-11

Abstract

   Segment Routing (SR) leverages the source routing paradigm. A node
   steers a packet through an ordered list of instructions, called
   segments. Segment Routing can be applied to the Multi Protocol Label
   Switching (MPLS) data plane. Entropy label (EL) is a technique used
   in MPLS to improve load-balancing. This document examines and
   describes how ELs are to be applied to Segment Routing MPLS.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 24, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Abbreviations and Terminology
   3.  Use-case requiring multipath load-balancing
   4.  Entropy Readable Label Depth
   5.  Maximum SID Depth
   6.  LSP stitching using the binding SID
   7.  Insertion of entropy labels for SPRING path
     7.1.  Overview
       7.1.1.  Example 1 where the ingress node has a sufficient MSD
       7.1.2.  Example 2 where the ingress node has not a sufficient
               MSD
     7.2.  Considerations for the placement of entropy labels
       7.2.1.  ERLD value
       7.2.2.  Segment type
         7.2.2.1.  Node-SID
         7.2.2.2.  Adjacency-set SID
         7.2.2.3.  Adjacency-SID representing a single IP link
         7.2.2.4.  Adjacency-SID representing a single link within a
                   L2 bundle
         7.2.2.5.  Adjacency-SID representing a L2 bundle
       7.2.3.  Maximizing number of LSRs that will load-balance
       7.2.4.  Preference for a part of the path
       7.2.5.  Combining criteria
   8.  A simple example algorithm
   9.  Deployment Considerations
   10. Options considered
     10.1.  Single EL at the bottom of the stack
     10.2.  An EL per segment in the stack
     10.3.  A re-usable EL for a stack of tunnels
     10.4.  EL at top of stack
     10.5.  ELs at readable label stack depths
   11. Acknowledgements
   12. Contributors
   13. IANA Considerations
   14. Security Considerations
   15. References
     15.1.  Normative References
     15.2.  Informative References
   Authors' Addresses

1.  Introduction

   Segment Routing [I-D.ietf-spring-segment-routing] is based on source
   routed tunnels to steer a packet along a particular path. This path
   is encoded as an ordered list of segments. When applied to the MPLS
   dataplane [I-D.ietf-spring-segment-routing-mpls], each segment is an
   LSP with an associated MPLS label value. Hence, label stacking is
   used to represent the ordered list of segments, and the label stack
   associated with an SR tunnel can be seen as nested LSPs (LSP
   hierarchy) in the MPLS architecture.

   Using label stacking to encode the list of segments has implications
   on the label stack depth.

   Traffic load-balancing over ECMP (Equal Cost MultiPath) or LAGs (Link
   Aggregation Groups) is usually based on a hashing function. The
   local node that performs the load-balancing is required to read some
   header fields in the incoming packets and then compute a hash based
   on these fields. The result of the hash is finally mapped to a list
   of outgoing nexthops. The hashing technique is required to perform
   per-flow load-balancing and thus prevent packet disordering.
   For IP traffic, the usual fields that are looked up are the source
   address, the destination address, the protocol type, and, if the
   upper layer is TCP or UDP, the source port and destination port can
   be added to the hash as well.

   The MPLS architecture brings some challenges for load-balancing, as
   an LSR (Label Switch Router) should be able to look at header fields
   that are beyond the MPLS label stack. An LSR must perform a deeper
   inspection compared to an ingress router, which could be challenging
   for some hardware. Entropy label (EL) [RFC6790] is a technique used
   in the MPLS data plane to provide entropy for load-balancing. The
   idea behind the entropy label is that the ingress router computes a
   hash based on several fields from a given packet and places the
   result in an additional label, named "entropy label". This entropy
   label can then be used as part of the hash keys used by an LSR.
   Using the entropy label in the hash keys reduces the need for deep
   packet inspection in the LSR while keeping a good level of entropy in
   the load-balancing. When the entropy label is used, the keys used in
   the hashing functions are still a local configuration matter, and an
   LSR may use solely the entropy label or a combination of multiple
   fields from the incoming packet.

   When using LSP hierarchies, there are implications on how [RFC6790]
   should be applied. The current document addresses the case where a
   hierarchy is created at a single LSR as required by Segment Routing.

   A use-case requiring load-balancing with SR is given in Section 3. A
   recommended solution is described in Section 7, keeping in
   consideration the limitations of implementations when applying
   [RFC6790] to deeper label stacks. Options that were considered to
   arrive at the recommended solution are documented for historical
   purposes in Section 10.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Abbreviations and Terminology

   EL - Entropy Label

   ELI - Entropy Label Identifier

   ELC - Entropy Label Capability

   ERLD - Entropy Readable Label Depth

   SR - Segment Routing

   ECMP - Equal Cost Multi Path

   LSR - Label Switch Router

   MPLS - Multiprotocol Label Switching

   MSD - Maximum SID Depth

   SID - Segment Identifier

   RLD - Readable Label Depth

   OAM - Operation, Administration and Maintenance

3.  Use-case requiring multipath load-balancing

                         +------+
                         |      |
                 +-------|  P3  |-----+
                 | +-----|      |---+ |
               L3| |L4   +------+ L1| |L2     +----+
                 | |                | |    +--| P4 |--+
   +-----+     +-----+            +-----+  |  +----+  |  +-----+
   |  S  |-----| P1  |------------| P2  |--+          +--|  D  |
   |     |     |     |            |     |--+          +--|     |
   +-----+     +-----+            +-----+  |  +----+  |  +-----+
                                           +--| P5 |--+
                                              +----+

     S=Source LSR, D=Destination LSR, P1,P2,P3,P4,P5=Transit LSRs,
                           L1,L2,L3,L4=Links

                 Figure 1: Traffic engineering use-case

   Traffic-engineering is one of the applications of MPLS and is also a
   requirement for source routed tunnels with label stacks [RFC7855].
   Consider the topology shown in Figure 1. The LSR S requires data to
   be sent to LSR D along a traffic-engineered path that goes over the
   link L1. Good load-balancing is also required across equal cost
   paths (including parallel links). To engineer traffic along a path
   that takes link L1, the label stack that LSR S creates consists of a
   label to the node SID of LSR P3, stacked over the label for the
   adjacency SID of link L1, which in turn is stacked over the label
   to the node SID of LSR D.
   For simplicity, let's assume that all LSRs use the same label space
   (SRGB) for source routed label stacks. Let L_N-Px denote the label
   to be used to reach the node SID of LSR Px. Let L_A-Ln denote the
   label used for the adjacency SID for link Ln. The LSR S must use the
   label stack <L_N-P3, L_A-L1, L_N-D> for traffic-engineering.
   However, to achieve good load-balancing over the equal cost paths
   P2-P4-D, P2-P5-D and the parallel links L3, L4, a mechanism such as
   Entropy labels [RFC6790] should be adapted for source routed label
   stacks. Indeed, the SPRING architecture with the MPLS dataplane
   ([I-D.ietf-spring-segment-routing-mpls]) uses nested MPLS LSPs
   composing the source routed label stacks.

   An MPLS node may have limitations in the number of labels it can
   push. It may also have a limitation in the number of labels it can
   inspect when looking for hash keys during load-balancing. While the
   entropy label is normally inserted at the bottom of the transport
   tunnel, this may prevent an LSR from taking the EL into account in
   its load-balancing function if the EL is too deep in the stack. In a
   segment routing environment, it is important to define the
   considerations that need to be taken into account when inserting an
   EL. Multiple ways to apply entropy labels were considered and are
   documented in Section 10 along with their trade-offs. A recommended
   solution is described in Section 7.

4.  Entropy Readable Label Depth

   The Entropy Readable Label Depth (ERLD) is defined as the number of
   labels a router can both:

   a.  Read in an MPLS packet received on its incoming interface(s)
       (starting from the top of the stack).

   b.  Use in its load-balancing function.

   An ERLD of N means that the router will perform load-balancing using
   the EL if the EL is placed within the first N labels of the stack.

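   As an illustration only, the check implied by this definition can be
   sketched in Python. The label stack is modeled as a list of label
   values with the top of the stack first. The function names
   el_position and el_within_erld, and the list representation, are
   assumptions of this sketch and are not defined by this document.
   The ELI is the reserved label value 7 from [RFC6790].

```python
ELI = 7  # Entropy Label Indicator, reserved label value (RFC 6790)


def el_position(stack):
    """Return the 1-based position of the first EL in `stack`, or None.

    `stack` lists label values from the top (index 0) to the bottom.
    The EL is the label that immediately follows the ELI.
    """
    for i, label in enumerate(stack[:-1]):
        if label == ELI:
            return i + 2  # the ELI is at position i + 1, the EL follows
    return None


def el_within_erld(stack, erld):
    """True if the first EL sits within the `erld` topmost labels."""
    pos = el_position(stack)
    return pos is not None and pos <= erld


# Hypothetical stacks: one transport label, then ELI/EL ...
shallow = [16, ELI, 123456]
# ... and four transport labels, then ELI/EL.
deep = [16, 20, 30, 40, ELI, 123456]
```

   With these stacks, el_within_erld(shallow, 3) is true, whereas
   el_within_erld(deep, 3) is false because the EL is the sixth label.
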
   A router capable of reading N labels but not using an EL located
   within those N labels MUST consider its ERLD to be 0. In a
   distributed switching architecture, each linecard may have a
   different capability in terms of ERLD. For simplicity, an
   implementation MAY use the minimum ERLD among its linecards as the
   ERLD value for the system.

   Examples:

                                                       | Payload  |
                                                       +----------+
                                          | Payload  | |    EL    | P7
                                          +----------+ +----------+
                             | Payload  | |    EL    | |   ELI    |
                             +----------+ +----------+ +----------+
                | Payload  | |    EL    | |   ELI    | | Label 50 |
                +----------+ +----------+ +----------+ +----------+
   | Payload  | |    EL    | |   ELI    | | Label 40 | | Label 40 |
   +----------+ +----------+ +----------+ +----------+ +----------+
   |    EL    | |   ELI    | | Label 30 | | Label 30 | | Label 30 |
   +----------+ +----------+ +----------+ +----------+ +----------+
   |   ELI    | | Label 20 | | Label 20 | | Label 20 | | Label 20 |
   +----------+ +----------+ +----------+ +----------+ +----------+
   | Label 16 | | Label 16 | | Label 16 | | Label 16 | | Label 16 | P1
   +----------+ +----------+ +----------+ +----------+ +----------+
     Packet 1     Packet 2     Packet 3     Packet 4     Packet 5

                 Figure 2: Label stacks with ELI/EL

   In Figure 2, we consider that the displayed packets are received on
   a router interface and that the router has a single ERLD value.

   o  If the router has an ERLD of 3, it will be able to load-balance
      Packet 1 displayed in Figure 2 using the EL as part of the load-
      balancing keys. The ERLD value of 3 means that the router can
      read and take into account the entropy label for load-balancing if
      it is placed between position 1 (top) and position 3.

   o  If the router has an ERLD of 5, it will be able to load-balance
      Packets 1 to 3 in Figure 2 using the EL as part of the load-
      balancing keys. Packets 4 and 5 have the EL placed at a position
      greater than 5, so the router is not able to read it and use it as
      part of the load-balancing keys.

   o  If the router has an ERLD of 10, it will be able to load-balance
      all the packets displayed in Figure 2 using the EL as part of the
      load-balancing keys.

   To allow efficient load-balancing based on entropy labels, a router
   running SPRING SHOULD advertise its ERLD (or ERLDs), so that all the
   other SPRING routers in the network are aware of its capability.
   How this advertisement is done is outside the scope of this document.

   To advertise an ERLD value, a SPRING router:

   o  MUST be entropy label capable and, as a consequence, MUST apply
      the dataplane procedures defined in [RFC6790].

   o  MUST be able to read an ELI/EL which is located within its ERLD
      value.

   o  MUST take into account this EL in its load-balancing function.

5.  Maximum SID Depth

   The Maximum SID Depth defines the maximum number of labels that a
   particular node can impose on a packet. This includes any kind of
   labels (service, entropy, transport...). In an MPLS network, the MSD
   is a limit of the Ingress LSR (I-LSR) or any stitching node that
   would perform an imposition of additional labels on an existing label
   stack.

   Depending on the number of MPLS operations (POP, SWAP...) to be
   performed before the PUSH, the MSD may vary due to hardware or
   software limitations. As for the ERLD, there may also be different
   MSD limits based on the linecard type used in a distributed switching
   system.

   When an external controller is used to program a label stack on a
   particular node, this node MAY advertise its MSD value or a subset of
   its MSD value to the controller. How this advertisement is done is
   outside the scope of this document.
   As the controller does not have the knowledge of the entire label
   stack to be pushed by the node, the node may advertise an MSD value
   which is lower than its actual limit. This gives the controller the
   ability to program a label stack up to the advertised MSD value
   while leaving room for the local node to add more labels (e.g.,
   service, entropy, transport...) without reaching the
   hardware/software limit.

              P7 ---- P8 ---- P9
             /                  \
   PE1 --- P1 --- P2 --- P3 --- P4 --- P5 --- P6 --- PE2
                                       | \             |
    ---->                              P10 \           |
   IP Pkt                              |     \         |
                                       P11 --- P12 --- P13
                                           100    10000

                                Figure 3

   In Figure 3, an IP packet comes into the MPLS network at PE1. All
   metrics are considered equal to 1 except P12-P13 which is 10000 and
   P11-P12 which is 100. PE1 wants to steer the traffic using a SPRING
   path to PE2 along
   PE1->P1->P7->P8->P9->P4->P5->P10->P11->P12->P13->PE2. By using
   adjacency SIDs only, PE1 (acting as an I-LSR) will be required to
   push 10 labels on the IP packet received and thus requires an MSD of
   10. If the IP packet should be carried over an MPLS service like a
   regular layer 3 VPN, an additional service label should be imposed,
   requiring an MSD of 11 for PE1. In addition, if PE1 wants to insert
   an ELI/EL for load-balancing purposes, PE1 will need to push 13
   labels on the IP packet, requiring an MSD of 13.

   In the SPRING architecture, Node SIDs or Binding SIDs can be used to
   reduce the label stack size. As an example, to steer the traffic on
   the same path as before, PE1 may be able to use the following label
   stack: . In this example we
   consider a combination of Node SIDs and a Binding SID advertised by
   P5 that will stitch the traffic along the path P10->P11->P12->P13.
   The instruction associated with the binding SID at P5 is thus to swap
   Binding_P5 to Adj_P12-P13 and then push . Since P5
   acts as a stitching node that pushes additional labels on an existing
   label stack, P5's MSD also needs to be taken into account and may
   limit the number of labels that could be imposed.

6.  LSP stitching using the binding SID

   The binding SID allows binding a segment identifier to an existing
   LSP. As examples, the binding SID can represent an RSVP-TE tunnel,
   an LDP path (through the mapping server advertisement), or a SPRING
   path. Each LSP associated with a binding SID has its own entropy
   label capability.

   In Figure 3, we consider that:

   o  P6, PE2, P10, P11, P12, P13 are pure LDP routers.

   o  PE1, P1, P2, P3, P4, P7, P8, P9 are pure SPRING routers.

   o  P5 is running SPRING and LDP.

   o  P5 acts as a mapping server and advertises Prefix SIDs for the LDP
      FECs: an index value of 20 is used for PE2.

   o  All SPRING routers use an SRGB of [1000, 1999].

   o  P6 advertises label 20 for the PE2 FEC.

   o  Traffic from PE1 to PE2 uses the shortest path.

          PE1 ----- P1 -- P2 -- P3 -- P4 ---- P5 --- P6 --- PE2

   -->        +----+                    +----+  +----+ +----+
   IP Pkt     | IP |                    | IP |  | IP | | IP |
              +----+                    +----+  +----+ +----+
              |1020|                    |1020|  | 20 |
              +----+                    +----+  +----+
                           SPRING                LDP

   In terms of packet forwarding, by learning the mapping-server
   advertisement from P5, PE1 imposes label 1020 on an IP packet
   destined to PE2. SPRING routers along the shortest path to PE2
   will switch the traffic until it reaches P5, which will perform the
   LSP stitching. P5 will swap the SPRING label 1020 to the LDP label
   20 advertised by the nexthop P6. P6 will then forward the packet
   using the LDP label towards PE2.

   PE1 cannot push an ELI/EL for the binding SID without knowing that
   the tail-end of the LSP associated with the binding (PE2) is entropy
   label capable.

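   The label values of this example follow from the SRGB and the
   advertised index. The following Python sketch, given for
   illustration only, recomputes them. The function names sr_label and
   stitch, and the dictionary standing in for the forwarding state of
   P5, are assumptions of this sketch.

```python
def sr_label(srgb, index):
    """Map a Prefix-SID index to a label inside the SRGB.

    `srgb` is an inclusive (low, high) label range.
    """
    low, high = srgb
    label = low + index
    if not low <= label <= high:
        raise ValueError("index %d falls outside the SRGB" % index)
    return label


def stitch(stack, swap_table):
    """Swap the top label of `stack` as a stitching node would.

    `stack` lists labels from top to bottom; `swap_table` maps an
    incoming top label to the outgoing label.
    """
    top, rest = stack[0], stack[1:]
    return [swap_table[top]] + rest


SRGB = (1000, 1999)
pe2_label = sr_label(SRGB, 20)   # label imposed by PE1 for PE2
p5_swaps = {pe2_label: 20}       # SPRING label -> LDP label from P6
after_p5 = stitch([pe2_label], p5_swaps)
```

   Here pe2_label evaluates to 1020 and after_p5 to [20], matching the
   labels shown in the packet diagram.
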
   To accommodate the mix of signaling protocols involved during the
   stitching, the entropy label capability SHOULD be propagated between
   the signaling domains. Each binding SID SHOULD have its own entropy
   label capability that MUST be inherited from the entropy label
   capability of the associated LSP. If the router advertising the
   binding SID does not know the ELC state of the target FEC, it MUST
   NOT set the ELC for the binding SID. An ingress node MUST NOT push
   an ELI/EL associated with a binding SID unless this binding SID has
   the entropy label capability. How the entropy label capability is
   advertised for a binding SID is outside the scope of this document.

   In our example, if PE2 is LDP entropy label capable, it will add the
   entropy label capability in its LDP advertisement. When P5 receives
   the FEC/label binding for PE2, it learns about the ELC and can set
   the ELC in the mapping server advertisement. Thus PE1 learns about
   the ELC of PE2 and may push an ELI/EL associated with the binding
   SID.

   The proposed solution only works if the SPRING router advertising the
   binding SID is also performing the dataplane LSP stitching. In our
   example, if the mapping server function is hosted on P8 instead of
   P5, P8 does not know about the ELC state of PE2's LDP FEC. As a
   consequence, it does not set the ELC for the associated binding SID.

7.  Insertion of entropy labels for SPRING path

7.1.  Overview

   The solution described in this section follows the dataplane
   processing defined in [RFC6790]. Within a SPRING path, a node may be
   ingress, egress, transit (regarding the entropy label processing
   described in [RFC6790]), or it can be any combination of those. For
   example:

   o  The ingress node of a SPRING domain may be an ingress node from an
      entropy label perspective.

   o  Any LSR terminating a segment of the SPRING path is an egress node
      (because it terminates the segment) but may also be a transit node
      if the SPRING path is not terminated because there is a subsequent
      SPRING MPLS label in the stack.

   o  Any LSR processing a binding SID may be a transit node and an
      ingress node (because it may push additional labels when
      processing the binding SID).

   As described earlier, an LSR may have a limitation, ERLD, on the
   depth of the label stack that it can read and process in order to do
   multipath load-balancing based on entropy labels.

   If an EL does not occur within the ERLD of an LSR in the label stack
   of an MPLS packet that it receives, then it would lead to poor load-
   balancing at that LSR. Hence an ELI/EL pair must be within the ERLD
   of the LSR in order for the LSR to use the EL during load-balancing.

   Adding a single ELI/EL pair for the entire SPRING path may also lead
   to poor load-balancing, because the ELI/EL may not occur within the
   ERLD of some LSRs on the path (if it is too deep) or may no longer be
   present in the stack when it reaches some LSRs (if it is too
   shallow).

   In order for the EL to occur within the ERLD of LSRs along the path
   corresponding to a SPRING label stack, multiple ELI/EL pairs MAY
   be inserted in this label stack.

   The insertion of the ELI/EL SHOULD occur only with a SPRING label
   advertised by an LSR that advertised an ERLD (the LSR is entropy
   label capable) or with a SPRING label associated with a binding SID
   that has the ELC set.

   The ELs among multiple ELI/EL pairs inserted in the stack MAY be
   the same or different. The LSR that inserts ELI/EL pairs MAY have
   limitations on the number of such pairs that it can insert and also
   the depth at which it can insert them.
   If, due to limitations, the inserted ELs are at positions such that
   an LSR along the path receives an MPLS packet without an EL in the
   label stack within that LSR's ERLD, then the load-balancing
   performed by that LSR would be poor. An implementation MAY consider
   multiple criteria when inserting ELI/EL pairs.

7.1.1.  Example 1 where the ingress node has a sufficient MSD

                    ECMP           LAG           LAG
   PE1 --- P1 --- P2 --- P3 --- P4 --- P5 --- P6 --- PE2

                                Figure 4

   In Figure 4, PE1 wants to forward some MPLS VPN traffic over an
   explicit path to PE2 resulting in the following label stack to be
   pushed onto the received IP header: . PE1 can push a maximum of 11
   labels (MSD=11). P2, P3 and P6 have an ERLD of 3 while the others
   have an ERLD of 10.

   PE1 can only add two ELI/EL pairs in the label stack due to its MSD
   limitation. It should insert them strategically to benefit load-
   balancing along the longest part of the path.

   PE1 may take into account multiple parameters when inserting ELs, as
   examples:

   o  The ERLD value advertised by transit nodes.

   o  The requirement of load-balancing for a particular label value.

   o  Any service provider preference: favor beginning of the path or
      end of the path.

   In Figure 4, a good strategy may be to use the following stack
   . The original stack requests P2 to forward
   based on a L3 adjacency set that will require load-balancing.
   Therefore it is important to ensure that P2 can load-balance
   correctly. As P2 has a limited ERLD of 3, the ELI/EL must be
   inserted just next to the label that P2 will use to forward. On the
   path to PE2, P3 also has a limited ERLD, but P3 will forward based
   on a basic adjacency segment that may require no load-balancing.
   Therefore it does not seem important to ensure that P3 can do
   load-balancing despite its limited ERLD. The next nodes along the
   forwarding path have a high ERLD that does not cause any issue,
   except P6. Moreover, P6 uses LAGs to PE2 and so is expected to load-
   balance. It therefore becomes important to insert a new ELI/EL just
   next to the P6 forwarding label.

   In the case above, the ingress node had enough label push capacity to
   ensure end-to-end load-balancing taking into account the path
   attributes. There might be some cases where the ingress node does
   not have the necessary label imposition capacity.

7.1.2.  Example 2 where the ingress node has not a sufficient MSD

                    ECMP           LAG          ECMP          ECMP
   PE1 --- P1 --- P2 --- P3 --- P4 --- P5 --- P6 --- P7 --- P8 --- PE2

                                Figure 5

   In Figure 5, PE1 wants to forward MPLS VPN traffic over an
   explicit path to PE2 resulting in the following label stack to be
   pushed onto the IP header: . PE1 can push a maximum of 11 labels.
   P2, P3 and P6 have an ERLD of 3 while the others have an ERLD of 15.

   Using a similar strategy as in the previous case may lead to a
   dilemma, as PE1 can only push a single ELI/EL while we may need a
   minimum of three to load-balance the end-to-end path. An optimized
   stack that would enable end-to-end load-balancing may be: .

   A decision needs to be taken to favor some part of the path for load-
   balancing, considering that load-balancing may not work on the other
   part. A service provider may decide to place the ELI/EL after the P6
   forwarding label as it will allow P4 and P6 to load-balance. Placing
   the ELI/EL at the bottom of the stack is also a possibility, enabling
   load-balancing for P4 and P8.

7.2.  Considerations for the placement of entropy labels

   The sample cases described in the previous section showed that
   placing the ELI/EL when the maximum number of labels to be pushed is
   limited is not an easy decision and multiple criteria may be taken
   into account.

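   The budget reasoning used in the two examples reduces to simple
   arithmetic: each ELI/EL pair consumes two labels out of the MSD of
   the ingress node. A minimal Python sketch follows, for illustration
   only. The function name max_eli_el_pairs is hypothetical, and the
   stack depths of 7 and 9 labels are assumptions chosen so that the
   results match the two pairs and the single pair stated in the
   examples.

```python
def max_eli_el_pairs(msd, stack_depth):
    """Number of ELI/EL pairs that still fit within the MSD.

    `stack_depth` counts the labels (transport and service) that must
    be pushed anyway. Each ELI/EL pair costs two additional labels.
    """
    if stack_depth > msd:
        raise ValueError("the label stack alone exceeds the MSD")
    return (msd - stack_depth) // 2


# Assumed depths, not taken from this document: 7 labels and 9 labels.
pairs_example_1 = max_eli_el_pairs(msd=11, stack_depth=7)
pairs_example_2 = max_eli_el_pairs(msd=11, stack_depth=9)
```

   Under these assumptions pairs_example_1 is 2 and pairs_example_2 is
   1, which is why the second example forces a choice between parts of
   the path.
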
   This section describes some considerations that could be taken into
   account when placing ELI/ELs. This list of criteria is not
   considered as exhaustive and an implementation MAY take into account
   additional criteria or tie-breakers that are not documented here.

   An implementation SHOULD try to maximize the load-balancing where
   multiple ECMP paths are available and minimize the number of EL/ELIs
   that need to be inserted. In case of a trade-off, an implementation
   MAY provide flexibility to the operator to select the criteria to be
   considered when placing EL/ELIs or the sub-objective for which to
   optimize.

                         2      2
   PE1 -- P1 -- P2 --P3 --- P4 --- P5 -- ... -- P8 -- P9 -- PE2
                     |             |
                     P3'--- P4'--- P5'

                                Figure 6

   The figure above will be used as reference in the following
   subsections. All metrics are equal to 1, except P3-P4 and P4-P5
   which have a metric 2.

7.2.1.  ERLD value

   As mentioned in Section 7.1, the ERLD value is an important parameter
   to consider when inserting ELI/EL. If an ELI/EL does not fall within
   the ERLD of a node on the path, the node will not be able to load-
   balance the traffic efficiently.

   The ERLD value can be advertised via protocols and those extensions
   are described in separate documents [I-D.ietf-isis-mpls-elc] and
   [I-D.ietf-ospf-mpls-elc].

   Let's consider a path from PE1 to PE2 using the following stack
   pushed by PE1: .

   Using the ERLD as an input parameter may help to minimize the number
   of required ELI/EL pairs to be inserted. An ERLD value must be
   retrieved for each SPRING label in the label stack.

   For a label bound to an adjacency segment, the ERLD is the ERLD of
   the node that advertised the adjacency segment. In the example
   above, the ERLD associated with Adj_P1P2 would be the ERLD of router
   P1 as P1 will perform the forwarding based on the Adj_P1P2 label.

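   The adjacency-segment rule can be sketched as a simple lookup, shown
   here in Python for illustration only. The function name
   erld_for_adjacency_label, the two dictionaries, the ERLD value of 4
   for P1, and the choice to treat a node that advertised no ERLD as
   having an ERLD of 0 are all assumptions of this sketch.

```python
def erld_for_adjacency_label(label, advertised_by, node_erld):
    """ERLD to associate with a label bound to an adjacency segment.

    `advertised_by` maps a label name to the node that advertised the
    adjacency segment; `node_erld` maps a node to its advertised ERLD.
    A node without an advertised ERLD is treated as ERLD 0 (assumption).
    """
    node = advertised_by[label]
    return node_erld.get(node, 0)


# Hypothetical inputs: P1 advertised Adj_P1P2 and an ERLD of 4.
advertised_by = {"Adj_P1P2": "P1"}
node_erld = {"P1": 4}
adj_erld = erld_for_adjacency_label("Adj_P1P2", advertised_by, node_erld)
```

   With these inputs adj_erld is 4, the ERLD of P1, since P1 is the node
   that forwards based on the Adj_P1P2 label.
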
   For a label bound to a node segment, multiple strategies MAY be
   implemented. An implementation may try to evaluate the minimum ERLD
   value along the node segment path. If an implementation cannot find
   the minimum ERLD along the path of the segment, it can use the ERLD
   of the starting node instead. In the example above, if the
   implementation supports computation of the minimum ERLD along the
   path, the ERLD associated with label Node_P9 would be the minimum
   ERLD among nodes {P2,P3,P4 ..., P8}. If an implementation does not
   support the computation of the minimum ERLD, it should consider the
   ERLD of P2 (the starting node that will forward based on the Node_P9
   label).

   For a label bound to a binding segment, if the binding segment
   describes a path, an implementation may also try to evaluate the
   minimum ERLD along this path. If the implementation cannot find the
   minimum ERLD along the path of the segment, it can use the ERLD of
   the starting node instead.

7.2.2.  Segment type

   Depending on the type of segment a particular label is bound to, an
   implementation may deduce that this particular label will be subject
   to load-balancing on the path.

7.2.2.1.  Node-SID

   An MPLS label bound to a Node-SID represents a path that may cross
   multiple hops. Load-balancing may be needed on the node starting
   this path but also on any node along the path.

   In Figure 6, let's consider a path from PE1 to PE2 using the
   following stack pushed by PE1: .

   If, for example, PE1 is limited to push 6 labels, it can add a single
   ELI/EL within the label stack. An operator may want to favor a
   placement that would allow load-balancing along the Node-SID path.
   In the figure above, P3, which is along the Node-SID path, requires
   load-balancing on two equal-cost paths.

   An implementation may try to evaluate if load-balancing is really
   required within a node segment path.
This could be done by running an additional SPT computation and analysis of the node segment path to prevent a node segment that does not really require load-balancing from being preferred when placing EL/ELIs. Such inspection may be time-consuming for implementations and offers no 100% guarantee, as a node segment path may use a LAG that is invisible from the IP topology. A simpler approach would be to consider that a label bound to a Node-SID will be subject to load-balancing and requires an EL/ELI.

7.2.2.2.  Adjacency-set SID

An adjacency-set is an adjacency SID that refers to a set of adjacencies. When an adjacency-set segment is used within a label stack, an implementation can deduce that load-balancing is expected at the node that advertised this adjacency segment. An implementation could then favor this particular label value when placing ELI/ELs.

7.2.2.3.  Adjacency-SID representing a single IP link

When an adjacency segment representing a single IP link is used within a label stack, an implementation can deduce that load-balancing may not be expected at the node that advertised this adjacency segment.

The implementation could then decide to place ELI/ELs to favor LSRs other than the one advertising this adjacency segment.

Readers should note that an adjacency segment representing a single IP link may still require load-balancing. This is the case when a LAG (L2 bundle) is implemented between two IP nodes and the L2 bundle SR extensions [I-D.ietf-isis-l2bundles] are not implemented. In such a case, it may be useful to insert an EL/ELI in a readable position for the LSR advertising the label associated with the adjacency segment.

7.2.2.4.  Adjacency-SID representing a single link within a L2 bundle

When L2 bundle SR extensions [I-D.ietf-isis-l2bundles] are used, adjacency segments may be advertised for each member of the bundle.
In this case, an implementation can deduce that load-balancing is not expected on the LSR advertising this segment and could then decide to place ELI/ELs to favor LSRs other than the one advertising this adjacency segment.

7.2.2.5.  Adjacency-SID representing a L2 bundle

When L2 bundle SR extensions [I-D.ietf-isis-l2bundles] are used, an adjacency segment may be advertised to represent the bundle. In this case, an implementation can deduce that load-balancing is expected on the LSR advertising this segment and could then decide to place ELI/ELs to favor this LSR.

7.2.3.  Maximizing number of LSRs that will load-balance

When placing ELI/ELs, an implementation may try to maximize the number of LSRs that both need to load-balance (i.e., have ECMP paths) and that will be able to perform load-balancing (i.e., the EL is within their ERLD).

Let's consider a path from PE1 to PE2 using the following stack pushed by PE1: . All routers have an ERLD of 10, except P1 and P2, which have an ERLD of 4. PE1 is able to push 6 labels, so only a single ELI/EL can be added.

In the example above, adding the ELI/EL next to Adj_P1P2 will only allow load-balancing at P1, while inserting it next to Adj_PE2P9 will allow load-balancing at P2, P3, ..., P9, maximizing the number of LSRs that could perform load-balancing.

7.2.4.  Preference for a part of the path

An implementation may propose to favor a part of the end-to-end path when the number of EL/ELIs that can be pushed is not enough to cover the entire path. As an example, a service provider may want to favor load-balancing at the beginning of the path or at the end of the path, so the implementation should prefer putting the ELI/ELs near the top or near the bottom of the stack.

7.2.5.  Combining criteria

An implementation can combine multiple criteria to determine the best EL/ELI placement.
However, combining too many criteria may lead to implementation complexity and high resource consumption. Each time the network topology changes, a new evaluation of the EL/ELI placement will be necessary for each impacted LSP.

8.  A simple example algorithm

A simple implementation might take into account ERLD when placing ELI/EL while trying to minimize the number of EL/ELIs inserted and trying to maximize the number of LSRs that can load-balance.

The example algorithm is based on the following considerations:

o  An LSR that is limited in the number of ELI/EL pairs that it can insert SHOULD insert such pairs deeper in the stack.

o  An LSR should try to insert ELI/EL pairs at positions so that for the maximum number of transit LSRs, the EL occurs within the ERLD of those LSRs.

o  An LSR should try to insert the minimum number of such pairs while trying to satisfy the above criteria.

The pseudocode of the example algorithm is shown below.

   Initialize the current EL insertion point to the
     bottommost label in the stack that is EL-capable
   while (local-node can push more ELI/EL pairs OR
           insertion point is not above label stack) {
       insert an ELI/EL pair below current insertion point
       move new insertion point up from current insertion point until
           ((last inserted EL is below the ERLD) AND (ERLD > 2)
             AND
            (new insertion point is EL-capable))
       set current insertion point to new insertion point
   }

   Figure 7: Example algorithm to insert ELI/EL pairs in a label stack

When this algorithm is applied to the example described in Section 3, it will result in ELs being inserted in two positions, one below the label L_N-D and another below L_N-P3. Thus the resulting label stack would be

9.
Deployment Considerations

As long as LSR node dataplane capabilities are limited (number of labels that can be pushed, or number of labels that can be inspected), hop-by-hop load-balancing of SPRING encapsulated flows will require trade-offs.

The entropy label is still a good and usable solution, as it allows load-balancing without having to perform deep packet inspection on each LSR: it does not seem reasonable to have an LSR inspecting UDP ports within a GRE tunnel carried over a 15-label SPRING tunnel.

Due to the limited capacity for reading a deep stack of MPLS labels, multiple EL/ELIs may be required within the stack, which directly impacts the capacity of the head-end to push a deep stack: each EL/ELI inserted requires two additional labels to be pushed.

Placement strategies of EL/ELIs are required to find the best trade-off. Multiple criteria may be taken into account, and some level of customization (by the user) may be required to accommodate the different deployments. Analyzing the path of each destination to determine the best EL/ELI placement may be time-consuming for the control plane; we encourage implementations to find the best trade-off between simplicity, resource consumption, and load-balancing efficiency.

In the future, hardware and software capacity may increase dataplane capabilities and may remove some of these limitations, increasing load-balancing capability using entropy labels.

10.  Options considered

Different options that were considered to arrive at the recommended solution are documented in this section.

These options are detailed here only for historical purposes.

10.1.  Single EL at the bottom of the stack

In this option, a single EL is used for the entire label stack. The source LSR S encodes the entropy label at the bottom of the label stack.
In the example described in Section 3, the resulting label stack at LSR S would look like . Note that the notation in [RFC6790] is used to describe the label stack. An issue with this approach is that as the label stack grows due to an increase in the number of SIDs, the EL goes correspondingly deeper in the label stack. Hence, transit LSRs have to access a larger number of bytes in the packet header when making forwarding decisions. In the example described in Section 3, if we consider that the LSR P1 has an ERLD of 3, P1 would load-balance traffic poorly on the parallel links L3 and L4 since the EL is below the ERLD of P1. A load-balanced network design using this approach must ensure that all intermediate LSRs have the capability to read the maximum label stack depth as required for the application that uses source routed stacking.

This option was rejected since there exist a number of hardware implementations that have a low maximum readable label depth. Choosing this option can lead to a loss of load-balancing using EL in a significant part of the network, even though load-balancing is a critical requirement in a service-provider network.

10.2.  An EL per segment in the stack

In this option, each segment/label in the stack can be given its own EL. When load-balancing is required to direct traffic on a segment, the source LSR pushes an ELI/EL pair before pushing the label associated with this segment. In the example described in Section 3, the source LSR S encoded label stack would be where all the ELs can be the same. Accessing the EL at an intermediate LSR is independent of the depth of the label stack and hence independent of the specific application that uses source routed tunnels with label stacking. A drawback is that the depth of the label stack grows significantly, to almost 3 times the number of labels in the label stack.
The network design should ensure that source LSRs have the capability to push such a deep label stack. Also, the bandwidth overhead and potential MTU issues of deep label stacks should be considered in the network design.

This option was rejected due to the existence of hardware implementations that can push only a limited number of labels on the label stack. Choosing this option would result in a hardware requirement to push two additional labels per tunnel label. It would therefore restrict the number of tunnels that can be stacked in an LSP and hence constrain the types of LSPs that can be created. This was considered unacceptable.

10.3.  A re-usable EL for a stack of tunnels

In this option, an LSR that terminates a tunnel re-uses the EL of the terminated tunnel for the next inner tunnel. It does this by storing the EL from the outer tunnel when that tunnel is terminated and re-inserting it below the next inner tunnel label during the label swap operation. The LSR that stacks tunnels should insert an EL below the outermost tunnel. It should not insert ELs for any inner tunnels. Also, the penultimate hop LSR of a segment must not pop the ELI and EL even though they are exposed as the top labels, since the terminating LSR of that segment would re-use the EL for the next segment.

In Section 3 above, the source LSR S encoded label stack would be . At P1, the outgoing label stack would be after it has load-balanced to one of the links L3 or L4. At P3 the outgoing label stack would be . At P2, the outgoing label stack would be and it would load-balance to one of the nexthop LSRs P4 or P5. Accessing the EL at an intermediate LSR (e.g., P1) is independent of the depth of the label stack and hence independent of the specific use-case to which the label stack is applied.
This option was rejected due to the significant change in label swap operations that would be required for existing hardware.

10.4.  EL at top of stack

A slight variant of the re-usable EL option is to keep the EL at the top of the stack rather than below the tunnel label. In this case, each LSR that is not terminating a segment should continue to keep the received EL at the top of the stack when forwarding the packet along the segment. An LSR that terminates a segment should use the EL from the terminated segment at the top of the stack when forwarding onto the next segment.

This option was rejected due to the significant change in label swap operations that would be required for existing hardware.

10.5.  ELs at readable label stack depths

In this option, the source LSR inserts ELs for tunnels in the label stack at depths such that each LSR along the path that must load-balance is able to access at least one EL. Note that the source LSR may have to insert multiple ELs in the label stack at different depths for this to work, since intermediate LSRs may have differing capabilities in accessing the depth of a label stack. The label stack depth access value of intermediate LSRs must be known to create such a label stack. How this value is determined is outside the scope of this document. This value can be advertised using a protocol such as an IGP.

Applying this method to the example in Section 3 above, if LSR P1 needs to have the EL within a depth of 4, then the source LSR S encoded label stack would be where all the ELs would typically have the same value.

In the case where the ERLD has different values along the path and the LSR that is inserting ELI/EL pairs has no limit on how many pairs it can insert, and it knows the appropriate positions in the stack where they should be inserted, this option is the same as the recommended solution in Section 7.
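As a rough illustration of the condition this option relies on (every LSR that must load-balance can access at least one EL), the check below reports which LSRs would not find an EL within their readable depth. It is only a sketch: the function names, the placeholder labels T1 to T3, the stacks seen at each LSR, and the depth values are all assumptions made for the example, not values from Section 3.

```python
from typing import Dict, List, Sequence

ELI = "ELI"  # stands in for the Entropy Label Indicator; all labels here are placeholders


def el_within_depth(stack: Sequence[str], depth: int) -> bool:
    """True if some ELI and the EL right below it both sit in the top `depth` labels."""
    for index, label in enumerate(stack):
        el_position = index + 2  # 1-based position of the EL that follows this ELI
        if label == ELI and el_position <= len(stack) and el_position <= depth:
            return True
    return False


def lsrs_without_readable_el(stack_at_lsr: Dict[str, Sequence[str]],
                             depth_by_lsr: Dict[str, int]) -> List[str]:
    """List the LSRs whose readable depth does not reach any EL."""
    return [lsr for lsr, stack in stack_at_lsr.items()
            if not el_within_depth(stack, depth_by_lsr[lsr])]


# Hypothetical label stacks as seen by two transit LSRs (top of stack first).
stack_at_lsr = {
    "P1": ["T1", "T2", ELI, "EL", "T3", ELI, "EL"],
    "P2": ["T3", ELI, "EL"],
}

print(lsrs_without_readable_el(stack_at_lsr, {"P1": 4, "P2": 4}))
print(lsrs_without_readable_el(stack_at_lsr, {"P1": 3, "P2": 4}))
```

With a readable depth of 4 at P1 the first EL is reachable and no LSR is reported; lowering P1's depth to 3 leaves P1 without a readable EL, which is the situation the source LSR has to avoid when choosing insertion depths.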
Note that a refinement of this solution, which balances the number of pushed labels against the desired entropy, is the solution described in Section 7.

11.  Acknowledgements

The authors would like to thank John Drake, Loa Andersson, Curtis Villamizar, Greg Mirsky, Markus Jork, Kamran Raza, Carlos Pignataro, Bruno Decraene, Chris Bowers and Nobo Akiya for their review comments and suggestions.

12.  Contributors

Xiaohu Xu
Huawei
Email: xuxiaohu@huawei.com

Wim Henderickx
Nokia
Email: wim.henderickx@nokia.com

Gunter Van De Velde
Nokia
Email: gunter.van_de_velde@nokia.com

Acee Lindem
Cisco
Email: acee@cisco.com

13.  IANA Considerations

This memo includes no request to IANA. Note to RFC Editor: Remove this section before publication.

14.  Security Considerations

This document does not introduce any new security considerations beyond those already listed in [RFC6790].

15.  References

15.1.  Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC6790] Kompella, K., Drake, J., Amante, S., Henderickx, W., and L. Yong, "The Use of Entropy Labels in MPLS Forwarding", RFC 6790, DOI 10.17487/RFC6790, November 2012.

[RFC7855] Previdi, S., Ed., Filsfils, C., Ed., Decraene, B., Litkowski, S., Horneffer, M., and R. Shakir, "Source Packet Routing in Networking (SPRING) Problem Statement and Requirements", RFC 7855, DOI 10.17487/RFC7855, May 2016.

[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

[I-D.ietf-spring-segment-routing] Filsfils, C., Previdi, S., Ginsberg, L., Decraene, B., Litkowski, S., and R.
Shakir, "Segment Routing Architecture", draft-ietf-spring-segment-routing-15 (work in progress), January 2018.

[I-D.ietf-spring-segment-routing-mpls] Bashandy, A., Filsfils, C., Previdi, S., Decraene, B., Litkowski, S., and R. Shakir, "Segment Routing with MPLS data plane", draft-ietf-spring-segment-routing-mpls-13 (work in progress), April 2018.

15.2.  Informative References

[I-D.ietf-isis-mpls-elc] Xu, X., Kini, S., Sivabalan, S., Filsfils, C., and S. Litkowski, "Signaling Entropy Label Capability and Readable Label-stack Depth Using IS-IS", draft-ietf-isis-mpls-elc-03 (work in progress), January 2018.

[I-D.ietf-ospf-mpls-elc] Xu, X., Kini, S., Sivabalan, S., Filsfils, C., and S. Litkowski, "Signaling Entropy Label Capability and Readable Label-stack Depth Using OSPF", draft-ietf-ospf-mpls-elc-05 (work in progress), January 2018.

[I-D.ietf-isis-l2bundles] Ginsberg, L., Bashandy, A., Filsfils, C., Nanduri, M., and E. Aries, "Advertising L2 Bundle Member Link Attributes in IS-IS", draft-ietf-isis-l2bundles-07 (work in progress), May 2017.

Authors' Addresses

Sriganesh Kini
EMail: sriganeshkini@gmail.com

Kireeti Kompella
Juniper
EMail: kireeti@juniper.net

Siva Sivabalan
Cisco
EMail: msiva@cisco.com

Stephane Litkowski
Orange
EMail: stephane.litkowski@orange.com

Rob Shakir
Google
EMail: rjs@rob.sh

Jeff Tantsura
EMail: jefftant@gmail.com