idnits 2.17.1

draft-kaliraj-idr-multinexthop-attribute-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 9 instances of too long lines in the document, the longest one
     being 23 characters in excess of 72.

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (28 December 2021) is 849 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'MPLS-NAMESPACES' is mentioned on line 863, but not
     defined

  == Missing Reference: 'FLWSPC-REDIR-IP' is mentioned on line 858, but not
     defined

  == Missing Reference: 'SRTE-COLOR-ONLY' is mentioned on line 868, but not
     defined

  == Missing Reference: 'ADDPATH-GUIDELINES' is mentioned on line 849, but
     not defined

  == Missing Reference: 'RFC3032' is mentioned on line 580, but not defined

  == Missing Reference: 'BGP-CT' is mentioned on line 854, but not defined

  ** Obsolete normative reference: RFC 3392 (Obsoleted by RFC 5492)

     Summary: 2 errors (**), 0 flaws (~~), 8 warnings (==), 1 comment (--).
     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    Network Working Group                                   K. Vairavakkalai
3    Internet-Draft                                              M. Jeyananth
4    Intended status: Standards Track                  Juniper Networks, Inc.
5    Expires: 1 July 2022                                           G. Mishra
6                                                 Verizon Communications Inc.
7                                                            28 December 2021

9                           BGP MultiNexthop attribute
10                 draft-kaliraj-idr-multinexthop-attribute-02

12   Abstract

14      Today, a BGP speaker can advertise one nexthop for a set of NLRIs in
15      an Update.  This nexthop can be encoded in either the BGP-Nexthop
16      attribute (code 3), or inside the MP_REACH attribute (code 14).

18      For cases where multiple nexthops need to be advertised, BGP-Addpath
19      is used.  Though Addpath provides a basic ability to advertise
20      multiple nexthops, it does not allow the sender to specify the desired
21      relationship between the multiple nexthops being advertised, e.g.,
22      relative preference or type of load-balancing.  These are local
23      decisions at the receiving speaker based on local configuration and
24      path-selection between the various additional-paths, which may tie-
25      break on some arbitrary step like Router-Id or BGP nexthop address.

27      Some scenarios with a BGP-free core may benefit from having a
28      mechanism where the egress node can signal multiple nexthops, along
29      with their relationship, in one BGP route to ingress nodes.  This
30      document defines a new BGP attribute "MultiNexthop (MNH)" that can be
31      used for this purpose.

33      This attribute can be used for both labeled and unlabeled BGP
34      families.  The MNH can be used to advertise an MPLS label along with
35      the nexthop for unlabeled families (e.g. Inet Unicast, Inet6 Unicast),
36      so that mechanisms at the transport layer can work uniformly on
37      labeled and unlabeled BGP families.  Service route scale can be
38      confined closer to the service edge nodes, making the transport layer
39      nodes light and nimble.
        They don't have any service route state; they only
40      have service end-point state.

42      The MNH plays a different role in the "downstream allocation" scenario
43      than in the "upstream allocation" scenario.  E.g. for RFC8277 families
44      that advertise downstream allocated labels, the MNH can play the "Label
45      Descriptor" role, describing the forwarding semantics of the label
46      being advertised.  This can be useful in network visualization and
47      controller based traffic engineering (e.g. EPE).

49   Requirements Language

51      The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
52      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
53      document are to be interpreted as described in RFC 2119 [RFC2119].

55   Status of This Memo

57      This Internet-Draft is submitted in full conformance with the
58      provisions of BCP 78 and BCP 79.

60      Internet-Drafts are working documents of the Internet Engineering
61      Task Force (IETF).  Note that other groups may also distribute
62      working documents as Internet-Drafts.  The list of current Internet-
63      Drafts is at https://datatracker.ietf.org/drafts/current/.

65      Internet-Drafts are draft documents valid for a maximum of six months
66      and may be updated, replaced, or obsoleted by other documents at any
67      time.  It is inappropriate to use Internet-Drafts as reference
68      material or to cite them other than as "work in progress."

70      This Internet-Draft will expire on 1 July 2022.

72   Copyright Notice

74      Copyright (c) 2021 IETF Trust and the persons identified as the
75      document authors.  All rights reserved.

77      This document is subject to BCP 78 and the IETF Trust's Legal
78      Provisions Relating to IETF Documents (https://trustee.ietf.org/
79      license-info) in effect on the date of publication of this document.
80      Please review these documents carefully, as they describe your rights
81      and restrictions with respect to this document.
        Code Components
82      extracted from this document must include Revised BSD License text as
83      described in Section 4.e of the Trust Legal Provisions and are
84      provided without warranty as described in the Revised BSD License.

86   Table of Contents

88      1.  Introduction
89      2.  Use-cases examples
90        2.1.  Optimal forwarding exit-points signaling to
91              ingress-node
92        2.2.  Choosing a received label based on its forwarding-semantic
93              at advertising node
94        2.3.  Signaling desired forwarding behavior when installing MPLS
95              Upstream labels at receiving node
96        2.4.  Load-balancing over EBGP parallel links
97        2.5.  Flowspec routes with multiple Redirect-IP nexthops
98        2.6.  Color-Only resolution nexthop
99      3.  The "MultiNexthop (MNH)" BGP attribute encoding
100       3.1.  Operations
101         3.1.1.  BGP Capability for MNH attribute
102         3.1.2.  Scope of use, and propagation
103         3.1.3.  Interaction of MNH with Nexthop (in attr-code 3,
104                 14)
105         3.1.4.  Interaction with Addpath
106         3.1.5.  Path-selection considerations
107         3.1.6.  NH-Flags U bit, denoting upstream/downstream
108                 semantics
109       3.2.  Nexthop Forwarding Semantics TLV
110       3.3.  Nexthop-Leg Descriptor TLV
111       3.4.  Nexthop Attributes Sub-TLV
112         3.4.1.  IP Address
113         3.4.2.  Labeled IP nexthop
114         3.4.3.  Transport Class ID (Color)
115         3.4.4.  Available Bandwidth
116         3.4.5.  Load balance factor
117         3.4.6.  Forwarding-context name
118         3.4.7.  Forwarding-context Route-Target
119     4.  Error handling procedures
120     5.  Scaling considerations
121     6.  IANA Considerations
122     7.  Security Considerations
123     8.  Acknowledgements
124     9.  References
125       9.1.  Normative References
126       9.2.  References
127     Authors' Addresses

129  1.  Introduction

131     Today, a BGP speaker can advertise one nexthop for a set of NLRIs in
132     an Update.  This nexthop can be encoded in either the top-level BGP-
133     Nexthop attribute (code 3), or inside the MP_REACH attribute (code
134     14).

136     For cases where multiple nexthops need to be advertised, BGP-Addpath
137     is used.  Though Addpath provides a basic ability to advertise
138     multiple nexthops, it does not allow the sender to specify the desired
139     relationship between the multiple nexthops being advertised, e.g.,
140     relative ordering, type of load-balancing, or fast-reroute.  These are
141     local decisions at the receiving node based on local configuration and
142     path-selection between the various additional-paths, which may tie-
143     break on some arbitrary step like Router-Id or BGP nexthop address.

145     Some scenarios with a BGP-free core may benefit from having a
146     mechanism where the egress node can signal multiple nexthops, along
147     with their relationship, to ingress nodes.
        This document defines a new BGP
148     attribute "MultiNexthop (MNH)" that can be used for this purpose.

150     This attribute can be used for both labeled and unlabeled BGP
151     families.  The MNH can be used to advertise an MPLS label along with
152     the nexthop for unlabeled families (e.g. Inet Unicast, Inet6 Unicast),
153     so that mechanisms at the transport layer can work uniformly on
154     labeled and unlabeled BGP families.  Service route scale can be
155     confined closer to the service edge nodes, making the transport layer
156     nodes light and nimble.  They don't have any service route state; they
157     only have service end-point state.

159     The MNH plays a different role in the "downstream allocation" scenario
160     than in the "upstream allocation" scenario.  E.g. for RFC8277 families
161     that advertise downstream allocated labels, the MNH can play the "Label
162     Descriptor" role, describing the forwarding semantics of the label
163     being advertised.  This can be useful in network visualization and
164     controller based traffic engineering (e.g. EPE).

166     A new BGP capability ([RFC3392]) called "MultiNexthop (MNH)" is
167     defined with type code: IANA TBD.  This capability is used to express
168     the ability to send and receive the MNH attribute.

170  2.  Use-cases examples

172  2.1.  Optimal forwarding exit-points signaling to ingress-node

174     In a BGP-free core, one can dynamically signal to the ingress node
175     how traffic should be load-balanced towards a set of exit nodes, in
176     one BGP route containing this attribute.

178     For example, for prefix1, perform equal-cost load-balancing towards
179     exit nodes A and B, whereas for prefix2, perform unequal-cost
180     load-balancing (40%, 30%, 30%) towards exit nodes A, B, C.

182     As another example, for prefix1, use PE1 as the primary nexthop and
183     PE2 as a backup nexthop.

185  2.2.  Choosing a received label based on its forwarding-semantic at
186        advertising node

188     In the Downstream label allocation case, the MNH plays the role of
189     "Label descriptor" and describes the forwarding treatment given to the
190     label at the advertising speaker.  The receiving speaker can benefit
191     from this information as in the following examples:

193     - For a Prefix, a label with FRR enabled nexthop-set can be preferred
194     to another label with a nexthop-set that doesn't provide FRR.

196     - For a Prefix, a label pointing to 10g nexthop can be preferred to
197     another label pointing to a 1g nexthop.

199     - Set of labels advertised can be aggregated, if they have same
200     forwarding semantics (e.g. VPN per-prefix-label case).

202  2.3.  Signaling desired forwarding behavior when installing MPLS
203        Upstream labels at receiving node

205     In the Upstream label allocation case, the receiving speaker's
206     forwarding-state can be controlled by the advertising speaker, thus
207     enabling a standardized API to program desired MPLS forwarding-state
208     at the receiving node.  This is described in [MPLS-NAMESPACES].

210  2.4.  Load-balancing over EBGP parallel links

212     Consider N parallel links between two EBGP speakers.  There are
213     different models possible to do load balancing over these links:

215        N single-hop EBGP sessions over the N links.  Interface addresses
216        are used as next-hops.  N copies of the RIB are exchanged to form
217        N-way ECMP paths.  The routes advertised on the N sessions can be
218        attached with Link bandwidth community to perform weighted ECMP.

220        1 multi-hop EBGP session between loopback addresses, reachable via
221        static route over the N links.  Loopback addresses are used as
222        next-hops.  1 copy of the RIB is exchanged with loopback address as
223        nexthop.  And a static route can be configured to the loopback
224        address to perform desired N-way ECMP path.
           M loopbacks are
225        configured in this model, to achieve M different load balancing
226        schemes: ECMP, weighted ECMP, Fast-reroute enabled paths etc.

228        1 multi-hop EBGP session between loopback addresses, reachable via
229        static route over the N links.  Interface addresses are used as
230        next-hops, without using additional loopbacks.  1 copy of the RIB
231        is exchanged with MNH attribute to form N-way ECMP paths, weighted
232        ECMP, Fast-reroute backup paths etc.  BFD may be used towards these
233        directly connected BGP nexthops to detect liveness.

235  2.5.  Flowspec routes with multiple Redirect-IP nexthops

237     There is existing protocol machinery which can benefit from the
238     ability of MNH to clearly specify fallback behavior when multiple
239     nexthops are involved.  One example is the scenario described in
240     [FLWSPC-REDIR-IP] where multiple Redirect-to-IP nexthop addresses
241     exist for a Flowspec prefix.  In such a scenario, the receiving
242     speakers may redirect the traffic to different nexthops, based on
243     variables like IGP-cost.  If the MNH were instead used to specify the
244     Redirect-to-IP nexthops, the order of preference between the
245     different nexthops could be clearly specified using one Flowspec route
246     carrying an MNH that lists those nexthop addresses in the desired
247     preference order.  Irrespective of IGP-cost, the receiving speakers
248     would then redirect the flow towards the same traffic collector
249     device.

251  2.6.  Color-Only resolution nexthop

253     Another existing protocol mechanism, which manufactures nexthop
254     addresses from an overloaded extended color community, is specified in
255     [SRTE-COLOR-ONLY].  In a way, the color field is overloaded to carry
256     one anycast BGP next-hop with pre-specified fallback options.  This
257     approach gives us only two next-hops to play with.
        The 'BGP nexthop
258     address' and the 'Color-only nexthop'.

260     Instead, the MNH could be used to achieve the same result with more
261     flexibility.  Multiple BGP nexthops can be carried, each resolving
262     over a desired Transport class (Color), and with customizable
263     fallback order.  And the solution will work for non-SRTE networks as
264     well.

266  3.  The "MultiNexthop (MNH)" BGP attribute encoding

268     "MultiNexthop (MNH)" is a new BGP optional non-transitive attribute
269     (code TBD) that can be used to convey multiple nexthops to a BGP
270     speaker.  This attribute describes forwarding semantics using one or
271     more Nexthop-Forwarding-Semantics TLVs.

273        0                   1                   2                   3
274        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
275       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
276       |1 0 0 1(Flags) |Attr. Type Code|            Length             |
277       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
278       |           MNH-Flags           |    PNH-Len    |  ..Advertising|
279       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
280       |           PNH Address /32 or /128..           | Num-Nexthops  |
281       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
282       |     ...one or more "Nexthop-Forwarding-Semantics TLV"...      |
283       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

285                    Fig 1: MultiNexthop - BGP Attribute

287     - Flags
288          BGP Path-attribute flags.  1001 to indicate Optional
289          Non-Transitive, Extended-length field.

291     - Attr. Type Code
292          IANA TBD.

294     - Length
295          Two-byte field stating the length of the attribute value in bytes.

297     - MNH-Flags
298          16 bit flag (UR..R)
299          Only the most significant bit (U) is currently defined; the
             others are reserved.
300             R: Reserved
301             U: 1 means Upstream-allocation; the attribute describes the
302                forwarding state desired at the receiving speaker.
303                0 means Downstream-allocation; the attribute describes the
304                forwarding state present at the advertising speaker.
305     - PNH-Len
306          Length in bits (32 or 128) of the Advertising PNH address (IPv4 or IPv6).

308     - PNH-address
309          BGP Protocol Nexthop address (Len = 32 or 128) advertised in NEXT_HOP or
310          MP_REACH_NLRI attr.  Used to sanity-check this attribute.

312     - Num-Nexthops
313          Number of nexthop addresses carried in the MNH.
314          >1 if ECMP or Alternate-paths.

316     Sec 3.2 describes the Nexthop-Forwarding-Semantics TLV.

318  3.1.  Operations

320  3.1.1.  BGP Capability for MNH attribute

322     A new BGP capability [RFC3392] called "MultiNexthop (MNH)" is defined
323     with type code: IANA TBD.  The MNH attribute MUST NOT be sent to a
324     BGP speaker that has not advertised the MNH capability.  A BGP speaker
325     MUST ignore the MNH attribute received from a peer which has not
326     advertised the MNH capability.

328  3.1.2.  Scope of use, and propagation

330     The MNH attribute is intended to be used in a BGP-free core, between
331     egress and ingress BGP speakers that understand this attribute.

333     The attribute must also not leak unintentionally to another AS over
334     an EBGP session via a BGP speaker that does not understand the MNH
335     attribute.

337     To achieve this, the attribute is defined as "optional non-
338     transitive", and uses a new BGP capability.  If an MNH attribute is
339     received by a PE BGP speaker that does not understand it, the
340     optional non-transitive nature avoids unintentionally propagating it
341     towards EBGP peers.

343     This also means that an RR needs to be upgraded to support this
344     attribute before any PEs in the network can make use of it.  When an
345     RR receives the MNH attribute from a client that supports the
346     attribute, it propagates the attribute as-is when reflecting the
347     route with nexthop unchanged.

349     When a BGP speaker receives the MNH attribute from another speaker
350     that did not advertise support of the attribute, the attribute is
351     ignored.
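
The capability rules in 3.1.1 and the paragraph above act as a gate on both the send path and the receive path. The following is a minimal Python sketch of that gate, not a definitive implementation. The names `Peer`, `MNH_CAPABILITY`, `MNH_ATTRIBUTE`, `filter_outbound` and `filter_inbound` are hypothetical, and the capability value is a placeholder because the draft leaves the real code as IANA TBD.

```python
from dataclasses import dataclass, field

# Placeholder only: the draft leaves the capability code as "IANA TBD".
MNH_CAPABILITY = "MNH-capability-TBD"
# Hypothetical key under which a route's attribute map stores the MNH.
MNH_ATTRIBUTE = "MNH"


@dataclass
class Peer:
    """Hypothetical view of a BGP peer: only the capabilities it advertised."""
    name: str
    capabilities: set = field(default_factory=set)

    def supports_mnh(self) -> bool:
        return MNH_CAPABILITY in self.capabilities


def _without_mnh(attributes: dict) -> dict:
    """Copy of the attribute map with the MNH attribute removed."""
    return {k: v for k, v in attributes.items() if k != MNH_ATTRIBUTE}


def filter_outbound(peer: Peer, attributes: dict) -> dict:
    """Do not send the MNH attribute to a peer that has not advertised the capability."""
    return dict(attributes) if peer.supports_mnh() else _without_mnh(attributes)


def filter_inbound(peer: Peer, attributes: dict) -> dict:
    """Ignore an MNH attribute received from a peer that has not advertised the capability."""
    return dict(attributes) if peer.supports_mnh() else _without_mnh(attributes)


upgraded = Peer("rr1", {MNH_CAPABILITY})
legacy = Peer("pe9")
route_attrs = {"NEXT_HOP": "192.0.2.1", MNH_ATTRIBUTE: b"\x80\x00"}

print(sorted(filter_outbound(upgraded, route_attrs)))  # MNH is kept
print(sorted(filter_outbound(legacy, route_attrs)))    # MNH is stripped
```

Both filters return a copy, so the stored route keeps its MNH attribute and can still be advertised to other peers that do support the capability.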
353     The MNH capability provides additional protection against
354     receiving this attribute from EBGP peers when not intended.

356  3.1.3.  Interaction of MNH with Nexthop (in attr-code 3, 14)

358     When adding a MultiNexthop attribute to an advertised BGP route, the
359     speaker MUST put the same next-hop address in the Advertising PNH
360     field as it put in the Nexthop field inside NEXT_HOP attribute or
361     MP_REACH_NLRI attribute.  Any speaker that recognizes this attribute
362     and changes the PNH while re-advertising the route MUST remove the
363     MultiNexthop-Attribute in the re-advertisement.  The speaker MAY
364     however add a new MultiNexthop-Attribute to the re-advertisement;
365     while doing so the speaker MUST record in the "Advertising-PNH" field
366     the same next-hop address as used in NEXT_HOP field or MP_REACH_NLRI
367     attribute.

369     A speaker receiving a MNH attribute SHOULD ignore it if the next-hop
370     address contained in Advertising-PNH field is not the same as the
371     next-hop address contained in NEXT_HOP field or MP_REACH_NLRI field.

373  3.1.4.  Interaction with Addpath

375     [ADDPATH-GUIDELINES] suggests the following:

377     "Diverse path: A BGP path associated with a different BGP next-hop
378     and BGP router than some other set of paths.  The BGP router
379     associated with a path is inferred from the ORIGINATOR_ID attribute
380     or, if there is none, the BGP Identifier of the peer that advertised
381     the path."

383     When selecting "diverse paths" for ADD_PATH as specified above, the
384     MNH attribute should also be compared if it exists, to determine if
385     two routes have "different BGP next-hop".

387  3.1.5.  Path-selection considerations

389     When tie-breaking in path selection as described in RFC 4271,
390     Section 9.1.2.2, step (e), i.e. the "IGP cost to nexthop", consider
391     the highest cost among the nexthop-legs present in this attribute.

393  3.1.6.  NH-Flags U bit, denoting upstream/downstream semantics

395     U-bit being Set indicates that this attribute describes what the
396     forwarding semantics of an Upstream-allocated label at the receiving-
397     speaker should be.  All other bits in NH-Flags are currently
398     reserved, MUST be set to 0 by sender and MUST be ignored by receiver.

400     This attribute can be used for both labeled and unlabeled BGP
401     families.

403     A MultiNexthop attribute with U=0 plays the "Label Descriptor" role.
404     A BGP speaker advertising a downstream-allocated label-route MAY add
405     this attribute to the BGP route Update, to "describe" to the
406     receiving speaker what the label's forwarding semantics at the
407     sending speaker is.

409     Today, the semantics of a downstream-allocated label are known only to
410     the egress-node advertising the label.  The speaker receiving the
411     label-binding doesn't know what the label's forwarding semantic at the
412     advertiser is.  In some environments, it may be useful to convey this
413     information to the receiving speaker.  This may help in better
414     debugging and manageability, or enable the receiving speaker, which
415     could also be some centralized controller, to make better decisions
416     about which label to use, based on the label's forwarding-semantic.

418     While doing upstream-label allocation, this attribute (U-bit Set) can
419     be used to convey what the forwarding-semantics at the receiving node
420     should be.  Details of the BGP protocol extensions required for
421     signaling upstream-label allocation are out of scope of this
422     document, and are described in [MPLS-NAMESPACES].

424     In the rest of this document, the term "Label" means a downstream-
425     allocated label, unless specified otherwise as an upstream-allocated
426     label.

428     When using the MultiNexthop attribute for IP-routes, the U-bit is Set,
429     since IP prefixes are by nature upstream allocated.

431  3.2.  Nexthop Forwarding Semantics TLV

433     Each Forwarding-Semantics TLV expresses a nexthop leg's forwarding
434     action, i.e. a "FwdAction" with an associated Nexthop.  The types of
435     actions defined by this TLV are given below.  The "Nexthop-Leg" field
436     takes appropriate values based on the FwdAction.

438        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
439       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
440       |           FwdAction           |              Len              |
441       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
442       |               ...Nexthop-Leg Descriptor-TLV...                |
443       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

445                  Fig 2: Nexthop Forwarding Semantics TLV

447        FwdAction   Meaning
448        ---------   -------------
449           1        Forward
450           2        Pop-And-Forward
451           3        Swap
452           4        Push
453           5        Pop-And-Lookup
454           6        Replicate

456     - Len
457          Length of Nexthop Forwarding Semantics TLV including all
458          Nexthop-Leg Descriptor TLVs.

460     Meaning of most of the above FwdAction semantics is well understood.
461     FwdAction 1 is applicable for both IP and MPLS routes.  FwdActions
462     2-5 are applicable for MPLS routes only.  FwdActions 1 and 6 are
463     applicable for Flowspec routes for Redirect and Mirror actions.

465     The "Forward" action means forward the IP/MPLS packet with the
466     destination prefix (IP-dest-addr/MPLS-label) value unchanged.  For IP
467     routes, this is the forwarding-action given for next-hop addresses
468     contained in BGP path-attributes: Nexthop (code 3) or MP_REACH_NLRI
469     (code 14).  For MPLS routes, usage of this action is equivalent to
470     SWAP with same label-value; one such usage is explained in
471     [MPLS-NAMESPACES] when Upstream-label-allocation is in use.

473     The "Pop-And-Forward" action means Pop the MPLS-label and forward the
474     payload towards the Nexthop IP-address specified in the sub-TLV,
475     using appropriate encapsulation to reach the Nexthop.
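
The FwdAction table and the applicability rules stated above (FwdAction 1 for IP and MPLS routes, 2-5 for MPLS routes only, 1 and 6 for Flowspec routes) can be sketched as a lookup. This is an illustrative Python sketch only. The route-type labels `"ip"`, `"mpls"` and `"flowspec"` and the helper `is_compatible` are hypothetical names, not part of the draft.

```python
from enum import IntEnum


class FwdAction(IntEnum):
    """FwdAction code points from the table in Section 3.2."""
    FORWARD = 1
    POP_AND_FORWARD = 2
    SWAP = 3
    PUSH = 4
    POP_AND_LOOKUP = 5
    REPLICATE = 6


# Route-type labels are hypothetical names chosen for this sketch.
ALLOWED_ACTIONS = {
    "ip": {FwdAction.FORWARD},
    "mpls": {
        FwdAction.FORWARD,
        FwdAction.POP_AND_FORWARD,
        FwdAction.SWAP,
        FwdAction.PUSH,
        FwdAction.POP_AND_LOOKUP,
    },
    "flowspec": {FwdAction.FORWARD, FwdAction.REPLICATE},
}


def is_compatible(route_type: str, action_code: int) -> bool:
    """Return True if the FwdAction code is defined and applies to the route type."""
    try:
        action = FwdAction(action_code)
    except ValueError:
        return False  # unknown (unsupported) FwdAction code
    return action in ALLOWED_ACTIONS.get(route_type, set())


print(is_compatible("mpls", 3))      # Swap on an MPLS route
print(is_compatible("ip", 3))        # Swap on an IP route
print(is_compatible("flowspec", 6))  # Replicate on a Flowspec route
```

A `False` result corresponds to the incompatible or unsupported FwdAction case, which the draft treats as a semantic error rather than a reason to reset the session.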
477     The "Pop-And-Lookup" action may result in a MPLS-lookup or an upper-
478     layer header (like IPv4, IPv6) lookup, depending on whether the label
479     that was popped was the bottom of stack label.

481     If an incompatible FwdAction is received for a prefix-type, or an
482     unsupported FwdAction is received, it is considered a semantic-error
483     and MUST be dealt with as explained in section 4.

485  3.3.  Nexthop-Leg Descriptor TLV

487     The Nexthop-Leg Descriptor TLV describes various attributes of the
488     Nexthop-legs that the FwdAction is associated with.

490        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
491       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
492       |         NhopDescrType         |              Len              |
493       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
494       |             Flags             |      Relative-Preference      |
495       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
496       |                 ..Nexthop Attributes SubTLV..                 |
497       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
498       |                 ..Nexthop Attributes SubTLV..                 |
499       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

501                     Fig 3: Nexthop-Leg Descriptor TLV

503        NhopDescrType   Meaning
504        -------------   ---------
505            1           IPv4-nexthop
506            2           IPv6-nexthop
507            3           Labeled-IP-Nexthop
508            4           Forwarding-Context-Nexthop

510     - Len (2 octets)
511          Length in bytes of Nexthop-Leg Descriptor TLV, including Flags, Relative-Preference and all
512          Nexthop Attributes SubTLVs.

514     - Flags
515          2 octets.  Must send zero.  Must ignore on receive.

517     - Relative-Preference
518          Unsigned 2-octet integer specifying the relative order of
519          preference to use in the FIB.  Install in the FIB all usable legs
             with the lowest Relative-Preference value.
520          If multiple legs share that value, form ECMP.

522  3.4.  Nexthop Attributes Sub-TLV

524        SubTLV type   Meaning
525        -----------   ----------
526            1         IP-Address
527            2         Labeled-IP-Nexthop
528            3         Transport Class ID (Color)
529            4         Bandwidth
530            5         Load-Balance-Factor
531            6         Forwarding-context Name
532            7         Forwarding-context Route-Target

534  3.4.1.  IP Address
535        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
536       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
537       |     Attr SubTLV Type = 1      |         Len (2 bytes)         |
538       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
539       |        Flags (2 bytes)        |    PfxLen     |   ..IPv4 or   |
540       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
541       |        IPv6 Address ..        |
542       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

544     - Len (2 octets)
545          Length in bytes of remaining portion of SubTLV.

547     - Flags
548          2 octets.  Must send zero.  Must ignore on receive.

550     - PfxLen (1 octet)
551          Length in bits of Nexthop IP-address (32 or 128)

553     - IPv4 or IPv6 Address
554          Remaining bytes in sub-TLV are the 32 bit or 128 bit Nexthop address.

556                   Fig 4: IP-Address attribute sub-TLV

558     This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV
559     with FwdAction of Pop-And-Forward or Forward.

561  3.4.2.  Labeled IP nexthop
562        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
563       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
564       |     Attr SubTLV Type = 2      |         Len (2 bytes)         |
565       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
566       |        Flags (2 bytes)        |        Label (20 bits)        |
567       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
568       |       |Rsrv |S|    PfxLen     |   ..IPv4 or IPv6 Address ..   |
569       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

571     - Len (2 octets)
572          Length in bytes of remaining portion of SubTLV.

574     - Flags (2 octets):
575          ELC (MSB bit): indicates if this egress NH is Entropy Label Capable.
576          Remaining bits are Reserved.  Must send zero.  Must ignore on receive.
578     - Label:
579          The Label field is a 20-bit field containing an MPLS label value
580          (see [RFC3032]).

582     - Rsrv:
583          This 3-bit field SHOULD be set to zero on transmission and
584          MUST be ignored on reception.

586     - S:
587          This 1-bit field MUST be set to one on the last label being pushed.

589     - PfxLen (1 octet)
590          Length in bits of Nexthop IP-address (32 or 128)

592     - IPv4 or IPv6 Address
593          Remaining bytes in sub-TLV are the 32 bit or 128 bit Nexthop address.

595                Fig 5: "Labeled nexthop" attribute sub-TLV

597     This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV
598     with FwdAction of Swap or Push.

600  3.4.3.  Transport Class ID (Color)

602     The Nexthop can be associated with a Transport Class, so as to
603     resolve a path that satisfies required Transport tunnel
604     characteristics.  Transport Class is defined in [BGP-CT].
605     Transport Class is a per-nexthop scoped attribute.  Without MNH, the
606     Transport class is applied to the nexthop IP-address encoded in the
607     BGP-Nexthop attribute (code 3), or inside the MP_REACH attribute
608     (code 14).  With MNH, the Transport Class can be specified per
609     Nexthop-Leg TLV.  It is applied to the IP-address encoded in the
610     Nexthop Attribute Sub-TLVs of type "IP Address", "Labeled IP
611     nexthop".

613     The format of the Transport Class ID Sub-TLV is as follows:

615        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
616       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
617       |     Attr SubTLV Type = 3      |         Len (2 bytes)         |
618       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
619       |                 Transport Class ID (4 bytes)                  |
620       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

622     - Len (2 octets)
623          Length in bytes of remaining portion of SubTLV.

625     - Transport Class ID (Color):
626          This is a 32 bit identifier, associated with the Nexthop address.
627          The Nexthop specified in "IP-address or Labeled Nexthop" TLVs
628          are resolved over tunnels of this color.
629          Defined in [BGP-CT] [draft-kaliraj-idr-bgp-classful-transport-planes]

631          Fig 6: "Transport Class ID (Color)" attribute sub-TLV

633     This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV
634     with FwdAction of Forward, Swap or Push.

636  3.4.4.  Available Bandwidth
637        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
638       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
639       |     Attr SubTLV Type = 4      |         Len (2 bytes)         |
640       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
641       |                     Bandwidth (8 octets)                      |
642       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
643       |                      Bandwidth (contd.)                       |
644       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

646     - Len (2 octets)
647          Length in bytes of remaining portion of SubTLV.

649     - Bandwidth
650          The bandwidth of the link expressed as 8 octets,
651          units being bits per second.

653                  Fig 6: "Bandwidth" attribute sub-TLV

655     This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV
656     with FwdAction of Forward, Swap or Push.

658     This sub-TLV would also be valid in a Label-Descriptor-attribute
659     whose U-bit is reset.

661  3.4.5.  Load balance factor

663        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
664       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
665       |     Attr SubTLV Type = 5      |         Len (2 bytes)         |
666       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
667       |      Balance Percentage       |
668       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

670     - Len (2 octets)
671          Length in bytes of remaining portion of SubTLV.

673     - Balance Percentage:
674          This is the explicit "balance percentage" requested by the sender,
675          for unequal load-balancing over these Nexthop-Descriptor-TLV legs.
676          This balance percentage would override the implicit
677          balance-percentage calculated using "Bandwidth" attribute
678          sub-TLV.
680               Fig 7: "Load-Balance-Factor" attribute sub-TLV
681      This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV
682      with FwdAction of Forward, Swap or Push.

684      This is the explicit "balance percentage" requested by the sender,
685      for unequal load-balancing over these Nexthop-Descriptor-TLV legs.
686      This balance percentage would override the implicit balance-
687      percentage calculated using "Bandwidth" attribute sub-TLV.

689   3.4.6.  Forwarding-context name

691          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
692         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
693         |     Attr SubTLV Type = 6      |         Len (2 bytes)         |
694         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
695         |      NameLen (2 octets)       | ..Fwd-Context-name...(unicode)|
696         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

698         - Len (2 octets)
699            Length in bytes of remaining portion of SubTLV.

701         - NameLen (2 octets)
702            Length in bytes of Fwd-Context-Name

704         - Forwarding Context Name:
705            Name of forwarding context (e.g. VRF-name) where lookup should happen.

707               Fig 8: Forwarding-Context name attribute sub-TLV

709      This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV
710      with FwdAction of Pop-And-Lookup.  Ref: usecase 2.3.  The Forwarding-
711      context-name identifies the forwarding-context (e.g. the VRF-
712      name) where the lookup should happen after the label is popped.

714   3.4.7.  Forwarding-context Route-Target
715          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
716         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
717         |     Attr SubTLV Type = 7      |         Len (2 bytes)         |
718         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
719         |        Type (2 octets)        | ...Route Target... (8 octets) |
720         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
721         |                 ..Route Target... (continued)                 |
722         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
723         |...Route Target... (8 octets)  |
724         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

726         - Len (2 octets)
727            Length in bytes of remaining portion of SubTLV.

729         - Type:
730            value of 1 indicates Route Target follows.

732         - Route Target:
733            Import Route Target of the forwarding context
734            (e.g. VRF-name) where lookup should happen.

736      Fig 9: "Route-Target identifying the Forwarding-Context" attribute
737                                     sub-TLV

739      This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV
740      with FwdAction of Pop-And-Lookup.  Ref: usecase 2.3.  The Route
741      Target identifies the forwarding-context (e.g. VRF) where the
742      lookup should happen after the label is popped.

744      If any of these sub-TLVs or FwdAction combinations are unrecognized
745      or unsupported by a receiving speaker, it is considered a semantic
746      error for that speaker, and in such a case the error-handling
747      procedures described in section 4 should be followed.

749   4.  Error handling procedures

751      When U-bit is Reset, this attribute is used to describe the label
752      advertised by the BGP-peer.  If the value in the attribute is
753      syntactically parse-able, but not semantically valid, the receiving
754      speaker should deal with the error gracefully and MUST NOT tear down
755      the BGP session.  In such cases the rest of the BGP-update can be
756      consumed if possible.

758      When U-bit is Set, this attribute is used to specify the forwarding
759      action at the receiving BGP-peer.  If the value in the attribute is
760      syntactically parse-able, but not semantically valid, the receiving
761      speaker SHOULD deal with the error gracefully by ignoring the MNH
762      attribute, and continue processing the route.  It MUST NOT tear down
763      the BGP session.

765      If a MNH with U-bit Reset is received for an IP-route (SAFI Unicast),
766      the MNH attribute SHOULD be ignored, because IP route prefixes are
767      upstream allocated by nature.

769      If a MNH with U-bit Reset is received for an [MPLS-NAMESPACES] route,
770      the MNH attribute SHOULD be ignored, because the label prefix in
771      MPLS-NAMESPACE family routes is upstream allocated.

773      The receiving BGP speaker MAY consider the "Num-Nexthop" value in a
774      MNH attribute (U-bit Set) not acceptable, based on its forwarding
775      capabilities.  In such cases, the MNH attribute SHOULD be considered
776      unusable and ignored on receipt.  The condition SHOULD be handled
777      gracefully, and the speaker MUST NOT tear down the BGP session.

779   5.  Scaling considerations

781      The MNH attribute allows receiving multiple nexthops on the same BGP
782      session.  This flexibility also opens up the possibility that a peer
783      can send a large number of multipath (ECMP/UCMP/FRR) nexthops that
784      may overwhelm the local system's forwarding plane.  Prefix-limit
785      based checks will not avoid this situation.

787      To keep the scaling limits in check, a BGP speaker MAY keep account
788      of the number of unique multipath nexthops that are received from
789      a BGP peer, and impose a configurable max-limit on that.  This is
790      especially useful for EBGP peers.

792      A good scaling property of conveying multipath nexthops using the MNH
793      attribute with N nexthop legs on one BGP session, as against BGP
794      routes on N BGP sessions, is that it limits the transitionary
795      multipath combinatorial state that the latter model creates.
796      Because the final multipath state is conveyed by one route update in
797      a deterministic manner, there is no transitionary multipath
798      combinatorial explosion created during establishment of N sessions.

800   6.  IANA Considerations

802      This document requests IANA to allocate the following code in the
803      BGP attributes registry.

805      1.  MultiNexthop (MNH) BGP-attribute: A new BGP attribute code TBD.

807      This document requests IANA to create the following sub-registries
808      for the MNH attribute:

810      1.  "FwdAction" type as defined in 3.1.

812      2.  Nexthop-Leg Descriptor TLV: "NhopDescrType" as defined in 3.2.

814      3.  "Nexthop Attributes Sub-TLV type" as defined in 3.3.

816      This document requests IANA to allocate a BGP capability code
817      TBD for the MNH attribute.

819      Note to RFC Editor: this section may be removed on publication as an
820      RFC.

822   7.  Security Considerations

824      The attribute is defined as an optional non-transitive BGP attribute,
825      so that it does not accidentally get propagated or leaked via BGP
826      speakers that don't support this feature, and in particular does not
827      unintentionally leak across EBGP boundaries.

829   8.  Acknowledgements

831      Thanks to Robert Raszuk, Gyan Mishra and Ron Bonica for the review,
832      discussions and input to the draft.

834   9.  References

836   9.1.  Normative References

838      [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
839                 Requirement Levels", BCP 14, RFC 2119,
840                 DOI 10.17487/RFC2119, March 1997,
841                 .

843      [RFC3392]  Chandra, R. and J. Scudder, "Capabilities Advertisement
844                 with BGP-4", RFC 3392, DOI 10.17487/RFC3392, November
845                 2002, .

847   9.2.  References

849      [ADDPATH-GUIDELINES]
850                 Uttaro, Ed., "BGP Flow-Spec Redirect to IP Action", 25
851                 April 2016, .

854      [BGP-CT]   Vairavakkalai, Ed., "BGP Classful Transport Planes", 25
855                 August 2021, .

858      [FLWSPC-REDIR-IP]
859                 Simpson, Ed., "BGP Flow-Spec Redirect to IP Action", 2
860                 February 2015, .

863      [MPLS-NAMESPACES]
864                 Vairavakkalai, Ed., "BGP signalled MPLS-namespaces", 28
865                 December 2021, .

868      [SRTE-COLOR-ONLY]
869                 Filsfils, Ed., "BGP Flow-Spec Redirect to IP Action", 21
870                 February 2018, .

873   Authors' Addresses

875      Kaliraj Vairavakkalai
876      Juniper Networks, Inc.
877      1194 N. Mathilda Ave.
878      Sunnyvale, CA 94089
879      United States of America

881      Email: kaliraj@juniper.net

883      Minto Jeyananth
884      Juniper Networks, Inc.
885      1194 N. Mathilda Ave.
886      Sunnyvale, CA 94089
887      United States of America

889      Email: minto@juniper.net

891      Gyan Mishra
892      Verizon Communications Inc.
893      13101 Columbia Pike
894      Silver Spring, MD 20904
895      United States of America

897      Email: gyan.s.mishra@verizon.com
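
Illustrative sketch (not part of the draft): the fixed-size attribute
sub-TLVs of sections 3.4.3 to 3.4.5 all share a 2-octet type and a
2-octet length followed by the value, so they could be encoded and
parsed roughly as below.  The function and constant names are
hypothetical.  Network byte order, an unsigned 64-bit integer encoding
for Bandwidth, and a 16-bit Balance Percentage (as drawn in the figure)
are assumptions; the draft text does not state them explicitly.

```python
import struct

# Sub-TLV type codes as shown in the figures of sections 3.4.3 to 3.4.5.
TRANSPORT_CLASS_ID = 3
AVAILABLE_BANDWIDTH = 4
LOAD_BALANCE_FACTOR = 5

# Assumed value layouts, network byte order: 32-bit color, 64-bit
# bandwidth in bits per second, 16-bit balance percentage.
_VALUE_FORMATS = {
    TRANSPORT_CLASS_ID: "!I",
    AVAILABLE_BANDWIDTH: "!Q",
    LOAD_BALANCE_FACTOR: "!H",
}


def encode_sub_tlv(sub_type, value):
    """Return Type (2 octets) + Len (2 octets) + value for a known type."""
    body = struct.pack(_VALUE_FORMATS[sub_type], value)
    return struct.pack("!HH", sub_type, len(body)) + body


def decode_sub_tlvs(data):
    """Yield (type, value) pairs from a run of concatenated sub-TLVs.

    Values of unknown types, or of known types with an unexpected
    length, are yielded as raw bytes so the caller can decide how to
    treat them.  Truncated input raises ValueError.
    """
    offset = 0
    while offset < len(data):
        if len(data) - offset < 4:
            raise ValueError("truncated sub-TLV header")
        sub_type, length = struct.unpack_from("!HH", data, offset)
        offset += 4
        body = data[offset:offset + length]
        if len(body) != length:
            raise ValueError("truncated sub-TLV value")
        offset += length
        fmt = _VALUE_FORMATS.get(sub_type)
        if fmt is not None and struct.calcsize(fmt) == length:
            yield sub_type, struct.unpack(fmt, body)[0]
        else:
            yield sub_type, body
```

A caller might concatenate encode_sub_tlv(TRANSPORT_CLASS_ID, 100) and
encode_sub_tlv(LOAD_BALANCE_FACTOR, 60) for one nexthop leg and feed the
result back through decode_sub_tlvs().  Returning unknown types as raw
bytes, rather than failing, is one way to leave the decision about
unsupported sub-TLVs to the semantic checks described in section 4.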