Network Working Group K. Vairavakkalai Internet-Draft M. Jeyananth Intended status: Standards Track Juniper Networks, Inc. Expires: 1 July 2022 G. Mishra Verizon Communications Inc. 28 December 2021 BGP MultiNexthop attribute draft-kaliraj-idr-multinexthop-attribute-02 Abstract Today, a BGP speaker can advertise one nexthop for a set of NLRIs in an Update. This nexthop can be encoded in either the BGP-Nexthop attribute (code 3), or inside the MP_REACH attribute (code 14). For cases where multiple nexthops need to be advertised, BGP-Addpath is used. Though Addpath allows basic ability to advertise multiple- nexthops, it does not allow the sender to specify desired relationship between the multiple nexthops being advertised e.g., relative-preference, type of load-balancing. These are local decisions at the receiving speaker based on local configuration and path-selection between the various additional-paths, which may tie- break on some arbitrary step like Router-Id or BGP nexthop address. Some scenarios with a BGP-free core may benefit from having a mechanism, where egress-node can signal multiple-nexthops along with their relationship, in one BGP route, to ingress nodes. This document defines a new BGP attribute "MultiNexthop (MNH)" that can be used for this purpose. This attribute can be used for both labeled and unlabled BGP families. The MNH can be used to advertise MPLS label along with nexthop for unlabeled families (e.g. Inet Unicast, Inet6 Unicast). Such that, mechanisms at the transport layer can work uniformly on labeled and unlabled BGP families. Service route scale can be confined closer to the service edge nodes, making the transport layer nodes light and nimble. They dont have any service route state, only have service end-point state. The MNH plays different role in "downstream allocation" scenario than "upstream allocation" scenario. E.g. for RFC8277 families that advertise downstream allocated labels, the MNH can play the "Label Descriptor" role, describing the forwarding semantics of the label being advertised. This can be useful in network visualization and controller based traffic engineering (e.g. EPE). Vairavakkalai, et al. Expires 1 July 2022 [Page 1] Internet-Draft BGP MultiNexthop attribute December 2021 Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on 1 July 2022. Copyright Notice Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/ license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Use-cases examples . . . . . . . . . . . . . . . . . . . . . 4 2.1. Optimal forwarding exit-points signaling to ingress-node . . . . . . . . . . . . . . . . . . . . . . 4 2.2. Choosing a received label based on it's forwarding-semantic at advertising node . . . . . . . . . . . . . . . . . . . 5 2.3. Signaling desired forwarding behavior when installing MPLS Upstream labels at receiving node . . . . . . . . . . . . 5 2.4. Load-balancing over EBGP parallel links . . . . . . . . . 5 Vairavakkalai, et al. Expires 1 July 2022 [Page 2] Internet-Draft BGP MultiNexthop attribute December 2021 2.5. Flowspec routes with multiple Redirect-IP nexthops . . . 6 2.6. Color-Only resolution nexthop . . . . . . . . . . . . . . 6 3. The "MultiNexthop (MNH)" BGP attribute encoding . . . . . . . 6 3.1. Operations . . . . . . . . . . . . . . . . . . . . . . . 8 3.1.1. BGP Capability for MNH attribute . . . . . . . . . . 8 3.1.2. Scope of use, and propagation . . . . . . . . . . . . 8 3.1.3. Interaction of MNH with Nexthop (in attr-code 3, 14) . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.1.4. Interaction with Addpath . . . . . . . . . . . . . . 9 3.1.5. Path-selection considerations . . . . . . . . . . . . 9 3.1.6. NH-Flags U bit, denoting upstream/downstream semantics . . . . . . . . . . . . . . . . . . . . . . 9 3.2. Nexthop Forwarding Semantics TLV . . . . . . . . . . . . 10 3.3. Nexthop-Leg Descriptor TLV . . . . . . . . . . . . . . . 11 3.4. Nexthop Attributes Sub-TLV . . . . . . . . . . . . . . . 12 3.4.1. IP Address . . . . . . . . . . . . . . . . . . . . . 12 3.4.2. Labeled IP nexthop . . . . . . . . . . . . . . . . . 13 3.4.3. Transport Class ID (Color) . . . . . . . . . . . . . 14 3.4.4. Available Bandwidth . . . . . . . . . . . . . . . . . 15 3.4.5. Load balance factor . . . . . . . . . . . . . . . . . 16 3.4.6. Forwarding-context name . . . . . . . . . . . . . . . 17 3.4.7. Forwarding-context Route-Target . . . . . . . . . . . 17 4. Error handling procedures . . . . . . . . . . . . . . . . . . 18 5. Scaling considerations . . . . . . . . . . . . . . . . . . . 19 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 7. Security Considerations . . . . . . . . . . . . . . . . . . . 20 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 20 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 20 9.1. Normative References . . . . . . . . . . . . . . . . . . 20 9.2. References . . . . . . . . . . . . . . . . . . . . . . . 20 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 21 1. Introduction Today, a BGP speaker can advertise one nexthop for a set of NLRIs in an Update. This nexthop can be encoded in either the top-level BGP- Nexthop attribute (code 3), or inside the MP_REACH attribute (code 14). For cases where multiple nexthops need to be advertised, BGP-Addpath is used. Though Addpath allows basic ability to advertise multiple- nexthops, it does not allow the sender to specify desired relationship between the multiple nexthops being advertised e.g., relative-ordering, type of load-balancing, fast-reroute. These are local decision at the receiving node based on local configuration and path-selection between the various additional-paths, which may tie- break on some arbitrary step like Router-Id or BGP nexthop address. Vairavakkalai, et al. Expires 1 July 2022 [Page 3] Internet-Draft BGP MultiNexthop attribute December 2021 Some scenarios with a BGP-free core may benefit from having a mechanism, where egress-node can signal multiple-nexthops along with their relationship to ingress nodes. This document defines a new BGP attribute "MultiNexthop (MNH)" that can be used for this purpose. This attribute can be used for both labeled and unlabled BGP families. The MNH can be used to advertise MPLS label along with nexthop for unlabeled families (e.g. Inet Unicast, Inet6 Unicast). Such that, mechanisms at the transport layer can work uniformly on labeled and unlabled BGP families. Service route scale can be confined closer to the service edge nodes, making the transport layer nodes light and nimble. They dont have any service route state, only have service end-point state. The MNH plays different role in "downstream allocation" scenario than "upstream allocation" scenario. E.g. for RFC8277 families that advertise downstream allocated labels, the MNH can play the "Label Descriptor" role, describing the forwarding semantics of the label being advertised. This can be useful in network visualization and controller based traffic engineering (e.g. EPE). A new BGP capability ([RFC3392]) called "MultiNexthop (MNH" is defined with type code: IANA TBD. This capability is used to express the ability to send and receive MNH attribute. 2. Use-cases examples 2.1. Optimal forwarding exit-points signaling to ingress-node In a BGP free core, one can dynamically signal to the ingress-node, how traffic should be load-balanced towards a set of exit-nodes, in one BGP-route containing this attribute. Example, for prefix1, perform equal cost load-balancing towards exit- nodes A, B; where-as for prefix2, perform unequal-cost load-balancing (40%, 30%, 30%) towards exit-nodes A, B, C. Example, for prefix1, use PE1 as primary-nexthop and use PE2 as a backup-nexthop. Vairavakkalai, et al. Expires 1 July 2022 [Page 4] Internet-Draft BGP MultiNexthop attribute December 2021 2.2. Choosing a received label based on it's forwarding-semantic at advertising node In Downstream label allocation case, the MNH plays role of "Label descriptor" and describes the forwarding treatment given to the label at the advertising speaker. The receiving speaker can benefit from this information as in the following examples: - For a Prefix, a label with FRR enabled nexthop-set can be preferred to another label with a nexthop-set that doesn't provide FRR. - For a Prefix, a label pointing to 10g nexthop can be preferred to another label pointing to a 1g nexthop - Set of labels advertised can be aggregated, if they have same forwarding semantics (e.g. VPN per-prefix-label case) 2.3. Signaling desired forwarding behavior when installing MPLS Upstream labels at receiving node In Upstream label allocation case, the receiving speaker's forwarding-state can be controlled by the advertising speaker, thus enabling a standardized API to program desired MPLS forwarding-state at the receiving node. This is described in the [MPLS-NAMESPACES] 2.4. Load-balancing over EBGP parallel links Consider N parallel links between two EBGP speakers. There are different models possible to do load balancing over these links: N single-hop EBGP sessions over the N links. Interface addresses are used as next-hops. N copies of the RIB are exchanged to form N-way ECMP paths. The routes advertised on the N sessions can be attached with Link bandwidth comunity to perform weighted ECMP. 1 multi-hop EBGP session between loopback addresses, reachable via static route over the N links. Loopback addresses are used as next-hops. 1 copy of the RIB is exchanged with loopback address as nexthop. And a static route can be configured to the loopback address to perform desired N-way ECMP path. M loopbacks are configured in this model, to achieve M different load balancing schemes: ECMP, weighted ECMP, Fast-reroute enabled paths etc. Vairavakkalai, et al. Expires 1 July 2022 [Page 5] Internet-Draft BGP MultiNexthop attribute December 2021 1 multi-hop EBGP session between loopback addresses, reachable via static route over the N links. Interface addresses are used as next-hops, without using additional loopbacks. 1 copy of the RIB is exchanged with MNH attribute to form N-way ECMP paths, weighted ECMP, Fast-reroute backup paths etc. BFD may be used to these directly connected BGP nexthops to detect liveness. 2.5. Flowspec routes with multiple Redirect-IP nexthops There are existing protocol machinery which can benefit from the ability of MNH to clearly specify fallback behavior when multiple nexthops are involved. One example is the scenario described in [FLWSPC-REDIR-IP] where multiple Redirect-to-IP nexthop addresses exist for a Flowspec prefix. In such a scenario, the receiving speakers may redirect the traffic to different nexthops, based on variables like IGP-cost. If instead, the MNH was used to specify the redirect-to-IP nexthop, then the order of preference between the different nexthops can be clearly specified using one flowspec route carrying a MNH containing those different nexthop-addresses specifying the desired preference-order. Such that, irrespective of IGP-cost, the receiving speakers will redirect the flow towards the same traffic collector device. 2.6. Color-Only resolution nexthop Another existing protocol machinery that manufactures nexthop addresses from overloaded extended color community is specified in [SRTE-COLOR-ONLY]. In a way, the color field is overloaded to carry one anycast BGP next-hop with pre-specified fallback options. This approach gives us only two next-hops to play with. The 'BGP nexthop address' and the 'Color-only nexthop' Instead, the MNH could be used to achieve the same result with more flexibility. Multiple BGP nexthops can be carried, each resolving over a desired Transport class (Color), and with customizable fallback order. And the solution will work for non-SRTE networks as- well. 3. The "MultiNexthop (MNH)" BGP attribute encoding "MultiNexthop (MNH)" is a new BGP optional non-transitive attribute (code TBD), that can be used to convey multiple-nexthops to a BGP- speaker. This attribute describes forwarding semantics using one or more Nexthop-Forwarding-Semantics TLV. Vairavakkalai, et al. Expires 1 July 2022 [Page 6] Internet-Draft BGP MultiNexthop attribute December 2021 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |1 0 0 1(Flags) |Attr. Type Code| Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | MNH-Flags | PNH-Len | ..Advertising| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | PNH Address /32 or /128.. | Num-Nexthops | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ...one or more "Nexthop-Forwarding-Semantics TLV"... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Fig 1: MultiNexthop - BGP Attribute - Flags BGP Path-attribute flags. 1001 to indicate Optional Non-Transitive, Extended-length field. - Attr. Type Code IANA TBD. - Length Two bytes field stating length of attribute value in bytes. - MNH-Flags 16 bit flag (UR..R) Only one bit MSB is defined currently, others are reserved. R: Reserved U: 1 means the Upstream-allocation, attribute describes forwarding state desired at receiving speaker. 0 means the Downstream-allocation, attribute describes forwarding state present at advertising-speaker. - PNH-Len Protocol-NH Length in bits (= 32 or 128) Advertising PNH IPv4 or IPv6 - PNH-address BGP Protocol Nexthop address (Len = 32 or 128) advertised in NEXT_HOP or MP_REACH_NLRI attr. Used to sanity-check this attribute. - Num-Nexthops Number of nexthop addresses carried in the MNH. >1 if ECMP or Alternate-paths. Sec 3.2 describes the Nexthop-Forwarding-Semantics TLV. Vairavakkalai, et al. Expires 1 July 2022 [Page 7] Internet-Draft BGP MultiNexthop attribute December 2021 3.1. Operations 3.1.1. BGP Capability for MNH attribute A new BGP capability [RFC3392] called "MultiNexthop (MNH)" is defined with type code: IANA TBD. The MNH attribute MUST NOT be sent to a BGP speaker that has not advertise the MNH capability. A BGP speaker MUST ignore the MNH attribute received from a peer which has not advertised the MNH attribute. 3.1.2. Scope of use, and propagation The MNH attribute is intended to be used in a BGP free core, between egress and ingress BGP speakers that understand this attribute. Also, it is required to avoid un-intentionally leaking it to other AS on an EBGP session, via a BGP speaker that does not understand MNH attribute. To achieve this, the attribute is defined as "optional non- transitive", and uses a new BGP capability. If a MNH-attribute is received by a PE BGP-speaker that does not understand it, the optional non-transitive nature avoids unintentionally propagating it towards EBGP-peers. This also means that a RR needs to be upgraded to support this attribute before any PEs in the network can make use of it. When a RR receives the MNH-attribute from a client that supports the attribute, it propagates the attribute as-is when reflecting the route with nexthop unchanged. When a BGP speaker receives the MNH-attribute from another speaker that did not advertise support of the attribute, the attribute is ignored. The MNH attribute capability provides additonaly protection against receiving this attribute from EBGP peers, when not intended. 3.1.3. Interaction of MNH with Nexthop (in attr-code 3, 14) When adding a MultiNexthop attribute to an advertised BGP route, the speaker MUST put the same next-hop address in the Advertising PNH field as it put in the Nexthop field inside NEXT_HOP attribute or MP_REACH_NLRI attribute. Any speaker that recognizes this attribute and changes the PNH while re-advertising the route MUST remove the MultiNexthop-Attribute in the re-advertisement. The speaker MAY however add a new MultiNexthop-Attribute to the re-advertisement; while doing so the speaker MUST record in the "Advertising-PNH" field Vairavakkalai, et al. Expires 1 July 2022 [Page 8] Internet-Draft BGP MultiNexthop attribute December 2021 the same next-hop address as used in NEXT_HOP field or MP_REACH_NLRI attribute. A speaker receiving a MNH attribute SHOULD ignore it if the next-hop address contained in Advertising-PNH field is not the same as the next-hop address contained in NEXT_HOP field or MP_REACH_NLRI field. 3.1.4. Interaction with Addpath [ADDPATH-GUIDELINES] suggests the following: "Diverse path: A BGP path associated with a different BGP next-hop and BGP router than some other set of paths. The BGP router associated with a path is inferred from the ORIGINATOR_ID attribute or, if there is none, the BGP Identifier of the peer that advertised the path." When selecting "diverse paths" for ADD_PATH as specified above, the MNH attribute should also be compared if it exists, to determine if two routes have "different BGP next-hop". 3.1.5. Path-selection considerations While tie breaking in the path-selection as described in RFC-4271, 9.1.2.2. step (e) viz. the "IGP cost to nexthop", consider the highest cost among the nexthop-legs present in this attribute. 3.1.6. NH-Flags U bit, denoting upstream/downstream semantics U-bit being Set indicates that this attribute describes what the forwarding semantics of an Upstream-allocated label at the receiving- speaker should be. All other bits in NH-Flags are currently reserved, MUST be set to 0 by sender and MUST be ignored by receiver. This attribute can be used for both labeled and unlabled BGP families. A MultiNexthop attribute with U=0 is called "Label Descriptor" role. A BGP speaker advertising a downstream-allocated label-route MAY add this attribute to the BGP route Update, to "describe" to the receiving speaker what the label's forwarding semantics at the sending speaker is. Today semantics of a downstream-allocated label is known only to the egress-node advertising the label. The speaker receiving the label- binding doesn't know what the label's forwarding semantic at the advertiser is. In some environments, it may be useful to convey this information to the receiving speaker. This may help in better Vairavakkalai, et al. Expires 1 July 2022 [Page 9] Internet-Draft BGP MultiNexthop attribute December 2021 debugging and manageability, or enable the receiving speaker, which could also be some centralized controller, make better decisions about which label to use, based on the label's forwarding-semantic. While doing upstream-label allocation, this attribute (U-bit Set) can be used to convey the forwarding-semantics at the receiving node should be. Details of the BGP protocol extensions required for signaling upstream-label allocation are out of scope of this document, and are described in [MPLS-NAMESPACES]. In rest of this document, the use of term "Label" will mean downstream allocated label, unless specified otherwise as upstream- allocated label. When using the MultiNexthop attribute for IP-routes, U-bit is Set. Since IP prefixes are by nature upstream allocated. 3.2. Nexthop Forwarding Semantics TLV Each Forwarding-Semantics TLV expresses a nexthop leg's forwarding action. i.e. a "FwdAction" with an associated Nexthop. The type of actions defined by this TLV are given below. The "Nexthop-Leg" field takes appropriate values based on the FwdAction. 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | FwdAction | Len | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ...Nexthop-Leg Descriptor-TLV... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Fig 2: Nexthop Forwarding Semantics TLV FwdAction Meaning --------- ------------- 1 Forward 2 Pop-And-Forward 3 Swap 4 Push 5 Pop-And-Lookup 6 Replicate - Len Length of Nexthop Forwarding Semantics TLV including all Nexthop-Leg Descriptor TLVs. Vairavakkalai, et al. Expires 1 July 2022 [Page 10] Internet-Draft BGP MultiNexthop attribute December 2021 Meaning of most of the above FwdAction semantics is well understood. FwdAction 1 is applicable for both IP and MPLS routes. FwdActions 2-5 are applicable for MPLS routes only. FwdActions 1 and 6 are applicable for Flowspec routes for Redirect and Mirror actions. The "Forward" action means forward the IP/MPLS packet with the destination prefix (IP-dest-addr/MPLS-label) value unchanged. For IP routes, this is the forwarding-action given for next-hop addresses contained in BGP path-attributes: Nexthop (code 3) or MP_REACH_NLRI (code 14). For MPLS routes, usage of this action is equivalent to SWAP with same label-value; one such usage is explained in [MPLS-NAMESPACES] when Upstream-label-allocation is in use. The "Pop-And-Forward" action means Pop the MPLS-label and forward the payload towards the Nexthop IP-address specified in the sub-TLV, using appropriate encapsulation to reach the Nexthop. The "Pop-And-Lookup" action may result in a MPLS-lookup or an upper- layer header (like IPv4, IPv6) lookup, depending on whether the label that was popped was the bottom of stack label. If an incompatible FwdAction is received for a prefix-type, or an unsupported FwdAction is received, it is considered a semantic-error and MUST be dealt with as explained in section 5. 3.3. Nexthop-Leg Descriptor TLV The Nexthop-Leg Descriptor TLV describes various attributes of the Nexthop-legs that the FwdAction is associated with. 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | NhopDescrType | Len | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Flags | Relative-Preference | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ..Nexthop Attributes SubTLV.. | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ..Nexthop Attributes SubTLV.. | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Fig 3: Nexthop-Leg Descriptor TLV Vairavakkalai, et al. Expires 1 July 2022 [Page 11] Internet-Draft BGP MultiNexthop attribute December 2021 NhopDescrType Meaning ------------- --------- 1 IPv4-nexthop 2 IPv6-nexthop 3 Labeled-IP-Nexthop 4 Forwarding-Context-Nexthop - Len (2 octets) Length in bytes of Nexthop-Leg Descriptor TLV, including Flags, Relative-Preference and all Nexthop Attributes SubTLVs. - Flags 2 octets. Must send zero. Must ignore on receive. - Relative-Preference Unsigned 2 octet integer specifying relative order or preference, to use in FIB. Use in FIB all usable legs with lowest relative-weight. If multiple legs exist with that weight, form ECMP. 3.4. Nexthop Attributes Sub-TLV SubTLV type Meaning ----------- ---------- 1 IP-Address 2 Labeled-IP-Nexthop 3 Transport Class ID (Color) 4 Bandwidth 5 Load-Balance-Factor 6 Forwarding-context Name 7 Forwarding-context Route-Target 3.4.1. IP Address Vairavakkalai, et al. Expires 1 July 2022 [Page 12] Internet-Draft BGP MultiNexthop attribute December 2021 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Attr SubTLV Type = 1 | Len (2 bytes) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Flags (2 bytes) | PfxLen | ..IPv4 or | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | IPv6 Address .. | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - Len (2 octets) Length in bytes of remaining portion of SubTLV. - Flags 2 octets. Must send zero. Must ignore on receive. - PfxLen (1 octet) Length in bits of Nexthop IP-address (32 or 128) - IPv4 or IPv6 Address Remaining bytes in sub-TLV are the 32 bit or 128 bit Nexthop address. Fig 4: IP-Address attribute sub-TLV This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV with FwdAction of Pop-And-Forward or Forward. 3.4.2. Labeled IP nexthop Vairavakkalai, et al. Expires 1 July 2022 [Page 13] Internet-Draft BGP MultiNexthop attribute December 2021 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Attr SubTLV Type = 2 | Len (2 bytes) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Flags (2 bytes) | Label (20 bits) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |Rsrv |S| PfxLen | ..IPv4 or IPv6 Address .. | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - Len (2 octets) Length in bytes of remaining portion of SubTLV. - Flags (2 octets): ELC (MSB bit): indicates if this egress NH is Entropy Label Capable. Remaining bits are Reserved. Must send zero. Must ignore on receive. - Label: The Label field is a 20-bit field containing an MPLS label value (see [RFC3032]). - Rsrv: This 3-bit field SHOULD be set to zero on transmission and MUST be ignored on reception. - S: This 1-bit field MUST be set to one on last label being pushed. - PfxLen (1 octet) Length in bits of Nexthop IP-address (32 or 128) - IPv4 or IPv6 Address Remaining bytes in sub-TLV are the 32 bit or 128 bit Nexthop address. Fig 5: "Labeled nexthop" attribute sub-TLV This sub-TLV would be valid with Nexthop-Leg Forwarding-Semantics TLV with FwdAction of Swap or Push. 3.4.3. Transport Class ID (Color) The Nexthop can be associated with a Transport Class, so as to resolve a path that satisfies required Transport tunnel characteristics. Transport Class is defined in [BGP-CT] Vairavakkalai, et al. Expires 1 July 2022 [Page 14] Internet-Draft BGP MultiNexthop attribute December 2021 Transport Class is a per-nexthop scoped attribute. Without MNH, the Transport class is applied to the nexthop IP-address encoded in the BGP-Nexthop attribute (code 3), or inside the MP_REACH attribute (code 14). With MNH, the Transport Class can be specified per Nexthop-Leg TLV. It is applied to the IP-address encoded in the Nexthop Attribute Sub-TLVs of type "IP Address", "Labeled IP nexthop". The format of the Transport Class ID Sub-TLV is as follows: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Attr SubTLV Type = 3 | Len (2 bytes) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Transport Class ID (4 bytes) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - Len (2 octets) Length in bytes of remaining portion of SubTLV. - Transport Class ID (Color): This is a 32 bit identifier, associated with the Nexthop address. The Nexthop specified in "IP-address or Labeled Nexthop" TLVs are resolved over tunnels of this color. Defined in [BGP-CT] [draft-kaliraj-idr-bgp-classful-transport-planes] Fig 6: "Transport Class ID (Color)" attribute sub-TLV This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV with FwdAction of Forward, Swap or Push. 3.4.4. Available Bandwidth Vairavakkalai, et al. Expires 1 July 2022 [Page 15] Internet-Draft BGP MultiNexthop attribute December 2021 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Attr SubTLV Type = 4 | Len (2 bytes) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Bandwidth (8 octets) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Bandwidth (contd.) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - Len (2 octets) Length in bytes of remaining portion of SubTLV. - Bandwidth The bandwidth of the link expressed as 8 octets, units being bits per second. Fig 6: "Bandwidth" attribute sub-TLV This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV with FwdAction of Forward, Swap or Push. This sub-TLV would also be valid in a Label-Descriptor-attribute whose U-bit is reset. 3.4.5. Load balance factor 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Attr SubTLV Type = 5 | Len (2 bytes) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Balance Percentage | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - Len (2 octets) Length in bytes of remaining portion of SubTLV. - Balance Percentage: This is the explicit "balance percentage" requested by the sender, for unequal load-balancing over these Nexthop-Descriptor-TLV legs. This balance percentage would override the implicit balance-percentage calculated using "Bandwidth" attribute sub-TLV. Fig 7: "Load-Balance-Factor" attribute sub-TLV Vairavakkalai, et al. Expires 1 July 2022 [Page 16] Internet-Draft BGP MultiNexthop attribute December 2021 This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV with FwdAction of Forward, Swap or Push. This is the explicit "balance percentage" requested by the sender, for unequal load-balancing over these Nexthop-Descriptor-TLV legs. This balance percentage would override the implicit balance- percentage calculated using "Bandwidth" attribute sub-TLV 3.4.6. Forwarding-context name 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Attr SubTLV Type = 6 | Len (2 bytes) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | NameLen (2 octets) | ..Fwd-Context-name...(unicode)| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - Len (2 octets) Length in bytes of remaining portion of SubTLV. - NameLen (2 octets) Length in bytes of Fwd-Context-Name - Forwarding Context Name: Name of forwarding context (e.g. VRF-name) where lookup should happen. Fig 8: Forwarding-Context name attribute sub-TLV This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV with FwdAction of Pop-And-Lookup. Ref: usecase 2.3. The Fowarding- context-name identfies the forwarding-context (for e.g. the VRF- name) where the lookup should happen after pop label. 3.4.7. Forwarding-context Route-Target Vairavakkalai, et al. Expires 1 July 2022 [Page 17] Internet-Draft BGP MultiNexthop attribute December 2021 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Attr SubTLV Type = 7 | Len (2 bytes) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Type (2 octets) | ...Route Target... (8 octets)| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ..Route Target... (continued) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |...Route Target... (8 octets) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - Len (2 octets) Length in bytes of remaining portion of SubTLV. - Type: value of 1 indicates Route Target follows. - Route Target: Import Route Target of the forwarding context (e.g. VRF-name) where lookup should happen. Fig 9: "Route-Target identifying the Forwarding-Context" attribute sub-TLV This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV with FwdAction of Pop-And-Lookup. Ref: usecase 2.3. The Route Target identfies the forwarding-context (for e.g. VRF) where the lookup should happen after pop label. If any of these sub-TLVs or FwdAction combinations are unrecognized or unsupported by a receiving speaker, it is considered a semantic error for that speaker, and in such case error-handling procedures described in section 4 should be followed. 4. Error handling procedures When U-bit is Reset, this attribute is used to describe the label advertised by the BGP-peer. If the value in the attribute is syntactically parse-able, but not semantically valid, the receiving speaker should deal with the error gracefully and MUST NOT tear down the BGP session. In such cases the rest of the BGP-update can be consumed if possibe. Vairavakkalai, et al. Expires 1 July 2022 [Page 18] Internet-Draft BGP MultiNexthop attribute December 2021 When U-bit is Set, this attribute is used to specify the forwarding action at the receiving BGP-peer. If the value in the attribute is syntactically parse-able, but not semantically valid, the receiving speaker SHOULD deal with the error gracefully by ignoring the MNH attribute, and continue processing the route. It MUST NOT tear down the BGP session. If a MNH with U-bit Reset is received for an IP-route (SAFI Unicast), the MNH attribute SHOULD be ignored. Because IP route prefixes are upstream allocated by nature. If a MNH with U-bit Reset is received for an [MPLS-NAMESPACES] route, the MNH attribute SHOULD be ignored. Because the label prefix in MPLS-NAMESPACE family routes is upstream allocated. The receiving BGP speaker MAY consider the "Num-Nexthop" value in a MNH attribute (U-bit Set) not acceptable, based on it's forwarding capabilities. In such cases, the MNH attribute SHOULD be considered Unusable, and not be used, ignored on receipt. The condition SHOULD be dealt gracefully and MUST NOT tear down the BGP session. 5. Scaling considerations The MNH attribute allows receiving multiple nexthops on the same BGP session. This flexibility also opens up the possibility that a peer can send large number of multipath (ECMP/UCMP/FRR) nexthops that may overwhelm the local system's forwarding plane. Prefix-limit based checks will not avoid this situation. To keep the scaling limits under check, a BGP speaker MAY keep account of number of unique multipath nexthops that are received from a BGP peer, and impose a configurable max-limit on that. This is especially useful for EBGP peers. A good scaling property of conveying multipath nexthops using the MNH attribute with N nexthop legs on one BGP session, as against BGP routes on N BGP sessions is that, it limits the amount of transitionary multipath combinatorial state in the latter model. Because the final multipath state is conveyed by one route update in deterministic manner, there is no transitionary multipath combinatorial explosion created during establishment of N sessions. 6. IANA Considerations This document makes request to IANA to allocate the following codes in BGP attributes registry. 1. MultiNexthop (MNH) BGP-attribute: A new BGP attribute code TBD. Vairavakkalai, et al. Expires 1 July 2022 [Page 19] Internet-Draft BGP MultiNexthop attribute December 2021 This document makes request to IANA to allocate the following sub registries for MNH attribute:. 1. "FwdAction" type as defined in 3.1. 2. Nexthop-Leg Descriptor TLV:"NhopDescrType" as defined in 3.2. 3. "Nexthop Attributes Sub-TLV type" as defined in 3.3. This document makes request to IANA to allocate a BGP capability code TBD for MNH attribute:. Note to RFC Editor: this section may be removed on publication as an RFC. 7. Security Considerations The attribute is defined as optional non-transitive BGP attribute, such that it does not accidentally get propagated or leaked via BGP speakers that dont support this feature, especially does not unintentionally leak across EBGP boundaries. 8. Acknowledgements Thanks to Robert Raszuk, Gyan Mishra, Ron Bonica for the review, discussions and input to the draft. 9. References 9.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, . [RFC3392] Chandra, R. and J. Scudder, "Capabilities Advertisement with BGP-4", RFC 3392, DOI 10.17487/RFC3392, November 2002, . 9.2. References [ADDPATH-GUIDELINES] Uttaro, Ed., "BGP Flow-Spec Redirect to IP Action", 25 April 2016, . Vairavakkalai, et al. Expires 1 July 2022 [Page 20] Internet-Draft BGP MultiNexthop attribute December 2021 [BGP-CT] Vairavakkalai, Ed., "BGP Classful Transport Planes", 25 August 2021, . [FLWSPC-REDIR-IP] Simpson, Ed., "BGP Flow-Spec Redirect to IP Action", 2 February 2015, . [MPLS-NAMESPACES] Vairavakkalai, Ed., "BGP signalled MPLS-namespaces", 28 December 2021, . [SRTE-COLOR-ONLY] Filsfils, Ed., "BGP Flow-Spec Redirect to IP Action", 21 February 2018, . Authors' Addresses Kaliraj Vairavakkalai Juniper Networks, Inc. 1194 N. Mathilda Ave. Sunnyvale, CA 94089 United States of America Email: kaliraj@juniper.net Minto Jeyananth Juniper Networks, Inc. 1194 N. Mathilda Ave. Sunnyvale, CA 94089 United States of America Email: minto@juniper.net Gyan Mishra Verizon Communications Inc. 13101 Columbia Pike Silver Spring, MD 20904 United States of America Email: gyan.s.mishra@verizon.com Vairavakkalai, et al. Expires 1 July 2022 [Page 21]