Network Working Group                                           K. Patel
Internet-Draft                                              Arrcus, Inc.
Intended status: Standards Track                               A. Lindem
Expires: July 30, 2021                                     Cisco Systems
                                                                S. Zandi
                                                                LinkedIn
                                                           W. Henderickx
                                                                   Nokia
                                                        January 26, 2021

            BGP Link-State Shortest Path First (SPF) Routing
                       draft-ietf-lsvr-bgp-spf-12

Abstract

Many Massively Scaled Data Centers (MSDCs) have converged on simplified layer 3 routing. Furthermore, requirements for operational simplicity have led many of these MSDCs to converge on BGP as their single routing protocol for both their fabric routing and their Data Center Interconnect (DCI) routing. This document describes extensions to BGP to use BGP Link-State distribution and the Shortest Path First (SPF) algorithm used by Internal Gateway Protocols (IGPs) such as OSPF. In doing this, it allows BGP to be efficiently used as both the underlay protocol and the overlay protocol in MSDCs.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on July 30, 2021.

Copyright Notice

Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
  1.1. Terminology
  1.2. BGP Shortest Path First (SPF) Motivation
  1.3. Document Overview
  1.4. Requirements Language
2. Base BGP Protocol Relationship
3. BGP Link-State (BGP-LS) Relationship
4. BGP Peering Models
  4.1. BGP Single-Hop Peering on Network Node Connections
  4.2. BGP Peering Between Directly-Connected Nodes
  4.3. BGP Peering in Route-Reflector or Controller Topology
5. BGP Shortest Path Routing (SPF) Protocol Extensions
  5.1. BGP-LS Shortest Path Routing (SPF) SAFI
    5.1.1. BGP-LS-SPF NLRI TLVs
    5.1.2. BGP-LS Attribute
  5.2. Extensions to BGP-LS
    5.2.1. Node NLRI Usage
      5.2.1.1. Node NLRI Attribute SPF Capability TLV
      5.2.1.2. BGP-LS-SPF Node NLRI Attribute SPF Status TLV
    5.2.2. Link NLRI Usage
      5.2.2.1. BGP-LS-SPF Link NLRI Attribute Prefix-Length TLVs
      5.2.2.2. BGP-LS-SPF Link NLRI Attribute SPF Status TLV
    5.2.3. IPv4/IPv6 Prefix NLRI Usage
      5.2.3.1. BGP-LS-SPF Prefix NLRI Attribute SPF Status TLV
    5.2.4. BGP-LS Attribute Sequence-Number TLV
  5.3. NEXT_HOP Manipulation
6. Decision Process with SPF Algorithm
  6.1. BGP NLRI Selection
    6.1.1. BGP Self-Originated NLRI
  6.2. Dual Stack Support
  6.3. SPF Calculation based on BGP-LS-SPF NLRI
  6.4. IPv4/IPv6 Unicast Address Family Interaction
  6.5. NLRI Advertisement
    6.5.1. Link/Prefix Failure Convergence
    6.5.2. Node Failure Convergence
7. Error Handling
  7.1. Processing of BGP-LS-SPF TLVs
  7.2. Processing of BGP-LS-SPF NLRIs
  7.3. Processing of BGP-LS Attribute
8. IANA Considerations
9. Security Considerations
10. Management Considerations
  10.1. Configuration
    10.1.1. Link Metric Configuration
    10.1.2. backoff-config
  10.2. Operational Data
11. Implementation Status
12. Acknowledgements
13. Contributors
14. References
  14.1. Normative References
  14.2. Informational References
Authors' Addresses

1. Introduction

Many Massively Scaled Data Centers (MSDCs) have converged on simplified layer 3 routing. Furthermore, requirements for operational simplicity have led many of these MSDCs to converge on BGP [RFC4271] as their single routing protocol for both their fabric routing and their Data Center Interconnect (DCI) routing [RFC7938]. This document describes an alternative solution which leverages BGP-LS [RFC7752] and the Shortest Path First algorithm used by Internal Gateway Protocols (IGPs) such as OSPF [RFC2328].

This document leverages both the BGP protocol [RFC4271] and the BGP-LS [RFC7752] protocols. The relationship to each, as well as the scope of changes, is described in Section 2 and Section 3, respectively. The modifications to [RFC4271] for BGP SPF described herein only apply to IPv4 and IPv6 as underlay unicast Subsequent Address Families Identifiers (SAFIs). Operations for any other BGP SAFIs are outside the scope of this document.

This solution avails the benefits of both BGP and SPF-based IGPs. These include TCP-based flow-control, no periodic link-state refresh, and completely incremental NLRI advertisement. These advantages can reduce the overhead in MSDCs where there is a high degree of Equal Cost Multi-Path (ECMP) and the topology is very stable. Additionally, using an SPF-based computation can support fast convergence and the computation of Loop-Free Alternatives (LFAs). The SPF LFA extensions defined in [RFC5286] can be similarly applied to BGP SPF calculations. However, the details are a matter of implementation. Furthermore, a BGP-based solution lends itself to multiple peering models including those incorporating route-reflectors [RFC4456] or controllers.

1.1. Terminology

This specification reuses terms defined in section 1.1 of [RFC4271] including BGP speaker, NLRI, and Route.

Additionally, this document introduces the following terms:

BGP SPF Routing Domain: A set of BGP routers that are under a single administrative domain and exchange link-state information using the BGP-LS-SPF SAFI and compute routes using BGP SPF as described herein.
BGP-LS-SPF NLRI: This refers to BGP-LS Network Layer Reachability Information (NLRI) that is being advertised in the BGP-LS-SPF SAFI (Section 5.1) and is being used for BGP SPF route computation.

Dijkstra Algorithm: An algorithm for computing the shortest path from a given node in a graph to every other node in the graph. At each iteration of the algorithm, there is a list of candidate vertices. Paths from the root to these vertices have been found, but not necessarily the shortest ones. However, the paths to the candidate vertex that is closest to the root are guaranteed to be shortest; this vertex is added to the shortest-path tree, removed from the candidate list, and its adjacent vertices are examined for possible addition to/modification of the candidate list. The algorithm then iterates again. It terminates when the candidate list becomes empty. [RFC2328]

1.2. BGP Shortest Path First (SPF) Motivation

Given that [RFC7938] already describes how BGP could be used as the sole routing protocol in an MSDC, one might question the motivation for defining an alternate BGP deployment model when a mature solution exists. For both alternatives, BGP offers the operational benefits of a single routing protocol as opposed to the combination of an IGP for the underlay and BGP as an overlay. However, BGP SPF offers some unique advantages above and beyond standard BGP distance-vector routing. With BGP SPF, the standard hop-by-hop peering model is relaxed.

A primary advantage is that all BGP-LS-SPF speakers in the BGP SPF routing domain will have a complete view of the topology. This will allow support for ECMP, IP fast-reroute (e.g., Loop-Free Alternatives), Shared Risk Link Groups (SRLGs), and other routing enhancements without advertisement of additional BGP paths [RFC7911] or other extensions. In short, the advantages of an IGP such as OSPF [RFC2328] are availed in BGP.
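As an illustration of why a complete topology view yields ECMP without extra BGP paths, the following is a minimal sketch of the Dijkstra algorithm defined in Section 1.1, extended to remember every equal-cost first hop. It is not the computation specified by this document: the dictionary-based graph, the function name `spf`, and the node names are assumptions made for the example, and all link metrics are assumed to be positive.

```python
import heapq


def spf(graph, root):
    """Dijkstra over graph[node][neighbor] = metric (hypothetical layout).

    Returns (dist, next_hops): the cost to each reachable node and the set
    of equal-cost first hops from the root towards it.
    Assumes every metric is strictly positive.
    """
    dist = {root: 0}
    next_hops = {root: set()}
    candidates = [(0, root)]          # the candidate list from Section 1.1
    done = set()                      # vertices already on the shortest-path tree

    while candidates:
        cost, node = heapq.heappop(candidates)   # closest candidate to the root
        if node in done:
            continue                              # stale heap entry
        done.add(node)
        for nbr, metric in graph.get(node, {}).items():
            new_cost = cost + metric
            # Directly attached neighbors are their own first hop; other
            # nodes inherit the first hops of the node they are reached through.
            hops = {nbr} if node == root else next_hops[node]
            if nbr not in dist or new_cost < dist[nbr]:
                dist[nbr] = new_cost
                next_hops[nbr] = set(hops)
                heapq.heappush(candidates, (new_cost, nbr))
            elif new_cost == dist[nbr]:
                next_hops[nbr] |= hops            # equal-cost path: ECMP
    return dist, next_hops


# A two-spine, two-leaf fabric with unit metrics (illustrative only).
fabric = {
    "leaf1": {"spine1": 1, "spine2": 1},
    "spine1": {"leaf1": 1, "leaf2": 1},
    "spine2": {"leaf1": 1, "leaf2": 1},
    "leaf2": {"spine1": 1, "spine2": 1},
}
dist, next_hops = spf(fabric, "leaf1")
```

In this fabric, leaf1 reaches leaf2 at cost 2 through both spines, so both spines appear as next hops for leaf2.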
With the simplified BGP decision process as defined in Section 6, NLRI changes can be disseminated throughout the BGP routing domain much more rapidly (equivalent to IGPs with the proper implementation). The added advantage of BGP using TCP for reliable transport leverages TCP's inherent flow-control and guaranteed in-order delivery.

Another primary advantage is a potential reduction in NLRI advertisement. With standard BGP distance-vector routing, a single link failure may impact hundreds or thousands of prefixes and result in the withdrawal or re-advertisement of the attendant NLRI. With BGP SPF, only the BGP speakers corresponding to the link NLRI need to withdraw the corresponding BGP-LS-SPF Link NLRI. Additionally, the changed NLRI will be advertised immediately as opposed to normal BGP where it is only advertised after the best route selection. These advantages will afford NLRI dissemination throughout the BGP SPF routing domain with efficiencies similar to link-state protocols.

With controller and route-reflector peering models, BGP SPF advertisement and distributed computation require a minimal number of sessions and copies of the NLRI since only the latest version of the NLRI from the originator is required. Given that verification of the adjacencies is done outside of BGP, each BGP speaker will only need as many sessions and copies of the NLRI as required for redundancy (see Section 4). Additionally, a controller could inject topology that is learned outside the BGP SPF routing domain.

Given that controllers are already consuming BGP-LS NLRI [RFC7752], this functionality can be reused for the BGP-LS-SPF NLRI.
Another potential advantage of BGP SPF is that IPv6 and IPv4 can both be supported using the BGP-LS-SPF SAFI with the same BGP-LS-SPF NLRIs. In many MSDC fabrics, the IPv4 and IPv6 topologies are congruent. Although beyond the scope of this document, multi-topology extensions could be used to support separate IPv4, IPv6, unicast, and multicast topologies while sharing the same NLRI.

Finally, the BGP SPF topology can be used as an underlay for other BGP SAFIs (using the existing model) and realize all the above advantages.

1.3. Document Overview

The document begins with sections defining the precise relationship that BGP SPF has with both the base BGP protocol [RFC4271] (Section 2) and the BGP Link-State (BGP-LS) extensions [RFC7752] (Section 3). This is required to dispel the notion that BGP SPF is an independent protocol. The BGP peering models, as well as their respective trade-offs, are then discussed in Section 4. The remaining sections, which make up the bulk of the document, define the protocol enhancements necessary to support BGP SPF. The BGP-LS extensions to support BGP SPF are defined in Section 5. The replacement of the base BGP decision process with the SPF computation is specified in Section 6. Finally, BGP SPF error handling is defined in Section 7.

1.4. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Base BGP Protocol Relationship

With the exception of the decision process, the BGP SPF extensions leverage the BGP protocol [RFC4271] without change.
This includes the BGP protocol Finite State Machine, BGP messages and their encodings, processing of BGP messages, BGP attributes and path attributes, BGP NLRI encodings, and any error handling defined in [RFC4271] and [RFC7606].

Due to the changes to the decision process, there are mechanisms and encodings that are no longer applicable. While not necessarily required for computation, the ORIGIN, AS_PATH, MULTI_EXIT_DISC, LOCAL_PREF, and NEXT_HOP path attributes are mandatory and will be validated. The ATOMIC_AGGREGATE and AGGREGATOR path attributes are not applicable within the context of BGP SPF and SHOULD NOT be advertised. However, if they are advertised, they will be accepted, validated, and propagated consistent with the BGP protocol.

Section 9 of [RFC4271] defines the decision process that is used to select routes for subsequent advertisement by applying the policies in the local Policy Information Base (PIB) to the routes stored in its Adj-RIBs-In. The output of the Decision Process is the set of routes that are announced by a BGP speaker to its peers. These selected routes are stored by a BGP speaker in the speaker's Adj-RIBs-Out according to policy.

The BGP SPF extension fundamentally changes the decision process, as described herein, to be more like a link-state protocol (e.g., OSPF [RFC2328]). Specifically:

1. BGP advertisements are readvertised to neighbors immediately without waiting or dependence on the route computation as specified in phase 3 of the base BGP decision process. Multiple peering models are supported as specified in Section 4.

2. Determining the degree of preference for BGP routes for the SPF calculation as described in phase 1 of the base BGP decision process is replaced with the mechanisms in Section 6.1.

3. Phase 2 of the base BGP protocol decision process is replaced with the Shortest Path First (SPF) algorithm, also known as the Dijkstra algorithm (Section 1.1).

3. BGP Link-State (BGP-LS) Relationship

[RFC7752] describes a mechanism by which link-state and TE information can be collected from networks and shared with external entities using BGP. This is achieved by defining NLRI advertised using the BGP-LS AFI. The BGP-LS extensions defined in [RFC7752] make use of the decision process defined in [RFC4271]. This document reuses NLRI and TLVs defined in [RFC7752]. Rather than reusing the BGP-LS SAFI, the BGP-LS-SPF SAFI (Section 5.1) is introduced to ensure backward compatibility for the BGP-LS SAFI usage.

The BGP SPF extensions reuse the Node, Link, and Prefix NLRI defined in [RFC7752]. The usage of the BGP-LS NLRI, metric attributes, and attribute extensions is described in Section 5.2.1. The usage of other BGP-LS attributes is not precluded and is, in fact, expected. However, the details are beyond the scope of this document and will be specified in future documents.

Support for Multiple Topology Routing (MTR) similar to the OSPF MTR computation described in [RFC4915] is beyond the scope of this document. Consequently, the usage of the Multi-Topology TLV as described in section 3.2.1.5 of [RFC7752] is not specified.

The rules for setting the NLRI next-hop path attribute for the BGP-LS-SPF SAFI will follow the BGP-LS SAFI as specified in section 3.4 of [RFC7752].

4. BGP Peering Models

Depending on the topology, scaling, capabilities of the BGP-LS-SPF speakers, and redundancy requirements, various peering models are supported. The only requirements are that all BGP SPF speakers in the BGP SPF routing domain exchange BGP-LS-SPF NLRI, run an SPF calculation, and update their routing table appropriately.

4.1. BGP Single-Hop Peering on Network Node Connections

The simplest peering model is the one where EBGP single-hop sessions are established over direct point-to-point links interconnecting the nodes in the BGP SPF routing domain. Once the single-hop BGP session has been established and the BGP-LS-SPF AFI/SAFI capability has been exchanged [RFC4760] for the corresponding session, then the link is considered up from a BGP SPF perspective and the corresponding BGP-LS-SPF Link NLRI is advertised. If the session goes down, the corresponding Link NLRI will be withdrawn. Topologically, this would be equivalent to the peering model in [RFC7938] where there is a BGP session on every link in the data center switch fabric. The content of the Link NLRI is described in Section 5.2.2.

4.2. BGP Peering Between Directly-Connected Nodes

In this model, BGP-LS-SPF speakers peer with all directly-connected nodes but the sessions may be between loopback addresses (i.e., two-hop sessions) and the direct connection discovery and liveliness detection for the interconnecting links are independent of the BGP protocol. For example, liveliness detection could be done using the BFD protocol [RFC5880]. Precisely how discovery and liveliness detection is accomplished is outside the scope of this document. Consequently, there will be a single BGP session even if there are multiple direct connections between BGP-LS-SPF speakers. BGP-LS-SPF Link NLRI is advertised as long as a BGP session has been established, the BGP-LS-SPF AFI/SAFI capability has been exchanged [RFC4760], and the link is operational as determined using liveliness detection mechanisms outside the scope of this document.
This is much like the previous peering model, except that peering is between loopback addresses and the interconnecting links can be unnumbered. However, since there are BGP sessions between every directly-connected node in the BGP SPF routing domain, there is only a reduction in BGP sessions when there are parallel links between nodes.

4.3. BGP Peering in Route-Reflector or Controller Topology

In this model, BGP-LS-SPF speakers peer solely with one or more Route Reflectors [RFC4456] or controllers. As in the previous model, direct connection discovery and liveliness detection for those links in the BGP SPF routing domain are done outside of the BGP protocol. BGP-LS-SPF Link NLRI is advertised as long as the corresponding link is considered up as per the chosen liveliness detection mechanism.

This peering model, known as sparse peering, allows for fewer BGP sessions and, consequently, fewer instances of the same NLRI received from multiple peers. Normally, the route-reflector or controller BGP sessions would be on directly-connected links to avoid dependence on another routing protocol for session connectivity. However, multi-hop peering is not precluded. The number of BGP sessions is dependent on the redundancy requirements and the stability of the BGP sessions. This is discussed in greater detail in [I-D.ietf-lsvr-applicability].

5. BGP Shortest Path Routing (SPF) Protocol Extensions

5.1. BGP-LS Shortest Path Routing (SPF) SAFI

In order to replace the existing BGP decision process with an SPF-based decision process in a backward compatible manner by not impacting the BGP-LS SAFI, this document introduces the BGP-LS-SPF SAFI. The BGP-LS-SPF (AFI 16388 / SAFI 80) [RFC4760] is allocated by IANA as specified in Section 8. In order for two BGP-LS-SPF speakers to exchange BGP SPF NLRI, they MUST exchange the Multiprotocol Extensions Capability [RFC5492] [RFC4760] to ensure that they are both capable of properly processing such NLRI. This is done with AFI 16388 / SAFI 80 for BGP-LS-SPF advertised within the BGP SPF Routing Domain. The BGP-LS-SPF SAFI is used to carry IPv4 and IPv6 prefix information in a format facilitating an SPF-based decision process.

5.1.1. BGP-LS-SPF NLRI TLVs

The NLRI format of the BGP-LS-SPF SAFI uses exactly the same format as the BGP-LS AFI [RFC7752]. In other words, all the TLVs used in the BGP-LS AFI are applicable and used for the BGP-LS-SPF SAFI. These TLVs within BGP-LS-SPF NLRI advertise information that describes links, nodes, and prefixes comprising IGP link-state information.

In order to compare the NLRI efficiently, it is REQUIRED that all the TLVs within the given NLRI be ordered in ascending order by the TLV type. For multiple TLVs of the same type within a single NLRI, it is REQUIRED that these TLVs are ordered in ascending order by the TLV value field. Comparison of the value fields is performed by treating the entire value field as a hexadecimal string. NLRIs having TLVs which do not follow the ordering rules MUST be considered as malformed and discarded with appropriate error logging.

[RFC7752] defines certain NLRI TLVs as mandatory TLVs.
These TLVs are considered mandatory for the BGP-LS-SPF SAFI as well. All the other TLVs are considered optional TLVs.

5.1.2. BGP-LS Attribute

The BGP-LS attribute of the BGP-LS-SPF SAFI uses exactly the same format as the BGP-LS AFI [RFC7752]. In other words, all the TLVs used in the BGP-LS attribute of the BGP-LS AFI are applicable and used for the BGP-LS attribute of the BGP-LS-SPF SAFI. This attribute is an optional, non-transitive BGP attribute that is used to carry link, node, and prefix properties and attributes. The BGP-LS attribute is a set of TLVs.

The BGP-LS attribute may potentially grow large in size depending on the amount of link-state information associated with a single Link-State NLRI. The BGP specification [RFC4271] mandates a maximum BGP message size of 4096 octets. It is RECOMMENDED that an implementation support [RFC8654] in order to accommodate a larger amount of information within the BGP-LS Attribute. BGP-LS-SPF speakers MUST ensure that they limit the TLVs included in the BGP-LS Attribute to ensure that a BGP update message for a single Link-State NLRI does not cross the maximum limit for a BGP message. The determination of the types of TLVs to be included by the BGP-LS-SPF speaker originating the attribute is outside the scope of this document. When a BGP-LS-SPF speaker finds that it is exceeding the maximum BGP message size due to addition or update of some other BGP Attribute (e.g., AS_PATH), it MUST consider the BGP-LS Attribute to be malformed and the attribute discard handling of [RFC7606] applies.

In order to compare the BGP-LS attribute efficiently, it is REQUIRED that all the TLVs within the given attribute be ordered in ascending order by the TLV type. For multiple TLVs of the same type within a single attribute, it is REQUIRED that these TLVs are ordered in ascending order by the TLV value field. Comparison of the value fields is performed by treating the entire value field as a hexadecimal string.
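The ordering rule stated above for NLRI TLVs and for BGP-LS attribute TLVs can be sketched as follows. This is an illustration rather than a normative procedure: the `(type, value)` tuple representation and the function names are assumptions of the example, and "treating the entire value field as a hexadecimal string" is interpreted here as a lexicographic comparison of the hexadecimal rendering of the value octets.

```python
from typing import List, Tuple

Tlv = Tuple[int, bytes]  # (TLV type, raw value octets); hypothetical representation


def tlv_sort_key(tlv: Tlv) -> Tuple[int, str]:
    """Order by TLV type first, then by the value rendered as a hex string."""
    tlv_type, value = tlv
    return (tlv_type, value.hex())


def canonical_order(tlvs: List[Tlv]) -> List[Tlv]:
    """Return the TLVs in the ascending order an originator would emit."""
    return sorted(tlvs, key=tlv_sort_key)


def is_canonically_ordered(tlvs: List[Tlv]) -> bool:
    """Check that a received TLV sequence already follows the ordering rule."""
    keys = [tlv_sort_key(t) for t in tlvs]
    return all(a <= b for a, b in zip(keys, keys[1:]))


# Example values only: an AS Number TLV (512) must precede TLV 516, and two
# TLVs of the same type are ordered by value.
received = [(516, bytes([10, 0, 0, 1])), (512, bytes([0, 0, 0xFD, 0xE8]))]
same_type = [(1184, b"\x02"), (1184, b"\x01")]
```

A receiver could apply `is_canonically_ordered` to a decoded TLV list, while an originator could apply `canonical_order` before encoding; what happens when the check fails depends on whether the TLVs belong to an NLRI or to the BGP-LS attribute, as stated in the surrounding text.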
Attributes having TLVs which do not follow the ordering rules MUST NOT be considered as malformed.

All TLVs within the BGP-LS Attribute are considered optional unless specified otherwise.

5.2. Extensions to BGP-LS

[RFC7752] describes a mechanism by which link-state and TE information can be collected from IGPs and shared with external components using the BGP protocol. It describes both the definition of the BGP-LS-SPF NLRI that advertise links, nodes, and prefixes comprising IGP link-state information and the definition of a BGP path attribute (BGP-LS attribute) that carries link, node, and prefix properties and attributes, such as the link and prefix metric or auxiliary Router-IDs of nodes, etc. This document extends the usage of BGP-LS NLRI for the purpose of BGP SPF calculation via advertisement in the BGP-LS-SPF SAFI.

The protocol identifier specified in the Protocol-ID field [RFC7752] will represent the origin of the advertised NLRI. For Node NLRI and Link NLRI, this MUST be the direct protocol (4). Node or Link NLRI with a Protocol-ID other than direct will be considered malformed. For Prefix NLRI, the specified Protocol-ID MUST be the origin of the prefix. The local and remote node descriptors for all NLRI MUST include the BGP Identifier (TLV 516) and the AS Number (TLV 512) [RFC7752]. The BGP Confederation Member (TLV 517) [RFC7752] is not applicable and SHOULD NOT be included. If TLV 517 is included, it will be ignored.

5.2.1. Node NLRI Usage

The Node NLRI MUST be advertised unconditionally by all routers in the BGP SPF routing domain.

5.2.1.1. Node NLRI Attribute SPF Capability TLV

The SPF capability is an additional Node Attribute TLV. This attribute TLV MUST be included with the BGP-LS-SPF SAFI and SHOULD NOT be used for other SAFIs. The TLV type 1180 will be assigned by IANA. The Node Attribute TLV will contain a single-octet SPF algorithm as defined in [RFC8665].

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Type (1180)           |     Length - (1 Octet)        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SPF Algorithm |
   +-+-+-+-+-+-+-+-+

The SPF algorithm inherits the values from the IGP Algorithm Types registry [RFC8665]. Algorithm 0 (Shortest Path First (SPF) algorithm based on link metric) is supported and described in Section 6.3. Support for other algorithm types is beyond the scope of this specification.

When computing the SPF for a given BGP routing domain, only BGP nodes advertising the SPF capability TLV with the same SPF algorithm will be included in the Shortest Path Tree (SPT). An implementation MAY optionally log detection of a BGP node that has either not advertised the SPF capability TLV or is advertising the SPF capability TLV with an algorithm type other than 0.

5.2.1.2. BGP-LS-SPF Node NLRI Attribute SPF Status TLV

A BGP-LS Attribute TLV of the BGP-LS-SPF Node NLRI is defined to indicate the status of the node with respect to the BGP SPF calculation. This will be used to rapidly take a node out of service (Section 6.5.2) or to indicate the node is not to be used for transit (i.e., non-local) traffic (Section 6.3). If the SPF Status TLV is not included with the Node NLRI, the node is considered to be up and is available for transit traffic. The SPF status is acted upon with the execution of the next SPF calculation (Section 6.3). A single TLV type will be shared by the BGP-LS-SPF Node, Link, and Prefix NLRI. The TLV type 1184 will be assigned by IANA.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Type (1184)           |      Length (1 Octet)         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SPF Status    |
   +-+-+-+-+-+-+-+-+

   BGP Status Values: 0 - Reserved
                      1 - Node Unreachable with respect to BGP SPF
                      2 - Node does not support transit with respect
                          to BGP SPF
                      3-254 - Undefined
                      255 - Reserved

If the SPF Status TLV is received and the corresponding Node NLRI has not been received, then the SPF Status TLV is ignored and not used in SPF computation but is still announced to other BGP speakers. An implementation MAY log an error for further analysis. If a BGP speaker received the Node NLRI but the SPF Status TLV is not received, then any previously received information is considered as implicitly withdrawn and the update is propagated to other BGP speakers.
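The two Node NLRI attribute TLVs above can be sketched in code as follows. This is a non-normative illustration: it assumes the 2-octet type and 2-octet length header shown in the figures, and the function names, the dictionary returned by `decode_tlvs`, and the `(in_spt, transit)` result of `node_spf_role` are inventions of this example.

```python
import struct
from typing import Dict, Tuple

SPF_CAPABILITY_TLV = 1180   # single-octet SPF algorithm
SPF_STATUS_TLV = 1184       # single-octet SPF status

STATUS_UNREACHABLE = 1      # Node Unreachable with respect to BGP SPF
STATUS_NO_TRANSIT = 2       # Node does not support transit


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one TLV as 2-octet type, 2-octet length, then the value."""
    return struct.pack("!HH", tlv_type, len(value)) + value


def decode_tlvs(data: bytes) -> Dict[int, bytes]:
    """Decode a TLV sequence into {type: value}; raises on truncation."""
    tlvs: Dict[int, bytes] = {}
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!HH", data, offset)
        value = data[offset + 4:offset + 4 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        tlvs[tlv_type] = value
        offset += 4 + length
    return tlvs


def node_spf_role(tlvs: Dict[int, bytes], algorithm: int = 0) -> Tuple[bool, bool]:
    """Return (in_spt, transit) for a node, given its decoded attribute TLVs."""
    capability = tlvs.get(SPF_CAPABILITY_TLV)
    if capability is None or len(capability) != 1 or capability[0] != algorithm:
        return (False, False)          # no matching SPF capability: not in the SPT
    status = tlvs.get(SPF_STATUS_TLV)
    if status is None or len(status) != 1:
        return (True, True)            # no status TLV: up and available for transit
    if status[0] == STATUS_UNREACHABLE:
        return (False, False)
    if status[0] == STATUS_NO_TRANSIT:
        return (True, False)
    # Assumption of this sketch: any other status value is left out of the
    # computation, so the node is treated as if no status had been sent.
    return (True, True)


attribute = encode_tlv(SPF_CAPABILITY_TLV, b"\x00") + encode_tlv(SPF_STATUS_TLV, b"\x02")
role = node_spf_role(decode_tlvs(attribute))
```

Here `attribute` describes a node that runs algorithm 0 and has asked not to be used for transit, so `role` marks it as part of the SPT but not as a transit node.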
A BGP Update containing an SPF Status TLV in the BGP-LS attribute [RFC7752] with a value that is outside the range of defined values SHOULD be processed and announced to other BGP speakers. However, a BGP speaker MUST NOT use the Status TLV in its SPF computation. An implementation MAY log this condition for further analysis.

5.2.2. Link NLRI Usage

The criteria for advertisement of Link NLRI are discussed in Section 4.

Link NLRI is advertised with unique local and remote node descriptors dependent on the IP addressing. For IPv4 links, the link's local IPv4 (TLV 259) and remote IPv4 (TLV 260) addresses will be used. For IPv6 links, the local IPv6 (TLV 261) and remote IPv6 (TLV 262) addresses will be used. For unnumbered links, the link local/remote identifiers (TLV 258) will be used. For links having both IPv4 and IPv6 addresses, both sets of descriptors MAY be included in the same Link NLRI. The link identifiers are described in table 5 of [RFC7752].

For a link to be used in the Shortest Path Tree (SPT) for a given address family, i.e., IPv4 or IPv6, both routers connecting the link MUST have an address in the same subnet for that address family. However, an IPv4 or IPv6 prefix associated with the link MAY be installed without the corresponding address on the other side of the link.

The link IGP metric attribute TLV (TLV 1095) MUST be advertised. If a BGP speaker receives a Link NLRI without an IGP metric attribute TLV, then it SHOULD consider the received NLRI as malformed and the receiving BGP speaker MUST handle such malformed NLRI as 'Treat-as-withdraw' [RFC7606]. The BGP SPF metric length is 4 octets.
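The metric handling above can be illustrated with a short sketch. It assumes the generic BGP-LS TLV layout of [RFC7752] (2-octet type, 2-octet length, value); the function names are hypothetical, and treating a metric whose length is not 4 octets the same as a missing metric is an assumption of this sketch rather than something the text states.

```python
import struct
from typing import Dict, Optional

IGP_METRIC_TLV = 1095  # IGP Metric TLV type from [RFC7752]


def encode_igp_metric(metric: int) -> bytes:
    """Encode the IGP Metric TLV with the fixed 4-octet BGP SPF metric."""
    if not 0 <= metric <= 0xFFFFFFFF:
        raise ValueError("metric must fit in 4 octets")
    return struct.pack("!HHI", IGP_METRIC_TLV, 4, metric)


def parse_tlvs(attribute: bytes) -> Dict[int, bytes]:
    """Split a BGP-LS attribute into {type: value}; raise on truncation."""
    tlvs: Dict[int, bytes] = {}
    offset = 0
    while offset < len(attribute):
        if offset + 4 > len(attribute):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!HH", attribute, offset)
        offset += 4
        if offset + length > len(attribute):
            raise ValueError("truncated TLV value")
        tlvs[tlv_type] = attribute[offset:offset + length]
        offset += length
    return tlvs


def link_metric(attribute: bytes) -> Optional[int]:
    """Return the link metric, or None when the Link NLRI is to be
    handled as 'Treat-as-withdraw' (metric TLV missing).

    Assumption of this sketch: a metric TLV that is not 4 octets long
    is handled the same way as a missing one.
    """
    value = parse_tlvs(attribute).get(IGP_METRIC_TLV)
    if value is None or len(value) != 4:
        return None
    return struct.unpack("!I", value)[0]
```

A caller would invoke `link_metric()` on the BGP-LS attribute received with a Link NLRI and treat a `None` result as the trigger for 'Treat-as-withdraw' handling.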
Like OSPF [RFC2328], a cost is associated with the output side of each router interface. This cost is configurable by the system administrator. The lower the cost, the more likely the interface is to be used to forward data traffic. One possible default for metric would be to give each interface a cost of 1 making it effectively a hop count. Algorithms such as setting the metric inversely to the link speed as supported in the OSPF MIB [RFC4750] MAY be supported. However, this is beyond the scope of this document. Refer to Section 10.1.1 for operational guidance.

The usage of other link attribute TLVs is beyond the scope of this document.

5.2.2.1. BGP-LS-SPF Link NLRI Attribute Prefix-Length TLVs

Two BGP-LS Attribute TLVs of the BGP-LS-SPF Link NLRI are defined to advertise the prefix length associated with the IPv4 and IPv6 link prefixes derived from the link descriptor addresses. The prefix length is used for the optional installation of prefixes corresponding to Link NLRI as defined in Section 6.3.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |IPv4 (1182) or IPv6 Type (1183)|      Length (1 Octet)         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Prefix-Length |
   +-+-+-+-+-+-+-+-+

   Prefix-length - A one-octet length restricted to 1-32 for IPv4 Link NLRI endpoint prefixes and 1-128 for IPv6 Link NLRI endpoint prefixes.

The Prefix-Length TLV is only relevant to Link NLRIs. The Prefix-Length TLVs MUST be discarded as an error and not passed to other BGP peers as specified in [RFC7606] when received with any NLRIs other than Link NLRIs. An implementation MAY log an error for further analysis.

The maximum prefix-length for IPv4 Prefix-Length TLV is 32 bits.
A prefix-length field indicating a larger value than 32 bits MUST be discarded as an error and the received TLV is not passed to other BGP peers as specified in [RFC7606]. The corresponding Link NLRI is considered as malformed and MUST be handled as 'Treat-as-withdraw'. An implementation MAY log an error for further analysis.

The maximum prefix-length for IPv6 Prefix-Length Type is 128 bits. A prefix-length field indicating a larger value than 128 bits MUST be discarded as an error and the received TLV is not passed to other BGP peers as specified in [RFC7606]. The corresponding Link NLRI is considered as malformed and MUST be handled as 'Treat-as-withdraw'. An implementation MAY log an error for further analysis.

5.2.2.2. BGP-LS-SPF Link NLRI Attribute SPF Status TLV

A BGP-LS Attribute TLV of the BGP-LS-SPF Link NLRI is defined to indicate the status of the link with respect to the BGP SPF calculation. This will be used to expedite convergence for link failures as discussed in Section 6.5.1. If the SPF Status TLV is not included with the Link NLRI, the link is considered up and available. The SPF status is acted upon with the execution of the next SPF calculation (Section 6.3). A single TLV type will be shared by the Node, Link, and Prefix NLRI. The TLV type 1184 will be assigned by IANA.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Type (1184)           |      Length (1 Octet)         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SPF Status    |
   +-+-+-+-+-+-+-+-+

   BGP Status Values: 0 - Reserved
                      1 - Link Unreachable with respect to BGP SPF
                      2-254 - Undefined
                      255 - Reserved

If the SPF Status TLV is received and the corresponding Link NLRI has not been received, then the SPF Status TLV is ignored and not used in SPF computation but is still announced to other BGP speakers. An implementation MAY log an error for further analysis.
If a BGP speaker receives the Link NLRI without the SPF Status TLV, then any previously received information is considered as implicitly withdrawn and the update is propagated to other BGP speakers.

A BGP Update containing an SPF Status TLV in the BGP-LS attribute [RFC7752] with a value that is outside the range of defined values SHOULD be processed and announced to other BGP speakers. However, a BGP speaker MUST NOT use the Status TLV in its SPF computation. An implementation MAY log this information for further analysis.

5.2.3. IPv4/IPv6 Prefix NLRI Usage

IPv4/IPv6 Prefix NLRI is advertised with a Local Node Descriptor and the prefix and length. The Prefix Descriptors field includes the IP Reachability Information TLV (TLV 265) as described in [RFC7752]. The prefix metric attribute TLV (TLV 1155) MUST be advertised. The IGP Route Tag TLV (TLV 1153) MAY be advertised. The usage of other attribute TLVs is beyond the scope of this document. For loopback prefixes, the metric should be 0. For non-loopback prefixes, the setting of the metric is a local matter and beyond the scope of this document.

5.2.3.1. BGP-LS-SPF Prefix NLRI Attribute SPF Status TLV

A BGP-LS Attribute TLV of the BGP-LS-SPF Prefix NLRI is defined to indicate the status of the prefix with respect to the BGP SPF calculation. This will be used to expedite convergence for prefix unreachability as discussed in Section 6.5.1. If the SPF Status TLV is not included with the Prefix NLRI, the prefix is considered reachable. A single TLV type will be shared by the Node, Link, and Prefix NLRI. The TLV type 1184 will be assigned by IANA.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Type (1184)           |      Length (1 Octet)         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | SPF Status    |
   +-+-+-+-+-+-+-+-+

   BGP Status Values: 0 - Reserved
                      1 - Prefix Unreachable with respect to SPF
                      2-254 - Undefined
                      255 - Reserved

If the SPF Status TLV is received and the corresponding Prefix NLRI has not been received, then the SPF Status TLV is ignored and not used in SPF computation but is still announced to other BGP speakers. An implementation MAY log an error for further analysis.

If a BGP speaker receives the Prefix NLRI without the SPF Status TLV, then any previously received information is considered as implicitly withdrawn and the update is propagated to other BGP speakers.

A BGP Update containing an SPF Status TLV in the BGP-LS attribute [RFC7752] with a value that is outside the range of defined values SHOULD be processed and announced to other BGP speakers. However, a BGP speaker MUST NOT use the Status TLV in its SPF computation. An implementation MAY log this information for further analysis.

5.2.4. BGP-LS Attribute Sequence-Number TLV

A BGP-LS Attribute TLV of the BGP-LS-SPF NLRI types is defined to assure the most recent version of a given NLRI is used in the SPF computation. The Sequence-Number TLV is mandatory for BGP-LS-SPF NLRI. The TLV type 1181 has been assigned by IANA. The BGP-LS Attribute TLV will contain an 8-octet sequence number. The usage of the Sequence Number TLV is described in Section 6.1.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Type (1181)           |      Length (8 Octets)        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Sequence Number (High-Order 32 Bits)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              Sequence Number (Low-Order 32 Bits)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Sequence Number

The 64-bit strictly-increasing sequence number MUST be incremented for every self-originated version of BGP-LS-SPF NLRI. BGP speakers implementing this specification MUST use available mechanisms to preserve the sequence number's strictly increasing property for the deployed life of the BGP speaker (including cold restarts). One mechanism for accomplishing this would be to use the high-order 32 bits of the sequence number as a wrap/boot count that is incremented any time the BGP router loses its sequence number state or the low-order 32 bits wrap.

When incrementing the sequence number for each self-originated NLRI, the sequence number should be treated as an unsigned 64-bit value. If the lower-order 32-bit value wraps, the higher-order 32-bit value should be incremented and saved in non-volatile storage. If by some chance the BGP-LS-SPF speaker is deployed long enough that there is a possibility that the 64-bit sequence number may wrap or a BGP-LS-SPF speaker completely loses its sequence number state (e.g., the BGP speaker hardware is replaced or experiences a cold-start), the BGP NLRI selection rules (see Section 6.1) will ensure convergence, albeit not immediately.

The Sequence-Number TLV is mandatory for BGP-LS-SPF NLRI. If the Sequence-Number TLV is not received then the corresponding NLRI is considered as malformed and MUST be handled as 'Treat-as-withdraw'.
An implementation MAY log an error for further analysis.

5.3. NEXT_HOP Manipulation

All BGP peers that support SPF extensions would locally compute the Loc-RIB Next-Hop as a result of the SPF process. Consequently, the Next-Hop is always ignored on receipt. The Next-Hop address MUST be encoded as described in [RFC4760]. BGP speakers MUST interpret the Next-Hop address of MP_REACH_NLRI attribute as an IPv4 address whenever the length of the Next-Hop address is 4 octets, and as an IPv6 address whenever the length of the Next-Hop address is 16 octets.

[RFC4760] modifies the rules of NEXT_HOP attribute whenever the multiprotocol extensions for BGP-4 are enabled. BGP speakers MUST set the NEXT_HOP attribute according to the rules specified in [RFC4760] as the BGP-LS-SPF routing information is carried within the multiprotocol extensions for BGP-4.

6. Decision Process with SPF Algorithm

The Decision Process described in [RFC4271] takes place in three distinct phases. The Phase 1 decision function of the Decision Process is responsible for calculating the degree of preference for each route received from a BGP speaker's peer. The Phase 2 decision function is invoked on completion of the Phase 1 decision function and is responsible for choosing the best route out of all those available for each distinct destination, and for installing each chosen route into the Loc-RIB. The combination of the Phase 1 and 2 decision functions is characterized as a Path Vector algorithm.

The SPF based Decision process replaces the BGP Decision process described in [RFC4271]. This process starts with selecting only those Node NLRI whose SPF capability TLV matches with the local BGP-LS-SPF speaker's SPF capability TLV value. Since Link-State NLRI always contains the local node descriptor (Section 5.2.1), each NLRI is uniquely originated by a single BGP-LS-SPF speaker in the BGP SPF routing domain (the BGP node matching the NLRI's Node Descriptors).
Instances of the same NLRI originated by multiple BGP speakers would be indicative of a configuration error or a masquerading attack (Section 9). These selected Node NLRI and their Link/Prefix NLRI are used to build a directed graph during the SPF computation as described below. The best routes for BGP prefixes are installed in the RIB as a result of the SPF process.

When BGP-LS-SPF NLRI is received, all that is required is to determine whether it is the most recent by examining the Node-ID and sequence number as described in Section 6.1. If the received NLRI has changed, it will be advertised to other BGP-LS-SPF peers. If the attributes have changed (other than the sequence number), a BGP SPF calculation will be triggered. However, a changed NLRI MAY be advertised immediately to other peers and prior to any SPF calculation. Note that the BGP MinRouteAdvertisementIntervalTimer and MinASOriginationIntervalTimer [RFC4271] timers are not applicable to the BGP-LS-SPF SAFI. The scheduling of the SPF calculation, as described in Section 6.3, is an implementation issue. Scheduling MAY be dampened consistent with the SPF back-off algorithm specified in [RFC8405].

The Phase 3 decision function of the Decision Process [RFC4271] is also simplified since under normal SPF operation, a BGP speaker MUST advertise the changed NLRIs to all BGP peers with the BGP-LS-SPF AFI/SAFI and install the changed routes in the Global RIB. The only exceptions are unchanged NLRIs or stale NLRIs, i.e., NLRI received with a less recent (numerically smaller) sequence number.

6.1. BGP NLRI Selection

For all BGP-LS-SPF NLRIs, the selection rules for phase 1 of the BGP decision process, section 9.1.1 of [RFC4271], no longer apply. Instead, the following rules are applied:

1. Routes originated by directly connected BGP SPF peers are preferred. This condition can be determined by comparing the BGP Identifiers in the received Local Node Descriptor and OPEN message. This rule will assure that stale NLRI is updated even if a BGP-LS router loses its sequence number state due to a cold-start.

2. The NLRI with the most recent Sequence Number TLV, i.e., the highest sequence number, is selected.

3. The route received from the BGP SPF speaker with the numerically larger BGP Identifier is preferred.

When a BGP SPF speaker completely loses its sequence number state, e.g., due to a cold start, or in the unlikely possibility that the 64-bit sequence number wraps, the BGP routing domain will still converge. This is due to the fact that BGP speakers adjacent to the router will always accept self-originated NLRI from the associated speaker as more recent (rule # 1). When a BGP speaker reestablishes a connection with its peers, any existing session will be taken down and stale NLRI will be replaced. The adjacent BGP speakers will update their NLRI advertisements, hop by hop, until the BGP routing domain has converged.
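The three selection rules can be sketched as a single ordering key. This is a minimal illustration, not a definitive implementation: the `NlriInstance` record and its field names are hypothetical, and it assumes that all instances being compared describe the same NLRI and that BGP Identifiers are compared as unsigned integers.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class NlriInstance:
    """One received copy of a given BGP-LS-SPF NLRI (hypothetical record)."""
    originator_id: int  # BGP Identifier from the NLRI's Local Node Descriptor
    peer_id: int        # BGP Identifier from the advertising peer's OPEN
    sequence: int       # value carried in the Sequence Number TLV


def preference(instance: NlriInstance) -> Tuple[bool, int, int]:
    """Build a sort key; a larger key means a more preferred instance."""
    # Rule 1: the copy received directly from the originator wins.
    from_originator = instance.originator_id == instance.peer_id
    # Rule 2: then the highest sequence number.
    # Rule 3: then the numerically larger BGP Identifier of the sender.
    return (from_originator, instance.sequence, instance.peer_id)


def select(instances: Iterable[NlriInstance]) -> NlriInstance:
    """Pick the instance to use among copies of the same NLRI."""
    return max(instances, key=preference)
```

Because rule 1 is the most significant element of the key, a copy received from a restarted originator with a low sequence number still replaces a stale copy with a higher sequence number relayed by another peer, which is the cold-start behavior described above.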
The modified SPF Decision Process performs an SPF calculation rooted at the BGP speaker using the metrics from the Link Attribute IGP Metric TLV (1095) and the Prefix Attribute Prefix Metric TLV (1155) [RFC7752]. As a result, any other BGP attributes that would influence the BGP decision process defined in [RFC4271] including ORIGIN, MULTI_EXIT_DISC, and LOCAL_PREF attributes are ignored by the SPF algorithm. Furthermore, the NEXT_HOP attribute value is preserved but otherwise ignored during the SPF computation for BGP-LS-SPF NLRIs. The AS_PATH and AS4_PATH [RFC6793] attributes are preserved and used for loop detection [RFC4271]. They are ignored during the SPF computation for BGP-LS-SPF NLRIs.

6.1.1. BGP Self-Originated NLRI

Node, Link, or Prefix NLRI with Node Descriptors matching the local BGP speaker are considered self-originated. When self-originated NLRI is received and it doesn't match the local node's NLRI content (including sequence number), special processing is required.

o If a self-originated NLRI is received and the sequence number is more recent (i.e., greater than the local node's sequence number for the NLRI), the NLRI sequence number will be advanced to one greater than the received sequence number and the NLRI will be readvertised to all peers.

o If self-originated NLRI is received and the sequence number is the same as the local node's sequence number but the attributes differ, the NLRI sequence number will be advanced to one greater than the received sequence number and the NLRI will be readvertised to all peers.

o If self-originated Link or Prefix NLRI is received and the Link or Prefix NLRI is no longer being advertised by the local node, the NLRI will be withdrawn.

The above actions are performed immediately when the first instance of a newer self-originated NLRI is received.
In this case, the newer instance is considered to be a stale instance that was advertised by the local node prior to a restart where the NLRI state is lost. However, if subsequent newer self-originated NLRI is received for the same Node, Link, or Prefix NLRI, the readvertisement or withdrawal is delayed by 5 seconds since it is likely being advertised by a misconfigured or rogue BGP-LS-SPF speaker (Section 9).

6.2. Dual Stack Support

The SPF-based decision process operates on Node, Link, and Prefix NLRIs that support both IPv4 and IPv6 addresses. Whether to run a single SPF computation or multiple SPF computations for separate AFs is an implementation matter. Normally, IPv4 next-hops are calculated for IPv4 prefixes and IPv6 next-hops are calculated for IPv6 prefixes.

6.3. SPF Calculation based on BGP-LS-SPF NLRI

This section details the BGP-LS-SPF local routing information base (RIB) calculation. The router will use BGP-LS-SPF Node, Link, and Prefix NLRI to compute routes using the following algorithm. This calculation yields the set of routes associated with the BGP-LS domain. A router calculates the shortest-path tree using itself as the root. Optimizations to the BGP-LS-SPF algorithm are possible but MUST yield the same set of routes. The algorithm below supports Equal Cost Multi-Path (ECMP) routes. Weighted Unequal Cost Multi-Path routes are out of scope. The organization of this section owes heavily to section 16 of [RFC2328].

The following abstract data structures are defined in order to specify the algorithm.
o Local Route Information Base (LOC-RIB) - This routing table contains reachability information (i.e., next hops) for all prefixes (both IPv4 and IPv6) as well as BGP-LS-SPF node reachability. Implementations may choose to implement this with separate RIBs for each address family and/or Prefix versus Node reachability. It is synonymous with the Loc-RIB specified in [RFC4271].

o Global Routing Information Base (GLOBAL-RIB) - This is the Routing Information Base (RIB) containing the current routes that are installed in the router's forwarding plane. This is commonly referred to in networking parlance as "the RIB".

o Link State NLRI Database (LSNDB) - Database of BGP-LS-SPF NLRI that facilitates access to all Node, Link, and Prefix NLRI.

o Candidate List (CAN-LIST) - This is a list of candidate Node NLRIs. The list is sorted by the cost to reach the Node NLRI, with the Node NLRI with the lowest reachability cost at the head of the list. This facilitates execution of the Dijkstra algorithm (Section 1.1), where the shortest paths between the local node and other nodes in the graph are computed. The CAN-LIST is typically implemented as a heap but other data structures have been used.

The algorithm is comprised of the steps below:

1. The current LOC-RIB is invalidated, and the CAN-LIST is initialized to empty. The LOC-RIB is rebuilt during the course of the SPF computation. The existing routing entries are preserved for comparison to determine changes that need to be made to the GLOBAL-RIB in step 6.

2. The computing router's Node NLRI is updated in the LOC-RIB with a cost of 0 and the Node NLRI is also added to the CAN-LIST. The next-hop list is set to the internal loopback next-hop.

3. The Node NLRI with the lowest cost is removed from the candidate list for processing. If the BGP-LS Node attribute includes an SPF Status TLV (Section 5.2.1.2) indicating the node is unreachable, the Node NLRI is ignored and the next lowest cost Node NLRI is selected from the candidate list. The Node corresponding to this NLRI will be referred to as the Current-Node. If the candidate list is empty, the SPF calculation has completed and the algorithm proceeds to step 6.

4. All the Prefix NLRI with the same Node Identifiers as the Current-Node will be considered for installation. The next-hop(s) for these Prefix NLRI are inherited from the Current-Node. The cost for each prefix is the metric advertised in the Prefix Attribute Prefix Metric TLV (1155) added to the cost to reach the Current-Node. The following will be done for each Prefix NLRI (referred to as the Current-Prefix):

   * If the BGP-LS Prefix attribute includes an SPF Status TLV indicating the prefix is unreachable, the Current-Prefix is considered unreachable and the next Prefix NLRI is examined in Step 4.

   * If the Current-Prefix's corresponding prefix is in the LOC-RIB and the cost is less than the Current-Prefix's metric, the Current-Prefix does not contribute to the route and the next Prefix NLRI is examined in Step 4.

   * If the Current-Prefix's corresponding prefix is not in the LOC-RIB, the prefix is installed with the Current-Node's next-hops installed as the LOC-RIB route's next-hops and the metric being updated. If the IGP Route Tag TLV (1153) is included in the Current-Prefix's NLRI Attribute, the tag(s) are installed in the current LOC-RIB route's tag(s).
   * If the Current-Prefix's corresponding prefix is in the LOC-RIB and the cost is less than the current route's metric, the prefix is installed with the Current-Node's next-hops replacing the LOC-RIB route's next-hops and the metric being updated and any route tags removed. If the IGP Route Tag TLV (1153) is included in the Current-Prefix's NLRI Attribute, the tag(s) are installed in the current LOC-RIB route's tag(s).

   * If the Current-Prefix's corresponding prefix is in the LOC-RIB and the cost is the same as the current route's metric, the Current-Node's next-hops will be merged with the LOC-RIB route's next-hops. If the IGP Route Tag TLV (1153) is included in the Current-Prefix's NLRI Attribute, the tag(s) are merged into the LOC-RIB route's current tags.

5. All the Link NLRI with the same Node Identifiers as the Current-Node will be considered for installation. Each link will be examined and will be referred to in the following text as the Current-Link. The cost of the Current-Link is the advertised IGP Metric TLV (1095) from the Link NLRI BGP-LS attribute added to the cost to reach the Current-Node. If the Current-Node is for the local BGP Router, the next-hop for the link will be a direct next-hop pointing to the corresponding local interface. For any other Current-Node, the next-hop(s) for the Current-Link will be inherited from the Current-Node. The following will be done for each link:

   A. The prefix(es) associated with the Current-Link are installed into the LOC-RIB using the same rules as were used for Prefix NLRI in the previous steps. Optionally, in deployments where BGP-SPF routers have limited routing table capacity, installation of these subnets can be suppressed.
      Suppression will have an operational impact as the IPv4/IPv6 link endpoint addresses will not be reachable and tools such as traceroute will display addresses that are not reachable.

   B. If the Current-Node NLRI attributes include the SPF status TLV (Section 5.2.1.2) and the status indicates that the Node doesn't support transit, the next link for the Current-Node is processed in Step 5.

   C. If the Current-Link's NLRI attribute includes an SPF Status TLV indicating the link is down, the BGP-LS-SPF Link NLRI is considered down and the next link for the Current-Node is examined in Step 5.

   D. The Current-Link's Remote Node NLRI is accessed (i.e., the Node NLRI with the same Node identifiers as the Current-Link's Remote Node Descriptors). If it exists, it will be referred to as the Remote-Node and the algorithm will proceed as follows:

      + If the Remote-Node's NLRI attribute includes an SPF Status TLV indicating the node is unreachable, the next link for the Current-Node is examined in Step 5.

      + All the Link NLRI corresponding to the Remote-Node will be searched for a Link NLRI pointing to the Current-Node. Each Link NLRI is examined for Remote Node Descriptors matching the Current-Node and Link Descriptors matching the Current-Link (e.g., sharing a common IPv4 or IPv6 subnet). If both these conditions are satisfied for one of the Remote-Node's links, the bi-directional connectivity check succeeds and the Remote-Node may be processed further. The Remote-Node's Link NLRI providing bi-directional connectivity will be referred to as the Remote-Link.
        If no Remote-Link is found, the next link for the Current-Node is examined in Step 5.

      + If the Remote-Link NLRI attribute includes an SPF Status TLV indicating the link is down, the Remote-Link NLRI is considered down and the next link for the Current-Node is examined in Step 5.

      + If the Remote-Node is not on the CAN-LIST, it is inserted based on the cost. The Remote-Node's cost is the cost of the Current-Node added to the Current-Link's IGP Metric TLV (1095). The next-hop(s) for the Remote-Node are inherited from the Current-Link.

      + If the Remote-Node NLRI is already on the CAN-LIST with a higher cost, it must be removed and reinserted with the Remote-Node cost based on the Current-Link (as calculated in the previous step). The next-hop(s) for the Remote-Node are inherited from the Current-Link.

      + If the Remote-Node NLRI is already on the CAN-LIST with the same cost, it need not be reinserted on the CAN-LIST. However, the Current-Link's next-hop(s) must be merged into the current set of next-hops for the Remote-Node.

      + If the Remote-Node NLRI is already on the CAN-LIST with a lower cost, it need not be reinserted on the CAN-LIST.

   E. Return to step 3 to process the next lowest cost Node NLRI on the CAN-LIST.

6. The LOC-RIB is examined and changes (adds, deletes, modifications) are installed into the GLOBAL-RIB.
For each route in the LOC-RIB:

* If the route was added during the current BGP SPF computation, install the route into the GLOBAL-RIB.

* If the route was modified during the current BGP SPF computation (e.g., metric, tags, or next-hops), update the route in the GLOBAL-RIB.

* If the route was not installed during the current BGP SPF computation, remove the route from both the GLOBAL-RIB and the LOC-RIB.

6.4. IPv4/IPv6 Unicast Address Family Interaction

While the BGP-LS-SPF address family and the IPv4/IPv6 unicast address families MAY install routes into the same device routing tables, they will operate independently much the same as OSPF and IS-IS would operate today (i.e., "Ships-in-the-Night" mode). There is no implicit route redistribution between the BGP address families.

It is RECOMMENDED that BGP-LS-SPF IPv4/IPv6 route computation and installation be given scheduling priority by default over other BGP address families as these address families are considered as underlay SAFIs. Similarly, it is RECOMMENDED that the route preference or administrative distance give active route installation preference to BGP-LS-SPF IPv4/IPv6 routes over BGP routes from other AFI/SAFIs. However, this preference MAY be overridden by an operator-configured policy.

6.5. NLRI Advertisement

6.5.1. Link/Prefix Failure Convergence

A local failure will prevent a link from being used in the SPF calculation due to the IGP bi-directional connectivity requirement. Consequently, local link failures SHOULD always be given priority over updates (e.g., withdrawing all routes learned on a session) in order to ensure the highest priority propagation and optimal convergence.

An IGP such as OSPF [RFC2328] will stop using the link as soon as the Router-LSA for one side of the link is received. With a BGP advertisement, the link would continue to be used until the last copy of the BGP-LS-SPF Link NLRI is withdrawn. In order to avoid this delay, the originator of the Link NLRI SHOULD advertise a more recent version of the BGP-LS-SPF Link NLRI, with an increased Sequence Number TLV, including the SPF Status TLV (Section 5.2.2.2) indicating the link is down with respect to BGP SPF. After some configurable period of time, which is implementation dependent, e.g., 2-3 seconds, the BGP-LS-SPF Link NLRI can be withdrawn with no consequence. If the link becomes available in that period, the originator of the BGP-LS-SPF Link NLRI will simply advertise a more recent version of the BGP-LS-SPF Link NLRI without the SPF Status TLV in the BGP-LS Link Attributes.
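The originator behavior described above can be sketched as a small state machine. This is only an illustration under stated assumptions, not part of the specification: the names LinkAdvert, LinkOriginator, and WITHDRAW_DELAY are hypothetical, the SPF Status TLV is reduced to a boolean, and time is passed in explicitly rather than read from a real timer.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed value; the text only gives "2-3 seconds" as an example.
WITHDRAW_DELAY = 3.0


@dataclass
class LinkAdvert:
    """Hypothetical stand-in for one BGP-LS-SPF Link NLRI advertisement."""
    link_id: str
    sequence: int           # models the Sequence Number TLV
    spf_status_down: bool   # True when an SPF Status TLV marks the link down


class LinkOriginator:
    """Records what the originator of a Link NLRI would send over time."""

    def __init__(self, link_id: str) -> None:
        self.link_id = link_id
        self.sequence = 1
        self.withdraw_at: Optional[float] = None
        self.sent: List[Tuple[str, Optional[LinkAdvert]]] = []
        self._advertise(down=False)

    def _advertise(self, down: bool) -> None:
        advert = LinkAdvert(self.link_id, self.sequence, down)
        self.sent.append(("update", advert))

    def link_down(self, now: float) -> None:
        # Advertise a newer version marked down instead of withdrawing at once.
        self.sequence += 1
        self._advertise(down=True)
        self.withdraw_at = now + WITHDRAW_DELAY

    def link_up(self, now: float) -> None:
        # Advertise a newer version without the status and cancel the withdraw.
        self.sequence += 1
        self._advertise(down=False)
        self.withdraw_at = None

    def tick(self, now: float) -> None:
        # Withdraw only once the configured delay has elapsed.
        if self.withdraw_at is not None and now >= self.withdraw_at:
            self.sent.append(("withdraw", None))
            self.withdraw_at = None
```

In this sketch, a receiver would stop using the link as soon as it sees the higher-sequence advertisement marked down, so the later withdraw changes nothing in the SPF result.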
Similarly, when a prefix becomes unreachable, a more recent version of the BGP-LS-SPF Prefix NLRI will be advertised with the SPF Status TLV (Section 5.2.3.1) indicating the prefix is unreachable in the BGP-LS Prefix Attributes and the prefix will be considered unreachable with respect to BGP SPF. After some configurable period of time, which is implementation dependent, e.g., 2-3 seconds, the BGP-LS-SPF Prefix NLRI can be withdrawn with no consequence. If the prefix becomes reachable in that period, the originator of the BGP-LS-SPF Prefix NLRI will simply advertise a more recent version of the BGP-LS-SPF Prefix NLRI without the SPF Status TLV in the BGP-LS Prefix Attributes.

6.5.2. Node Failure Convergence

With BGP without graceful restart [RFC4724], all the NLRI advertised by a node are implicitly withdrawn when a session failure is detected. If fast failure detection such as BFD is utilized, and the node is on the fastest converging path, the most recent versions of BGP-LS-SPF NLRI may be withdrawn. This will result in an older version of the NLRI being used until the new versions arrive and, potentially, unnecessary route flaps. Therefore, BGP-LS-SPF NLRI SHOULD always be retained before being implicitly withdrawn for a configurable implementation-dependent interval, e.g., 2-3 seconds. This will not delay convergence since the adjacent nodes will detect the link failure and advertise a more recent NLRI indicating the link is down with respect to BGP SPF (Section 6.5.1) and the BGP SPF calculation will fail the bi-directional connectivity check (Section 6.3).

7. Error Handling

This section describes the Error Handling actions, as described in [RFC7606], that are specific to SAFI BGP-LS-SPF BGP Update message processing.

7.1.
Processing of BGP-LS-SPF TLVs

When a BGP speaker receives a BGP Update containing a malformed Node NLRI SPF Status TLV in the BGP-LS Attribute [RFC7752], it MUST ignore the received TLV and MUST NOT pass it to other BGP peers as specified in [RFC7606]. When discarding an associated Node NLRI with a malformed TLV, a BGP speaker SHOULD log an error for further analysis.

When a BGP speaker receives a BGP Update containing a malformed Link NLRI SPF Status TLV in the BGP-LS Attribute [RFC7752], it MUST ignore the received TLV and MUST NOT pass it to other BGP peers as specified in [RFC7606]. When discarding an associated Link NLRI with a malformed TLV, a BGP speaker SHOULD log an error for further analysis.

When a BGP speaker receives a BGP Update containing a malformed Prefix NLRI SPF Status TLV in the BGP-LS Attribute [RFC7752], it MUST ignore the received TLV and MUST NOT pass it to other BGP peers as specified in [RFC7606]. When discarding an associated Prefix NLRI with a malformed TLV, a BGP speaker SHOULD log an error for further analysis.

When a BGP speaker receives a BGP Update containing a malformed SPF Capability TLV in the Node NLRI BGP-LS Attribute [RFC7752], it MUST ignore the received TLV and the Node NLRI and MUST NOT pass it to other BGP peers as specified in [RFC7606]. When discarding a Node NLRI with a malformed TLV, a BGP speaker SHOULD log an error for further analysis.

When a BGP speaker receives a BGP Update containing a malformed IPv4 Prefix-Length TLV in the Link NLRI BGP-LS Attribute [RFC7752], it MUST ignore the received TLV and the Link NLRI and MUST NOT pass it to other BGP peers as specified in [RFC7606]. The corresponding Link NLRI is considered as malformed and MUST be handled as 'Treat-as-withdraw'. An implementation MAY log an error for further analysis.
When a BGP speaker receives a BGP Update containing a malformed IPv6 Prefix-Length TLV in the Link NLRI BGP-LS Attribute [RFC7752], it MUST ignore the received TLV and the Link NLRI and MUST NOT pass it to other BGP peers as specified in [RFC7606]. The corresponding Link NLRI is considered as malformed and MUST be handled as 'Treat-as-withdraw'. An implementation MAY log an error for further analysis.

7.2. Processing of BGP-LS-SPF NLRIs

A Link-State NLRI MUST NOT be considered as malformed or invalid based on the inclusion/exclusion of TLVs or contents of the TLV fields (i.e., semantic errors), as described in Section 5.1 and Section 5.1.1.

A BGP-LS-SPF Speaker MUST perform the following syntactic validation of the BGP-LS-SPF NLRI to determine if it is malformed.

1. Does the sum of the lengths of all TLVs found in the BGP MP_REACH_NLRI attribute correspond to the BGP MP_REACH_NLRI length?

2. Does the sum of the lengths of all TLVs found in the BGP MP_UNREACH_NLRI attribute correspond to the BGP MP_UNREACH_NLRI length?

3. Does the sum of the lengths of all TLVs found in a BGP-LS-SPF NLRI correspond to the Total NLRI Length field of all its Descriptors?

4. When an NLRI TLV is recognized, is the length of the TLV and its sub-TLVs valid?

5. Has the syntactic correctness of the NLRI fields been verified as per [RFC7606]?

6. Has the rule regarding ordering of TLVs been followed as described in Section 5.1.1?

When the error determined allows for the router to skip the malformed NLRI(s) and continue processing of the rest of the update message (e.g., when the TLV ordering rule is violated), then it MUST handle such malformed NLRIs as 'Treat-as-withdraw'. In other cases, where the error in the NLRI encoding results in the inability to process the BGP update message (e.g., length related encoding errors), then the router SHOULD handle such malformed NLRIs as 'AFI/SAFI disable' when other AFI/SAFI besides BGP-LS are being advertised over the same session.
Alternatively, the router MUST perform 'session reset' when the session is only being used for BGP-LS-SPF or when its 'AFI/SAFI disable' action is not possible.

7.3. Processing of BGP-LS Attribute

A BGP-LS Attribute MUST NOT be considered as malformed or invalid based on the inclusion/exclusion of TLVs or contents of the TLV fields (i.e., semantic errors), as described in Section 5.1 and Section 5.1.1.

A BGP-LS-SPF Speaker MUST perform the following syntactic validation of the BGP-LS Attribute to determine if it is malformed.

1. Does the sum of the lengths of all TLVs found in the BGP-LS Attribute correspond to the BGP-LS Attribute length?

2. Has the syntactic correctness of the Attributes (including BGP-LS Attribute) been verified as per [RFC7606]?

3. Is the length of each TLV and, when the TLV is recognized, of its sub-TLVs in the BGP-LS Attribute valid?

When the detected error allows for the router to skip the malformed BGP-LS Attribute and continue processing of the rest of the update message (e.g., when the BGP-LS Attribute length and the total Path Attribute Length are correct but some TLV/sub-TLV length within the BGP-LS Attribute is invalid), then it MUST handle such malformed BGP-LS Attribute as 'Attribute Discard'. In other cases, when the error in the BGP-LS Attribute encoding results in the inability to process the BGP update message, then the handling is the same as described above for malformed NLRI.

Note that the 'Attribute Discard' action results in the loss of all TLVs in the BGP-LS Attribute and not the removal of a specific malformed TLV. The removal of specific malformed TLVs may give a wrong indication to a BGP-LS-SPF speaker that the specific information is being deleted or is not available.

When a BGP-LS-SPF speaker receives an update message with Link-State NLRI(s) in the MP_REACH_NLRI but without the BGP-LS Attribute, it is most likely an indication that a BGP-LS-SPF speaker preceding it has performed the 'Attribute Discard' fault handling.
An implementation SHOULD preserve and propagate the Link-State NLRIs in such an update message so that the BGP-LS-SPF speaker can detect the loss of link-state information for that object and not assume its deletion/withdrawal. This also makes it possible for a network operator to trace back to the BGP-LS-SPF speaker which actually detected a problem with the BGP-LS Attribute.

An implementation SHOULD log an error for further analysis for problems detected during syntax validation.

When a BGP speaker receives a BGP Update containing a malformed IGP metric TLV in the Link NLRI BGP-LS Attribute [RFC7752], it MUST ignore the received TLV and the Link NLRI and MUST NOT pass it to other BGP peers as specified in [RFC7606]. When discarding a Link NLRI with a malformed TLV, a BGP speaker SHOULD log an error for further analysis.

8. IANA Considerations

This document defines the use of SAFI (80) for BGP SPF operation (Section 5.1), and requests IANA to assign the value from the First Come First Served (FCFS) range in the Subsequent Address Family Identifiers (SAFI) Parameters registry.

This document also defines five attribute TLVs of BGP-LS-SPF NLRI.
We request IANA to assign types for the SPF Capability TLV, Sequence Number TLV, IPv4 Link Prefix-Length TLV, IPv6 Link Prefix-Length TLV, and SPF Status TLV from the "BGP-LS Node Descriptor, Link Descriptor, Prefix Descriptor, and Attribute TLVs" Registry.

+-------------------------+-----------------+--------------------+
| Attribute TLV           | Suggested Value | NLRI Applicability |
+-------------------------+-----------------+--------------------+
| SPF Capability          | 1180            | Node               |
| SPF Status              | 1184            | Node, Link, Prefix |
| IPv4 Link Prefix Length | 1182            | Link               |
| IPv6 Link Prefix Length | 1183            | Link               |
| Sequence Number         | 1181            | Node, Link, Prefix |
+-------------------------+-----------------+--------------------+

Table 1: NLRI Attribute TLVs

9. Security Considerations

This document defines a BGP SAFI, i.e., the BGP-LS-SPF SAFI. This document does not change the underlying security issues inherent in the BGP protocol [RFC4271]. The Security Considerations discussed in [RFC4271] apply to the BGP SPF functionality as well. The analysis of the security issues for BGP mentioned in [RFC4272] and [RFC6952] also applies to this document. The analysis of Generic Threats to Routing Protocols done in [RFC4593] is also worth noting. As the modifications described in this document for BGP SPF apply to IPv4 Unicast and IPv6 Unicast as underlay SAFIs in a single BGP SPF Routing Domain, the BGP security solutions described in [RFC6811] and [RFC8205] are somewhat constricted as they are meant to apply for inter-domain BGP where multiple BGP Routing Domains are typically involved. The BGP-LS-SPF SAFI NLRI described in this document are typically advertised between EBGP or IBGP speakers under a single administrative domain.

In the context of the BGP peering associated with this document, a BGP speaker MUST NOT accept updates from a peer that is not within any administrative control of an operator.
That is, a participating BGP speaker SHOULD be aware of the nature of its peering relationships. Such protection can be achieved by manual configuration of peers at the BGP speaker.

In order to mitigate the risk of peering with BGP speakers masquerading as legitimate authorized BGP speakers, it is recommended that the TCP Authentication Option (TCP-AO) [RFC5925] be used to authenticate BGP sessions. If an authorized BGP peer is compromised, that BGP peer could advertise modified Node, Link, or Prefix NLRI that will result in misrouting, repeated origination of NLRI, and/or excessive SPF calculations. When a BGP speaker detects that its self-originated NLRI is being originated by another BGP speaker, an appropriate error should be logged so that the operator can take corrective action.

10. Management Considerations

This section includes unique management considerations for the BGP-LS-SPF address family.

10.1. Configuration

All routers in a BGP SPF Routing Domain are under a single administrative domain allowing for consistent configuration.

10.1.1. Link Metric Configuration

Within a BGP SPF Routing Domain, the IGP metrics for all advertised links SHOULD be configured or defaulted consistently. For example, if a default metric is used for one router's links, then a similar metric should be used for all routers' links. Similarly, if the link cost is derived from using the inverse of the link bandwidth on one router, then this SHOULD be done for all routers and the same reference bandwidth should be used to derive the inversely proportional metric. Otherwise, routing based on link metric will not be correct.

10.1.2. backoff-config

In addition to configuration of the BGP-LS-SPF address family, implementations SHOULD support the "Shortest Path First (SPF) Back-Off Delay Algorithm for Link-State IGPs" [RFC8405].
If supported, configuration of the INITIAL_SPF_DELAY, SHORT_SPF_DELAY, LONG_SPF_DELAY, TIME_TO_LEARN, and HOLDDOWN_INTERVAL MUST be supported [RFC8405]. Section 6 of [RFC8405] recommends consistent configuration of these values throughout the IGP routing domain and this also applies to the BGP SPF Routing Domain.

10.2. Operational Data

In order to troubleshoot SPF issues, implementations SHOULD support an SPF log including entries for previous SPF computations. Each SPF log entry would include the BGP-LS-SPF NLRI triggering the SPF, SPF scheduled time, SPF start time, SPF end time, and SPF type if different types of SPF are supported. Since the size of the log will be finite, implementations SHOULD also maintain counters for the total number of SPF computations and the total number of SPF triggering events. Additionally, to troubleshoot SPF scheduling and back-off [RFC8405], the current SPF back-off state, remaining time-to-learn, remaining holddown, last trigger event time, last SPF time, and next SPF time should be available.

11. Implementation Status

Note RFC Editor: Please remove this section and the associated references prior to publication.

This section records the status of known implementations of the protocol defined by this specification at the time of posting of this Internet-Draft and is based on a proposal described in [RFC7942]. The description of implementations in this section is intended to assist the IETF in its decision processes in progressing drafts to RFCs. Please note that the listing of any individual implementation here does not imply endorsement by the IETF. Furthermore, no effort has been spent to verify the information presented here that was supplied by IETF contributors. This is not intended as, and must not be construed to be, a catalog of available implementations or their features. Readers are advised to note that other implementations may exist.
According to RFC 7942, "this will allow reviewers and working groups to assign due consideration to documents that have the benefit of running code, which may serve as evidence of valuable experimentation and feedback that have made the implemented protocols more mature. It is up to the individual working groups to use this information as they see fit".

The BGP-LS-SPF implementation status is documented in [I-D.psarkar-lsvr-bgp-spf-impl].

12. Acknowledgements

The authors would like to thank Sue Hares, Jorge Rabadan, Boris Hassanov, Dan Frost, Matt Anderson, Fred Baker, and Lukas Krattiger for their review and comments. Thanks to Pushpasis Sarkar for discussions on preventing a BGP SPF Router from being used for non-local traffic (i.e., transit traffic).

The authors extend special thanks to Eric Rosen for fruitful discussions on BGP-LS-SPF convergence as compared to IGPs.

13. Contributors

In addition to the authors listed on the front page, the following co-authors have contributed to the document.

Derek Yeung
Arrcus, Inc.
derek@arrcus.com

Gunter Van De Velde
Nokia
gunter.van_de_velde@nokia.com

Abhay Roy
Arrcus, Inc.
abhay@arrcus.com

Venu Venugopal
Cisco Systems
venuv@cisco.com

Chaitanya Yadlapalli
AT&T
cy098d@att.com

14. References

14.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

[RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, DOI 10.17487/RFC4271, January 2006, <https://www.rfc-editor.org/info/rfc4271>.
[RFC4272] Murphy, S., "BGP Security Vulnerabilities Analysis", RFC 4272, DOI 10.17487/RFC4272, January 2006, <https://www.rfc-editor.org/info/rfc4272>.

[RFC4593] Barbir, A., Murphy, S., and Y. Yang, "Generic Threats to Routing Protocols", RFC 4593, DOI 10.17487/RFC4593, October 2006, <https://www.rfc-editor.org/info/rfc4593>.

[RFC4750] Joyal, D., Ed., Galecki, P., Ed., Giacalone, S., Ed., Coltun, R., and F. Baker, "OSPF Version 2 Management Information Base", RFC 4750, DOI 10.17487/RFC4750, December 2006, <https://www.rfc-editor.org/info/rfc4750>.

[RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter, "Multiprotocol Extensions for BGP-4", RFC 4760, DOI 10.17487/RFC4760, January 2007, <https://www.rfc-editor.org/info/rfc4760>.

[RFC5492] Scudder, J. and R. Chandra, "Capabilities Advertisement with BGP-4", RFC 5492, DOI 10.17487/RFC5492, February 2009, <https://www.rfc-editor.org/info/rfc5492>.

[RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP Authentication Option", RFC 5925, DOI 10.17487/RFC5925, June 2010, <https://www.rfc-editor.org/info/rfc5925>.

[RFC6793] Vohra, Q. and E. Chen, "BGP Support for Four-Octet Autonomous System (AS) Number Space", RFC 6793, DOI 10.17487/RFC6793, December 2012, <https://www.rfc-editor.org/info/rfc6793>.

[RFC6811] Mohapatra, P., Scudder, J., Ward, D., Bush, R., and R. Austein, "BGP Prefix Origin Validation", RFC 6811, DOI 10.17487/RFC6811, January 2013, <https://www.rfc-editor.org/info/rfc6811>.

[RFC7606] Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K. Patel, "Revised Error Handling for BGP UPDATE Messages", RFC 7606, DOI 10.17487/RFC7606, August 2015, <https://www.rfc-editor.org/info/rfc7606>.

[RFC7752] Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and S. Ray, "North-Bound Distribution of Link-State and Traffic Engineering (TE) Information Using BGP", RFC 7752, DOI 10.17487/RFC7752, March 2016, <https://www.rfc-editor.org/info/rfc7752>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.

[RFC8205] Lepinski, M., Ed. and K. Sriram, Ed., "BGPsec Protocol Specification", RFC 8205, DOI 10.17487/RFC8205, September 2017, <https://www.rfc-editor.org/info/rfc8205>.

[RFC8405] Decraene, B., Litkowski, S., Gredler, H., Lindem, A., Francois, P., and C. Bowers, "Shortest Path First (SPF) Back-Off Delay Algorithm for Link-State IGPs", RFC 8405, DOI 10.17487/RFC8405, June 2018, <https://www.rfc-editor.org/info/rfc8405>.

[RFC8654] Bush, R., Patel, K., and D. Ward, "Extended Message Support for BGP", RFC 8654, DOI 10.17487/RFC8654, October 2019, <https://www.rfc-editor.org/info/rfc8654>.

[RFC8665] Psenak, P., Ed., Previdi, S., Ed., Filsfils, C., Gredler, H., Shakir, R., Henderickx, W., and J. Tantsura, "OSPF Extensions for Segment Routing", RFC 8665, DOI 10.17487/RFC8665, December 2019, <https://www.rfc-editor.org/info/rfc8665>.

14.2. Informational References

[I-D.ietf-lsvr-applicability] Patel, K., Lindem, A., Zandi, S., and G. Dawra, "Usage and Applicability of Link State Vector Routing in Data Centers", draft-ietf-lsvr-applicability-05 (work in progress), March 2020.

[I-D.psarkar-lsvr-bgp-spf-impl] Sarkar, P., Patel, K., Pallagatti, S., and s. sajibasil@gmail.com, "BGP Shortest Path Routing Extension Implementation Report", draft-psarkar-lsvr-bgp-spf-impl-00 (work in progress), June 2020.

[RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, DOI 10.17487/RFC2328, April 1998, <https://www.rfc-editor.org/info/rfc2328>.

[RFC4456] Bates, T., Chen, E., and R.
Chandra, "BGP Route Reflection: An Alternative to Full Mesh Internal BGP (IBGP)", RFC 4456, DOI 10.17487/RFC4456, April 2006, <https://www.rfc-editor.org/info/rfc4456>.

[RFC4724] Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y. Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724, DOI 10.17487/RFC4724, January 2007, <https://www.rfc-editor.org/info/rfc4724>.

[RFC4790] Newman, C., Duerst, M., and A. Gulbrandsen, "Internet Application Protocol Collation Registry", RFC 4790, DOI 10.17487/RFC4790, March 2007, <https://www.rfc-editor.org/info/rfc4790>.

[RFC5286] Atlas, A., Ed. and A. Zinin, Ed., "Basic Specification for IP Fast Reroute: Loop-Free Alternates", RFC 5286, DOI 10.17487/RFC5286, September 2008, <https://www.rfc-editor.org/info/rfc5286>.

[RFC5880] Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010, <https://www.rfc-editor.org/info/rfc5880>.

[RFC6952] Jethanandani, M., Patel, K., and L.
Zheng, "Analysis of BGP, LDP, PCEP, and MSDP Issues According to the Keying and Authentication for Routing Protocols (KARP) Design Guide", RFC 6952, DOI 10.17487/RFC6952, May 2013, <https://www.rfc-editor.org/info/rfc6952>.

[RFC7911] Walton, D., Retana, A., Chen, E., and J. Scudder, "Advertisement of Multiple Paths in BGP", RFC 7911, DOI 10.17487/RFC7911, July 2016, <https://www.rfc-editor.org/info/rfc7911>.

[RFC7938] Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of BGP for Routing in Large-Scale Data Centers", RFC 7938, DOI 10.17487/RFC7938, August 2016, <https://www.rfc-editor.org/info/rfc7938>.

[RFC7942] Sheffer, Y. and A. Farrel, "Improving Awareness of Running Code: The Implementation Status Section", BCP 205, RFC 7942, DOI 10.17487/RFC7942, July 2016, <https://www.rfc-editor.org/info/rfc7942>.

Authors' Addresses

Keyur Patel
Arrcus, Inc.

Email: keyur@arrcus.com

Acee Lindem
Cisco Systems
301 Midenhall Way
Cary, NC 27513
USA

Email: acee@cisco.com

Shawn Zandi
LinkedIn
222 2nd Street
San Francisco, CA 94105
USA

Email: szandi@linkedin.com

Wim Henderickx
Nokia
Antwerp
Belgium

Email: wim.henderickx@nokia.com