idnits 2.17.1 draft-ietf-lsvr-bgp-spf-12.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHOULD not' in this paragraph: The protocol identifier specified in the Protocol-ID field [RFC7752] will represent the origin of the advertised NLRI. For Node NLRI and Link NLRI, this MUST be the direct protocol (4). Node or Link NLRI with a Protocol-ID other than direct will be considered malformed. For Prefix NLRI, the specified Protocol-ID MUST be the origin of the prefix. The local and remote node descriptors for all NLRI MUST include the BGP Identifier (TLV 516) and the AS Number (TLV 512) [RFC7752]. The BGP Confederation Member (TLV 517) [RFC7752] is not appliable and SHOULD not be included. If TLV 517 is included, it will be ignored. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). 
Found 'MUST not' in this paragraph: If the SPF Status TLV is received and the corresponding Node NLRI has not been received, then the SPF Status TLV is ignored and not used in SPF computation but is still announced to other BGP speakers. An implementation MAY log an error for further analysis. If a BGP speaker received the Node NLRI but the SPF Status TLV is not received, then any previously received information is considered as implicitly withdrawn and the update is propagated to other BGP speakers. A BGP speaker receiving a BGP Update containing a SPF Status TLV in the BGP-LS attribute [RFC7752] with a value that is outside the range of defined values SHOULD be processed and announced to other BGP speakers. However, a BGP speaker MUST not use the Status TLV in its SPF computation. An implementation MAY log this condition for further analysis. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: If the SPF Status TLV is received and the corresponding Link NLRI has not been received, then the SPF Status TLV is ignored and not used in SPF computation but is still announced to other BGP speakers. An implementation MAY log an error for further analysis. If a BGP speaker received the Link NLRI but the SPF Status TLV is not received, then any previously received information is considered as implicitly withdrawn and the update is propagated to other BGP speakers. A BGP speaker receiving a BGP Update containing an SPF Status TLV in the BGP-LS attribute [RFC7752] with a value that is outside the range of defined values SHOULD be processed and announced to other BGP speakers. However, a BGP speaker MUST not use the Status TLV in its SPF computation. An implementation MAY log this information for further analysis. 
== Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: If the SPF Status TLV is received and the corresponding Prefix NLRI has not been received, then the SPF Status TLV is ignored and not used in SPF computation but is still announced to other BGP speakers. An implementation MAY log an error for further analysis. If a BGP speaker received the Prefix NLRI but the SPF Status TLV is not received, then any previously received information is considered as implicitly withdrawn and the update is propagated to other BGP speakers. A BGP speaker receiving a BGP Update containing an SPF Status TLV in the BGP-LS attribute [RFC7752] with a value that is outside the range of defined values SHOULD be processed and announced to other BGP speakers. However, a BGP speaker MUST not use the Status TLV in its SPF computation. An implementation MAY log this information for further analysis. -- The document date (January 26, 2021) is 1184 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) ** Downref: Normative reference to an Informational RFC: RFC 4272 ** Downref: Normative reference to an Informational RFC: RFC 4593 ** Obsolete normative reference: RFC 7752 (Obsoleted by RFC 9552) == Outdated reference: A later version (-11) exists of draft-ietf-lsvr-applicability-05 == Outdated reference: A later version (-01) exists of draft-psarkar-lsvr-bgp-spf-impl-00 Summary: 3 errors (**), 0 flaws (~~), 7 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 2 Network Working Group K. Patel 3 Internet-Draft Arrcus, Inc. 4 Intended status: Standards Track A. Lindem 5 Expires: July 30, 2021 Cisco Systems 6 S. Zandi 7 LinkedIn 8 W. Henderickx 9 Nokia 10 January 26, 2021 12 BGP Link-State Shortest Path First (SPF) Routing 13 draft-ietf-lsvr-bgp-spf-12 15 Abstract 17 Many Massively Scaled Data Centers (MSDCs) have converged on 18 simplified layer 3 routing. Furthermore, requirements for 19 operational simplicity have led many of these MSDCs to converge on 20 BGP as their single routing protocol for both their fabric routing 21 and their Data Center Interconnect (DCI) routing. This document 22 describes extensions to BGP to use BGP Link-State distribution and 23 the Shortest Path First (SPF) algorithm used by Internal Gateway 24 Protocols (IGPs) such as OSPF. In doing this, it allows BGP to be 25 efficiently used as both the underlay protocol and the overlay 26 protocol in MSDCs. 28 Status of This Memo 30 This Internet-Draft is submitted in full conformance with the 31 provisions of BCP 78 and BCP 79. 33 Internet-Drafts are working documents of the Internet Engineering 34 Task Force (IETF). Note that other groups may also distribute 35 working documents as Internet-Drafts. The list of current Internet- 36 Drafts is at https://datatracker.ietf.org/drafts/current/. 38 Internet-Drafts are draft documents valid for a maximum of six months 39 and may be updated, replaced, or obsoleted by other documents at any 40 time. It is inappropriate to use Internet-Drafts as reference 41 material or to cite them other than as "work in progress." 43 This Internet-Draft will expire on July 30, 2021. 45 Copyright Notice 47 Copyright (c) 2021 IETF Trust and the persons identified as the 48 document authors. All rights reserved. 
50 This document is subject to BCP 78 and the IETF Trust's Legal 51 Provisions Relating to IETF Documents 52 (https://trustee.ietf.org/license-info) in effect on the date of 53 publication of this document. Please review these documents 54 carefully, as they describe your rights and restrictions with respect 55 to this document. Code Components extracted from this document must 56 include Simplified BSD License text as described in Section 4.e of 57 the Trust Legal Provisions and are provided without warranty as 58 described in the Simplified BSD License. 60 Table of Contents 62 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 63 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 64 1.2. BGP Shortest Path First (SPF) Motivation . . . . . . . . 4 65 1.3. Document Overview . . . . . . . . . . . . . . . . . . . . 6 66 1.4. Requirements Language . . . . . . . . . . . . . . . . . . 6 67 2. Base BGP Protocol Relationship . . . . . . . . . . . . . . . 6 68 3. BGP Link-State (BGP-LS) Relationship . . . . . . . . . . . . 7 69 4. BGP Peering Models . . . . . . . . . . . . . . . . . . . . . 8 70 4.1. BGP Single-Hop Peering on Network Node Connections . . . 8 71 4.2. BGP Peering Between Directly-Connected Nodes . . . . . . 8 72 4.3. BGP Peering in Route-Reflector or Controller Topology . . 9 73 5. BGP Shortest Path Routing (SPF) Protocol Extensions . . . . . 9 74 5.1. BGP-LS Shortest Path Routing (SPF) SAFI . . . . . . . . . 9 75 5.1.1. BGP-LS-SPF NLRI TLVs . . . . . . . . . . . . . . . . 9 76 5.1.2. BGP-LS Attribute . . . . . . . . . . . . . . . . . . 10 77 5.2. Extensions to BGP-LS . . . . . . . . . . . . . . . . . . 11 78 5.2.1. Node NLRI Usage . . . . . . . . . . . . . . . . . . . 11 79 5.2.1.1. Node NLRI Attribute SPF Capability TLV . . . . . 11 80 5.2.1.2. BGP-LS-SPF Node NLRI Attribute SPF Status TLV . . 12 81 5.2.2. Link NLRI Usage . . . . . . . . . . . . . . . . . . . 13 82 5.2.2.1. 
BGP-LS-SPF Link NLRI Attribute Prefix-Length TLVs 14 83 5.2.2.2. BGP-LS-SPF Link NLRI Attribute SPF Status TLV . . 14 84 5.2.3. IPv4/IPv6 Prefix NLRI Usage . . . . . . . . . . . . . 15 85 5.2.3.1. BGP-LS-SPF Prefix NLRI Attribute SPF Status TLV . 16 86 5.2.4. BGP-LS Attribute Sequence-Number TLV . . . . . . . . 16 87 5.3. NEXT_HOP Manipulation . . . . . . . . . . . . . . . . . . 17 88 6. Decision Process with SPF Algorithm . . . . . . . . . . . . . 18 89 6.1. BGP NLRI Selection . . . . . . . . . . . . . . . . . . . 19 90 6.1.1. BGP Self-Originated NLRI . . . . . . . . . . . . . . 20 91 6.2. Dual Stack Support . . . . . . . . . . . . . . . . . . . 20 92 6.3. SPF Calculation based on BGP-LS-SPF NLRI . . . . . . . . 20 93 6.4. IPv4/IPv6 Unicast Address Family Interaction . . . . . . 25 94 6.5. NLRI Advertisement . . . . . . . . . . . . . . . . . . . 25 95 6.5.1. Link/Prefix Failure Convergence . . . . . . . . . . . 25 96 6.5.2. Node Failure Convergence . . . . . . . . . . . . . . 26 97 7. Error Handling . . . . . . . . . . . . . . . . . . . . . . . 26 98 7.1. Processing of BGP-LS-SPF TLVs . . . . . . . . . . . . . . 26 99 7.2. Processing of BGP-LS-SPF NLRIs . . . . . . . . . . . . . 27 100 7.3. Processing of BGP-LS Attribute . . . . . . . . . . . . . 28 101 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 29 102 9. Security Considerations . . . . . . . . . . . . . . . . . . . 30 103 10. Management Considerations . . . . . . . . . . . . . . . . . . 31 104 10.1. Configuration . . . . . . . . . . . . . . . . . . . . . 31 105 10.1.1. Link Metric Configuration . . . . . . . . . . . . . 31 106 10.1.2. backoff-config . . . . . . . . . . . . . . . . . . . 31 107 10.2. Operational Data . . . . . . . . . . . . . . . . . . . . 31 108 11. Implementation Status . . . . . . . . . . . . . . . . . . . . 32 109 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 32 110 13. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 32 111 14. References . 
. . . . . . . . . . . . . . . . . . . . . . . . 33 112 14.1. Normative References . . . . . . . . . . . . . . . . . . 33 113 14.2. Informational References . . . . . . . . . . . . . . . . 35 114 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 36 116 1. Introduction 118 Many Massively Scaled Data Centers (MSDCs) have converged on 119 simplified layer 3 routing. Furthermore, requirements for 120 operational simplicity have led many of these MSDCs to converge on 121 BGP [RFC4271] as their single routing protocol for both their fabric 122 routing and their Data Center Interconnect (DCI) routing [RFC7938]. 123 This document describes an alternative solution which leverages BGP- 124 LS [RFC7752] and the Shortest Path First algorithm used by Internal 125 Gateway Protocols (IGPs) such as OSPF [RFC2328]. 127 This document leverages both the BGP protocol [RFC4271] and the BGP- 128 LS [RFC7752] protocols. The relationship, as well as the scope of 129 changes are described respectively in Section 2 and Section 3. The 130 modifications to [RFC4271] for BGP SPF described herein only apply to 131 IPv4 and IPv6 as underlay unicast Subsequent Address Families 132 Identifiers (SAFIs). Operations for any other BGP SAFIs are outside 133 the scope of this document. 135 This solution avails the benefits of both BGP and SPF-based IGPs. 136 These include TCP based flow-control, no periodic link-state refresh, 137 and completely incremental NLRI advertisement. These advantages can 138 reduce the overhead in MSDCs where there is a high degree of Equal 139 Cost Multi-Path (ECMPs) and the topology is very stable. 140 Additionally, using an SPF-based computation can support fast 141 convergence and the computation of Loop-Free Alternatives (LFAs). 142 The SPF LFA extensions defined in [RFC5286] can be similarly applied 143 to BGP SPF calculations. However, the details are a matter of 144 implementation detail. 
Furthermore, a BGP-based solution lends 145 itself to multiple peering models including those incorporating 146 route-reflectors [RFC4456] or controllers. 148 1.1. Terminology 150 This specification reuses terms defined in section 1.1 of [RFC4271] 151 including BGP speaker, NLRI, and Route. 153 Additionally, this document introduces the following terms: 155 BGP SPF Routing Domain: A set of BGP routers that are under a single 156 administrative domain and exchange link-state information using 157 the BGP-LS-SPF SAFI and compute routes using BGP SPF as described 158 herein. 160 BGP-LS-SPF NLRI: This refers to BGP-LS Network Layer Reachability 161 Information (NLRI) that is being advertised in the BGP-LS-SPF SAFI 162 (Section 5.1) and is being used for BGP SPF route computation. 164 Dijkstra Algorithm: An algorithm for computing the shortest path 165 from a given node in a graph to every other node in the graph. At 166 each iteration of the algorithm, there is a list of candidate 167 vertices. Paths from the root to these vertices have been found, 168 but not necessarily the shortest ones. However, the paths to the 169 candidate vertex that is closest to the root are guaranteed to be 170 shortest; this vertex is added to the shortest-path tree, removed 171 from the candidate list, and its adjacent vertices are examined 172 for possible addition to/modification of the candidate list. The 173 algorithm then iterates again. It terminates when the candidate 174 list becomes empty. [RFC2328] 176 1.2. BGP Shortest Path First (SPF) Motivation 178 Given that [RFC7938] already describes how BGP could be used as the 179 sole routing protocol in an MSDC, one might question the motivation 180 for defining an alternate BGP deployment model when a mature solution 181 exists. For both alternatives, BGP offers the operational benefits 182 of a single routing protocol as opposed to the combination of an IGP 183 for the underlay and BGP as an overlay. 
However, BGP SPF offers some 184 unique advantages above and beyond standard BGP distance-vector 185 routing. With BGP SPF, the standard hop-by-hop peering model is 186 relaxed. 188 A primary advantage is that all BGP-LS-SPF speakers in the BGP SPF 189 routing domain will have a complete view of the topology. This will 190 allow support for ECMP, IP fast-reroute (e.g., Loop-Free 191 Alternatives), Shared Risk Link Groups (SRLGs), and other routing 192 enhancements without advertisement of additional BGP paths [RFC7911] 193 or other extensions. In short, the advantages of an IGP such as OSPF 194 [RFC2328] become available in BGP. 196 With the simplified BGP decision process as defined in Section 6, 197 NLRI changes can be disseminated throughout the BGP routing domain 198 much more rapidly (equivalent to IGPs with the proper 199 implementation). The added advantage of BGP using TCP for reliable 200 transport leverages TCP's inherent flow-control and guaranteed in- 201 order delivery. 203 Another primary advantage is a potential reduction in NLRI 204 advertisement. With standard BGP distance-vector routing, a single 205 link failure may impact 100s or 1000s of prefixes and result in the 206 withdrawal or re-advertisement of the attendant NLRI. With BGP SPF, 207 only the BGP speakers corresponding to the link NLRI need to withdraw 208 the corresponding BGP-LS-SPF Link NLRI. Additionally, the changed 209 NLRI will be advertised immediately as opposed to normal BGP where it 210 is only advertised after the best route selection. These advantages 211 will afford NLRI dissemination throughout the BGP SPF routing domain 212 with efficiencies similar to link-state protocols. 214 With controller and route-reflector peering models, BGP SPF 215 advertisement and distributed computation require a minimal number of 216 sessions and copies of the NLRI since only the latest version of the 217 NLRI from the originator is required.
Given that verification of the 218 adjacencies is done outside of BGP (see Section 4), each BGP speaker 219 will only need as many sessions and copies of the NLRI as required 220 for redundancy. Additionally, a controller could 221 inject topology that is learned outside the BGP SPF routing domain. 223 Given that controllers are already consuming BGP-LS NLRI [RFC7752], 224 this functionality can be reused for BGP-LS-SPF NLRI. 226 Another potential advantage of BGP SPF is that both IPv6 and IPv4 can 227 be supported using the BGP-LS-SPF SAFI with the same BGP-LS-SPF 228 NLRIs. In many MSDC fabrics, the IPv4 and IPv6 topologies are 229 congruent. Although beyond the scope of this document, multi- 230 topology extensions could be used to support separate IPv4, IPv6, 231 unicast, and multicast topologies while sharing the same NLRI. 233 Finally, the BGP SPF topology can be used as an underlay for other 234 BGP SAFIs (using the existing model) and realize all the above 235 advantages. 237 1.3. Document Overview 239 The document begins with sections defining the precise relationship 240 that BGP SPF has with both the base BGP protocol [RFC4271] 241 (Section 2) and the BGP Link-State (BGP-LS) extensions [RFC7752] 242 (Section 3). This is required to dispel the notion that BGP SPF is 243 an independent protocol. The BGP peering models, as well as 244 their respective trade-offs, are then discussed in Section 4. The 245 remaining sections, which make up the bulk of the document, define 246 the protocol enhancements necessary to support BGP SPF. The BGP-LS 247 extensions to support BGP SPF are defined in Section 5. The 248 replacement of the base BGP decision process with the SPF computation 249 is specified in Section 6. Finally, BGP SPF error handling is 250 defined in Section 7. 252 1.4.
Requirements Language 254 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 255 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 256 "OPTIONAL" in this document are to be interpreted as described in BCP 257 14 [RFC2119] [RFC8174] when, and only when, they appear in all 258 capitals, as shown here. 260 2. Base BGP Protocol Relationship 262 With the exception of the decision process, the BGP SPF extensions 263 leverage the BGP protocol [RFC4271] without change. This includes 264 the BGP protocol Finite State Machine, BGP messages and their 265 encodings, processing of BGP messages, BGP attributes and path 266 attributes, BGP NLRI encodings, and any error handling defined in 267 [RFC4271] and [RFC7606]. 269 Due to the changes to the decision process, there are mechanisms and 270 encodings that are no longer applicable. While not necessarily 271 required for computation, the ORIGIN, AS_PATH, MULTI_EXIT_DISC, 272 LOCAL_PREF, and NEXT_HOP path attributes are mandatory and will be 273 validated. The ATOMIC_AGGREGATE and AGGREGATOR attributes are not applicable 274 within the context of BGP SPF and SHOULD NOT be advertised. However, 275 if they are advertised, they will be accepted, validated, and 276 propagated consistent with the BGP protocol. 278 Section 9 of [RFC4271] defines the decision process that is used to 279 select routes for subsequent advertisement by applying the policies 280 in the local Policy Information Base (PIB) to the routes stored in 281 its Adj-RIBs-In. The output of the Decision Process is the set of 282 routes that are announced by a BGP speaker to its peers. These 283 selected routes are stored by a BGP speaker in the speaker's Adj- 284 RIBs-Out according to policy. 286 The BGP SPF extension fundamentally changes the decision process, as 287 described herein, to be more like a link-state protocol (e.g., OSPF 288 [RFC2328]). Specifically: 290 1.
BGP advertisements are readvertised to neighbors immediately 291 without waiting for or depending on the route computation as 292 specified in phase 3 of the base BGP decision process. Multiple 293 peering models are supported as specified in Section 4. 295 2. Determining the degree of preference for BGP routes for the SPF 296 calculation as described in phase 1 of the base BGP decision 297 process is replaced with the mechanisms in Section 6.1. 299 3. Phase 2 of the base BGP protocol decision process is replaced 300 with the Shortest Path First (SPF) algorithm, also known as the 301 Dijkstra algorithm (Section 1.1). 303 3. BGP Link-State (BGP-LS) Relationship 305 [RFC7752] describes a mechanism by which link-state and TE 306 information can be collected from networks and shared with external 307 entities using BGP. This is achieved by defining NLRI advertised 308 using the BGP-LS AFI. The BGP-LS extensions defined in [RFC7752] 309 make use of the decision process defined in [RFC4271]. This document 310 reuses NLRI and TLVs defined in [RFC7752]. Rather than reusing the 311 BGP-LS SAFI, the BGP-LS-SPF SAFI (Section 5.1) is introduced to ensure 312 backward compatibility for the BGP-LS SAFI usage. 314 The BGP SPF extensions reuse the Node, Link, and Prefix NLRI defined 315 in [RFC7752]. The usage of the BGP-LS NLRI, metric attributes, and 316 attribute extensions is described in Section 5.2.1. The usage of 317 other BGP-LS attributes is not precluded and is, in fact, expected. 318 However, the details are beyond the scope of this document and will 319 be specified in future documents. 321 Support for Multiple Topology Routing (MTR) similar to the OSPF MTR 322 computation described in [RFC4915] is beyond the scope of this 323 document. Consequently, the usage of the Multi-Topology TLV as 324 described in section 3.2.1.5 of [RFC7752] is not specified.
326 The rules for setting the NLRI next-hop path attribute for the BGP- 327 LS-SPF SAFI will follow the BGP-LS SAFI as specified in section 3.4 328 of [RFC7752]. 330 4. BGP Peering Models 332 Depending on the topology, scaling, capabilities of the BGP-LS-SPF 333 speakers, and redundancy requirements, various peering models are 334 supported. The only requirements are that all BGP SPF speakers in 335 the BGP SPF routing domain exchange BGP-LS-SPF NLRI, run an SPF 336 calculation, and update their routing table appropriately. 338 4.1. BGP Single-Hop Peering on Network Node Connections 340 The simplest peering model is the one where EBGP single-hop sessions 341 are established over direct point-to-point links interconnecting the 342 nodes in the BGP SPF routing domain. Once the single-hop BGP session 343 has been established and the BGP-LS-SPF AFI/SAFI capability has been 344 exchanged [RFC4760] for the corresponding session, then the link is 345 considered up from a BGP SPF perspective and the corresponding BGP- 346 LS-SPF Link NLRI is advertised. If the session goes down, the 347 corresponding Link NLRI will be withdrawn. Topologically, this would 348 be equivalent to the peering model in [RFC7938] where there is a BGP 349 session on every link in the data center switch fabric. The content 350 of the Link NLRI is described in Section 5.2.2. 352 4.2. BGP Peering Between Directly-Connected Nodes 354 In this model, BGP-LS-SPF speakers peer with all directly-connected 355 nodes but the sessions may be between loopback addresses (i.e., two- 356 hop sessions) and the direct connection discovery and liveliness 357 detection for the interconnecting links are independent of the BGP 358 protocol. For example, liveliness 359 detection could be done using the BFD protocol [RFC5880]. Precisely 360 how discovery and liveliness detection are accomplished is outside the 361 scope of this document.
Consequently, there will be a single BGP 362 session even if there are multiple direct connections between BGP-LS- 363 SPF speakers. BGP-LS-SPF Link NLRI is advertised as long as a BGP 364 session has been established, the BGP-LS-SPF AFI/SAFI capability has 365 been exchanged [RFC4760], and the link is operational as determined 366 using liveliness detection mechanisms outside the scope of this 367 document. This is much like the previous peering model only peering 368 is between loopback addresses and the interconnecting links can be 369 unnumbered. However, since there are BGP sessions between every 370 directly-connected node in the BGP SPF routing domain, there is only 371 a reduction in BGP sessions when there are parallel links between 372 nodes. 374 4.3. BGP Peering in Route-Reflector or Controller Topology 376 In this model, BGP-LS-SPF speakers peer solely with one or more Route 377 Reflectors [RFC4456] or controllers. As in the previous model, 378 direct connection discovery and liveliness detection for those links 379 in the BGP SPF routing domain are done outside of the BGP protocol. 380 BGP-LS-SPF Link NLRI is advertised as long as the corresponding link 381 is considered up as per the chosen liveness detection mechanism. 383 This peering model, known as sparse peering, allows for fewer BGP 384 sessions and, consequently, fewer instances of the same NLRI received 385 from multiple peers. Normally, the route-reflectors or controller 386 BGP sessions would be on directly-connected links to avoid dependence 387 on another routing protocol for session connectivity. However, 388 multi-hop peering is not precluded. The number of BGP sessions is 389 dependent on the redundancy requirements and the stability of the BGP 390 sessions. This is discussed in greater detail in 391 [I-D.ietf-lsvr-applicability]. 393 5. BGP Shortest Path Routing (SPF) Protocol Extensions 395 5.1. 
BGP-LS Shortest Path Routing (SPF) SAFI 397 In order to replace the existing BGP decision process with an SPF- 398 based decision process in a backward compatible manner by not 399 impacting the BGP-LS SAFI, this document introduces the BGP-LS-SPF 400 SAFI. The BGP-LS-SPF (AFI 16388 / SAFI 80) [RFC4760] is allocated by 401 IANA as specified in Section 8. In order for two BGP-LS-SPF 402 speakers to exchange BGP SPF NLRI, they MUST exchange the 403 Multiprotocol Extensions Capability [RFC5492] [RFC4760] to ensure 404 that they are both capable of properly processing such NLRI. This is 405 done with AFI 16388 / SAFI 80 for BGP-LS-SPF advertised within the 406 BGP SPF Routing Domain. The BGP-LS-SPF SAFI is used to carry IPv4 407 and IPv6 prefix information in a format facilitating an SPF-based 408 decision process. 410 5.1.1. BGP-LS-SPF NLRI TLVs 412 The NLRI format of the BGP-LS-SPF SAFI uses exactly the same format as the 413 BGP-LS AFI [RFC7752]. In other words, all the TLVs used in the BGP-LS 414 AFI are applicable and used for the BGP-LS-SPF SAFI. These TLVs 415 within BGP-LS-SPF NLRI advertise information that describes links, 416 nodes, and prefixes comprising IGP link-state information. 418 In order to compare the NLRI efficiently, it is REQUIRED that all the 419 TLVs within the given NLRI be ordered in ascending order by the 420 TLV type. For multiple TLVs of the same type within a single NLRI, it is 421 REQUIRED that these TLVs be ordered in ascending order by the TLV 422 value field. Comparison of the value fields is performed by treating 423 the entire value field as a hexadecimal string. NLRIs having TLVs 424 which do not follow the ordering rules MUST be considered as 425 malformed and discarded with appropriate error logging. 427 [RFC7752] defines certain NLRI TLVs as mandatory TLVs. These TLVs 428 are considered mandatory for the BGP-LS-SPF SAFI as well. All the 429 other TLVs are considered optional TLVs. 431 5.1.2.
BGP-LS Attribute 433 The BGP-LS attribute of the BGP-LS-SPF SAFI uses exactly the same format 434 as the BGP-LS AFI [RFC7752]. In other words, all the TLVs used in the 435 BGP-LS attribute of the BGP-LS AFI are applicable and used for the 436 BGP-LS attribute of the BGP-LS-SPF SAFI. This attribute is an 437 optional, non-transitive BGP attribute that is used to carry link, 438 node, and prefix properties and attributes. The BGP-LS attribute is 439 a set of TLVs. 441 The BGP-LS attribute may potentially grow large in size depending on 442 the amount of link-state information associated with a single Link- 443 State NLRI. The BGP specification [RFC4271] mandates a maximum BGP 444 message size of 4096 octets. It is RECOMMENDED that an 445 implementation support [RFC8654] in order to accommodate a larger amount 446 of information within the BGP-LS Attribute. BGP-LS-SPF speakers MUST 447 ensure that they limit the TLVs included in the BGP-LS Attribute to 448 ensure that a BGP update message for a single Link-State NLRI does 449 not cross the maximum limit for a BGP message. The determination of 450 the types of TLVs to be included by the BGP-LS-SPF speaker 451 originating the attribute is outside the scope of this document. 452 When a BGP-LS-SPF speaker finds that it is exceeding the maximum BGP 453 message size due to addition or update of some other BGP Attribute 454 (e.g., AS_PATH), it MUST consider the BGP-LS Attribute to be 455 malformed and the attribute discard handling of [RFC7606] applies. 457 In order to compare the BGP-LS attribute efficiently, it is REQUIRED 458 that all the TLVs within the given attribute be ordered in 459 ascending order by the TLV type. For multiple TLVs of the same type 460 within a single attribute, it is REQUIRED that these TLVs be ordered 461 in ascending order by the TLV value field. Comparison of the value 462 fields is performed by treating the entire value field as a 463 hexadecimal string.
Attributes having TLVs which do not follow the 464 ordering rules MUST NOT be considered as malformed. 466 All TLVs within the BGP-LS Attribute are considered optional unless 467 specified otherwise. 469 5.2. Extensions to BGP-LS 471 [RFC7752] describes a mechanism by which link-state and TE 472 information can be collected from IGPs and shared with external 473 components using the BGP protocol. It describes both the definition 474 of the BGP-LS-SPF NLRI that advertise links, nodes, and prefixes 475 comprising IGP link-state information and the definition of a BGP 476 path attribute (BGP-LS attribute) that carries link, node, and prefix 477 properties and attributes, such as the link and prefix metric or 478 auxiliary Router-IDs of nodes, etc. This document extends the usage 479 of BGP-LS NLRI for the purpose of BGP SPF calculation via 480 advertisement in the BGP-LS-SPF SAFI. 482 The protocol identifier specified in the Protocol-ID field [RFC7752] 483 will represent the origin of the advertised NLRI. For Node NLRI and 484 Link NLRI, this MUST be the direct protocol (4). Node or Link NLRI 485 with a Protocol-ID other than direct will be considered malformed. 486 For Prefix NLRI, the specified Protocol-ID MUST be the origin of the 487 prefix. The local and remote node descriptors for all NLRI MUST 488 include the BGP Identifier (TLV 516) and the AS Number (TLV 512) 489 [RFC7752]. The BGP Confederation Member (TLV 517) [RFC7752] is not 490 applicable and SHOULD NOT be included. If TLV 517 is included, it 491 will be ignored. 493 5.2.1. Node NLRI Usage 495 The Node NLRI MUST be advertised unconditionally by all routers in 496 the BGP SPF routing domain. 498 5.2.1.1. Node NLRI Attribute SPF Capability TLV 500 The SPF capability is an additional Node Attribute TLV. This 501 attribute TLV MUST be included with the BGP-LS-SPF SAFI and SHOULD 502 NOT be used for other SAFIs. The TLV type 1180 will be assigned by 503 IANA.
The Node Attribute TLV will contain a single-octet SPF algorithm as
defined in [RFC8665].

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Type (1180)          |      Length - (1 Octet)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SPF Algorithm |
+-+-+-+-+-+-+-+-+

The SPF algorithm inherits the values from the IGP Algorithm Types
registry [RFC8665]. Algorithm 0 (Shortest Path Algorithm (SPF) based
on link metric) is supported and described in Section 6.3. Support
for other algorithm types is beyond the scope of this specification.

When computing the SPF for a given BGP routing domain, only BGP nodes
advertising the SPF capability TLV with the same SPF algorithm will
be included in the Shortest Path Tree (SPT). An implementation MAY
log detection of a BGP node that has either not advertised the SPF
capability TLV or is advertising the SPF capability TLV with an
algorithm type other than 0.

5.2.1.2. BGP-LS-SPF Node NLRI Attribute SPF Status TLV

A BGP-LS Attribute TLV of the BGP-LS-SPF Node NLRI is defined to
indicate the status of the node with respect to the BGP SPF
calculation. This will be used to rapidly take a node out of service
(Section 6.5.2) or to indicate the node is not to be used for transit
(i.e., non-local) traffic (Section 6.3). If the SPF Status TLV is
not included with the Node NLRI, the node is considered to be up and
is available for transit traffic. The SPF status is acted upon with
the execution of the next SPF calculation (Section 6.3). A single
TLV type will be shared by the BGP-LS-SPF Node, Link, and Prefix
NLRI. The TLV type 1184 will be assigned by IANA.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Type (1184)          |       Length (1 Octet)        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  SPF Status   |
+-+-+-+-+-+-+-+-+

BGP Status Values: 0     - Reserved
                   1     - Node Unreachable with respect to BGP SPF
                   2     - Node does not support transit with respect
                           to BGP SPF
                   3-254 - Undefined
                   255   - Reserved

If the SPF Status TLV is received and the corresponding Node NLRI has
not been received, then the SPF Status TLV is ignored and not used in
SPF computation but is still announced to other BGP speakers. An
implementation MAY log an error for further analysis. If a BGP
speaker receives the Node NLRI but does not receive the SPF Status
TLV, then any previously received information is considered as
implicitly withdrawn and the update is propagated to other BGP
speakers. A BGP Update containing an SPF Status TLV in the BGP-LS
attribute [RFC7752] with a value that is outside the range of defined
values SHOULD be processed and announced to other BGP speakers.
However, a BGP speaker MUST NOT use the Status TLV in its SPF
computation. An implementation MAY log this condition for further
analysis.

5.2.2. Link NLRI Usage

The criteria for advertisement of Link NLRI are discussed in
Section 4.

Link NLRI is advertised with unique local and remote node descriptors
dependent on the IP addressing. For IPv4 links, the link's local
IPv4 (TLV 259) and remote IPv4 (TLV 260) addresses will be used. For
IPv6 links, the local IPv6 (TLV 261) and remote IPv6 (TLV 262)
addresses will be used. For unnumbered links, the link local/remote
identifiers (TLV 258) will be used. For links having both IPv4 and
IPv6 addresses, both sets of descriptors MAY be included in the same
Link NLRI.
The link identifiers are described in Table 5 of [RFC7752].

For a link to be used in the Shortest Path Tree (SPT) for a given
address family, i.e., IPv4 or IPv6, both routers connecting the link
MUST have an address in the same subnet for that address family.
However, an IPv4 or IPv6 prefix associated with the link MAY be
installed without the corresponding address on the other side of the
link.

The link IGP metric attribute TLV (TLV 1095) MUST be advertised. If
a BGP speaker receives a Link NLRI without an IGP metric attribute
TLV, then it SHOULD consider the received NLRI as malformed and the
receiving BGP speaker MUST handle such malformed NLRI as
'Treat-as-withdraw' [RFC7606]. The BGP SPF metric length is 4
octets. As in OSPF [RFC2328], a cost is associated with the output
side of each router interface. This cost is configurable by the
system administrator. The lower the cost, the more likely the
interface is to be used to forward data traffic. One possible
default for the metric would be to give each interface a cost of 1,
making it effectively a hop count. Algorithms such as setting the
metric inversely to the link speed as supported in the OSPF MIB
[RFC4750] MAY be supported. However, this is beyond the scope of
this document. Refer to Section 10.1.1 for operational guidance.

The usage of other link attribute TLVs is beyond the scope of this
document.

5.2.2.1. BGP-LS-SPF Link NLRI Attribute Prefix-Length TLVs

Two BGP-LS Attribute TLVs of the BGP-LS-SPF Link NLRI are defined to
advertise the prefix length associated with the IPv4 and IPv6 link
prefixes derived from the link descriptor addresses. The prefix
length is used for the optional installation of prefixes
corresponding to Link NLRI as defined in Section 6.3.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|IPv4 (1182) or IPv6 Type (1183)|       Length (1 Octet)        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Prefix-Length |
+-+-+-+-+-+-+-+-+

Prefix-length - A one-octet length restricted to 1-32 for IPv4
                Link NLRI endpoint prefixes and 1-128 for IPv6
                Link NLRI endpoint prefixes.

The Prefix-Length TLV is only relevant to Link NLRIs. The
Prefix-Length TLVs MUST be discarded as an error and not passed to
other BGP peers as specified in [RFC7606] when received with any
NLRIs other than Link NLRIs. An implementation MAY log an error for
further analysis.

The maximum prefix-length for the IPv4 Prefix-Length TLV is 32 bits.
A prefix-length field indicating a larger value than 32 bits MUST be
discarded as an error and the received TLV is not passed to other BGP
peers as specified in [RFC7606]. The corresponding Link NLRI is
considered as malformed and MUST be handled as 'Treat-as-withdraw'.
An implementation MAY log an error for further analysis.

The maximum prefix-length for the IPv6 Prefix-Length TLV is 128 bits.
A prefix-length field indicating a larger value than 128 bits MUST be
discarded as an error and the received TLV is not passed to other BGP
peers as specified in [RFC7606]. The corresponding Link NLRI is
considered as malformed and MUST be handled as 'Treat-as-withdraw'.
An implementation MAY log an error for further analysis.

5.2.2.2. BGP-LS-SPF Link NLRI Attribute SPF Status TLV

A BGP-LS Attribute TLV of the BGP-LS-SPF Link NLRI is defined to
indicate the status of the link with respect to the BGP SPF
calculation. This will be used to expedite convergence for link
failures as discussed in Section 6.5.1.
If the SPF Status TLV is not included with the Link NLRI, the link is
considered up and available. The SPF status is acted upon with the
execution of the next SPF calculation (Section 6.3). A single TLV
type will be shared by the Node, Link, and Prefix NLRI. The TLV type
1184 will be assigned by IANA.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Type (1184)          |       Length (1 Octet)        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  SPF Status   |
+-+-+-+-+-+-+-+-+

BGP Status Values: 0     - Reserved
                   1     - Link Unreachable with respect to BGP SPF
                   2-254 - Undefined
                   255   - Reserved

If the SPF Status TLV is received and the corresponding Link NLRI has
not been received, then the SPF Status TLV is ignored and not used in
SPF computation but is still announced to other BGP speakers. An
implementation MAY log an error for further analysis. If a BGP
speaker receives the Link NLRI but does not receive the SPF Status
TLV, then any previously received information is considered as
implicitly withdrawn and the update is propagated to other BGP
speakers. A BGP Update containing an SPF Status TLV in the BGP-LS
attribute [RFC7752] with a value that is outside the range of defined
values SHOULD be processed and announced to other BGP speakers.
However, a BGP speaker MUST NOT use the Status TLV in its SPF
computation. An implementation MAY log this information for further
analysis.

5.2.3. IPv4/IPv6 Prefix NLRI Usage

IPv4/IPv6 Prefix NLRI is advertised with a Local Node Descriptor and
the prefix and length. The Prefix Descriptors field includes the IP
Reachability Information TLV (TLV 265) as described in [RFC7752].
The prefix metric attribute TLV (TLV 1155) MUST be advertised. The
IGP Route Tag TLV (TLV 1153) MAY be advertised.
The usage of other attribute TLVs is beyond the scope of this
document. For loopback prefixes, the metric should be 0. For
non-loopback prefixes, the setting of the metric is a local matter
and beyond the scope of this document.

5.2.3.1. BGP-LS-SPF Prefix NLRI Attribute SPF Status TLV

A BGP-LS Attribute TLV of the BGP-LS-SPF Prefix NLRI is defined to
indicate the status of the prefix with respect to the BGP SPF
calculation. This will be used to expedite convergence for prefix
unreachability as discussed in Section 6.5.1. If the SPF Status TLV
is not included with the Prefix NLRI, the prefix is considered
reachable. A single TLV type will be shared by the Node, Link, and
Prefix NLRI. The TLV type 1184 will be assigned by IANA.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Type (1184)          |       Length (1 Octet)        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  SPF Status   |
+-+-+-+-+-+-+-+-+

BGP Status Values: 0     - Reserved
                   1     - Prefix Unreachable with respect to SPF
                   2-254 - Undefined
                   255   - Reserved

If the SPF Status TLV is received and the corresponding Prefix NLRI
has not been received, then the SPF Status TLV is ignored and not
used in SPF computation but is still announced to other BGP speakers.
An implementation MAY log an error for further analysis. If a BGP
speaker receives the Prefix NLRI but does not receive the SPF Status
TLV, then any previously received information is considered as
implicitly withdrawn and the update is propagated to other BGP
speakers. A BGP Update containing an SPF Status TLV in the BGP-LS
attribute [RFC7752] with a value that is outside the range of defined
values SHOULD be processed and announced to other BGP speakers.
However, a BGP speaker MUST NOT use the Status TLV in its SPF
computation. An implementation MAY log this information for further
analysis.

5.2.4. BGP-LS Attribute Sequence-Number TLV

A BGP-LS Attribute TLV of the BGP-LS-SPF NLRI types is defined to
ensure that the most recent version of a given NLRI is used in the
SPF computation. The Sequence-Number TLV is mandatory for BGP-LS-SPF
NLRI. The TLV type 1181 has been assigned by IANA. The BGP-LS
Attribute TLV will contain an 8-octet sequence number. The usage of
the Sequence-Number TLV is described in Section 6.1.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Type (1181)          |       Length (8 Octets)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             Sequence Number (High-Order 32 Bits)              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              Sequence Number (Low-Order 32 Bits)              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Sequence Number

The 64-bit strictly-increasing sequence number MUST be incremented
for every self-originated version of BGP-LS-SPF NLRI. BGP speakers
implementing this specification MUST use available mechanisms to
preserve the sequence number's strictly increasing property for the
deployed life of the BGP speaker (including cold restarts). One
mechanism for accomplishing this would be to use the high-order 32
bits of the sequence number as a wrap/boot count that is incremented
any time the BGP router loses its sequence number state or the
low-order 32 bits wrap.

When incrementing the sequence number for each self-originated NLRI,
the sequence number should be treated as an unsigned 64-bit value.
If the lower-order 32-bit value wraps, the higher-order 32-bit value
should be incremented and saved in non-volatile storage.
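The wrap/boot count mechanism described above can be sketched in
Python as follows. This is only one possible realization. The class
name, the persist callback, and the choice to increment the boot
count on every start-up are assumptions made for this example.

```python
class SequenceNumber:
    """64-bit sequence number; the high 32 bits are a wrap/boot count."""

    MASK32 = 0xFFFFFFFF

    def __init__(self, saved_boot_count: int, persist) -> None:
        # `persist` is a hypothetical callback that writes the boot
        # count to non-volatile storage.
        self._persist = persist
        # The low-order state is assumed lost on start-up, so the
        # restored boot count is incremented and saved again.
        self.high = (saved_boot_count + 1) & self.MASK32
        self.low = 0
        self._persist(self.high)

    def next(self) -> int:
        """Return the next sequence number as an unsigned 64-bit value."""
        if self.low == self.MASK32:
            # Low-order 32 bits wrap: increment and save the high half.
            self.high = (self.high + 1) & self.MASK32
            self.low = 0
            self._persist(self.high)
        else:
            self.low += 1
        return (self.high << 32) | self.low
```

Only the high-order half is written to storage, so the cost of
persistence is paid on restart or wrap and not on every NLRI update.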
If by some chance the BGP-LS-SPF speaker is deployed long enough that
there is a possibility that the 64-bit sequence number may wrap or a
BGP-LS-SPF speaker completely loses its sequence number state (e.g.,
the BGP speaker hardware is replaced or experiences a cold-start),
the BGP NLRI selection rules (see Section 6.1) will ensure
convergence, albeit not immediately.

The Sequence-Number TLV is mandatory for BGP-LS-SPF NLRI. If the
Sequence-Number TLV is not received, then the corresponding Link NLRI
is considered as malformed and MUST be handled as
'Treat-as-withdraw'. An implementation MAY log an error for further
analysis.

5.3. NEXT_HOP Manipulation

All BGP peers that support SPF extensions locally compute the Loc-RIB
Next-Hop as a result of the SPF process. Consequently, the Next-Hop
is always ignored on receipt. The Next-Hop address MUST be encoded
as described in [RFC4760]. BGP speakers MUST interpret the Next-Hop
address of the MP_REACH_NLRI attribute as an IPv4 address whenever
the length of the Next-Hop address is 4 octets, and as an IPv6
address whenever the length of the Next-Hop address is 16 octets.

[RFC4760] modifies the rules of the NEXT_HOP attribute whenever the
multiprotocol extensions for BGP-4 are enabled. BGP speakers MUST
set the NEXT_HOP attribute according to the rules specified in
[RFC4760] as the BGP-LS-SPF routing information is carried within the
multiprotocol extensions for BGP-4.

6. Decision Process with SPF Algorithm

The Decision Process described in [RFC4271] takes place in three
distinct phases. The Phase 1 decision function of the Decision
Process is responsible for calculating the degree of preference for
each route received from a BGP speaker's peer.
The Phase 2 decision function is invoked on completion of the Phase 1
decision function and is responsible for choosing the best route out
of all those available for each distinct destination, and for
installing each chosen route into the Loc-RIB. The combination of
the Phase 1 and 2 decision functions is characterized as a Path
Vector algorithm.

The SPF-based Decision Process replaces the BGP Decision Process
described in [RFC4271]. This process starts with selecting only
those Node NLRI whose SPF capability TLV matches the local BGP-LS-SPF
speaker's SPF capability TLV value. Since Link-State NLRI always
contains the local node descriptor (Section 5.2.1), each NLRI is
uniquely originated by a single BGP-LS-SPF speaker in the BGP SPF
routing domain (the BGP node matching the NLRI's Node Descriptors).
Instances of the same NLRI originated by multiple BGP speakers would
be indicative of a configuration error or a masquerading attack
(Section 9). These selected Node NLRI and their Link/Prefix NLRI are
used to build a directed graph during the SPF computation as
described below. The best routes for BGP prefixes are installed in
the RIB as a result of the SPF process.

When BGP-LS-SPF NLRI is received, all that is required is to
determine whether it is the most recent by examining the Node-ID and
sequence number as described in Section 6.1. If the received NLRI
has changed, it will be advertised to other BGP-LS-SPF peers. If the
attributes have changed (other than the sequence number), a BGP SPF
calculation will be triggered. However, a changed NLRI MAY be
advertised immediately to other peers and prior to any SPF
calculation. Note that the BGP MinRouteAdvertisementIntervalTimer
and MinASOriginationIntervalTimer [RFC4271] timers are not applicable
to the BGP-LS-SPF SAFI.
The scheduling of the SPF calculation, as described in Section 6.3,
is an implementation issue. Scheduling MAY be dampened consistent
with the SPF back-off algorithm specified in [RFC8405].

The Phase 3 decision function of the Decision Process [RFC4271] is
also simplified since under normal SPF operation, a BGP speaker MUST
advertise the changed NLRIs to all BGP peers with the BGP-LS-SPF
AFI/SAFI and install the changed routes in the Global RIB. The only
exceptions are unchanged NLRIs or stale NLRIs, i.e., NLRI received
with a less recent (numerically smaller) sequence number.

6.1. BGP NLRI Selection

The rules for NLRI selection in Phase 1 of the BGP Decision Process,
Section 9.1.1 of [RFC4271], no longer apply to BGP-LS-SPF NLRIs.
Instead, the following rules apply:

1.  Routes originated by directly connected BGP SPF peers are
    preferred. This condition can be determined by comparing the BGP
    Identifiers in the received Local Node Descriptor and OPEN
    message. This rule will assure that stale NLRI is updated even
    if a BGP-LS router loses its sequence number state due to a
    cold-start.

2.  The NLRI with the most recent Sequence Number TLV, i.e., the
    highest sequence number, is selected.

3.  The route received from the BGP SPF speaker with the numerically
    larger BGP Identifier is preferred.

When a BGP SPF speaker completely loses its sequence number state,
e.g., due to a cold start, or in the unlikely possibility that the
64-bit sequence number wraps, the BGP routing domain will still
converge. This is due to the fact that BGP speakers adjacent to the
router will always accept self-originated NLRI from the associated
speaker as more recent (rule # 1). When a BGP speaker reestablishes
a connection with its peers, any existing session will be taken down
and stale NLRI will be replaced.
The adjacent BGP speakers will update their NLRI advertisements, hop
by hop, until the BGP routing domain has converged.

The modified SPF Decision Process performs an SPF calculation rooted
at the BGP speaker using the metrics from the Link Attribute IGP
Metric TLV (1095) and the Prefix Attribute Prefix Metric TLV (1155)
[RFC7752]. As a result, any other BGP attributes that would
influence the BGP decision process defined in [RFC4271] including
ORIGIN, MULTI_EXIT_DISC, and LOCAL_PREF attributes are ignored by the
SPF algorithm. Furthermore, the NEXT_HOP attribute value is
preserved but otherwise ignored during the SPF computation for
BGP-LS-SPF NLRIs. The AS_PATH and AS4_PATH [RFC6793] attributes are
preserved and used for loop detection [RFC4271]. They are ignored
during the SPF computation for BGP-LS-SPF NLRIs.

6.1.1. BGP Self-Originated NLRI

Node, Link, or Prefix NLRI with Node Descriptors matching the local
BGP speaker are considered self-originated. When self-originated
NLRI is received and it doesn't match the local node's NLRI content
(including sequence number), special processing is required.

o  If a self-originated NLRI is received and the sequence number is
   more recent (i.e., greater than the local node's sequence number
   for the NLRI), the NLRI sequence number will be advanced to one
   greater than the received sequence number and the NLRI will be
   readvertised to all peers.

o  If self-originated NLRI is received and the sequence number is the
   same as the local node's sequence number but the attributes
   differ, the NLRI sequence number will be advanced to one greater
   than the received sequence number and the NLRI will be
   readvertised to all peers.

o  If self-originated Link or Prefix NLRI is received and the Link or
   Prefix NLRI is no longer being advertised by the local node, the
   NLRI will be withdrawn.
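The three cases listed above can be sketched in Python as follows.
This is a minimal illustration and not a definitive implementation.
The Nlri record, its field names, and the string action codes are
hypothetical, and the delay applied to repeated occurrences is not
modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Nlri:
    key: str      # hypothetical stand-in for the NLRI descriptors
    seq: int      # value carried in the Sequence-Number TLV
    attrs: bytes  # remaining BGP-LS attribute content


def handle_self_originated(local: Optional[Nlri], received: Nlri) -> str:
    """Return the action for a received self-originated NLRI.

    `local` is the version this node currently advertises, or None if
    the (Link or Prefix) NLRI is no longer advertised locally.
    """
    if local is None:
        # Third case: the NLRI is no longer advertised by this node.
        return "withdraw"
    newer = received.seq > local.seq
    same_seq_other_attrs = (
        received.seq == local.seq and received.attrs != local.attrs
    )
    if newer or same_seq_other_attrs:
        # First and second cases: advance past the received number.
        local.seq = received.seq + 1
        return "readvertise"
    return "none"
```

In both readvertisement cases the local copy ends up with a sequence
number one greater than the received one, so the local version is
selected by peers under the most-recent-sequence-number rule.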
The above actions are performed immediately when the first instance
of a newer self-originated NLRI is received. In this case, the newer
instance is considered to be a stale instance that was advertised by
the local node prior to a restart where the NLRI state was lost.
However, if subsequent newer self-originated NLRI is received for the
same Node, Link, or Prefix NLRI, the readvertisement or withdrawal is
delayed by 5 seconds since it is likely being advertised by a
misconfigured or rogue BGP-LS-SPF speaker (Section 9).

6.2. Dual Stack Support

The SPF-based decision process operates on Node, Link, and Prefix
NLRIs that support both IPv4 and IPv6 addresses. Whether to run a
single SPF computation or multiple SPF computations for separate AFs
is an implementation matter. Normally, IPv4 next-hops are calculated
for IPv4 prefixes and IPv6 next-hops are calculated for IPv6
prefixes.

6.3. SPF Calculation based on BGP-LS-SPF NLRI

This section details the BGP-LS-SPF local routing information base
(RIB) calculation. The router will use BGP-LS-SPF Node, Link, and
Prefix NLRI to compute routes using the following algorithm. This
calculation yields the set of routes associated with the BGP-LS
domain. A router calculates the shortest-path tree using itself as
the root. Optimizations to the BGP-LS-SPF algorithm are possible but
MUST yield the same set of routes. The algorithm below supports
Equal Cost Multi-Path (ECMP) routes. Weighted Unequal Cost
Multi-Path routes are out of scope. The organization of this section
owes heavily to Section 16 of [RFC2328].

The following abstract data structures are defined in order to
specify the algorithm.

o  Local Route Information Base (LOC-RIB) - This routing table
   contains reachability information (i.e., next hops) for all
   prefixes (both IPv4 and IPv6) as well as BGP-LS-SPF node
   reachability.
   Implementations may choose to implement this with separate RIBs
   for each address family and/or Prefix versus Node reachability.
   It is synonymous with the Loc-RIB specified in [RFC4271].

o  Global Routing Information Base (GLOBAL-RIB) - This is the Routing
   Information Base (RIB) containing the current routes that are
   installed in the router's forwarding plane. This is commonly
   referred to in networking parlance as "the RIB".

o  Link State NLRI Database (LSNDB) - Database of BGP-LS-SPF NLRI
   that facilitates access to all Node, Link, and Prefix NLRI.

o  Candidate List (CAN-LIST) - This is a list of candidate Node
   NLRIs. The list is sorted by the cost to reach the Node NLRI with
   the Node NLRI with the lowest reachability cost at the head of the
   list. This facilitates execution of the Dijkstra algorithm
   (Section 1.1), where the shortest paths between the local node and
   other nodes in the graph are computed. The CAN-LIST is typically
   implemented as a heap but other data structures have been used.

The algorithm consists of the steps below:

1.  The current LOC-RIB is invalidated, and the CAN-LIST is
    initialized to empty. The LOC-RIB is rebuilt during the course
    of the SPF computation. The existing routing entries are
    preserved for comparison to determine changes that need to be
    made to the GLOBAL-RIB in step 6.

2.  The computing router's Node NLRI is updated in the LOC-RIB with a
    cost of 0 and the Node NLRI is also added to the CAN-LIST. The
    next-hop list is set to the internal loopback next-hop.

3.  The Node NLRI with the lowest cost is removed from the candidate
    list for processing. If the BGP-LS Node attribute includes an
    SPF Status TLV (Section 5.2.1.2) indicating the node is
    unreachable, the Node NLRI is ignored and the next lowest cost
    Node NLRI is selected from the candidate list.
    The Node corresponding to this NLRI will be referred to as the
    Current-Node. If the candidate list is empty, the SPF
    calculation has completed and the algorithm proceeds to step 6.

4.  All the Prefix NLRI with the same Node Identifiers as the
    Current-Node will be considered for installation. The
    next-hop(s) for these Prefix NLRI are inherited from the
    Current-Node. The cost for each prefix is the metric advertised
    in the Prefix Attribute Prefix Metric TLV (1155) added to the
    cost to reach the Current-Node. The following will be done for
    each Prefix NLRI (referred to as the Current-Prefix):

    *  If the BGP-LS Prefix attribute includes an SPF Status TLV
       indicating the prefix is unreachable, the Current-Prefix is
       considered unreachable and the next Prefix NLRI is examined in
       Step 4.

    *  If the Current-Prefix's corresponding prefix is in the LOC-RIB
       and the LOC-RIB route's cost is less than the Current-Prefix's
       cost, the Current-Prefix does not contribute to the route and
       the next Prefix NLRI is examined in Step 4.

    *  If the Current-Prefix's corresponding prefix is not in the
       LOC-RIB, the prefix is installed with the Current-Node's
       next-hops installed as the LOC-RIB route's next-hops and the
       metric being updated. If the IGP Route Tag TLV (1153) is
       included in the Current-Prefix's NLRI Attribute, the tag(s)
       are installed in the current LOC-RIB route's tag(s).

    *  If the Current-Prefix's corresponding prefix is in the LOC-RIB
       and the Current-Prefix's cost is less than the current route's
       metric, the prefix is installed with the Current-Node's
       next-hops replacing the LOC-RIB route's next-hops and the
       metric being updated and any route tags removed. If the IGP
       Route Tag TLV (1153) is included in the Current-Prefix's NLRI
       Attribute, the tag(s) are installed in the current LOC-RIB
       route's tag(s).
    *  If the Current-Prefix's corresponding prefix is in the LOC-RIB
       and the cost is the same as the current route's metric, the
       Current-Node's next-hops will be merged with the LOC-RIB
       route's next-hops. If the IGP Route Tag TLV (1153) is
       included in the Current-Prefix's NLRI Attribute, the tag(s)
       are merged into the LOC-RIB route's current tags.

5.  All the Link NLRI with the same Node Identifiers as the
    Current-Node will be considered for installation. Each link will
    be examined and will be referred to in the following text as the
    Current-Link. The cost of the Current-Link is the advertised IGP
    Metric TLV (1095) from the Link NLRI BGP-LS attribute added to
    the cost to reach the Current-Node. If the Current-Node is for
    the local BGP Router, the next-hop for the link will be a direct
    next-hop pointing to the corresponding local interface. For any
    other Current-Node, the next-hop(s) for the Current-Link will be
    inherited from the Current-Node. The following will be done for
    each link:

    A.  The prefix(es) associated with the Current-Link are installed
        into the LOC-RIB using the same rules as were used for Prefix
        NLRI in the previous steps. Optionally, in deployments where
        BGP-SPF routers have limited routing table capacity,
        installation of these subnets can be suppressed. Suppression
        will have an operational impact as the IPv4/IPv6 link
        endpoint addresses will not be reachable and tools such as
        traceroute will display addresses that are not reachable.

    B.  If the Current-Node NLRI attribute includes the SPF Status
        TLV (Section 5.2.1.2) and the status indicates that the Node
        doesn't support transit, the next link for the Current-Node
        is processed in Step 5.

    C.
        If the Current-Link's NLRI attribute includes an SPF Status
        TLV indicating the link is down, the BGP-LS-SPF Link NLRI is
        considered down and the next link for the Current-Node is
        examined in Step 5.

    D.  The Current-Link's Remote Node NLRI is accessed (i.e., the
        Node NLRI with the same Node identifiers as the
        Current-Link's Remote Node Descriptors). If it exists, it
        will be referred to as the Remote-Node and the algorithm will
        proceed as follows:

        +  If the Remote-Node's NLRI attribute includes an SPF Status
           TLV indicating the node is unreachable, the next link for
           the Current-Node is examined in Step 5.

        +  All the Link NLRI corresponding to the Remote-Node will be
           searched for a Link NLRI pointing to the Current-Node.
           Each Link NLRI is examined for Remote Node Descriptors
           matching the Current-Node and Link Descriptors matching
           the Current-Link (e.g., sharing a common IPv4 or IPv6
           subnet). If both these conditions are satisfied for one
           of the Remote-Node's links, the bi-directional
           connectivity check succeeds and the Remote-Node may be
           processed further. The Remote-Node's Link NLRI providing
           bi-directional connectivity will be referred to as the
           Remote-Link. If no Remote-Link is found, the next link
           for the Current-Node is examined in Step 5.

        +  If the Remote-Link NLRI attribute includes an SPF Status
           TLV indicating the link is down, the Remote-Link NLRI is
           considered down and the next link for the Current-Node is
           examined in Step 5.

        +  If the Remote-Node is not on the CAN-LIST, it is inserted
           based on the cost. The Remote-Node's cost is the cost of
           the Current-Node added to the Current-Link's IGP Metric
           TLV (1095). The next-hop(s) for the Remote-Node are
           inherited from the Current-Link.
        +  If the Remote-Node NLRI is already on the CAN-LIST with a
           higher cost, it must be removed and reinserted with the
           Remote-Node cost based on the Current-Link (as calculated
           in the previous step). The next-hop(s) for the
           Remote-Node are inherited from the Current-Link.

        +  If the Remote-Node NLRI is already on the CAN-LIST with
           the same cost, it need not be reinserted on the CAN-LIST.
           However, the Current-Link's next-hop(s) must be merged
           into the current set of next-hops for the Remote-Node.

        +  If the Remote-Node NLRI is already on the CAN-LIST with a
           lower cost, it need not be reinserted on the CAN-LIST.

    E.  Return to step 3 to process the next lowest cost Node NLRI on
        the CAN-LIST.

6.  The LOC-RIB is examined and changes (adds, deletes,
    modifications) are installed into the GLOBAL-RIB. For each route
    in the LOC-RIB:

    *  If the route was added during the current BGP SPF computation,
       install the route into the GLOBAL-RIB.

    *  If the route was modified during the current BGP SPF
       computation (e.g., metric, tags, or next-hops), update the
       route in the GLOBAL-RIB.

    *  If the route was not installed during the current BGP SPF
       computation, remove the route from both the GLOBAL-RIB and the
       LOC-RIB.

6.4. IPv4/IPv6 Unicast Address Family Interaction

While the BGP-LS-SPF address family and the IPv4/IPv6 unicast address
families MAY install routes into the same device routing tables, they
will operate independently much the same as OSPF and IS-IS would
operate today (i.e., "Ships-in-the-Night" mode). There is no
implicit route redistribution between the BGP address families.

It is RECOMMENDED that BGP-LS-SPF IPv4/IPv6 route computation and
installation be given scheduling priority by default over other BGP
address families as these address families are considered as underlay
SAFIs.
Similarly, it is RECOMMENDED that the route preference or
administrative distance give active route installation preference to
BGP-LS-SPF IPv4/IPv6 routes over BGP routes from other AFI/SAFIs.
However, this preference MAY be overridden by an operator-configured
policy.

6.5. NLRI Advertisement

6.5.1. Link/Prefix Failure Convergence

A local failure will prevent a link from being used in the SPF
calculation due to the IGP bi-directional connectivity requirement.
Consequently, local link failures SHOULD always be given priority
over updates (e.g., withdrawing all routes learned on a session) in
order to ensure the highest priority propagation and optimal
convergence.

An IGP such as OSPF [RFC2328] will stop using the link as soon as the
Router-LSA for one side of the link is received. With a BGP
advertisement, the link would continue to be used until the last copy
of the BGP-LS-SPF Link NLRI is withdrawn. In order to avoid this
delay, the originator of the Link NLRI SHOULD advertise a more recent
version with an increased Sequence Number TLV for the BGP-LS-SPF Link
NLRI including the SPF Status TLV (Section 5.2.2.2) indicating the
link is down with respect to BGP SPF. After some configurable period
of time, which is implementation dependent (e.g., 2-3 seconds), the
BGP-LS-SPF Link NLRI can be withdrawn with no consequence. If the
link becomes available in that period, the originator of the
BGP-LS-SPF Link NLRI will simply advertise a more recent version of
the BGP-LS-SPF Link NLRI without the SPF Status TLV in the BGP-LS
Link Attributes.
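
As a non-normative illustration, the originator behavior described
above can be sketched in Python as follows. The LinkNlri class, the
function names, and the use of the value 1 for "link down" are
assumptions made for this example only; the actual SPF Status values
are defined in Section 5.2.2.2, and a real implementation would drive
on_withdraw_timer() from its configurable delay timer.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Assumed value for "link down with respect to BGP SPF"; the
# authoritative values are in Section 5.2.2.2, which is not shown here.
SPF_STATUS_LINK_DOWN = 1


@dataclass(frozen=True)
class LinkNlri:
    """Hypothetical model of one originated BGP-LS-SPF Link NLRI."""
    link_id: str
    sequence: int
    spf_status: Optional[int] = None  # None: SPF Status TLV is absent


def on_link_down(current: LinkNlri) -> LinkNlri:
    """Advertise a newer version carrying the SPF Status TLV (down)."""
    return replace(current, sequence=current.sequence + 1,
                   spf_status=SPF_STATUS_LINK_DOWN)


def on_link_up(current: LinkNlri) -> LinkNlri:
    """Advertise a newer version without the SPF Status TLV."""
    return replace(current, sequence=current.sequence + 1,
                   spf_status=None)


def on_withdraw_timer(current: LinkNlri) -> Optional[LinkNlri]:
    """Return None (withdraw) only if the link is still marked down."""
    if current.spf_status == SPF_STATUS_LINK_DOWN:
        return None
    return current
```

In this sketch every state change produces a strictly higher sequence
number, so receivers always prefer the most recent version, and the
withdrawal happens only if the link did not recover within the delay.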

Similarly, when a prefix becomes unreachable, a more recent version
of the BGP-LS-SPF Prefix NLRI will be advertised with the SPF Status
TLV (Section 5.2.3.1) indicating the prefix is unreachable in the
BGP-LS Prefix Attributes and the prefix will be considered
unreachable with respect to BGP SPF. After some configurable period
of time, which is implementation dependent (e.g., 2-3 seconds), the
BGP-LS-SPF Prefix NLRI can be withdrawn with no consequence. If the
prefix becomes reachable in that period, the originator of the
BGP-LS-SPF Prefix NLRI will simply advertise a more recent version of
the BGP-LS-SPF Prefix NLRI without the SPF Status TLV in the BGP-LS
Prefix Attributes.

6.5.2. Node Failure Convergence

With BGP without graceful restart [RFC4724], all the NLRI advertised
by a node are implicitly withdrawn when a session failure is
detected. If fast failure detection such as BFD is utilized, and the
node is on the fastest converging path, the most recent versions of
BGP-LS-SPF NLRI may be withdrawn. This will result in an older
version of the NLRI being used until the new versions arrive and,
potentially, unnecessary route flaps. Therefore, BGP-LS-SPF NLRI
SHOULD always be retained before being implicitly withdrawn for a
configurable implementation-dependent interval, e.g., 2-3 seconds.
This will not delay convergence since the adjacent nodes will detect
the link failure and advertise a more recent NLRI indicating the link
is down with respect to BGP SPF (Section 6.5.1) and the BGP SPF
calculation will fail the bi-directional connectivity check
(Section 6.3).

7. Error Handling

This section describes the Error Handling actions, as described in
[RFC7606], that are specific to SAFI BGP-LS-SPF BGP Update message
processing.

7.1. Processing of BGP-LS-SPF TLVs

When a BGP speaker receives a BGP Update containing a malformed Node
NLRI SPF Status TLV in the BGP-LS Attribute [RFC7752], it MUST ignore
the received TLV and MUST NOT pass it to other BGP peers as specified
in [RFC7606]. When discarding an associated Node NLRI with a
malformed TLV, a BGP speaker SHOULD log an error for further
analysis.

When a BGP speaker receives a BGP Update containing a malformed Link
NLRI SPF Status TLV in the BGP-LS Attribute [RFC7752], it MUST ignore
the received TLV and MUST NOT pass it to other BGP peers as specified
in [RFC7606]. When discarding an associated Link NLRI with a
malformed TLV, a BGP speaker SHOULD log an error for further
analysis.

When a BGP speaker receives a BGP Update containing a malformed
Prefix NLRI SPF Status TLV in the BGP-LS Attribute [RFC7752], it MUST
ignore the received TLV and MUST NOT pass it to other BGP peers as
specified in [RFC7606]. When discarding an associated Prefix NLRI
with a malformed TLV, a BGP speaker SHOULD log an error for further
analysis.

When a BGP speaker receives a BGP Update containing a malformed SPF
Capability TLV in the Node NLRI BGP-LS Attribute [RFC7752], it MUST
ignore the received TLV and the Node NLRI and MUST NOT pass it to
other BGP peers as specified in [RFC7606]. When discarding a Node
NLRI with a malformed TLV, a BGP speaker SHOULD log an error for
further analysis.

When a BGP speaker receives a BGP Update containing a malformed IPv4
Prefix-Length TLV in the Link NLRI BGP-LS Attribute [RFC7752], it
MUST ignore the received TLV and the Link NLRI and MUST NOT pass it
to other BGP peers as specified in [RFC7606]. The corresponding Link
NLRI is considered as malformed and MUST be handled as
'Treat-as-withdraw'. An implementation MAY log an error for further
analysis.
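
As a non-normative sketch, the handling of a malformed SPF Status TLV
described earlier in this section could look as follows in Python.
The TLV header layout (2-octet type, 2-octet length) follows
[RFC7752]. The type code 1184 is the suggested value from Table 1,
while the expected value length of one octet, the function names, and
the decision to treat only a wrong length as "malformed" are
assumptions made for this example.

```python
import struct
from typing import List, Tuple

SPF_STATUS_TLV = 1184  # suggested value from Table 1
SPF_STATUS_LEN = 1     # assumed value length in octets

Tlv = Tuple[int, bytes]


def split_tlvs(buf: bytes) -> List[Tlv]:
    """Split a BGP-LS Attribute body into (type, value) pairs.

    Raises ValueError when a header or value overruns the buffer,
    i.e., when the TLV lengths do not add up to the attribute length.
    """
    tlvs: List[Tlv] = []
    offset = 0
    while offset < len(buf):
        if offset + 4 > len(buf):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!HH", buf, offset)
        offset += 4
        if offset + length > len(buf):
            raise ValueError("TLV value overruns attribute")
        tlvs.append((tlv_type, buf[offset:offset + length]))
        offset += length
    return tlvs


def filter_spf_status(tlvs: List[Tlv]) -> Tuple[List[Tlv], List[Tlv]]:
    """Separate malformed SPF Status TLVs from everything else.

    The first list is kept (used locally and passed on); the second
    is ignored, not passed to peers, and should be logged.
    """
    kept: List[Tlv] = []
    dropped: List[Tlv] = []
    for tlv_type, value in tlvs:
        if tlv_type == SPF_STATUS_TLV and len(value) != SPF_STATUS_LEN:
            dropped.append((tlv_type, value))
        else:
            kept.append((tlv_type, value))
    return kept, dropped
```

A length error that prevents the TLVs from being delimited at all is
reported by split_tlvs() as an exception, since that case is covered
by the attribute-level rules in Section 7.3 rather than by per-TLV
handling.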

When a BGP speaker receives a BGP Update containing a malformed IPv6
Prefix-Length TLV in the Link NLRI BGP-LS Attribute [RFC7752], it
MUST ignore the received TLV and the Link NLRI and MUST NOT pass it
to other BGP peers as specified in [RFC7606]. The corresponding Link
NLRI is considered as malformed and MUST be handled as
'Treat-as-withdraw'. An implementation MAY log an error for further
analysis.

7.2. Processing of BGP-LS-SPF NLRIs

A Link-State NLRI MUST NOT be considered as malformed or invalid
based on the inclusion/exclusion of TLVs or contents of the TLV
fields (i.e., semantic errors), as described in Section 5.1 and
Section 5.1.1.

A BGP-LS-SPF Speaker MUST perform the following syntactic validation
of the BGP-LS-SPF NLRI to determine if it is malformed.

1. Does the sum of the lengths of all TLVs found in the BGP
   MP_REACH_NLRI attribute correspond to the BGP MP_REACH_NLRI
   length?

2. Does the sum of the lengths of all TLVs found in the BGP
   MP_UNREACH_NLRI attribute correspond to the BGP MP_UNREACH_NLRI
   length?

3. Does the sum of the lengths of all TLVs found in a BGP-LS-SPF
   NLRI correspond to the Total NLRI Length field of all its
   Descriptors?

4. When an NLRI TLV is recognized, is the length of the TLV and its
   sub-TLVs valid?

5. Has the syntactic correctness of the NLRI fields been verified as
   per [RFC7606]?

6. Has the rule regarding ordering of TLVs been followed as
   described in Section 5.1.1?

When the error determined allows for the router to skip the malformed
NLRI(s) and continue processing of the rest of the update message
(e.g., when the TLV ordering rule is violated), then it MUST handle
such malformed NLRIs as 'Treat-as-withdraw'.
In other cases, where the error in the NLRI encoding results in the
inability to process the BGP update message (e.g., length related
encoding errors), then the router SHOULD handle such malformed NLRIs
as 'AFI/SAFI disable' when other AFI/SAFIs besides BGP-LS are being
advertised over the same session. Alternatively, the router MUST
perform 'session reset' when the session is only being used for
BGP-LS-SPF or when its 'AFI/SAFI disable' action is not possible.

7.3. Processing of BGP-LS Attribute

A BGP-LS Attribute MUST NOT be considered as malformed or invalid
based on the inclusion/exclusion of TLVs or contents of the TLV
fields (i.e., semantic errors), as described in Section 5.1 and
Section 5.1.1.

A BGP-LS-SPF Speaker MUST perform the following syntactic validation
of the BGP-LS Attribute to determine if it is malformed.

1. Does the sum of the lengths of all TLVs found in the BGP-LS
   Attribute correspond to the BGP-LS Attribute length?

2. Has the syntactic correctness of the Attributes (including BGP-LS
   Attribute) been verified as per [RFC7606]?

3. Is the length of each TLV and, when the TLV is recognized, of its
   sub-TLVs in the BGP-LS Attribute valid?

When the detected error allows for the router to skip the malformed
BGP-LS Attribute and continue processing of the rest of the update
message (e.g., when the BGP-LS Attribute length and the total Path
Attribute Length are correct but some TLV/sub-TLV length within the
BGP-LS Attribute is invalid), then it MUST handle such malformed
BGP-LS Attribute as 'Attribute Discard'. In other cases, when the
error in the BGP-LS Attribute encoding results in the inability to
process the BGP update message, then the handling is the same as
described above for malformed NLRI.
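
As a non-normative summary, the choice among the error-handling
actions named in Sections 7.2 and 7.3 can be sketched in Python as
follows. The enumeration, the function name, and its boolean
parameters are assumptions made for this example; how an
implementation decides that processing "can continue" is outside the
scope of the sketch.

```python
from enum import Enum


class Action(Enum):
    TREAT_AS_WITHDRAW = "Treat-as-withdraw"
    ATTRIBUTE_DISCARD = "Attribute Discard"
    AFI_SAFI_DISABLE = "AFI/SAFI disable"
    SESSION_RESET = "session reset"


def choose_action(error_in_attribute: bool,
                  can_continue: bool,
                  other_afi_safi_on_session: bool,
                  disable_possible: bool = True) -> Action:
    """Pick the action for a malformed NLRI or BGP-LS Attribute.

    error_in_attribute: True for a BGP-LS Attribute error, False for
        an NLRI error.
    can_continue: True when the rest of the update message can still
        be parsed after skipping the malformed part.
    other_afi_safi_on_session: True when the session carries other
        AFI/SAFIs in addition to BGP-LS-SPF.
    disable_possible: True when the implementation is able to perform
        the 'AFI/SAFI disable' action.
    """
    if can_continue:
        if error_in_attribute:
            return Action.ATTRIBUTE_DISCARD
        return Action.TREAT_AS_WITHDRAW
    if other_afi_safi_on_session and disable_possible:
        return Action.AFI_SAFI_DISABLE
    return Action.SESSION_RESET
```

For example, a TLV ordering violation in an NLRI maps to
choose_action(False, True, ...) and yields 'Treat-as-withdraw', while
a length error on a session that carries only BGP-LS-SPF yields
'session reset'.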

Note that the 'Attribute Discard' action results in the loss of all
TLVs in the BGP-LS Attribute and not the removal of a specific
malformed TLV. The removal of specific malformed TLVs may give a
wrong indication to a BGP-LS-SPF speaker that the specific
information is being deleted or is not available.

When a BGP-LS-SPF speaker receives an update message with Link-State
NLRI(s) in the MP_REACH_NLRI but without the BGP-LS Attribute, it
is most likely an indication that a BGP-LS-SPF speaker preceding it
has performed the 'Attribute Discard' fault handling. An
implementation SHOULD preserve and propagate the Link-State NLRIs in
such an update message so that the BGP-LS-SPF speaker can detect the
loss of link-state information for that object and not assume its
deletion/withdrawal. This also makes it possible for a network
operator to trace back to the BGP-LS-SPF speaker which actually
detected a problem with the BGP-LS Attribute.

An implementation SHOULD log an error for further analysis for
problems detected during syntax validation.

When a BGP speaker receives a BGP Update containing a malformed IGP
Metric TLV in the Link NLRI BGP-LS Attribute [RFC7752], it MUST
ignore the received TLV and the Link NLRI and MUST NOT pass it to
other BGP peers as specified in [RFC7606]. When discarding a Link
NLRI with a malformed TLV, a BGP speaker SHOULD log an error for
further analysis.

8. IANA Considerations

This document defines the use of SAFI (80) for BGP SPF operation
(Section 5.1), and requests IANA to assign the value from the First
Come First Served (FCFS) range in the Subsequent Address Family
Identifiers (SAFI) Parameters registry.

This document also defines five attribute TLVs of BGP-LS-SPF NLRI.
We request IANA to assign types for the SPF Capability TLV, Sequence
Number TLV, IPv4 Link Prefix-Length TLV, IPv6 Link Prefix-Length TLV,
and SPF Status TLV from the "BGP-LS Node Descriptor, Link Descriptor,
Prefix Descriptor, and Attribute TLVs" Registry.

+-------------------------+-----------------+--------------------+
| Attribute TLV           | Suggested Value | NLRI Applicability |
+-------------------------+-----------------+--------------------+
| SPF Capability          | 1180            | Node               |
| SPF Status              | 1184            | Node, Link, Prefix |
| IPv4 Link Prefix Length | 1182            | Link               |
| IPv6 Link Prefix Length | 1183            | Link               |
| Sequence Number         | 1181            | Node, Link, Prefix |
+-------------------------+-----------------+--------------------+

                   Table 1: NLRI Attribute TLVs

9. Security Considerations

This document defines a BGP SAFI, i.e., the BGP-LS-SPF SAFI. This
document does not change the underlying security issues inherent in
the BGP protocol [RFC4271]. The Security Considerations discussed in
[RFC4271] apply to the BGP SPF functionality as well. The analysis
of the security issues for BGP mentioned in [RFC4272] and [RFC6952]
also applies to this document. The analysis of Generic Threats to
Routing Protocols done in [RFC4593] is also worth noting. As the
modifications described in this document for BGP SPF apply to IPv4
Unicast and IPv6 Unicast as underlay SAFIs in a single BGP SPF
Routing Domain, the BGP security solutions described in [RFC6811] and
[RFC8205] are somewhat constricted as they are meant to apply for
inter-domain BGP where multiple BGP Routing Domains are typically
involved. The BGP-LS-SPF SAFI NLRI described in this document are
typically advertised between EBGP or IBGP speakers under a single
administrative domain.

In the context of the BGP peering associated with this document, a
BGP speaker MUST NOT accept updates from a peer that is not within
any administrative control of an operator. That is, a participating
BGP speaker SHOULD be aware of the nature of its peering
relationships. Such protection can be achieved by manual
configuration of peers at the BGP speaker.

In order to mitigate the risk of peering with BGP speakers
masquerading as legitimate authorized BGP speakers, it is recommended
that the TCP Authentication Option (TCP-AO) [RFC5925] be used to
authenticate BGP sessions. If an authorized BGP peer is compromised,
that BGP peer could advertise modified Node, Link, or Prefix NLRI
that will result in misrouting, repeating origination of NLRI, and/or
excessive SPF calculations. When a BGP speaker detects that its
self-originated NLRI is being originated by another BGP speaker, an
appropriate error should be logged so that the operator can take
corrective action.

10. Management Considerations

This section includes unique management considerations for the
BGP-LS-SPF address family.

10.1. Configuration

All routers in a BGP SPF Routing Domain are under a single
administrative domain allowing for consistent configuration.

10.1.1. Link Metric Configuration

Within a BGP SPF Routing Domain, the IGP metrics for all advertised
links SHOULD be configured or defaulted consistently. For example,
if a default metric is used for one router's links, then a similar
metric should be used for all routers' links. Similarly, if the link
cost is derived from using the inverse of the link bandwidth on one
router, then this SHOULD be done for all routers and the same
reference bandwidth should be used to derive the inversely
proportional metric. Failure to do so will prevent correct routing
based on link metric.

10.1.2. backoff-config

In addition to configuration of the BGP-LS-SPF address family,
implementations SHOULD support the "Shortest Path First (SPF)
Back-Off Delay Algorithm for Link-State IGPs" [RFC8405]. If
supported, configuration of the INITIAL_SPF_DELAY, SHORT_SPF_DELAY,
LONG_SPF_DELAY, TIME_TO_LEARN, and HOLDDOWN_INTERVAL MUST be
supported [RFC8405]. Section 6 of [RFC8405] recommends consistent
configuration of these values throughout the IGP routing domain and
this also applies to the BGP SPF Routing Domain.

10.2. Operational Data

In order to troubleshoot SPF issues, implementations SHOULD support
an SPF log including entries for previous SPF computations. Each SPF
log entry would include the BGP-LS-SPF NLRI triggering the SPF, SPF
scheduled time, SPF start time, SPF end time, and SPF type if
different types of SPF are supported. Since the size of the log will
be finite, implementations SHOULD also maintain counters for the
total number of SPF computations and the total number of SPF
triggering events. Additionally, to troubleshoot SPF scheduling and
back-off [RFC8405], the current SPF back-off state, remaining
time-to-learn, remaining holddown, last trigger event time, last SPF
time, and next SPF time should be available.

11. Implementation Status

Note to RFC Editor: Please remove this section and the associated
references prior to publication.

This section records the status of known implementations of the
protocol defined by this specification at the time of posting of this
Internet-Draft and is based on a proposal described in [RFC7942].
The description of implementations in this section is intended to
assist the IETF in its decision processes in progressing drafts to
RFCs. Please note that the listing of any individual implementation
here does not imply endorsement by the IETF.
Furthermore, no effort has been spent to verify the information
presented here that was supplied by IETF contributors. This is not
intended as, and must not be construed to be, a catalog of available
implementations or their features. Readers are advised to note that
other implementations may exist.

According to RFC 7942, "this will allow reviewers and working groups
to assign due consideration to documents that have the benefit of
running code, which may serve as evidence of valuable experimentation
and feedback that have made the implemented protocols more mature.
It is up to the individual working groups to use this information as
they see fit".

The BGP-LS-SPF implementation status is documented in
[I-D.psarkar-lsvr-bgp-spf-impl].

12. Acknowledgements

The authors would like to thank Sue Hares, Jorge Rabadan, Boris
Hassanov, Dan Frost, Matt Anderson, Fred Baker, and Lukas Krattiger
for their review and comments. Thanks to Pushpasis Sarkar for
discussions on preventing a BGP SPF Router from being used for
non-local traffic (i.e., transit traffic).

The authors extend special thanks to Eric Rosen for fruitful
discussions on BGP-LS-SPF convergence as compared to IGPs.

13. Contributors

In addition to the authors listed on the front page, the following
co-authors have contributed to the document.

   Derek Yeung
   Arrcus, Inc.
   derek@arrcus.com

   Gunter Van De Velde
   Nokia
   gunter.van_de_velde@nokia.com

   Abhay Roy
   Arrcus, Inc.
   abhay@arrcus.com

   Venu Venugopal
   Cisco Systems
   venuv@cisco.com

   Chaitanya Yadlapalli
   AT&T
   cy098d@att.com

14. References

14.1. Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119,
           DOI 10.17487/RFC2119, March 1997.

[RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
           Border Gateway Protocol 4 (BGP-4)", RFC 4271,
           DOI 10.17487/RFC4271, January 2006.

[RFC4272]  Murphy, S., "BGP Security Vulnerabilities Analysis",
           RFC 4272, DOI 10.17487/RFC4272, January 2006.

[RFC4593]  Barbir, A., Murphy, S., and Y. Yang, "Generic Threats to
           Routing Protocols", RFC 4593, DOI 10.17487/RFC4593,
           October 2006.

[RFC4750]  Joyal, D., Ed., Galecki, P., Ed., Giacalone, S., Ed.,
           Coltun, R., and F. Baker, "OSPF Version 2 Management
           Information Base", RFC 4750, DOI 10.17487/RFC4750,
           December 2006.

[RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
           "Multiprotocol Extensions for BGP-4", RFC 4760,
           DOI 10.17487/RFC4760, January 2007.

[RFC5492]  Scudder, J. and R. Chandra, "Capabilities Advertisement
           with BGP-4", RFC 5492, DOI 10.17487/RFC5492, February
           2009.

[RFC5925]  Touch, J., Mankin, A., and R. Bonica, "The TCP
           Authentication Option", RFC 5925, DOI 10.17487/RFC5925,
           June 2010.

[RFC6793]  Vohra, Q. and E. Chen, "BGP Support for Four-Octet
           Autonomous System (AS) Number Space", RFC 6793,
           DOI 10.17487/RFC6793, December 2012.

[RFC6811]  Mohapatra, P., Scudder, J., Ward, D., Bush, R., and R.
           Austein, "BGP Prefix Origin Validation", RFC 6811,
           DOI 10.17487/RFC6811, January 2013.

[RFC7606]  Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K.
           Patel, "Revised Error Handling for BGP UPDATE Messages",
           RFC 7606, DOI 10.17487/RFC7606, August 2015.

[RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
           S. Ray, "North-Bound Distribution of Link-State and
           Traffic Engineering (TE) Information Using BGP", RFC 7752,
           DOI 10.17487/RFC7752, March 2016.

[RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
           2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
           May 2017.

[RFC8205]  Lepinski, M., Ed. and K. Sriram, Ed., "BGPsec Protocol
           Specification", RFC 8205, DOI 10.17487/RFC8205, September
           2017.

[RFC8405]  Decraene, B., Litkowski, S., Gredler, H., Lindem, A.,
           Francois, P., and C. Bowers, "Shortest Path First (SPF)
           Back-Off Delay Algorithm for Link-State IGPs", RFC 8405,
           DOI 10.17487/RFC8405, June 2018.

[RFC8654]  Bush, R., Patel, K., and D. Ward, "Extended Message
           Support for BGP", RFC 8654, DOI 10.17487/RFC8654, October
           2019.

[RFC8665]  Psenak, P., Ed., Previdi, S., Ed., Filsfils, C., Gredler,
           H., Shakir, R., Henderickx, W., and J. Tantsura, "OSPF
           Extensions for Segment Routing", RFC 8665,
           DOI 10.17487/RFC8665, December 2019.

14.2. Informational References

[I-D.ietf-lsvr-applicability]
           Patel, K., Lindem, A., Zandi, S., and G. Dawra, "Usage and
           Applicability of Link State Vector Routing in Data
           Centers", draft-ietf-lsvr-applicability-05 (work in
           progress), March 2020.

[I-D.psarkar-lsvr-bgp-spf-impl]
           Sarkar, P., Patel, K., Pallagatti, S., and s.
           sajibasil@gmail.com, "BGP Shortest Path Routing Extension
           Implementation Report", draft-psarkar-lsvr-bgp-spf-impl-00
           (work in progress), June 2020.

[RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328,
           DOI 10.17487/RFC2328, April 1998.

[RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
           Reflection: An Alternative to Full Mesh Internal BGP
           (IBGP)", RFC 4456, DOI 10.17487/RFC4456, April 2006.

[RFC4724]  Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y.
           Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724,
           DOI 10.17487/RFC4724, January 2007.

[RFC4915]  Psenak, P., Mirtorabi, S., Roy, A., Nguyen, L., and P.
           Pillay-Esnault, "Multi-Topology (MT) Routing in OSPF",
           RFC 4915, DOI 10.17487/RFC4915, June 2007.

[RFC5286]  Atlas, A., Ed. and A. Zinin, Ed., "Basic Specification for
           IP Fast Reroute: Loop-Free Alternates", RFC 5286,
           DOI 10.17487/RFC5286, September 2008.

[RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
           (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.

[RFC6952]  Jethanandani, M., Patel, K., and L. Zheng, "Analysis of
           BGP, LDP, PCEP, and MSDP Issues According to the Keying
           and Authentication for Routing Protocols (KARP) Design
           Guide", RFC 6952, DOI 10.17487/RFC6952, May 2013.

[RFC7911]  Walton, D., Retana, A., Chen, E., and J. Scudder,
           "Advertisement of Multiple Paths in BGP", RFC 7911,
           DOI 10.17487/RFC7911, July 2016.

[RFC7938]  Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of
           BGP for Routing in Large-Scale Data Centers", RFC 7938,
           DOI 10.17487/RFC7938, August 2016.

[RFC7942]  Sheffer, Y. and A. Farrel, "Improving Awareness of Running
           Code: The Implementation Status Section", BCP 205,
           RFC 7942, DOI 10.17487/RFC7942, July 2016.

Authors' Addresses

   Keyur Patel
   Arrcus, Inc.

   Email: keyur@arrcus.com

   Acee Lindem
   Cisco Systems
   301 Midenhall Way
   Cary, NC 27513
   USA

   Email: acee@cisco.com

   Shawn Zandi
   LinkedIn
   222 2nd Street
   San Francisco, CA 94105
   USA

   Email: szandi@linkedin.com

   Wim Henderickx
   Nokia
   Antwerp
   Belgium

   Email: wim.henderickx@nokia.com