idnits 2.17.1 draft-ietf-lsvr-bgp-spf-16.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'SHOULD not' in this paragraph: The protocol identifier specified in the Protocol-ID field [RFC7752] will represent the origin of the advertised NLRI. For Node NLRI and Link NLRI, this MUST be the direct protocol (4). Node or Link NLRI with a Protocol-ID other than direct will be considered malformed. For Prefix NLRI, the specified Protocol-ID MUST be the origin of the prefix. The local and remote node descriptors for all NLRI MUST include the BGP Identifier (TLV 516) and the AS Number (TLV 512) [RFC7752]. The BGP Confederation Member (TLV 517) [RFC7752] is not appliable and SHOULD not be included. If TLV 517 is included, it will be ignored. -- The document date (15 February 2022) is 795 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) ** Downref: Normative reference to an Informational RFC: RFC 4272 ** Downref: Normative reference to an Informational RFC: RFC 4593 ** Obsolete normative reference: RFC 7752 (Obsoleted by RFC 9552) == Outdated reference: A later version (-11) exists of draft-ietf-lsvr-applicability-05 == Outdated reference: A later version (-01) exists of draft-psarkar-lsvr-bgp-spf-impl-00 Summary: 3 errors (**), 0 flaws (~~), 4 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group K. Patel 3 Internet-Draft Arrcus, Inc. 4 Intended status: Standards Track A. Lindem 5 Expires: 19 August 2022 Cisco Systems 6 S. Zandi 7 LinkedIn 8 W. Henderickx 9 Nokia 10 15 February 2022 12 BGP Link-State Shortest Path First (SPF) Routing 13 draft-ietf-lsvr-bgp-spf-16 15 Abstract 17 Many Massively Scaled Data Centers (MSDCs) have converged on 18 simplified layer 3 routing. Furthermore, requirements for 19 operational simplicity have led many of these MSDCs to converge on 20 BGP as their single routing protocol for both their fabric routing 21 and their Data Center Interconnect (DCI) routing. This document 22 describes extensions to BGP to use BGP Link-State distribution and 23 the Shortest Path First (SPF) algorithm used by Internal Gateway 24 Protocols (IGPs) such as OSPF. In doing this, it allows BGP to be 25 efficiently used as both the underlay protocol and the overlay 26 protocol in MSDCs. 28 Status of This Memo 30 This Internet-Draft is submitted in full conformance with the 31 provisions of BCP 78 and BCP 79. 
33 Internet-Drafts are working documents of the Internet Engineering 34 Task Force (IETF). Note that other groups may also distribute 35 working documents as Internet-Drafts. The list of current Internet- 36 Drafts is at https://datatracker.ietf.org/drafts/current/. 38 Internet-Drafts are draft documents valid for a maximum of six months 39 and may be updated, replaced, or obsoleted by other documents at any 40 time. It is inappropriate to use Internet-Drafts as reference 41 material or to cite them other than as "work in progress." 43 This Internet-Draft will expire on 19 August 2022. 45 Copyright Notice 47 Copyright (c) 2022 IETF Trust and the persons identified as the 48 document authors. All rights reserved. 50 This document is subject to BCP 78 and the IETF Trust's Legal 51 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 52 license-info) in effect on the date of publication of this document. 53 Please review these documents carefully, as they describe your rights 54 and restrictions with respect to this document. Code Components 55 extracted from this document must include Revised BSD License text as 56 described in Section 4.e of the Trust Legal Provisions and are 57 provided without warranty as described in the Revised BSD License. 59 Table of Contents 61 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 62 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 63 1.2. BGP Shortest Path First (SPF) Motivation . . . . . . . . 4 64 1.3. Document Overview . . . . . . . . . . . . . . . . . . . . 6 65 1.4. Requirements Language . . . . . . . . . . . . . . . . . . 6 66 2. Base BGP Protocol Relationship . . . . . . . . . . . . . . . 6 67 3. BGP Link-State (BGP-LS) Relationship . . . . . . . . . . . . 7 68 4. BGP Peering Models . . . . . . . . . . . . . . . . . . . . . 8 69 4.1. BGP Single-Hop Peering on Network Node Connections . . . 8 70 4.2. BGP Peering Between Directly-Connected Nodes . . . . . . 8 71 4.3. 
BGP Peering in Route-Reflector or Controller Topology . . 9 72 5. BGP Shortest Path Routing (SPF) Protocol Extensions . . . . . 9 73 5.1. BGP-LS Shortest Path Routing (SPF) SAFI . . . . . . . . . 9 74 5.1.1. BGP-LS-SPF NLRI TLVs . . . . . . . . . . . . . . . . 9 75 5.1.2. BGP-LS Attribute . . . . . . . . . . . . . . . . . . 10 76 5.2. Extensions to BGP-LS . . . . . . . . . . . . . . . . . . 11 77 5.2.1. Node NLRI Usage . . . . . . . . . . . . . . . . . . . 11 78 5.2.1.1. BGP-LS-SPF Node NLRI Attribute SPF Capability 79 TLV . . . . . . . . . . . . . . . . . . . . . . . . 11 80 5.2.1.2. BGP-LS-SPF Node NLRI Attribute SPF Status TLV . . 12 81 5.2.2. Link NLRI Usage . . . . . . . . . . . . . . . . . . . 13 82 5.2.2.1. BGP-LS-SPF Link NLRI Attribute Prefix-Length 83 TLVs . . . . . . . . . . . . . . . . . . . . . . . 14 84 5.2.2.2. BGP-LS-SPF Link NLRI Attribute SPF Status TLV . . 15 85 5.2.3. IPv4/IPv6 Prefix NLRI Usage . . . . . . . . . . . . . 16 86 5.2.3.1. BGP-LS-SPF Prefix NLRI Attribute SPF Status 87 TLV . . . . . . . . . . . . . . . . . . . . . . . . 16 88 5.2.4. BGP-LS Attribute Sequence-Number TLV . . . . . . . . 17 89 5.3. NEXT_HOP Manipulation . . . . . . . . . . . . . . . . . . 18 90 6. Decision Process with SPF Algorithm . . . . . . . . . . . . . 18 91 6.1. BGP NLRI Selection . . . . . . . . . . . . . . . . . . . 19 92 6.1.1. BGP Self-Originated NLRI . . . . . . . . . . . . . . 20 93 6.2. Dual Stack Support . . . . . . . . . . . . . . . . . . . 21 94 6.3. SPF Calculation based on BGP-LS-SPF NLRI . . . . . . . . 21 95 6.4. IPv4/IPv6 Unicast Address Family Interaction . . . . . . 26 96 6.5. NLRI Advertisement . . . . . . . . . . . . . . . . . . . 26 97 6.5.1. Link/Prefix Failure Convergence . . . . . . . . . . . 26 98 6.5.2. Node Failure Convergence . . . . . . . . . . . . . . 27 99 7. Error Handling . . . . . . . . . . . . . . . . . . . . . . . 27 100 7.1. Processing of BGP-LS-SPF TLVs . . . . . . . . . . . . . . 27 101 7.2. Processing of BGP-LS-SPF NLRIs . 
. . . . . . . . . . . . 28 102 7.3. Processing of BGP-LS Attribute . . . . . . . . . . . . . 29 103 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 30 104 9. Security Considerations . . . . . . . . . . . . . . . . . . . 31 105 10. Management Considerations . . . . . . . . . . . . . . . . . . 32 106 10.1. Configuration . . . . . . . . . . . . . . . . . . . . . 32 107 10.1.1. Link Metric Configuration . . . . . . . . . . . . . 32 108 10.1.2. backoff-config . . . . . . . . . . . . . . . . . . . 32 109 10.2. Operational Data . . . . . . . . . . . . . . . . . . . . 33 110 11. Implementation Status . . . . . . . . . . . . . . . . . . . . 33 111 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 34 112 13. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 34 113 14. References . . . . . . . . . . . . . . . . . . . . . . . . . 34 114 14.1. Normative References . . . . . . . . . . . . . . . . . . 34 115 14.2. Informational References . . . . . . . . . . . . . . . . 36 116 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 38 118 1. Introduction 120 Many Massively Scaled Data Centers (MSDCs) have converged on 121 simplified layer 3 routing. Furthermore, requirements for 122 operational simplicity have led many of these MSDCs to converge on 123 BGP [RFC4271] as their single routing protocol for both their fabric 124 routing and their Data Center Interconnect (DCI) routing [RFC7938]. 125 This document describes an alternative solution which leverages BGP- 126 LS [RFC7752] and the Shortest Path First algorithm used by Internal 127 Gateway Protocols (IGPs) such as OSPF [RFC2328]. 129 This document leverages both the BGP protocol [RFC4271] and the BGP- 130 LS [RFC7752] protocols. The relationship, as well as the scope of 131 changes are described respectively in Section 2 and Section 3. 
The 132 modifications to [RFC4271] for BGP SPF described herein only apply to 133 IPv4 and IPv6 as underlay unicast Subsequent Address Families 134 Identifiers (SAFIs). Operations for any other BGP SAFIs are outside 135 the scope of this document. 137 This solution avails the benefits of both BGP and SPF-based IGPs. 138 These include TCP based flow-control, no periodic link-state refresh, 139 and completely incremental NLRI advertisement. These advantages can 140 reduce the overhead in MSDCs where there is a high degree of Equal 141 Cost Multi-Path (ECMPs) and the topology is very stable. 142 Additionally, using an SPF-based computation can support fast 143 convergence and the computation of Loop-Free Alternatives (LFAs). 144 The SPF LFA extensions defined in [RFC5286] can be similarly applied 145 to BGP SPF calculations. However, the details are a matter of 146 implementation detail. Furthermore, a BGP-based solution lends 147 itself to multiple peering models including those incorporating 148 route-reflectors [RFC4456] or controllers. 150 1.1. Terminology 152 This specification reuses terms defined in section 1.1 of [RFC4271] 153 including BGP speaker, NLRI, and Route. 155 Additionally, this document introduces the following terms: 157 BGP SPF Routing Domain: A set of BGP routers that are under a single 158 administrative domain and exchange link-state information using 159 the BGP-LS-SPF SAFI and compute routes using BGP SPF as described 160 herein. 162 BGP-LS-SPF NLRI: This refers to BGP-LS Network Layer Reachability 163 Information (NLRI) that is being advertised in the BGP-LS-SPF SAFI 164 (Section 5.1) and is being used for BGP SPF route computation. 166 Dijkstra Algorithm: An algorithm for computing the shortest path 167 from a given node in a graph to every other node in the graph. At 168 each iteration of the algorithm, there is a list of candidate 169 vertices. 
Paths from the root to these vertices have been found, 170 but not necessarily the shortest ones. However, the paths to the 171 candidate vertex that is closest to the root are guaranteed to be 172 shortest; this vertex is added to the shortest-path tree, removed 173 from the candidate list, and its adjacent vertices are examined 174 for possible addition to/modification of the candidate list. The 175 algorithm then iterates again. It terminates when the candidate 176 list becomes empty. [RFC2328] 178 1.2. BGP Shortest Path First (SPF) Motivation 180 Given that [RFC7938] already describes how BGP could be used as the 181 sole routing protocol in an MSDC, one might question the motivation 182 for defining an alternate BGP deployment model when a mature solution 183 exists. For both alternatives, BGP offers the operational benefits 184 of a single routing protocol as opposed to the combination of an IGP 185 for the underlay and BGP as an overlay. However, BGP SPF offers some 186 unique advantages above and beyond standard BGP distance-vector 187 routing. With BGP SPF, the standard hop-by-hop peering model is 188 relaxed. 190 A primary advantage is that all BGP SPF speakers in the BGP SPF 191 routing domain will have a complete view of the topology. This will 192 allow support for ECMP, IP fast-reroute (e.g., Loop-Free 193 Alternatives), Shared Risk Link Groups (SRLGs), and other routing 194 enhancements without advertisement of additional BGP paths [RFC7911] 195 or other extensions. In short, the advantages of an IGP such as OSPF 196 [RFC2328] are availed in BGP. 198 With the simplified BGP decision process as defined in Section 6, 199 NLRI changes can be disseminated throughout the BGP routing domain 200 much more rapidly (equivalent to IGPs with the proper 201 implementation). The added advantage of BGP using TCP for reliable 202 transport leverages TCP's inherent flow-control and guaranteed in- 203 order delivery. 
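For reference, the Dijkstra algorithm defined in Section 1.1 can be sketched in a few lines. This is only an illustration, not the SPF procedure of Section 6.3: the graph representation (a dictionary mapping each node to its neighbors and link costs) and the function name shortest_paths are assumptions made for this example.

```python
import heapq


def shortest_paths(graph, root):
    """Return the lowest path cost from root to every reachable node.

    graph maps node -> {neighbor: link_cost}. This layout is an
    assumption for the sketch, not something the draft defines.
    """
    dist = {root: 0}
    candidates = [(0, root)]   # candidate list, ordered by cost from the root
    on_tree = set()            # vertices already on the shortest-path tree

    while candidates:          # terminates when the candidate list is empty
        cost, node = heapq.heappop(candidates)  # candidate closest to the root
        if node in on_tree:
            continue           # stale entry superseded by a cheaper path
        on_tree.add(node)
        # Examine adjacent vertices for addition to, or modification of,
        # the candidate list.
        for neighbor, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            if neighbor not in on_tree and new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(candidates, (new_cost, neighbor))
    return dist
```

Nodes that cannot be reached from the root simply do not appear in the result.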
205 Another primary advantage is a potential reduction in NLRI 206 advertisement. With standard BGP distance-vector routing, a single 207 link failure may impact 100s or 1000s prefixes and result in the 208 withdrawal or re-advertisement of the attendant NLRI. With BGP SPF, 209 only the BGP SPF speakers corresponding to the link NLRI need to 210 withdraw the corresponding BGP-LS-SPF Link NLRI. Additionally, the 211 changed NLRI will be advertised immediately as opposed to normal BGP 212 where it is only advertised after the best route selection. These 213 advantages will afford NLRI dissemination throughout the BGP SPF 214 routing domain with efficiencies similar to link-state protocols. 216 With controller and route-reflector peering models, BGP SPF 217 advertisement and distributed computation require a minimal number of 218 sessions and copies of the NLRI since only the latest version of the 219 NLRI from the originator is required. Given that verification of the 220 adjacencies is done outside of BGP (see Section 4), each BGP SPF 221 speaker will only need as many sessions and copies of the NLRI as 222 required for redundancy (see Section 4). Additionally, a controller 223 could inject topology that is learned outside the BGP SPF routing 224 domain. 226 Given that controllers are already consuming BGP-LS NLRI [RFC7752], 227 this functionality can be reused for BGP-LS-SPF NLRI. 229 Another advantage of BGP SPF is that both IPv6 and IPv4 can be 230 supported using the BGP-LS-SPF SAFI with the same BGP-LS-SPF NLRIs. 231 In many MSDC fabrics, the IPv4 and IPv6 topologies are congruent, 232 refer to Section 5.2.2 and Section 5.2.3. Although beyond the scope 233 of this document, multi-topology extensions could be used to support 234 separate IPv4, IPv6, unicast, and multicast topologies while sharing 235 the same NLRI. 
237 Finally, the BGP SPF topology can be used as an underlay for other
238 BGP SAFIs (using the existing model) and realize all the above
239 advantages.

241 1.3. Document Overview

243 The document begins with sections defining the precise relationship
244 that BGP SPF has with both the base BGP protocol [RFC4271]
245 (Section 2) and the BGP Link-State (BGP-LS) extensions [RFC7752]
246 (Section 3). This is required to dispel the notion that BGP SPF is
247 an independent protocol. The BGP peering models, as well as
248 their respective trade-offs, are then discussed in Section 4. The
249 remaining sections, which make up the bulk of the document, define
250 the protocol enhancements necessary to support BGP SPF. The BGP-LS
251 extensions to support BGP SPF are defined in Section 5. The
252 replacement of the base BGP decision process with the SPF computation
253 is specified in Section 6. Finally, BGP SPF error handling is
254 defined in Section 7.

256 1.4. Requirements Language

258 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
259 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
260 "OPTIONAL" in this document are to be interpreted as described in BCP
261 14 [RFC2119] [RFC8174] when, and only when, they appear in all
262 capitals, as shown here.

264 2. Base BGP Protocol Relationship

266 With the exception of the decision process, the BGP SPF extensions
267 leverage the BGP protocol [RFC4271] without change. This includes
268 the BGP protocol Finite State Machine, BGP messages and their
269 encodings, processing of BGP messages, BGP attributes and path
270 attributes, BGP NLRI encodings, and any error handling defined in
271 [RFC4271] and [RFC7606].

273 Due to the changes to the decision process, there are mechanisms and
274 encodings that are no longer applicable.
While not necessarily
275 required for computation, the ORIGIN, AS_PATH, MULTI_EXIT_DISC,
276 LOCAL_PREF, and NEXT_HOP path attributes are mandatory and will be
277 validated. The ATOMIC_AGGREGATE and AGGREGATOR attributes are not
278 applicable within the context of BGP SPF and SHOULD NOT be advertised.
279 However, if they are advertised, they will be accepted, validated, and
280 propagated consistent with the BGP protocol.

282 Section 9 of [RFC4271] defines the decision process that is used to
283 select routes for subsequent advertisement by applying the policies
284 in the local Policy Information Base (PIB) to the routes stored in
285 its Adj-RIBs-In. The output of the Decision Process is the set of
286 routes that are announced by a BGP speaker to its peers. These
287 selected routes are stored by a BGP speaker in the speaker's
288 Adj-RIBs-Out according to policy.

290 The BGP SPF extension fundamentally changes the decision process, as
291 described herein, to be more like a link-state protocol (e.g., OSPF
292 [RFC2328]). Specifically:

294 1. BGP advertisements are readvertised to neighbors immediately
295 without waiting for, or depending on, the route computation as
296 specified in phase 3 of the base BGP decision process. Multiple
297 peering models are supported as specified in Section 4.

299 2. Determining the degree of preference for BGP routes for the SPF
300 calculation as described in phase 1 of the base BGP decision
301 process is replaced with the mechanisms in Section 6.1.

303 3. Phase 2 of the base BGP protocol decision process is replaced
304 with the Shortest Path First (SPF) algorithm, also known as the
305 Dijkstra algorithm (Section 1.1).

307 3. BGP Link-State (BGP-LS) Relationship

309 [RFC7752] describes a mechanism by which link-state and TE
310 information can be collected from networks and shared with external
311 entities using BGP. This is achieved by defining NLRI advertised
312 using the BGP-LS AFI.
The BGP-LS extensions defined in [RFC7752]
313 make use of the decision process defined in [RFC4271]. This document
314 reuses NLRI and TLVs defined in [RFC7752]. Rather than reusing the
315 BGP-LS SAFI, the BGP-LS-SPF SAFI (Section 5.1) is introduced to ensure
316 backward compatibility for the BGP-LS SAFI usage.

318 The BGP SPF extensions reuse the Node, Link, and Prefix NLRI defined
319 in [RFC7752]. The usage of the BGP-LS NLRI, attributes, and
320 attribute extensions is described in Section 5.2. The usage of
321 other BGP-LS attributes is not precluded and is, in fact, expected.
322 However, the details are beyond the scope of this document and will
323 be specified in future documents.

325 Support for Multiple Topology Routing (MTR) similar to the OSPF MTR
326 computation described in [RFC4915] is beyond the scope of this
327 document. Consequently, the usage of the Multi-Topology TLV as
328 described in section 3.2.1.5 of [RFC7752] is not specified.

330 The rules for setting the NLRI next-hop path attribute for the
331 BGP-LS-SPF SAFI will follow the BGP-LS SAFI as specified in section 3.4
332 of [RFC7752].

334 4. BGP Peering Models

336 Depending on the topology, scaling, capabilities of the BGP SPF
337 speakers, and redundancy requirements, various peering models are
338 supported. The only requirements are that all BGP SPF speakers in
339 the BGP SPF routing domain exchange BGP-LS-SPF NLRI, run an SPF
340 calculation, and update their routing table appropriately.

342 4.1. BGP Single-Hop Peering on Network Node Connections

344 The simplest peering model is the one where EBGP single-hop sessions
345 are established over direct point-to-point links interconnecting the
346 nodes in the BGP SPF routing domain.
Once the single-hop BGP session 347 has been established and the BGP-LS-SPF AFI/SAFI capability has been 348 exchanged [RFC4760] for the corresponding session, then the link is 349 considered up from a BGP SPF perspective and the corresponding BGP- 350 LS-SPF Link NLRI is advertised. If the session goes down, the 351 corresponding Link NLRI will be withdrawn. Topologically, this would 352 be equivalent to the peering model in [RFC7938] where there is a BGP 353 session on every link in the data center switch fabric. The content 354 of the Link NLRI is described in Section 5.2.2. 356 4.2. BGP Peering Between Directly-Connected Nodes 358 In this model, BGP SPF speakers peer with all directly-connected 359 nodes but the sessions may be between loopback addresses (i.e., two- 360 hop sessions) and the direct connection discovery and liveliness 361 detection for the interconnecting links are independent of the BGP 362 protocol. For example, liveliness detection could be done using the 363 BFD protocol [RFC5880]. Precisely how discovery and liveliness 364 detection is accomplished is outside the scope of this document. 365 Consequently, there will be a single BGP session even if there are 366 multiple direct connections between BGP SPF speakers. BGP-LS-SPF 367 Link NLRI is advertised as long as a BGP session has been 368 established, the BGP-LS-SPF AFI/SAFI capability has been exchanged 369 [RFC4760], and the link is operational as determined using liveliness 370 detection mechanisms outside the scope of this document. This is 371 much like the previous peering model only peering is between loopback 372 addresses and the interconnecting links can be unnumbered. However, 373 since there are BGP sessions between every directly-connected node in 374 the BGP SPF routing domain, there is only a reduction in BGP sessions 375 when there are parallel links between nodes. 377 4.3. 
BGP Peering in Route-Reflector or Controller Topology 379 In this model, BGP SPF speakers peer solely with one or more Route 380 Reflectors [RFC4456] or controllers. As in the previous model, 381 direct connection discovery and liveliness detection for those links 382 in the BGP SPF routing domain are done outside of the BGP protocol. 383 BGP-LS-SPF Link NLRI is advertised as long as the corresponding link 384 is considered up as per the chosen liveness detection mechanism. 386 This peering model, known as sparse peering, allows for fewer BGP 387 sessions and, consequently, fewer instances of the same NLRI received 388 from multiple peers. Normally, the route-reflectors or controller 389 BGP sessions would be on directly-connected links to avoid dependence 390 on another routing protocol for session connectivity. However, 391 multi-hop peering is not precluded. The number of BGP sessions is 392 dependent on the redundancy requirements and the stability of the BGP 393 sessions. This is discussed in greater detail in 394 [I-D.ietf-lsvr-applicability]. 396 5. BGP Shortest Path Routing (SPF) Protocol Extensions 398 5.1. BGP-LS Shortest Path Routing (SPF) SAFI 400 In order to replace the existing BGP decision process with an SPF- 401 based decision process in a backward compatible manner by not 402 impacting the BGP-LS SAFI, this document introduces the BGP-LS-SPF 403 SAFI. The BGP-LS-SPF (AFI 16388 / SAFI 80) [RFC4760] is allocated by 404 IANA as specified in the Section 8. In order for two BGP SPF 405 speakers to exchange BGP SPF NLRI, they MUST exchange the 406 Multiprotocol Extensions Capability [RFC5492] [RFC4760] to ensure 407 that they are both capable of properly processing such NLRI. This is 408 done with AFI 16388 / SAFI 80 for BGP-LS-SPF advertised within the 409 BGP SPF Routing Domain. The BGP-LS-SPF SAFI is used to carry IPv4 410 and IPv6 prefix information in a format facilitating an SPF-based 411 decision process. 413 5.1.1. 
BGP-LS-SPF NLRI TLVs

415 The NLRI format of the BGP-LS-SPF SAFI is exactly the same as that of
416 the BGP-LS AFI [RFC7752]. In other words, all the TLVs used in the
417 BGP-LS AFI are applicable and used for the BGP-LS-SPF SAFI. These TLVs
418 within BGP-LS-SPF NLRI advertise information that describes links,
419 nodes, and prefixes comprising IGP link-state information.

421 In order to compare the NLRI efficiently, it is REQUIRED that all the
422 TLVs within the given NLRI be ordered in ascending order by the
423 TLV type. For multiple TLVs of the same type within a single NLRI, it
424 is REQUIRED that these TLVs be ordered in ascending order by the TLV
425 value field. Comparison of the value fields is performed by treating
426 the entire value field as a hexadecimal string. NLRIs having TLVs
427 that do not follow the ordering rules MUST be considered
428 malformed and discarded with appropriate error logging.

430 [RFC7752] defines certain NLRI TLVs as mandatory TLVs. These TLVs
431 are considered mandatory for the BGP-LS-SPF SAFI as well. All
432 other TLVs are considered optional TLVs.

434 Given that there is a single BGP-LS Attribute for all the BGP-LS-SPF
435 NLRI in a BGP Update (Section 3.3 of [RFC7752]), a BGP Update will
436 normally contain a single BGP-LS-SPF NLRI since advertising multiple
437 NLRI would imply identical attributes.

439 5.1.2. BGP-LS Attribute

441 The BGP-LS attribute of the BGP-LS-SPF SAFI uses exactly the same
442 format as that of the BGP-LS AFI [RFC7752]. In other words, all the
443 TLVs used in the BGP-LS attribute of the BGP-LS AFI are applicable and
444 used for the BGP-LS attribute of the BGP-LS-SPF SAFI. This attribute
445 is an optional, non-transitive BGP attribute that is used to carry link,
446 node, and prefix properties and attributes. The BGP-LS attribute is
447 a set of TLVs.
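The TLV ordering rule of Section 5.1.1 (ascending by TLV type, then ascending by value for TLVs of the same type) can be illustrated with a minimal sketch. It assumes the 2-octet type and 2-octet length TLV header used by BGP-LS [RFC7752]; the helper names parse_tlvs and tlvs_are_ordered are hypothetical, and accepting two identical TLVs as "ordered" is an assumption of this example rather than something the text states.

```python
import struct


def parse_tlvs(data):
    """Split a byte string into (type, value) pairs.

    Assumes a 2-octet type followed by a 2-octet length, in network
    byte order, ahead of each value.
    """
    tlvs, offset = [], 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!HH", data, offset)
        offset += 4
        if offset + length > len(data):
            raise ValueError("truncated TLV value")
        tlvs.append((tlv_type, data[offset:offset + length]))
        offset += length
    return tlvs


def tlvs_are_ordered(tlvs):
    """Check ascending order by type, then by value for equal types.

    Values are compared as hexadecimal strings, mirroring the wording
    of the ordering rule.
    """
    keys = [(tlv_type, value.hex()) for tlv_type, value in tlvs]
    return all(a <= b for a, b in zip(keys, keys[1:]))
```

A receiver could run such a check on each NLRI before comparing it with stored state; an NLRI that fails the check would then be treated as malformed, as the rule requires.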
449 The BGP-LS attribute may potentially grow large in size depending on 450 the amount of link-state information associated with a single Link- 451 State NLRI. The BGP specification [RFC4271] mandates a maximum BGP 452 message size of 4096 octets. It is RECOMMENDED that an 453 implementation support [RFC8654] in order to accommodate larger size 454 of information within the BGP-LS Attribute. BGP SPF speakers MUST 455 ensure that they limit the TLVs included in the BGP-LS Attribute to 456 ensure that a BGP update message for a single Link-State NLRI does 457 not cross the maximum limit for a BGP message. The determination of 458 the types of TLVs to be included by the BGP SPF speaker originating 459 the attribute is outside the scope of this document. When a BGP SPF 460 speaker finds that it is exceeding the maximum BGP message size due 461 to addition or update of some other BGP Attribute (e.g., AS_PATH), it 462 MUST consider the BGP-LS Attribute to be malformed and the attribute 463 discard handling of [RFC7606] applies. 465 In order to compare the BGP-LS attribute efficiently, it is REQUIRED 466 that all the TLVs within the given attribute must be ordered in 467 ascending order by the TLV type. For multiple TLVs of same type 468 within a single attribute, it is REQUIRED that these TLVs are ordered 469 in ascending order by the TLV value field. Comparison of the value 470 fields is performed by treating the entire value field as a 471 hexadecimal string. Attributes having TLVs which do not follow the 472 ordering rules MUST NOT be considered as malformed. 474 All TLVs within the BGP-LS Attribute are considered optional unless 475 specified otherwise. 477 5.2. Extensions to BGP-LS 479 [RFC7752] describes a mechanism by which link-state and TE 480 information can be collected from IGPs and shared with external 481 components using the BGP protocol. 
It describes both the definition
482 of the BGP-LS NLRI that advertise links, nodes, and prefixes
483 comprising IGP link-state information and the definition of a BGP
484 path attribute (BGP-LS attribute) that carries link, node, and prefix
485 properties and attributes, such as the link and prefix metric or
486 auxiliary Router-IDs of nodes. This document extends the usage
487 of BGP-LS NLRI for the purpose of BGP SPF calculation via
488 advertisement in the BGP-LS-SPF SAFI.

490 The protocol identifier specified in the Protocol-ID field [RFC7752]
491 will represent the origin of the advertised NLRI. For Node NLRI and
492 Link NLRI, this MUST be the direct protocol (4). Node or Link NLRI
493 with a Protocol-ID other than direct will be considered malformed.
494 For Prefix NLRI, the specified Protocol-ID MUST be the origin of the
495 prefix. The local and remote node descriptors for all NLRI MUST
496 include the BGP Identifier (TLV 516) and the AS Number (TLV 512)
497 [RFC7752]. The BGP Confederation Member (TLV 517) [RFC7752] is not
498 applicable and SHOULD NOT be included. If TLV 517 is included, it
499 will be ignored.

501 5.2.1. Node NLRI Usage

503 The Node NLRI MUST be advertised unconditionally by all routers in
504 the BGP SPF routing domain.

506 5.2.1.1. BGP-LS-SPF Node NLRI Attribute SPF Capability TLV

508 The SPF capability is an additional Node Attribute TLV. This
509 attribute TLV MUST be included with the BGP-LS-SPF SAFI and SHOULD
510 NOT be used for other SAFIs. The TLV type 1180 will be assigned by
511 IANA. The Node Attribute TLV will contain a single-octet SPF
512 algorithm as defined in [RFC8665].
514  0                   1                   2                   3
515  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
516 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
517 |          Type (1180)          |       Length (1 Octet)        |
518 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
519 | SPF Algorithm |
520 +-+-+-+-+-+-+-+-+

522 The SPF algorithm inherits the values from the IGP Algorithm Types
523 registry [RFC8665]. Algorithm 0, the Shortest Path First (SPF)
524 algorithm based on link metric, is supported and described in
525 Section 6.3. Support for other algorithm types is beyond the scope of
526 this specification.

528 When computing the SPF for a given BGP routing domain, only BGP nodes
529 advertising the SPF capability TLV with the same SPF algorithm will be
530 included in the Shortest Path Tree (SPT) (Section 6.3). An
531 implementation MAY log detection of a BGP node that either
532 has not advertised the SPF capability TLV or is advertising the
533 SPF capability TLV with an algorithm type other than 0.

535 5.2.1.2. BGP-LS-SPF Node NLRI Attribute SPF Status TLV

537 A BGP-LS Attribute TLV of the BGP-LS-SPF Node NLRI is defined to
538 indicate the status of the node with respect to the BGP SPF
539 calculation. This will be used to rapidly take a node out of service
540 (Section 6.5.2) or to indicate the node is not to be used for transit
541 (i.e., non-local) traffic (Section 6.3). If the SPF Status TLV is not
542 included with the Node NLRI, the node is considered to be up and is
543 available for transit traffic. The SPF status is acted upon with the
544 execution of the next SPF calculation (Section 6.3). A single TLV type
545 will be shared by the BGP-LS-SPF Node, Link, and Prefix NLRI. The
546 TLV type 1184 will be assigned by IANA.
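The SPF Capability TLV (type 1180) and the SPF Status TLV (type 1184) both carry a single-octet value. The following sketch shows one way such a TLV could be encoded, and how the rule that only nodes advertising the same SPF algorithm are included in the SPT might be checked. The 2-octet type and length header follows the BGP-LS TLV format [RFC7752]; the function names and the dictionary used to hold a node's attribute TLVs are assumptions made for this example.

```python
import struct

SPF_CAPABILITY_TLV = 1180  # Section 5.2.1.1
SPF_STATUS_TLV = 1184      # Section 5.2.1.2
SPF_ALGORITHM_SPF = 0      # algorithm 0: SPF based on link metric


def encode_one_octet_tlv(tlv_type, value):
    """Encode a TLV with a 2-octet type, 2-octet length (1), 1-octet value."""
    if not 0 <= value <= 255:
        raise ValueError("value does not fit in one octet")
    return struct.pack("!HHB", tlv_type, 1, value)


def eligible_for_spt(node_attr_tlvs, local_algorithm=SPF_ALGORITHM_SPF):
    """Return True if the node advertises the same SPF algorithm as ours.

    node_attr_tlvs is a hypothetical mapping of TLV type -> value bytes
    taken from the node's BGP-LS attribute.
    """
    value = node_attr_tlvs.get(SPF_CAPABILITY_TLV)
    return value is not None and len(value) == 1 and value[0] == local_algorithm
```

A node that omits the SPF Capability TLV, or advertises a different algorithm, is left out of the SPT by this check; an implementation may also log the condition.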
548  0                   1                   2                   3
549  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
550 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
551 |          Type (1184)          |       Length (1 Octet)        |
552 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
553 |  SPF Status   |
554 +-+-+-+-+-+-+-+-+

556 BGP Status Values: 0 - Reserved
557                    1 - Node Unreachable with respect to BGP SPF
558                    2 - Node does not support transit with respect
559                        to BGP SPF
560                    3-254 - Undefined
561                    255 - Reserved

563 The BGP-LS-SPF Node Attribute SPF Status TLV, Link Attribute SPF
564 Status TLV, and Prefix Attribute SPF Status TLV use the same TLV Type
565 (1184). This implies that a BGP Update cannot contain multiple NLRI
566 with differing status. If the BGP-LS-SPF Status TLV is advertised
567 and the advertised value is not defined for all NLRI included in the
568 BGP update, then the SPF Status TLV is ignored and not used in SPF
569 computation but is still announced to other BGP SPF speakers. An
570 implementation MAY log an error for further analysis.

572 If a BGP SPF speaker receives the Node NLRI but the SPF Status TLV is
573 not received, then any previously received information is considered
574 implicitly withdrawn and the update is propagated to other BGP SPF
575 speakers. A BGP Update containing an SPF
576 Status TLV in the BGP-LS attribute [RFC7752] with a value that is
577 outside the range of defined values SHOULD be processed and announced
578 to other BGP SPF speakers. However, a BGP SPF speaker MUST NOT use
579 the Status TLV in its SPF computation. An implementation MAY log
580 this condition for further analysis.

582 5.2.2. Link NLRI Usage

584 The criteria for advertisement of Link NLRI are discussed in
585 Section 4.

587 Link NLRI is advertised with unique local and remote node descriptors
588 dependent on the IP addressing. For IPv4 links, the link's local
589 IPv4 (TLV 259) and remote IPv4 (TLV 260) addresses will be used.
For 590 IPv6 links, the local IPv6 (TLV 261) and remote IPv6 (TLV 262) 591 addresses will be used. For unnumbered links, the link local/remote 592 identifiers (TLV 258) will be used. For links having both 593 IPv4 and IPv6 addresses, both sets of descriptors MAY be included in 594 the same Link NLRI. The link identifiers are described in table 5 of 595 [RFC7752]. 597 For a link to be used in the Shortest Path Tree (SPT) for a given address 598 family, i.e., IPv4 or IPv6, both routers connecting the link MUST 599 have an address in the same subnet for that address family. However, 600 an IPv4 or IPv6 prefix associated with the link MAY be installed 601 without the corresponding address on the other side of the link. 603 The link IGP metric attribute TLV (TLV 1095) MUST be advertised. If 604 a BGP SPF speaker receives a Link NLRI without an IGP metric 605 attribute TLV, then it SHOULD consider the received NLRI as 606 malformed and the receiving BGP SPF speaker MUST handle such 607 malformed NLRI as 'Treat-as-withdraw' [RFC7606]. The BGP SPF metric 608 length is 4 octets. Like OSPF [RFC2328], a cost is associated with 609 the output side of each router interface. This cost is configurable 610 by the system administrator. The lower the cost, the more likely the 611 interface is to be used to forward data traffic. One possible 612 default for the metric would be to give each interface a cost of 1, making 613 it effectively a hop count. Algorithms such as setting the metric 614 inversely to the link speed as supported in the OSPF MIB [RFC4750] 615 MAY be supported. However, this is beyond the scope of this 616 document. Refer to Section 10.1.1 for operational guidance. 618 The usage of other link attribute TLVs is beyond the scope of this 619 document. 621 5.2.2.1.
BGP-LS-SPF Link NLRI Attribute Prefix-Length TLVs 623 Two BGP-LS Attribute TLVs of the BGP-LS-SPF Link NLRI are defined to 624 advertise the prefix length associated with the IPv4 and IPv6 link 625 prefixes derived from the link descriptor addresses. The prefix 626 length is used for the optional installation of prefixes 627 corresponding to Link NLRI as defined in Section 6.3. 629 0 1 2 3 630 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 631 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 632 |IPv4 (1182) or IPv6 Type (1183)| Length (1 Octet) | 633 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 634 | Prefix-Length | 635 +-+-+-+-+-+-+-+-+ 637 Prefix-length - A one-octet length restricted to 1-32 for IPv4 638 Link NLRI endpoint prefixes and 1-128 for IPv6 639 Link NLRI endpoint prefixes. 641 The Prefix-Length TLV is only relevant to Link NLRIs. The Prefix-Length 642 TLVs MUST be discarded as an error and not passed to other BGP 643 peers as specified in [RFC7606] when received with any NLRIs other 644 than Link NLRIs. An implementation MAY log an error for further 645 analysis. 647 The maximum prefix-length for the IPv4 Prefix-Length TLV is 32 bits. A 648 prefix-length field indicating a larger value than 32 bits MUST be 649 discarded as an error and the received TLV is not passed to other BGP 650 peers as specified in [RFC7606]. The corresponding Link NLRI is 651 considered as malformed and MUST be handled as 'Treat-as-withdraw'. 652 An implementation MAY log an error for further analysis. 654 The maximum prefix-length for the IPv6 Prefix-Length TLV is 128 bits. A 655 prefix-length field indicating a larger value than 128 bits MUST be 656 discarded as an error and the received TLV is not passed to other BGP 657 peers as specified in [RFC7606]. The corresponding Link NLRI is 658 considered as malformed and MUST be handled as 'Treat-as-withdraw'. 659 An implementation MAY log an error for further analysis. 661 5.2.2.2.
BGP-LS-SPF Link NLRI Attribute SPF Status TLV 663 A BGP-LS Attribute TLV of the BGP-LS-SPF Link NLRI is defined to 664 indicate the status of the link with respect to the BGP SPF 665 calculation. This will be used to expedite convergence for link 666 failures as discussed in Section 6.5.1. If the SPF Status TLV is not 667 included with the Link NLRI, the link is considered up and available. 668 The SPF status is acted upon with the execution of the next SPF 669 calculation Section 6.3. A single TLV type will be shared by the 670 Node, Link, and Prefix NLRI. The TLV type 1184 will be assigned by 671 IANA. 673 0 1 2 3 674 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 675 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 676 | Type (1184) | Length (1 Octet) | 677 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 678 | SPF Status | 679 +-+-+-+-+-+-+-+-+ 681 BGP Status Values: 0 - Reserved 682 1 - Link Unreachable with respect to BGP SPF 683 2-254 - Undefined 684 255 - Reserved 686 The BGP-LS-SPF Node Attribute SPF Status TLV, Link Attribute SPF 687 Status TLV, and Prefix Attribute SPF Status TLV use the same TLV Type 688 (1184). This implies that a BGP Update cannot contain multiple NLRI 689 with differing status. If the BGP-LS-SPF Status TLV is advertised 690 and the advertised value is not defined for all NLRI included in the 691 BGP update, then the SPF Status TLV is ignored and not used in SPF 692 computation but is still announced to other BGP SPF speakers. An 693 implementation MAY log an error for further analysis. 695 If a BGP SPF speaker received the Link NLRI but the SPF Status TLV is 696 not received, then any previously received information is considered 697 as implicitly withdrawn and the update is propagated to other BGP SPF 698 speakers. 
A BGP SPF speaker receiving a BGP Update containing an SPF 699 Status TLV in the BGP-LS attribute [RFC7752] with a value that is 700 outside the range of defined values SHOULD be processed and announced 701 to other BGP SPF speakers. However, a BGP SPF speaker MUST NOT use 702 the Status TLV in its SPF computation. An implementation MAY log 703 this information for further analysis. 705 5.2.3. IPv4/IPv6 Prefix NLRI Usage 707 IPv4/IPv6 Prefix NLRI is advertised with a Local Node Descriptor and 708 the prefix and length. The Prefix Descriptors field includes the IP 709 Reachability Information TLV (TLV 265) as described in [RFC7752]. 710 The Prefix Metric attribute TLV (TLV 1155) MUST be advertised. The 711 IGP Route Tag TLV (TLV 1153) MAY be advertised. The usage of other 712 attribute TLVs is beyond the scope of this document. For loopback 713 prefixes, the metric should be 0. For non-loopback prefixes, the 714 setting of the metric is a local matter and beyond the scope of this 715 document. 717 5.2.3.1. BGP-LS-SPF Prefix NLRI Attribute SPF Status TLV 719 A BGP-LS Attribute TLV to BGP-LS-SPF Prefix NLRI is defined to 720 indicate the status of the prefix with respect to the BGP SPF 721 calculation. This will be used to expedite convergence for prefix 722 unreachability as discussed in Section 6.5.1. If the SPF Status TLV 723 is not included with the Prefix NLRI, the prefix is considered 724 reachable. A single TLV type will be shared by the Node, Link, and 725 Prefix NLRI. The TLV type 1184 will be assigned by IANA. 
727 0 1 2 3 728 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 729 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 730 | Type (1184) | Length (1 Octet) | 731 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 732 | SPF Status | 733 +-+-+-+-+-+-+-+-+ 735 BGP Status Values: 0 - Reserved 736 1 - Prefix Unreachable with respect to SPF 737 2-254 - Undefined 738 255 - Reserved 740 The BGP-LS-SPF Node Attribute SPF Status TLV, Link Attribute SPF 741 Status TLV, and Prefix Attribute SPF Status TLV use the same TLV Type 742 (1184). This implies that a BGP Update cannot contain multiple NLRI 743 with differing status. If the BGP-LS-SPF Status TLV is advertised 744 and the advertised value is not defined for all NLRI included in the 745 BGP update, then the SPF Status TLV is ignored and not used in SPF 746 computation but is still announced to other BGP SPF speakers. An 747 implementation MAY log an error for further analysis. 749 If a BGP SPF speaker received the Prefix NLRI but the SPF Status TLV 750 is not received, then any previously received information is 751 considered as implicitly withdrawn and the update is propagated to 752 other BGP SPF speakers. A BGP SPF speaker receiving a BGP Update 753 containing an SPF Status TLV in the BGP-LS attribute [RFC7752] with a 754 value that is outside the range of defined values SHOULD be processed 755 and announced to other BGP SPF speakers. However, a BGP SPF speaker 756 MUST NOT use the Status TLV in its SPF computation. An 757 implementation MAY log this information for further analysis. 759 5.2.4. BGP-LS Attribute Sequence-Number TLV 761 A BGP-LS Attribute TLV of the BGP-LS-SPF NLRI types is defined to 762 assure the most recent version of a given NLRI is used in the SPF 763 computation. The Sequence-Number TLV is mandatory for BGP-LS-SPF 764 NLRI. The TLV type 1181 has been assigned by IANA. The BGP-LS 765 Attribute TLV will contain an 8-octet sequence number. 
The usage of 766 the Sequence Number TLV is described in Section 6.1. 768 0 1 2 3 769 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 770 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 771 | Type (1181) | Length (8 Octets) | 772 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 773 | Sequence Number (High-Order 32 Bits) | 774 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 775 | Sequence Number (Low-Order 32 Bits) | 776 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 778 Sequence Number: The 64-bit strictly-increasing sequence number MUST 779 be incremented for every self-originated version of BGP-LS-SPF NLRI. 780 BGP SPF speakers implementing this specification MUST use available 781 mechanisms to preserve the sequence number's strictly increasing 782 property for the deployed life of the BGP SPF speaker (including cold 783 restarts). One mechanism for accomplishing this would be to use the 784 high-order 32 bits of the sequence number as a wrap/boot count that 785 is incremented any time the BGP router loses its sequence number 786 state or the low-order 32 bits wrap. 788 When incrementing the sequence number for each self-originated NLRI, 789 the sequence number should be treated as an unsigned 64-bit value. 790 If the lower-order 32-bit value wraps, the higher-order 32-bit value 791 should be incremented and saved in non-volatile storage. If a BGP 792 SPF speaker completely loses its sequence number state (e.g., the BGP 793 SPF speaker hardware is replaced or experiences a cold-start), the 794 BGP NLRI selection rules (see Section 6.1) will ensure convergence, 795 albeit not immediately. 797 The Sequence-Number TLV is mandatory for BGP-LS-SPF NLRI. If the 798 Sequence-Number TLV is not received, then the corresponding Link NLRI 799 is considered as malformed and MUST be handled as 'Treat-as-withdraw'. 800 An implementation MAY log an error for further analysis. 802 5.3.
NEXT_HOP Manipulation 804 All BGP peers that support SPF extensions would locally compute the 805 LOC-RIB Next-Hop as a result of the SPF process. Consequently, the 806 Next-Hop is always ignored on receipt. The Next-Hop address MUST be 807 encoded as described in [RFC4760]. BGP SPF speakers MUST interpret 808 the Next-Hop address of the MP_REACH_NLRI attribute as an IPv4 address 809 whenever the length of the Next-Hop address is 4 octets, and as an 810 IPv6 address whenever the length of the Next-Hop address is 16 811 octets. 813 [RFC4760] modifies the rules for the NEXT_HOP attribute whenever the 814 multiprotocol extensions for BGP-4 are enabled. BGP SPF speakers 815 MUST set the NEXT_HOP attribute according to the rules specified in 816 [RFC4760] as the BGP-LS-SPF routing information is carried within the 817 multiprotocol extensions for BGP-4. 819 6. Decision Process with SPF Algorithm 821 The Decision Process described in [RFC4271] takes place in three 822 distinct phases. The Phase 1 decision function of the Decision 823 Process is responsible for calculating the degree of preference for 824 each route received from a BGP SPF speaker's peer. The Phase 2 825 decision function is invoked on completion of the Phase 1 decision 826 function and is responsible for choosing the best route out of all 827 those available for each distinct destination, and for installing 828 each chosen route into the LOC-RIB. The combination of the Phase 1 829 and 2 decision functions is characterized as a Path Vector algorithm. 831 The SPF-based Decision Process replaces the BGP Decision Process 832 described in [RFC4271]. This process starts with selecting only 833 those Node NLRI whose SPF capability TLV matches the local BGP 834 SPF speaker's SPF capability TLV value.
Since Link-State NLRI always 835 contains the local node descriptor (Section 5.2), each NLRI is uniquely 836 originated by a single BGP SPF speaker in the BGP SPF routing domain 837 (the BGP node matching the NLRI's Node Descriptors). Instances of 838 the same NLRI originated by multiple BGP SPF speakers would be 839 indicative of a configuration error or a masquerading attack 840 (Section 9). These selected Node NLRI and their Link/Prefix NLRI are 841 used to build a directed graph during the SPF computation as 842 described below. The best routes for BGP prefixes are installed in 843 the RIB as a result of the SPF process. 845 When BGP-LS-SPF NLRI is received, all that is required is to 846 determine whether it is the most recent by examining the Node-ID and 847 sequence number as described in Section 6.1. If the received NLRI 848 has changed, it will be advertised to other BGP-LS-SPF peers. If the 849 attributes have changed (other than the sequence number), a BGP SPF 850 calculation will be triggered. However, a changed NLRI MAY be 851 advertised immediately to other peers and prior to any SPF 852 calculation. Note that the BGP MinRouteAdvertisementIntervalTimer 853 and MinASOriginationIntervalTimer [RFC4271] timers are not applicable 854 to the BGP-LS-SPF SAFI. The scheduling of the SPF calculation, as 855 described in Section 6.3, is an implementation issue. Scheduling MAY 856 be dampened consistent with the SPF back-off algorithm specified in 857 [RFC8405]. 859 The Phase 3 decision function of the Decision Process [RFC4271] is 860 also simplified since, under normal SPF operation, a BGP SPF speaker 861 MUST advertise the changed NLRIs to all BGP peers with the BGP-LS-SPF 862 AFI/SAFI and install the changed routes in the Global RIB. The only 863 exceptions are unchanged NLRIs or stale NLRIs, i.e., NLRI received 864 with a less recent (numerically smaller) sequence number. 866 6.1.
BGP NLRI Selection 868 The rules for all BGP-LS-SPF NLRIs selection for phase 1 of the BGP 869 decision process, section 9.1.1 [RFC4271], no longer apply. 871 1. Routes originated by directly connected BGP SPF peers are 872 preferred. This condition can be determined by comparing the BGP 873 Identifiers in the received Local Node Descriptor and OPEN 874 message. This rule will assure that stale NLRI is updated even 875 if a BGP-LS router loses its sequence number state due to a cold- 876 start. 878 2. The NLRI with the most recent Sequence Number TLV, i.e., highest 879 sequence number is selected. 881 3. The route received from the BGP SPF speaker with the numerically 882 larger BGP Identifier is preferred. 884 When a BGP SPF speaker completely loses its sequence number state, 885 i.e., due to a cold start, or in the unlikely possibility that 64-bit 886 sequence number wraps, the BGP routing domain will still converge. 888 This is due to the fact that BGP SPF speakers adjacent to the router 889 will always accept self-originated NLRI from the associated speaker 890 as more recent (rule # 1). When a BGP SPF speaker reestablishes a 891 connection with its peers, any existing session will be taken down 892 and stale NLRI will be replaced. The adjacent BGP SPF speaker will 893 update their NLRI advertisements, hop by hop, until the BGP routing 894 domain has converged. 896 The modified SPF Decision Process performs an SPF calculation rooted 897 at the BGP SPF speaker using the metrics from the Link Attribute IGP 898 Metric TLV (1095) and the Prefix Attribute Prefix Metric TLV (1155) 899 [RFC7752]. As a result, any other BGP attributes that would 900 influence the BGP decision process defined in [RFC4271] including 901 ORIGIN, MULTI_EXIT_DISC, and LOCAL_PREF attributes are ignored by the 902 SPF algorithm. The NEXT_HOP attribute is discussed in Section 5.3. 903 The AS_PATH and AS4_PATH [RFC6793] attributes are preserved and used 904 for loop detection [RFC4271]. 
They are ignored during the SPF 905 computation for BGP-LS-SPF NLRIs. 907 6.1.1. BGP Self-Originated NLRI 909 Node, Link, or Prefix NLRI with Node Descriptors matching the local 910 BGP SPF speaker are considered self-originated. When self-originated 911 NLRI is received and it doesn't match the local node's NLRI content 912 (including sequence number), special processing is required. 914 * If a self-originated NLRI is received and the sequence number is 915 more recent (i.e., greater than the local node's sequence number 916 for the NLRI), the NLRI sequence number will be advanced to one 917 greater than the received sequence number and the NLRI will be 918 readvertised to all peers. 920 * If self-originated NLRI is received and the sequence number is the 921 same as the local node's sequence number but the attributes 922 differ, the NLRI sequence number will be advanced to one greater 923 than the received sequence number and the NLRI will be 924 readvertised to all peers. 926 * If self-originated Link or Prefix NLRI is received and the Link or 927 Prefix NLRI is no longer being advertised by the local node, the 928 NLRI will be withdrawn. 930 The above actions are performed immediately when the first instance 931 of a newer self-originated NLRI is received. In this case, the newer 932 instance is considered to be a stale instance that was advertised by 933 the local node prior to a restart in which the NLRI state was lost. 934 However, if subsequent newer self-originated NLRI is received for the 935 same Node, Link, or Prefix NLRI, the readvertisement or withdrawal is 936 delayed by 5 seconds since it is likely being advertised by a 937 misconfigured or rogue BGP SPF speaker (Section 9). 939 6.2. Dual Stack Support 941 The SPF-based decision process operates on Node, Link, and Prefix 942 NLRIs that support both IPv4 and IPv6 addresses. Whether to run a 943 single SPF computation or multiple SPF computations for separate AFs 944 is an implementation matter.
Normally, IPv4 next-hops are calculated 945 for IPv4 prefixes and IPv6 next-hops are calculated for IPv6 946 prefixes. 948 6.3. SPF Calculation based on BGP-LS-SPF NLRI 950 This section details the BGP-LS-SPF local routing information base 951 (RIB) calculation. The router will use BGP-LS-SPF Node, Link, and 952 Prefix NLRI to compute routes using the following algorithm. This 953 calculation yields the set of routes associated with the BGP SPF 954 Routing Domain. A router calculates the shortest-path tree using 955 itself as the root. Optimizations to the BGP-LS-SPF algorithm are 956 possible but MUST yield the same set of routes. The algorithm below 957 supports Equal Cost Multi-Path (ECMP) routes. Weighted Unequal Cost 958 Multi-Path routes are out of scope. The organization of this section 959 owes heavily to section 16 of [RFC2328]. 961 The following abstract data structures are defined in order to 962 specify the algorithm. 964 * Local Route Information Base (LOC-RIB) - This routing table 965 contains reachability information (i.e., next hops) for all 966 prefixes (both IPv4 and IPv6) as well as BGP-LS-SPF node 967 reachability. Implementations may choose to implement this with 968 separate RIBs for each address family and/or Prefix versus Node 969 reachability. It is synonymous with the Loc-RIB specified in 970 [RFC4271]. 972 * Global Routing Information Base (GLOBAL-RIB) - This is Routing 973 Information Base (RIB) containing the current routes that are 974 installed in the router's forwarding plane. This is commonly 975 referred to in networking parlance as "the RIB". 977 * Link State NLRI Database (LSNDB) - Database of BGP-LS-SPF NLRI 978 that facilitates access to all Node, Link, and Prefix NLRI. 980 * Candidate List (CAN-LIST) - This is a list of candidate Node NLRIs 981 used during the BGP SPF calculation Section 6.3. 
The list is 982 sorted by the cost to reach each Node NLRI, with the Node NLRI with 983 the lowest reachability cost at the head of the list. This 984 facilitates execution of the Dijkstra algorithm (Section 1.1) where 985 the shortest paths between the local node and other nodes in the graph 986 are computed. The CAN-LIST is typically implemented as a heap 987 but other data structures have been used. 989 The algorithm is comprised of the steps below: 991 1. The current LOC-RIB is invalidated, and the CAN-LIST is 992 initialized to empty. The LOC-RIB is rebuilt during the course 993 of the SPF computation. The existing routing entries are 994 preserved for comparison to determine changes that need to be 995 made to the GLOBAL-RIB in step 6. 997 2. The computing router's Node NLRI is updated in the LOC-RIB with a 998 cost of 0 and the Node NLRI is also added to the CAN-LIST. The 999 next-hop list is set to the internal loopback next-hop. 1001 3. The Node NLRI with the lowest cost is removed from the candidate 1002 list for processing. If the BGP-LS Node attribute doesn't 1003 include an SPF Capability TLV (Section 5.2.1.1), the Node NLRI is 1004 ignored and the next lowest cost Node NLRI is selected from 1005 the candidate list. If the BGP-LS Node attribute includes an SPF 1006 Status TLV (Section 5.2.1.2) indicating the node is unreachable, 1007 the Node NLRI is ignored and the next lowest cost Node NLRI is 1008 selected from the candidate list. The Node corresponding to this 1009 NLRI will be referred to as the Current-Node. If the candidate 1010 list is empty, the SPF calculation has completed and the 1011 algorithm proceeds to step 6. 1013 4. All the Prefix NLRI with the same Node Identifiers as the 1014 Current-Node will be considered for installation. The next-hop(s) 1015 for these Prefix NLRI are inherited from the Current-Node.
1016 The cost for each prefix is the metric advertised in the Prefix 1017 Attribute Prefix Metric TLV (1155) added to the cost to reach the 1018 Current-Node. The following will be done for each Prefix NLRI 1019 (referred to as the Current-Prefix): 1021 * If the BGP-LS Prefix attribute includes an SPF Status TLV 1022 indicating the prefix is unreachable, the Current-Prefix is 1023 considered unreachable and the next Prefix NLRI is examined in 1024 Step 4. 1026 * If the Current-Prefix's corresponding prefix is in the LOC-RIB 1027 and the LOC-RIB cost is less than the Current-Prefix's metric, 1028 the Current-Prefix does not contribute to the route and the 1029 next Prefix NLRI is examined in Step 4. 1031 * If the Current-Prefix's corresponding prefix is not in the 1032 LOC-RIB, the prefix is installed with the Current-Node's next- 1033 hops installed as the LOC-RIB route's next-hops and the metric 1034 being updated. If the IGP Route Tag TLV (1153) is included in 1035 the Current-Prefix's NLRI Attribute, the tag(s) are installed 1036 in the current LOC-RIB route's tag(s). 1038 * If the Current-Prefix's corresponding prefix is in the LOC-RIB 1039 and the cost is less than the LOC-RIB route's metric, the 1040 prefix is installed with the Current-Node's next-hops 1041 replacing the LOC-RIB route's next-hops and the metric being 1042 updated and any route tags removed. If the IGP Route Tag TLV 1043 (1153) is included in the Current-Prefix's NLRI Attribute, the 1044 tag(s) are installed in the current LOC-RIB route's tag(s). 1046 * If the Current-Prefix's corresponding prefix is in the LOC-RIB 1047 and the cost is the same as the LOC-RIB route's metric, the 1048 Current-Node's next-hops will be merged with LOC-RIB route's 1049 next-hops. If the number of merged next-hops exceeds the 1050 Equal-Cost Multi-Path (ECMP) limit, the number of next-hops is 1051 reduced with next-hops on numbered links preferred over next- 1052 hops on unnumbered links. 
Among next-hops on numbered links, 1053 the next-hops with the highest IPv4 or IPv6 addresses are 1054 preferred. Among next-hops on unnumbered links, the next-hops 1055 with the highest Remote Identifiers are preferred [RFC5307]. 1056 If the IGP Route Tag TLV (1153) is included in the Current- 1057 Prefix's NLRI Attribute, the tag(s) are merged into the LOC- 1058 RIB route's current tags. 1060 5. All the Link NLRI with the same Node Identifiers as the Current- 1061 Node will be considered for installation. Each link will be 1062 examined and will be referred to in the following text as the 1063 Current-Link. The cost of the Current-Link is the advertised IGP 1064 Metric TLV (1095) from the Link NLRI BGP-LS attribute added to 1065 the cost to reach the Current-Node. If the Current-Node is for 1066 the local BGP Router, the next-hop for the link will be a direct 1067 next-hop pointing to the corresponding local interface. For any 1068 other Current-Node, the next-hop(s) for the Current-Link will be 1069 inherited from the Current-Node. The following will be done for 1070 each link: 1072 a. The prefix(es) associated with the Current-Link are installed 1073 into the LOC-RIB using the same rules as were used for Prefix 1074 NLRI in the previous steps. Optionally, in deployments where 1075 BGP-SPF routers have limited routing table capacity, 1076 installation of these subnets can be suppressed. Suppression 1077 will have an operational impact as the IPv4/IPv6 link 1078 endpoint addresses will not be reachable and tools such as 1079 traceroute will display addresses that are not reachable. 1081 b. If the Current-Node NLRI attributes includes the SPF status 1082 TLV (Section 5.2.1.2) and the status indicates that the Node 1083 doesn't support transit, the next link for the Current-Node 1084 is processed in Step 5. 1086 c. 
If the Current-Link's NLRI attribute includes an SPF Status 1087 TLV indicating the link is down, the BGP-LS-SPF Link NLRI is 1088 considered down and the next link for the Current-Node is 1089 examined in Step 5. 1091 d. The Current-Link's Remote Node NLRI is accessed (i.e., the 1092 Node NLRI with the same Node identifiers as the Current-Link's 1093 Remote Node Descriptors). If it exists, it will be 1094 referred to as the Remote-Node and the algorithm will proceed 1095 as follows: 1097 * If the Remote-Node's NLRI attribute includes an SPF Status 1098 TLV indicating the node is unreachable, the next link for 1099 the Current-Node is examined in Step 5. 1101 * All the Link NLRI corresponding to the Remote-Node will be 1102 searched for a Link NLRI pointing to the Current-Node. 1103 Each Link NLRI is examined for Remote Node Descriptors 1104 matching the Current-Node and Link Descriptors matching 1105 the Current-Link. For numbered links to match, the Link 1106 Descriptors MUST share a common IPv4 or IPv6 subnet. For 1107 unnumbered links to match, the Current Link's Local 1108 Identifier MUST match the Remote Node Link's Remote 1109 Identifier and the Current Link's Remote Identifier MUST 1110 match the Remote Node Link's Local Identifier [RFC5307]. If 1111 these conditions are satisfied for one of the Remote-Node's 1112 links, the bi-directional connectivity check 1113 succeeds and the Remote-Node may be processed further. 1114 The Remote-Node's Link NLRI providing bi-directional 1115 connectivity will be referred to as the Remote-Link. If 1116 no Remote-Link is found, the next link for the Current-Node 1117 is examined in Step 5. 1119 * If the Remote-Link NLRI attribute includes an SPF Status 1120 TLV indicating the link is down, the Remote-Link NLRI is 1121 considered down and the next link for the Current-Node is 1122 examined in Step 5. 1124 * If the Remote-Node is not on the CAN-LIST, it is inserted 1125 based on the cost.
The Remote Node's cost is the cost of 1126 the Current-Node added to the Current-Link's IGP Metric TLV 1127 (1095). The next-hop(s) for the Remote-Node are inherited 1128 from the Current-Link. 1130 * If the Remote-Node NLRI is already on the CAN-LIST with a 1131 higher cost, it must be removed and reinserted with the 1132 Remote-Node cost based on the Current-Link (as calculated 1133 in the previous step). The next-hop(s) for the Remote-Node 1134 are inherited from the Current-Link. 1136 * If the Remote-Node NLRI is already on the CAN-LIST with 1137 the same cost, it need not be reinserted on the CAN-LIST. 1138 However, the Current-Link's next-hop(s) must be merged 1139 into the current set of next-hops for the Remote-Node. 1141 * If the Remote-Node NLRI is already on the CAN-LIST with a 1142 lower cost, it need not be reinserted on the CAN-LIST. 1144 e. Return to step 3 to process the next lowest cost Node NLRI on 1145 the CAN-LIST. 1147 6. The LOC-RIB is examined and changes (adds, deletes, 1148 modifications) are installed into the GLOBAL-RIB. For each route 1149 in the LOC-RIB: 1151 * If the route was added during the current BGP SPF computation, 1152 install the route into the GLOBAL-RIB. 1154 * If the route was modified during the current BGP SPF computation 1155 (e.g., metric, tags, or next-hops), update the route in the 1156 GLOBAL-RIB. 1158 * If the route was not installed during the current BGP SPF 1159 computation, remove the route from both the GLOBAL-RIB and the 1160 LOC-RIB. 1162 6.4. IPv4/IPv6 Unicast Address Family Interaction 1164 While the BGP-LS-SPF address family and the IPv4/IPv6 unicast address 1165 families MAY install routes into the same device routing tables, they 1166 will operate independently much the same as OSPF and IS-IS would 1167 operate today (i.e., "Ships-in-the-Night" mode). There is no 1168 implicit route redistribution between the BGP address families.
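As a non-normative illustration of the node and link portion of the Section 6.3 calculation (steps 2, 3, and 5), the following Python sketch computes costs and ECMP next-hops over an in-memory stand-in for the LSNDB. The Node and Link classes, the usable() helper, and the bgp_spf() function are hypothetical names introduced only for this example. The sketch makes several simplifying assumptions: Prefix NLRI (step 4) and the GLOBAL-RIB update (step 6) are omitted, next-hops are represented by the name of the first-hop neighbor, nodes lacking the SPF capability or marked unreachable are filtered before being placed on the CAN-LIST rather than at step 3, and the bi-directional check accepts any usable reverse link instead of matching Link Descriptors.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

NODE_UNREACHABLE = 1  # Node SPF Status value 1 (Section 5.2.1.2)
NODE_NO_TRANSIT = 2   # Node SPF Status value 2 (Section 5.2.1.2)
LINK_UNREACHABLE = 1  # Link SPF Status value 1 (Section 5.2.2.2)


@dataclass
class Link:  # hypothetical stand-in for a Link NLRI
    remote: str                   # node named by the Remote Node Descriptors
    metric: int                   # IGP Metric TLV (1095)
    status: Optional[int] = None  # SPF Status TLV (1184), if present


@dataclass
class Node:  # hypothetical stand-in for a Node NLRI
    spf_capable: bool = True      # SPF Capability TLV present, algorithm 0
    status: Optional[int] = None  # SPF Status TLV (1184), if present
    links: List[Link] = field(default_factory=list)


def usable(node: Optional[Node]) -> bool:
    """True if the node exists, is SPF capable, and is not marked unreachable."""
    return node is not None and node.spf_capable and node.status != NODE_UNREACHABLE


def bgp_spf(lsndb: Dict[str, Node], root: str) -> Dict[str, Tuple[int, Set[str]]]:
    """Return {node: (cost, ECMP first-hop neighbors)} for nodes reachable from root."""
    cost: Dict[str, int] = {root: 0}
    hops: Dict[str, Set[str]] = {root: set()}
    done: Set[str] = set()
    can_list = [(0, root)]  # CAN-LIST kept as a binary heap ordered by cost
    while can_list:
        cur_cost, cur = heapq.heappop(can_list)  # step 3: lowest cost first
        if cur in done:
            continue  # stale heap entry superseded by a cheaper reinsert
        done.add(cur)
        node = lsndb[cur]
        if node.status == NODE_NO_TRANSIT:
            continue  # step 5b: links of a non-transit node are not followed
        for link in node.links:  # step 5
            if link.status == LINK_UNREACHABLE:
                continue  # step 5c: the link is down
            remote = lsndb.get(link.remote)
            if not usable(remote) or link.remote in done:
                continue
            # Step 5d, simplified: any reverse link that is not down passes.
            if not any(b.remote == cur and b.status != LINK_UNREACHABLE
                       for b in remote.links):
                continue
            new_cost = cur_cost + link.metric
            new_hops = {link.remote} if cur == root else hops[cur]
            old = cost.get(link.remote)
            if old is None or new_cost < old:
                cost[link.remote] = new_cost
                hops[link.remote] = set(new_hops)
                heapq.heappush(can_list, (new_cost, link.remote))
            elif new_cost == old:
                hops[link.remote] |= new_hops  # equal cost: merge next-hops
    return {name: (cost[name], hops[name]) for name in done}


def connect(db: Dict[str, Node], a: str, b: str, metric: int) -> None:
    """Advertise a link in both directions (hypothetical helper)."""
    db[a].links.append(Link(b, metric))
    db[b].links.append(Link(a, metric))


db = {name: Node() for name in "ABCDE"}
for a, b in [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]:
    connect(db, a, b, 1)
db["D"].links.append(Link("E", 1))  # E advertises no link back to D
routes = bgp_spf(db, "A")
print(routes["D"])    # D is reached at cost 2 through both B and C
print("E" in routes)  # E fails the bi-directional connectivity check
```

The heap holds stale entries instead of supporting removal and reinsertion; skipping an entry whose node has already been processed has the same effect as the "removed and reinserted" wording in step 5d.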
1170 It is RECOMMENDED that BGP-LS-SPF IPv4/IPv6 route computation and 1171 installation be given scheduling priority by default over other BGP 1172 address families as these address families are considered as underlay 1173 SAFIs. Similarly, it is RECOMMENDED that the route preference or 1174 administrative distance give active route installation preference to 1175 BGP-LS-SPF IPv4/IPv6 routes over BGP routes from other AFI/SAFIs. 1176 However, this preference MAY be overridden by an operator-configured 1177 policy. 1179 6.5. NLRI Advertisement 1181 6.5.1. Link/Prefix Failure Convergence 1183 A local failure will prevent a link from being used in the SPF 1184 calculation due to the IGP bi-directional connectivity requirement. 1185 Consequently, local link failures SHOULD always be given priority 1186 over updates (e.g., withdrawing all routes learned on a session) in 1187 order to ensure the highest priority propagation and optimal 1188 convergence. 1190 An IGP such as OSPF [RFC2328] will stop using the link as soon as the 1191 Router-LSA for one side of the link is received. With a BGP 1192 advertisement, the link would continue to be used until the last copy 1193 of the BGP-LS-SPF Link NLRI is withdrawn. In order to avoid this 1194 delay, the originator of the Link NLRI SHOULD advertise a more recent 1195 version with an increased Sequence Number TLV for the BGP-LS-SPF Link 1196 NLRI including the SPF Status TLV (Section 5.2.2.2) indicating the 1197 link is down with respect to BGP SPF. The configurable 1198 LinkStatusDownAdvertise timer controls the interval during which the BGP-LS-SPF 1199 Link NLRI is advertised with SPF Status indicating the link is down 1200 prior to withdrawal. If the link becomes available in that period, 1201 the originator of the BGP-LS-SPF Link NLRI SHOULD advertise a more 1202 recent version of the BGP-LS-SPF Link NLRI without the SPF Status TLV 1203 in the BGP-LS Link Attributes.
   The suggested default value for the LinkStatusDownAdvertise timer is
   2 seconds.

   Similarly, when a prefix becomes unreachable, a more recent version
   of the BGP-LS-SPF Prefix NLRI SHOULD be advertised with the SPF
   Status TLV (Section 5.2.3.1) indicating the prefix is unreachable in
   the BGP-LS Prefix Attributes and the prefix will be considered
   unreachable with respect to BGP SPF. The configurable
   PrefixStatusDownAdvertise timer controls the interval that the
   BGP-LS-SPF Prefix NLRI is advertised with SPF Status indicating the
   prefix is unreachable prior to withdrawal. If the prefix becomes
   reachable in that period, the originator of the BGP-LS-SPF Prefix
   NLRI SHOULD advertise a more recent version of the BGP-LS-SPF Prefix
   NLRI without the SPF Status TLV in the BGP-LS Prefix Attributes. The
   suggested default value for the PrefixStatusDownAdvertise timer is
   2 seconds.

6.5.2. Node Failure Convergence

   With BGP without graceful restart [RFC4724], all the NLRI advertised
   by a node are implicitly withdrawn when a session failure is
   detected. If fast failure detection such as BFD is utilized, and the
   node is on the fastest converging path, the most recent versions of
   BGP-LS-SPF NLRI may be withdrawn. This will result in an older
   version of the NLRI being used until the new versions arrive and,
   potentially, unnecessary route flaps. For the BGP-LS-SPF SAFI, NLRI
   SHOULD NOT be implicitly withdrawn immediately to prevent such
   unnecessary route flaps. The configurable
   NLRIImplicitWithdrawalDelay timer controls the interval that NLRI is
   retained prior to implicit withdrawal after a BGP SPF speaker has
   transitioned out of Established state.
   This will not delay convergence since the adjacent nodes will detect
   the link failure and advertise a more recent NLRI indicating the
   link is down with respect to BGP SPF (Section 6.5.1) and the BGP SPF
   calculation will fail the bi-directional connectivity check
   (Section 6.3). The suggested default value for the
   NLRIImplicitWithdrawalDelay timer is 2 seconds.

7. Error Handling

   This section describes the Error Handling actions, as described in
   [RFC7606], that are specific to SAFI BGP-LS-SPF BGP Update message
   processing.

7.1. Processing of BGP-LS-SPF TLVs

   When a BGP SPF speaker receives a BGP Update containing a malformed
   Node NLRI SPF Status TLV in the BGP-LS Attribute [RFC7752], it MUST
   ignore the received TLV and MUST NOT pass it to other BGP peers as
   specified in [RFC7606]. When discarding an associated Node NLRI with
   a malformed TLV, a BGP SPF speaker SHOULD log an error for further
   analysis.

   When a BGP SPF speaker receives a BGP Update containing a malformed
   Link NLRI SPF Status TLV in the BGP-LS Attribute [RFC7752], it MUST
   ignore the received TLV and MUST NOT pass it to other BGP peers as
   specified in [RFC7606]. When discarding an associated Link NLRI with
   a malformed TLV, a BGP SPF speaker SHOULD log an error for further
   analysis.

   When a BGP SPF speaker receives a BGP Update containing a malformed
   Prefix NLRI SPF Status TLV in the BGP-LS Attribute [RFC7752], it MUST
   ignore the received TLV and MUST NOT pass it to other BGP peers as
   specified in [RFC7606]. When discarding an associated Prefix NLRI
   with a malformed TLV, a BGP SPF speaker SHOULD log an error for
   further analysis.

   When a BGP SPF speaker receives a BGP Update containing a malformed
   SPF Capability TLV in the Node NLRI BGP-LS Attribute [RFC7752], it
   MUST ignore the received TLV and the Node NLRI and MUST NOT pass it
   to other BGP peers as specified in [RFC7606]. When discarding a Node
   NLRI with a malformed TLV, a BGP SPF speaker SHOULD log an error for
   further analysis.

   When a BGP SPF speaker receives a BGP Update containing a malformed
   IPv4 Link Prefix-Length TLV in the Link NLRI BGP-LS Attribute
   [RFC7752], it MUST ignore the received TLV and the Link NLRI and
   MUST NOT pass it to other BGP peers as specified in [RFC7606]. The
   corresponding Link NLRI is considered as malformed and MUST be
   handled as 'Treat-as-withdraw'. An implementation MAY log an error
   for further analysis.

   When a BGP SPF speaker receives a BGP Update containing a malformed
   IPv6 Link Prefix-Length TLV in the Link NLRI BGP-LS Attribute
   [RFC7752], it MUST ignore the received TLV and the Link NLRI and
   MUST NOT pass it to other BGP peers as specified in [RFC7606]. The
   corresponding Link NLRI is considered as malformed and MUST be
   handled as 'Treat-as-withdraw'. An implementation MAY log an error
   for further analysis.

7.2. Processing of BGP-LS-SPF NLRIs

   A Link-State NLRI MUST NOT be considered as malformed or invalid
   based on the inclusion/exclusion of TLVs or contents of the TLV
   fields (i.e., semantic errors), as described in Section 5.1 and
   Section 5.1.1.

   A BGP SPF speaker MUST perform the following syntactic validation
   of the BGP-LS-SPF NLRI to determine if it is malformed.

   1.  Does the sum of the lengths of all TLVs found in the BGP
       MP_REACH_NLRI attribute correspond to the BGP MP_REACH_NLRI
       length?

   2.  Does the sum of the lengths of all TLVs found in the BGP
       MP_UNREACH_NLRI attribute correspond to the BGP MP_UNREACH_NLRI
       length?

   3.
       Does the sum of the lengths of all TLVs found in a BGP-LS-SPF
       NLRI correspond to the Total NLRI Length field of all its
       Descriptors?

   4.  When an NLRI TLV is recognized, is the length of the TLV and its
       sub-TLVs valid?

   5.  Has the syntactic correctness of the NLRI fields been verified as
       per [RFC7606]?

   6.  Has the rule regarding ordering of TLVs been followed as
       described in Section 5.1.1?

   When the error determined allows for the router to skip the malformed
   NLRI(s) and continue processing of the rest of the update message
   (e.g., when the TLV ordering rule is violated), then it MUST handle
   such malformed NLRIs as 'Treat-as-withdraw'. In other cases, where
   the error in the NLRI encoding results in the inability to process
   the BGP update message (e.g., length related encoding errors), then
   the router SHOULD handle such malformed NLRIs as 'AFI/SAFI disable'
   when other AFI/SAFIs besides BGP-LS-SPF are being advertised over
   the same session. Alternatively, the router MUST perform 'session
   reset' when the session is only being used for BGP-LS-SPF or when
   the 'AFI/SAFI disable' action is not possible.

7.3. Processing of BGP-LS Attribute

   A BGP-LS Attribute MUST NOT be considered as malformed or invalid
   based on the inclusion/exclusion of TLVs or contents of the TLV
   fields (i.e., semantic errors), as described in Section 5.1 and
   Section 5.1.1.

   A BGP SPF speaker MUST perform the following syntactic validation
   of the BGP-LS Attribute to determine if it is malformed.

   1.  Does the sum of the lengths of all TLVs found in the BGP-LS
       Attribute correspond to the BGP-LS Attribute length?

   2.  Has the syntactic correctness of the Attributes (including BGP-LS
       Attribute) been verified as per [RFC7606]?

   3.  Is the length of each TLV in the BGP-LS Attribute and, when the
       TLV is recognized, the length of each of its sub-TLVs valid?
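
   As a non-normative illustration, check 1 in the list above can be
   sketched in Python as a walk over the top-level TLVs of the
   attribute value. The sketch relies on the BGP-LS TLV header layout
   of [RFC7752] (a 2-octet Type followed by a 2-octet Length that
   counts only the Value octets). The function name
   tlv_lengths_consistent is an assumption made for this example, and
   sub-TLV validation (check 3) is deliberately left out.

```python
import struct

TLV_HEADER_LEN = 4  # 2-octet Type + 2-octet Length, per RFC 7752


def tlv_lengths_consistent(attr_value: bytes) -> bool:
    """Return True if the top-level TLVs exactly fill attr_value.

    attr_value is the value portion of the BGP-LS Attribute, i.e. the
    octets covered by the attribute length (hypothetical helper).
    """
    offset = 0
    total = len(attr_value)
    while offset < total:
        if total - offset < TLV_HEADER_LEN:
            # Trailing octets too short to hold a TLV header.
            return False
        _tlv_type, tlv_len = struct.unpack_from("!HH", attr_value, offset)
        offset += TLV_HEADER_LEN + tlv_len
    # A TLV that claims more octets than remain pushes offset past
    # total, so only an exact match is accepted.
    return offset == total
```

   A caller would typically combine a False result with the handling
   described in the remainder of this section rather than attempting
   to resynchronize on the next TLV.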

   When the detected error allows for the router to skip the malformed
   BGP-LS Attribute and continue processing of the rest of the update
   message (e.g., when the BGP-LS Attribute length and the total Path
   Attribute Length are correct but some TLV/sub-TLV length within the
   BGP-LS Attribute is invalid), then it MUST handle such a malformed
   BGP-LS Attribute as 'Attribute Discard'. In other cases, when the
   error in the BGP-LS Attribute encoding results in the inability to
   process the BGP update message, then the handling is the same as
   described above for malformed NLRI.

   Note that the 'Attribute Discard' action results in the loss of all
   TLVs in the BGP-LS Attribute and not the removal of a specific
   malformed TLV. The removal of specific malformed TLVs may give a
   wrong indication to a BGP SPF speaker that the specific information
   is being deleted or is not available.

   When a BGP SPF speaker receives an update message with Link-State
   NLRI(s) in the MP_REACH_NLRI but without the BGP-LS Attribute, it
   is most likely an indication that a BGP SPF speaker preceding it has
   performed the 'Attribute Discard' fault handling. An implementation
   SHOULD preserve and propagate the Link-State NLRIs in such an update
   message so that the BGP SPF speaker can detect the loss of link-state
   information for that object and not assume its deletion/withdrawal.
   This also makes it possible for a network operator to trace back to
   the BGP SPF speaker which actually detected a problem with the BGP-LS
   Attribute.

   An implementation SHOULD log an error for further analysis for
   problems detected during syntax validation.
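
   The choice among the error-handling actions named in Section 7.2
   and this section can be summarized, non-normatively, by the
   following Python sketch. The enumeration Action, the function
   select_action, and its boolean parameters are assumptions made for
   this example; how an implementation decides that the rest of the
   update message is still parsable is outside the scope of the
   sketch.

```python
from enum import Enum


class Action(Enum):
    """Error-handling actions named in Sections 7.2 and 7.3."""
    TREAT_AS_WITHDRAW = "Treat-as-withdraw"
    ATTRIBUTE_DISCARD = "Attribute Discard"
    AFI_SAFI_DISABLE = "AFI/SAFI disable"
    SESSION_RESET = "session reset"


def select_action(error_in_nlri: bool,
                  rest_of_update_parsable: bool,
                  other_afi_safi_on_session: bool,
                  disable_possible: bool = True) -> Action:
    """Pick an action for a malformed NLRI or BGP-LS Attribute.

    All parameters are hypothetical inputs supplied by the parser.
    """
    if rest_of_update_parsable:
        # The malformed element can be skipped: withdraw the NLRI or
        # discard the whole BGP-LS Attribute.
        if error_in_nlri:
            return Action.TREAT_AS_WITHDRAW
        return Action.ATTRIBUTE_DISCARD
    # The update cannot be processed further (e.g., length errors).
    if other_afi_safi_on_session and disable_possible:
        return Action.AFI_SAFI_DISABLE
    return Action.SESSION_RESET
```

   For example, a length error in the BGP-LS Attribute on a session
   that carries only BGP-LS-SPF maps to Action.SESSION_RESET in this
   sketch.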

   When a BGP SPF speaker receives a BGP Update containing a malformed
   IGP Metric TLV in the Link NLRI BGP-LS Attribute [RFC7752], it MUST
   ignore the received TLV and the Link NLRI and MUST NOT pass it to
   other BGP peers as specified in [RFC7606]. When discarding a Link
   NLRI with a malformed TLV, a BGP SPF speaker SHOULD log an error for
   further analysis.

8. IANA Considerations

   This document defines the use of SAFI (80) for BGP SPF operation
   (Section 5.1), and requests IANA to assign the value from the First
   Come First Served (FCFS) range in the Subsequent Address Family
   Identifiers (SAFI) Parameters registry.

   This document also defines five attribute TLVs of BGP-LS-SPF NLRI.
   We request IANA to assign types for the SPF Capability TLV, Sequence
   Number TLV, IPv4 Link Prefix-Length TLV, IPv6 Link Prefix-Length TLV,
   and SPF Status TLV from the "BGP-LS Node Descriptor, Link Descriptor,
   Prefix Descriptor, and Attribute TLVs" Registry.

   +=========================+=================+====================+
   | Attribute TLV           | Suggested Value | NLRI Applicability |
   +=========================+=================+====================+
   | SPF Capability          | 1180            | Node               |
   +-------------------------+-----------------+--------------------+
   | SPF Status              | 1184            | Node, Link, Prefix |
   +-------------------------+-----------------+--------------------+
   | IPv4 Link Prefix Length | 1182            | Link               |
   +-------------------------+-----------------+--------------------+
   | IPv6 Link Prefix Length | 1183            | Link               |
   +-------------------------+-----------------+--------------------+
   | Sequence Number         | 1181            | Node, Link, Prefix |
   +-------------------------+-----------------+--------------------+

                      Table 1: NLRI Attribute TLVs

9. Security Considerations

   This document defines a BGP SAFI, i.e., the BGP-LS-SPF SAFI.
   This document does not change the underlying security issues
   inherent in the BGP protocol [RFC4271]. The Security Considerations
   discussed in [RFC4271] apply to the BGP SPF functionality as well.
   The analysis of the security issues for BGP mentioned in [RFC4272]
   and [RFC6952] also applies to this document. The analysis of Generic
   Threats to Routing Protocols done in [RFC4593] is also worth noting.
   As the modifications described in this document for BGP SPF apply to
   IPv4 Unicast and IPv6 Unicast as underlay SAFIs in a single BGP SPF
   Routing Domain, the BGP security solutions described in [RFC6811]
   and [RFC8205] are somewhat constrained as they are meant to apply to
   inter-domain BGP where multiple BGP Routing Domains are typically
   involved. The BGP-LS-SPF SAFI NLRI described in this document are
   typically advertised between EBGP or IBGP speakers under a single
   administrative domain.

   In the context of the BGP peering associated with this document, a
   BGP speaker MUST NOT accept updates from a peer that is not within
   any administrative control of an operator. That is, a participating
   BGP speaker SHOULD be aware of the nature of its peering
   relationships. Such protection can be achieved by manual
   configuration of peers at the BGP speaker.

   In order to mitigate the risk of peering with BGP speakers
   masquerading as legitimate authorized BGP speakers, it is recommended
   that the TCP Authentication Option (TCP-AO) [RFC5925] be used to
   authenticate BGP sessions. If an authorized BGP peer is compromised,
   that BGP peer could advertise modified Node, Link, or Prefix NLRI
   that will result in misrouting, repeated origination of NLRI, and/or
   excessive SPF calculations.
   When a BGP speaker detects that its self-originated NLRI is being
   originated by another BGP speaker, an appropriate error should be
   logged so that the operator can take corrective action.

10. Management Considerations

   This section includes unique management considerations for the
   BGP-LS-SPF address family.

10.1. Configuration

   All routers in a BGP SPF Routing Domain are under a single
   administrative domain allowing for consistent configuration.

10.1.1. Link Metric Configuration

   Within a BGP SPF Routing Domain, the IGP metrics for all advertised
   links SHOULD be configured or defaulted consistently. For example,
   if a default metric is used for one router's links, then a similar
   metric should be used for all routers' links. Similarly, if the link
   cost is derived from using the inverse of the link bandwidth on one
   router, then this SHOULD be done for all routers and the same
   reference bandwidth should be used to derive the inversely
   proportional metric. Failure to do so will result in incorrect
   routing based on link metric.

10.1.2. backoff-config

   In addition to configuration of the BGP-LS-SPF address family,
   implementations SHOULD support the "Shortest Path First (SPF)
   Back-Off Delay Algorithm for Link-State IGPs" [RFC8405]. If
   supported, configuration of the INITIAL_SPF_DELAY, SHORT_SPF_DELAY,
   LONG_SPF_DELAY, TIME_TO_LEARN, and HOLDDOWN_INTERVAL MUST be
   supported [RFC8405]. Section 6 of [RFC8405] recommends consistent
   configuration of these values throughout the IGP routing domain and
   this also applies to the BGP SPF Routing Domain.

10.2. Operational Data

   In order to troubleshoot SPF issues, implementations SHOULD support
   an SPF log including entries for previous SPF computations.
   Each SPF log entry would include the BGP-LS-SPF NLRI triggering the
   SPF, SPF scheduled time, SPF start time, SPF end time, and SPF type
   if different types of SPF are supported. Since the size of the log
   will be finite, implementations SHOULD also maintain counters for the
   total number of SPF computations and the total number of SPF
   triggering events. Additionally, to troubleshoot SPF scheduling and
   back-off [RFC8405], the current SPF back-off state, remaining
   time-to-learn, remaining holddown, last trigger event time, last SPF
   time, and next SPF time should be available.

11. Implementation Status

   Note to RFC Editor: Please remove this section and the associated
   references prior to publication.

   This section records the status of known implementations of the
   protocol defined by this specification at the time of posting of this
   Internet-Draft and is based on a proposal described in [RFC7942].
   The description of implementations in this section is intended to
   assist the IETF in its decision processes in progressing drafts to
   RFCs. Please note that the listing of any individual implementation
   here does not imply endorsement by the IETF. Furthermore, no effort
   has been spent to verify the information presented here that was
   supplied by IETF contributors. This is not intended as, and must not
   be construed to be, a catalog of available implementations or their
   features. Readers are advised to note that other implementations may
   exist.

   According to RFC 7942, "this will allow reviewers and working groups
   to assign due consideration to documents that have the benefit of
   running code, which may serve as evidence of valuable experimentation
   and feedback that have made the implemented protocols more mature.
   It is up to the individual working groups to use this information as
   they see fit".

   The BGP-LS-SPF implementation status is documented in
   [I-D.psarkar-lsvr-bgp-spf-impl].

12. Acknowledgements

   The authors would like to thank Sue Hares, Jorge Rabadan, Boris
   Hassanov, Dan Frost, Matt Anderson, Fred Baker, Lukas Krattiger,
   Yingzhen Qu, and Haibo Wang for their review and comments. Thanks to
   Pushpasis Sarkar for discussions on preventing a BGP SPF Router from
   being used for non-local traffic (i.e., transit traffic).

   The authors extend special thanks to Eric Rosen for fruitful
   discussions on BGP-LS-SPF convergence as compared to IGPs.

13. Contributors

   In addition to the authors listed on the front page, the following
   co-authors have contributed to the document.

   Derek Yeung
   Arrcus, Inc.
   derek@arrcus.com

   Gunter Van De Velde
   Nokia
   gunter.van_de_velde@nokia.com

   Abhay Roy
   Arrcus, Inc.
   abhay@arrcus.com

   Venu Venugopal
   Cisco Systems
   venuv@cisco.com

   Chaitanya Yadlapalli
   AT&T
   cy098d@att.com

14. References

14.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006.

   [RFC4272]  Murphy, S., "BGP Security Vulnerabilities Analysis",
              RFC 4272, DOI 10.17487/RFC4272, January 2006.

   [RFC4593]  Barbir, A., Murphy, S., and Y. Yang, "Generic Threats to
              Routing Protocols", RFC 4593, DOI 10.17487/RFC4593,
              October 2006.

   [RFC4750]  Joyal, D., Ed., Galecki, P., Ed., Giacalone, S., Ed.,
              Coltun, R., and F. Baker, "OSPF Version 2 Management
              Information Base", RFC 4750, DOI 10.17487/RFC4750,
              December 2006.

   [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y.
              Rekhter, "Multiprotocol Extensions for BGP-4", RFC 4760,
              DOI 10.17487/RFC4760, January 2007.

   [RFC5492]  Scudder, J. and R. Chandra, "Capabilities Advertisement
              with BGP-4", RFC 5492, DOI 10.17487/RFC5492, February
              2009.

   [RFC5925]  Touch, J., Mankin, A., and R. Bonica, "The TCP
              Authentication Option", RFC 5925, DOI 10.17487/RFC5925,
              June 2010.

   [RFC6793]  Vohra, Q. and E. Chen, "BGP Support for Four-Octet
              Autonomous System (AS) Number Space", RFC 6793,
              DOI 10.17487/RFC6793, December 2012.

   [RFC6811]  Mohapatra, P., Scudder, J., Ward, D., Bush, R., and R.
              Austein, "BGP Prefix Origin Validation", RFC 6811,
              DOI 10.17487/RFC6811, January 2013.

   [RFC7606]  Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K.
              Patel, "Revised Error Handling for BGP UPDATE Messages",
              RFC 7606, DOI 10.17487/RFC7606, August 2015.

   [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
              S. Ray, "North-Bound Distribution of Link-State and
              Traffic Engineering (TE) Information Using BGP", RFC 7752,
              DOI 10.17487/RFC7752, March 2016.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8205]  Lepinski, M., Ed. and K. Sriram, Ed., "BGPsec Protocol
              Specification", RFC 8205, DOI 10.17487/RFC8205, September
              2017.

   [RFC8405]  Decraene, B., Litkowski, S., Gredler, H., Lindem, A.,
              Francois, P., and C. Bowers, "Shortest Path First (SPF)
              Back-Off Delay Algorithm for Link-State IGPs", RFC 8405,
              DOI 10.17487/RFC8405, June 2018.

   [RFC8654]  Bush, R., Patel, K., and D. Ward, "Extended Message
              Support for BGP", RFC 8654, DOI 10.17487/RFC8654, October
              2019.

   [RFC8665]  Psenak, P., Ed., Previdi, S., Ed., Filsfils, C., Gredler,
              H., Shakir, R., Henderickx, W., and J.
              Tantsura, "OSPF Extensions for Segment Routing", RFC 8665,
              DOI 10.17487/RFC8665, December 2019.

14.2. Informational References

   [I-D.ietf-lsvr-applicability]
              Patel, K., Lindem, A., Zandi, S., and G. Dawra, "Usage and
              Applicability of Link State Vector Routing in Data
              Centers", Work in Progress, Internet-Draft,
              draft-ietf-lsvr-applicability-05, 24 March 2020.

   [I-D.psarkar-lsvr-bgp-spf-impl]
              Sarkar, P., Patel, K., Pallagatti, S., and s.
              sajibasil@gmail.com, "BGP Shortest Path Routing Extension
              Implementation Report", Work in Progress, Internet-Draft,
              draft-psarkar-lsvr-bgp-spf-impl-00, 2 June 2020.

   [RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328,
              DOI 10.17487/RFC2328, April 1998.

   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, DOI 10.17487/RFC4456, April 2006.

   [RFC4724]  Sangli, S., Chen, E., Fernando, R., Scudder, J., and Y.
              Rekhter, "Graceful Restart Mechanism for BGP", RFC 4724,
              DOI 10.17487/RFC4724, January 2007.

   [RFC4915]  Psenak, P., Mirtorabi, S., Roy, A., Nguyen, L., and P.
              Pillay-Esnault, "Multi-Topology (MT) Routing in OSPF",
              RFC 4915, DOI 10.17487/RFC4915, June 2007.

   [RFC5286]  Atlas, A., Ed. and A. Zinin, Ed., "Basic Specification for
              IP Fast Reroute: Loop-Free Alternates", RFC 5286,
              DOI 10.17487/RFC5286, September 2008.

   [RFC5307]  Kompella, K., Ed. and Y. Rekhter, Ed., "IS-IS Extensions
              in Support of Generalized Multi-Protocol Label Switching
              (GMPLS)", RFC 5307, DOI 10.17487/RFC5307, October 2008.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.

   [RFC6952]  Jethanandani, M., Patel, K., and L.
              Zheng, "Analysis of BGP, LDP, PCEP, and MSDP Issues
              According to the Keying and Authentication for Routing
              Protocols (KARP) Design Guide", RFC 6952,
              DOI 10.17487/RFC6952, May 2013.

   [RFC7911]  Walton, D., Retana, A., Chen, E., and J. Scudder,
              "Advertisement of Multiple Paths in BGP", RFC 7911,
              DOI 10.17487/RFC7911, July 2016.

   [RFC7938]  Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of
              BGP for Routing in Large-Scale Data Centers", RFC 7938,
              DOI 10.17487/RFC7938, August 2016.

   [RFC7942]  Sheffer, Y. and A. Farrel, "Improving Awareness of Running
              Code: The Implementation Status Section", BCP 205,
              RFC 7942, DOI 10.17487/RFC7942, July 2016.

Authors' Addresses

   Keyur Patel
   Arrcus, Inc.

   Email: keyur@arrcus.com

   Acee Lindem
   Cisco Systems
   301 Midenhall Way
   Cary, NC 27513
   United States of America

   Email: acee@cisco.com

   Shawn Zandi
   LinkedIn
   222 2nd Street
   San Francisco, CA 94105
   United States of America

   Email: szandi@linkedin.com

   Wim Henderickx
   Nokia
   Antwerp
   Belgium

   Email: wim.henderickx@nokia.com