Network Working Group                                              X. Xu
Internet-Draft                                               Alibaba Inc
Intended status: Standards Track                           K. Talaulikar
Expires: May 29, 2020                                      Cisco Systems
                                                                   K. Bi
                                                                  Huawei
                                                             J. Tantsura
                                                                  Apstra
                                                        N. Triantafillis
                                                     Amazon Web Services
                                                       November 26, 2019


                         BGP Neighbor Discovery
                 draft-xu-idr-neighbor-autodiscovery-12

Abstract

   BGP is being used as the underlay routing protocol in some
   large-scale data centers (DCs).  The most popular design is to
   configure hop-by-hop external BGP (EBGP) sessions between
   neighboring routers on a per-link basis.  The provisioning of BGP
   neighbors in routers across such a DC brings its own operational
   complexity.

   This document introduces a BGP neighbor discovery mechanism that
   greatly simplifies BGP operations in such DC and other networks by
   automatically setting up BGP sessions between neighboring routers.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 29, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Terminology
   3.  Applicability
   4.  Requirements
   5.  Overview
   6.  UDP Message Header
   7.  Hello Message Format
   8.  Hello Message TLVs
     8.1.  Accepted ASN List TLV
     8.2.  Peering Address TLV
     8.3.  Local Prefix TLV
     8.4.  Link Attributes TLV
     8.5.  Neighbor TLV
     8.6.  Cryptographic Authentication TLV
   9.  Neighbor Discovery Procedure
     9.1.  Interface Procedures
     9.2.  Adjacency State Machine
       9.2.1.  Down State
       9.2.2.  Initial State
       9.2.3.  1-Way State
       9.2.4.  2-Way State
       9.2.5.  Adj-Reject State
       9.2.6.  Adj-OK State
       9.2.7.  Accepted State
     9.3.  Adjacency Route
   10. Interactions with Base BGP Protocol
   11. Security Considerations
   12. Manageability Considerations
     12.1.  Operational Considerations
     12.2.  Management Considerations
   13. IANA Considerations
     13.1.  BGP Hello Message
     13.2.  TLVs of BGP Hello Message
   14. Acknowledgements
   15. Contributors
   16. References
     16.1.  Normative References
     16.2.  Informative References
   Authors' Addresses

1.  Introduction

   BGP is being used as the underlay routing protocol instead of
   link-state routing protocols like IS-IS and OSPF in some large-scale
   data centers (DCs).  [RFC7938] describes the design, configuration
   and operational aspects of using BGP in such networks.  The most
   popular design scheme involves the setup of external BGP (EBGP)
   sessions over individual links between directly connected routers
   using their interface addresses.  Such BGP neighbor provisioning
   requires configuration of the neighbor IP address and Autonomous
   System (AS) Number (ASN) for the BGP neighbor on every link of every
   BGP router.  As a DC fabric built on the topology described in
   [RFC7938] grows with the addition of new leafs, spines, and links
   between them, the BGP provisioning needs to be carefully updated.
   Unlike with the link-state protocols, in the case of BGP, there is
   no automatic discovery of neighbors and route exchange between them
   by simply adding links and nodes of the fabric into the routing
   protocol operation.

   In some DC designs with BGP, multiple links are added between a leaf
   and spine to add bandwidth.  Use of link aggregation at Layer 2 may
   not always be desirable in such cases due to the risk of flow
   polarization on account of a mix of ECMP at Layer 2 and Layer 3.
   In such cases, one option is for EBGP sessions to be set up between
   two BGP neighbors over each of the links between them.  In such a
   case, the BGP session scale and the resultant increase in update
   processing may pose scalability challenges.  A second option is for
   a single EBGP session to be set up between the loopback IP addresses
   of the neighbors, with static routes configured for loopback
   reachability over the underlying links.  This option introduces an
   additional provisioning task for the static routes.

   Furthermore, there is also a need for BGP to be able to describe its
   links and its neighbors on its directly connected links and export
   this information via BGP-LS [RFC7752] to provide a detailed
   link-level topology view of a data center running BGP.  The ability
   of BGP to discover its neighbors over its links, monitor their
   liveness, and learn the link attributes (such as addresses) is
   required for conveying the link-state topology in such a BGP
   network.  This information can be leveraged by the BGP-SPF proposal
   [I-D.ietf-lsvr-bgp-spf], which introduces link-state routing
   capabilities in BGP.
   This information can also be leveraged to convey the link-state
   topology in a network running traditional BGP routing using BGP-LS
   as described in [I-D.ketant-idr-bgp-ls-bgp-only-fabric] and to
   enable end-to-end traffic engineering use cases spanning DCs and
   the core/access networks.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Terminology

   This document makes use of the terms defined in [RFC4271] and
   [RFC7938].

3.  Applicability

   The applicability of the BGP Neighbor Discovery mechanism described
   in this document is limited to deployments where BGP is used as the
   routing protocol between directly connected routers and where there
   is a requirement for automatic setup of BGP peering between them:

   o  In DC networks where BGP is used as a hop-by-hop routing protocol
      [RFC7938].

   o  In metro networks where access aggregation topologies are
      architected as a Clos topology (or similar other networks) and
      BGP is used as a hop-by-hop routing protocol.

   While this document uses EBGP examples, the mechanism is equally
   applicable in designs that use IBGP similarly for hop-by-hop routing.

   The applicability of the BGP Neighbor Discovery mechanism to any
   other BGP protocol deployment is outside the scope of this document.

4.  Requirements

   This section describes the requirements for BGP hop-by-hop routing
   deployments that were considered for the definition of the BGP
   Neighbor Discovery extensions proposed in this document.

   Following are the key requirements for the BGP neighbor discovery
   process:

   1.  It should perform discovery of directly connected BGP routers.
       The mechanism should support either IPv4 or IPv6 or a dual-stack
       design, and it should be generic for any link layer.

   2.  It should include exchange of BGP peering addresses (IPv4 or IPv6
       or both) that routers can use to automatically set up BGP TCP
       peering between themselves.  The mechanism should leverage the
       existing capability negotiation process performed as part of the
       BGP TCP session establishment.

   3.  When BGP peering is to be performed over loopback addresses of
       the routers, the mechanism should automatically set up
       reachability to the loopback over one or more underlying
       directly connected links between them.  In this scenario, the
       mechanism should also provide resolution for the BGP next-hop
       address (i.e. the loopback address) for the BGP routes exchanged
       over these sessions between the loopback addresses.

   4.  The mechanism should enable exchange of link-level information
       such as IP addresses and link attributes between the directly
       connected BGP routers.  It should be extensible to include other
       information in the future.

   5.  The mechanism should be limited to link scope for security and
       use link-local addressing only.  Cryptographic mechanisms should
       also be provided for additional security.

   6.  The mechanism should support optional validation of parameters
       to detect misconfiguration (e.g. link address subnet mismatch,
       peering between incorrect ASes, etc.) in an extensible manner
       before the link is used and the BGP TCP peering session is set
       up over it.

   7.  The mechanism should not affect or change the BGP TCP session
       establishment procedures and the BGP routing exchange over the
       TCP session, other than the interactions for triggering the
       setup/removal of a peer session based on the discovery
       mechanism.

   8.  The mechanism should leverage existing fast-detection techniques
       for failures that are currently used for EBGP sessions over
       directly connected links, like fast-external-failover and BFD.

   9.  The mechanism should focus on the discovery process and exchange
       of status as a control plane procedure and be sufficiently
       loosely coupled with the base BGP operations to enable
       implementations to ensure scalability of BGP operations when
       using the discovery procedures.

5.  Overview

   At a high level, this specification introduces UDP-based BGP Hello
   messages that are exchanged between directly connected BGP routers
   for neighbor discovery.

   1.  Information is exchanged between BGP routers on a per-link basis,
       leading to discovery of each other's peering address and other
       information.

   2.  The TCP session establishment for the BGP protocol operation and
       the BGP routing exchange over these sessions can then follow
       without any change from the existing BGP protocol operations as
       specified in [RFC4271].

   3.  As part of the neighbor information exchange, the route to a
       neighbor's peering address is also automatically set up,
       pointing over the links over which the neighbor is discovered.

   4.  This route is used both for the BGP TCP session establishment
       and for resolution of the BGP next-hop (NH) for the routes
       learnt via the neighbor, instead of an underlying IGP or static
       route.

   This document prefers the use of an extension to the BGP protocol
   since the deployments and use cases targeted (i.e. large-scale DCs)
   are already running BGP as their routing protocol.
   Extending BGP with neighbor discovery capabilities is, operationally
   and implementation-wise, a simpler approach than requiring a new or
   additional protocol to first be extended with this functionality (to
   exchange BGP-specific parameters) and then integrated with BGP
   protocol operations.

   The BGP Neighbor Discovery mechanism is a control plane mechanism
   intended to discover and maintain the BGP router's adjacencies with
   its neighbors over directly connected links.  Maintaining an
   adjacency also involves detecting any changes in parameters using
   periodic messages and triggering corresponding actions based on the
   change.  Such actions also include removal of the BGP TCP peering
   for a peering session that was auto-discovered through neighbor
   discovery.  However, the mechanism is not intended for fast liveness
   detection of a neighbor; existing mechanisms for this purpose, such
   as BFD [RFC5880], may be leveraged.

   The BGP Neighbor Discovery mechanism is scoped to a link and works
   using link-local addressing.  In a BGP DC network that uses IPv6 in
   the fabric underlay, it is possible that no IPv6 global addresses
   are assigned to the interfaces between the nodes and that IPv6
   global addresses are assigned only to the loopback interfaces of
   these nodes.  The Neighbor Discovery mechanism enables the setup of
   BGP peering using the IPv6 global addresses on the loopback
   interfaces and hop-by-hop routing with just IPv6 link-local
   addresses on the interfaces.  Such a design eases the introduction
   of nodes in the fabric and links between them from a provisioning
   aspect.  In a deployment with IPv4 addressing, IP unnumbered could
   similarly be used for all the links between the nodes using the
   IPv4 address assigned to the loopback interfaces on those nodes.

   The BGP neighbor discovery mechanism defined in this document
   borrows ideas from the Label Distribution Protocol (LDP) [RFC5036].
   However, only the concept of link-local signaling based neighbor
   discovery is borrowed; the discovery aspect for targeted LDP
   sessions does not apply to this BGP neighbor discovery mechanism.

   The following sections first describe the newly introduced message
   formats and TLVs and then describe the procedures of BGP neighbor
   discovery and its integration with the base BGP protocol mechanism
   as specified in [RFC4271].

   The operational and management aspects of the BGP neighbor discovery
   mechanism are described in Section 12.

6.  UDP Message Header

   The BGP neighbor discovery mechanism operates using UDP messages.
   The UDP port is TBD; 179 is the preferred port number to be
   assigned, as specified in Section 13, which is the same as TCP port
   179 used by BGP.  The BGP UDP message common header format is
   specified as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |     Type      |        Message Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           AS number                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        BGP Identifier                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 1: BGP UDP Message Header

   Version:  This 1-octet unsigned integer indicates the protocol
      version number of the message.  The current BGP version number is
      4.

   Type:  The type of BGP message.

   Message Length:  This 2-octet unsigned integer specifies the length
      in octets of the entire BGP UDP message including the header.

   AS number:  AS Number of the UDP message sender.

   BGP Identifier:  BGP Identifier of the UDP message sender.
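
   As a rough illustration of the header layout in Figure 1, the
   following Python sketch packs and parses the common header.  It is
   not part of the specification: the names UdpHeader, parse_header and
   HELLO_TYPE are hypothetical, the Type value is a placeholder because
   the real value is TBD, and the 4-octet widths of the AS number and
   BGP Identifier fields are an assumption read off the figure.

```python
import struct
from dataclasses import dataclass

# Assumed wire layout (network byte order), following Figure 1:
# Version (1 octet), Type (1), Message Length (2), AS number (4),
# BGP Identifier (4). The 4-octet widths are an assumption.
_HEADER = struct.Struct("!BBHII")

BGP_VERSION = 4
HELLO_TYPE = 255  # hypothetical placeholder; the draft leaves this TBD


@dataclass(frozen=True)
class UdpHeader:
    version: int
    msg_type: int
    length: int   # Message Length field, carried as-is
    asn: int      # AS number of the sender
    bgp_id: int   # BGP Identifier of the sender

    def pack(self) -> bytes:
        """Serialize the header into its 12-octet wire form."""
        return _HEADER.pack(
            self.version, self.msg_type, self.length, self.asn, self.bgp_id
        )


def parse_header(data: bytes) -> UdpHeader:
    """Parse the common header from the start of a received datagram."""
    if len(data) < _HEADER.size:
        raise ValueError("truncated BGP UDP header")
    hdr = UdpHeader(*_HEADER.unpack_from(data))
    if hdr.version != BGP_VERSION:
        raise ValueError(f"unsupported version {hdr.version}")
    return hdr
```

   The sketch deliberately does not validate Message Length against the
   datagram size, so that the field is simply carried through to the
   message-specific processing.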

   BGP UDP messages can be sent using either IPv4 or IPv6 depending on
   the address used for session establishment and provisioned on the
   interfaces over which these messages are sent.

7.  Hello Message Format

   A BGP router uses UDP-based Hello messages to discover directly
   connected BGP neighbors over those interfaces enabled for Neighbor
   Discovery.  The BGP Hello messages for the Neighbor Discovery
   procedure are used for link-local signaling and hence MUST be
   addressed to the "all routers on this subnet" group multicast address
   (i.e., 224.0.0.2 in the IPv4 case and FF02::2 in the IPv6 case) and
   the TTL for the IP packets SHOULD be set to 1.  The IP source address
   MUST be set to the address of the interface over which the message is
   sent out, which would be the primary interface address or unnumbered
   address in the IPv4 case and the IPv6 link-local address on the
   interface in the IPv6 case.

   The Hello message format is as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |     Type      |        Message Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           AS number                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        BGP Identifier                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Adjacency Hold Time      |     Flags     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             TLVs                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 2: BGP Hello Message

   Version:  This 1-octet unsigned integer indicates the protocol
      version number of the message.  The current BGP version number is
      4.

   Type:  The type of BGP message (Hello - TBD value from BGP Message
      Types Registry).

   Message Length:  This 2-octet unsigned integer specifies the length
      in octets of the TLVs field.

   AS number:  AS Number of the BGP router sending the Hello message.

   BGP Identifier:  BGP Identifier of the BGP router sending the Hello
      message.

   Adjacency Hold Time:  Hello adjacency hold timer in seconds.
      Adjacency Hold Time specifies the time for which the receiving
      BGP neighbor router SHOULD maintain adjacency state for the
      sender without receipt of another Hello.  A value of 0 means that
      the receiving BGP peer should immediately mark the adjacency to
      the sender as going down.

   Flags:  Currently defined bits are as follows.  All other bits
      SHOULD be cleared by sender and MUST be ignored by receiver.

          0 1 2 3 4 5 6 7
         +-+-+-+-+-+-+-+-+
         |S|             |
         +-+-+-+-+-+-+-+-+

      where:

         S bit - indicates that this is a State Change Hello message
         when SET and a normal periodic Hello message when CLEAR

   Reserved:  SHOULD be set to 0 by sender and MUST be ignored by
      receiver.

   TLVs:  This field contains one or more TLVs as described below.

   BGP Hello messages can be sent using either IPv4 or IPv6 addresses
   depending on the addressing used for session establishment and
   provisioned on the interfaces over which these messages are sent.
   When both IPv4 and IPv6 are enabled on the interface, the IPv6
   address SHOULD be used.  Implementations MAY provide an option to
   override the choice of address family to be used.  The choice of
   address family to be used MUST be consistent on all BGP routers on a
   given link for neighbor discovery.

   Based on the setting of the S flag, there are two variants of the
   Hello message:

   1.  State Change Hello Message: these Hello messages include TLVs
       which convey the state and parameters of the local interface and
       adjacency to other routers on the link.
       They are generated only when there is a change in state of the
       adjacency or some parameter at the interface level.

   2.  Periodic Hello Message: these are the normal periodic Hello
       messages which do not include TLVs and are used to maintain the
       adjacency on the link during steady state conditions.

   These Hello message variants are intended to limit the exchange of
   information and state via TLVs to only those periods where necessary
   while using lightweight Hello messages during steady state.  This
   simplifies the Hello message processing and improves scalability of
   the discovery mechanism.

   The neighbor discovery procedure using the Hello message is described
   in Section 9 and its relation with the BGP Keepalives and Hold Timer
   for the TCP session is described in Section 10.

8.  Hello Message TLVs

   The BGP Hello message carries TLVs as described in this section that
   enable exchange of information on a per-interface basis between
   directly connected BGP neighbors.  These messages enable the neighbor
   discovery process.

8.1.  Accepted ASN List TLV

   The Accepted ASN List TLV is an optional TLV that is used to signal
   an unordered list of AS numbers from which the BGP router would
   accept BGP sessions.  When not signaled, it indicates that the router
   will accept BGP peering from any ASN from its neighbors.  Indicating
   the list of ASNs helps avoid the neighbor discovery process getting
   stuck in a 1-way state where one side keeps attempting to set up an
   adjacency while the other does not accept it due to an incorrect ASN.

   The operational and management aspects of this ASN based policy
   control for BGP neighbor discovery are described further in
   Section 12.

   This TLV SHOULD NOT be included in a Hello message with the S bit
   CLEAR.  More than a single instance of this TLV MUST NOT be included
   in a Hello message.
   If a router receives multiple instances of this TLV, then it should
   only consider the first instance in the sequence and ignore the
   rest.

   The format of this TLV is shown below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Accepted ASN List(variable)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 3: Accepted ASN List TLV

   Type:  TBD1

   Length:  Specifies the length of the Value field in octets (in
      multiples of 4).

   Accepted ASN List:  This variable-length field contains one or more
      accepted 4-octet ASNs.

8.2.  Peering Address TLV

   The Peering Address TLV is used to indicate to the neighbor the
   address to be used for setting up the BGP TCP session.  Along with
   the peering address, the router can specify its supported
   AFI/SAFI(s).  When the AFI/SAFI values are specified as 0/0, it
   indicates that the neighbor can attempt negotiation of any
   AFI/SAFIs.  The indication of AFI/SAFI(s) in the Peering Address TLV
   is not intended as an alternative to the MP capabilities negotiation
   mechanism done as part of the BGP TCP session establishment.

   Multiple instances of this TLV MAY be included in the Hello message,
   one for each peering address (e.g. IPv4 and IPv6 or multiple IPv4
   addresses for different AFI/SAFI sessions).  When multiple peering
   addresses are provisioned, the indication helps the router select
   the appropriate peer address of the neighbor based on its local
   peering address profile by matching the supported AFI/SAFIs.

   This TLV is essential for the setting up of the TCP peering between
   BGP neighbors using the neighbor discovery mechanism.
   When a BGP router stops including a Peering Address in its State
   Change Hello messages, it is no longer accepting TCP peering
   sessions to that address, and the neighbor SHOULD clean up any
   peering session that was set up to that address via the discovery
   mechanism.

   Implementations SHOULD support the signaling of an interface IP
   address in the Peering Address TLV and perform the BGP TCP session
   establishment using interface addresses (i.e. the neighbor discovery
   mechanism is not limited to the use of loopback addresses for the
   peering session establishment).  Implementations MAY support the
   signaling of IPv6 link-local addresses using the Peering Address TLV
   and using the same for the BGP TCP session setup.

   This TLV SHOULD NOT be included in a Hello message with the S bit
   CLEAR.

   The Peering Address TLV format is shown below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Flags     |  No. AFI/SAFI |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Address (4-octet or 16-octet)                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             AFI               |     SAFI      |     ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           sub-TLVs ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 4: Peering Address TLV

   Type:  TBD2

   Length:  Specifies the length of the Value field in octets.

   Flags:  Currently defined bits are as follows.  All other bits
      SHOULD be cleared by sender and MUST be ignored by receiver.

          0 1 2 3 4 5 6 7
         +-+-+-+-+-+-+-+-+
         |A|             |
         +-+-+-+-+-+-+-+-+

      where:

         A bit - address is IPv6 when SET and IPv4 when CLEAR

   Number of AFI/SAFI:  indicates the number of AFI/SAFI pairs that
      the router supports on the given peering address.

   Reserved:  sender SHOULD set to 0 and receiver MUST ignore.

   Address:  This 4 or 16 octet field indicates the IPv4 or IPv6
      address which is used for establishing BGP sessions.

   AFI/SAFI:  one or more pairs of these values that indicate the
      supported capabilities on the peering address.

   Sub-TLVs:  optional and currently none defined

8.3.  Local Prefix TLV

   The BGP neighbor discovery mechanism, in certain scenarios, requires
   a BGP router to program a route in its local routing table for a
   prefix belonging to its neighbor router.  One such scenario is when
   the BGP TCP peering is to be set up between the loopback addresses
   on the neighboring routers.  This requires that the routers have
   reachability to each other's loopback addresses before the TCP
   session can be brought up.

   The Local Prefix TLV is an optional TLV which enables a BGP router to
   explicitly signal its local prefix to its neighbor for setting up
   such a local routing entry pointing over the underlying link over
   which it is being signaled.  This enables the BGP router to have
   control over the specific links over which its neighbor may reach it
   for the specific local prefix.  The details of the procedure for
   programming the route corresponding to the prefix signaled using the
   Local Prefix TLV are described in Section 9.3.

   Multiple instances of the Local Prefix TLV MAY be included in the
   Hello message, each carrying a specific prefix.  This TLV SHOULD NOT
   be included in a Hello message with the S bit CLEAR.

   The Local Prefix TLV format is as shown below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Flags     | Prefix Length |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Prefix Address (4-octet or 16-octet)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           sub-TLVs ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 5: Local Prefix TLV

   Type:  TBD3

   Length:  Specifies the length of the Value field in octets

   Flags:  Currently defined bits are as follows.  All other bits
      SHOULD be cleared by sender and MUST be ignored by receiver.

          0 1 2 3 4 5 6 7
         +-+-+-+-+-+-+-+-+
         |A|             |
         +-+-+-+-+-+-+-+-+

      where:

         A bit - address is IPv6 when SET and IPv4 when CLEAR

   Prefix Length:  specifies the Prefix length

   Reserved:  sender SHOULD set to 0 and receiver MUST ignore.

   Prefix Address:  This 4 or 16 octet field indicates the IPv4 or
      IPv6 prefix address.

   Sub-TLVs:  optional and currently none defined

8.4.  Link Attributes TLV

   The Link Attributes TLV is a mandatory TLV in a State Change Hello
   message that signals to the neighbor the link attributes of the
   interface on the local router.  One and only one instance of this TLV
   MUST be included in the State Change Hello message.  A State Change
   Hello message without this TLV included MUST be discarded and an
   error logged for the same.

   This TLV enables a BGP router to learn all its neighbors' IP
   addresses on the specific link as well as its link identifier.  When
   the interface is IPv4 enabled, all the IPv4 addresses configured on
   it are included in this TLV.  An IPv4 unnumbered address is not
   included in this TLV and no IPv4 address would be included for the
   interface in such cases.
When the interface is IPv6 enabled, all the IPv6 global 672 addresses configured on the interface are included in this TLV. IPv6 673 link-local addresses are not included in this TLV. In case of an 674 interface running dual stack, both IPv4 and IPv6 addresses are 675 included in this TLV irrespective of the address family that is used 676 for UDP message exchange. 678 Additional sub-TLVs may be defined in the future to exchange other 679 link attributes between BGP neighbors. This TLV SHOULD NOT be 680 included in a Hello message with the S bit CLEAR. 682 The Link Attributes TLV format is as shown below. 684 0 1 2 3 685 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 686 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 687 | Type | Length | 688 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 689 | Local Interface ID | Flags | Reserved | 690 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 691 | No. of IPv4 Addresses | No. of IPv6 Addresses | 692 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 694 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 695 | IPv4 Interface Address | 696 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 697 | Prefix Mask | ... 698 +-+-+-+-+-+-+-+-+ 700 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 701 | IPv6 Global Interface Address | 702 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 703 | Prefix Mask | ... 704 +-+-+-+-+-+-+-+-+ 706 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 707 | sub-TLVs ... 708 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 710 Figure 6: Link Attributes TLV 712 Type: TBD4 713 Length: Specifies the length of the Value field in octets 715 Local Interface ID : the local interface ID of the interface 716 (refer unnumbered link section of [RFC2104] e.g. the MIB-2 717 ifIndex). 
This helps uniquely identify the link even when there 718 are multiple links between two neighbors using IPv4 unnumbered 719 address or only having IPv6 link-local addresses. 721 Flags : Currently defined bits are as follows. Other bits SHOULD 722 be cleared by sender and MUST be ignored by receiver. 724 0 1 2 3 4 5 6 7 725 +-+-+-+-+-+-+-+-+ 726 |I|V|B| | 727 +-+-+-+-+-+-+-+-+ 729 where: 731 I bit - indicates link is enabled for IPv4 733 V bit - indicates link is enabled for IPv6 735 B bit - indicates support for BFD monitoring [RFC5880] over the 736 link 738 Reserved: SHOULD be set to 0 by sender and MUST be ignored by 739 receiver. 741 No. of IPv4 Addresses : specifies the number of IPv4 addresses on 742 the interface. When value is 0, then it indicates no IPv4 743 Prefixes are present or the interface is IPv4 unnumbered if it is 744 enabled for IPv4 746 No. of IPv6 Addresses : specifies the number of IPv6 global 747 addresses on the interface. When value is 0, then it indicates no 748 IPv6 Global Prefixes are present and the interface is only 749 configured with IPv6 link-local addresses if it is enabled for 750 IPv6. 752 IPv4 Address & Mask: Zero or more pairs of IPv4 address and their 753 mask. 755 IPv6 Address & Mask: Zero or more pairs of IPv6 address and their 756 mask. 758 Sub-TLVs : optional and currently none defined 760 8.5. Neighbor TLV 762 The Neighbor TLV is used by a BGP router to indicate its Hello 763 adjacency state with its neighboring router(s) on the specific link. 764 The neighbor is identified by its AS Number and BGP Identifier. The 765 router MUST include the Neighbor TLV for each of its discovered 766 neighbors on that link irrespective of its status. 768 The usage of the Neighbor TLV is described in detail in Section 9. 769 This TLV SHOULD NOT be included in a Hello message with the S bit 770 CLEAR. 772 The Neighbor TLV format is as shown below. 

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Flags     |     State     |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Neighbor AS number                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Neighbor BGP Identifier                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          sub-TLVs ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                         Figure 7: Neighbor TLV

      Type: TBD5

      Length: Specifies the length of the Value field in octets

      Flags: Currently defined bits are as follows.  All other bits
      SHOULD be cleared by sender and MUST be ignored by receiver.

          0 1 2 3 4 5 6 7
         +-+-+-+-+-+-+-+-+
         |B|             |
         +-+-+-+-+-+-+-+-+

      where:

         B bit - When SET while the adjacency is not in the Accepted
         state, indicates that the adjacency is not accepted because BFD
         is down.

      State: Indicates the state code of the adjacency state machine
      (refer to Section 9.2 for details) for the neighbor over this
      link.  The following codes are currently defined:

         0 - Down (not to be used as state in this TLV)

         1 - Initial (not to be used as state in this TLV)

         2 - 1-way

         3 - 2-way

         4 - Adj-Reject

         5 - Adj-OK

         6 - Accepted

      Reserved: SHOULD be set to 0 by sender and MUST be ignored by
      receiver.

      Neighbor AS number: AS Number of the neighbor BGP router as
      signaled in its Hello message.

      Neighbor BGP Identifier: BGP Identifier of the neighbor BGP router
      as signaled in its Hello message.

      Sub-TLVs: currently none defined

8.6.  Cryptographic Authentication TLV

   The Cryptographic Authentication TLV is an optional TLV that is used
   as part of an authentication mechanism for BGP Hello messages to
   protect against spoofing attacks.  It also introduces a cryptographic
   sequence number carried in the Hello messages that can be used to
   protect against replay attacks.  To use the Cryptographic
   Authentication TLV, one or more secret keys (with corresponding
   Security Association (SA) IDs) are configured on each BGP router.
   For each BGP Hello message, the key is used to generate and verify an
   HMAC hash that is stored in the Cryptographic Authentication TLV.
   For the cryptographic hash function, this document proposes to use
   SHA-1, SHA-256, SHA-384, and SHA-512 defined in the US NIST Secure
   Hash Standard (SHS) [FIPS-180-4].  The HMAC authentication mode
   defined in [RFC2104] is used.  Of the above, implementations MUST
   include support for at least HMAC-SHA-256, SHOULD include support for
   HMAC-SHA-1, and MAY include support for HMAC-SHA-384 and HMAC-SHA-512.

   Further details for ensuring the security of the BGP Hello UDP
   messages are described in Section 11.

   The Cryptographic Authentication TLV format is as shown below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Security Association ID                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Cryptographic Sequence Number (High-Order 32 Bits)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Cryptographic Sequence Number (Low-Order 32 Bits)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Authentication Data (Variable)                //
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 8: Cryptographic Authentication TLV

      Type: TBD6

      Length: Specifies the length of the Value field in octets

      Security Association ID: The 32-bit field that maps to the
      authentication algorithm and the secret key used to create the
      message digest carried in the Hello message payload.

      Cryptographic Sequence Number: The 64-bit, strictly increasing
      sequence number that is used to guard against replay attacks.  The
      64-bit sequence number MUST be incremented for every BGP Hello
      message sent by the BGP router.  Upon reception, the sequence
      number MUST be greater than the sequence number in the last BGP
      Hello message accepted from the sending BGP neighbor.  Otherwise,
      the BGP Hello message is considered a replayed packet and is
      dropped.  The Cryptographic Sequence Number is a single space per
      BGP router.

      Authentication Data: This field carries the digest computed by the
      Cryptographic Authentication algorithm in use.  The length of the
      Authentication Data varies based on the cryptographic algorithm in
      use, as shown below:

         HMAC-SHA-1     20 bytes

         HMAC-SHA-256   32 bytes

         HMAC-SHA-384   48 bytes

         HMAC-SHA-512   64 bytes

9.  Neighbor Discovery Procedure

   The neighbor discovery mechanism in BGP is implemented with the
   introduction of an Interface state in BGP and an Adjacency Finite
   State Machine (FSM).  This section describes the states, FSM and
   procedures involved.

9.1.  Interface Procedures

   In order to perform neighbor discovery, BGP needs to maintain state
   for the subset of its connected interfaces over which neighbor
   discovery is enabled.  On these interfaces, BGP sends its Hello
   messages, including the TLVs described in Section 8, as long as the
   link is UP.  The Neighbor TLV described in Section 8.5 is included
   once a neighbor is discovered as described in Section 9.2.

   The Hello messages MUST be originated periodically at an interval
   which is less than or equal to one third of the Adjacency Hold Time
   indicated by the router in its Hello message.  The RECOMMENDED
   default value for the Adjacency Hold Time is 45 seconds, which makes
   the Hello message interval 15 seconds.  Periodic Hello messages
   ensure robustness of the neighbor discovery mechanism against
   transient loss of Hello messages, which are sent over the unreliable
   UDP messaging channel, and also enable detection of neighbor down
   events over specific links.  Periodic Hello messages that do not
   convey any change in state SHOULD exclude TLVs that signal the local
   interface or adjacency state and SHOULD have the S bit CLEAR as
   specified in Section 7.

   A State Change Hello message MUST be triggered, without waiting for
   the periodic timer expiry, whenever there is a change in the content
   of the router's Hello TLVs that needs to be signaled to its neighbor
   over the specific link.  A State Change Hello message MUST also be
   triggered when a new neighbor's Hello message is first received or
   when a change is detected in the neighbor's Hello TLVs that results
   in a change in its adjacency state.  Once a State Change Hello
   message is triggered on a specific interface, the router MUST
   continue to generate State Change Hello messages on it, with the
   necessary TLVs included, at periodic Hello message intervals for a
   period of time that is at least equal to the Adjacency Hold Time.
   This ensures that messages carrying the updated information and local
   state changes are not lost.  The router can switch back to Periodic
   Hello messages after it has transmitted State Change Hello messages
   with the latest TLV contents for the Adjacency Hold Time period.

   When a router receives a Hello message from its neighbor, it MUST
   restart the Adjacency Hold timer that it is maintaining for the
   neighbor adjacency using the value indicated in the Hello message.
   When the message is of type State Change (i.e. with S bit SET), it
   additionally needs to process all the TLVs included and verify the
   signaled state against what was conveyed in the previous State Change
   Hello message from the same neighbor.  Any change identified triggers
   the adjacency FSM change as described in Section 9.2.

   When a router does not receive a Hello message from its neighbor for
   a period equal to the Adjacency Hold Time, it MUST treat this as an
   adjacency down event and clean up its adjacency state to this
   neighbor as described in Section 9.2.

   Before the interface is shut or the neighbor discovery mechanism is
   disabled on it, the router SHOULD attempt to send out immediate Hello
   messages, with the S bit CLEAR (i.e. not including state related
   TLVs) and with Adjacency Hold Time set to 0, to trigger the adjacency
   down event on its neighbors.  It MUST then clean up its own adjacency
   states on that specific link.
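
   The timer rules above can be sketched as follows.  This is only an
   illustrative sketch and not part of the specification: the names
   hello_interval and AdjacencyHoldTimer are hypothetical, and the
   current time is passed in explicitly so that the logic can be
   exercised without a real clock.

```python
from dataclasses import dataclass

DEFAULT_HOLD_TIME = 45  # seconds; RECOMMENDED default Adjacency Hold Time


def hello_interval(hold_time: int = DEFAULT_HOLD_TIME) -> float:
    """Largest permitted Hello interval: one third of the hold time."""
    return hold_time / 3


@dataclass
class AdjacencyHoldTimer:
    """Hypothetical per-neighbor hold timer kept by the receiving router."""

    expires_at: float = 0.0

    def on_hello(self, now: float, hold_time: int) -> None:
        # Every received Hello restarts the timer with the value the
        # neighbor advertised in that Hello.
        self.expires_at = now + hold_time

    def expired(self, now: float) -> bool:
        # No Hello for a full Adjacency Hold Time is an adjacency down
        # event.  A Hello advertising hold time 0 expires at once.
        return now >= self.expires_at
```

   With the default hold time, hello_interval() yields 15 seconds.  A
   Hello carrying Adjacency Hold Time 0 makes expired() return true
   immediately, which matches its use as an explicit adjacency down
   signal.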

   When either the BGP Identifier or the AS number is modified, the
   router MUST send out a triggered Hello message, with the S bit CLEAR
   and with Adjacency Hold Time set to 0, using the old BGP Identifier
   and AS number values, over all the links enabled for BGP neighbor
   discovery.

   A router receiving a Hello message with Adjacency Hold Time set to 0
   MUST treat this event as if the adjacency hold timer has expired for
   the specific neighbor and proceed to bring down the adjacency.

   An interface going down (e.g. due to link failure or loss of signal)
   MUST immediately trigger the adjacency down event for all adjacencies
   over it as if the adjacency hold timer expired for all neighbors on
   that link.

9.2.  Adjacency State Machine

   On a per interface basis, BGP needs to maintain an adjacency state
   for each neighbor that it discovers.  The adjacency state is
   maintained as an FSM and it has the states described in the following
   sections.

9.2.1.  Down State

   This is the transient terminal state after which an adjacency is
   deleted.

   When transitioning to the Down state from Accepted, the router
   removes the path corresponding to this adjacency from any Adjacency
   Route that it had set up to the neighbor's prefixes.  If no other
   adjacency exists in Accepted state to the neighbor, then it also
   deletes the BGP TCP peering session(s) set up to the neighbor based
   on the neighbor discovery mechanism.

9.2.2.  Initial State

   This is the transient initial state from which an adjacency starts
   when the router detects a Hello message from a new neighbor on the
   link.  The adjacency immediately transitions to the 1-way state.

9.2.3.  1-Way State

   While in the 1-way state (or when entering it), the adjacency
   transitions from 1-way to 2-way state when the router detects a
   Neighbor TLV corresponding to itself in the neighbor's Hello message.
   If the state does not immediately transition on to 2-way after
   entering 1-way, the router MUST immediately trigger a State Change
   Hello message that includes the neighbor in a Neighbor TLV with the
   state set to 1-way.

   When transitioning to the 1-way state from Accepted, the router
   removes the path corresponding to this adjacency from any Adjacency
   Route that it had set up to the neighbor's prefixes.  If no other
   adjacency exists in Accepted state to the neighbor, then it also
   deletes the BGP TCP peering session(s) set up to the neighbor based
   on the neighbor discovery mechanism.

   The adjacency transitions to Down state for any of the following
   events:

   o  Link goes down operationally or is administratively shut

   o  Adjacency Hold Timer expires

   o  Router receives a Hello message from its neighbor with Adjacency
      Hold Time value set to 0

   o  Neighbor discovery is disabled on the link

   o  Change in BGP Identifier or AS number on the local router

9.2.4.  2-Way State

   Upon transitioning into this state, the router triggers a State
   Change Hello message with the neighbor's status set to 2-way in the
   Neighbor TLV.  At this stage, both neighbors have received each
   other's Hello messages and thus discovered each other.

   When the router, in this adjacency state, detects that the neighbor's
   state for itself is 2-way or higher, it performs the validation
   checks based on local policy and the information exchanged in the
   Hello TLVs.  The following are some of the validation checks that may
   be performed on the adjacency:

   o  Verify subnet matching between the local and remote interface
      addresses.

   o  Verify AS numbers based on local policy as well as against the
      Allowed ASN TLV when one is being exchanged.

   o  Verify that BFD monitoring (when enabled) is indicating UP state.
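
   The three example checks listed above could be combined as in the
   sketch below.  It is illustrative only: the function names, the
   argument shapes and the decision to require all three checks are
   assumptions made for this example, since the actual set of checks is
   a matter of local policy.  The sketch also assumes numbered
   interfaces, so it does not cover IPv4 unnumbered or IPv6 link-local
   only links.

```python
import ipaddress
from typing import Optional, Sequence, Set


def same_subnet(local_if: str, remote_if: str) -> bool:
    """True when two interface addresses such as '10.0.0.1/31' share a subnet."""
    local = ipaddress.ip_interface(local_if)
    remote = ipaddress.ip_interface(remote_if)
    return local.network == remote.network


def validate_adjacency(
    local_addrs: Sequence[str],
    remote_addrs: Sequence[str],
    neighbor_asn: int,
    allowed_asns: Optional[Set[int]] = None,  # None: no ASN restriction (assumption)
    bfd_enabled: bool = False,
    bfd_up: bool = False,
) -> bool:
    """Hypothetical helper: True when the adjacency passes all three checks."""
    # Check 1: at least one local/remote address pair is in the same subnet.
    subnet_ok = any(
        same_subnet(l, r) for l in local_addrs for r in remote_addrs
    )
    # Check 2: the neighbor's AS number is permitted by local policy.
    asn_ok = allowed_asns is None or neighbor_asn in allowed_asns
    # Check 3: BFD must report UP, but only when BFD monitoring is enabled.
    bfd_ok = (not bfd_enabled) or bfd_up
    return subnet_ok and asn_ok and bfd_ok
```

   A true result corresponds to the adjacency passing validation, and a
   false result to it failing validation.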

   When the adjacency passes the validation checks, it transitions to
   the Adj-OK state; otherwise, it transitions to the Adj-Reject state.

   The adjacency transitions to Down state for any of the adjacency down
   events described in Section 9.2.3.

   The adjacency transitions to 1-way state when the router stops seeing
   itself in a Neighbor TLV of its neighbor's State Change Hello
   messages.

9.2.5.  Adj-Reject State

   Upon transitioning into this state, the router triggers a State
   Change Hello message with the neighbor's status set to Adj-Reject in
   the Neighbor TLV.

   The adjacency remains in the Adj-Reject state as long as the
   parameters being exchanged via the State Change Hello messages do not
   pass the validation checks.  The neighbors continue to include each
   other in their respective State Change Hello messages.

   The adjacency transitions to the Adj-OK state once the validation
   checks pass (e.g. due to an update in any parameters or local
   policy).

   The adjacency transitions to Down state for any of the adjacency down
   events described in Section 9.2.3.

   The adjacency transitions to 1-way state when the router stops seeing
   itself in a Neighbor TLV of its neighbor's State Change Hello
   messages.

   When transitioning to the Adj-Reject state from Accepted state, the
   router removes the path corresponding to this adjacency from any
   Adjacency Route that it had set up to the neighbor's prefixes.  If no
   other adjacency exists in Accepted state to the neighbor, then it
   also deletes the BGP TCP peering session(s) set up to the neighbor
   based on the neighbor discovery mechanism.

9.2.6.  Adj-OK State

   Upon transitioning into this state, the router triggers a State
   Change Hello message with the neighbor's status set to Adj-OK in the
   Neighbor TLV.

   The adjacency transition to Adj-OK state indicates that the router
   has accepted its neighbor.  However, it is possible that the neighbor
   has not accepted it and is signaling Adj-Reject state for the
   adjacency from its end.

   The adjacency transitions to the Accepted state from Adj-OK once the
   router detects that its neighbor is also signaling the Adj-OK or
   Accepted state for it.

   The adjacency transitions to Down state for any of the adjacency down
   events described in Section 9.2.3.

   The adjacency transitions to 1-way state when the router stops seeing
   itself in a Neighbor TLV of its neighbor's State Change Hello
   messages.

   The adjacency transitions to Adj-Reject state when any of the
   validation checks listed in Section 9.2.4 fail.

   When transitioning to the Adj-OK state from Accepted state, the
   router removes the path corresponding to this adjacency from any
   Adjacency Route that it had set up to the neighbor's prefixes.  If no
   other adjacency exists in Accepted state to the neighbor, then it
   also deletes the BGP TCP peering session(s) set up to the neighbor
   based on the neighbor discovery mechanism.

9.2.7.  Accepted State

   The adjacency transition to Accepted state indicates that both the
   neighboring routers have accepted the adjacency to each other.

   On this transition, the router triggers a State Change Hello message
   with the neighbor's status set to Accepted in the Neighbor TLV.  It
   then installs the Adjacency Route(s) for the prefix(es) signaled by
   the neighbor in the Local Prefix TLV, pointing over this adjacency
   link and using the neighbor's address on that link.  If this is the
   first Accepted adjacency to the neighbor, then the Adjacency Route is
   added to the local routing table; otherwise, an additional path
   corresponding to this adjacency link and the neighbor address on it
   is added to the existing Adjacency Route.  The details are described
   in Section 9.3.
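
   The path bookkeeping described above (add the route on the first
   Accepted adjacency, add a path on later ones, remove a path when an
   adjacency leaves Accepted) can be sketched as follows.  The class
   AdjacencyRouteTable and its methods are hypothetical names used only
   for illustration; a real implementation would program the local
   routing table instead of a dictionary.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# A path is (local interface name, neighbor address on that link).
Path = Tuple[str, str]


class AdjacencyRouteTable:
    """Hypothetical table of Adjacency Routes keyed by neighbor prefix."""

    def __init__(self) -> None:
        self._routes: Dict[str, Set[Path]] = defaultdict(set)

    def add_path(self, prefix: str, interface: str, next_hop: str) -> bool:
        """Add a path; return True when this created the route itself."""
        created = prefix not in self._routes
        self._routes[prefix].add((interface, next_hop))
        return created

    def remove_path(self, prefix: str, interface: str, next_hop: str) -> bool:
        """Remove a path; return True when the route lost its last path."""
        paths = self._routes.get(prefix)
        if not paths:
            return False
        paths.discard((interface, next_hop))
        if not paths:
            # Last Accepted adjacency is gone, so the route is deleted.
            del self._routes[prefix]
            return True
        return False

    def paths(self, prefix: str) -> Set[Path]:
        """Return a copy of the current paths for a prefix."""
        return set(self._routes.get(prefix, ()))
```

   The return value of remove_path is the point at which, in this
   sketch, a caller would also tear down the BGP TCP peering session
   that was set up through neighbor discovery.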

   When this is the first Accepted adjacency to the neighbor, the setup
   of the BGP TCP session to the Peering Address(es) signaled by the
   neighbor is also triggered.

   The adjacency transitions to Down state for any of the adjacency down
   events described in Section 9.2.3.

   The adjacency transitions to 1-way state when the router stops seeing
   itself in a Neighbor TLV of its neighbor's State Change Hello
   messages.

   The adjacency transitions to Adj-Reject state when any of the
   validation checks listed in Section 9.2.4 fail.

9.3.  Adjacency Route

   The Adjacency Route programming is an optional part of the BGP
   Neighbor Discovery mechanism for setting up reachability for the
   neighbor's prefixes signaled via the Local Prefix TLV corresponding
   to adjacencies in Accepted state.

   Adjacency Routes establish reachability between local prefixes on
   directly connected BGP routers.  They enable reachability between the
   Peering Addresses (generally loopbacks) of the two neighbors so that
   the BGP TCP session may come up between them.  Then, for the BGP
   routes learnt over the TCP session, where the next-hop is the
   neighbor, they also provide the BGP next-hop resolution.

   Unlike other BGP routes, these are not recursive routes: they point
   directly to the neighbor's interface and IP address.  These routes,
   which are set up as part of the neighbor discovery procedure, are
   hence different from the regular IBGP and EBGP routes.  These routes
   also MUST have a better administrative distance than the IBGP and
   EBGP routes to ensure that they do not get displaced from forwarding
   by BGP routes learnt over the very session(s) established using these
   peering routes.

   The Adjacency Routes SHOULD NOT be stored in any of the BGP RIBs
   [RFC4271] since they are not computed based on the BGP decision
   process.  It is RECOMMENDED that these routes be managed in a
   separate routing table within the BGP Neighbor Discovery function to
   ensure that none of the processing and validation for the BGP RIB
   affects them and that they in turn do not influence the BGP decision
   process and route calculation.

   When there are multiple interconnecting links between two BGP
   neighbors, a single BGP TCP session may be set up between them over
   which routes are then exchanged.  However, in forwarding, the
   Adjacency Route will have multiple paths, one for each of these
   interconnecting links.  The BGP routes learnt over the session are
   therefore resolved over this Adjacency Route and in turn get ECMP
   load balancing even with a single BGP session.

10.  Interactions with Base BGP Protocol

   The BGP Finite State Machine (FSM) as specified in [RFC4271] is
   unchanged, and the BGP TCP session establishment, route updates and
   processing continue to follow the BGP protocol specifications.

   BGP peering addresses along with their respective ASNs have
   traditionally been explicitly provisioned on both BGP neighbors.  The
   difference that the neighbor discovery mechanism brings about is the
   elimination of this configuration, as these parameters are learnt via
   the neighbor discovery procedure.  Once a BGP router learns its
   neighbor's peering address and ASN, it initializes the BGP Peer FSM
   for this neighbor in the Idle State, just as if this neighbor was
   configured.  From there on, the BGP Peer FSM actions follow.

   The BGP Keepalives and Hold Timer for the session over TCP apply
   unchanged and they govern the operations of the BGP TCP session.
   While the BGP Keepalive works at the TCP session level, the BGP
   Adjacency Hold Timer monitors one or more underlying interconnecting
   link adjacencies between the neighbors.  The reachability for the BGP
   TCP session may also be over some BGP routes learnt via routing
   updates over the sessions set up via neighbor discovery.  It is
   likely that, even after all the underlying interconnecting link
   adjacencies between two neighbors are down, the neighbor's peering
   address is still reachable via BGP routing over some other path in
   the network.  In order to avoid this, it is RECOMMENDED that the BGP
   TCP sessions set up via the neighbor discovery mechanism use TTL set
   to 1 to ensure they are set up only over directly attached links to
   the neighbors.

   Since the BGP TCP session set up via neighbor discovery is meant for
   hop-by-hop routing, it is necessary, for faster convergence, to bring
   down the session even while its BGP Hold Timer has not expired.

   Therefore, when all the underlying link adjacencies between two BGP
   neighbors move out of the Accepted state (or go down), the BGP TCP
   peering session that was set up using the BGP Neighbor Discovery
   mechanism between these two neighbors is also deleted as if it was
   un-configured.

   Since the BGP neighbor discovery mechanism runs over a UDP socket, it
   is isolated from the core BGP protocol machinery, which is TCP based.
   Implementations SHOULD ensure that the Hello processing does not
   affect the base BGP operations and scalability.  One option may be to
   run the BGP neighbor discovery mechanism in a separate thread from
   the rest of BGP processing.  These implementation details, however,
   are outside the scope of this document.

   It is not generally expected that BGP sessions are explicitly
   provisioned along with the neighbor discovery mechanism.  However, in
   such an event, the neighbor discovery mechanism MUST NOT affect or
   result in any changes to provisioned BGP neighbors and their
   operations.  Specifically, BGP peering to auto-discovered neighbors
   MUST NOT be instantiated using the procedures described in this
   document when the same BGP neighbor is already provisioned.  The
   configured BGP neighbor parameters take precedence and the auto-
   discovered values and parameters are not used for such configured BGP
   sessions.

11.  Security Considerations

   BGP routers accept TCP connection attempts to port 179 only from the
   provisioned BGP neighbors or, in some implementations, those from
   within a configured address range.  With the BGP neighbor auto-
   discovery mechanism, it is now possible for BGP to automatically
   learn neighbors and initiate/receive TCP connections from them.  This
   introduces the need for specific considerations to ensure the
   security of the BGP protocol operations.

   This document introduces UDP messages in BGP for the neighbor
   discovery mechanism using the BGP Hello messages.  For security
   purposes, implementations MUST exchange the Hello messages only on
   interfaces specifically enabled for neighbor discovery.  Hello
   messages MUST NOT be accepted on addresses other than 224.0.0.2 or
   FF02::2.  Optionally, implementations MAY set TTL to 255 when
   originating the Hello messages, and receivers check specifically for
   the TTL to be 254 and discard the packet when this is not the case.
   This ensures that the Hello packet signaling happens between directly
   connected BGP routers only.

   The BGP neighbor discovery mechanism is expected to be run typically
   in DCs and between physically connected routers that are trustworthy.

   The Cryptographic Authentication TLV (as described in Section 8.6)
   SHOULD be used in deployments where this assumption of
   trustworthiness is not valid.  This mechanism is similar to the one
   defined for LDP Hello messages, which are also UDP based, as
   specified in [RFC7349].  An updated future version of this document
   will describe similar procedures for BGP Hello in more detail.

   Once the BGP Hello messages and the neighbor discovery mechanism are
   secured, the security considerations for BGP protocol operations
   apply for the auto-discovered neighbor sessions.

12.  Manageability Considerations

   This section is structured as recommended in [RFC5706].

12.1.  Operational Considerations

   The BGP neighbor discovery mechanism introduced by this document is
   not applicable to general BGP deployments as discussed in Section 3.
   The mechanism is specifically meant for networks where BGP is used as
   a hop-by-hop routing protocol, e.g. as described in [RFC7938].  The
   neighbor discovery mechanism hence SHOULD NOT be enabled by default
   in BGP.

   Implementations SHOULD provide configuration methods that allow
   enablement of BGP neighbor discovery on specific local interfaces.
   In a DC network, it is expected that the operator selects the
   appropriate links on which to enable this, e.g. on a Tier 2 node it
   is enabled on all links towards the Tier 1 and Tier 3 nodes while on
   a Tier 1 node, it may be enabled only on the links towards the Tier 2
   nodes.  The details of this enablement are outside the scope of this
   document since it varies based on the DC design and may be
   implementation specific.

   Implementations SHOULD provide configuration methods that enable the
   setup of BGP neighbor templates that allow the operator to set up BGP
   neighbor discovery parameters on the BGP router.  Some of the aspects
   to be considered in such a template are:

   o  Local address to be used for the BGP TCP session peering along
      with the local ASN and the AFI/SAFI enabled for the auto-
      discovered sessions

   o  BGP policies to be enabled for the auto-discovered sessions

   o  Optionally, the list of ASNs with which auto-discovered sessions
      should be brought up.  This ensures that links between nodes of
      different tiers are not used by BGP when they are connected
      wrongly by accident (e.g. a Tier 3 node is connected to a Tier 1
      node).

   o  Authentication methods that need to be enabled in an environment
      which is not secure

   o  Local interfaces over which the specific template needs to be
      applied for BGP neighbor discovery

   o  Other parameters like the Adjacency Hold Timer value to be used or
      other optional features

   This mechanism does not impose any restrictions on the way ASNs or
   addresses are assigned to the nodes.  Various automatic provisioning,
   auto-configuration or zero-touch-provisioning mechanisms may be used.

   Implementations SHOULD report the state of the BGP operations over
   each link enabled for neighbor discovery including the status of all
   adjacencies learnt over it.  Implementations SHOULD also report the
   operations of the auto-discovered BGP TCP peering sessions similar to
   the provisioned BGP neighbors.

   Implementations SHOULD support logging of events like discovery of an
   adjacency using neighbor discovery, including peering route updates,
   and events like triggering of BGP TCP session establishment for them.
   Errors and alarms related to loss of adjacencies and teardown of BGP
   TCP peering sessions SHOULD also be generated so they can be
   monitored.

12.2.  Management Considerations

   This document introduces UDP based messaging in the BGP protocol and
   therefore the necessary fault management mechanisms are required to
   be implemented for it.  Implementations MUST discard messages of
   unsupported types or of versions other than 4 received over a UDP
   session.  Such messages MUST NOT affect the neighbor discovery
   mechanism in operation using the Hello messages.  Unknown TLVs
   received via the Hello messages MUST be ignored and the rest of the
   Hello message MUST be processed.  Implementations SHOULD discard
   Hello messages with malformed TLVs and this should be logged as an
   error.

13.  IANA Considerations

   This document requests IANA for updates to the BGP Parameters
   registry as described in this section.

13.1.  BGP Hello Message

   This document requests IANA to allocate a new UDP port (179 is the
   preferred number) and a BGP message type code for the BGP Hello
   message.

      Service Name: BGP-HELLO
      Transport Protocol(s): UDP
      Assignee: IESG
      Contact: IETF Chair
      Description: BGP Hello Message.
      Reference: This document -- draft-xu-idr-neighbor-autodiscovery.
      Port Number: 179 (preferred value) -- To be assigned by IANA.

13.2.  TLVs of BGP Hello Message

   This document requests IANA to create a new registry "TLVs of BGP
   Hello Message" with the following registration procedure:

      Registry Name: TLVs of BGP Hello Message.

   Value         TLV Name                             Reference
   -----------   ----------------------------------   -------------
   0             Reserved                             This document
   1             Accepted ASN List                    This document
   2             Peering Address                      This document
   3             Local Prefix                         This document
   4             Link Attributes                      This document
   5             Neighbor                             This document
   6             Cryptographic Authentication         This document
   7-65500       Unassigned
   65501-65534   Experimental                         This document
   65535         Reserved                             This document

14.  Acknowledgements

   The authors would like to thank Enke Chen, Krishna Swamy and Ramesh
   Yakkala for their valuable comments and suggestions on this document.

15.  Contributors

   Satya Mohanty
   Cisco
   Email: satyamoh@cisco.com

   Shunwan Zhuang
   Huawei
   Email: zhuangshunwan@huawei.com

   Chao Huang
   Alibaba Inc
   Email: jingtan.hc@alibaba-inc.com

   Guixin Bao
   Alibaba Inc
   Email: guixin.bgx@alibaba-inc.com

   Jinghui Liu
   Ruijie Networks
   Email: liujh@ruijie.com.cn

   Zhichun Jiang
   Tencent
   Email: zcjiang@tencent.com

   Shaowen Ma
   Mellanox
   Email: mashaowen@gmail.com

16.  References

16.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006.

   [RFC5036]  Andersson, L., Ed., Minei, I., Ed., and B. Thomas, Ed.,
              "LDP Specification", RFC 5036, DOI 10.17487/RFC5036,
              October 2007.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

16.2.  Informative References

   [FIPS-180-4]
              National Institute of Standards and Technology, "Secure
              Hash Standard (SHS), FIPS PUB 180-4", March 2012.

   [I-D.ietf-lsvr-bgp-spf]
              Patel, K., Lindem, A., Zandi, S., and W. Henderickx,
              "Shortest Path Routing Extensions for BGP Protocol",
              draft-ietf-lsvr-bgp-spf-06 (work in progress), September
              2019.

   [I-D.ketant-idr-bgp-ls-bgp-only-fabric]
              Talaulikar, K., Filsfils, C., Ananthamurthy, K., Zandi,
              S., Dawra, G., and M. Durrani, "BGP Link-State Extensions
              for BGP-only Fabric",
              draft-ketant-idr-bgp-ls-bgp-only-fabric-03 (work in
              progress), September 2019.

   [RFC2104]  Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-
              Hashing for Message Authentication", RFC 2104,
              DOI 10.17487/RFC2104, February 1997.

   [RFC5706]  Harrington, D., "Guidelines for Considering Operations and
              Management of New Protocols and Protocol Extensions",
              RFC 5706, DOI 10.17487/RFC5706, November 2009.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.

   [RFC7349]  Zheng, L., Chen, M., and M. Bhatia, "LDP Hello
              Cryptographic Authentication", RFC 7349,
              DOI 10.17487/RFC7349, August 2014.

   [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
              S. Ray, "North-Bound Distribution of Link-State and
              Traffic Engineering (TE) Information Using BGP", RFC 7752,
              DOI 10.17487/RFC7752, March 2016.

   [RFC7938]  Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of
              BGP for Routing in Large-Scale Data Centers", RFC 7938,
              DOI 10.17487/RFC7938, August 2016.
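
   The TLV handling rules in the Management Considerations section
   (ignore unknown TLVs but process the rest of the Hello message, and
   discard Hello messages with malformed TLVs) can be illustrated with
   a minimal, non-normative sketch.  The TLV names and values are taken
   from the "TLVs of BGP Hello Message" registry.  Everything else is
   an assumption made for illustration only: the 2-octet Type and
   2-octet Length fields, a Length that counts only the Value octets,
   and the hypothetical names parse_hello_tlvs and MalformedHello.

```python
import struct

# Code points from the "TLVs of BGP Hello Message" registry.
KNOWN_TLVS = {
    1: "Accepted ASN List",
    2: "Peering Address",
    3: "Local Prefix",
    4: "Link Attributes",
    5: "Neighbor",
    6: "Cryptographic Authentication",
}

# ASSUMPTION: 2-octet Type followed by 2-octet Length, network byte order.
# ASSUMPTION: Length counts only the octets of the Value field.
TLV_HEADER = struct.Struct("!HH")


class MalformedHello(ValueError):
    """Hypothetical error: the caller discards the Hello and logs it."""


def parse_hello_tlvs(payload: bytes) -> list:
    """Return (name, value) pairs for known TLVs found in `payload`.

    Unknown TLV types are skipped so the rest of the Hello is still
    processed; a truncated TLV raises MalformedHello.
    """
    tlvs = []
    offset = 0
    while offset < len(payload):
        if len(payload) - offset < TLV_HEADER.size:
            raise MalformedHello("truncated TLV header")
        tlv_type, length = TLV_HEADER.unpack_from(payload, offset)
        offset += TLV_HEADER.size
        if length > len(payload) - offset:
            raise MalformedHello("TLV value runs past end of message")
        value = payload[offset:offset + length]
        offset += length
        name = KNOWN_TLVS.get(tlv_type)
        if name is None:
            continue  # unknown TLV: ignore it and keep parsing
        tlvs.append((name, value))
    return tlvs
```

   In this sketch a caller would catch MalformedHello, drop the whole
   Hello message, and log an error, while a Hello message that carries
   an unassigned or experimental TLV type is still processed using the
   TLVs that were recognized.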

Authors' Addresses

   Xiaohu Xu
   Alibaba Inc
   China

   Email: xiaohu.xxh@alibaba-inc.com


   Ketan Talaulikar
   Cisco Systems
   India

   Email: ketant@cisco.com


   Kunyang Bi
   Huawei
   China

   Email: bikunyang@huawei.com


   Jeff Tantsura
   Apstra
   USA

   Email: jefftant.ietf@gmail.com


   Nikos Triantafillis
   Amazon Web Services
   USA

   Email: ntriantafillis@gmail.com