idnits 2.17.1

draft-xu-idr-neighbor-autodiscovery-11.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 2 instances of lines with multicast IPv4 addresses in the
     document.  If these are generic example addresses, they should be
     changed to use the 233.252.0.x range defined in RFC 5771

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does
     not match the current year

  -- The document date (April 16, 2019) is 1837 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative
     references to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC5082' is defined on line 1458, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC4202' is defined on line 1486, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-29) exists of
     draft-ietf-lsvr-bgp-spf-04

  == Outdated reference: A later version (-05) exists of
     draft-ketant-idr-bgp-ls-bgp-only-fabric-02

  -- Obsolete informational reference (is this intentional?): RFC 7752
     (Obsoleted by RFC 9552)

     Summary: 0 errors (**), 0 flaws (~~), 6 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information
     about the items above.
--------------------------------------------------------------------------------

Network Working Group                                              X. Xu
Internet-Draft                                               Alibaba Inc
Intended status: Standards Track                           K. Talaulikar
Expires: October 18, 2019                                  Cisco Systems
                                                                   K. Bi
                                                                  Huawei
                                                             J. Tantsura
                                                        N. Triantafillis
                                                                  Apstra
                                                          April 16, 2019

                         BGP Neighbor Discovery
                 draft-xu-idr-neighbor-autodiscovery-11

Abstract

   BGP is being used as the underlay routing protocol in some
   large-scale data centers (DCs).  The most popular design is to
   configure hop-by-hop external BGP (EBGP) sessions between
   neighboring routers on a per-link basis.  Provisioning BGP neighbors
   on routers across such a DC brings its own operational complexity.

   This document introduces a BGP neighbor discovery mechanism that
   greatly simplifies BGP operations in such DCs and other networks by
   automatically setting up BGP sessions between neighboring routers.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 18, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Applicability
   4.  Requirements
   5.  Overview
   6.  UDP Message Header
   7.  Hello Message Format
   8.  Hello Message TLVs
     8.1.  Accepted ASN List TLV
     8.2.  Peering Address TLV
     8.3.  Local Prefix TLV
     8.4.  Link Attributes TLV
     8.5.  Neighbor TLV
     8.6.  Cryptographic Authentication TLV
   9.  Neighbor Discovery Procedure
     9.1.  Interface Procedures
     9.2.  Adjacency State Machine
       9.2.1.  Down State
       9.2.2.  Initial State
       9.2.3.  1-Way State
       9.2.4.  2-Way State
       9.2.5.  Adj-Reject State
       9.2.6.  Adj-OK State
       9.2.7.  Accepted State
     9.3.  Adjacency Route
   10. Interactions with Base BGP Protocol
   11. Security Considerations
   12. Manageability Considerations
     12.1.  Operational Considerations
     12.2.  Management Considerations
   13. IANA Considerations
     13.1.  BGP Hello Message
     13.2.  TLVs of BGP Hello Message
   14. Acknowledgements
   15. Contributors
   16. References
     16.1.  Normative References
     16.2.  Informative References
   Authors' Addresses

1.  Introduction

   BGP is being used as the underlay routing protocol instead of
   link-state routing protocols like IS-IS and OSPF in some large-scale
   data centers (DCs).  [RFC7938] describes the design, configuration
   and operational aspects of using BGP in such networks.  The most
   popular design scheme involves the setup of external BGP (EBGP)
   sessions over individual links between directly connected routers
   using their interface addresses.
   Such BGP neighbor provisioning requires configuration of the
   neighbor IP address and Autonomous System (AS) Number (ASN) for the
   BGP neighbor on each and every link of every BGP router.  As a DC
   fabric with the topology described in [RFC7938] grows with the
   addition of new leafs, spines, and links between them, the BGP
   provisioning needs to be carefully updated.  Unlike with link-state
   protocols, in the case of BGP there is no automatic discovery of
   neighbors and route exchange between them by simply adding links and
   nodes of the fabric into the routing protocol operation.

   In some DC designs with BGP, multiple links are added between a leaf
   and spine to add bandwidth.  Use of link aggregation at Layer 2 may
   not always be desirable in such cases due to the risk of flow
   polarization on account of a mix of ECMP at Layer 2 and Layer 3.  In
   such cases, one option is for EBGP sessions to be set up between two
   BGP neighbors over each of the links between them.  In that case,
   the BGP session scale and the resultant increase in update
   processing may pose scalability challenges.  A second option is for
   a single EBGP session to be set up between the loopback IP addresses
   of the neighbors, with static routes configured for loopback
   reachability over the underlying links.  This option introduces an
   additional provisioning task for the static routes.

   Furthermore, there is also a need for BGP to be able to describe its
   links and its neighbors on its directly connected links and export
   this information via BGP-LS [RFC7752] to provide a detailed
   link-level topology view of a data center running BGP.  The ability
   of BGP to discover its neighbors over its links, monitor their
   liveness, and learn the link attributes (such as addresses) is
   required for conveying the link-state topology in such a BGP network.
   This information can be leveraged by the BGP-SPF proposal
   [I-D.ietf-lsvr-bgp-spf], which introduces link-state routing
   capabilities in BGP.  This information can also be leveraged to
   convey the link-state topology in a network running traditional BGP
   routing using BGP-LS as described in
   [I-D.ketant-idr-bgp-ls-bgp-only-fabric] and to enable end-to-end
   traffic engineering use cases spanning DCs and the core/access
   networks.

2.  Terminology

   This document makes use of the terms defined in [RFC4271] and
   [RFC7938].

3.  Applicability

   The applicability of the BGP Neighbor Discovery mechanism described
   in this document is limited to deployments where BGP is used as the
   routing protocol between directly connected routers and where there
   is a requirement for automatic setup of BGP peering between them.

   o  In DC networks where BGP is used as a hop-by-hop routing protocol
      [RFC7938].

   o  In metro networks where access aggregation topologies are
      architected as a CLOS topology (or similar other networks) and BGP
      is used as a hop-by-hop routing protocol.

   While this document uses EBGP examples, the mechanism is equally
   applicable in designs that use IBGP similarly for hop-by-hop routing.

   The applicability of the BGP Neighbor Discovery mechanism to any
   other BGP protocol deployment is outside the scope of this document.

4.  Requirements

   This section describes the requirements for the BGP hop-by-hop
   routing deployments that were considered for the definition of the
   BGP Neighbor Discovery extensions proposed in this document.

   The following are the key requirements for the BGP neighbor
   discovery process:

   1.  It should perform discovery of directly connected BGP routers.
       The mechanism should support either IPv4 or IPv6 or a dual-stack
       design and it should be generic for any link layer.

   2.  It should include exchange of BGP peering addresses (IPv4 or IPv6
       or both) that routers can use to automatically set up BGP TCP
       peering between themselves.  The mechanism should leverage the
       existing capability negotiation process performed as part of the
       BGP TCP session establishment.

   3.  When BGP peering is desired to be performed over loopback
       addresses of the routers, then the mechanism should automatically
       set up reachability to the loopback over one or more underlying
       directly connected links between them.  In this scenario, the
       mechanism should also provide resolution for the BGP next-hop
       address (i.e. the loopback address) for the BGP routes exchanged
       over these sessions between the loopback addresses.

   4.  The mechanism should enable exchange of link-level information
       such as IP addresses and link attributes between the directly
       connected BGP routers.  It should be extensible to include other
       information in the future.

   5.  The mechanism should be limited to link scope for security and
       use link-local addressing only.  Cryptographic mechanisms should
       also be provided for additional security.

   6.  The mechanism should support capabilities for performing optional
       validation of parameters to detect misconfiguration (e.g. link
       address subnet mismatch, peering between incorrect ASes, etc.) in
       an extensible manner before going on to use the link and set up
       the BGP TCP peering session over it.

   7.  The mechanism should not affect or change the BGP TCP session
       establishment procedures and the BGP routing exchange over the
       TCP session, other than the interactions for triggering the
       setup/removal of a peer session based on the discovery mechanism.

   8.  The mechanism should leverage existing fast-detection techniques
       for failures that are used currently for EBGP sessions over
       directly connected links, like fast-external-failover and BFD.

   9.  The mechanism should focus on the discovery process and exchange
       of status as a control plane procedure and be sufficiently
       loosely coupled with the base BGP operations to enable
       implementations to ensure scalability of BGP operations when
       using the discovery procedures.

5.  Overview

   At a high level, this specification introduces the use of UDP-based
   BGP Hello messages to be exchanged between directly connected BGP
   routers for neighbor discovery.

   1.  Information is exchanged between BGP routers on a per-link basis,
       leading to discovery of each other's peering address and other
       information.

   2.  The TCP session establishment for the BGP protocol operation and
       the BGP routing exchange over these sessions can then follow
       without any change/modification from the existing BGP protocol
       operations as specified in [RFC4271].

   3.  As part of the neighbor information exchange, the route to a
       neighbor's peering address is also automatically set up, pointing
       over the links over which the neighbor is discovered.

   4.  This route is used both for the BGP TCP session establishment and
       for resolution of the BGP next-hop (NH) for the routes learnt via
       the neighbor, instead of an underlying IGP or static route.

   This document prefers the use of an extension to the BGP protocol
   since the deployments and use cases targeted (i.e. large-scale DCs)
   are already running BGP as their routing protocol.  Extending BGP
   with neighbor discovery capabilities is, operationally and
   implementation-wise, a simpler approach than requiring a new or
   additional protocol to first be extended to provide this
   functionality (to exchange BGP-specific parameters) and then also to
   have its operations integrated with BGP protocol operations.

   The BGP Neighbor discovery mechanism is a control plane mechanism
   intended to discover and maintain the BGP router's adjacencies with
   its neighbors over directly connected links.  Maintaining an
   adjacency also involves detecting any changes in parameters using
   periodic messages and triggering corresponding actions based on the
   change.  Such actions also include removal of the BGP TCP peering for
   an auto-discovered peering session based on the neighbor discovery.
   However, the mechanism is not intended for fast liveness detection of
   a neighbor; existing mechanisms for this purpose, such as BFD
   [RFC5880], may be leveraged.

   The BGP Neighbor discovery mechanism is scoped to a link and works
   using link-local addressing.  In a BGP DC network that is using IPv6
   in the fabric underlay, it is possible that no IPv6 global addresses
   are assigned to the interfaces between the nodes and that IPv6 global
   addresses are assigned only to the loopback interfaces of these
   nodes.  The Neighbor discovery mechanism enables the setup of BGP
   peering using the IPv6 global addresses on the loopback interfaces
   and hop-by-hop routing with just IPv6 link-local addresses on the
   interfaces.  Such a design eases the introduction of nodes in the
   fabric and links between them from a provisioning aspect.  In a
   deployment with IPv4 addressing, IP unnumbered could be similarly
   used for all the links between the nodes using the IPv4 address
   assigned to the loopback interfaces on those nodes.

   The BGP neighbor discovery mechanism defined in this document borrows
   ideas from the Label Distribution Protocol (LDP) [RFC5036].  However,
   only the concept of link-local signaling based neighbor discovery is
   borrowed; the discovery aspect for targeted LDP sessions does not
   apply to this BGP neighbor discovery mechanism.

   The following sections in this document first describe the newly
   introduced message formats and TLVs and then go on to describe the
   procedures of BGP neighbor discovery and its integration with the
   base BGP protocol mechanism as specified in [RFC4271].

   The operational and management aspects of the BGP neighbor discovery
   mechanism are described in Section 12.

6.  UDP Message Header

   The BGP neighbor discovery mechanism operates using UDP messages.
   The UDP port is TBD; 179 is the preferred port number to be assigned,
   as specified in Section 13, which is the same as the TCP port 179
   used by BGP.  The BGP UDP message common header format is specified
   as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |     Type      |        Message Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           AS number                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        BGP Identifier                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 1: BGP UDP Message Header

   Version:  This 1-octet unsigned integer indicates the protocol
      version number of the message.  The current BGP version number is
      4.

   Type:  The type of BGP message.

   Message Length:  This 2-octet unsigned integer specifies the length
      in octets of the entire BGP UDP message including the header.

   AS number:  AS Number of the UDP message sender.

   BGP Identifier:  BGP Identifier of the UDP message sender.

   BGP UDP messages can be sent using either IPv4 or IPv6 depending on
   the address used for session establishment and provisioned on the
   interfaces over which these messages are sent.

7.  Hello Message Format

   A BGP router uses UDP-based Hello messages to discover directly
   connected BGP neighbors over those interfaces enabled for Neighbor
   Discovery.
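
   As an illustration of the common header in Figure 1, which also
   begins every Hello message, the following Python sketch packs and
   parses the 12-octet header.  It is a minimal sketch, not a reference
   implementation: the 1-octet width of the Type field is inferred from
   the figure, the function and class names are hypothetical, and the
   message type code is left to the caller because its value is still
   TBD.

```python
import struct
from dataclasses import dataclass

# Layout read from Figure 1: Version (1 octet), Type (1 octet, inferred),
# Message Length (2 octets), AS number (4 octets), BGP Identifier (4 octets).
_HEADER = struct.Struct("!BBHII")
BGP_VERSION = 4


@dataclass(frozen=True)
class UdpHeader:
    version: int
    msg_type: int
    length: int   # length of the entire message, header included
    asn: int
    bgp_id: int


def pack_message(msg_type: int, asn: int, bgp_id: int, body: bytes = b"") -> bytes:
    """Prepend the common header; Message Length covers header plus body."""
    total = _HEADER.size + len(body)
    return _HEADER.pack(BGP_VERSION, msg_type, total, asn, bgp_id) + body


def unpack_header(datagram: bytes) -> UdpHeader:
    """Parse and sanity-check the common header of one UDP datagram."""
    if len(datagram) < _HEADER.size:
        raise ValueError("truncated BGP UDP header")
    hdr = UdpHeader(*_HEADER.unpack_from(datagram))
    if hdr.version != BGP_VERSION:
        raise ValueError(f"unsupported version {hdr.version}")
    if hdr.length != len(datagram):
        raise ValueError("Message Length does not match datagram size")
    return hdr
```

   The length check follows the Section 6 definition of Message Length
   (the entire message including the header).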
   The BGP Hello messages for the Neighbor Discovery procedure are used
   for link-local signaling and hence MUST be addressed to the "all
   routers on this subnet" group multicast address (i.e., 224.0.0.2 in
   the IPv4 case and FF02::2 in the IPv6 case) and the TTL for the IP
   packets SHOULD be set to 1.  The IP source address MUST be set to
   the address of the interface over which the message is sent out,
   which would be the primary interface address or unnumbered address
   in the IPv4 case and the IPv6 link-local address on the interface in
   the IPv6 case.

   The Hello message format is as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |     Type      |        Message Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           AS number                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        BGP Identifier                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Adjacency Hold Time      |     Flags     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             TLVs                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 2: BGP Hello Message

   Version:  This 1-octet unsigned integer indicates the protocol
      version number of the message.  The current BGP version number is
      4.

   Type:  The type of BGP message (Hello - TBD value from BGP Message
      Types Registry).

   Message Length:  This 2-octet unsigned integer specifies the length
      in octets of the TLVs field.

   AS number:  AS Number of the BGP router sending the Hello message.

   BGP Identifier:  BGP Identifier of the BGP router sending the Hello
      message.

   Adjacency Hold Time:  Hello adjacency hold timer in seconds.
      Adjacency Hold Time specifies the time for which the receiving
      BGP neighbor router SHOULD maintain adjacency state for the
      sender without receipt of another Hello.  A value of 0 means that
      the receiving BGP peer should immediately mark that the adjacency
      to the sender is going down.

   Flags:  Currently defined bits are as follows.  All other bits
      SHOULD be cleared by the sender and MUST be ignored by the
      receiver.

          0 1 2 3 4 5 6 7
         +-+-+-+-+-+-+-+-+
         |S|             |
         +-+-+-+-+-+-+-+-+

      where:

      S bit - indicates that this is a State Change Hello message
         when SET and a normal periodic Hello message when CLEAR

   Reserved:  SHOULD be set to 0 by the sender and MUST be ignored by
      the receiver.

   TLVs:  This field contains one or more TLVs as described below.

   BGP Hello messages can be sent using either IPv4 or IPv6 addresses
   depending on the addressing used for session establishment and
   provisioned on the interfaces over which these messages are sent.
   When both IPv4 and IPv6 are enabled on the interface, then the IPv6
   address SHOULD be used.  Implementations MAY provide an option to
   override the choice of address family to be used.  The choice of
   address family to be used MUST be consistent on all BGP routers on a
   given link for neighbor discovery.

   Based on the setting of the S flag, there are two variants of the
   Hello message:

   1.  State Change Hello Message: these Hello messages include TLVs
       which convey the state and parameters of the local interface and
       adjacency to other routers on the link.  They are generated only
       when there is a change in state of the adjacency or some
       parameter at the interface level.

   2.  Periodic Hello Message: these are the normal periodic Hello
       messages which do not include TLVs and are used to maintain the
       adjacency on the link during steady state conditions.
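
   The Hello-specific fields and the two variants above can be
   illustrated with a short Python sketch covering only the part of the
   message that follows the common header.  The field widths are
   assumptions read from Figure 2 (2-octet Adjacency Hold Time, 1-octet
   Flags, 1-octet Reserved), the S bit is taken to be the most
   significant bit of the Flags octet, and all function names are
   hypothetical.

```python
import struct

S_BIT = 0x80  # bit 0 of the Flags octet, i.e. its most significant bit

# Assumed widths: Adjacency Hold Time (2 octets), Flags (1), Reserved (1).
_HELLO_FIXED = struct.Struct("!HBB")


def build_hello_body(hold_time: int, state_change: bool, tlvs: bytes = b"") -> bytes:
    """Encode the fields that follow the common header.

    A State Change Hello sets the S bit and carries TLVs; a periodic
    Hello clears the S bit and carries none.
    """
    flags = S_BIT if state_change else 0
    return _HELLO_FIXED.pack(hold_time, flags, 0) + tlvs


def parse_hello_body(body: bytes):
    """Return (hold_time, is_state_change, tlv_bytes)."""
    if len(body) < _HELLO_FIXED.size:
        raise ValueError("truncated Hello")
    hold_time, flags, _reserved = _HELLO_FIXED.unpack_from(body)
    # Undefined flag bits and the Reserved octet are ignored on receipt.
    return hold_time, bool(flags & S_BIT), body[_HELLO_FIXED.size:]


def adjacency_expired(hold_time: int, seconds_since_last_hello: float) -> bool:
    """A hold time of 0 means the sender's adjacency is going down now."""
    return hold_time == 0 or seconds_since_last_hello > hold_time
```

   Keeping the periodic Hello down to these four octets plus the common
   header is what makes the steady-state exchange lightweight.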

   These Hello message variants are intended to limit the exchange of
   information and state via TLVs to only those periods where necessary
   while using lightweight Hello messages during steady state.  This
   simplifies the Hello message processing and improves scalability of
   the discovery mechanism.

   The neighbor discovery procedure using the Hello message is described
   in Section 9 and its relation with the BGP Keepalives and Hold Timer
   for the TCP session is described in Section 10.

8.  Hello Message TLVs

   The BGP Hello message carries TLVs as described in this section that
   enable exchange of information on a per-interface basis between
   directly connected BGP neighbors.  These messages enable the neighbor
   discovery process.

8.1.  Accepted ASN List TLV

   The Accepted ASN List TLV is an optional TLV that is used to signal
   an unordered list of AS numbers from which the BGP router would
   accept BGP sessions.  When not signaled, it indicates that the router
   will accept BGP peering from any ASN from its neighbors.  Indicating
   the list of ASNs helps avoid the neighbor discovery process getting
   stuck in a 1-way state where one side keeps attempting to set up an
   adjacency while the other does not accept it due to an incorrect
   ASN.

   The operational and management aspects of this ASN-based policy
   control for BGP neighbor discovery are described further in
   Section 12.

   This TLV SHOULD NOT be included in a Hello message with the S bit
   CLEAR.  More than a single instance of this TLV MUST NOT be included
   in a Hello message.  If a router receives multiple instances of this
   TLV then it should only consider the first instance in the sequence
   and ignore the rest.
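
   The receive-side rules above (an absent TLV means any ASN is
   accepted, and only the first instance is considered) can be sketched
   in Python as follows.  The sketch assumes that each TLV starts with a
   2-octet Type and a 2-octet Length, as the TLV figures suggest; the
   type code is passed in by the caller because TBD1 is unassigned, and
   the function names are hypothetical.

```python
import struct
from typing import FrozenSet, Iterator, Optional, Tuple


def iter_tlvs(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, value) pairs; assumes 2-octet Type and 2-octet Length."""
    offset = 0
    while offset < len(data):
        if len(data) - offset < 4:
            raise ValueError("truncated TLV header")
        tlv_type, length = struct.unpack_from("!HH", data, offset)
        offset += 4
        if len(data) - offset < length:
            raise ValueError("truncated TLV value")
        yield tlv_type, data[offset:offset + length]
        offset += length


def accepted_asns(tlvs: bytes, asn_list_type: int) -> Optional[FrozenSet[int]]:
    """Return the accepted ASNs, or None when the TLV is absent (any ASN)."""
    for tlv_type, value in iter_tlvs(tlvs):
        if tlv_type != asn_list_type:
            continue
        if len(value) == 0 or len(value) % 4:
            raise ValueError("ASN list length must be a non-zero multiple of 4")
        count = len(value) // 4
        # Only the first instance is considered; later ones are ignored.
        return frozenset(struct.unpack(f"!{count}I", value))
    return None


def asn_is_accepted(accepted: Optional[FrozenSet[int]], peer_asn: int) -> bool:
    """True when the neighbor would accept a session from peer_asn."""
    return accepted is None or peer_asn in accepted
```

   A router could call asn_is_accepted() with its own ASN before
   attempting to form an adjacency, which is one way to avoid the stuck
   1-way state described above.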

   The format of this TLV is shown below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Accepted ASN List (variable)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 3: Accepted ASN List TLV

   Type:  TBD1

   Length:  Specifies the length of the Value field in octets (a
      multiple of 4).

   Accepted ASN List:  This variable-length field contains one or more
      accepted 4-octet ASNs.

8.2.  Peering Address TLV

   The Peering Address TLV is used to indicate to the neighbor the
   address to be used for setting up the BGP TCP session.  Along with
   the peering address, the router can specify its supported AFI/
   SAFI(s).  When the AFI/SAFI values are specified as 0/0, it
   indicates that the neighbor can attempt negotiation of any AFI/
   SAFIs.  The indication of AFI/SAFI(s) in the Peering Address TLV is
   not intended as an alternative for the MP capabilities negotiation
   mechanism done as part of the BGP TCP session establishment.

   Multiple instances of this TLV MAY be included in the Hello message,
   one for each peering address (e.g. IPv4 and IPv6 or multiple IPv4
   addresses for different AFI/SAFI sessions).  When multiple peering
   addresses are provisioned, then the indication helps the router
   select the appropriate peer address of the neighbor based on its
   local peering address profile by matching the supported AFI/SAFIs.

   This TLV is essential for the setting up of the TCP peering between
   BGP neighbors using the neighbor discovery mechanism.
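
   The selection of a neighbor's peering address by matching supported
   AFI/SAFIs, including the 0/0 wildcard, can be illustrated as
   follows.  This is a hypothetical sketch of one possible local
   policy (first advertised address that covers every wanted AFI/SAFI);
   the document does not mandate a particular selection algorithm, and
   the class and function names are assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple

AfiSafi = Tuple[int, int]
ANY: AfiSafi = (0, 0)  # 0/0: the neighbor may attempt any AFI/SAFI


@dataclass(frozen=True)
class PeeringAddress:
    address: str
    afi_safis: FrozenSet[AfiSafi]

    def supports(self, wanted: AfiSafi) -> bool:
        return ANY in self.afi_safis or wanted in self.afi_safis


def select_peer_address(
    advertised: Iterable[PeeringAddress],
    wanted: Iterable[AfiSafi],
) -> Optional[PeeringAddress]:
    """Pick the first advertised address that covers all wanted AFI/SAFIs."""
    wanted = list(wanted)
    for candidate in advertised:
        if all(candidate.supports(w) for w in wanted):
            return candidate
    return None
```

   The actual set of address families used on the session is still
   decided by the MP capabilities negotiation during BGP TCP session
   establishment; the match here only chooses which address to connect
   to.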
   When a BGP router stops including a Peering Address in its State
   Change Hello messages, then it is no longer accepting TCP peering
   sessions to that address and the neighbor SHOULD clean up any
   peering session that was set up to that address via the discovery
   mechanism.

   Implementations SHOULD support the signaling of an interface IP
   address in the Peering Address TLV and perform the BGP TCP session
   establishment using interface addresses (i.e. the neighbor discovery
   mechanism is not limited to the use of loopback addresses for the
   peering session establishment).  Implementations MAY support the
   signaling of IPv6 link-local addresses using the Peering Address TLV
   and using the same for the BGP TCP session setup.

   This TLV SHOULD NOT be included in a Hello message with the S bit
   CLEAR.

   The Peering Address TLV format is shown below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Flags     | No. AFI/SAFI  |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Address (4-octet or 16-octet)                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              AFI              |     SAFI      |  ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         sub-TLVs ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 4: Peering Address TLV

   Type:  TBD2

   Length:  Specifies the length of the Value field in octets.

   Flags:  Currently defined bits are as follows.  All other bits
      SHOULD be cleared by the sender and MUST be ignored by the
      receiver.

          0 1 2 3 4 5 6 7
         +-+-+-+-+-+-+-+-+
         |A|             |
         +-+-+-+-+-+-+-+-+

      where:

      A bit - address is IPv6 when SET and IPv4 when CLEAR

   Number of AFI/SAFI:  indicates the number of AFI/SAFI pairs that
      the router supports on the given peering address.

   Reserved:  sender SHOULD set to 0 and receiver MUST ignore.

   Address:  This 4 or 16 octet field indicates the IPv4 or IPv6
      address which is used for establishing BGP sessions.

   AFI/SAFI:  one or more pairs of these values that indicate the
      supported capabilities on the peering address.

   Sub-TLVs:  optional and currently none defined.

8.3.  Local Prefix TLV

   The BGP neighbor discovery mechanism, in certain scenarios, requires
   a BGP router to program a route in its local routing table for a
   prefix belonging to its neighbor router.  One such scenario is when
   the BGP TCP peering is to be set up between the loopback addresses
   on the neighboring routers.  This requires that the routers have
   reachability to each other's loopback addresses before the TCP
   session can be brought up.

   The Local Prefix TLV is an optional TLV which enables a BGP router to
   explicitly signal its local prefix to its neighbor for setting up
   such a local routing entry pointing over the underlying link over
   which it is being signaled.  This enables the BGP router to have
   control over the specific links over which its neighbor may reach it
   for the specific local prefix.  The details of the procedure for
   programming the route corresponding to the prefix signaled using the
   Local Prefix TLV are described in Section 9.3.

   Multiple instances of the Local Prefix TLV MAY be included in the
   Hello message with each carrying a specific prefix in it.  This TLV
   SHOULD NOT be included in a Hello message with the S bit CLEAR.

   The Local Prefix TLV format is as shown below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Flags     | Prefix Length |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Prefix Address (4-octet or 16-octet)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         sub-TLVs ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 5: Local Prefix TLV

   Type:  TBD3

   Length:  Specifies the length of the Value field in octets.

   Flags:  Currently defined bits are as follows.  All other bits
      SHOULD be cleared by the sender and MUST be ignored by the
      receiver.

          0 1 2 3 4 5 6 7
         +-+-+-+-+-+-+-+-+
         |A|             |
         +-+-+-+-+-+-+-+-+

      where:

      A bit - address is IPv6 when SET and IPv4 when CLEAR

   Prefix Length:  specifies the prefix length.

   Reserved:  sender SHOULD set to 0 and receiver MUST ignore.

   Prefix Address:  This 4 or 16 octet field indicates the IPv4 or
      IPv6 prefix address.

   Sub-TLVs:  optional and currently none defined.

8.4.  Link Attributes TLV

   The Link Attributes TLV is a mandatory TLV in a State Change Hello
   message that signals to the neighbor the link attributes of the
   interface on the local router.  One and only one instance of this TLV
   MUST be included in the State Change Hello message.  A State Change
   Hello message without this TLV included MUST be discarded and an
   error logged for the same.

   This TLV enables a BGP router to learn all its neighbors' IP
   addresses on the specific link as well as its link identifier.  When
   the interface is IPv4 enabled, all the IPv4 addresses configured on
   it are included in this TLV.  An IPv4 unnumbered address is not
   included in this TLV and no IPv4 address would be included for the
   interface in such cases.
   When the interface is IPv6 enabled, all the IPv6 global
   addresses configured on the interface are included in this TLV.
   IPv6 link-local addresses are not included in this TLV.  In the case
   of an interface running dual stack, both IPv4 and IPv6 addresses are
   included in this TLV irrespective of the address family that is used
   for the UDP message exchange.

   Additional sub-TLVs may be defined in the future to exchange other
   link attributes between BGP neighbors.  This TLV SHOULD NOT be
   included in a Hello message with the S bit CLEAR.

   The Link Attributes TLV format is as shown below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Local Interface ID       |     Flags     |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     No. of IPv4 Addresses     |     No. of IPv6 Addresses     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    IPv4 Interface Address                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Prefix Mask  | ...
   +-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 IPv6 Global Interface Address                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Prefix Mask  | ...
   +-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         sub-TLVs ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 6: Link Attributes TLV

      Type: TBD4

      Length: specifies the length of the Value field in octets.

      Local Interface ID: the local interface ID of the interface
      (refer to the unnumbered link section of [RFC4202]), e.g., the
      MIB-2 ifIndex.
      This helps uniquely identify the link even when there
      are multiple links between two neighbors using IPv4 unnumbered
      addresses or only having IPv6 link-local addresses.

      Flags: currently defined bits are as follows.  Other bits SHOULD
      be cleared by sender and MUST be ignored by receiver.

          0 1 2 3 4 5 6 7
         +-+-+-+-+-+-+-+-+
         |I|V|B|         |
         +-+-+-+-+-+-+-+-+

      where:

         I bit - indicates link is enabled for IPv4

         V bit - indicates link is enabled for IPv6

         B bit - indicates support for BFD monitoring [RFC5880] over
         the link

      Reserved: SHOULD be set to 0 by sender and MUST be ignored by
      receiver.

      No. of IPv4 Addresses: specifies the number of IPv4 addresses on
      the interface.  A value of 0 indicates that no IPv4 prefixes are
      present or, if the interface is enabled for IPv4, that it is
      IPv4 unnumbered.

      No. of IPv6 Addresses: specifies the number of IPv6 global
      addresses on the interface.  A value of 0 indicates that no IPv6
      global prefixes are present and, if the interface is enabled for
      IPv6, that it is only configured with IPv6 link-local addresses.

      IPv4 Address & Mask: zero or more pairs of an IPv4 address and
      its mask.

      IPv6 Address & Mask: zero or more pairs of an IPv6 address and
      its mask.

      Sub-TLVs: optional; currently none defined.

8.5.  Neighbor TLV

   The Neighbor TLV is used by a BGP router to indicate its Hello
   adjacency state with its neighboring router(s) on the specific link.
   The neighbor is identified by its AS Number and BGP Identifier.  The
   router MUST include the Neighbor TLV for each of its discovered
   neighbors on that link irrespective of its status.

   The usage of the Neighbor TLV is described in detail in Section 9.
   This TLV SHOULD NOT be included in a Hello message with the S bit
   CLEAR.

   The Neighbor TLV format is as shown below.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Flags     |     State     |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Neighbor AS number                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Neighbor BGP Identifier                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         sub-TLVs ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 7: Neighbor TLV

      Type: TBD5

      Length: specifies the length of the Value field in octets.

      Flags: currently defined bits are as follows.  All other bits
      SHOULD be cleared by sender and MUST be ignored by receiver.

          0 1 2 3 4 5 6 7
         +-+-+-+-+-+-+-+-+
         |B|             |
         +-+-+-+-+-+-+-+-+

      where:

         B bit - when SET while the adjacency is not in the Accepted
         state, indicates that the adjacency is not accepted due to
         BFD being down.

      State: indicates the state code of the adjacency state machine
      (refer to Section 9.2 for details) for the neighbor over this
      link.  The following codes are currently defined:

         0 - Down (not to be used as state in this TLV)

         1 - Initial (not to be used as state in this TLV)

         2 - 1-way

         3 - 2-way

         4 - Adj-Reject

         5 - Adj-OK

         6 - Accepted

      Reserved: SHOULD be set to 0 by sender and MUST be ignored by
      receiver.

      Neighbor AS number: AS Number of the neighbor BGP router as
      signaled in its Hello message.

      Neighbor BGP Identifier: BGP Identifier of the neighbor BGP
      router as signaled in its Hello message.

      Sub-TLVs: currently none defined.

8.6.  Cryptographic Authentication TLV

   The Cryptographic Authentication TLV is an optional TLV that is used
   as part of an authentication mechanism that secures BGP Hello
   messages against spoofing attacks.  It also introduces a
   cryptographic sequence number carried in the Hello messages that can
   be used to protect against replay attacks.  To use the Cryptographic
   Authentication TLV, one or more secret keys (with corresponding
   Security Association (SA) IDs) are configured on each BGP router.
   For each BGP Hello message, the key is used to generate and verify
   an HMAC hash that is stored in the Cryptographic Authentication TLV.
   For the cryptographic hash function, this document proposes to use
   SHA-1, SHA-256, SHA-384, and SHA-512 as defined in the US NIST
   Secure Hash Standard (SHS) [FIPS-180-4].  The HMAC authentication
   mode defined in [RFC2104] is used.  Of the above, implementations
   MUST include support for at least HMAC-SHA-256, SHOULD include
   support for HMAC-SHA-1, and MAY include support for HMAC-SHA-384 and
   HMAC-SHA-512.

   Further details for ensuring the security of the BGP Hello UDP
   messages are described in Section 11.

   The Cryptographic Authentication TLV format is as shown below.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Type              |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Security Association ID                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Cryptographic Sequence Number (High-Order 32 Bits)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Cryptographic Sequence Number (Low-Order 32 Bits)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Authentication Data (Variable)                //
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 8: Cryptographic Authentication TLV

      Type: TBD6

      Length: specifies the length of the Value field in octets.

      Security Association ID: the 32-bit field that maps to the
      authentication algorithm and the secret key used to create the
      message digest carried in the Hello message payload.

      Cryptographic Sequence Number: the 64-bit, strictly increasing
      sequence number that is used to guard against replay attacks.
      The 64-bit sequence number MUST be incremented for every BGP
      Hello message sent by the BGP router.  Upon reception, the
      sequence number MUST be greater than the sequence number in the
      last BGP Hello message accepted from the sending BGP neighbor.
      Otherwise, the BGP Hello message is considered a replayed packet
      and is dropped.  The Cryptographic Sequence Number is a single
      space per BGP router.

      Authentication Data: this field carries the digest computed by
      the Cryptographic Authentication algorithm in use.  The length of
      the Authentication Data varies based on the cryptographic
      algorithm in use, as shown below:

         HMAC-SHA-1      20 bytes

         HMAC-SHA-256    32 bytes

         HMAC-SHA-384    48 bytes

         HMAC-SHA-512    64 bytes

9.  Neighbor Discovery Procedure

   The neighbor discovery mechanism in BGP is implemented with the
   introduction of an Interface state in BGP and an Adjacency Finite
   State Machine (FSM).  This section describes the states, FSM and
   procedures involved.

9.1.  Interface Procedures

   In order to perform neighbor discovery, BGP needs to maintain state
   for the subset of its connected interfaces over which neighbor
   discovery is enabled.  For these interfaces, BGP sends its Hello
   messages, including the TLVs described in Section 8, as long as its
   link is UP.  The Neighbor TLV described in Section 8.5 is included
   once a neighbor is discovered as described in Section 9.2.

   The Hello messages MUST be originated periodically at an interval
   that is less than or equal to one third of the Adjacency Hold Time
   indicated by the router in its Hello message.  The RECOMMENDED
   default value for the Adjacency Hold Time is 45 seconds, which makes
   the Hello message interval 15 seconds.  Periodic Hello messages make
   the neighbor discovery mechanism robust against transient loss of
   Hello messages, which are sent over an unreliable UDP messaging
   channel, and also enable detection of neighbor down events over
   specific links.  Periodic Hello messages that do not convey any
   change in state SHOULD exclude TLVs that signal the local interface
   or adjacency state and have the S bit CLEAR as specified in
   Section 7.

   A State Change Hello message MUST be triggered, without waiting for
   the periodic timer expiry, whenever there is a change in the content
   of the router's Hello TLVs that needs to be signaled to its neighbor
   over the specific link.  A State Change Hello message MUST also be
   triggered when a new neighbor's Hello message is first received or
   when a change is detected in the neighbor's Hello TLVs that results
   in a change in its adjacency state.
   Once a State Change Hello message is
   triggered on a specific interface, the router MUST continue to
   generate State Change Hello messages on it, with the necessary TLVs
   included, at periodic Hello message intervals for a period of time
   that is at least equal to the Adjacency Hold Time.  This ensures
   that messages carrying the updated information and local state
   changes are not lost.  The router can switch back to Periodic Hello
   messages after it has transmitted State Change Hello messages with
   the latest TLV contents for the Adjacency Hold Time period.

   When a router receives a Hello message from its neighbor, it MUST
   restart the Adjacency Hold timer that it is maintaining for the
   neighbor adjacency using the value indicated in the Hello message.

   When the message is of type State Change (i.e., with the S bit SET),
   it additionally needs to process all the TLVs included and verify
   the signaled state against what was conveyed in the previous State
   Change Hello message from the same neighbor.  Any change identified
   triggers the adjacency FSM change as described in Section 9.2.

   When a router does not receive a Hello message from its neighbor for
   a period equal to the Adjacency Hold Time, it MUST treat this as an
   adjacency down event and clean up its adjacency state to this
   neighbor as described in Section 9.2.

   Before the interface is shut or the neighbor discovery mechanism is
   disabled on it, the router SHOULD attempt to send out immediate
   Hello messages, with the S bit CLEAR (i.e., not including state-
   related TLVs) and with the Adjacency Hold Time set to 0, to trigger
   the adjacency down event on its neighbors.  It MUST then clean up
   its own adjacency states on that specific link.
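
   The timer rules above can be illustrated with a short,
   non-normative sketch in Python.  The names hello_interval,
   NeighborTimer, on_hello and poll are hypothetical and are not
   defined by this document; times are assumed to be in seconds and
   the caller is assumed to supply the current time.

```python
from dataclasses import dataclass

DEFAULT_HOLD_TIME = 45  # RECOMMENDED default Adjacency Hold Time (seconds)


def hello_interval(hold_time: int = DEFAULT_HOLD_TIME) -> float:
    """Largest periodic Hello interval allowed: one third of the hold time."""
    return hold_time / 3


@dataclass
class NeighborTimer:
    """Hold-timer bookkeeping for one neighbor on one link (hypothetical)."""
    expires_at: float = 0.0
    up: bool = False

    def on_hello(self, now: float, hold_time: int) -> None:
        # A Hold Time of 0 is handled as if the hold timer had expired.
        if hold_time == 0:
            self.up = False
            self.expires_at = now
            return
        # Any other Hello restarts the timer with the advertised value.
        self.up = True
        self.expires_at = now + hold_time

    def poll(self, now: float) -> bool:
        """Return True while the adjacency is still considered alive."""
        if self.up and now >= self.expires_at:
            self.up = False  # no Hello for a full hold time: adjacency down
        return self.up
```

   In this sketch the adjacency down event is represented only by the
   boolean returned from poll; a real implementation would feed that
   event into the adjacency FSM described in Section 9.2.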
   When either the BGP Identifier or the AS number is modified, the
   router MUST send out a triggered Hello message, with the S bit CLEAR
   and with the Adjacency Hold Time set to 0, using the old BGP
   Identifier and AS number values, over all the links enabled for BGP
   neighbor discovery.

   A router receiving a Hello message with the Adjacency Hold Time set
   to 0 MUST treat this event as if the adjacency hold timer has
   expired for the specific neighbor and proceed to bring down the
   adjacency.

   An interface going down (e.g., due to link failure or loss of
   signal) MUST immediately trigger the adjacency down event for all
   adjacencies over it, as if the adjacency hold timer expired for all
   neighbors on that link.

9.2.  Adjacency State Machine

   On a per-interface basis, BGP needs to maintain an adjacency state
   for each neighbor that it discovers.  The adjacency state is
   maintained as an FSM, and it has the states described in the
   following sections.

9.2.1.  Down State

   This is the transient terminal state after which an adjacency is
   deleted.

   When transitioning to the Down state from Accepted, the router
   removes the path corresponding to this adjacency from any Adjacency
   Route that it had set up to the neighbor's prefixes.  If no other
   adjacency exists in Accepted state to the neighbor, then it also
   deletes the BGP TCP peering session(s) set up to the neighbor based
   on the neighbor discovery mechanism.

9.2.2.  Initial State

   This is the transient initial state from which an adjacency starts
   when the router detects a Hello message from a new neighbor on the
   link.  The adjacency immediately transitions to the 1-way state.

9.2.3.  1-Way State

   While in the 1-way state (or when entering it), the adjacency
   transitions from 1-way to 2-way state when the router detects a
   Neighbor TLV corresponding to itself in the neighbor's Hello message.
   If the state does not immediately transition to 2-way after
   entering 1-way, the router MUST immediately trigger a State Change
   Hello message that includes the neighbor in a Neighbor TLV with the
   state set to 1-way.

   When transitioning to the 1-way state from Accepted, the router
   removes the path corresponding to this adjacency from any Adjacency
   Route that it had set up to the neighbor's prefixes.  If no other
   adjacency exists in Accepted state to the neighbor, then it also
   deletes the BGP TCP peering session(s) set up to the neighbor based
   on the neighbor discovery mechanism.

   The adjacency transitions to Down state for any of the following
   events:

   o  Link goes down operationally or is administratively shut

   o  Adjacency Hold Timer expires

   o  Router receives a Hello message from its neighbor with Adjacency
      Hold Time value set to 0

   o  Neighbor discovery is disabled on the link

   o  Change in BGP Identifier or AS number on the local router

9.2.4.  2-Way State

   Upon transitioning into this state, the router triggers a State
   Change Hello message with the neighbor's status set to 2-way in the
   Neighbor TLV.  At this stage, both neighbors have received each
   other's Hello messages and thus discovered each other.

   When the router, in this adjacency state, detects that the
   neighbor's state for itself is 2-way or higher, it performs the
   validation checks based on local policy and information exchanged in
   the Hello TLVs.  The following are some of the validation checks
   that may be performed on the adjacency:

   o  Verify subnet matching between the local and remote interface
      addresses.

   o  Verify AS numbers based on local policy as well as against the
      Allowed ASN TLV when one is being exchanged.

   o  Verify that BFD monitoring (when enabled) is indicating UP state.
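
   The three example checks listed above could be sketched as follows.
   This is a non-normative illustration in Python; the function name
   validate_adjacency and its parameters are hypothetical, the ASN
   check is reduced to a simple allow-list lookup, and interface
   addresses are assumed to be given in "address/prefix-length" form.
   The result names correspond to the Adj-OK and Adj-Reject states of
   Sections 9.2.5 and 9.2.6.

```python
import ipaddress
from typing import Optional, Set


def validate_adjacency(local_if: str, remote_if: str, remote_asn: int,
                       allowed_asns: Optional[Set[int]] = None,
                       bfd_enabled: bool = False,
                       bfd_up: bool = False) -> str:
    """Return "Adj-OK" if every check passes, otherwise "Adj-Reject"."""
    local = ipaddress.ip_interface(local_if)
    remote = ipaddress.ip_interface(remote_if)
    # Check 1: local and remote interface addresses share a subnet.
    if local.network != remote.network:
        return "Adj-Reject"
    # Check 2: neighbor ASN is permitted when an allow-list is configured.
    if allowed_asns is not None and remote_asn not in allowed_asns:
        return "Adj-Reject"
    # Check 3: BFD, when enabled, must report UP.
    if bfd_enabled and not bfd_up:
        return "Adj-Reject"
    return "Adj-OK"
```

   For example, validate_adjacency("10.0.0.0/31", "10.0.0.1/31", 65001)
   passes all three checks, while the same call with
   allowed_asns={65002} is rejected by the ASN check.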
   When the adjacency passes the validation checks, it transitions to
   the Adj-OK state; otherwise, it transitions to the Adj-Reject state.

   The adjacency transitions to Down state for any of the adjacency
   down events described in Section 9.2.3.

   The adjacency transitions to 1-way state when the router stops
   seeing itself in a Neighbor TLV of its neighbor's State Change Hello
   messages.

9.2.5.  Adj-Reject State

   Upon transitioning into this state, the router triggers a State
   Change Hello message with the neighbor's status set to Adj-Reject in
   the Neighbor TLV.

   The adjacency remains in the Adj-Reject state as long as the
   parameters being exchanged via the State Change Hello messages do
   not pass validation checks.  The neighbors continue to include each
   other in their respective State Change Hello messages.

   The adjacency transitions to the Adj-OK state once the validation
   checks pass (e.g., due to an update in any parameters or local
   policy).

   The adjacency transitions to Down state for any of the adjacency
   down events described in Section 9.2.3.

   The adjacency transitions to 1-way state when the router stops
   seeing itself in a Neighbor TLV of its neighbor's State Change Hello
   messages.

   When transitioning to the Adj-Reject state from Accepted state, the
   router removes the path corresponding to this adjacency from any
   Adjacency Route that it had set up to the neighbor's prefixes.  If
   no other adjacency exists in Accepted state to the neighbor, then it
   also deletes the BGP TCP peering session(s) set up to the neighbor
   based on the neighbor discovery mechanism.

9.2.6.  Adj-OK State

   Upon transitioning into this state, the router triggers a State
   Change Hello message with the neighbor's status set to Adj-OK in the
   Neighbor TLV.

   The adjacency transition to Adj-OK state indicates that the router
   has accepted its neighbor.
   However, it is possible that the neighbor
   has not accepted it and is signaling Adj-Reject state for the
   adjacency from its end.

   The adjacency transitions to the Accepted state from Adj-OK once the
   router detects that its neighbor is also signaling the Adj-OK or
   Accepted state for it.

   The adjacency transitions to Down state for any of the adjacency
   down events described in Section 9.2.3.

   The adjacency transitions to 1-way state when the router stops
   seeing itself in a Neighbor TLV of its neighbor's State Change Hello
   messages.

   The adjacency transitions to Adj-Reject state when any of the
   validation checks listed in Section 9.2.4 fail.

   When transitioning to the Adj-OK state from Accepted state, the
   router removes the path corresponding to this adjacency from any
   Adjacency Route that it had set up to the neighbor's prefixes.  If
   no other adjacency exists in Accepted state to the neighbor, then it
   also deletes the BGP TCP peering session(s) set up to the neighbor
   based on the neighbor discovery mechanism.

9.2.7.  Accepted State

   The adjacency transition to Accepted state indicates that both the
   neighboring routers have accepted the adjacency to each other.

   On this transition, the router triggers a State Change Hello message
   with the neighbor's status set to Accepted in the Neighbor TLV.  It
   then installs the Adjacency Route(s) for the prefix(es) signaled by
   the neighbor in the Local Prefix TLV, pointing over this adjacency
   link and using the neighbor's address on that link.  If this is the
   first Accepted adjacency to the neighbor, then the Adjacency Route
   is added to the local routing table; otherwise, an additional path
   corresponding to this adjacency link and the neighbor address on it
   is added to the existing Adjacency Route.  The details are described
   in Section 9.3.
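
   The path bookkeeping described above (create the Adjacency Route on
   the first Accepted adjacency, add a path for each further one, and
   remove a path when an adjacency leaves Accepted) might look like the
   following non-normative Python sketch.  The class
   AdjacencyRouteTable, its methods, and the representation of a path
   as an (interface name, neighbor address) pair are assumptions made
   for illustration only.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# (local interface name, neighbor address on that link); hypothetical shape
Path = Tuple[str, str]


class AdjacencyRouteTable:
    """Per-prefix set of paths, kept apart from the BGP RIBs."""

    def __init__(self) -> None:
        self._routes: Dict[str, Set[Path]] = defaultdict(set)

    def add_path(self, prefix: str, path: Path) -> bool:
        """Add a path; return True if this created the Adjacency Route."""
        created = prefix not in self._routes
        self._routes[prefix].add(path)
        return created

    def remove_path(self, prefix: str, path: Path) -> bool:
        """Remove a path; return True if the route was withdrawn entirely."""
        paths = self._routes.get(prefix)
        if not paths:
            return False
        paths.discard(path)
        if not paths:
            del self._routes[prefix]
            return True
        return False

    def paths(self, prefix: str) -> Set[Path]:
        """Return a copy of the current paths for the prefix."""
        return set(self._routes.get(prefix, ()))
```

   A route with more than one path in this table corresponds to the
   multi-link case discussed in Section 9.3, where a single BGP session
   resolves over several parallel links.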
   When this is the first Accepted adjacency to the neighbor, the setup
   of the BGP TCP session to the Peering Address(es) signaled by the
   neighbor is also triggered.

   The adjacency transitions to Down state for any of the adjacency
   down events described in Section 9.2.3.

   The adjacency transitions to 1-way state when the router stops
   seeing itself in a Neighbor TLV of its neighbor's State Change Hello
   messages.

   The adjacency transitions to Adj-Reject state when any of the
   validation checks listed in Section 9.2.4 fail.

9.3.  Adjacency Route

   The Adjacency Route programming is an optional part of the BGP
   Neighbor Discovery mechanism for setting up reachability for the
   neighbor's prefixes signaled via the Local Prefix TLV corresponding
   to adjacencies in Accepted state.

   Adjacency Routes establish reachability between local prefixes on
   directly connected BGP routers.  They enable reachability between
   the Peering Addresses (generally loopbacks) of the two neighbors so
   that the BGP TCP session may come up between them.  Then, for the
   BGP routes learnt over the TCP session, where the next hop is the
   neighbor, they also provide the BGP next-hop resolution.

   Unlike other BGP routes, these are not recursive routes: they point
   directly to the neighbor's interface and IP address.  These routes,
   which are set up as part of the neighbor discovery procedure, are
   hence different from the regular IBGP and EBGP routes.  These routes
   also MUST have a better administrative distance than the IBGP and
   EBGP routes to ensure that they do not get displaced from the
   forwarding by BGP routes learnt over the very session(s) established
   using these peering routes.

   The Adjacency Routes SHOULD NOT be stored in any of the BGP RIBs
   [RFC4271] since they are not computed based on the BGP decision
   process.
   It is RECOMMENDED that these routes be managed in a
   separate routing table within the BGP Neighbor Discovery function to
   ensure that none of the processing and validation for the BGP RIB
   affects them and that, in turn, they do not influence the BGP
   decision process and route calculation.

   When there are multiple interconnecting links between two BGP
   neighbors, a single BGP TCP session may be set up between them, over
   which routes are then exchanged.  However, in the forwarding, the
   Adjacency Route will have multiple paths, one for each of these
   interconnecting links.  The BGP routes learnt over the session
   therefore get resolved over this Adjacency Route and in turn get
   ECMP load balancing even with a single BGP session.

10.  Interactions with Base BGP Protocol

   The BGP Finite State Machine (FSM) as specified in [RFC4271] is
   unchanged, and the BGP TCP session establishment, route updates and
   processing continue to follow the BGP protocol specifications.

   BGP peering addresses along with their respective ASNs have
   traditionally been explicitly provisioned on both BGP neighbors.
   The difference that the neighbor discovery mechanism brings about is
   the elimination of this configuration, as these parameters are
   learnt via the neighbor discovery procedure.  Once a BGP router
   learns its neighbor's peering address and ASN, it initializes the
   BGP Peer FSM for this neighbor in the Idle state, just as if this
   neighbor was configured.  From there on, the BGP Peer FSM actions
   follow.

   The BGP Keepalives and Hold Timer for the session over TCP apply
   unchanged, and they govern the operations of the BGP TCP session.
   While the BGP Keepalive works at the TCP session level, the BGP
   Adjacency Hold Timer monitors one or more underlying interconnecting
   link adjacencies between the neighbors.
   The reachability for the BGP
   TCP session may also be provided by BGP routes learnt via routing
   updates over the sessions set up via neighbor discovery.  It is
   likely that, even after all the underlying interconnecting link
   adjacencies between two neighbors are down, the neighbor's peering
   address is still reachable via BGP routing over some other path in
   the network.  In order to avoid this, it is RECOMMENDED that the BGP
   TCP sessions set up via the neighbor discovery mechanism use a TTL
   of 1 to ensure that they are set up only over directly attached
   links to the neighbors.

   Since the BGP TCP session set up via neighbor discovery is meant for
   hop-by-hop routing, it is necessary, for faster convergence, to
   bring down the session even before its BGP Hold Timer has expired.
   Therefore, when all the underlying link adjacencies between two BGP
   neighbors move out of the Accepted state (or go down), the BGP TCP
   peering session that was set up using the BGP Neighbor Discovery
   mechanism between these two neighbors is also deleted as if it was
   un-configured.

   Since the BGP neighbor discovery mechanism runs over a UDP socket,
   it is isolated from the core BGP protocol machinery, which is TCP
   based.

   Implementations SHOULD ensure that the Hello processing does not
   affect the base BGP operations and scalability.  One option may be
   to run the BGP neighbor discovery mechanism in a separate thread
   from the rest of BGP processing.  These implementation details,
   however, are outside the scope of this document.

   It is not generally expected that BGP sessions are explicitly
   provisioned along with the neighbor discovery mechanism.  However,
   in such an event, the neighbor discovery mechanism MUST NOT affect
   or result in any changes to provisioned BGP neighbors and their
   operations.
   Specifically, BGP peering to auto-discovered neighbors
   MUST NOT be instantiated using the procedures described in this
   document when the same BGP neighbor is already provisioned.  The
   configured BGP neighbor parameters take precedence, and the auto-
   discovered values and parameters are not used for such configured
   BGP sessions.

11.  Security Considerations

   BGP routers accept TCP connection attempts to port 179 only from the
   provisioned BGP neighbors or, in some implementations, those from
   within a configured address range.  With the BGP neighbor auto-
   discovery mechanism, it is now possible for BGP to automatically
   learn neighbors and initiate/receive TCP connections from them.
   This introduces specific considerations that need to be taken care
   of to ensure the security of the BGP protocol operations.

   This document introduces UDP messages in BGP for the neighbor
   discovery mechanism using the BGP Hello messages.  For security
   purposes, implementations MUST exchange the Hello messages only on
   interfaces specifically enabled for neighbor discovery.  Hello
   messages MUST NOT be accepted unless they are addressed to the
   224.0.0.2 or FF02::2 addresses.  Optionally, implementations MAY set
   the TTL to 255 when originating the Hello messages, with receivers
   checking specifically for the TTL to be 254 and discarding the
   packet when this is not the case.  This ensures that the Hello
   packet signaling happens between directly connected BGP routers
   only.

   The BGP neighbor discovery mechanism is expected to be run typically
   in DCs and between physically connected routers that are
   trustworthy.  The Cryptographic Authentication TLV (as described in
   Section 8.6) SHOULD be used in deployments where this assumption of
   trustworthiness is not valid.  This mechanism is similar to the one
   defined for LDP Hello messages, which are also UDP based, as
   specified in [RFC7349].
   An updated future version of this document will
   describe similar procedures for BGP Hello in more detail.

   Once the BGP Hello messages and the neighbor discovery mechanism are
   secured, the security considerations for BGP protocol operations
   apply to the auto-discovered neighbor sessions.

12.  Manageability Considerations

   This section is structured as recommended in [RFC5706].

12.1.  Operational Considerations

   The BGP neighbor discovery mechanism introduced by this document is
   not applicable to general BGP deployments, as discussed in
   Section 3.  The mechanism is specifically meant for networks where
   BGP is used as a hop-by-hop routing protocol, e.g., as described in
   [RFC7938].  The neighbor discovery mechanism hence SHOULD NOT be
   enabled by default in BGP.

   Implementations SHOULD provide configuration methods that allow
   enablement of BGP neighbor discovery on specific local interfaces.
   In a DC network, it is expected that the operator selects the
   appropriate links on which to enable this, e.g., on a Tier 2 node it
   is enabled on all links towards the Tier 1 and Tier 3 nodes, while
   on a Tier 1 node it may be enabled only on the links towards the
   Tier 2 nodes.  The details of this enablement are outside the scope
   of this document since it varies based on the DC design and may be
   implementation specific.

   Implementations SHOULD provide configuration methods that enable the
   setup of BGP neighbor templates that allow the operator to set up
   BGP neighbor discovery parameters on the BGP router.
   Some of the aspects
   to be considered in such a template are:

   o  Local address to be used for the BGP TCP session peering along
      with the local ASN and the AFI/SAFI enabled for the auto-
      discovered sessions

   o  BGP policies to be enabled for the auto-discovered sessions

   o  Optionally, the list of ASNs with which auto-discovered sessions
      should be brought up.  This ensures that links between nodes of
      different tiers are not used by BGP when they are connected
      wrongly by accident (e.g., a Tier 3 node connected to a Tier 1
      node).

   o  Authentication methods that need to be enabled in an environment
      that is not secure

   o  Local interfaces over which the specific template needs to be
      applied for BGP neighbor discovery

   o  Other parameters like the Adjacency Hold Timer value to be used
      or other optional features

   This mechanism does not impose any restrictions on the way ASNs or
   addresses are assigned to the nodes.  Various automatic
   provisioning, auto-configuration or zero-touch-provisioning
   mechanisms may be used.

   Implementations SHOULD report the state of the BGP operations over
   each link enabled for neighbor discovery, including the status of
   all adjacencies learnt over it.  Implementations SHOULD also report
   the operations of the auto-discovered BGP TCP peering sessions in
   the same way as for the provisioned BGP neighbors.

   Implementations SHOULD support logging of events such as the
   discovery of an adjacency using neighbor discovery, including
   peering route updates, and the triggering of BGP TCP session
   establishment for them.  Errors and alarms related to loss of
   adjacencies and teardown of BGP TCP peering sessions SHOULD also be
   generated so that they can be monitored.

12.2.  Management Considerations

   This document introduces UDP-based messaging in the BGP protocol,
   and therefore the necessary fault management mechanisms are required
   to be implemented for it.  Implementations MUST discard unsupported
   message types or version types other than 4 received over a UDP
   session.  Such messages MUST NOT affect the neighbor discovery
   mechanism in operation using the Hello messages.  Unknown TLVs
   received via the Hello messages MUST be ignored and the rest of the
   Hello message MUST be processed.  Implementations SHOULD discard
   Hello messages with malformed TLVs, and this should be logged as an
   error.

13.  IANA Considerations

   This document requests IANA to update the BGP Parameters registry as
   described in this section.

13.1.  BGP Hello Message

   This document requests IANA to allocate a new UDP port (179 is the
   preferred number) and a BGP message type code for the BGP Hello
   message.

      Service Name:           BGP-HELLO
      Transport Protocol(s):  UDP
      Assignee:               IESG
      Contact:                IETF Chair
      Description:            BGP Hello Message
      Reference:              This document --
                              draft-xu-idr-neighbor-autodiscovery
      Port Number:            179 (preferred value) -- To be assigned
                              by IANA

13.2.  TLVs of BGP Hello Message

   This document requests IANA to create a new registry "TLVs of BGP
   Hello Message" with the following registration procedure:

   Registry Name: TLVs of BGP Hello Message.
1393     Value        TLV Name                            Reference
1394     -----------  ----------------------------------  -------------
1395     0            Reserved                            This document
1396     1            Accepted ASN List                   This document
1397     2            Peering Address                     This document
1398     3            Local Prefix                        This document
1399     4            Link Attributes                     This document
1400     5            Neighbor                            This document
1401     6            Cryptographic Authentication        This document
1402     7-65500      Unassigned
1403     65501-65534  Experimental                        This document
1404     65535        Reserved                            This document

1406  14. Acknowledgements

1408     The authors would like to thank Enke Chen, Krishna Swamy and Ramesh
1409     Yakkala for their valuable comments and suggestions on this document.

1411  15. Contributors
1412     Satya Mohanty
1413     Cisco
1414     Email: satyamoh@cisco.com

1416     Shunwan Zhuang
1417     Huawei
1418     Email: zhuangshunwan@huawei.com

1420     Chao Huang
1421     Alibaba Inc
1422     Email: jingtan.hc@alibaba-inc.com

1424     Guixin Bao
1425     Alibaba Inc
1426     Email: guixin.bgx@alibaba-inc.com

1428     Jinghui Liu
1429     Ruijie Networks
1430     Email: liujh@ruijie.com.cn

1432     Zhichun Jiang
1433     Tencent
1434     Email: zcjiang@tencent.com

1436     Shaowen Ma
1437     Juniper Networks
1438     Email: mashaowen@gmail.com

1440  16. References

1442  16.1. Normative References

1444     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1445                Requirement Levels", BCP 14, RFC 2119,
1446                DOI 10.17487/RFC2119, March 1997,
1447                .

1449     [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
1450                Border Gateway Protocol 4 (BGP-4)", RFC 4271,
1451                DOI 10.17487/RFC4271, January 2006,
1452                .

1454     [RFC5036]  Andersson, L., Ed., Minei, I., Ed., and B. Thomas, Ed.,
1455                "LDP Specification", RFC 5036, DOI 10.17487/RFC5036,
1456                October 2007, .

1458     [RFC5082]  Gill, V., Heasley, J., Meyer, D., Savola, P., Ed., and C.
1459                Pignataro, "The Generalized TTL Security Mechanism
1460                (GTSM)", RFC 5082, DOI 10.17487/RFC5082, October 2007,
1461                .

1463  16.2. Informative References

1465     [FIPS-180-4]
1466                National Institute of Standards and Technology, "Secure
1467                Hash Standard (SHS), FIPS PUB 180-4", March 2012.
1469     [I-D.ietf-lsvr-bgp-spf]
1470                Patel, K., Lindem, A., Zandi, S., and W. Henderickx,
1471                "Shortest Path Routing Extensions for BGP Protocol",
1472                draft-ietf-lsvr-bgp-spf-04 (work in progress), December
1473                2018.

1475     [I-D.ketant-idr-bgp-ls-bgp-only-fabric]
1476                Talaulikar, K., Filsfils, C., ananthamurthy, k., Zandi,
1477                S., Dawra, G., and M. Durrani, "BGP Link-State Extensions
1478                for BGP-only Fabric", draft-ketant-idr-bgp-ls-bgp-only-
1479                fabric-02 (work in progress), March 2019.

1481     [RFC2104]  Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-
1482                Hashing for Message Authentication", RFC 2104,
1483                DOI 10.17487/RFC2104, February 1997,
1484                .

1486     [RFC4202]  Kompella, K., Ed. and Y. Rekhter, Ed., "Routing Extensions
1487                in Support of Generalized Multi-Protocol Label Switching
1488                (GMPLS)", RFC 4202, DOI 10.17487/RFC4202, October 2005,
1489                .

1491     [RFC5706]  Harrington, D., "Guidelines for Considering Operations and
1492                Management of New Protocols and Protocol Extensions",
1493                RFC 5706, DOI 10.17487/RFC5706, November 2009,
1494                .

1496     [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
1497                (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010,
1498                .

1500     [RFC7349]  Zheng, L., Chen, M., and M. Bhatia, "LDP Hello
1501                Cryptographic Authentication", RFC 7349,
1502                DOI 10.17487/RFC7349, August 2014,
1503                .

1505     [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
1506                S. Ray, "North-Bound Distribution of Link-State and
1507                Traffic Engineering (TE) Information Using BGP", RFC 7752,
1508                DOI 10.17487/RFC7752, March 2016,
1509                .

1511     [RFC7938]  Lapukhov, P., Premji, A., and J. Mitchell, Ed., "Use of
1512                BGP for Routing in Large-Scale Data Centers", RFC 7938,
1513                DOI 10.17487/RFC7938, August 2016,
1514                .
1516  Authors' Addresses

1518     Xiaohu Xu
1519     Alibaba Inc
1520     China

1522     Email: xiaohu.xxh@alibaba-inc.com

1524     Ketan Talaulikar
1525     Cisco Systems
1526     India

1528     Email: ketant@cisco.com

1530     Kunyang Bi
1531     Huawei
1532     China

1534     Email: bikunyang@huawei.com

1536     Jeff Tantsura
1537     Apstra
1538     USA

1540     Email: jefftant.ietf@gmail.com

1542     Nikos Triantafillis
1543     Apstra
1544     USA

1546     Email: nikos@apstra.com