idnits 2.17.1

draft-ietf-idr-rs-bfd-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords -- however, there's a paragraph with
     a matching beginning. Boilerplate error?

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  -- The document date (July 3, 2017) is 2488 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

     No issues found here.

     Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                            R. Bush
Internet-Draft                                 Internet Initiative Japan
Intended status: Standards Track                                 J. Haas
Expires: January 4, 2018                                      J. Scudder
                                                  Juniper Networks, Inc.
                                                               A. Nipper
                                                                 T.
King
                                                  DE-CIX Management GmbH
                                                            July 3, 2017

        Making Route Servers Aware of Data Link Failures at IXPs
                        draft-ietf-idr-rs-bfd-03

Abstract

   When BGP route servers are used, the data plane is not congruent with
   the control plane.  Therefore, peers at an Internet exchange can lose
   data connectivity without the control plane being aware of it, and
   packets are lost.  This document proposes the use of a newly defined
   BGP Subsequent Address Family Identifier (SAFI) both to allow the
   route server to request its clients use BFD to track data plane
   connectivity to their peers' addresses, and for the clients to signal
   that connectivity state back to the route server.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to
   be interpreted as described in [RFC2119] only when they appear in all
   upper case.  They may also appear in lower or mixed case as English
   words, without normative meaning.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 4, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions
   3.  Overview
   4.  Next Hop Validation
     4.1.  ReachAsk
     4.2.  LocReach
     4.3.  ReachTell
     4.4.  NHIB
   5.  Advertising NH-Reach state in BGP
   6.  Client Procedures for NH-Reach Changes
   7.  Recommendations for Using BFD
   8.  Other Considerations
   9.  IANA Considerations
   10. Security Considerations
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Summary of Document Changes
   Authors' Addresses

1.
Introduction

   In configurations (typically Internet Exchange Points (IXPs)) where
   EBGP routing information is exchanged between client routers through
   the agency of a route server (RS) [RFC7947], but traffic is exchanged
   directly, operational issues can arise when partial data plane
   connectivity exists among the route server client routers.  Since the
   data plane is not congruent with the control plane, the client
   routers on the IXP can lose data connectivity without the control
   plane - the route server - being aware of it, resulting in
   significant data loss.

   To remedy this, two basic problems need to be solved:

   1.  Client routers must have a means of verifying connectivity
       amongst themselves, and
   2.  Client routers must have a means of communicating the knowledge
       of the failure (and restoration) back to the route server.

   The first can be solved by application of Bidirectional Forwarding
   Detection [RFC5880].  The second can be solved by exchanging BGP
   routes which use the NH-Reach Subsequent Address Family Identifier
   (SAFI) defined in this document.

   Throughout this document, we generally assume that the route server
   being discussed is able to represent different RIBs towards different
   clients, as discussed in section 2.3.2.1 of [RFC7947].  If this is
   not the case, the procedures described here to allow BFD to be
   automatically provisioned between clients still have value; however,
   the procedures for signaling reachability back to the route server
   may not.

   Throughout this document, we refer to the "route server", "RS" or
   just "server" and the "client" to describe the two BGP routers
   engaging in the exchange of information.  We observe that there could
   be other applications for this extension.  Our use of terminology is
   intended for clarity of description, and not to limit the future
   applicability of the proposal.

2.
Definitions

   o  Indirect peer: If a route server is configured such that routes
      from a given client might be sent to some other client, or vice
      versa, those two clients are considered to be indirect peers.
   o  RS: Route Server.  See [RFC7947].

3.  Overview

   As with the base BGP protocol, we model the function of this
   extension as the interaction between a conceptual set of databases:

   o  ReachAsk: The reachability request database.  A database of
      nexthops (host addresses) for which data plane reachability is
      being queried.
   o  ReachAsk-Out: The set of queries sent to the client.
   o  ReachAsk-In: The set of queries received from the route server.

   o  ReachTell: The reachability response database.  A database of
      responses to ReachAsk queries, indicating what is known about data
      plane reachability.
   o  ReachTell-Out: The responses being sent to the route server.
   o  ReachTell-In: The responses received from the client.
   o  LocReach: The local reachability database.
   o  NHIB: Next Hop Information Base.  Stores what is known about the
      client's reachability to its next hops.
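
   As an illustration only, the conceptual databases listed above could
   be modeled as follows.  This is a minimal sketch, not part of the
   proposal: the class names, the exchange() helper, and the probe
   callback are all hypothetical, and the three state names are borrowed
   from the LocReach states defined later in this document.  Real
   implementations would carry these entries in BGP UPDATE messages
   rather than copying dictionaries.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, Set


class Reach(Enum):
    """Reachability states (names taken from the LocReach section)."""
    UNKNOWN = "unknown"
    UP = "up"
    DOWN = "down"


@dataclass
class ServerSide:
    """Per-client state kept by the route server (hypothetical layout)."""
    reach_ask_out: Set[str] = field(default_factory=set)
    reach_tell_in: Dict[str, Reach] = field(default_factory=dict)
    nhib: Dict[str, Reach] = field(default_factory=dict)


@dataclass
class ClientSide:
    """State kept by the route server client (hypothetical layout)."""
    reach_ask_in: Set[str] = field(default_factory=set)
    loc_reach: Dict[str, Reach] = field(default_factory=dict)
    reach_tell_out: Dict[str, Reach] = field(default_factory=dict)


def exchange(server: ServerSide, client: ClientSide,
             probe: Callable[[str], Reach]) -> None:
    """Run one conceptual ask/tell round between server and client."""
    # ReachAsk-Out on the server becomes ReachAsk-In on the client.
    client.reach_ask_in = set(server.reach_ask_out)
    # The client records what it learns about each queried next hop.
    for next_hop in client.reach_ask_in:
        client.loc_reach[next_hop] = probe(next_hop)
    # LocReach feeds ReachTell-Out, which arrives as ReachTell-In.
    client.reach_tell_out = dict(client.loc_reach)
    server.reach_tell_in = dict(client.reach_tell_out)
    # ReachTell-In is the input to the per-client NHIB.
    server.nhib.update(server.reach_tell_in)
```

   The probe callback stands in for whatever mechanism the client uses
   to test connectivity; this document uses BFD for that purpose.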

   +--------------------------------------------------------+
   |   +------------+    +------------+    +------------+   |
   |   |    Per-    |    | Configured |    |    Per-    |   |
   |   |   Client   |    |  indirect  |    |   Client   |   |
   |   |    NHIB    |    |   peers    |    |    RIB     |   |
   |   +-----^------+    +------------+    +-----+------+   |
   |         |                         \         |          |
   |   +-----+------+                   `-->-----v------+   |
   |   |ReachTell-In|                      |ReachAsk-Out|   |
   |   +------^-----+     Route Server     +-----+------+   |
   +----------|----------------------------------|----------+
              |                                  |
              |                                  |
              |                                  |
              |                                  |
   +----------|----------------------------------|----------+
   |   +------+------+      RS Client      +-----v-----+    |
   |   |ReachTell-Out|                     |ReachAsk-In|    |
   |   +------^------+                     +-----+-----+    |
   |          |          +------------+          |          |
   |          |          |            |          |          |
   |          `----------+  LocReach  <----------'          |
   |                     |            |                     |
   |                     +------------+                     |
   +--------------------------------------------------------+

   Route Server, RS Client, and Reachability Ask and Tell databases with
                              In/Out Queues

   In outline, the route server requests its client to track
   connectivity for all the potential next hops the RS might send to the
   client, by sending these next hops as ReachAsk "routes".  The client
   tracks connectivity using BFD and reports its connectivity status to
   the RS using ReachTell "routes".  Connectivity status may be that the
   next hop is reachable, unreachable, or unknown.  Once the RS has been
   informed by the client of its connectivity, it uses this information
   to influence the route selection the RS performs on behalf of the
   client.  Details are elaborated in the following sections.

4.  Next Hop Validation

   Below, we detail procedures where a route server tells its client
   router about other client nexthops by sending it ReachAsk routes and
   the client router verifies connectivity to those other client routers
   and communicates its findings back to the RS using ReachTell routes.
   The RS uses the received ReachTell routes as input to the NHIB and
   hence the route selection process it performs on behalf of the
   client.

4.1.  ReachAsk

   The route server maintains a ReachAsk database for each client that
   supports this proposal, that is, for each client that has advertised
   support (Section 5) for the NH-Reach SAFI.  This database is the
   union of:

   o  The set of next hops found in the associated per-client Loc-RIB
      (see section 2.3.2.1 of [RFC7947]).
   o  The set of addresses of this client's indirect peers (Section 2).
   o  The RS MAY also add other entries, for example under configuration
      control.

   We note that under most circumstances, the first (Loc-RIB next hops)
   set will be a subset of the second (indirect peers) set.  For this
   not to be the case, a client would have to have sent a "third party"
   next hop [RFC4271] to the server.  To cover such a case, an
   implementation MAY note any such next hops, and include them in its
   list of indirect peers.  (This implies that if a third party next hop
   for client C is conveyed to client A, not only will C be placed in
   A's ReachAsk database, but A will be placed in C's ReachAsk
   database.)

   The contents of the ReachAsk database are communicated to the client
   using the NLRI format and procedures described in Section 5.

4.2.  LocReach

   The client MUST attempt to track data plane connectivity to each host
   address depicted in the ReachAsk database.  It MAY also track
   connectivity to other addresses.  The use of BFD for this purpose is
   detailed in Section 6.

   For each address being tracked, its state is maintained by the client
   in a LocReach entry.  The state can be:

   o  Unknown.  Connectivity status is unknown.  This may be due to a
      temporary or permanent lack of a feasible OAM mechanism to
      determine the status.
   o  Up.  The address has been determined to be reachable.
   o  Down.
      The address has been determined to be unreachable.

   The LocReach database is used as input for the ReachTell database; it
   MAY also be used as input to the client's route resolvability
   condition (section 9.1.2.1 of [RFC4271]).

4.3.  ReachTell

   The ReachTell database contains an entry for every entry in the
   LocReach database.

   The contents of the ReachTell database are communicated to the server
   using the NLRI format and procedures described in Section 5.

4.4.  NHIB

   The route server maintains a per-client Next Hop Information Base, or
   NHIB.  This contains the information about next hop status received
   from ReachTell.

   In computing its per-client Loc-RIB, the RS uses the content of the
   related per-client NHIB as input to the route resolvability condition
   (section 9.1.2.1 of [RFC4271]).  The next hop being resolved is
   looked up in the NHIB and its state determined:

   o  Up next hops are considered resolvable.
   o  Unknown next hops MAY be considered resolvable.  They MAY be less
      preferred for selection.
   o  Down next hops MUST NOT be considered resolvable.
   o  If a given next hop is not present in the NHIB, but is present in
      ReachAsk-Out, either the client has not responded yet (a transient
      condition) or an error exists.  Similar to Unknown next hops, such
      routes MAY be considered resolvable; they MAY be less preferred.

5.  Advertising NH-Reach state in BGP

   A new BGP SAFI, the NH-Reach SAFI, is defined in this document.  It
   has been assigned value TBD.  A route server or a route server client
   using the procedures in this document MUST advertise support for this
   SAFI, for the IPv4 and/or IPv6 Address Family Identifier (AFI).  The
   use of this SAFI with any other AFI is not defined by this document.

   NH-Reach NLRI "routes" have a Length of Next Hop Network Address
   value of 0, therefore they have an empty Network Address of Next Hop
   field (section 3 of [RFC4760]).
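
   Returning to the NHIB rules of Section 4.4, the resolvability check
   the route server performs can be sketched as a simple lookup.  This
   is an illustrative sketch under stated assumptions, not a normative
   algorithm: the function name and the accept_unknown flag are
   hypothetical, and the sketch treats any next hop absent from the NHIB
   the same way as an Unknown one, which is one of the choices the text
   permits with its use of MAY.

```python
from enum import Enum
from typing import Dict


class Reach(Enum):
    """Next hop states as stored in the per-client NHIB."""
    UNKNOWN = 0
    UP = 1
    DOWN = 2


def is_resolvable(next_hop: str, nhib: Dict[str, Reach],
                  accept_unknown: bool = True) -> bool:
    """Apply the Section 4.4 rules to a single next hop.

    accept_unknown is a hypothetical local policy knob: the text says
    Unknown or missing next hops MAY be considered resolvable.
    """
    state = nhib.get(next_hop)  # None when the client has not answered
    if state is Reach.UP:
        return True             # Up next hops are resolvable
    if state is Reach.DOWN:
        return False            # Down next hops MUST NOT be resolvable
    return accept_unknown       # Unknown or not yet reported
```

   A fuller implementation might keep Unknown next hops resolvable but
   rank them lower during selection, as the text also allows; that
   preference step is omitted here.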

   Since, as specified here, ReachTell "routes" from different clients
   populate distinct databases on the RS, there will generally be only a
   single path per "route"; this implies that route selection need not
   be performed (or equivalently, that it's trivial to perform).

   In the other direction, a client might peer with multiple route
   servers and receive differing sets of ReachAsk routes from them.  An
   implementation MAY handle this situation by implementing a distinct
   ReachAsk and ReachTell per server, but it MAY also handle it by
   placing all servers' ReachAsk "routes" into a single ReachAsk, and
   sending the results to all servers from a single ReachTell.  This
   would imply some route server(s) might get ReachTell results they had
   not asked for, but this is permissible in any case.  Again, since the
   contents of ReachAsk are simply a set of host routes to be tested,
   route selection over a combined ReachAsk MAY be omitted.

   ReachAsk and ReachTell entries are exchanged using the NH-Reach NLRI
   encoding:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |T|Reserved |Sta|           next hop (4 or 16 octets)           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      .               ... next hop (4 or 16 octets) ...               .
      .                                                               .
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                           NH-Reach NLRI Format

   o  T: Type is a one-bit field that can take the value 0, meaning the
      NLRI is a ReachAsk entry, or 1, meaning it is a ReachTell entry.
   o  Reserved: These five bits are reserved.  They MUST be sent as zero
      and MUST be disregarded on receipt.
   o  Sta: State is a two-bit field used to signal the LocReach
      (Section 4.2) state:

      *  0 or 3: Unknown.
      *  1: Up.
      *  2: Down.

      Although either 0 or 3 is to be interpreted as "Unknown", the
      value 0 MUST be used on transmission.
      The value 3 MUST be accepted as an alias for 0 on receipt.

   o  The next hop field is an IPv4 or IPv6 host route, depending on
      whether the AFI is IPv4 or IPv6.

   ReachAsk and ReachTell entries MUST NOT be propagated from one BGP
   peering session to another; the routes are not transitive.

   The next hop field is the key for the NH-Reach NLRI type; the
   information encoded in the top octet is non-key information.  It is
   possible in principle (although unlikely) for two NLRI to be validly
   present in an UPDATE message with identical next hop fields but
   different types.  However, two NLRI with the same next hop field and
   different State fields MUST NOT be encoded in the same UPDATE
   message.  If such is encountered, the receiver MUST behave as though
   the state "Unknown" was received for the next hop in question.

6.  Client Procedures for NH-Reach Changes

   When an entry is added to a route server client's ReachAsk-In for a
   route server peering session, the client will then attempt to verify
   connectivity to the host depicted by that entry.  The procedure
   described in this specification utilizes BFD.

   If no BFD session exists to this nexthop, a BFD session is
   provisioned to that IP address and the LocReach reachability state
   (Section 4.2) is set to Unknown.

   If the client cannot establish a BFD session with an entry in its
   ReachAsk-In, the nexthop remains in LocReach with its reachability
   state Unknown.

   Once the BFD session moves to the Up state, the LocReach reachability
   state is set to Up.

   When the BFD session transitions out of the Up state to the Down
   state, the LocReach reachability state is set to Down.

   If the BFD session transitions out of the Up state to the AdminDown
   state, the LocReach reachability state is set to Unknown.
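
   The state handling above, together with the first octet of the
   NH-Reach NLRI from Section 5, can be sketched as follows.  This is a
   hedged illustration, not a reference implementation.  The function
   names are hypothetical, the BFD state names follow [RFC5880], and
   leaving LocReach unchanged for transitions this section does not
   mention (for example Down to Init) is an assumption of the sketch.

```python
from enum import Enum
from ipaddress import ip_address
from typing import Tuple


class Reach(Enum):
    """LocReach states with their Sta field values (Section 5)."""
    UNKNOWN = 0
    UP = 1
    DOWN = 2


class Bfd(Enum):
    """BFD session states as named in RFC 5880."""
    ADMIN_DOWN = "AdminDown"
    DOWN = "Down"
    INIT = "Init"
    UP = "Up"


def on_bfd_transition(current: Reach, old: Bfd, new: Bfd) -> Reach:
    """Return the LocReach state after a BFD transition old -> new."""
    if new is Bfd.UP:
        return Reach.UP
    if old is Bfd.UP and new is Bfd.DOWN:
        return Reach.DOWN
    if old is Bfd.UP and new is Bfd.ADMIN_DOWN:
        return Reach.UNKNOWN
    # Assumption: other transitions leave the state as it was, so a
    # session that never came up stays Unknown.
    return current


def encode_nlri(is_tell: bool, state: Reach, next_hop: str) -> bytes:
    """Build one NH-Reach NLRI: T bit, zero Reserved bits, Sta, next hop."""
    first = (0x80 if is_tell else 0x00) | state.value
    return bytes([first]) + ip_address(next_hop).packed


def decode_first_octet(octet: int) -> Tuple[bool, Reach]:
    """Split the first octet into (is_tell, state), ignoring Reserved."""
    sta = octet & 0x03
    # The value 3 is accepted as an alias for Unknown on receipt.
    state = Reach.UNKNOWN if sta == 3 else Reach(sta)
    return bool(octet & 0x80), state
```

   For example, a client whose session to 192.0.2.1 has just come up
   would hold LocReach state Up and could report it with
   encode_nlri(True, Reach.UP, "192.0.2.1").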

   When entries are removed from the route server client's ReachAsk-In
   for a route server peering session, the client MAY delay
   de-provisioning the BFD peering session.  If the client delays
   de-provisioning the session, it should remove it if the BFD session
   transitions to the Down or AdminDown states.

7.  Recommendations for Using BFD

   The RECOMMENDED way for a client router to confirm that data plane
   connectivity to its next hops is available is to use BFD in
   asynchronous mode.  Echo mode MAY be used if both client routers
   running a BFD session support this.  The use of authentication in BFD
   is OPTIONAL as there is a certain level of trust between the
   operators of the client routers at a particular IXP.  If trust cannot
   be assumed, it is recommended to use pair-wise keys (how this can be
   achieved is outside the scope of this document).  The ttl/hop limit
   values as described in section 5 of [RFC5881] MUST be obeyed in order
   to shield BFD sessions against packets coming from outside the IXP.

   The following values of the BFD configuration of client routers (see
   section 6.8.1 of [RFC5880]) are RECOMMENDED:

   o  DesiredMinTxInterval: 1,000,000 (microseconds)
   o  RequiredMinRxInterval: 1,000,000 (microseconds)
   o  DetectMult: 3

   A client router administrator MAY select more appropriate values to
   meet the special needs of a particular deployment.

8.  Other Considerations

   For purposes of routing stability, implementations may wish to apply
   hysteresis ("holddown") to next hops that have transitioned from
   reachable to unreachable and back.

   Implementations MAY restrict the range of addresses with which they
   will attempt to form BFD relationships.  For example, an
   implementation might by default only allow BFD relationships with
   peers that share a subnetwork with the route server.  An
   implementation MAY apply such restrictions by default.

9.
IANA Considerations

   IANA is requested to allocate a value from the Subsequent Address
   Family Identifiers (SAFI) Parameters registry for this proposal.  Its
   Description in that registry shall be NH-Reach with a Reference of
   this RFC.

10.  Security Considerations

   The mechanism in this document permits a route server client to
   influence the contents of the route server's Adj-RIBs-Out through its
   reports of next hop reachability state using the NH-Reach SAFI.
   Since this state is per-client, if a route server client is able to
   inject NH-Reach routes for another route server's BGP session to a
   client, it can cause the route server to select different forwarding
   than otherwise expected.  This issue may be mitigated using transport
   security on the BGP sessions between the route server and its
   clients.  See [RFC4272].

   The NH-Reach SAFI enables the server to trigger creation of a BFD
   session on its client.  A malicious or misbehaving server could
   trigger an unreasonable number of sessions, a potential resource
   exhaustion attack.  The sedate default timers proposed in Section 7
   mitigate this; they also mitigate concerns about use of the client as
   a source of packets in a flooding attack.  An implementation MAY also
   impose limits on the number of BFD sessions it will create at the
   request of the server.

   The reachability tests between route server clients themselves may be
   a target for attack.  Such attacks may include forcing a BFD session
   Down through injecting false BFD state.  A less likely attack
   includes forcing a BFD session to stay Up when its real state is
   Down.  These attacks may be mitigated using the BFD security
   mechanisms defined in [RFC5880].

11.  References

11.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006.

   [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 4760,
              DOI 10.17487/RFC4760, January 2007.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.

   [RFC5881]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881,
              DOI 10.17487/RFC5881, June 2010.

   [RFC7947]  Jasinska, E., Hilliard, N., Raszuk, R., and N. Bakker,
              "Internet Exchange BGP Route Server", RFC 7947,
              DOI 10.17487/RFC7947, September 2016.

11.2.  Informative References

   [RFC4272]  Murphy, S., "BGP Security Vulnerabilities Analysis",
              RFC 4272, DOI 10.17487/RFC4272, January 2006.

Appendix A.  Summary of Document Changes

   idr-02 to idr-03:  Substantial rewrite.  Introduce NLRI format that
      embeds state.
   idr-01 to idr-02:  Move from BGP-LS to NH-Reach SAFI.  Lots of
      editorial changes.
   idr-00 to idr-01:  Add BGP Capability.  Move from NH-Cost to BGP-LS.
   ymbk-01 to idr-00:  No technical changes; adopted by IDR.
   ymbk-00 to ymbk-01:  Clarifications to BFD procedures.  Use BFD state
      as an input to BGP route selection.

Authors' Addresses

   Randy Bush
   Internet Initiative Japan
   5147 Crystal Springs
   Bainbridge Island, Washington  98110
   US

   Email: randy@psg.com

   Jeffrey Haas
   Juniper Networks, Inc.
   1133 Innovation Way
   Sunnyvale, CA  94089
   US

   Email: jhaas@juniper.net

   John G. Scudder
   Juniper Networks, Inc.
   1133 Innovation Way
   Sunnyvale, CA  94089
   US

   Email: jgs@juniper.net

   Arnold Nipper
   DE-CIX Management GmbH
   Lichtstrasse 43i
   Cologne  50825
   Germany

   Email: arnold.nipper@de-cix.net

   Thomas King
   DE-CIX Management GmbH
   Lichtstrasse 43i
   Cologne  50825
   Germany

   Email: thomas.king@de-cix.net