Network Working Group                                            R. Bush
Internet-Draft                                 Internet Initiative Japan
Intended status: Standards Track                                 J. Haas
Expires: January 7, 2016                                      J. Scudder
                                                  Juniper Networks, Inc.
                                                               A. Nipper
                                                            T. King, Ed.
                                                  DE-CIX Management GmbH
                                                            July 6, 2015


        Making Route Servers Aware of Data Link Failures at IXPs
                        draft-ietf-idr-rs-bfd-01

Abstract

   When route servers are used, the data plane is not congruent with the
   control plane.  Therefore, the peers on the Internet exchange can
   lose data connectivity without the control plane being aware of it,
   and packets are dropped on the floor.  This document proposes the use
   of BFD between the two peering routers to detect a data plane
   failure, and then uses BGP Link-State to signal the state of the
   data link to the route server(s).

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to
   be interpreted as described in [RFC2119] only when they appear in all
   upper case.  They may also appear in lower or mixed case as English
   words, without normative meaning.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."
   This Internet-Draft will expire on January 7, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Operation
     2.1.  Mutual Discovery of Route Server Client Routers
     2.2.  Tracking Connectivity
   3.  Advertising Client Router Connectivity to the Route Server
   4.  Modelling the IXP Network using BGP Link-State
   5.  Utilizing Next Hop Unreachability Information at Client Routers
   6.  Recommendations for Using BFD
   7.  Bootstrapping
   8.  Capability Detection
   9.  Other Considerations
   10. Acknowledgments
   11. Normative References
   Authors' Addresses

1.  Introduction

   In configurations (typically Internet Exchange Points (IXPs)) where
   EBGP routing information is exchanged between client routers through
   the agency of a route server [I-D.ietf-idr-ix-bgp-route-server], but
   traffic is exchanged directly, operational issues can arise when
   partial data plane connectivity exists among the route server client
   routers.  Because the data plane is not congruent with the control
   plane, the client routers on the IXP can lose data connectivity
   without the control plane (the route server) being aware of it, and
   packets are dropped on the floor.

   To remedy this, two basic problems need to be solved:

   1.  Client routers must have a means of verifying connectivity
       amongst themselves, and

   2.  Client routers must have a means of communicating the knowledge
       so gained back to the route server.

   The first can be solved by applying Bidirectional Forwarding
   Detection [RFC5880].  The second can be solved by using BGP
   Link-State [I-D.ietf-idr-ls-distribution].  One of the key value
   propositions offered by a route server is that client routers need
   not be configured to peer with each other.  This leads to a
   subsidiary problem that must also be solved:

   3.  Client routers must have a means (other than configuration) to
       know of one another's existence.

   This can also be solved by an application of BGP Link-State.

   Throughout this document, we generally assume that the route server
   being discussed is able to represent different RIBs towards different
   clients, as discussed in Section 2.3.2.1 of
   [I-D.ietf-idr-ix-bgp-route-server].  If this is not the case, these
   procedures (other than the use of BFD to track next hop reachability)
   have limited value.

2.  Operation

   Below, we detail procedures in which a route server tells its client
   routers about other client routers (by sending them their next hops
   using BGP Link-State), each client router verifies connectivity to
   those other client routers (using BFD) and communicates its findings
   back to the route server (again using BGP Link-State).  The route
   server uses the received BGP Link-State routes as input to the route
   selection process it performs on behalf of the client.

2.1.  Mutual Discovery of Route Server Client Routers

   Strictly speaking, what is needed is not for a route server client
   router to know of other (control plane) client routers, but rather to
   know (so that it can validate) all the next hops the route server
   might choose to send the client router, i.e., to know of potential
   forwarding plane relationships.

   In effect, this requirement amounts to knowing the BGP next hops the
   route server is aware of for the particular per-client Loc-RIB (see
   Section 2.3.2.1 of [I-D.ietf-idr-ix-bgp-route-server]).  We introduce
   a new table for each client to store known next hops, their
   compatibility with this proposed solution, and their learned
   reachability.  We call these tables per-client Next Hop Information
   Bases (NHIBs).  BGP Link-State is used to transfer the NHIBs from the
   route server to the route server clients.

   At the route server, the NHIB for each client is populated with the
   next hops from that client's Loc-RIB.  If the BGP capabilities
   learned during BGP session setup identify a next hop as compatible
   with this proposal, this is reflected in the NHIB.  Initially, the
   route server assumes that the client router is able to reach each of
   its next hops, and this is recorded in the NHIB.

   If a next hop is added to the NHIB for a particular client, a route
   SHOULD be added to the route server's Adj-NHIB-Out.
   This route uses the BGP Link-State SAFI and models the next hop as a
   node (see Section 3.2.1 of [I-D.ietf-idr-ls-distribution]) and the
   connectivity between the route server and the next hop as a link
   (see Section 3.2.2 of [I-D.ietf-idr-ls-distribution]).  If a next hop
   is removed from an NHIB, the corresponding route in the Adj-NHIB-Out
   SHOULD be removed.

   A route server client SHOULD use BFD [RFC5880] (or other means beyond
   the scope of this document) to track forwarding plane connectivity to
   each next hop described in the received BGP Link-State information.

2.2.  Tracking Connectivity

   For each next hop in the NHIB received from the route server (called
   the Adj-NHIB-In), the client router SHOULD use some means to confirm
   that data plane connectivity to that next hop actually exists.

   The client router maintains its own NHIB in order to keep track of
   its (potential) next hops, their capabilities as learned from the
   route server, and their reachability.  The NHIB is updated according
   to the Adj-NHIB-In and the client router's own tests of connectivity
   to its next hops.

   For each next hop in the Adj-NHIB-In received from the route server,
   the client router SHOULD evaluate the next hop's compatibility with
   this proposal.  If the next hop supports this mechanism, the client
   router SHOULD set up a BFD session to it, if one is not already
   available, and track the reachability of this next hop.

   For each next hop in the Adj-NHIB-In, a corresponding route
   containing a BGP Link-State node NLRI SHOULD be placed in the client
   router's own Adj-NHIB-Out to be advertised to the route server.  If
   the next hop is not compatible with this proposal, a route containing
   a BGP Link-State link NLRI SHOULD also be placed in the client
   router's own Adj-NHIB-Out.
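
   As an illustration only, the client-side rules in this section (which
   routes a client router places in its own Adj-NHIB-Out for each next
   hop, including the handling of compatible and failed next hops
   specified below) can be sketched in Python as follows.  This is a
   minimal sketch under stated assumptions, not a reference
   implementation: routes are reduced to tuples rather than encoded BGP
   Link-State NLRIs, the names ClientNhib, NhibEntry, VerifyState and
   diff are hypothetical, moving an entry to IN_PROGRESS merely stands
   in for initiating a real BFD session, and the periodic retesting of
   failed next hops is not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum


class VerifyState(Enum):
    """Hypothetical per-next-hop verification status (not RFC 5880 states)."""
    NOT_STARTED = 0   # no BFD session initiated yet
    IN_PROGRESS = 1   # BFD session initiated, connectivity being verified
    VERIFIED = 2      # connectivity verified
    FAILED = 3        # connectivity check failed


@dataclass
class NhibEntry:
    """One next hop in a client router's NHIB (hypothetical field names)."""
    next_hop: str
    compatible: bool  # learned from the route server (node Sub-TLV 516)
    state: VerifyState = VerifyState.NOT_STARTED


@dataclass
class ClientNhib:
    """Hypothetical client-side NHIB; routes are simplified to tuples."""
    router_id: str    # this client router's BGP Identifier
    entries: dict = field(default_factory=dict)

    def update_from_adj_nhib_in(self, adj_nhib_in: dict) -> None:
        """adj_nhib_in maps each next hop to its compatibility flag."""
        # Forget next hops that the route server no longer announces.
        for nh in [n for n in self.entries if n not in adj_nhib_in]:
            del self.entries[nh]
        for nh, compatible in adj_nhib_in.items():
            entry = self.entries.setdefault(nh, NhibEntry(nh, compatible))
            entry.compatible = compatible
            # Placeholder for initiating a BFD session when none exists yet.
            if compatible and entry.state is VerifyState.NOT_STARTED:
                entry.state = VerifyState.IN_PROGRESS

    def adj_nhib_out(self) -> set:
        """Compute the routes this client advertises to the route server."""
        routes = set()
        for nh, entry in self.entries.items():
            routes.add(("node", nh))  # one node NLRI per next hop
            # Incompatible next hops cannot be tested, so their link is
            # advertised; compatible ones only while being or once verified.
            if not entry.compatible or entry.state in (
                    VerifyState.IN_PROGRESS, VerifyState.VERIFIED):
                routes.add(("link", self.router_id, nh))
        return routes


def diff(old: set, new: set) -> tuple:
    """Return (announcements, withdrawals) between two Adj-NHIB-Out snapshots."""
    return new - old, old - new
```

   Comparing two snapshots with diff() yields the announcements and
   withdrawals that would be sent to the route server: when the check
   for a compatible next hop fails, only its link route is withdrawn,
   while its node route remains.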

   In each link NLRI placed in the client router's Adj-NHIB-Out, the
   local node is set to the client router and the remote node is set to
   the particular next hop.  For any next hop that is compatible with
   this proposal and for which connectivity is either being verified (in
   other words, a BFD session has been initiated) or has already been
   verified, a route containing a BGP Link-State link NLRI as described
   above SHOULD be placed in the client router's own Adj-NHIB-Out.  For
   any next hop for which the connectivity check has failed, a route
   SHOULD be placed in the client router's own Adj-NHIB-Out that
   withdraws the previously advertised link from the route server.
   (This may also be done as a result of policy, even if connectivity
   exists.)

   If the connectivity test from one client router to another client
   router has failed, the client router that detected this failure
   should continue to test connectivity on a regular basis (e.g., every
   5 minutes) for a configurable amount of time (preferably 24 hours).
   If connectivity cannot be restored during this time, no further
   testing is performed until this is manually changed or the client
   router is rebooted.

3.  Advertising Client Router Connectivity to the Route Server

   As discussed above, a client router advertises its Adj-NHIB-Out to
   the route server.  The route server SHOULD update the reachability
   information of next hops in the client's NHIB accordingly.
   Furthermore, the route server SHOULD use reachability information
   from the NHIB as input to its own decision process when computing the
   Adj-RIB-Out for this peer.  This peer-dependent Adj-RIB-Out is then
   advertised to this peer.  In particular, the route server MUST
   exclude any routes whose next hops the client has declared to be
   unreachable.

4.  Modelling the IXP Network using BGP Link-State

   This section describes how BGP Link-State is used to a) transfer the
   per-client NHIB from the route server to the route server clients,
   and b) transfer the reachability information about next hops from
   the route server clients to the route server.

   Each route server client and the route server are modeled as nodes
   (see Section 3.2.1 of [I-D.ietf-idr-ls-distribution]).  The BGP
   Identifier (see Section 1.1 of [RFC4271]) is used as the node ID.

   BGP Link-State defines a link as a so-called half-link, i.e., a
   unidirectional link (see Section 3.2.2 of
   [I-D.ietf-idr-ls-distribution]).  Covering the bidirectional
   connectivity between two nodes therefore requires two link
   definitions.  The connectivity between two route server clients is
   modeled using such links.

   For both nodes and links, the Protocol-ID is set to 5 to reflect the
   virtual modeling.  The instance identifier for nodes and links is set
   to 0, as the default Layer 3 routing topology is used.

   The link descriptor TLV code points 259-262 are applied depending on
   the IP protocol version used (259 and 260 for IPv4 interface and
   neighbor addresses, 261 and 262 for IPv6 interface and neighbor
   addresses).  Prefix descriptors are not applied.

   A way is needed to model whether or not a client router is compatible
   with the mechanisms described in this document.  For this, a new node
   descriptor Sub-TLV (see Section 3.2.1.4 of
   [I-D.ietf-idr-ls-distribution]) is introduced.

   +--------------------+-------------------------------+--------+
   | Sub-TLV Code Point | Description                   | Length |
   +--------------------+-------------------------------+--------+
   | 516                | Compatible with this document | 1      |
   +--------------------+-------------------------------+--------+

                    Table 1: Node Descriptor Sub-TLV

   The value of this Sub-TLV is set to 0 if a client router does not
   support the mechanisms described in this document (or if the support
   is administratively disabled).  Otherwise, the value is set to 1.

5.  Utilizing Next Hop Unreachability Information at Client Routers

   A client router detecting an unreachable next hop signals this
   information to the route server as described above.  It also treats
   the routes that use this next hop as unresolvable as per Section
   9.1.2.1 of [RFC4271] and proceeds with route selection as normal.

   Changes in next hop reachability learned via these mechanisms should
   be handled with some care in order to avoid unnecessary route
   flapping.  Similar mechanisms exist in IGP implementations and should
   be applied to this scenario.

6.  Recommendations for Using BFD

   The RECOMMENDED way for a client router to confirm that data plane
   connectivity to its next hops is available is to use BFD in
   asynchronous mode.  Echo mode MAY be used if both client routers
   running a BFD session support it.  The use of authentication in BFD
   is OPTIONAL, as there is a certain level of trust between the
   operators of the client routers at a particular IXP.  If trust cannot
   be assumed, it is recommended to use pair-wise keys (how this can be
   achieved is outside the scope of this document).  The TTL/Hop Limit
   values described in Section 5 of [RFC5881] MUST be obeyed in order to
   protect BFD sessions against packets originating outside the IXP.

   From an administrative point of view, the functionality described in
   this document and BFD are interdependent.  To streamline the
   behaviour of different implementations, the following is
   RECOMMENDED:

   o  If BFD is administratively shut down by the administrator of a
      client router, then the functionality described in this document
      MUST also be administratively shut down.
   o  If the administrator enables the functionality described in this
      document on a client router, then BFD MUST be automatically
      enabled.
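
   The TTL/Hop Limit rule referenced above and the administrative
   coupling between BFD and this mechanism can be sketched as follows.
   This is a hypothetical Python illustration, not an implementation:
   packet parsing and the BFD state machine are omitted, and the names
   accept_bfd_control_packet, ClientAdminState and the
   discard_if_authenticated knob are assumptions.  The TTL logic follows
   Section 5 of [RFC5881], which requires discarding packets whose TTL
   or Hop Limit is not 255 when authentication is not in use, and only
   permits (MAY) discarding them when authentication is in use.

```python
from dataclasses import dataclass

SINGLE_HOP_TTL = 255  # TTL/Hop Limit required by RFC 5881, Section 5


def accept_bfd_control_packet(received_ttl: int, authenticated: bool,
                              discard_if_authenticated: bool = True) -> bool:
    """Return True if a received BFD Control packet passes the TTL check.

    Without BFD authentication, packets whose TTL/Hop Limit is not 255
    MUST be discarded. With authentication, discarding them is only a MAY;
    the hypothetical discard_if_authenticated knob selects that behaviour.
    """
    if received_ttl == SINGLE_HOP_TTL:
        return True
    return authenticated and not discard_if_authenticated


@dataclass
class ClientAdminState:
    """Hypothetical administrative state of a client router."""
    bfd_enabled: bool = False
    rs_bfd_enabled: bool = False  # the mechanism described in this document

    def enable_rs_bfd(self) -> None:
        # Enabling this mechanism automatically enables BFD.
        self.rs_bfd_enabled = True
        self.bfd_enabled = True

    def shut_down_bfd(self) -> None:
        # Administratively shutting down BFD also shuts down this mechanism.
        self.bfd_enabled = False
        self.rs_bfd_enabled = False
```

   Keeping both transitions in one place preserves the invariant that
   the mechanism is never enabled while BFD is shut down.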

   The following values of the BFD configuration of client routers (see
   Section 6.8.1 of [RFC5880]) are RECOMMENDED in order to allow fast
   detection of lost data plane connectivity:

   o  DesiredMinTxInterval: 1,000,000 (microseconds)
   o  RequiredMinRxInterval: 1,000,000 (microseconds)
   o  DetectMult: 3

   With these values on both ends of a session, a loss of data plane
   connectivity is detected after about 3 seconds (DetectMult times the
   negotiated 1-second interval).

   The configuration values above are a trade-off between fast detection
   of lost data plane connectivity and the load client routers must
   handle to maintain the BFD sessions.  Selecting smaller
   DesiredMinTxInterval and RequiredMinRxInterval values generates many
   more BFD packets, especially at larger IXPs with many hundreds of
   client routers.

   The configuration values above are also selected in order to tolerate
   brief interruptions of the data plane.  Otherwise, if a BFD session
   detected a brief data plane interruption to a particular client
   router, the detecting client router would signal the route server to
   remove the routes via that client router and, shortly afterwards, to
   add them again.  This is disruptive and computationally expensive for
   the route server.

   The choice of configuration values is also influenced by how quickly
   BGP advertises updates in reaction to BFD events.  If the values are
   selected so that BFD detects data plane interruptions much faster
   than BGP can advertise them, BFD may detect data plane connectivity
   flaps that the route server is never informed about, because BGP
   cannot transport this information fast enough.

   As discussed, finding good configuration values is hard, so a client
   router administrator MAY select values better suited to the specific
   needs of the particular deployment.

7.  Bootstrapping

   When the route server starts, it does not know anything about the
   connectivity states between client routers.
   Therefore, the route server optimistically assumes that all client
   routers are able to reach each other unless told otherwise.

8.  Capability Detection

   In order for two BGP speakers to follow the mechanism defined in this
   document, they MUST use BGP Capabilities Advertisement [RFC5492].
   This is done as specified in [RFC4760], by using capability code 1
   (multiprotocol BGP), with an AFI XXX and SAFI XXX.

9.  Other Considerations

   For purposes of routing stability, implementations may wish to apply
   hysteresis ("holddown") to next hops that have transitioned from
   reachable to unreachable and back.

10.  Acknowledgments

   The authors would like to thank the authors of
   [I-D.ietf-idr-bgp-nh-cost] for their work, as it was a basis for this
   proposal.

11.  Normative References

   [I-D.ietf-idr-bgp-nh-cost]
              Varlashkin, I., Raszuk, R., Patel, K., Bhardwaj, M., and
              S. Bayraktar, "Carrying next-hop cost information in BGP",
              draft-ietf-idr-bgp-nh-cost-02 (work in progress), May
              2015.

   [I-D.ietf-idr-ix-bgp-route-server]
              Jasinska, E., Hilliard, N., Raszuk, R., and N. Bakker,
              "Internet Exchange BGP Route Server",
              draft-ietf-idr-ix-bgp-route-server-07 (work in progress),
              June 2015.

   [I-D.ietf-idr-ls-distribution]
              Gredler, H., Medved, J., Previdi, S., Farrel, A., and S.
              Ray, "North-Bound Distribution of Link-State and TE
              Information using BGP", draft-ietf-idr-ls-distribution-11
              (work in progress), June 2015.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 4760, January
              2007.

   [RFC5492]  Scudder, J. and R. Chandra, "Capabilities Advertisement
              with BGP-4", RFC 5492, February 2009.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, June 2010.

   [RFC5881]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD) for IPv4 and IPv6 (Single Hop)", RFC 5881, June
              2010.

Authors' Addresses

   Randy Bush
   Internet Initiative Japan
   5147 Crystal Springs
   Bainbridge Island, Washington  98110
   US

   Email: randy@psg.com


   Jeffrey Haas
   Juniper Networks, Inc.
   1194 N. Mathilda Ave.
   Sunnyvale, CA  94089
   US

   Email: jhaas@juniper.net


   John G. Scudder
   Juniper Networks, Inc.
   1194 N. Mathilda Ave.
   Sunnyvale, CA  94089
   US

   Email: jgs@juniper.net


   Arnold Nipper
   DE-CIX Management GmbH
   Lichtstrasse 43i
   Cologne  50825
   Germany

   Email: arnold.nipper@de-cix.net


   Thomas King (editor)
   DE-CIX Management GmbH
   Lichtstrasse 43i
   Cologne  50825
   Germany

   Email: thomas.king@de-cix.net