Network Working Group                                    Pierre Francois
Internet-Draft                                    Individual Contributor
Intended status: Informational                               B. Decraene
Expires: December 23, 2017                                        Orange
                                                              C. Pelsser
                                                   Strasbourg University
                                                                K. Patel
                                                            Arrcus, Inc.
                                                             C. Filsfils
                                                           Cisco Systems
                                                           June 21, 2017


                     Graceful BGP session shutdown
                      draft-ietf-grow-bgp-gshut-07

Abstract

   This draft describes operational procedures aimed at reducing the
   amount of traffic lost during planned maintenance of routers or
   links that involves the shutdown of BGP peering sessions.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 23, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Packet loss upon manual eBGP session shutdown
   4.  Practices to avoid packet losses
     4.1.  Improving availability of alternate paths
     4.2.  Make before break convergence: g-shut
   5.  Forwarding modes and transient forwarding loops during
       convergence
   6.  Link Up cases
     6.1.  Unreachability local to the ASBR
     6.2.  iBGP convergence
   7.  IANA assigned g-shut BGP community
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgments
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Alternative techniques with limited applicability
     A.1.  Multi Exit Discriminator tweaking
     A.2.  IGP distance Poisoning
   Authors' Addresses

1.  Introduction

   Routing changes in BGP can be caused by planned maintenance
   operations.  This document discusses operational procedures to be
   applied in order to reduce or eliminate packet loss during the
   maintenance.  These losses come from the transient lack of
   reachability during the BGP convergence following the shutdown of an
   eBGP peering session between two Autonomous System Border Routers
   (ASBRs).

   This document presents procedures for the cases where the forwarding
   plane is impacted by the maintenance, and hence where Graceful
   Restart does not apply.

   The procedures described in this document can be applied to reduce or
   avoid packet loss for outbound and inbound traffic flows initially
   forwarded along the peering link to be shut down.  These procedures
   trigger, in both involved ASes, rerouting to the alternate path,
   while allowing routers to keep using old paths until alternate ones
   are learned and installed in the RIB and in the FIB.  This ensures
   that routers always have a valid route available during the
   convergence process.

   The goal of the document is to meet the requirements described in
   [RFC6198] as closely as possible, without changing the BGP protocol.

   Still, it explains why reserving a community value for the purpose of
   BGP session graceful shutdown would reduce the management overhead
   associated with the solution.  It would also allow vendors to provide
   an automatic graceful shutdown mechanism that does not require any
   router reconfiguration at maintenance time.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Terminology

   g-shut initiator: a router on which the session shutdown is performed
   for the maintenance.

   g-shut neighbor: a router that peers with the g-shut initiator via
   (one of) the session(s) to be shut down.

   Initiator AS: the Autonomous System of the g-shut initiator.

   Neighbor AS: the Autonomous System of the g-shut neighbor.

   Loss of Connectivity (LoC): the state when a router has no path
   towards an affected prefix.

3.  Packet loss upon manual eBGP session shutdown

   Packets can be lost during a manual shutdown of an eBGP session for
   two reasons.

   First, routers involved in the convergence process can transiently
   lack paths towards an affected prefix, and drop traffic destined to
   this prefix.  This is because alternate paths can be hidden by nodes
   of an AS.  This happens when the paths are not selected as best by
   the ASBRs that receive them on an eBGP session, or by Route
   Reflectors that do not propagate them further in the iBGP topology
   because they do not select them as best.

   Second, within the AS, the FIBs of routers can be transiently
   inconsistent during the BGP convergence, and packets towards affected
   prefixes can loop and be dropped.
   Note that these loops only happen when ASBR-to-ASBR encapsulation is
   not used within the AS.

   This document only addresses the first reason.

4.  Practices to avoid packet losses

   This section describes means for an ISP to reduce the transient loss
   of packets upon a manual shutdown of a BGP session.

4.1.  Improving availability of alternate paths

   All solutions that increase the availability of alternate BGP paths
   at routers performing packet lookups in BGP tables, such as
   [I-D.ietf-idr-best-external] and [RFC7911], help reduce the LoC
   associated with the manual shutdown of eBGP sessions.

   A simplified g-shut procedure is possible with any such solution that
   increases path diversity to the point where, at every step of the
   convergence process following the eBGP session shutdown, no BGP
   router receives a message withdrawing the only path it currently
   knows for a given NLRI.

   Note that the LoC for the inbound traffic of the maintained router,
   induced by a lack of alternate path propagation within the iBGP
   topology of a neighboring AS, is not under the control of the
   operator performing the maintenance.  The part of the procedure aimed
   at avoiding LoC for incoming paths can thus be applied even if no LoC
   is expected for the outgoing paths.

4.2.  Make before break convergence: g-shut

   This section describes configurations and actions to be performed for
   the graceful shutdown of eBGP sessions, iBGP sessions or a whole BGP
   speaker.

   The goal of this procedure is to keep, in both ASes, the paths being
   shut down visible, but with a lower LOCAL_PREF value, while alternate
   paths spread through the iBGP topology.  Instead of withdrawing the
   path, routers of an AS will keep on using it until they become aware
   of alternate paths.

4.2.1.  eBGP g-shut

   This section describes configurations and actions to be performed for
   the graceful shutdown of eBGP peering links.
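
   The make-before-break goal stated in Section 4.2 can be illustrated
   with a minimal Python sketch.  It is not part of the procedure: the
   Path type, the ASBR labels and the LOCAL_PREF values (0 and 100) are
   assumptions chosen for illustration, and best_path() models only the
   first two steps of the BGP decision process (highest LOCAL_PREF,
   then shortest AS path).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str      # hypothetical label for the egress ASBR
    local_pref: int    # LOCAL_PREF attribute
    as_path_len: int   # AS path length


def best_path(paths):
    """Simplified decision process: highest LOCAL_PREF, then shortest AS path."""
    if not paths:
        return None    # Loss of Connectivity (LoC): no path is known
    return max(paths, key=lambda p: (p.local_pref, -p.as_path_len))


# Abrupt shutdown: the only known path is withdrawn before an alternate
# path has been learned, so the router transiently has no route.
assert best_path([]) is None

# g-shut: the old path stays visible with a low LOCAL_PREF (assumed 0),
# so the router keeps forwarding along it.
old = Path("ASBR1", local_pref=0, as_path_len=2)
assert best_path([old]) == old

# Once an alternate path arrives (assumed LOCAL_PREF 100), it wins even
# with a longer AS path, and the session can then be shut down.
alt = Path("ASBR2", local_pref=100, as_path_len=3)
assert best_path([old, alt]) == alt
```

   In this sketch the router always has a usable path: the old path is
   only replaced, never removed, before the alternate one is known.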

4.2.1.1.  Pre-configuration

   On each ASBR supporting the g-shut procedure, an outbound BGP route
   policy is applied on all iBGP sessions of the ASBR, that:

   o  matches the g-shut community;

   o  sets the LOCAL_PREF attribute of the paths tagged with the g-shut
      community to a low value;

   o  removes the g-shut community from the paths;

   o  optionally, adds an AS specific g-shut community on these paths to
      indicate that these are to be withdrawn soon.  If some ingress
      ASBRs reset the LOCAL_PREF attribute, this AS specific g-shut
      community will be used to override other LOCAL_PREF preference
      changes.

   Note that in the case where an AS is aggregating multiple routes
   under a covering prefix, it is recommended to filter out the g-shut
   community from the resulting aggregate BGP route.  Otherwise, setting
   the g-shut community on one of the aggregated routes would cause the
   entire aggregate to inherit the community and undergo the g-shut
   behavior.

4.2.1.2.  Operations at maintenance time

   At maintenance time, the following is required on the g-shut
   initiator:

   o  apply an outbound BGP route policy on the maintained eBGP session
      to tag the paths propagated over the session with the g-shut
      community.  This will trigger the BGP implementation to
      re-advertise all active routes previously advertised, and tag them
      with the g-shut community.

   o  apply an inbound BGP route policy on the maintained eBGP session
      to tag the paths received over the session with the g-shut
      community.

   o  wait for convergence to happen.

   o  perform a BGP session shutdown.

4.2.1.3.  BGP implementation support for g-shut

   A BGP router implementation MAY provide features aimed at automating
   the application of the graceful shutdown procedures described above.
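
   As an illustration of the policies described in Sections 4.2.1.1 and
   4.2.1.2, the following Python sketch models a route as a dictionary
   and applies the tagging and iBGP export steps.  It is a sketch under
   stated assumptions, not router configuration: the route
   representation, the function names, the LOCAL_PREF value 0 and the
   AS specific community 64500:1 are hypothetical.  Only the community
   value 0xFFFF0000 comes from this document (Section 7).

```python
GSHUT = 0xFFFF0000       # g-shut community value given in Section 7
LOW_LOCAL_PREF = 0       # assumption: the text only says "a low value"


def tag_gshut(route):
    """Maintenance-time policy on the eBGP session: add the g-shut community."""
    return {**route, "communities": route["communities"] | {GSHUT}}


def ibgp_export(route, as_specific=None):
    """Pre-configured outbound iBGP policy of an ASBR (Section 4.2.1.1)."""
    if GSHUT not in route["communities"]:
        return route                                  # no match: unchanged
    communities = route["communities"] - {GSHUT}      # remove g-shut
    if as_specific is not None:
        communities = communities | {as_specific}     # optional AS marker
    return {**route, "local_pref": LOW_LOCAL_PREF, "communities": communities}


# Hypothetical path received over the eBGP session under maintenance.
route = {"prefix": "192.0.2.0/24", "local_pref": 100,
         "communities": frozenset()}

tagged = tag_gshut(route)                 # inbound tagging on the initiator
exported = ibgp_export(tagged)            # what iBGP peers receive
marked = ibgp_export(tagged, as_specific=(64500 << 16) | 1)
```

   Here "exported" carries the low LOCAL_PREF without the g-shut
   community, while an untagged route passes through ibgp_export()
   unchanged.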

   Upon a session shutdown specified as graceful by the operator, a BGP
   implementation supporting a g-shut feature SHOULD:

   1.  On the eBGP side, update all the paths propagated over the
       corresponding eBGP session, tagging them with the g-shut
       community.  Any subsequent update sent to the session being
       gracefully shut down is also tagged with the g-shut community.

   2.  On the iBGP side, lower the LOCAL_PREF value of the paths
       received over the eBGP session being shut down, upon their
       propagation over iBGP sessions.  Optionally, also tag these paths
       with an AS specific g-shut community.

   3.  Optionally shut down the session after a configured time.

   4.  Prevent the g-shut community from being inherited by a path that
       would aggregate some paths tagged with the g-shut community.
       This behavior prevents the g-shut procedure from being applied to
       the aggregate upon the graceful shutdown of one of its covered
       prefixes.

   A BGP implementation supporting a g-shut feature SHOULD also
   automatically install the BGP policies that are supposed to be
   configured, as described in Section 4.2.1.1, for sessions over which
   g-shut is to be supported.

4.2.2.  iBGP g-shut

   For the shutdown of an iBGP session, provided the iBGP topology is
   viable after the maintenance of the session, i.e., if all BGP
   speakers of the AS have an iBGP signaling path for all prefixes
   advertised on this g-shut iBGP session, then the shutdown of an iBGP
   session does not lead to transient unreachability.  As a consequence,
   no specific g-shut action is required.

4.2.3.  Router g-shut

   In the case of a shutdown of a router, a reconfiguration of the
   outbound BGP route policies of the g-shut initiator SHOULD be
   performed to set a low LOCAL_PREF value for the paths originated by
   the g-shut initiator (e.g., BGP aggregates redistributed from other
   protocols, including static routes).

   This behavior is equivalent to the recommended behavior for paths
   "redistributed" from eBGP sessions to iBGP sessions in the case of
   the shutdown of an ASBR.

5.  Forwarding modes and transient forwarding loops during convergence

   Neither the g-shut procedure nor the solutions improving the
   availability of alternate paths change the fact that BGP convergence
   and the subsequent FIB updates are run independently on each router
   of the ASes.  If the AS applying the solution does not rely on
   encapsulation to forward packets from the Ingress Border Router to
   the Egress Border Router, then transient forwarding loops and
   consequent packet losses can occur during the convergence process.
   If zero LoC is required, encapsulation is required between ASBRs of
   the AS.

6.  Link Up cases

   We identify two potential causes for transient packet losses upon an
   eBGP link up event.  The first one is local to the g-no-shut
   initiator; the second one is due to the BGP convergence following the
   injection of new best paths within the iBGP topology.

6.1.  Unreachability local to the ASBR

   An ASBR that selects as best a path received over a newly brought up
   eBGP session may transiently drop traffic.  This can typically happen
   when the nexthop attribute differs from the IP address of the eBGP
   peer, and the receiving ASBR has not yet resolved the MAC address
   associated with the IP address of that "third party" nexthop.

   A BGP speaker implementation could avoid such losses by ensuring that
   "third party" nexthops are resolved before installing paths using
   these in the RIB.

   If the link up event corresponds to an eBGP session that is being
   manually brought up, over an already up multi-access link, then the
   operator can ping third party nexthops that are expected to be used
   before actually bringing the session up, or send a ping to the
   directed broadcast address of the link's subnet.
   By proceeding like this, the MAC addresses associated with these
   third party nexthops will be resolved by the g-no-shut initiator.

6.2.  iBGP convergence

   Corner cases leading to LoC can occur during an eBGP link up event.

   A typical example for such transient unreachability for a given
   prefix is the following:

   Consider three route reflectors, RR1, RR2 and RR3, with a full mesh
   of iBGP sessions between them.

   1.  RR1 initially advertises the current best path to the members of
       its iBGP RR full-mesh.  RR2 knows only that path toward the
       prefix.

   2.  RR3 receives a new best path originated by the "g-no-shut"
       initiator, being one of its RR clients.  RR3 selects it as best,
       and propagates an UPDATE within its RR full-mesh, i.e., to RR1
       and RR2.

   3.  RR1 receives that path, reruns its decision process, and picks
       this new path as best.  As a result, RR1 withdraws its previously
       announced best-path on the iBGP sessions of its RR full-mesh.

   4.  If, for any reason, RR2 processes the withdraw generated in
       step 3 before processing the update generated in step 2, RR2
       transiently suffers from unreachability for the affected prefix,
       since the withdrawn path was the only one it knew.

   The use of [I-D.ietf-idr-best-external] among the RRs of the iBGP
   full-mesh can solve these corner cases by ensuring that, within an
   AS, the advertisement of a new route is not translated into the
   withdraw of a former route.

   Indeed, "best-external" ensures that an ASBR does not withdraw a
   previously advertised (eBGP) path when it receives an additional,
   preferred path over an iBGP session.  Also, "best-intra-cluster"
   ensures that an RR does not withdraw a previously advertised (iBGP)
   path to its non clients (e.g., other RRs in a mesh of RRs) when it
   receives a new, preferred path over an iBGP session.

7.  IANA assigned g-shut BGP community

   Applying the g-shut procedure is rendered much easier with the use of
   a single g-shut BGP community value [RFC1997] which could be used on
   all eBGP sessions, for both inbound and outbound signaling.  The
   community value 0xFFFF0000 has been assigned by IANA for this
   purpose.

8.  IANA Considerations

   This document has no actions for IANA.

9.  Security Considerations

   By providing the g-shut service to a neighboring AS, an ISP provides
   means to this neighbor and possibly its downstream ASes to lower the
   LOCAL_PREF value assigned to the paths received from this neighbor.

   The neighbor could abuse the technique and do inbound traffic
   engineering by declaring some prefixes as undergoing a maintenance so
   as to switch traffic to another peering link.

   If this behavior is not tolerated by the ISP, it SHOULD monitor the
   use of the g-shut community by this neighbor.

10.  Acknowledgments

   The authors wish to thank Olivier Bonaventure, Pradosh Mohapatra and
   Job Snijders for their useful comments on this work.

11.  References

11.1.  Normative References

   [RFC1997]  Chandra, R., Traina, P., and T. Li, "BGP Communities
              Attribute", RFC 1997, DOI 10.17487/RFC1997, August 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC6198]  Decraene, B., Francois, P., Pelsser, C., Ahmad, Z.,
              Elizondo Armengol, A., and T. Takeda, "Requirements for
              the Graceful Shutdown of BGP Sessions", RFC 6198,
              DOI 10.17487/RFC6198, April 2011.

11.2.  Informative References

   [I-D.ietf-idr-best-external]
              Marques, P., Fernando, R., Chen, E., Mohapatra, P., and H.
              Gredler, "Advertisement of the best external route in
              BGP", draft-ietf-idr-best-external-05 (work in progress),
              January 2012.

   [RFC7911]  Walton, D., Retana, A., Chen, E., and J. Scudder,
              "Advertisement of Multiple Paths in BGP", RFC 7911,
              DOI 10.17487/RFC7911, July 2016.

Appendix A.  Alternative techniques with limited applicability

   A few alternative techniques have been considered to provide g-shut
   capabilities but have been rejected due to their limited
   applicability.  This section describes them for possible reference.

A.1.  Multi Exit Discriminator tweaking

   The MED attribute of the paths to be avoided can be increased so as
   to force the routers in the neighboring AS to select other paths.

   The solution only works if the alternate paths are as good as the
   initial ones with respect to the Local-Pref value and the AS Path
   Length value.  In the other cases, increasing the MED value will not
   have an impact on the decision process of the routers in the
   neighboring AS.

A.2.  IGP distance Poisoning

   The distance to the BGP nexthop corresponding to the maintained
   session can be increased in the IGP so that the old paths will be
   less preferred during the application of the IGP distance tie-break
   rule.  However, this solution only works for the paths whose
   alternates are as good as the old paths with respect to their
   Local-Pref value, their AS Path length, and their MED value.

   Also, this poisoning cannot be applied when nexthop self is used, as
   there is no nexthop specific to the maintained session to poison in
   the IGP.

Authors' Addresses

   Pierre Francois
   Individual Contributor

   Email: pfrpfr@gmail.com


   Bruno Decraene
   Orange

   Email: bruno.decraene@orange.com


   Cristel Pelsser
   Strasbourg University

   Email: pelsser@unistra.fr


   Keyur Patel
   Arrcus, Inc.

   Email: keyur@arrcus.com


   Clarence Filsfils
   Cisco Systems

   Email: cfilsfil@cisco.com