GROW Working Group B.
Decraene
Internet-Draft                                            France Telecom
Intended status: Informational                               P. Francois
                                                                     UCL
                                                              C. Pelsser
                                                                     IIJ
                                                                Z. Ahmad
                                                Orange Business Services
                                                 A. J. Elizondo Armengol
                                                          Telefonica I+D
                                                               T. Takeda
                                                                     NTT
                                                        January 28, 2011

         Requirements for the graceful shutdown of BGP sessions
        draft-ietf-grow-bgp-graceful-shutdown-requirements-07.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79. This document may contain material
   from IETF Documents or IETF Contributions published or made publicly
   available before November 10, 2008. The person(s) controlling the
   copyright in some of this material may not have granted the IETF
   Trust the right to allow modifications of such material outside the
   IETF Standards Process. Without obtaining an adequate license from
   the person(s) controlling the copyright in such materials, this
   document may not be modified outside the IETF Standards Process, and
   derivative works of it may not be created outside the IETF Standards
   Process, except to format it for publication as an RFC or to
   translate it into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 27, 2011.

Internet-Draft   Requirements for the graceful shutdown of BGP sessions

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Abstract

   The Border Gateway Protocol (BGP) is heavily used in Service Provider
   networks both for Internet and BGP/MPLS VPN services. For resiliency
   purposes, redundant routers and BGP sessions can be deployed to
   reduce the consequences of an AS Border Router or BGP session
   breakdown on customers' or peers' traffic. However, simply taking
   down or even bringing up a BGP session for maintenance purposes may
   still induce connectivity losses during the BGP convergence. This is
   no longer satisfactory for new applications (e.g. voice over IP,
   online gaming, VPN). Therefore, a solution is required for the
   graceful shutdown of a (set of) BGP session(s) in order to limit the
   amount of traffic loss during a planned shutdown. This document
   expresses requirements for such a solution.

Table of Contents

   1. Conventions used in this document
   2. Introduction
   3. Problem statement
      3.1. Example of undesirable BGP routing behavior
      3.2. Causes of packet loss
   4.
Terminology
   5. Goals and requirements
   6. Security Considerations
   7. IANA Considerations
   8. References
      8.1. Normative References
      8.2. Informative References
   9. Acknowledgments
   10. Appendix: Reference BGP Topologies
      10.1. EBGP topologies
      10.2. IBGP topologies
      10.3. Routing decisions

1. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2. Introduction

   The Border Gateway Protocol (BGP) [RFC4271] is heavily used in
   Service Provider networks both for Internet and BGP/MPLS VPN services
   [RFC4364]. For resiliency purposes, redundant routers and BGP
   sessions can be deployed to reduce the consequences of an AS Border
   Router or BGP session breakdown on customers' or peers' traffic.

   We consider the case where a Service Provider performs a
   maintenance operation and needs to shut down one or more BGP
   peering links or a whole ASBR. If an alternate path is available
   within the AS, the requirement is to avoid or reduce customer or peer
   traffic loss during the BGP convergence.
   Indeed, as an alternate path
   is available in the Autonomous System (AS), it should be made
   possible to reroute the customer or peer traffic on this backup path
   before the BGP session(s) is/are torn down, the nominal path is
   withdrawn and the forwarding is stopped.

   The requirements also cover the subsequent re-establishment of the
   BGP session as even this "UP" case can currently trigger route loss
   and thus traffic loss at some routers.

   BGP [RFC4271] and MP-BGP [RFC4760] do not currently have a mechanism
   to gracefully migrate traffic from one BGP next hop to another
   without interrupting the flow of traffic. When a BGP session is taken
   down, BGP behaves as if it were a sudden link or router failure and
   withdraws the prefixes learnt over that session, which may trigger
   traffic loss. There is no mechanism for a router to advertise to its
   BGP peers that a prefix will soon be unreachable while it is still
   reachable. When applicable, such a mechanism would reduce or prevent
   traffic loss. It would typically be applicable in case of a
   maintenance operation requiring the shutdown of a forwarding
   resource. Typical examples would be a link or line card maintenance,
   replacement or upgrade. It may also be applicable for a software
   upgrade as it may involve a firmware reset on the line cards and
   hence forwarding interruption.
   The introduction of Route Reflectors as per [RFC4456] to solve
   scalability issues bound to IBGP full-meshes has worsened the
   duration of routing convergence as some route reflectors may hide the
   backup path. Thus, depending on the RR topology, more IBGP hops may
   be involved in the IBGP convergence.

   Note that these planned maintenance operations cannot be addressed by
   Graceful Restart extensions [RFC4724] as GR only applies when the
   forwarding is preserved during the control plane restart.
   In contrast, Graceful Shutdown applies when the forwarding is
   interrupted.
   Note also that some protocols are already considering such a graceful
   shutdown procedure (e.g. GMPLS in [RFC5817]).

   A metric of success is the degree to which such a mechanism
   eliminates traffic loss during maintenance operations.

3. Problem statement

   As per [RFC4271], when one (or many) BGP session(s) are shut down, a
   BGP NOTIFICATION message is sent to the peer and the session is then
   closed. A protocol convergence is then triggered both by the local
   router and by the peer. Alternate paths to the destination are
   selected, if known. If those alternate paths are not known prior to
   the BGP session shutdown, additional BGP convergence steps are
   required in each AS to search for an alternate path.

   This behavior is not satisfactory in a maintenance situation because
   the traffic that was directed towards the removed next-hops may be
   lost until the end of the BGP convergence. As it is a planned
   operation, a make-before-break solution should be made possible.

   As maintenance operations are frequent in large networks [Reliable],
   the global availability of the network is significantly impaired by
   this BGP maintenance issue.

3.1. Example of undesirable BGP routing behavior

   To illustrate these problems, let us consider the following simple
   example where one customer router "CUST" is dual-attached to two SP
   routers "ASBR1" and "ASBR2".
   ASBR1 and ASBR2 are in the same AS and owned by the same service
   provider. Both are IBGP clients of the route reflector R1.

                      '
               AS1    '      AS2
                      '

              /-----------ASBR1---
             /                     \
            /                       \
        CUST                         R1
            \                       /
    Z/z      \                     /
              \-----------ASBR2---

                      '
               AS1    '      AS2
                      '

   Figure 1.
Dual attached customer

   Before the maintenance, packets for destination Z/z use the
   ASBR1-CUST link because R1 selects ASBR1's route based on the IGP
   cost.

   Let's assume the service provider wants to shut down the ASBR1-CUST
   link for maintenance purposes. Currently, when the shutdown is
   performed on ASBR1, the following steps are performed:
   1. ASBR1 sends a withdrawal for prefix Z/z to its route reflector R1.
   2. R1 runs its decision process, selects the route from ASBR2 and
      advertises the new path to ASBR1.
   3. ASBR1 runs its decision process and recovers the reachability of
      Z/z.

   Traffic is lost between step 1, when ASBR1 loses its route, and
   step 3, when it discovers a new path.

   Note that this is a simplified description for illustrative purposes.
   In a bigger AS, multiple steps of BGP convergence may be required to
   find and select the best alternate path (e.g. ASBR1 is chosen based
   on a higher local pref, hierarchical route reflectors are used...).
   When multiple BGP routers are involved and many prefixes are
   affected, the recovery process can take longer than application
   requirements allow.

3.2. Causes of packet loss

   The loss of packets during the maintenance has two main causes:
   - lack of an alternate path on some routers,
   - transient routing inconsistency.

   Some routers may lack an alternate path because another router is
   hiding the backup path. This router can be:
   - a route reflector only propagating its best path;
   - the backup ASBR not advertising the backup path because it prefers
     the nominal path.
   This lack of knowledge of the alternate path is the first target of
   this requirement draft.

   Transient routing inconsistencies happen during IBGP convergence
   because routers do not simultaneously update their RIBs and hence do
   not simultaneously update their FIB entries.
   This can lead to
   forwarding loops which result in both link congestion and packet
   drops. The duration of these transient micro-loops is dependent on
   the IBGP topology (e.g. number of Route Reflectors between ingress
   and egress ASBR), implementation differences among router platforms
   which result in differences in the time taken to update a specific
   prefix in the FIB, and the forwarding mode (hop by hop IP forwarding
   versus tunneling).

   Note that when an IP lookup is only performed on entry to the AS, for
   example prior to entry into a tunnel across the AS, micro-loops will
   not occur. An example of this is when BGP is being used as the
   routing protocol for MPLS VPN as defined in [RFC4364].
   Note that [RFC5715] defines a framework for loop-free convergence. It
   has been written in the context of IP Fast ReRoute for link state IGP
   [RFC5714] but some concepts are also of interest for BGP convergence.

4. Terminology

   g-shut: Graceful SHUTdown. A method for explicitly notifying the BGP
   routers that a BGP session (and hence the prefixes learnt over that
   session) is going to be disabled.

   g-noshut: Graceful NO SHUTdown. A method for explicitly notifying
   the BGP routers that a BGP session (and hence the prefixes learnt
   over that session) is going to be enabled.

   g-shut initiator: the router on which the session(s) shutdown is
   (are) performed for the maintenance.

   g-shut neighbor: a router that peers with the g-shut initiator
   via (one of) the session(s) undergoing maintenance.

   Affected prefix: a prefix initially reached via the peering
   link(s) undergoing maintenance.

   Affected router: a router reaching an affected prefix via a
   peering link undergoing maintenance.

   Initiator AS: the autonomous system of the g-shut initiator
   router.

   Neighbor AS(es): the autonomous system(s) of the g-shut neighbor
   router(s).

5. Goals and requirements

   Currently, when a BGP session of the router under maintenance is shut
   down, the router removes the routes and then triggers the BGP
   convergence on its BGP peers by withdrawing its route.
   The goal of BGP graceful shutdown of a (set of) BGP session(s) is to
   minimize traffic loss during a planned shutdown. Ideally a solution
   should reduce this traffic loss to zero.
   Another goal is to minimize and preferably to eliminate packet loss
   when the BGP session is re-established following the maintenance.

   As the event is known in advance, a make-before-break solution can be
   used in order to initiate the BGP convergence, find and install the
   alternate paths before the nominal paths are removed. As a result,
   before the nominal BGP session is shut down, all affected routers
   learn and use the alternate paths. Those alternate paths are computed
   by BGP taking into account the known status of the network, which
   includes known failures that the network is processing concurrently
   with the BGP session graceful shutdown and possibly other known
   graceful shutdowns under way. Therefore multiple BGP graceful
   shutdowns overlapping within a short timeframe are gracefully
   handled. Indeed a given graceful shutdown takes into account all
   previous ones, and previous graceful shutdowns are given some time to
   adapt to this new one. Then the nominal BGP session can be shut down.

   As a result, provided an alternate path with enough remaining
   capacity is available, the packets are rerouted before the BGP
   session termination and fewer packets (possibly none) are lost during
   the BGP convergence process since at any time, all routers have a
   valid path.
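
   The difference between an abrupt shutdown and the make-before-break
   sequence described above can be sketched with a toy model of the
   Figure 1 topology. This is only an illustration under stated
   assumptions: this document does not specify any mechanism, and
   lowering a preference value before closing the session is just one
   hypothetical way to signal the maintenance. The names Asbr, reflect,
   simulate, NOMINAL_PREF and GSHUT_PREF are invented for this sketch,
   and the BGP decision process is reduced to "highest preference, then
   own EBGP route or lowest IGP cost".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NOMINAL_PREF = 100  # assumed preference of a normal route
GSHUT_PREF = 0      # assumed preference meaning "about to be shut down"

Route = Tuple[str, int]  # (egress ASBR, preference)


@dataclass
class Asbr:
    name: str
    ebgp_pref: Optional[int]      # route learnt from CUST, None once withdrawn
    ibgp: Optional[Route] = None  # route reflected by R1, if any

    def best(self) -> Optional[Route]:
        """Selected route for Z/z, or None when the router has no path."""
        candidates: List[Route] = []
        if self.ebgp_pref is not None:
            candidates.append((self.name, self.ebgp_pref))
        if self.ibgp is not None:
            candidates.append(self.ibgp)
        # Highest preference wins; on a tie the router's own EBGP route wins.
        return max(candidates, key=lambda r: (r[1], r[0] == self.name),
                   default=None)


def reflect(asbrs: List[Asbr], igp_order: List[str]) -> None:
    """One convergence round: R1 picks a single best path and reflects it."""
    offers: List[Route] = []
    for a in asbrs:
        r = a.best()
        # An ASBR only advertises its best route if it is its own EBGP route.
        if r is not None and r[0] == a.name:
            offers.append(r)
    # R1 prefers the highest preference, then the lowest IGP cost
    # (modelled here as the position of the egress in igp_order).
    winner = max(offers, key=lambda r: (r[1], -igp_order.index(r[0])),
                 default=None)
    for a in asbrs:
        # R1 does not send a route back to the ASBR it came from.
        a.ibgp = winner if winner is not None and winner[0] != a.name else None


def simulate(graceful: bool) -> List[Optional[Route]]:
    """Return the route selected by ASBR1 after each event of the maintenance."""
    asbr1 = Asbr("ASBR1", NOMINAL_PREF)
    asbr2 = Asbr("ASBR2", NOMINAL_PREF)
    routers, igp_order = [asbr1, asbr2], ["ASBR1", "ASBR2"]
    reflect(routers, igp_order)          # steady state before maintenance
    trace = [asbr1.best()]
    if graceful:
        asbr1.ebgp_pref = GSHUT_PREF     # make: de-prefer the nominal path first
        trace.append(asbr1.best())
        reflect(routers, igp_order)      # R1 now reflects the ASBR2 path
        trace.append(asbr1.best())
    asbr1.ebgp_pref = None               # break: the session is closed
    trace.append(asbr1.best())
    reflect(routers, igp_order)
    trace.append(asbr1.best())
    return trace


if __name__ == "__main__":
    print("abrupt:  ", simulate(graceful=False))
    print("graceful:", simulate(graceful=True))
```

   In this model the abrupt case contains one observation where ASBR1
   has no route at all (the window between steps 1 and 3 of Section
   3.1), whereas in the graceful case ASBR1 always holds a valid path
   and has already moved to the ASBR2 path when the session is closed.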

   From the above goals we can derive the following requirements:

   a) A mechanism to advertise the maintenance action to all affected
   routers is REQUIRED. Such a mechanism may be either implicit or
   explicit. Note that affected routers can be located both in the local
   AS and in neighboring ASes. Note also that the maintenance action can
   either be the shutdown of a BGP session or the establishment of a BGP
   session.
   The mechanism SHOULD allow BGP routers to minimize and preferably to
   eliminate packet loss when a path is removed or advertised. In
   particular, it SHOULD be ensured that the old path is not removed
   from the routing tables of the affected routers before the new path
   is known.
   The solution mechanism MUST significantly reduce and ideally
   eliminate packet loss. A trade-off may be made between the degree of
   packet loss and the simplicity of the solution.

   b) An Internet wide convergence is OPTIONAL. However, if the
   initiator AS and the neighbor AS(es) have a backup path, they SHOULD
   be able to gracefully converge before the nominal path is shut down.

   c) The proposed solution SHOULD be applicable to any kind of BGP
   sessions (EBGP, IBGP, IBGP route reflector client, EBGP
   confederations, EBGP multi hop, MultiProtocol BGP extension...) and
   any address family. If a BGP implementation allows closing or
   enabling a sub-set of AFIs carried in a MP-BGP session, this
   mechanism MAY be applicable to this sub-set of AFIs.

   Depending on the kind of session, there may be some variations in the
   proposed solution in order to fulfill the requirements.

   The following cases should be handled in priority:
   - The shutdown of an inter-AS link and therefore the shutdown of an
     EBGP session;
   - The shutdown of an AS Border Router and therefore the shutdown of
     all its BGP sessions.

   Service Providers and platforms implementing a graceful shutdown
   solution should note that in BGP/MPLS VPN as per [RFC4364], the PE-CE
   routing can be performed by protocols other than BGP (e.g. static
   routes, RIPv2, OSPF, IS-IS). This is out of scope of this document.

   d) The proposed solution SHOULD NOT change the BGP convergence
   behavior for the ASes exterior to the maintenance process, namely
   ASes other than the initiator AS and its neighbor AS(es).

   e) An incremental deployment on a per AS or per BGP session basis
   MUST be made possible. In case of partial deployment the proposed
   solution SHOULD incrementally improve the maintenance process.
   It should be noted that in an inter domain relation, one AS may have
   more incentive to use graceful shutdown than the other. Similarly, in
   a BGP/MPLS VPN environment, it is much easier to upgrade the PE
   routers than the CE routers, mainly because there is at least an
   order of magnitude more CEs and CE locations than PEs and PE
   locations. As a consequence, when splitting the cost of the solution
   between the g-shut initiator and the g-shut neighbor, the solution
   SHOULD favor a low cost solution on the neighbor AS side in order to
   reduce the impact on the g-shut neighbor. Impact should be understood
   as a generic term which includes first hardware, then software, then
   configuration upgrade.

   f) Redistribution or advertisement of (static) IP routes into BGP
   SHOULD also be covered.

   g) The proposed solution MAY be designed in order to avoid
   transient forwarding loops. Indeed, forwarding loops increase packet
   transit delay and may lead to link saturation.

   h) The specific procedure SHOULD end when the BGP session is closed
   following the g-shut and once the BGP session is gracefully opened
   following the g-noshut.
   In the end, once the planned maintenance is
   finished, the nominal BGP routing MUST be reestablished.
   The duration of the g-shut procedure, and hence the time before the
   BGP session is safely closed, SHOULD be discussed by the solution
   document. Examples of possible solutions are the use of a
   pre-configured timer, the use of a message signaling the end of the
   BGP convergence, or monitoring of the traffic on the g-shut
   interface.

   i) The solution SHOULD be simple and simple to operate. Hence it
   MAY only cover a subset of the cases. As a consequence, most of the
   above requirements are expressed as "SHOULD" rather than "MUST".

   The metrics to evaluate and compare the proposed solutions are:
   - The duration of the remaining loss of connectivity when the BGP
     session is brought down or up;
   - The applicability to a wide range of BGP and network topologies;
   - The simplicity;
   - The duration of transient forwarding loops;
   - The additional load introduced in BGP (e.g. BGP messages sent to
     peer routers, peer ASes, the Internet).

6. Security Considerations

   At the requirements stage, this graceful shutdown mechanism is
   expected to not affect the security of the BGP protocol, especially
   if it can be kept simple. No new sessions are required and the
   additional ability to signal the graceful shutdown is not expected to
   introduce additional attack vectors, as BGP neighbors already have
   the ability to send incorrect or misleading information or even shut
   down the session.

   Security considerations MUST be addressed by the proposed
   solutions. In particular they SHOULD address the issues of bogus
   g-shut messages and how they would affect the network(s), as well
   as the impact of hiding a g-shut message so that g-shut is not
   performed.

   The solution SHOULD NOT increase the ability for one AS to
   selectively influence routing decisions in the peer AS (inbound
   Traffic Engineering) outside the case of the BGP session
   shutdown. Otherwise, the peer AS SHOULD have means to detect such
   behavior.

7. IANA Considerations

   This document has no actions for IANA.

8. References

8.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4271] Rekhter, Y. and T. Li, "A Border Gateway Protocol 4
             (BGP-4)", RFC 4271, January 2006.

   [RFC4760] Bates, T., Chandra, R., Katz, D. and Y. Rekhter,
             "Multiprotocol Extensions for BGP-4", RFC 4760, January
             2007.

   [RFC4456] Bates, T., Chen, E. and R. Chandra, "BGP Route Reflection:
             An Alternative to Full Mesh Internal BGP (IBGP)", RFC
             4456, April 2006.

   [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
             Networks (VPNs)", RFC 4364, February 2006.

8.2. Informative References

   [RFC5817] Ali, Z., Vasseur, J.P., Zamfir, A. and J. Newton,
             "Graceful Shutdown in MPLS and Generalized MPLS Traffic
             Engineering Networks", RFC 5817, April 2010.

   [RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free
             Convergence", RFC 5715, January 2010.

   [RFC5714] Shand, M. and S. Bryant, "IP Fast Reroute Framework", RFC
             5714, January 2010.

   [RFC4724] Sangli, S., Chen, E., Fernando, R., Scudder, J. and Y.
             Rekhter, "Graceful Restart Mechanism for BGP", RFC
             4724, January 2007.

   [Reliable] Network Strategy Partners, LLC, "Reliable IP Nodes: A
             prerequisite to profitable IP services", November 2002.
             http://www.nspllc.com/NewPages/Reliable_IP_Nodes.pdf

9.
Acknowledgments

   The authors would like to thank Nicolas Dubois, Benoit Fondeviole,
   Christian Jacquenet, Olivier Bonaventure, Steve Uhlig, Xavier
   Vinet, Vincent Gillet, Jean-Louis le Roux, Pierre Alain Coste and
   Ronald Bonica for the useful discussions on this subject, their
   review and comments.

   This draft has been partly sponsored by the European project IST
   AGAVE.

10. Appendix: Reference BGP Topologies

   This section describes some frequent BGP topologies used both within
   the AS (IBGP) and between ASes (EBGP). Solutions should be applicable
   to the following topologies and their combinations.

10.1. EBGP topologies

   This section describes some frequent BGP topologies used between
   ASes. In each figure, a line represents a BGP session.

10.1.1. 1 ASBR in AS1 connected to two ASBRs in the neighboring AS2

   In this topology we have an asymmetric protection scheme between
   AS1 and AS2:
   - On AS2 side, two different routers are used to connect to AS1.
   - On AS1 side, one single router with two BGP sessions is used.

                          '
                    AS1   '      AS2
                          '
                    /----------- ASBR2.1
                   /      '
                  /       '
           ASBR1.1        '
                  \       '
                   \      '
                    \----------- ASBR2.2
                          '
                          '
                    AS1   '      AS2
                          '

   Figure 2. EBGP topology with redundant ASBR in one of the ASes.

   BGP graceful shutdown is expected to be applicable for the
   maintenance of:
   - one of the routers of AS2;
   - one link between AS1 and AS2, performed either on an AS1 or AS2
     router.

   Note that in case of maintenance of the whole router, all its BGP
   sessions need to be gracefully shut down at the beginning of the
   maintenance and gracefully brought up at the end of the
   maintenance.

10.1.2.
2 ASBRs in AS1 connected to 2 ASBRs in AS2

   In this topology we have a symmetric protection scheme between
   AS1 and AS2: on both sides, two different routers are used to
   connect AS1 to AS2.

                          '
                    AS1   '      AS2
                          '
             ASBR1.1----------- ASBR2.1
                          '
                          '
                          '
                          '
                          '
             ASBR1.2----------- ASBR2.2
                          '
                    AS1   '      AS2
                          '

   Figure 3. EBGP topology with redundant ASBRs in both ASes

   BGP graceful shutdown is expected to be applicable for the
   maintenance of:
   - any of the ASBR routers (in AS1 or AS2);
   - one link between AS1 and AS2 performed either on an AS1 or AS2
     router.

10.1.3. 2 ASBRs in AS2 each connected to two different ASes

   In this topology at least three ASes are involved.

                          '
                    AS1   '      AS2
                          '
             ASBR1.1----------- ASBR2.1
                  |       '
                  |       '
             '''''|''''''''''
                  |       '
                  |       '
             ASBR3.1----------- ASBR2.2
                          '
                    AS3   '      AS2

   Figure 4. EBGP topology of a dual homed customer

   As the requirement expressed in Section 5 is to advertise the
   maintenance only within the initiator and neighbor ASes, but not
   Internet wide, BGP graceful shutdown solutions may not be
   applicable to this topology. Depending on which routes are
   exchanged between these ASes, some protection for some of the
   traffic may be possible.

   For instance, if ASBR2.2 performs a maintenance affecting ASBR3.1,
   then ASBR3.1 will be notified. However, ASBR1.1 may not be notified
   of the maintenance of the EBGP session between ASBR3.1 and ASBR2.2.

10.2. IBGP topologies

   This section describes some frequent BGP topologies used within an
   AS. In each figure, a line represents a BGP session.

10.2.1.
IBGP Full-Mesh

   In this topology we have a full mesh of IBGP sessions:

             P1 ----- P2
             | \    / |
             |  \  /  |
             |   \/   |     AS1
             |   /\   |
             |  /  \  |
             | /    \ |
          ASBR1.1--ASBR1.2
              \       /
               \     /
          ''''''\'''/''''''''''''
                 \ /        AS2
               ASBR2.1

   Figure 5. IBGP full mesh

   When the session between ASBR1.1 and ASBR2.1 is gracefully
   shut down, it is required that all affected routers of AS1 reroute
   traffic to ASBR1.2 before the session between ASBR1.1 and ASBR2.1
   is shut down.
   Similarly, when the session between ASBR1.1 and ASBR2.1 is
   gracefully brought up, all affected routers of AS1 preferring
   ASBR1.1 over ASBR1.2 need to reroute traffic to ASBR1.1 before the
   less preferred path through ASBR1.2 is possibly withdrawn.

10.2.2. Route Reflector

   In this topology, route reflectors are used to limit the number of
   IBGP sessions. There is a single level of route reflectors and the
   route reflectors are fully meshed.

               P1 (RR)-- P2 (RR)
                 | \      / |
                 |  \    /  |
                 |   \  /   |     AS1
                 |    \/    |
                 |    /\    |
                 |   /  \   |
                 |  /    \  |
                 | /      \ |
               ASBR1.1    ASBR1.2
                   \          /
                    \        /
               ''''''\''''''/''''''''''''
                      \    /
                       \  /      AS2
                     ASBR2.1

   Figure 6. Route Reflector

   When the session between ASBR1.1 and ASBR2.1 is gracefully
   shut down, all BGP routers of AS1 need to reroute traffic to
   ASBR1.2 before the session between ASBR1.1 and ASBR2.1 is shut
   down.
   Similarly, when the session between ASBR1.1 and ASBR2.1 is
   gracefully brought up, all affected routers of AS1 preferring
   ASBR1.1 over ASBR1.2 need to reroute traffic to ASBR1.1 before the
   less preferred path through ASBR1.2 is possibly withdrawn.

10.2.3. Hierarchical Route Reflector

   In this topology, hierarchical route reflectors are used to limit
   the number of IBGP sessions.
   There could be more than two levels
   of route reflectors and the top level route reflectors are fully
   meshed.

               P1 (RR) -------- P2 (RR)
                  |                |
                  |                |
                  |                |      AS1
                  |                |
                  |                |

               P3 (RR)          P4 (RR)
                  |                |
                  |                |
                  |                |      AS1
                  |                |
                  |                |
               ASBR1.1          ASBR1.2
                   \               /
                    \             /
               ''''''\'''''''''/''''''''''''
                      \       /
                       \     /      AS2
                       ASBR2.1

   Figure 7. Hierarchical Route Reflector

   When the session between ASBR1.1 and ASBR2.1 is gracefully
   shut down, all BGP routers of AS1 need to reroute traffic to
   ASBR1.2 before the session between ASBR1.1 and ASBR2.1 is shut
   down.
   Similarly, when the session between ASBR1.1 and ASBR2.1 is
   gracefully brought up, all affected routers of AS1 preferring
   ASBR1.1 over ASBR1.2 need to reroute traffic to ASBR1.1 before the
   less preferred path through ASBR1.2 is possibly withdrawn.

10.2.4. Confederations

   In this topology, a confederation of ASes is used to limit the number
   of IBGP sessions. Moreover, RRs may be present in the member ASes of
   the confederation.
   Confederations may be run with different sub-options. Regarding the
   IGP, each member AS can run its own IGP or they can all share the
   same IGP. Regarding BGP, local_pref may or may not cross the member
   AS boundaries.
   A solution should support the graceful shutdown and graceful bring up
   of EBGP sessions between member-ASes in the confederation in addition
   to the graceful shutdown and graceful bring up of EBGP sessions
   between a member-AS and an AS outside of the confederation.

          ASBR1C.1 ---------- ASBR1C.2
             |                   |
             |                   |
             |       AS1C        |
             |                   |
             |                   |
          """|"""""""""""""""""""|"""
             |         "         |
          ASBR1A.2     "      ASBR1B.2
             |         "         |
             |         "         |
             |  AS1A   "   AS1B  |       AS1
             |         "         |
             |         "         |
          ASBR1A.1     "      ASBR1B.1
              \        "        /
               \       "       /
          ''''''\'''''''''''''/''''''''''''
                 \           /
                  \         /       AS2
                    ASBR2.1

   Figure 8. Confederation

   In the above figure, member-ASes AS1A, AS1B and AS1C belong to a
   confederation of ASes in AS1. AS1A and AS1B are connected to AS2.

   In normal operation, for the traffic toward AS2,
   . AS1A sends the traffic directly to AS2 through ASBR1A.1
   . AS1B sends the traffic directly to AS2 through ASBR1B.1
   . AS1C load balances the traffic between AS1A and AS1B

   When the session between ASBR1A.1 and ASBR2.1 is gracefully shut
   down, all BGP routers of AS1 need to reroute traffic to ASBR1B.1
   before the session between ASBR1A.1 and ASBR2.1 is shut down.
   Similarly, when the session between ASBR1A.1 and ASBR2.1 is
   gracefully brought up, all affected routers of AS1 preferring
   ASBR1A.1 over ASBR1B.1 need to reroute traffic to ASBR1A.1 before the
   less preferred path through ASBR1B.1 is possibly withdrawn.

10.3. Routing decisions

   We describe here some routing engineering choices that are
   frequently used in ASes and that should be supported by the
   solution.

10.3.1. Hot potato (IGP cost)

   Ingress router selects the nominal egress ASBR (AS exit point)
   based on the IGP cost to reach the BGP next-hop.

10.3.2. Cold potato (BGP local preference)

   Ingress router selects the nominal egress ASBR based on the BGP
   LOCAL_PREF value set and advertised by the exit point.

10.3.3. Cold potato (BGP preference set on ingress)

   Ingress router selects the nominal egress ASBR based on
   preconfigured policy information.
   (Typically by locally setting
   the BGP local pref based on the BGP communities attached on the
   routes).
   As per [RFC4271], note that if tunnels are not used to forward
   packets between ingress and egress ASBR, this can lead to
   persistent forwarding loops.

Authors' Addresses

   Bruno Decraene
   France Telecom
   38-40 rue du General Leclerc
   92794 Issy Moulineaux cedex 9
   France

   Email: bruno.decraene@orange-ftgroup.com

   Pierre Francois
   Universite catholique de Louvain
   Place Ste Barbe, 2
   Louvain-la-Neuve 1348
   BE

   Email: francois@info.ucl.ac.be

   Cristel Pelsser
   Internet Initiative Japan
   Jinbocho Mitsui Building
   1-105 Kanda jinbo-cho
   Chiyoda-ku, Tokyo 101-0051
   Japan

   Email: cristel@iij.ad.jp

   Zubair Ahmad
   Orange Business Services
   13775 McLearen Road, Oak Hill VA 20171
   USA

   Email: zubair.ahmad@orange-ftgroup.com

   Antonio Jose Elizondo Armengol
   Division de Analisis Tecnologicos
   Technology Analysis Division
   Telefonica I+D
   C/ Emilio Vargas 6
   28043, Madrid

   Email: ajea@tid.es

   Tomonori Takeda
   NTT Corporation
   9-11, Midori-Cho 3 Chrome
   Musashino-Shi, Tokyo 180-8585
   Japan

   Email: takeda.tomonori@lab.ntt.co.jp