GROW Working Group                                        R. Raszuk, Ed.
Internet-Draft                                             Cisco Systems
Intended status: Informational                              D. McPherson
Expires: December 22, 2010                                Arbor Networks
                                                               K. Kumaki
                                                        KDDI Corporation
                                                           June 20, 2010


                   Distribution of diverse BGP paths.
                draft-ietf-grow-diverse-bgp-path-dist-00

Abstract

   The BGP4 protocol specifies the selection and propagation of a single
   best path for each prefix.  As defined today, BGP has no mechanism to
   distribute paths other than the best path between its speakers.  This
   behaviour results in a number of disadvantages for new applications
   and services.

   This document presents an alternative mechanism for solving the
   problem based on the concept of parallel route reflector planes.  It
   also compares existing solutions and proposed ideas that enable
   distribution of more paths than just the best path.

   This proposal does not specify any changes to the BGP protocol
   definition.  It does not require upgrades to provider edge or core
   routers, nor does it need network wide upgrades.  The authors believe
   that the GROW WG would be the best place for this work.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 22, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  History
     2.1.  BGP Add-Paths Proposal
   3.  Goals
   4.  Multi plane route reflection
     4.1.  Co-located best and backup path RRs
     4.2.  Randomly located best and backup path RRs
     4.3.  Multi plane route servers for Internet Exchanges
   5.  Discussion on current models of IBGP route distribution
     5.1.  Full Mesh
     5.2.  Confederations
     5.3.  Route reflectors
   6.  Deployment considerations
   7.  Summary of benefits
   8.  Applications
   9.  Security considerations
   10. IANA Considerations
   11. Contributors
   12. Acknowledgments
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Authors' Addresses

1.  Introduction

   The current BGP4 [RFC4271] protocol specification allows for the
   selection and propagation of only one best path for each prefix.  The
   BGP protocol as defined today has no mechanism to distribute paths
   other than the best path between its speakers.  This behaviour
   results in a number of problems in the deployment of new applications
   and services.

   This document presents an alternative mechanism for solving the
   problem based on the concept of parallel route reflector planes.  It
   also compares existing solutions and proposed ideas that enable
   distribution of more paths than just the best path.  The parallel
   route reflector planes solution brings very significant benefits at a
   negligible capex and opex deployment price as compared to the
   alternative techniques, and is being considered by a number of
   network operators for deployment in their networks.

   This proposal does not specify any changes to the BGP protocol
   definition.  It does not require upgrades to provider edge or core
   routers, nor does it need network wide upgrades.  The authors believe
   that the GROW WG would be the best place for this work.

2.  History

   The need to disseminate more paths than just the best path is
   primarily driven by two requirements.  The first is the problem of
   BGP oscillations [I-D.ietf-idr-route-oscillation].  The second is the
   desire to reduce the time needed to restore reachability after the
   failure of a network or network element.  These two reasons have led
   to the proposal of BGP add-paths [I-D.ietf-idr-add-paths].

2.1.  BGP Add-Paths Proposal

   As it has been proven that distribution of only the best path of a
   route is not sufficient to meet the needs of the continuously growing
   number of services carried over BGP, the add-paths proposal was
   submitted in 2002 to enable BGP to distribute more than one path.
   This is achieved by including, as a part of the NLRI, an additional
   four octet value called the Path Identifier.

   The implication of this change for a BGP implementation is that it
   must now maintain per path, instead of per prefix, peer advertisement
   state to track which of the peers each path was advertised to.  This
   new requirement has its own memory and processing cost.  Suffice it
   to say that by the middle of 2009 no commercial BGP implementation
   could claim to support the new add-path behaviour in production code,
   in part because of this resource overhead.

   An important observation is that distribution of more than one best
   path by an Autonomous System Border Router (ASBR) with multiple EBGP
   peers attached to it, where no "next hop self" is set, may result in
   best path selection inconsistency within the autonomous system.
   Therefore it is also required to attach the possible tie breakers in
   the form of a new attribute and to propagate those within the domain.
   An example of such an attribute, intended for fast connectivity
   restoration and addressing that very case of an ASBR injecting
   multiple external paths into the IBGP mesh, has been presented and
   discussed in the Fast Connectivity Restoration Using BGP Add-path
   [I-D.pmohapat-idr-fast-conn-restore] document.  Based on the
   additionally propagated information, it is also recommended that
   best path selection be modified so that best and backup path
   selection within the domain stays consistent.  More discussion on
   this particular point is contained in the deployment considerations
   section below.
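
   The Path Identifier described at the start of this section can be
   illustrated with a minimal sketch.  It assumes the layout of the
   add-paths proposal, in which the four octet Path Identifier precedes
   the usual length and prefix octets of an IPv4 NLRI entry.  The
   function name encode_addpath_nlri is hypothetical and used only for
   illustration; this is not taken from any implementation.

```python
import ipaddress
import struct


def encode_addpath_nlri(path_id, prefix):
    """Encode one IPv4 NLRI entry preceded by a four octet Path Identifier.

    Assumed layout: Path Identifier (4 octets), prefix length in bits
    (1 octet), then only as many prefix octets as the length requires.
    """
    net = ipaddress.ip_network(prefix)
    n_octets = (net.prefixlen + 7) // 8
    header = struct.pack("!IB", path_id, net.prefixlen)
    return header + net.network_address.packed[:n_octets]


# Two paths for the same prefix differ only in their Path Identifier,
# which is why advertisement state has to be tracked per path.
first = encode_addpath_nlri(1, "192.0.2.0/24")
second = encode_addpath_nlri(2, "192.0.2.0/24")
print(first.hex())   # 0000000118c00002
print(second.hex())  # 0000000218c00002
```

   Because both entries carry the same prefix, a speaker can no longer
   key its advertisement state on the prefix alone, which is the source
   of the per path overhead discussed above.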
   In the solution proposed in this document we observe that, to
   address most of the applications, only the use of best-external
   advertisement is required.  For ASBRs which peer with multiple
   upstream ASes, setting "next hop self" is recommended.

   The add-paths protocol extensions have to be implemented by all the
   routers within an AS in order for the system to work correctly.  The
   required code modifications include enhancements such as Fast
   Connectivity Restoration Using BGP Add-path
   [I-D.pmohapat-idr-fast-conn-restore].  The deployment of such
   technology in an entire service provider network requires software
   upgrades and, in the case of End-of-Engineering or End-of-Life
   equipment, sometimes even hardware upgrades.  Such an operation may
   or may not be economically feasible.  Even if add-path functionality
   were available today on all commercial routing equipment and across
   all vendors, experience indicates that achieving 100% deployment
   coverage within any medium or large global network may easily take
   years.

   While it needs to be clearly acknowledged that the add-path mechanism
   provides the most general way to address the problem of distributing
   more than one path between BGP speakers, this document provides a
   much easier to deploy solution that requires no modification to the
   BGP protocol.  The alternative method presented is capable of
   addressing critical service provider requirements for disseminating
   more than a single path across an AS with a significantly lower
   deployment cost.

3.  Goals

   The proposal described in this document is not intended to compete
   with add-paths.  Instead, if deployed, it is to be used as a very
   easy method to accommodate the majority of applications which may
   require the presence of alternative BGP exit points.

   It is presented to network operators as a possible choice and gives
   those operators who need additional paths today an alternative to
   transitioning to a full mesh.

   It is intended as a way to buy more time, allowing for a smoother and
   more gradual migration in which router upgrades will be required for
   perhaps different reasons.  It also buys the time needed until
   standard RP/RE memory sizes can easily accommodate the overhead
   associated with other techniques without any compromises.

4.  Multi plane route reflection

   The idea contained in the proposal assumes the use of route
   reflection within the network.  Other techniques, as described in the
   following sections, already provide means for distribution of
   alternate paths today.

   Let's observe today's picture of a simple route reflected domain:

                        ASBR3
                         ***
                        *   *
           +------------*   *-----------+
           | AS1        *   *           |
           |             ***            |
           |                            |
           |                            |
           |                            |
           | RR1         ***        RR2 |
           | ***        *   *       *** |
           |*   *       * P *      *   *|
           |*   *       *   *      *   *|
           | ***         ***        *** |
           |                            |
           |            IBGP            |
           |                            |
           |                            |
           |      ***           ***     |
           |     *   *         *   *    |
           +-----*   *---------*   *----+
                 *   *         *   *
                  ***           ***
                 ASBR1         ASBR2
                        EBGP

                  Figure 1: Simple route reflection

   Figure 1 shows an AS that is connected via EBGP peering at ASBR1 and
   ASBR2 to an upstream AS or set of ASes.  For a given destination "D",
   ASBR1 and ASBR2 will have external paths P1 and P2 respectively.  The
   AS network uses two route reflectors, RR1 and RR2, for redundancy
   reasons.  The route reflectors propagate the single BGP best path for
   each route to all clients.  All ASBRs are clients of RR1 and RR2.

   Below are the possible cases of the path information that ASBR3 may
   receive from route reflectors RR1 and RR2:

   1.  When the best path tie breaker is the IGP distance: When paths P1
       and P2 are considered to be equally good best path candidates,
       the selection will depend on the distance of the path next hops
       from the route reflector making the decision.  Depending on the
       positioning of the route reflectors in the IGP topology, they may
       choose the same best path or different ones.  In such a case
       ASBR3 may receive either the same path or different paths from
       each of the route reflectors.

   2.  When the best path tie breaker is Multi-Exit-Discriminator or
       Local Preference: In this case only one path, from the preferred
       exit point ASBR, will be available to the RRs, since the other
       peering ASBR will consider the IBGP path as best and will not
       announce (or, if already announced, will withdraw) its own
       external path.  The exception here is the use of the BGP
       Best-External proposal, which allows that ASBR to still propagate
       its own external path to the RRs.  Unfortunately the RRs will not
       be able to distribute it any further to other clients, as only
       the overall best path will be reflected.

   The proposed solution is based on the use of additional route
   reflectors, or of new functionality enabled on the existing route
   reflectors, that instead of distributing the best path for each route
   will distribute an alternative path other than the best.  The best
   path (main) reflector plane distributes the best path for each route
   as it does today.  The second plane distributes the second best path
   for each route, and so on.  Distribution of N paths for each route
   can be achieved by using N reflector planes.

   Each plane of route reflectors is a logical entity and may or may not
   be co-located with the existing best path route reflectors.
   Adding a route reflector plane to a network may be as easy as
   enabling a logical router partition, a new BGP process or just a new
   configuration knob on an existing route reflector, and configuring an
   additional IBGP session from the current clients if required.  There
   are no code changes required on the route reflector clients for this
   mechanism to work.  It is easy to observe that the installation of
   one or more additional route reflector control planes is much cheaper
   and easier than upgrading hundreds of routers in the entire network
   to support a different protocol encoding.

   Diverse path route reflectors need the new ability to calculate and
   propagate the Nth best path instead of the overall best path.  An
   implementation is encouraged to enable this new functionality on a
   per neighbor basis.

   While this is an implementation detail, the code to calculate the Nth
   best path is also required by other BGP solutions.  For example, in
   the application of fast connectivity restoration BGP must calculate a
   backup path for installation into the RIB and FIB ahead of the actual
   failure.

   To address the problem of external paths not being available to route
   reflectors due to local preference or MED factors, it is recommended
   that ASBRs enable the best-external functionality in order to always
   inject their external paths to the route reflectors.

4.1.  Co-located best and backup path RRs

   To simplify the description let's assume that we only use two route
   reflector planes (N=2).  When co-located, the additional 2nd best
   path reflectors are connected to the network at the same points, from
   the perspective of the IGP, as the existing best path RRs.  Let's
   also assume that best-external is enabled on all ASBRs.
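
   Before turning to the topology, the two plane behaviour assumed here
   can be sketched as follows.  This is only an illustration: the Path
   class, the ranked and reflect helpers and the use of local preference
   as the sole ranking rule are assumptions made for the example, not a
   description of any implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str        # label used in the figures, e.g. "P1"
    exit_asbr: str   # ASBR that injected the path
    local_pref: int  # higher is preferred


def ranked(paths):
    """Order candidate paths from best to worst (toy rule: local preference,
    then path name as a deterministic tie breaker)."""
    return sorted(paths, key=lambda p: (-p.local_pref, p.name))


def reflect(paths, plane):
    """Return the path a reflector of the given plane advertises.

    Plane 1 reflects the best path, plane 2 the 2nd best, and so on.
    None is returned when fewer than `plane` paths are known.
    """
    order = ranked(paths)
    return order[plane - 1] if plane in range(1, len(order) + 1) else None


# With best-external enabled, both external paths reach every reflector.
candidates = [Path("P1", "ASBR1", 200), Path("P2", "ASBR2", 100)]

# ASBR3 holds one IBGP session to RR1 (plane 1) and one to RR1' (plane 2).
received = [reflect(candidates, plane) for plane in (1, 2)]
print([p.name for p in received])  # ['P1', 'P2']
```

   The client needs no new code: it simply learns one path per session,
   and the number of planes bounds the number of paths it can receive.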

                        ASBR3
                         ***
                        *   *
           +------------*   *-----------+
           | AS1        *   *           |
           |             ***            |
           |                            |
           | RR1                    RR2 |
           | ***                    *** |
           |*   *        ***       *   *|
           |*   *       *   *      *   *|
           | ***        * P *       *** |
           |*   *       *   *      *   *|
           |*   *        ***       *   *|
           | ***                    *** |
           | RR1'       IBGP        RR2'|
           |                            |
           |                            |
           |      ***           ***     |
           |     *   *         *   *    |
           +-----*   *---------*   *----+
                 *   *         *   *
                  ***           ***
                 ASBR1         ASBR2

                        EBGP

                Figure 2: Co-located 2nd best RR plane

   The following is a list of configuration changes required to enable
   the 2nd best path route reflector plane:

   1.  Adding RR1' and RR2', either as logical or as physical new
       control plane RRs, at the same IGP points as RR1 and RR2
       respectively

   2.  Enabling RR1' and RR2' for 2nd plane route reflection

   3.  Enabling best-external on ASBRs

   4.  Configuring the ASBR to RR' IBGP sessions

   The expected behaviour is that under any BGP condition ASBR3 and the
   P routers will receive both paths P1 and P2 for destination D.  The
   availability of both paths will allow them to implement a number of
   new services as listed in the applications section below.

   As an alternative to fully meshing all RRs and RRs', an operator who
   has a large number of reflectors deployed today may choose to peer
   the newly introduced RRs' to a hierarchical RR', which would be an
   IBGP interconnect point within the 2nd plane as well as between
   planes.

   One deployment model for this scenario can be achieved by a simple
   upgrade of the existing route reflectors, without the need to deploy
   any new logical or physical platforms.  Such an upgrade would allow
   route reflectors to serve both peers that have been upgraded to
   add-paths and peers that cannot be upgraded immediately, while at the
   same time allowing distribution of more than a single best path.

   The way to accomplish this would be to create a separate IBGP session
   for each N-th BGP path.
   Each such session should preferably be terminated at a different
   loopback address of the route reflector.  At the BGP OPEN stage of
   each such session a different bgp_router_id should be used.
   Correspondingly, the route reflector should also allow its clients to
   use the same bgp_router_id on each such session.

4.2.  Randomly located best and backup path RRs

   Now let's consider a deployment case where an operator wishes to
   enable a 2nd RR' plane using only a single additional router, in a
   network location different from that of the current route
   reflectors.

   Note that this model of operation assumes that the present best path
   route reflectors are only control plane devices.  If the route
   reflector is in the data forwarding path, then the implementation
   must be able to clearly separate the Nth best path selection from the
   selection of the paths to be used for data forwarding.  The basic
   premise of this mode of deployment is that all reflector planes have
   the same information to choose from, which includes the same set of
   BGP paths.  It also requires the ability to skip the comparison of
   the IGP metric to reach the BGP next hop during best path
   calculation.

                        ASBR3
                         ***
                        *   *
           +------------*   *-----------+
           | AS1        *   *           |
           | IBGP        ***            |
           |                            |
           |             ***            |
           |            *   *           |
           | RR1        * P *       RR2 |
           | ***        *   *       *** |
           |*   *        ***       *   *|
           |*   *                  *   *|
           | ***         RR'        *** |
           |             ***            |
           |            *   *           |
           |            *   *           |
           |             ***            |
           |      ***           ***     |
           |     *   *         *   *    |
           +-----*   *---------*   *----+
                 *   *         *   *
                  ***           ***
                 ASBR1         ASBR2

                        EBGP

           Figure 3: Experimental deployment of 2nd best RR

   The following is a list of configuration changes required to enable
   the 2nd best path route reflector RR' as a single platform:

   1.  Adding RR', logical or physical, as a new route reflector
       anywhere in the network

   2.  Enabling RR' for 2nd plane route reflection

   3.  Enabling best-external on ASBRs

   4.  Fully meshing the newly added RRs' with all the other reflectors
       in both planes.  This condition does not apply if the newly added
       RR'(s) already peer with all ASBRs/PEs.

   5.  Configuring the ASBR to RR' IBGP sessions

   6.  Disabling the IGP metric check in BGP best path selection on all
       route reflectors.

   In this scenario the operator has the flexibility to introduce the
   new additional route reflector on any existing or new hardware in the
   network.  Any of the existing routers that are not already members of
   the best path route reflector plane can easily be configured to serve
   the 2nd plane, either by using a logical / virtual router partition
   or by local implementation hooks.

   Even if the IGP metric is not taken into consideration when comparing
   paths during the best path calculation, an implementation still has
   to consider paths with unreachable next hops as invalid.  It is worth
   pointing out that some implementations today already allow for
   configuration which results in no IGP metric comparison during the
   best path calculation.

   The additional planes of route reflectors do not need to be as fully
   redundant as the primary one.  If we are preparing for a single
   network failure event, a failure of a non backed up N-th best path
   route reflector would not result in a connectivity outage in the
   actual data plane.  The reason is that this would at most affect the
   presence of a backup path (not an active one) in some parts of the
   network.  If the operator chooses to build the N-th best path plane
   redundantly, by installing not one but two or more route reflectors
   serving each additional plane, additional robustness will be
   achieved.

   As a result of this solution ASBR3 and other ASBRs peering with RR'
   will receive the 2nd best path.
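
   Why step 6 matters can be seen in a small sketch.  The decision
   process below is deliberately reduced to local preference, AS path
   length and IGP metric, with the path name as a final tie breaker; the
   nth_best function, the metric tables and the attribute values are all
   hypothetical and serve only to illustrate the reasoning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    next_hop: str
    local_pref: int
    as_path_len: int


def nth_best(paths, n, igp_metric, ignore_igp_metric=True):
    """Return the n-th best path (n=1 is the overall best), or None.

    igp_metric maps a next hop to its metric as seen from this reflector;
    a next hop missing from the map is treated as unreachable.
    """
    # Paths with unreachable next hops stay invalid even when the metric
    # comparison itself is skipped.
    valid = [p for p in paths if p.next_hop in igp_metric]

    def key(p):
        metric = 0 if ignore_igp_metric else igp_metric[p.next_hop]
        return (-p.local_pref, p.as_path_len, metric, p.name)

    order = sorted(valid, key=key)
    return order[n - 1] if n in range(1, len(order) + 1) else None


paths = [Path("P1", "ASBR1", 100, 3), Path("P2", "ASBR2", 100, 3)]
rr1_view = {"ASBR1": 10, "ASBR2": 30}       # RR1 sits closer to ASBR1
rr_prime_view = {"ASBR1": 30, "ASBR2": 10}  # RR' sits closer to ASBR2

for ignore in (False, True):
    best = nth_best(paths, 1, rr1_view, ignore)
    second = nth_best(paths, 2, rr_prime_view, ignore)
    print(ignore, best.name, second.name)
# False P1 P1  (metric compared: both planes hand ASBR3 the same path)
# True P1 P2   (metric skipped: the planes agree on the ranking)
```

   When the IGP metric is compared, RR' ranks P2 first from its own
   location, so its 2nd best path is P1, the same path RR1 already
   reflects.  With the comparison disabled, all reflectors rank the
   paths identically and the second plane delivers a different path.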

   As in section 4.1, as an alternative to fully meshing all RRs and
   RRs', an operator who has a large number of reflectors already
   deployed today may choose to peer the newly introduced RRs' to a
   hierarchical RR', which would be an IBGP interconnect point within
   the 2nd plane as well as between planes.

4.3.  Multi plane route servers for Internet Exchanges

   Another group of devices to which the proposed multi-plane
   architecture may be particularly applicable are the EBGP route
   servers used at the majority of internet exchange points.

   In such cases hundreds of ISPs are interconnected on a common LAN.
   Instead of having hundreds of direct EBGP sessions on each exchange
   client, a single peering is created to the transparent route server.
   The route server can only propagate a single best path.  Mandating an
   upgrade for hundreds of different service providers in order to
   implement add-path may be much more difficult than asking them to
   provision one new EBGP session to an Nth best path route server
   plane.

   The solution proposed in this document fits very well with the
   requirement of having broader EBGP path diversity among the members
   of any Internet Exchange Point.

5.  Discussion on current models of IBGP route distribution

   In today's networks BGP4 operates as specified in [RFC4271].

   There are a number of technology choices for intra-AS BGP route
   distribution:

   1.  Full mesh

   2.  Confederations

   3.  Route reflectors

5.1.  Full Mesh

   A full mesh, the most basic IBGP architecture, exists when all the
   BGP speaking routers within the AS peer directly with all other BGP
   speaking routers within the AS, irrespective of where a given router
   resides within the AS (e.g., P router, PE router, etc.).
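
   The session counts implied by this definition can be worked out
   directly.  The sketch below is illustrative only: the AS size of 300
   routers is a hypothetical value, and the reflected case assumes that
   every client peers with every reflector and that the reflectors are
   fully meshed among themselves.

```python
def full_mesh_sessions(routers):
    """Return (sessions per router, total sessions) for an IBGP full mesh."""
    return routers - 1, routers * (routers - 1) // 2


def reflected_sessions(clients, reflectors):
    """Return (sessions per client, total sessions) when each client peers
    with every reflector and the reflectors are fully meshed."""
    total = clients * reflectors + reflectors * (reflectors - 1) // 2
    return reflectors, total


print(full_mesh_sessions(300))     # (299, 44850)
print(reflected_sessions(298, 2))  # (2, 597)
print(reflected_sessions(298, 3))  # (3, 897)  one extra reflector added
```

   The last line corresponds to adding a single additional reflector to
   a two reflector design: each client gains one session, while the
   full mesh requires every router to hold a session to every other.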

   While this is the simplest intra-domain path distribution method,
   historically there have been a number of challenges in realizing such
   an IBGP full mesh in a large scale network.  While some of these
   challenges are no longer applicable today, some may still apply,
   including the following:

   1.  Number of TCP sessions: The number of IBGP sessions on a single
       router in a full mesh topology of a large scale service provider
       can easily reach the hundreds.  While on hardware and software
       used in the late 70s, 80s and 90s such numbers could be of
       concern, today customer requirements for the number of BGP
       sessions per box are reaching the thousands.  This is already an
       order of magnitude more than the potential number of IBGP
       sessions.  Advancements in the hardware and software used in
       production routers mean that running a full mesh of IBGP sessions
       should not be dismissed due to the resulting number of TCP
       sessions alone.

   2.  Provisioning: When operating and troubleshooting large networks,
       one of the topmost requirements is to keep the design as simple
       as possible.  When the autonomous system's network is composed of
       hundreds of nodes it becomes very difficult to manually provision
       a full mesh of IBGP sessions.  Adding or removing a router
       requires reconfiguration of all the other routers in the AS.

       While this is a real concern today, there is already work in
       progress in the IETF to define IBGP peering automation through an
       IBGP Auto Discovery [I-D.raszuk-idr-ibgp-auto-mesh] mechanism.

   3.  Number of paths: Another concern when deploying a full IBGP mesh
       is the number of BGP paths for each route that have to be stored
       at every node.  This number is very tightly related to the number
       of external peerings of an AS, the use of local preference or
       multi-exit-discriminator techniques, and the presence of
       best-external [I-D.ietf-idr-best-external] advertisement
       configuration.
       If we make a rough assumption that the BGP4 path data structure
       consumes about 80-100 bytes, the resulting control plane memory
       requirement for 500,000 IPv4 routes with one additional external
       path is 38-48 MB, while for 1 million IPv4 routes it grows
       linearly to 76-95 MB.  Whether this is negligible or a show
       stopper for a full mesh deployment cannot be decided in general;
       it has to be evaluated for a given network.

   To summarize, a full mesh IBGP peering can offer natural
   dissemination of multiple external paths among BGP speakers.  When
   realized with the help of IBGP Auto Discovery peering automation,
   this seems like a viable deployment, especially in medium and small
   scale networks.

5.2.  Confederations

   For the purpose of this document let's observe that confederations
   [RFC5065] can be viewed as a hierarchical full mesh model.

   Within each sub-AS, BGP speakers are fully meshed and, as discussed
   in section 5.1, all full mesh characteristics (number of TCP
   sessions, provisioning and potential concern over the number of
   paths) still apply at the sub-AS scale.

   In addition to the direct peering of all BGP speakers within each
   sub-AS, all sub-AS border routers must also be fully meshed with each
   other.  Sub-AS border routers configured with best-external
   functionality can inject additional exit paths within a sub-AS.

   To summarize, it is technically sound to use confederations in
   combination with best-external to achieve distribution of more than a
   single best path per route in large autonomous systems.

   In topologies where route reflectors are deployed within the
   confederation sub-ASes, the technique described here does apply.

5.3.  Route reflectors

   The main motivation behind the use of route reflectors [RFC4456] is
   the avoidance of the full mesh session management problem described
   above.
   Route reflectors, for good or for bad, are the most common solution
   today for interconnecting BGP speakers within an internal routing
   domain.

   Route reflector peerings follow the advertisement rules defined by
   the BGP4 protocol.  As a result only a single best path per prefix is
   sent to client BGP peers.  That is the main reason why many current
   networks are exposed to a phenomenon called BGP path starvation,
   which essentially results in the inability to deliver a number of the
   applications discussed later.

   The route reflection equivalent when interconnecting BGP speakers
   between domains is popularly called the Route Server and is globally
   deployed today in many internet exchange points.

6.  Deployment considerations

   The diverse BGP path dissemination proposal allows the distribution
   of more paths than just the best path to route reflector or route
   server clients of today's BGP4 implementations.

   From the client's point of view, receiving additional paths via
   separate IBGP sessions terminated at the new route reflector plane is
   functionally equivalent to constructing a full mesh peering, without
   the problems that such a full mesh would come with (discussed in
   section 5.1).

   By precisely defining the number of reflector planes, network
   operators have full control over the number of redundant paths in the
   network.  This number can be defined to address the needs of the
   service(s) being deployed.

   The Nth plane route reflectors should act as control plane devices.
   While they can be provisioned on the current production routers, the
   selected backup BGP paths should not be used directly in the data
   plane.  Use of the calculated Nth path by the RRs can lead to
   inconsistent best path selection in the domain.  For the purposes of
   local RIB / FIB installation, any router (including the RRs) which is
   in the data path must use the overall global best and Nth best paths.
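
   The control an operator has over the number of redundant paths also
   bounds the memory cost on the clients.  The sketch below reuses the
   rough 80-100 bytes per path assumption from section 5.1 and further
   assumes that each plane beyond the first adds exactly one path per
   route on a client.  The function name is hypothetical, and "MB" is
   read as 2^20 bytes, which is the reading that reproduces the figures
   quoted in section 5.1.

```python
MIB = 2**20  # binary megabyte; matches the 38-48 MB estimate in section 5.1


def extra_path_memory_mib(routes, extra_paths, bytes_per_path):
    """Rough control plane memory (MiB) for paths beyond the best one."""
    return routes * extra_paths * bytes_per_path / MIB


for planes in (2, 3):
    extra = planes - 1  # assumption: each extra plane adds one path per route
    low = extra_path_memory_mib(500_000, extra, 80)
    high = extra_path_memory_mib(500_000, extra, 100)
    print(f"{planes} planes: {low:.0f}-{high:.0f} MiB")
# 2 planes: 38-48 MiB
# 3 planes: 76-95 MiB
```

   Under these assumptions the cost grows linearly with the number of
   planes, so the choice of N can be weighed directly against the memory
   available on the clients.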

   The proposed architecture, deployed along with the BGP best-external
   functionality, covers all three cases where the classic BGP route
   reflection paradigm would fail to distribute alternate exit point
   paths:

   1.  ASBRs advertising their single best external paths with no
       local-preference or multi-exit-discriminator present.

   2.  ASBRs advertising their single best external paths with
       local-preference or multi-exit-discriminator present and with BGP
       best-external functionality enabled.

   3.  ASBRs with multiple external paths.

   Let's discuss the last (3rd) case in more detail.  This describes the
   scenario of a single ASBR connected to multiple EBGP peers.  In
   practice this peering scenario is quite common.  It is mostly due to
   the geographic location of EBGP peers and the diversity of those
   peers (for example, peering with multiple tier 1 ISPs).  It is not
   designed for failure recovery scenarios, as a single failure of the
   ASBR would simultaneously result in loss of connectivity to all of
   the peers.  In most medium and large geographically distributed
   networks there is always another ASBR, or multiple ASBRs, providing
   peering backups, typically in other geographically diverse locations
   in the network.

   When an operator uses ASBRs with multiple peerings, setting next hop
   self effectively allows local repair of the atomic failure of any
   external peer without any compromise to the data plane.  The most
   common reason for not setting next hop self is traditionally the
   associated drawback of losing the ability to signal external failures
   of peering ASBRs, or of links to those ASBRs, by fast IGP flooding.
   This potential drawback can easily be avoided by using a peering
   address different from the address used for next hop mapping, and by
   removing that next hop from the IGP at the last possible BGP path
   failure.

   Herein one may correctly observe that, in the case of setting next
   hop self on an ASBR, the attributes of the other external paths that
   the ASBR is peering with may be different from the attributes of its
   best external path.  Therefore, because those external paths are not
   all injected with their corresponding attributes, they cannot be
   compared to equivalent paths for the same prefix coming from
   different ASBRs.

   While this observation is correct in principle, one should put
   things in the perspective of the overall goal, which is to provide
   data plane connectivity upon a single failure with minimal
   interruption/packet loss.  During such transient conditions, using
   even potentially suboptimal exit points is reasonable, so long as
   forwarding information loops are not introduced.  In the meantime,
   the BGP control plane will on its own re-advertise the newly elected
   best external path, and the route reflector planes will calculate
   their Nth best paths and propagate them to their clients.  The
   result is that, after seconds, any sub-optimality that was
   encountered will be quickly and naturally healed.

7.  Summary of benefits

   The diverse BGP path dissemination proposal provides the following
   benefits when compared to the alternatives:

   1.  No modifications to the BGP4 protocol.

   2.  No requirement for upgrades to edge and core routers.  Backward
       compatible with the existing BGP deployments.

   3.  Can be easily enabled by introduction of a new route reflector /
       route server plane dedicated to the selection and distribution
       of the Nth best path.

   4.  Does not require major modification to BGP implementations in
       the entire network, which would result in an unnecessary
       increase of memory and CPU consumption due to the shift from
       today's per prefix to a per path advertisement state tracking.

   5.  Can be safely deployed gradually through addition of a single
       logical or physical route reflector with the new functionality
       described in this document.

   6.  The proposed solution is equally applicable to any BGP address
       family as described in Multiprotocol Extensions for BGP-4
       [RFC4760].  In particular, it can be used "as is" without any
       modifications for both IPv4 and IPv6 address families.

8.  Applications

   This section lists the most common applications which require the
   presence of redundant BGP paths:

   1.  Fast connectivity restoration where backup paths with alternate
       exit points would be pre-installed as well as pre-resolved in
       the FIB of routers.  That would allow for a local action upon
       reception of a critical event notification of network / node
       failure.  This failure recovery mechanism based on the presence
       of backup paths is also suitable for gracefully addressing
       scheduled maintenance requirements as described in
       [I-D.decraene-bgp-graceful-shutdown-requirements].

   2.  Multi-path load balancing for both IBGP and EBGP.

   3.  BGP control plane churn reduction, both intra-domain and
       inter-domain.

   An important point to observe is that all of the above intra-domain
   applications are based on the use of reflector planes but are also
   applicable in the inter-domain Internet exchange case.  As discussed
   in section 4.3, an Internet exchange can deploy shadow route server
   slices, each responsible for distribution of an Nth best path to its
   EBGP peers.

9.  Security considerations

   The new mechanism for diverse BGP path dissemination proposed in
   this document does not introduce any new security concerns as
   compared to the base BGP4 specification [RFC4271].

10.  IANA Considerations

   The new mechanism for diverse BGP path dissemination does not
   require any new allocations from IANA.

11.  Contributors

   The following people contributed significantly to the content of the
   document:

   Keyur Patel
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   US

   Email: keyupate@cisco.com

   Rex Fernando
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   US

   Email: rex@cisco.com

   Isidor Kouvelas
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   US

   Email: kouvelas@cisco.com

12.  Acknowledgments

   The authors would like to thank Bruno Decraene, Bart Peirens and
   Eric Rosen for their valuable input.

13.  References

13.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 4760,
              January 2007.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
              May 2008.

13.2.  Informative References

   [I-D.decraene-bgp-graceful-shutdown-requirements]
              Decraene, B., Francois, P., Pelsser, C., Ahmad, Z., and
              A. Armengol, "Requirements for the graceful shutdown of
              BGP sessions",
              draft-decraene-bgp-graceful-shutdown-requirements-01
              (work in progress), March 2009.

   [I-D.ietf-idr-add-paths]
              Walton, D., Retana, A., Chen, E., and J. Scudder,
              "Advertisement of Multiple Paths in BGP",
              draft-ietf-idr-add-paths-03 (work in progress),
              February 2010.

   [I-D.ietf-idr-best-external]
              Marques, P., Fernando, R., Chen, E., and P. Mohapatra,
              "Advertisement of the best external route in BGP",
              draft-ietf-idr-best-external-01 (work in progress),
              February 2010.

   [I-D.ietf-idr-route-oscillation]
              McPherson, D., "BGP Persistent Route Oscillation
              Condition", draft-ietf-idr-route-oscillation-01 (work in
              progress), February 2002.

   [I-D.pmohapat-idr-fast-conn-restore]
              Mohapatra, P., Fernando, R., Filsfils, C., and R. Raszuk,
              "Fast Connectivity Restoration Using BGP Add-path",
              draft-pmohapat-idr-fast-conn-restore-00 (work in
              progress), September 2008.

   [I-D.raszuk-idr-ibgp-auto-mesh]
              Raszuk, R., "IBGP Auto Mesh",
              draft-raszuk-idr-ibgp-auto-mesh-00 (work in progress),
              June 2003.

   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, April 2006.

   [RFC5065]  Traina, P., McPherson, D., and J. Scudder, "Autonomous
              System Confederations for BGP", RFC 5065, August 2007.

Authors' Addresses

   Robert Raszuk (editor)
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   US

   Email: raszuk@cisco.com

   Danny McPherson
   Arbor Networks

   Email: danny@arbor.net

   Kenji Kumaki
   KDDI Corporation
   Garden Air Tower
   Iidabashi, Chiyoda-ku, Tokyo 102-8460
   Japan

   Email: ke-kumaki@kddi.com