idnits 2.17.1 draft-ietf-grow-diverse-bgp-path-dist-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------
     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------
     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------
     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------
  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year
  -- The document date (Jun 2011) is 4698 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------
  == Unused Reference: 'RFC2119' is defined on line 829, but no explicit
     reference was found in the text
  == Unused Reference: 'RFC5226' is defined on line 839, but no explicit
     reference was found in the text
  ** Obsolete normative reference: RFC 5226 (Obsoleted by RFC 8126)
  == Outdated reference: A later version (-15) exists of
     draft-ietf-idr-add-paths-04
  == Outdated reference: A later version (-05) exists of
     draft-ietf-idr-best-external-04
  -- Unexpected draft version: The latest known version of
     draft-ietf-idr-route-oscillation is -00, but you're referring to -01.
  == Outdated reference: A later version (-03) exists of
     draft-pmohapat-idr-fast-conn-restore-01

     Summary: 1 error (**), 0 flaws (~~), 6 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

GROW Working Group                                          R. Raszuk, Ed.
Internet-Draft                                                  R. Fernando
Intended status: Informational                                     K. Patel
Expires: December 3, 2011                                     Cisco Systems
                                                                D. McPherson
                                                                    Verisign
                                                                   K. Kumaki
                                                            KDDI Corporation
                                                                    Jun 2011

                     Distribution of Diverse BGP Paths
                  draft-ietf-grow-diverse-bgp-path-dist-04

Abstract

   The BGP4 protocol specifies the selection and propagation of a
   single best path for each prefix.  As defined today, BGP has no
   mechanism to distribute paths other than the best path between its
   speakers.  This behaviour results in a number of disadvantages for
   new applications and services.

   This document presents an alternative mechanism for solving the
   problem, based on the concept of parallel route reflector planes.
   Such planes can be built in parallel or they can co-exist on the
   current route reflection platforms.  The document also compares
   existing solutions and proposed ideas that enable distribution of
   more paths than just the best path.

   This proposal does not specify any changes to the BGP protocol
   definition.  It does not require upgrades to provider edge or core
   routers, nor does it need network-wide upgrades.  The authors
   believe that the GROW WG would be the best place for this work.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 3, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  History
     2.1.  BGP Add-Paths Proposal
   3.  Goals
   4.  Multi plane route reflection
     4.1.  Co-located best and backup path RRs
     4.2.  Randomly located best and backup path RRs
     4.3.  Multi plane route servers for Internet Exchanges
   5.  Discussion on current models of IBGP route distribution
     5.1.  Full Mesh
     5.2.  Confederations
     5.3.  Route reflectors
   6.  Deployment considerations
   7.  Summary of benefits
   8.  Applications
   9.  Security considerations
   10. IANA Considerations
   11. Contributors
   12. Acknowledgments
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Authors' Addresses

1.  Introduction

   The current BGP4 [RFC4271] protocol specification allows for the
   selection and propagation of only one best path for each prefix.
   The BGP protocol as defined today has no mechanism to distribute
   paths other than the best path between its speakers.  This
   behaviour results in a number of problems in the deployment of new
   applications and services.

   This document presents an alternative mechanism for solving the
   problem, based on the concept of parallel route reflector planes.
   It also compares existing solutions and proposed ideas that enable
   distribution of more paths than just the best path.
   The parallel route reflector plane solution brings significant
   benefits at negligible capex and opex compared to the alternative
   techniques, and it is being considered by a number of network
   operators for deployment in their networks.

   This proposal does not specify any changes to the BGP protocol
   definition.  It does not require upgrades to provider edge or core
   routers, nor does it need network-wide upgrades.  The only upgrade
   required is the new functionality on the new or existing route
   reflectors.  The authors believe that the GROW WG would be the best
   place for this work.

2.  History

   The need to disseminate more paths than just the best path is
   primarily driven by three requirements.  The first is the problem
   of BGP oscillations [I-D.ietf-idr-route-oscillation].  The second
   is the desire to reduce the time of reachability restoration in the
   event of a network or network element failure.  The third
   requirement is to enhance BGP load balancing capabilities.  These
   requirements have led to the BGP add-paths proposal
   [I-D.ietf-idr-add-paths].

2.1.  BGP Add-Paths Proposal

   As it has been proven that distribution of only the best path of a
   route is not sufficient to meet the needs of the continuously
   growing number of services carried over BGP, the add-paths proposal
   was submitted in 2002 to enable BGP to distribute more than one
   path.  This is achieved by including, as a part of the NLRI, an
   additional four-octet value called the Path Identifier.

   The implication of this change on a BGP implementation is that it
   must now maintain per-path, instead of per-prefix, peer
   advertisement state in order to track which of the peers each path
   was advertised to.  This new requirement has its own memory and
   processing cost.  Suffice it to say that by the end of 2009 none of
   the commercial BGP implementations could claim to support the new
   add-paths behaviour in production code, in major part due to this
   resource overhead.

   An important observation is that distribution of more than one best
   path by an Autonomous System Border Router (ASBR) with multiple
   EBGP peers attached to it, where "next hop self" is not set, may
   result in best-path selection inconsistency within the autonomous
   system.  Therefore it is also required to attach the possible tie
   breakers, in the form of a new attribute, and propagate those
   within the domain.  An example of such an attribute, defined for
   the purpose of fast connectivity restoration and addressing this
   very case of an ASBR injecting multiple external paths into the
   IBGP mesh, has been presented and discussed in the Fast
   Connectivity Restoration Using BGP Add-paths
   [I-D.pmohapat-idr-fast-conn-restore] document.  Based on the
   additionally propagated information, the best path selection is
   also recommended to be modified to make sure that best and backup
   path selection within the domain stays consistent.  More discussion
   on this particular point is contained in the deployment
   considerations section below.  In the solution proposed in this
   document we observe that, in order to address most of the
   applications, just the use of best-external advertisement is
   required.  For ASBRs which peer with multiple upstream ASes,
   setting "next hop self" is recommended.
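   For reference, the NLRI encoding used by the add-paths proposal
   simply prepends the four-octet Path Identifier mentioned above to
   the usual length/prefix encoding.  The following minimal sketch
   (illustrative Python, IPv4 unicast only; not part of this proposal)
   shows the resulting wire format:

      import struct

      def encode_addpath_nlri(path_id, prefix, prefix_len):
          # Path Identifier (4 octets) | Prefix Length (1 octet) | Prefix
          octets = (prefix_len + 7) // 8            # significant octets only
          packed = bytes(int(b) for b in prefix.split("."))[:octets]
          return struct.pack("!IB", path_id, prefix_len) + packed

      # Two paths for the same prefix can now coexist in a single UPDATE:
      nlri = (encode_addpath_nlri(1, "192.0.2.0", 24) +
              encode_addpath_nlri(2, "192.0.2.0", 24))

   It is this per-path identification, and the per-path advertisement
   state behind it, that accounts for the memory and processing cost
   discussed above.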
   The add-paths protocol extensions have to be implemented by all the
   routers within an AS in order for the system to work correctly.  It
   remains an open research topic to analyze the benefits and risks
   associated with partial add-paths deployments.  The risk becomes
   even greater in networks not using some form of edge-to-edge
   encapsulation.

   The required code modifications include enhancements such as Fast
   Connectivity Restoration Using BGP Add-path
   [I-D.pmohapat-idr-fast-conn-restore].  Deploying such technology in
   an entire service provider network requires software upgrades and
   perhaps, in the case of End-of-Engineering or End-of-Life
   equipment, even hardware upgrades.  Such an operation may or may
   not be economically feasible.  Even if add-paths functionality were
   available today on all commercial routing equipment and across all
   vendors, experience indicates that achieving 100% deployment
   coverage within any medium or large global network may easily take
   years.

   While it needs to be clearly acknowledged that the add-paths
   mechanism provides the most general way to address the problem of
   distributing many paths between BGP speakers, this document
   provides a much easier to deploy solution, requiring no
   modification to the BGP protocol, for cases where only a few
   additional paths are required.  The alternative method presented is
   capable of addressing critical service provider requirements for
   disseminating more than a single path across an AS at a
   significantly lower deployment cost.

3.  Goals

   The proposal described in this document is not intended to compete
   with add-paths.  Instead, if deployed, it is to be used as a very
   easy method to accommodate the majority of applications which may
   require the presence of alternative BGP exit points.

   It is presented to network operators as a possible choice and
   provides those operators who need additional paths today with an
   alternative to transitioning to a full mesh.

   It is intended as a way to buy more time, allowing for a smoother
   and more gradual migration where router upgrades may be required
   for other reasons, as well as the time needed for standard RP/RE
   memory sizes to easily accommodate the overhead associated with the
   other techniques without any compromises.

4.  Multi plane route reflection

   The idea contained in this proposal assumes the use of route
   reflection within the network.  Other techniques, as described in
   the following sections, already provide means for the distribution
   of alternate paths today.

   Let's observe today's picture of a simple route reflected domain:

                       ASBR3
                        ***
                       *   *
         +------------*     *------------+
         | AS1         *   *             |
         |              ***              |
         |                               |
         |                               |
         |                               |
         |  RR1         ***         RR2  |
         |  ***        *   *        ***  |
         | *   *       * P *       *   * |
         | *   *       *   *       *   * |
         |  ***         ***         ***  |
         |                               |
         |             IBGP              |
         |                               |
         |                               |
         |   ***                   ***   |
         |  *   *                 *   *  |
         +-*     *---------------*     *-+
            *   *                 *   *
             ***                   ***
            ASBR1                 ASBR2

                       EBGP

             Figure 1: Simple route reflection

   Figure 1 shows an AS that is connected via EBGP peerings at ASBR1
   and ASBR2 to an upstream AS or set of ASes.  For a given destination
   "D", ASBR1 and ASBR2 will each have an external path, P1 and P2
   respectively.  The AS uses two route reflectors, RR1 and RR2, for
   redundancy reasons.  The route reflectors propagate the single BGP
   best path for each route to all clients.  All ASBRs are clients of
   RR1 and RR2.

   Below are the possible cases of path information that ASBR3 may
   receive from route reflectors RR1 and RR2:
   1.  When the best path tie breaker is the IGP distance: When paths
       P1 and P2 are considered to be equally good best path
       candidates, the selection will depend on the distance of the
       path next hops from the route reflector making the decision.
       Depending on the positioning of the route reflectors in the IGP
       topology, they may choose the same best path or different ones.
       In such a case ASBR3 may receive either the same path or
       different paths from each of the route reflectors.

   2.  When the best path tie breaker is the Multi-Exit Discriminator
       or Local Preference: In this case only one path, from the
       preferred exit point ASBR, will be available to the RRs, since
       the other peering ASBR will consider the IBGP path as best and
       will not announce (or, if already announced, will withdraw) its
       own external path.  The exception here is the use of the BGP
       best-external proposal, which allows the stated ASBR to still
       propagate its own external path to the RRs.  Unfortunately the
       RRs will not be able to distribute it any further to other
       clients, as only the overall best path will be reflected.

   The proposed solution is based on the use of additional route
   reflectors, or new functionality enabled on the existing route
   reflectors, that instead of distributing the best path for each
   route will distribute an alternative path other than the best.  The
   best path (main) reflector plane distributes the best path for each
   route as it does today.  The second plane distributes the second
   best path for each route, and so on.  Distribution of N paths for
   each route can be achieved by using N reflector planes.

   As diverse-path functionality may be enabled on a per-peer basis,
   one possible deployment model is to continue the advertisement of
   the overall best path from both route reflectors, while in addition
   new sessions can be provisioned to obtain an additional path.  This
   allows uninterrupted use of the best path even if one of the RRs
   goes down, provided that the overall best path is still valid.

   Each plane of route reflectors is a logical entity and may or may
   not be co-located with the existing best path route reflectors.
   Adding a route reflector plane to a network may be as easy as
   enabling a logical router partition, a new BGP process, or just a
   new configuration knob on an existing route reflector, and
   configuring an additional IBGP session from the current clients if
   required.  No code changes are required on the route reflector
   clients for this mechanism to work.  It is easy to observe that the
   installation of one or more additional route reflector control
   planes is much cheaper and easier than upgrading hundreds of route
   reflector clients in the entire network to support a different BGP
   protocol encoding.

   Diverse path route reflectors need the new ability to calculate and
   propagate the Nth best path instead of the overall best path.  An
   implementation is encouraged to enable this new functionality on a
   per-neighbor basis.

   While this is an implementation detail, the code to calculate the
   Nth best path is also required by other BGP solutions.  For
   example, in the application of fast connectivity restoration, BGP
   must calculate a backup path for installation into the RIB and FIB
   ahead of the actual failure.
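   While the exact algorithm is an implementation matter, the Nth best
   path calculation can be thought of as repeatedly running the normal
   best-path selection while excluding the winners of the previous
   rounds.  The sketch below is illustrative only and assumes a
   best_path() helper implementing the standard decision process over
   a set of candidate paths:

      def nth_best_path(paths, n, best_path):
          """Return the Nth best path (n=1 is the overall best) for a
          prefix, or None when fewer than n valid paths exist."""
          remaining = list(paths)
          winner = None
          for _ in range(n):
              if not remaining:
                  return None
              winner = best_path(remaining)  # ordinary best-path selection
              remaining.remove(winner)       # exclude it from the next round
          return winner

      # A 2nd plane route reflector advertises nth_best_path(paths, 2, ...)
      # to its clients instead of the overall best path.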
   To address the problem of external paths not being available to the
   route reflectors due to Local Preference or MED factors, it is
   recommended that ASBRs enable the best-external functionality in
   order to always inject their external paths to the route
   reflectors.

4.1.  Co-located best and backup path RRs

   To simplify the description, let's assume that we only use two
   route reflector planes (N=2).  When co-located, the additional 2nd
   best path reflectors are connected to the network at the same
   points, from the perspective of the IGP, as the existing best path
   RRs.  Let's also assume that best-external is enabled on all ASBRs.

                       ASBR3
                        ***
                       *   *
         +------------*     *------------+
         | AS1         *   *             |
         |              ***              |
         |                               |
         |  RR1                     RR2  |
         |  ***                     ***  |
         | *   *        ***        *   * |
         | *   *       *   *       *   * |
         |  ***        * P *        ***  |
         | *   *       *   *       *   * |
         | *   *        ***        *   * |
         |  ***                     ***  |
         |  RR1'        IBGP        RR2' |
         |                               |
         |                               |
         |   ***                   ***   |
         |  *   *                 *   *  |
         +-*     *---------------*     *-+
            *   *                 *   *
             ***                   ***
            ASBR1                 ASBR2

                       EBGP

           Figure 2: Co-located 2nd best RR plane

   The following is a list of configuration changes required to enable
   the 2nd best path route reflector plane:

   1.  Unless the same RR1/RR2 platform is being used, adding RR1' and
       RR2', either as logical or physical new control plane RRs, at
       the same IGP points as RR1 and RR2 respectively.

   2.  Enabling best-external on the ASBRs.

   3.  Enabling RR1' and RR2' for 2nd plane route reflection.
       Alternatively, instructing the existing RR1 and RR2 to also
       calculate the 2nd best path.

   4.  Unless one of the existing RRs is turned to advertise only the
       diverse path to its current clients, configuring new ASBR-RR'
       IBGP sessions.

   The expected behaviour is that under any BGP condition the ASBR3
   and P routers will receive both paths P1 and P2 for destination D.
   The availability of both paths will allow them to implement a
   number of new services, as listed in the applications section
   below.

   As an alternative to fully meshing all RRs and RRs', an operator
   who has a large number of reflectors deployed today may choose to
   peer the newly introduced RRs' to a hierarchical RR', which would
   be an IBGP interconnect point within the 2nd plane as well as
   between planes.

   One deployment model for this scenario can be achieved by a simple
   upgrade of the existing route reflectors, without the need to
   deploy any new logical or physical platforms.  Such an upgrade
   would allow the route reflectors to serve both peers upgraded to
   add-paths and peers which cannot be immediately upgraded, while at
   the same time allowing more than a single best path to be
   distributed.  The obvious protocol benefit of using the existing
   RRs to distribute the best and diverse BGP paths towards their
   clients over different IBGP sessions is the automatic assurance
   that such a client will always get different paths, with different
   next hops.

   The way to accomplish this is to create a separate IBGP session for
   each Nth BGP path.  Such a session should preferably be terminated
   at a different loopback address of the route reflector.  At the BGP
   OPEN stage of each such session a different bgp_router_id may be
   used.  Correspondingly, the route reflector should also allow its
   clients to use the same bgp_router_id on each such session.
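   One way to picture the per-session behaviour described above is a
   simple mapping from the local address on which a client session is
   terminated to the plane number served on that session.  The sketch
   below is purely illustrative (the addresses and names are not part
   of this proposal), and assumes the reflector keeps the paths for
   each prefix ranked by its decision process:

      # Plane served per local termination address (illustrative values).
      PLANE_BY_LOCAL_ADDRESS = {
          "198.51.100.1": 1,   # sessions to loopback 1 carry the best path
          "198.51.100.2": 2,   # sessions to loopback 2 carry the 2nd best
      }

      def path_to_advertise(session_local_address, ranked_paths):
          """Pick the path to send on one client session of a diverse RR."""
          plane = PLANE_BY_LOCAL_ADDRESS.get(session_local_address, 1)
          return ranked_paths[plane - 1] if len(ranked_paths) >= plane else None

   As noted above, a client provisioned with both sessions always
   receives two different paths, and therefore two different BGP next
   hops, for the same prefix.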
4.2.  Randomly located best and backup path RRs

   Now let's consider a deployment case where an operator wishes to
   enable a 2nd RR' plane using only a single additional router, in a
   different network location from the current route reflectors.  This
   model would be of particular use in networks where some form of
   end-to-end encapsulation (IP or MPLS) is enabled between provider
   edge routers.

   Note that this model of operation assumes that the present best
   path route reflectors are control plane devices only.  If a route
   reflector is in the data forwarding path, then the implementation
   must be able to clearly separate the Nth best-path selection from
   the selection of the paths to be used for data forwarding.  The
   basic premise of this mode of deployment is that all reflector
   planes have the same information to choose from, which includes the
   same set of BGP paths.  It also requires the ability to skip the
   comparison of the IGP metric to reach the BGP next hop during the
   best-path calculation.

                       ASBR3
                        ***
                       *   *
         +------------*     *------------+
         | AS1         *   *             |
         |  IBGP        ***              |
         |                               |
         |              ***              |
         |             *   *             |
         |  RR1        * P *        RR2  |
         |  ***        *   *        ***  |
         | *   *        ***        *   * |
         | *   *                   *   * |
         |  ***         RR'         ***  |
         |              ***              |
         |             *   *             |
         |             *   *             |
         |              ***              |
         |   ***                   ***   |
         |  *   *                 *   *  |
         +-*     *---------------*     *-+
            *   *                 *   *
             ***                   ***
            ASBR1                 ASBR2

                       EBGP

        Figure 3: Experimental deployment of 2nd best RR

   The following is a list of configuration changes required to enable
   the 2nd best path route reflector RR' as a single platform, or to
   enable one of the existing control plane RRs for diverse-path
   functionality:

   1.  If needed, adding RR', logical or physical, as a new route
       reflector anywhere in the network.

   2.  Enabling best-external on the ASBRs.

   3.  Disabling the IGP metric check in the BGP best path calculation
       on all route reflectors.

   4.  Enabling RR' or any of the existing RRs for 2nd plane path
       calculation.

   5.  If required, fully meshing the newly added RR'(s) with all the
       other reflectors in both planes.  This condition does not apply
       if the newly added RR'(s) already have peerings to all
       ASBRs/PEs.

   6.  Unless one of the existing RRs is turned to advertise only the
       diverse path to its current clients, configuring new ASBR-RR'
       IBGP sessions.

   In this scenario the operator has the flexibility to introduce the
   new additional route reflector functionality on any existing or new
   hardware in the network.  Any of the existing routers that are not
   already members of the best path route reflector plane can be
   easily configured to serve the 2nd plane, either by using a
   logical / virtual router partition or by having their BGP
   implementation compliant with this specification.

   Even if the IGP metric is not taken into consideration when
   comparing paths during the best-path calculation, an implementation
   still has to consider paths with unreachable next hops as invalid.
   It is worth pointing out that some implementations today already
   allow for configuration which results in no IGP metric comparison
   during the best path calculation.
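   A minimal sketch of the path comparison used on such reflectors is
   given below.  It is deliberately simplified (it is not the complete
   RFC 4271 decision process, and the attribute names are
   illustrative); the point is only that the IGP metric step is
   skipped, while paths with unreachable next hops still never win:

      def plane_prefers(a, b, compare_igp_metric=False):
          """Return True when path `a` is preferred over path `b`."""
          if not a.nexthop_reachable:
              return False                  # unreachable next hop never wins
          if not b.nexthop_reachable:
              return True
          key_a = (-a.local_pref, a.as_path_len, a.origin, a.med)
          key_b = (-b.local_pref, b.as_path_len, b.origin, b.med)
          if key_a != key_b:
              return key_a < key_b
          if compare_igp_metric:            # disabled on diverse-path planes
              return a.igp_metric < b.igp_metric
          return a.router_id < b.router_id  # deterministic final tie-break

   With compare_igp_metric left disabled, every plane ranks the same
   set of paths identically regardless of its own position in the IGP
   topology, which is what allows the planes to be placed anywhere in
   the network.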
   The additional planes of route reflectors do not need to be as
   fully redundant as the primary one.  If we are preparing for a
   single network failure event, a failure of a non backed-up Nth
   best-path route reflector would not result in a connectivity outage
   of the actual data plane.  The reason is that this would at most
   affect the presence of a backup path (not an active one) in some
   parts of the network.  If the operator chooses to build the Nth
   best path plane redundantly, by installing not one but two or more
   route reflectors serving each additional plane, additional
   robustness will be achieved.

   As a result of this solution, ASBR3 and the other ASBRs peering to
   RR' will receive the 2nd best path.

   Similarly to Section 4.1, as an alternative to fully meshing all
   RRs and RRs', an operator who has a large number of reflectors
   already deployed today may choose to peer the newly introduced RRs'
   to a hierarchical RR', which would be an IBGP interconnect point
   between planes.

4.3.  Multi plane route servers for Internet Exchanges

   Another group of devices where the proposed multi-plane
   architecture may be of particular applicability are the EBGP route
   servers used at many Internet exchange points.

   In such cases hundreds of ISPs are interconnected on a common LAN.
   Instead of having hundreds of direct EBGP sessions on each exchange
   client, a single peering is created to the transparent route
   server.  The route server can only propagate a single best path.
   Mandating an upgrade of hundreds of different service providers in
   order to implement add-paths may be much more difficult than asking
   them to provision one new EBGP session to an Nth best-path route
   server plane.  That allows more than a single best BGP path to be
   distributed from a given route server to such an IX peer.

   The solution proposed in this document fits very well with the
   requirement of having broader EBGP path diversity among the members
   of any Internet Exchange Point.

5.  Discussion on current models of IBGP route distribution

   In today's networks BGP4 operates as specified in [RFC4271].

   There are a number of technology choices for intra-AS BGP route
   distribution:

   1.  Full mesh

   2.  Confederations

   3.  Route reflectors

5.1.  Full Mesh

   A full mesh, the most basic IBGP architecture, exists when all the
   BGP speaking routers within the AS peer directly with all other BGP
   speaking routers within the AS, irrespective of where a given
   router resides within the AS (e.g., P router, PE router, etc.).

   While this is the simplest intra-domain path distribution method,
   historically there have been a number of challenges in realizing
   such an IBGP full mesh in a large scale network.  While some of
   these challenges are no longer applicable today, some may still
   apply, including the following:

   1.  Number of TCP sessions: The number of IBGP sessions on a single
       router in a full mesh topology of a large scale service
       provider can easily reach into the hundreds.  While on the
       hardware and software used in the late 70s, 80s and 90s such
       numbers could be of concern, today customer requirements for
       the number of BGP sessions per box are reaching into the
       thousands.  This is already an order of magnitude more than the
       potential number of IBGP sessions.  Advancements in the
       hardware and software used in production routers mean that
       running a full mesh of IBGP sessions should not be dismissed
       due to the resulting number of TCP sessions alone.

   2.  Provisioning: When operating and troubleshooting large
       networks, one of the top-most requirements is to keep the
       design as simple as possible.
       When the autonomous system's network is composed of hundreds of
       nodes, it becomes very difficult to manually provision a full
       mesh of IBGP sessions.  Adding or removing a router requires
       reconfiguration of all the other routers in the AS.  While this
       is a real concern today, there is already work in progress in
       the IETF to define IBGP peering automation through an IBGP Auto
       Discovery [I-D.raszuk-idr-ibgp-auto-mesh] mechanism.

   3.  Number of paths: Another concern when deploying a full IBGP
       mesh is the number of BGP paths for each route that have to be
       stored at every node.  This number is very tightly related to
       the number of external peerings of an AS, the use of Local
       Preference or Multi-Exit Discriminator techniques, and the
       presence of best-external [I-D.ietf-idr-best-external]
       advertisement configuration.  If we make a rough assumption
       that a BGP4 path data structure consumes about 80-100 bytes,
       the resulting control plane memory requirement for 500,000 IPv4
       routes with one additional external path each is 38-48 MB,
       while for 1 million IPv4 routes it grows linearly to 76-95 MB.
       It is not possible to reach a general conclusion on whether
       this overhead is negligible or a show stopper for a full mesh
       deployment without direct reference to a given network.

   To summarize, full mesh IBGP peering can offer natural
   dissemination of multiple external paths among BGP speakers.  When
   realized with the help of IBGP Auto Discovery peering automation,
   this seems like a viable deployment, especially in medium and small
   scale networks.

5.2.  Confederations

   For the purpose of this document, let's observe that confederations
   [RFC5065] can be viewed as a hierarchical full mesh model.

   Within each sub-AS, BGP speakers are fully meshed, and as discussed
   in Section 5.1 all full mesh characteristics (number of TCP
   sessions, provisioning, and the potential concern over the number
   of paths) still apply at the sub-AS scale.

   In addition to the direct peering of all BGP speakers within each
   sub-AS, all sub-AS border routers must also be fully meshed with
   each other.  Sub-AS border routers configured with best-external
   functionality can inject additional exit paths within a sub-AS.

   To summarize, it is technically sound to use confederations in
   combination with best-external to achieve distribution of more than
   a single best path per route in a large autonomous system.

   In topologies where route reflectors are deployed within the
   confederation sub-ASes, the technique described here does apply.

5.3.  Route reflectors

   The main motivation behind the use of route reflectors [RFC4456] is
   the avoidance of the full mesh session management problem described
   above.  Route reflectors, for good or for bad, are the most common
   solution today for interconnecting BGP speakers within an internal
   routing domain.

   Route reflector peerings follow the advertisement rules defined by
   the BGP4 protocol.  As a result, only a single best path per prefix
   is sent to client BGP peers.  That is the main reason why many
   current networks are exposed to a phenomenon called BGP path
   starvation, which essentially results in the inability to deliver a
   number of the applications discussed later.
   The route reflection equivalent when interconnecting BGP speakers
   between domains is popularly called the Route Server, and it is
   globally deployed today in many Internet exchange points.

6.  Deployment considerations

   The diverse BGP path dissemination proposal allows the distribution
   of more paths than just the best path to the route reflector or
   route server clients of today's BGP4 implementations.

   From the client's point of view, receiving additional paths via
   separate IBGP sessions terminated at the new route reflector plane
   is functionally equivalent to constructing a full mesh peering, but
   without the set of problems that such a full mesh would come with,
   as discussed in the earlier section.

   By precisely defining the number of reflector planes, network
   operators have full control over the number of redundant paths in
   the network.  This number can be defined to address the needs of
   the service(s) being deployed.

   The Nth plane route reflectors should act as control plane network
   entities.  While they can be provisioned on current production
   routers, the selected Nth best BGP paths should not be used
   directly in the data plane, except where such paths are BGP
   multipath eligible and that functionality is enabled.  On RRs that
   are in the data plane, unless multipath is enabled, the 2nd best
   path is expected to be a backup path and should be installed as
   such into the local RIB/FIB.

   The proposed architecture, deployed along with the BGP
   best-external functionality, covers all three cases where the
   classic BGP route reflection paradigm would fail to distribute
   alternate exit point paths:

   1.  ASBRs advertising their single best external paths with no
       Local Preference or Multi-Exit Discriminator present.

   2.  ASBRs advertising their single best external paths with Local
       Preference or Multi-Exit Discriminator present and with BGP
       best-external functionality enabled.

   3.  ASBRs with multiple external paths.

   Let's discuss the 3rd case above in more detail.  It describes the
   scenario of a single ASBR connected to multiple EBGP peers.  In
   practice this peering scenario is quite common, mostly due to the
   geographic location of EBGP peers and the diversity of those peers
   (for example, peering with multiple tier 1 ISPs, etc.).  It is not
   designed for failure recovery scenarios, as a single failure of the
   ASBR would simultaneously result in a loss of connectivity to all
   of the peers.  In most medium and large geographically distributed
   networks there is always another ASBR, or multiple ASBRs, providing
   peering backup, typically in other geographically diverse locations
   in the network.

   When an operator uses ASBRs with multiple peerings, setting next
   hop self will effectively allow local repair of the atomic failure
   of any external peer without any compromise to the data plane.  The
   most common reason for not setting next hop self is traditionally
   the associated drawback of losing the ability to signal the
   external failures of peering ASBRs, or of links to those ASBRs, by
   fast IGP flooding.  This potential drawback can easily be avoided
   by using a peering address different from the address used for next
   hop mapping, as well as by removing such a next hop from the IGP
   upon the last possible BGP path failure.
   Herein one may correctly observe that, in the case of setting next
   hop self on an ASBR, the attributes of the other external paths
   such an ASBR is peering with may be different from the attributes
   of its best external path.  Therefore, not injecting all of those
   external paths with their corresponding attributes cannot be
   compared to equivalent paths for the same prefix coming from
   different ASBRs.

   While such an observation is in principle correct, one should put
   things in the perspective of the overall goal, which is to provide
   data plane connectivity upon a single failure with minimal
   interruption/packet loss.  During such transient conditions, using
   even potentially suboptimal exit points is reasonable, so long as
   forwarding loops are not introduced.  In the meantime the BGP
   control plane will on its own re-advertise the newly elected best
   external path, and the route reflector planes will calculate their
   Nth best paths and propagate them to their clients.  The result is
   that after seconds, even if potential sub-optimality were
   encountered, it will be quickly and naturally healed.

7.  Summary of benefits

   The diverse BGP path dissemination proposal provides the following
   benefits when compared to the alternatives:

   1.  No modifications to the BGP4 protocol.

   2.  No requirement for upgrades to edge and core routers.  Backward
       compatible with existing BGP deployments.

   3.  Can be easily enabled by the introduction of a new route
       reflector or route server plane dedicated to the selection and
       distribution of the Nth best path, or just by new configuration
       of the upgraded current route reflector(s).

   4.  Does not require a major modification to BGP implementations in
       the entire network, which would result in an unnecessary
       increase of memory and CPU consumption due to the shift from
       today's per-prefix to per-path advertisement state tracking.

   5.  Can be safely deployed gradually, on a per RR cluster basis.

   6.  The proposed solution is equally applicable to any BGP address
       family as described in Multiprotocol Extensions for BGP-4
       [RFC4760].  In particular it can be used "as is", without any
       modifications, for both the IPv4 and IPv6 address families.

8.  Applications

   This section lists the most common applications which require the
   presence of redundant BGP paths:

   1.  Fast connectivity restoration, where backup paths with
       alternate exit points would be pre-installed as well as
       pre-resolved in the FIB of routers.  That would allow for a
       local action upon reception of a critical event notification of
       a network or node failure (see the sketch at the end of this
       section).  This failure recovery mechanism based on the
       presence of backup paths is also suitable for gracefully
       addressing scheduled maintenance requirements, as described in
       [I-D.decraene-bgp-graceful-shutdown-requirements].

   2.  Multi-path load balancing for both IBGP and EBGP.

   3.  BGP control plane churn reduction, both intra-domain and
       inter-domain.

   An important point to observe is that all of the above applications
   are based on the intra-domain use of reflector planes, but they are
   also applicable in the inter-domain Internet exchange point
   example.  As discussed in Section 4.3, an Internet exchange can
   conceptually deploy shadow route server planes, each responsible
   for the distribution of an Nth best path to its EBGP peers.  In
   practice it may amount to just a short new configuration and the
   establishment of new BGP sessions to IX peers.
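   As an illustration of the first application above, the sketch below
   shows how a client could keep the path learned from the 2nd
   reflector plane pre-installed as a backup and promote it locally
   when the primary next hop is reported down.  It is a conceptual
   sketch only; the names and the RIB/FIB interfaces of real
   implementations differ:

      class PrefixEntry:
          """Best path plus a pre-resolved backup from the 2nd RR plane."""

          def __init__(self, best, diverse):
              self.active = best        # path from the best-path plane
              self.backup = diverse     # path from the 2nd best-path plane

          def next_hop(self):
              return self.active.next_hop

          def on_next_hop_down(self, failed_next_hop):
              # Local repair: switch to the pre-installed backup at once,
              # without waiting for BGP to converge on a new best path.
              if self.active.next_hop == failed_next_hop and self.backup:
                  self.active, self.backup = self.backup, None

   The subsequent BGP convergence described in Section 6 then replaces
   this locally repaired state with the newly selected best path.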
9.  Security considerations

   The new mechanism for diverse BGP path dissemination proposed in
   this document does not introduce any new security concerns as
   compared to the base BGP4 specification [RFC4271].

10.  IANA Considerations

   The new mechanism for diverse BGP path dissemination does not
   require any new allocations from IANA.

11.  Contributors

   The following people contributed significantly to the content of
   the document:

   Selma Yilmaz
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   US
   Email: seyilmaz@cisco.com

   Satish Mynam
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   US
   Email: mynam@cisco.com

   Isidor Kouvelas
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   US
   Email: kouvelas@cisco.com

12.  Acknowledgments

   The authors would like to thank Bruno Decraene, Bart Peirens, Eric
   Rosen, Jim Uttaro, Renwei Li and Wes George for their valuable
   input.

   The authors would also like to express a special thank you to the
   many operators who helped optimize the provided solution to be as
   close as possible to their daily operational practices.  Many
   thanks go in particular to Ted Seely, Shane Amante, Benson
   Schliesser and Seiichi Kawamura.

13.  References

13.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 4760,
              January 2007.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
              May 2008.

13.2.  Informative References

   [I-D.decraene-bgp-graceful-shutdown-requirements]
              Decraene, B., Francois, P., Pelsser, C., Ahmad, Z., and
              A. Armengol, "Requirements for the graceful shutdown of
              BGP sessions",
              draft-decraene-bgp-graceful-shutdown-requirements-01
              (work in progress), March 2009.

   [I-D.ietf-idr-add-paths]
              Walton, D., Retana, A., Chen, E., and J. Scudder,
              "Advertisement of Multiple Paths in BGP",
              draft-ietf-idr-add-paths-04 (work in progress),
              August 2010.

   [I-D.ietf-idr-best-external]
              Marques, P., Fernando, R., Chen, E., Mohapatra, P., and
              H. Gredler, "Advertisement of the best external route in
              BGP", draft-ietf-idr-best-external-04 (work in
              progress), April 2011.

   [I-D.ietf-idr-route-oscillation]
              McPherson, D., "BGP Persistent Route Oscillation
              Condition", draft-ietf-idr-route-oscillation-01 (work in
              progress), February 2002.

   [I-D.pmohapat-idr-fast-conn-restore]
              Mohapatra, P., Fernando, R., Filsfils, C., and
              R. Raszuk, "Fast Connectivity Restoration Using BGP
              Add-path", draft-pmohapat-idr-fast-conn-restore-01 (work
              in progress), March 2011.

   [I-D.raszuk-idr-ibgp-auto-mesh]
              Raszuk, R., "IBGP Auto Mesh",
              draft-raszuk-idr-ibgp-auto-mesh-00 (work in progress),
              June 2003.

   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, April 2006.

   [RFC5065]  Traina, P., McPherson, D., and J. Scudder, "Autonomous
              System Confederations for BGP", RFC 5065, August 2007.
Authors' Addresses

   Robert Raszuk (editor)
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   US

   Email: raszuk@cisco.com

   Rex Fernando
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   US

   Email: rex@cisco.com

   Keyur Patel
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   US

   Email: keyupate@cisco.com

   Danny McPherson
   Verisign
   21345 Ridgetop Circle
   Dulles, VA 20166
   US

   Email: dmcpherson@verisign.com

   Kenji Kumaki
   KDDI Corporation
   Garden Air Tower
   Iidabashi, Chiyoda-ku, Tokyo  102-8460
   Japan

   Email: ke-kumaki@kddi.com