GROW Working Group                                        R. Raszuk, Ed.
Internet-Draft                                                   NTT MCL
Intended status: Informational                               R. Fernando
Expires: January 2, 2013                                        K. Patel
                                                           Cisco Systems
                                                            D. McPherson
                                                                Verisign
                                                               K. Kumaki
                                                        KDDI Corporation
                                                               July 2012


                   Distribution of diverse BGP paths.
                draft-ietf-grow-diverse-bgp-path-dist-08

Abstract

   The BGP4 protocol specifies the selection and propagation of a single
   best path for each prefix.  As defined and widely deployed today, BGP
   has no mechanism to distribute alternate paths which are not
   considered the best path between its speakers.  This behaviour
   results in a number of disadvantages for new applications and
   services.

   The main objective of this document is to observe that by simply
   adding a new session between a route reflector and its client, the
   Nth best path can be distributed.  The document also compares
   existing solutions and proposed ideas that enable distribution of
   more paths than just the best path.

   This proposal does not specify any changes to the BGP protocol
   definition.  It does not require a software upgrade of provider edge
   routers acting as route reflector clients.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 2, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  History
     2.1.  BGP Add-Paths Proposal
   3.  Goals
   4.  Multi plane route reflection
     4.1.  Co-located best and backup path RRs
     4.2.  Randomly located best and backup path RRs
     4.3.  Multi plane route servers for Internet Exchanges
   5.  Discussion on current models of IBGP route distribution
     5.1.  Full Mesh
     5.2.  Confederations
     5.3.  Route reflectors
   6.  Deployment considerations
   7.  Summary of benefits
   8.  Applications
   9.  Security considerations
   10. IANA Considerations
   11. Contributors
   12. Acknowledgments
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Authors' Addresses

1.  Introduction

   The current BGP4 [RFC4271] protocol specification allows for the
   selection and propagation of only one best path for each prefix.  The
   BGP protocol as defined today has no mechanism to distribute paths
   other than the best path between its speakers.  This behaviour
   results in a number of problems in the deployment of new applications
   and services.

   This document presents a mechanism for solving the problem based on
   the conceptual creation of parallel route reflector planes.  It also
   compares existing solutions and proposes ideas that enable
   distribution of more paths than just the best path.  The parallel
   route reflector planes solution brings very significant benefits at a
   negligible capex and opex deployment price as compared to the
   alternative techniques (full BGP mesh or add-paths) and is being
   considered by a number of network operators for deployment in their
   networks.

   This proposal does not specify any changes to the BGP protocol
   definition.  It does not require upgrades to provider edge or core
   routers, nor does it need network wide upgrades.  The only upgrade
   required is the new functionality on the new or current route
   reflectors.

2.  History

   The need to disseminate more paths than just the best path is
   primarily driven by three requirements.  The first is the problem of
   BGP oscillations [RFC3345].  The second is the desire for faster
   reachability restoration in the event of a network link or network
   element failure.  The third is to enhance BGP load balancing
   capabilities.  Those reasons have led to the proposal of BGP
   add-paths [I-D.ietf-idr-add-paths].

2.1.  BGP Add-Paths Proposal

   As it has been proven that distribution of only the best path of a
   route is not sufficient to meet the needs of the continuously growing
   number of services carried over BGP, the add-paths proposal was
   submitted in 2002 to enable BGP to distribute more than one path.
   This is achieved by including as a part of the NLRI an additional
   four octet value called the Path Identifier.

   The implication of this change on a BGP implementation is that it
   must now maintain per path, instead of per prefix, peer advertisement
   state to track to which of the peers a given path was advertised.

   This new requirement comes with its own memory and processing cost.

   An important observation is that distribution of more than one best
   path by Autonomous System Border Routers (ASBRs) with multiple EBGP
   peers attached, where no "next hop self" is set, may result in
   bestpath selection inconsistency within the autonomous system.
   Therefore it is also required to attach the possible tie breakers in
   the form of a new attribute and propagate those within the domain.
   An example of such an attribute for the purpose of fast connectivity
   restoration, addressing that very case of an ASBR injecting multiple
   external paths into the IBGP mesh, has been presented and discussed
   in the Fast Connectivity Restoration Using BGP Add-path
   [I-D.pmohapat-idr-fast-conn-restore] document.  Based on the
   additionally propagated information, it is also recommended that best
   path selection be modified to make sure that best and backup path
   selection within the domain stays consistent.  More discussion on
   this particular point is contained in the deployment considerations
   section below.  In the solution proposed in this document we observe
   that in order to address most of the applications, only the use of
   best external advertisement is required.  For ASBRs which are peering
   to multiple upstream domains, setting "next hop self" is recommended.

   The add-paths protocol extensions have to be implemented by all the
   routers within an Autonomous System (AS) in order for the system to
   work correctly.  Analyzing the benefits and risks associated with
   partial add-paths deployments remains a research topic.  The risk
   becomes even greater in networks not using some form of edge to edge
   encapsulation.

   The required code modifications can offer the foundation for
   enhancements such as the Fast Connectivity Restoration Using BGP Add-
   path [I-D.pmohapat-idr-fast-conn-restore].  The deployment of such
   technology in an entire service provider network requires software
   upgrades and, in the case of End-of-Engineering or End-of-Life
   equipment, sometimes even hardware upgrades.  Such an operation may
   or may not be economically feasible.  Even if add-path functionality
   were available today on all commercial routing equipment and across
   all vendors, experience indicates that achieving 100% deployment
   coverage within any medium or large global network may easily take
   years.

   While it needs to be clearly acknowledged that the add-path mechanism
   provides the most general way to address the problem of distributing
   many paths between BGP speakers, this document provides a much easier
   to deploy solution that requires no modification to the BGP protocol
   where only a few additional paths may be required.  The alternative
   method presented is capable of addressing critical service provider
   requirements for disseminating more than a single path across an AS
   with a significantly lower deployment cost.  In light of the general
   network scaling concerns documented in "Report from the IAB Workshop
   on Routing and Addressing" [RFC4984], that may provide a significant
   advantage.

3.  Goals

   The proposal described in this document is not intended to compete
   with add-paths.  It provides an interim solution until the
   standardization and implementation of add-paths and until support for
   that function can be deployed across the network.

   It is presented to network operators as a possible choice and
   provides those operators who need additional paths today with an
   alternative to transitioning to a full mesh.  The Nth best path
   describes a set of N paths with different BGP next hops with no
   implication of ordering or preference among said N paths.

   It is intended as a way to buy more time, allowing for a smoother and
   more gradual migration where router upgrades will be required,
   perhaps for different reasons.  It also allows time for standard
   RP/RE memory sizes to reach the point where they can easily
   accommodate the overhead associated with other techniques without any
   compromises.

4.  Multi plane route reflection

   The idea contained in the proposal assumes the use of route
   reflection within the network.

   Let's observe today's picture of a simple route reflected domain:

                            ASBR3
                             ***
                            *   *
               +------------*   *-----------+
               | AS1        *   *           |
               |             ***            |
               |                            |
               |                            |
               |                            |
               | RR1         ***        RR2 |
               | ***        *   *       *** |
               |*   *       * P *      *   *|
               |*   *       *   *      *   *|
               | ***         ***        *** |
               |                            |
               |            IBGP            |
               |                            |
               |                            |
               |      ***           ***     |
               |     *   *         *   *    |
               +-----*   *---------*   *----+
                     *   *         *   *
                      ***           ***
                     ASBR1         ASBR2
                            EBGP

                   Figure 1: Simple route reflection

   Abbreviations used: RR - Route Reflector, P - Core router.

   Figure 1 shows an AS that is connected via EBGP peering at ASBR1 and
   ASBR2 to an upstream AS or set of ASes.  For a given destination "D",
   ASBR1 and ASBR2 may have an external path P1 and P2 respectively.
   The AS network uses two route reflectors, RR1 and RR2, for redundancy
   reasons.  The route reflectors propagate the single BGP best path for
   each route to all clients.  All ASBRs are clients of RR1 and RR2.

   Below are the possible cases of the path information that ASBR3 may
   receive from route reflectors RR1 and RR2:

   1.  When the best path tie breaker is the IGP distance: When paths P1
       and P2 are considered to be equally good best path candidates,
       the selection will depend on the distance of the path next hops
       from the route reflector making the decision.  Depending on the
       positioning of the route reflectors in the IGP topology, they may
       choose the same best path or a different one.  In such a case
       ASBR3 may receive either the same path or different paths from
       each of the route reflectors.

   2.  When the best path tie breaker is Multi-Exit-Discriminator or
       Local Preference: In this case only one path, from the preferred
       exit point ASBR, will be available to the RRs since the other
       peering ASBR will consider the IBGP path as best and will not
       announce (or if already announced will withdraw) its own external
       path.  The exception here is the use of the BGP Best-External
       proposal, which allows that ASBR to still propagate its own
       external path to the RRs.  Unfortunately the RRs will not be able
       to distribute it any further to other clients, as only the
       overall best path will be reflected.

   There is no requirement of path ordering.  The "Nth best path" really
   describes a set of N paths with different BGP next hops.

   The proposed solution is based on the use of additional route
   reflectors, or new functionality enabled on the existing route
   reflectors, that instead of distributing the best path for each route
   will distribute an alternative path other than the best.  The best
   path (main) reflector plane distributes the best path for each route
   as it does today.  The second plane distributes the second best path
   for each route, and so on.  Distribution of N paths for each route
   can be achieved by using N reflector planes.
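
   Case 2 above, where an ASBR withholds its own external path unless
   best-external is enabled, can be illustrated with a minimal sketch.
   The Path type, the advertised_to_rr function and the reduction of
   best path selection to a Local Preference comparison are assumptions
   made for illustration only; they are not part of any BGP
   implementation or of this proposal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Path:
    origin_asbr: str   # hypothetical label: ASBR that learned the path via EBGP
    local_pref: int    # higher is preferred


def advertised_to_rr(own_external: Path,
                     ibgp_paths: List[Path],
                     best_external: bool = False) -> Optional[Path]:
    """Return the path an ASBR sends to its route reflectors, or None.

    Assumption: selection is reduced to Local Preference only, with the
    ASBR's own external path winning ties.
    """
    candidates = [own_external] + list(ibgp_paths)
    # max() returns the first maximal element, so the external path wins ties.
    overall_best = max(candidates, key=lambda p: p.local_pref)
    if overall_best == own_external:
        return own_external
    # The overall best was learned over IBGP and is not sent back to the RRs.
    # With best-external enabled the ASBR still announces its external path.
    return own_external if best_external else None


p1 = Path("ASBR1", local_pref=200)
p2 = Path("ASBR2", local_pref=100)
print(advertised_to_rr(p2, [p1]))                      # ASBR2 stays silent
print(advertised_to_rr(p2, [p1], best_external=True))  # ASBR2 announces P2
```

   In this sketch the route reflectors only learn P2 when best-external
   is enabled on ASBR2, which is the precondition for any reflector
   plane to have a second path to distribute.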

   As diverse-path functionality may be enabled on a per peer basis, one
   deployment model is to continue advertising the overall best path
   from both route reflectors while provisioning a new session to obtain
   the additional path.  That allows uninterrupted use of the best path
   even if one of the RRs goes down, provided that the overall best path
   is still a valid one.

   Each plane of route reflectors is a logical entity and may or may not
   be co-located with the existing best path route reflectors.  Adding a
   route reflector plane to a network may be as easy as enabling a
   logical router partition, a new BGP process or just a new
   configuration knob on an existing route reflector and, if required,
   configuring an additional IBGP session from the current clients.
   There are no code changes required on the route reflector clients for
   this mechanism to work.  It is easy to observe that the installation
   of one or more additional route reflector control planes is much
   cheaper and easier than upgrading 100s of route reflector clients in
   the entire network to support a different BGP protocol encoding.

   Diverse path route reflectors need the new ability to calculate and
   propagate the Nth best path instead of the overall best path.  An
   implementation is encouraged to enable this new functionality on a
   per neighbor basis.

   While this is an implementation detail, the code to calculate the Nth
   best path is also required by other BGP solutions.  For example, in
   the application of fast connectivity restoration BGP must calculate a
   backup path for installation into the RIB and FIB ahead of the actual
   failure.
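
   The Nth best path calculation described above can be sketched as
   follows.  This is a minimal illustration under stated assumptions:
   the Path fields, the simplified ranking used in place of the full
   BGP decision process, and the names rank, diverse_paths and nth_best
   are all hypothetical.  The fallback to the overall best path follows
   the recommendation given later in Section 4.2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Path:
    next_hop: str      # BGP next hop address
    local_pref: int    # higher is preferred
    as_path_len: int   # shorter is preferred
    med: int           # lower is preferred


def rank(paths: List[Path]) -> List[Path]:
    """Order paths by a simplified stand-in for the BGP decision process."""
    return sorted(paths, key=lambda p: (-p.local_pref, p.as_path_len,
                                        p.med, p.next_hop))


def diverse_paths(paths: List[Path]) -> List[Path]:
    """Keep the most preferred path for each distinct BGP next hop."""
    seen = set()
    result = []
    for path in rank(paths):
        if path.next_hop not in seen:
            seen.add(path.next_hop)
            result.append(path)
    return result


def nth_best(paths: List[Path], n: int,
             fallback_to_best: bool = True) -> Optional[Path]:
    """Return the path a plane-n session would advertise (n starts at 1)."""
    ordered = diverse_paths(paths)
    if not ordered:
        return None
    if n <= len(ordered):
        return ordered[n - 1]
    # No path with a different next hop exists for this plane.
    return ordered[0] if fallback_to_best else None


# Per-neighbor configuration: each IBGP session is mapped to one plane.
session_plane = {"asbr3-best": 1, "asbr3-diverse": 2}
paths = [Path("192.0.2.1", 100, 2, 0), Path("192.0.2.2", 100, 3, 0)]
for session, plane in session_plane.items():
    print(session, nth_best(paths, plane))
```

   A client with one session per plane receives two paths with different
   next hops, without any change to its own BGP implementation.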

   To address the problem of external paths not being available to route
   reflectors due to local preference or MED factors, it is recommended
   that ASBRs enable the best-external functionality in order to always
   inject their external paths to the route reflectors.

4.1.  Co-located best and backup path RRs

   To simplify the description let's assume that we only use two route
   reflector planes (N=2).  When co-located, the additional 2nd best
   path reflectors are connected to the network at the same points from
   the perspective of the IGP as the existing best path RRs.  Let's also
   assume that best-external is enabled on all ASBRs.

                            ASBR3
                             ***
                            *   *
               +------------*   *-----------+
               | AS1        *   *           |
               |             ***            |
               |                            |
               | RR1                    RR2 |
               | ***                    *** |
               |*   *        ***       *   *|
               |*   *       *   *      *   *|
               | ***        * P *       *** |
               |*   *       *   *      *   *|
               |*   *        ***       *   *|
               | ***                    *** |
               | RR1'       IBGP        RR2'|
               |                            |
               |                            |
               |      ***           ***     |
               |     *   *         *   *    |
               +-----*   *---------*   *----+
                     *   *         *   *
                      ***           ***
                     ASBR1         ASBR2

                            EBGP

                Figure 2: Co-located 2nd best RR plane

   The following is a list of configuration changes required to enable
   the 2nd best path route reflector plane:

   1.  Unless the same RR1/RR2 platform is being used, adding RR1' and
       RR2' as new logical or physical control plane RRs at the same IGP
       points as RR1 and RR2 respectively.

   2.  Enabling best-external on ASBRs.

   3.  Enabling RR1' and RR2' for 2nd plane route reflection.
       Alternatively, instructing the existing RR1 and RR2 to also
       calculate the 2nd best path.

   4.  Unless one of the existing RRs is switched to advertise only the
       diverse path to its current clients, configuring new ASBRs-RR'
       IBGP sessions.

   The expected behaviour is that under any BGP condition the ASBR3 and
   P routers will receive both paths P1 and P2 for destination D.  The
   availability of both paths will allow them to implement a number of
   new services as listed in the applications section below.

   As an alternative to fully meshing all RRs and RRs', an operator who
   has a large number of reflectors deployed today may choose to peer
   newly introduced RRs' to a hierarchical RR' which would be an IBGP
   interconnect point within the 2nd plane as well as between planes.

   One deployment model for this scenario is a simple upgrade of the
   existing route reflectors without the need to deploy any new logical
   or physical platforms.  Such an upgrade would allow route reflectors
   to serve both peers that have been upgraded to add-paths and peers
   that cannot be upgraded immediately, while at the same time allowing
   distribution of more than a single best path.  The obvious protocol
   benefit of using existing RRs to distribute best and diverse BGP
   paths towards their clients over different IBGP sessions is the
   automatic assurance that such a client always gets different paths
   with different next hops.

   The way to accomplish this would be to create a separate IBGP session
   for each N-th BGP path.  Such a session should preferably be
   terminated at a different loopback address of the route reflector.
   At the BGP OPEN stage of each such session a different bgp_router_id
   may be used.  Correspondingly, the route reflector should also allow
   its clients to use the same bgp_router_id on each such session.

4.2.  Randomly located best and backup path RRs

   Now let's consider a deployment case where an operator wishes to
   enable a 2nd RR' plane using only a single additional router in a
   different network location from the current route reflectors.  This
   model would be of particular use in networks where some form of end-
   to-end encapsulation (IP or MPLS) is enabled between provider edge
   routers.

   Note that this model of operation assumes that the present best path
   route reflectors are only control plane devices.  If the route
   reflector is in the data forwarding path, then the implementation
   must be able to clearly separate the Nth best-path selection from the
   selection of the paths to be used for data forwarding.  The basic
   premise of this mode of deployment is that all reflector planes have
   the same information to choose from, which includes the same set of
   BGP paths.  It also requires the ability to ignore the step of
   comparing the IGP metric to reach the BGP next hop during best-path
   calculation.

                            ASBR3
                             ***
                            *   *
               +------------*   *-----------+
               | AS1        *   *           |
               | IBGP        ***            |
               |                            |
               |             ***            |
               |            *   *           |
               | RR1        * P *       RR2 |
               | ***        *   *       *** |
               |*   *        ***       *   *|
               |*   *                  *   *|
               | ***         RR'        *** |
               |             ***            |
               |            *   *           |
               |            *   *           |
               |             ***            |
               |      ***           ***     |
               |     *   *         *   *    |
               +-----*   *---------*   *----+
                     *   *         *   *
                      ***           ***
                     ASBR1         ASBR2

                            EBGP

             Figure 3: Experimental deployment of 2nd best RR

   The following is a list of configuration changes required to enable
   the 2nd best path route reflector RR' as a single platform or to
   enable one of the existing control plane RRs for diverse-path
   functionality:

   1.  If needed, adding RR' as a new logical or physical route
       reflector anywhere in the network.

   2.  Enabling best-external on ASBRs.

   3.  Disabling the IGP metric check in BGP best path on all route
       reflectors.

   4.  Enabling RR' or any of the existing RRs for 2nd plane path
       calculation.

   5.  If required, fully meshing newly added RRs' with all the other
       reflectors in both planes.  That condition does not apply if the
       newly added RR'(s) already have peering to all ASBRs/PEs.

   6.  Unless one of the existing RRs is switched to advertise only the
       diverse path to its current clients, configuring new ASBRs-RR'
       IBGP sessions.

   In this scenario the operator has the flexibility to introduce the
   new additional route reflector functionality on any existing or new
   hardware in the network.  Any of the existing routers that are not
   already members of the best path route reflector plane can easily be
   configured to serve the 2nd plane, either by using a logical/virtual
   router partition or by having their BGP implementation comply with
   this specification.

   Even if the IGP metric is not taken into consideration when comparing
   paths during the bestpath calculation, an implementation still has to
   consider paths with unreachable next hops as invalid.  It is worth
   pointing out that some implementations today already allow for
   configuration which results in no IGP metric comparison during the
   best path calculation.

   The additional planes of route reflectors do not need to be fully
   redundant as the primary one is.  If we are preparing for a single
   network failure event, a failure of a non backed up N-th best-path
   route reflector would not result in a connectivity outage of the
   actual data plane.  The reason is that this would at most affect the
   presence of a backup path (not an active one) on some parts of the
   network.  If the operator chooses to create the N-th best path plane
   redundantly by installing not one, but two or more route reflectors
   serving each additional plane, additional robustness will be
   achieved.

   As a result of this solution ASBR3 and other ASBRs peering to RR'
   will be receiving the 2nd best path.
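
   The requirement above to skip the IGP metric comparison while still
   treating paths with unreachable next hops as invalid can be sketched
   as follows.  The Path fields, the reduced set of tie breakers and the
   convention that a next hop missing from the igp_metric mapping is
   unreachable are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Path:
    next_hop: str
    local_pref: int    # higher is preferred
    as_path_len: int   # shorter is preferred


def best_path(paths: List[Path],
              igp_metric: Dict[str, int],
              use_igp_metric: bool = True) -> Optional[Path]:
    """Pick the best path as seen from one route reflector.

    igp_metric maps a next hop to the IGP cost from this reflector; a
    next hop that is absent from the mapping is treated as unreachable.
    """
    # Paths with unreachable next hops are always invalid.
    valid = [p for p in paths if p.next_hop in igp_metric]
    if not valid:
        return None

    def key(p: Path):
        igp = igp_metric[p.next_hop] if use_igp_metric else 0
        return (-p.local_pref, p.as_path_len, igp, p.next_hop)

    return min(valid, key=key)


p1 = Path("192.0.2.1", 100, 2)
p2 = Path("192.0.2.2", 100, 2)
rr1_view = {"192.0.2.1": 10, "192.0.2.2": 20}   # RR1 is closer to ASBR1
rr2_view = {"192.0.2.1": 30, "192.0.2.2": 5}    # RR2 is closer to ASBR2

# With the IGP step enabled the reflectors may disagree on the best path.
print(best_path([p1, p2], rr1_view), best_path([p1, p2], rr2_view))
# With the IGP step ignored every reflector ranks the paths identically.
print(best_path([p1, p2], rr1_view, use_igp_metric=False),
      best_path([p1, p2], rr2_view, use_igp_metric=False))
```

   Once the IGP step is ignored, the result no longer depends on where
   the reflector sits in the IGP topology, which is what lets RR' be
   placed anywhere in the network.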

   Similarly to section 4.1, as an alternative to fully meshing all RRs
   and RRs', an operator who may have a large number of reflectors
   already deployed today may choose to peer newly introduced RRs' to a
   hierarchical RR' which would be an IBGP interconnect point between
   planes.

   It is recommended that an implementation advertise the overall best
   path over the Nth diverse-path session if there is no other BGP path
   with a different next hop present.  That is equivalent to today's
   case where a client is connected to more than one RR.

4.3.  Multi plane route servers for Internet Exchanges

   Another group of devices where the proposed multi-plane architecture
   may be of particular applicability are the EBGP route servers used at
   many Internet exchange points.

   In such cases 100s of ISPs are interconnected on a common LAN.
   Instead of having 100s of direct EBGP sessions on each exchange
   client, a single peering is created to the transparent route server.
   The route server can only propagate a single best path.  Mandating
   the upgrade of 100s of different service providers in order to
   implement add-path may be much more difficult than asking them to
   provision one new EBGP session to an Nth best-path route server
   plane.  That will allow distribution of more than the single best BGP
   path from a given route server to such an Internet Exchange Point
   (IX) peer.

   The solution proposed in this document fits very well with the
   requirement of having broader EBGP path diversity among the members
   of any Internet Exchange Point.

5.  Discussion on current models of IBGP route distribution

   In today's networks BGP4 operates as specified in [RFC4271].

   There are a number of technology choices for intra-AS BGP route
   distribution:

   1.  Full mesh

   2.  Confederations

   3.  Route reflectors

5.1.  Full Mesh

   A full mesh, the most basic IBGP architecture, exists when all the
   BGP speaking routers within the AS peer directly with all other BGP
   speaking routers within the AS, irrespective of where a given router
   resides within the AS (e.g., P router, PE router, etc.).

   While this is the simplest intra-domain path distribution method,
   historically there have been a number of challenges in realizing such
   an IBGP full mesh in a large scale network.  While some of these
   challenges are no longer applicable today, some may still apply,
   including the following:

   1.  Number of TCP sessions: The number of IBGP sessions on a single
       router in a full mesh topology of a large scale service provider
       can easily reach 100s.  While on hardware and software used in
       the late 70s, 80s and 90s such numbers could be of concern, today
       customer requirements for the number of BGP sessions per box are
       reaching 1000s.  This is already an order of magnitude more than
       the potential number of IBGP sessions.  Advancements in the
       hardware and software used in production routers mean that
       running a full mesh of IBGP sessions should not be dismissed due
       to the resulting number of TCP sessions alone.

   2.  Provisioning: When operating and troubleshooting large networks
       one of the top-most requirements is to keep the design as simple
       as possible.  When the autonomous system's network is composed of
       hundreds of nodes it becomes very difficult to manually provision
       a full mesh of IBGP sessions.  Adding or removing a router
       requires reconfiguration of all the other routers in the AS.
       While this is a real concern today, there is already work in
       progress in the IETF to define IBGP peering automation through an
       IBGP Auto Discovery [I-D.raszuk-idr-ibgp-auto-mesh] mechanism.

   3.  Number of paths: Another concern when deploying a full IBGP mesh
       is the number of BGP paths for each route that have to be stored
       at every node.  This number is very tightly related to the number
       of external peerings of an AS, the use of local preference or
       multi-exit-discriminator techniques and the presence of best-
       external [I-D.ietf-idr-best-external] advertisement
       configuration.  If we make a rough assumption that the BGP4 path
       data structure consumes about 80-100 bytes, the resulting control
       plane memory requirement for 500,000 IPv4 routes with one
       additional external path is 38-48 MB, while for 1 million IPv4
       routes it grows linearly to 76-95 MB.  It is not possible to
       reach a general conclusion on whether this condition is
       negligible or a show stopper for a full mesh deployment without
       direct reference to a given network.

   To summarize, a full mesh IBGP peering can offer natural
   dissemination of multiple external paths among BGP speakers.  When
   realized with the help of IBGP Auto Discovery peering automation this
   seems like a viable deployment, especially in medium and small scale
   networks.

5.2.  Confederations

   For the purpose of this document let's observe that confederations
   [RFC5065] can be viewed as a hierarchical full mesh model.

   Within each sub-AS, BGP speakers are fully meshed and, as discussed
   in section 5.1, all full mesh characteristics (number of TCP
   sessions, provisioning and potential concern over number of paths)
   still apply at the sub-AS scale.

   In addition to the direct peering of all BGP speakers within each
   sub-AS, all sub-AS border routers must also be fully meshed with each
   other.  Sub-AS border routers configured with best-external
   functionality can inject additional exit paths within a sub-AS.
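
   Because the concern over the number of paths also applies within each
   sub-AS, the rough memory estimate from section 5.1 can be reproduced
   with a few lines of arithmetic.  The function name and its parameters
   are hypothetical; the 80-100 byte per-path figure is the rough
   assumption stated in section 5.1, and the quoted figures match when a
   megabyte is taken as 2^20 bytes.

```python
def extra_path_memory_mb(routes: int,
                         bytes_per_path: int,
                         extra_paths_per_route: int = 1) -> float:
    """Control plane memory for additional paths, in units of 2**20 bytes."""
    return routes * extra_paths_per_route * bytes_per_path / 2**20


for routes in (500_000, 1_000_000):
    low = extra_path_memory_mb(routes, 80)
    high = extra_path_memory_mb(routes, 100)
    print(f"{routes} routes: {low:.1f} to {high:.1f} MB")
```

   The estimate scales linearly with both the number of routes and the
   number of additional paths stored per route.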

   To summarize, it is technically sound to use confederations in
   combination with best-external to achieve distribution of more than a
   single best path per route in large autonomous systems.

   In topologies where route reflectors are deployed within the
   confederation sub-ASes, the technique described here does apply.

5.3.  Route reflectors

   The main motivation behind the use of route reflectors [RFC4456] is
   the avoidance of the full mesh session management problem described
   above.  Route reflectors, for good or for bad, are the most common
   solution today for interconnecting BGP speakers within an internal
   routing domain.

   Route reflector peerings follow the advertisement rules defined by
   the BGP4 protocol.  As a result only a single best path per prefix is
   sent to client BGP peers.  That is the main reason why many current
   networks are exposed to a phenomenon called BGP path starvation,
   which essentially results in an inability to deliver a number of the
   applications discussed later.

   The route reflection equivalent when interconnecting BGP speakers
   between domains is popularly called the Route Server and is globally
   deployed today in many Internet exchange points.

6.  Deployment considerations

   The distribution of diverse BGP paths proposal allows the
   dissemination of more paths than just the best path to route
   reflector or route server clients of today's BGP4 implementations.
   As a deployment recommendation, it should be noted that fast
   connectivity restoration as well as the majority of intra-domain BGP
   level load balancing needs can be accommodated with only two paths
   (the overall best and the second best).  Therefore this document
   suggests the use of N=2 with diverse-path.

   From the client's point of view, receiving additional paths via
   separate IBGP sessions terminated at the new route reflector plane is
   functionally equivalent to constructing a full mesh peering, without
   the set of problems that such a full mesh would come with, as
   discussed in an earlier section.

   By precisely defining the number of reflector planes, network
   operators have full control over the number of redundant paths in the
   network.  This number can be defined to address the needs of the
   service(s) being deployed.

   The Nth plane route reflectors should act as control plane network
   entities.  While they can be provisioned on current production
   routers, the selected Nth best BGP paths should not be used directly
   in the data plane, except when such paths are BGP multipath eligible
   and that functionality is enabled.  On RRs that are in the data
   plane, unless multipath is enabled, the 2nd best path is expected to
   be a backup path and should be installed as such into the local
   RIB/FIB.

   The use of the term "planes" in this document is more of a conceptual
   nature.  In practice all paths are still kept in the single table
   where the normal best path is calculated.  That means that tools like
   a looking glass should not observe any changes or impact when
   diverse-path has been enabled.

   The proposed architecture deployed along with the BGP best-external
   functionality covers all three cases where the classic BGP route
   reflection paradigm would fail to distribute alternate exit point
   paths.

   1.  ASBRs advertising their single best external paths with no local-
       preference or multi-exit-discriminator present.

   2.  ASBRs advertising their single best external paths with local-
       preference or multi-exit-discriminator present and with BGP best-
       external functionality enabled.

   3.  ASBRs with multiple external paths.
   This section focuses on discussion of the third case above in more
   detail.  This describes the scenario of a single ASBR connected to
   multiple EBGP peers.  In practice this peering scenario is quite
   common.  It is mostly due to the geographic location of EBGP peers
   and the diversity of those peers (for example, peering to multiple
   tier 1 ISPs).  It is not designed for failure recovery scenarios, as
   a single failure of the ASBR would simultaneously result in loss of
   connectivity to all of the peers.  In most medium and large
   geographically distributed networks there is always another ASBR, or
   multiple ASBRs, providing peering backups, typically in other
   geographically diverse locations in the network.

   When an operator uses ASBRs with multiple peerings, setting next hop
   self effectively allows the ASBR to locally repair the atomic
   failure of any external peer without any compromise to the data
   plane.  The most common reason for not setting next hop self is
   traditionally the associated drawback of losing the ability to
   signal the external failures of peering ASBRs or links to those
   ASBRs by fast IGP flooding.  Such a potential drawback can be easily
   avoided by using a different peering address from the address used
   for next hop mapping, as well as by removing such a next hop from
   the IGP at the last possible BGP path failure.

   Herein one may correctly observe that in the case of setting next
   hop self on an ASBR, the attributes of the other external paths such
   an ASBR is peering with may be different from the attributes of its
   best external path.  Therefore, when not all of those external paths
   are injected with their corresponding attributes, they cannot be
   compared to equivalent paths for the same prefix coming from
   different ASBRs.
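   The local repair behaviour with next hop self described above can be
   sketched as follows.  This is only an illustration under stated
   assumptions: the Asbr class, its field names and the per-prefix
   ordered list of external peers are hypothetical and do not model any
   real router implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Asbr:
    """Hypothetical ASBR with several EBGP peers (illustrative only)."""
    loopback: str  # address advertised into IBGP as next hop (next hop self)
    # prefix -> external peers offering a path, most preferred first
    external_paths: Dict[str, List[str]] = field(default_factory=dict)
    failed_peers: Set[str] = field(default_factory=set)

    def active_peer(self, prefix: str) -> Optional[str]:
        # Local repair: forward via the first surviving external path.
        for peer in self.external_paths.get(prefix, []):
            if peer not in self.failed_peers:
                return peer
        return None

    def advertised_next_hop(self, prefix: str) -> Optional[str]:
        # With next hop self, the next hop seen by IBGP peers stays the
        # same while any external path survives.  None models removing
        # the next hop at the last possible BGP path failure.
        return self.loopback if self.active_peer(prefix) else None

    def peer_down(self, peer: str) -> None:
        self.failed_peers.add(peer)
```

   In this sketch, the failure of one external peer changes only the
   ASBR's local forwarding choice, possibly to a path with different
   attributes, while the rest of the domain keeps resolving the prefix
   via the same next hop.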
   While such an observation is correct in principle, one should put
   things in the perspective of the overall goal, which is to provide
   data plane connectivity upon a single failure with minimal
   interruption/packet loss.  During such transient conditions, using
   even potentially suboptimal exit points is reasonable, so long as
   forwarding information loops are not introduced.  In the meantime,
   the BGP control plane will on its own re-advertise the newly elected
   best external path, and the route reflector planes will calculate
   their Nth best paths and propagate them to their clients.  The
   result is that after seconds, even if potential sub-optimality were
   encountered, it will be quickly and naturally healed.

7.  Summary of benefits

   The distribution of diverse BGP paths proposal provides the
   following benefits when compared to the alternatives:

   1.  No modifications to the BGP4 protocol.

   2.  No requirement for upgrades to edge and core routers (as
       required in draft-ietf-idr-add-paths-07).  Backward compatible
       with the existing BGP deployments.

   3.  Can be easily enabled by the introduction of a new route
       reflector or route server plane dedicated to the selection and
       distribution of the Nth best path, or just by new configuration
       of the upgraded current route reflector(s).

   4.  Does not require major modification to BGP implementations in
       the entire network, which would result in an unnecessary
       increase of memory and CPU consumption due to the shift from
       today's per-prefix to per-path advertisement state tracking.

   5.  Can be safely deployed gradually on an RR cluster basis.

   6.  The proposed solution is equally applicable to any BGP address
       family as described in Multiprotocol Extensions for BGP-4
       [RFC4760].  In particular, it can be used "as is" without any
       modifications for both IPv4 and IPv6 address families.

8.  Applications

   This section lists the most common applications that require the
   presence of redundant BGP paths:

   1.  Fast connectivity restoration, where backup paths with alternate
       exit points would be pre-installed as well as pre-resolved in
       the FIB of routers.  That would allow for a local action upon
       reception of a critical event notification of network / node
       failure.  This failure recovery mechanism based on the presence
       of backup paths is also suitable for gracefully addressing
       scheduled maintenance requirements as described in
       [I-D.decraene-bgp-graceful-shutdown-requirements].

   2.  Multi-path load balancing for both IBGP and EBGP.

   3.  BGP control plane churn reduction, both intra-domain and
       inter-domain.

   An important point to observe is that all of the above intra-domain
   applications are based on the use of reflector planes but are also
   applicable in the inter-domain Internet exchange point examples.  As
   discussed in section 4.3, an Internet exchange can conceptually
   deploy shadow route server planes, each responsible for distribution
   of an Nth best path to its EBGP peers.  In practice it may just
   amount to a short new configuration and the establishment of new BGP
   sessions to IX peers.

9.  Security considerations

   The new mechanism for diverse BGP path dissemination proposed in
   this document does not introduce any new security concerns as
   compared to the base BGP4 specification [RFC4271], especially when
   compared against a full IBGP mesh topology.

   In addition, the authors observe that all BGP security issues
   described in [RFC4272] do apply to the additional BGP session or
   sessions recommended by this specification.  Therefore, all
   recommended mitigation techniques for BGP security are applicable
   here.

10.  IANA Considerations

   Following [RFC5226], the authors declare that the new mechanism for
   diverse BGP path dissemination does not require any new allocations
   from IANA.

11.  Contributors

   The following people contributed significantly to the content of the
   document:

   Selma Yilmaz
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   US
   Email: seyilmaz@cisco.com

   Satish Mynam
   Juniper Networks
   1194 N. Mathilda Ave
   Sunnyvale, CA 94089
   US
   Email: smynam@juniper.net

   Isidor Kouvelas
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   US
   Email: kouvelas@cisco.com

12.  Acknowledgments

   The authors would like to thank Bruno Decraene, Bart Peirens, Eric
   Rosen, Jim Uttaro, Renwei Li, Wes George and Adrian Farrel for their
   valuable input.

   The authors would also like to express a special thank you to a
   number of operators who helped to optimize the provided solution to
   be as close as possible to their daily operational practices.
   Special thanks go to Ted Seely, Shan Amante, Benson Schliesser and
   Seiichi Kawamura.

13.  References

13.1.  Normative References

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, April 2006.

   [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 4760,
              January 2007.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
              May 2008.

13.2.  Informative References

   [I-D.decraene-bgp-graceful-shutdown-requirements]
              Decraene, B., Francois, P., Pelsser, C., Ahmad, Z., and A.
              Armengol, "Requirements for the graceful shutdown of BGP
              sessions",
              draft-decraene-bgp-graceful-shutdown-requirements-01 (work
              in progress), March 2009.

   [I-D.ietf-idr-add-paths]
              Walton, D., Chen, E., Retana, A., and J. Scudder,
              "Advertisement of Multiple Paths in BGP",
              draft-ietf-idr-add-paths-07 (work in progress), June 2012.

   [I-D.ietf-idr-best-external]
              Marques, P., Fernando, R., Chen, E., Mohapatra, P., and H.
              Gredler, "Advertisement of the best external route in
              BGP", draft-ietf-idr-best-external-05 (work in progress),
              January 2012.

   [I-D.pmohapat-idr-fast-conn-restore]
              Mohapatra, P., Fernando, R., Filsfils, C., and R. Raszuk,
              "Fast Connectivity Restoration Using BGP Add-path",
              draft-pmohapat-idr-fast-conn-restore-02 (work in
              progress), October 2011.

   [I-D.raszuk-idr-ibgp-auto-mesh]
              Raszuk, R., "IBGP Auto Mesh",
              draft-raszuk-idr-ibgp-auto-mesh-00 (work in progress),
              June 2003.

   [RFC3345]  McPherson, D., Gill, V., Walton, D., and A. Retana,
              "Border Gateway Protocol (BGP) Persistent Route
              Oscillation Condition", RFC 3345, August 2002.

   [RFC4272]  Murphy, S., "BGP Security Vulnerabilities Analysis",
              RFC 4272, January 2006.

   [RFC5065]  Traina, P., McPherson, D., and J. Scudder, "Autonomous
              System Confederations for BGP", RFC 5065, August 2007.
Authors' Addresses

   Robert Raszuk (editor)
   NTT MCL
   101 S Ellsworth Avenue Suite 350
   San Mateo, CA 94401
   US

   Email: robert@raszuk.net

   Rex Fernando
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   US

   Email: rex@cisco.com

   Keyur Patel
   Cisco Systems
   170 West Tasman Drive
   San Jose, CA 95134
   US

   Email: keyupate@cisco.com

   Danny McPherson
   Verisign
   21345 Ridgetop Circle
   Dulles, VA 20166
   US

   Email: dmcpherson@verisign.com

   Kenji Kumaki
   KDDI Corporation
   Garden Air Tower
   Iidabashi, Chiyoda-ku, Tokyo 102-8460
   Japan

   Email: ke-kumaki@kddi.com