idnits 2.17.1 draft-ietf-idr-add-paths-guidelines-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- == There are 5 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 593 has weird spacing: '...ltipath selec...' -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC 4271' is mentioned on line 143, but not defined == Missing Reference: 'RFC-2119' is mentioned on line 165, but not defined == Missing Reference: 'RFC4364' is mentioned on line 790, but not defined == Unused Reference: 'RFC2119' is defined on line 858, but no explicit reference was found in the text == Unused Reference: 'RFC4271' is defined on line 885, but no explicit reference was found in the text == Outdated reference: A later version (-15) exists of draft-ietf-idr-add-paths-07 Summary: 0 errors (**), 0 flaws (~~), 9 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 1 Network Working Group J. Uttaro 2 Internet Draft AT&T 3 Expires: July 27, 2014 P. Francois 4 Intended status: Standards Track IMDEA Networks 5 January 27, 2014 R. Fragassi 6 A. Simpson 7 Alcatel-Lucent 8 K. Patel 9 Cisco Systems 10 P. Mohapatra 11 Cumulus Networks 13 Best Practices for Advertisement of Multiple Paths in IBGP 14 draft-ietf-idr-add-paths-guidelines-06.txt 16 Status of this Memo 18 This Internet-Draft is submitted in full conformance with the 19 provisions of BCP 78 and BCP 79. 21 Internet-Drafts are working documents of the Internet Engineering 22 Task Force (IETF), its areas, and its working groups. Note that 23 other groups may also distribute working documents as Internet- 24 Drafts. 26 Internet-Drafts are draft documents valid for a maximum of six months 27 and may be updated, replaced, or obsoleted by other documents at any 28 time. It is inappropriate to use Internet-Drafts as reference 29 material or to cite them other than as "work in progress." 31 The list of current Internet-Drafts can be accessed at 32 http://www.ietf.org/ietf/1id-abstracts.txt 34 The list of Internet-Draft Shadow Directories can be accessed at 35 http://www.ietf.org/shadow.html 37 This Internet-Draft will expire on Jul 27, 2014. 39 Copyright Notice 41 Copyright (c) 2012 IETF Trust and the persons identified as the 42 document authors. All rights reserved. 44 This document is subject to BCP 78 and the IETF Trust's Legal 45 Provisions Relating to IETF Documents 46 (http://trustee.ietf.org/license-info) in effect on the date of 47 publication of this document. Please review these documents 48 carefully, as they describe your rights and restrictions with respect 49 to this document. 
Code Components extracted from this document must 50 include Simplified BSD License text as described in Section 4.e of 51 the Trust Legal Provisions and are provided without warranty as 52 described in the Simplified BSD License. 54 Abstract 56 Add-Paths is a BGP enhancement that allows a BGP router to advertise 57 multiple distinct paths for the same prefix/NLRI. This provides a 58 number of potential benefits, including reduced routing churn, faster 59 convergence and better loadsharing. 61 This document provides recommendations to implementers of Add-Paths 62 so that network operators have the tools needed to address their 63 specific applications and to manage the scalability impact of Add- 64 Paths. A router implementing Add-Paths may learn many paths for a 65 prefix and must decide which of these to advertise to peers. This 66 document analyses different algorithms for making this selection and 67 provides recommendations based on the target application. 69 Table of Contents 71 1. Introduction...................................................4 72 2. Terminology....................................................4 73 3. Add-Paths Applications.........................................5 74 3.1. Fast Connectivity Restoration.............................5 75 3.2. Load Balancing............................................7 76 3.3. Churn Reduction...........................................7 77 3.4. Suppression of MED-Related Persistent Route Oscillation...7 78 4. Implementation Guidelines......................................8 79 4.1. Capability Negotiation....................................8 80 4.2. Receiving Multiple Paths..................................9 81 4.3. Advertising Multiple Paths................................9 82 4.3.1. Path Selection Modes................................11 83 4.3.1.1. Advertise All Paths............................11 84 4.3.1.2. Advertise N Paths..............................12 85 4.3.1.3. 
Advertise All AS-Wide Best Paths...............12 86 4.3.1.4. Advertise ALL AS-Wide Best and Next-Best Paths 87 (Double AS Wide)........................................13 88 4.3.1.5. Advertise Used Multipaths......................14 89 4.3.2. Derived Modes from Bounding the Number of Advertised 90 Paths......................................................14 91 4.3.3. Derived Modes from Adding N More Paths..............15 92 5. Deployment Considerations.....................................15 93 5.1. Introducing Add-Paths into an Existing Network...........15 94 5.2. Scalability Considerations...............................18 95 5.3. Routing Consistency Considerations.......................18 96 5.4. Consistency between Advertised Paths and Forwarding Paths19 97 5.5. Routing Churn............................................20 98 6. Security Considerations.......................................20 99 7. Acknowledgments...............................................20 100 8. Contributors..................................................20 101 9. IANA Considerations...........................................20 102 10. References...................................................20 103 10.1. Normative References....................................20 104 10.2. Informative References..................................21 105 Appendix A. Other Path Selection Modes...........................22 106 A.1. Advertise Neighbor-AS Group Best Path....................22 107 A.2. Best LocPref/Second LocPref..............................22 108 A.3. Advertise Paths at decisive step -1......................23 110 1. Introduction 112 The BGP Add-Paths capability enhances current BGP implementations by 113 allowing a BGP router to exchange with its BGP peers more than one 114 path for the same destination/NLRI. The base BGP standard [RFC 4271] 115 does not provide for such a capability. 
If a BGP router learns 116 multiple paths for the same NLRI (from multiple peers), it selects 117 only one as its best path and advertises the best path to its peers. 118 The primary goal of Add-Paths is to increase the visibility of paths 119 within an iBGP system. This has the effect of improving robustness 120 in case of failure, reducing the number of BGP messages exchanged 121 during such an event, and offering the potential for faster re- 122 convergence. Through careful selection of the paths to be advertised, 123 Add-Paths can also prevent routing oscillations. 125 The purpose of this document is to provide the necessary 126 recommendations to the implementers of Add-Paths so that network 127 operators have the tools needed to address their specific 128 applications and to manage the scalability impact of Add-Paths while 129 maintaining routing consistency. A router implementing Add-Paths may 130 learn many paths for a prefix and must decide which of these to 131 advertise to peers. This document analyses different algorithms for 132 making this selection and provides recommendations based on the 133 target application. 135 2. Terminology 137 In this document the following terms are used: 139 Add-Paths peer: A peer with which the local system has agreed 140 to receive and/or send NLRI with path identifiers. 142 Primary path: A path toward a prefix that is considered a best path 143 by the BGP decision process [RFC 4271] and actively used for 144 forwarding traffic to that prefix. A router may have multiple primary 145 paths for a prefix if it implements multipath. 147 Diverse path: A BGP path associated with a different BGP next-hop and 148 BGP router than some other set of paths. The BGP router associated 149 with a path is inferred from the ORIGINATOR_ID attribute or, if there 150 is none, the BGP Identifier of the peer that advertised the path. 152 Backup path: A diverse path with respect to the primary paths toward 153 a prefix.
The backup path can be used to forward traffic to the 154 destination if the primary paths fail. 156 Optimal backup path: The backup path that will be selected as the new 157 best path for a prefix when all primary paths are removed/withdrawn. 159 AS-Wide preferred paths: All paths that are considered as best when 160 applying rules of the BGP decision process up to the IGP tie-break. 162 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 163 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 164 document are to be interpreted as described in [RFC-2119]. 166 3. Add-Paths Applications 168 [draft-pmohapat] presents the applications that would benefit from 169 multiple paths advertisement in iBGP. They are summarized in the 170 following subsections. 172 3.1. Fast Connectivity Restoration 174 With the dissemination of backup paths, fast connectivity restoration 175 and convergence can be achieved. If a router has a backup path, it 176 can directly select that path as best upon failure of the primary 177 path. This minimizes packet loss in the dataplane. Sending multiple 178 paths in iBGP allows routers to receive backup paths when path 179 visibility is not sufficient with classical BGP. This is especially 180 useful when Route Reflection is used. 182 Consider a network such as the one depicted in Figure 1 and suppose 183 that none of the routers support Add-Paths. AS1 receives from AS3 2 184 paths (A and B) to a particular destination XYZ. Suppose path A is 185 preferred over path B due to path A having a lower MED (multi-exit 186 discriminator). 188 AS1 uses a route reflector RR1 to reduce the scale of its IBGP mesh. 189 If the routers in AS1 are not configured for best-external then RR1 190 knows about only path A during steady state because router B 191 suppresses/withdraws its advertisement of path (B) to RR1. 
If the 192 routers in AS1 do support best-external then RR1 may have both paths 193 in its Adj-RIB-IN, but regardless of the best-external configuration 194 RR1 can only advertise its best path A to its peers, including router 195 D.
197    ========            =====================
198    =    +---+        +---+               +---+
199    =    |RTR|________|RTR|               |RTR|
200    =    | E |        | A |               | C |
201    =    +---+Path A->+---+      AS1      +---+
202    =      =            =   \           /   =
203    =      =            =    \         /    =
204    =      =            =     \       /     =
205    =      =            =      \     /      =
206    = AS3  =            =       +---+       =
207    =      =            =       |RR |       =
208    =      =            =       | 1 |       =
209    =      =            =       +---+       =
210    =      =            =      /     \      =
211    =      =            =     /       \     =
212    =      =            =    /         \    =
213    =      =            =   /           \   =
214    =    +---+Path B->+---+               +---+
215    =    |RTR|  ______|RTR|               |RTR|
216    =    | F |        | B |               | D |
217    =    +---+        +---+               +---+
218    ========            =====================
220 Figure 1: Example Topology
222 Under these circumstances consider the steps required to restore 223 traffic from router D to destination XYZ when the link between Router 224 A and Router E fails. (Assume that router A sets next-hop to self when 225 advertising path A and that router B is not configured for best- 226 external). 228 1. Router A sends a BGP UPDATE message withdrawing its advertisement 229 of path (A). 231 2. RR1 receives the withdrawal, and propagates it to its other client 232 peers, routers B, C and D. 234 3. When router B receives the withdrawal of path (A) it reruns its 235 decision process and selects path (B) as its new best path. Router 236 B advertises path (B) to RR1. 238 4. RR1 reruns its decision process and selects path (B) as its new 239 best path. RR1 advertises path (B) to client peers A, C and D. 241 5. Router D reruns its decision process, determines path (B) to be 242 the best path, and updates its forwarding table. After this step 243 traffic from router D to destination XYZ is restored (the traffic 244 path has changed from A to B). 246 With the use of Add-Paths, the convergence time for the above path 247 failure example can be reduced considerably.
The main reason for the 248 improvement is that Add-Paths allows router D to be aware of more 249 than one path to destination XYZ prior to the failure of the best 250 path (A). In steady-state (with no failures) router B decides, as 251 before, that path (A) is its best path but because of its Add-Paths 252 (or best-external) configuration it also advertises path (B) to RR1. 253 Using Add-Paths RR1 can advertise both learned paths to its IBGP 254 peers, including router D. Now consider again the scenario where the 255 link between Router A and Router E fails. In this case, with Add- 256 Paths, fewer steps are required to achieve re-convergence: 258 1. Router A sends a BGP UPDATE message withdrawing its advertisement 259 of path (A). 261 2. RR1 receives the withdrawal, and propagates it to its other client 262 peers, routers B, C and D. 264 3. Router D receives the withdrawal, reruns the decision process and 265 updates the forwarding entry for destination XYZ. 267 3.2. Load Balancing 269 Increased path diversity allows routers to install several paths in 270 their forwarding tables in order to load balance traffic across those 271 paths. 273 3.3. Churn Reduction 275 When Add-Paths is used in an AS, the availability of additional 276 backup paths means failures can be recovered locally with much less 277 path exploration in iBGP and therefore fewer updates disseminated in 278 eBGP. When the preferred backup path is the post-convergence path, 279 churn is minimized. 281 3.4. Suppression of MED-Related Persistent Route Oscillation 283 As described in [oscillation], Add-Paths is a valuable tool in 284 helping to stop persistent route oscillations caused by comparison of 285 paths based on MED in topologies where route reflectors or the 286 confederation structure hide some paths.
With the appropriate path 287 selection algorithm Add-Paths stops these route oscillations because 288 the same set of paths is consistently advertised by the route 289 reflector or the confederation border router and the routers 290 receiving this set of paths make stable routing decisions about the 291 best path. 293 4. Implementation Guidelines 295 This section discusses recommendations for the implementation of Add- 296 Paths. The following topics are addressed: 298 . Considerations related to Add-Paths capability negotiation 300 . Receiving BGP routes from Add-Paths peers 302 . Advertising BGP routes to Add-Paths peers. This section 303 discusses various path selection algorithms, which are the 304 procedures available to an Add-Paths speaker for deciding which 305 set of paths to advertise to an Add-Paths peer for particular 306 prefixes. 308 4.1. Capability Negotiation
310    +---+           +---+
311    |RTR|___________|RTR|
312    | A |  <-BGP->  | B |
313    +---+           +---+
315 Figure 2: BGP Peering Example
317 In Figure 2, in order for router A to receive multiple paths per 318 NLRI from peer B, for a particular address family (AFI=x, SAFI=y), 319 the BGP capabilities advertisements during session setup must 320 indicate that peer B wants to send multiple paths for AFI=x, SAFI=y 321 and that router A is willing to receive multiple paths for AFI=x, 322 SAFI=y. Similarly, in order for router A to send multiple paths per 323 NLRI to peer B, for a particular address family (AFI=x, SAFI=y), the 324 BGP capabilities advertisements must indicate that router A wants to 325 send multiple paths for AFI=x, SAFI=y and peer B is willing to 326 receive multiple paths for AFI=x, SAFI=y. Refer to [Add-Paths] for 327 details of the Add-Paths capabilities advertisement. 329 The capabilities of the local router MUST be configurable per peer 330 and per address family, and SHOULD support the ability to configure 331 send-only operation or receive-only operation.
The default mode of 332 operation shall be to both send and receive. 334 4.2. Receiving Multiple Paths 336 Currently, per standard BGP behavior, if a BGP router receives an 337 advertisement of an NLRI and path from a specific peer and that peer 338 subsequently advertises the same NLRI with different path information 339 (e.g. a different NEXT_HOP and/or different path attributes) the new 340 path effectively overwrites the existing path. 342 When Add-Paths has been negotiated with the peer, the newly 343 advertised path should be stored in the RIB-IN along with all of the 344 paths previously advertised (and not withdrawn) by the peer. 346 When an Add-Paths speaker has negotiated to receive multiple paths 347 for (AFIx, SAFIy) from a peer all advertisements and withdrawals of 348 NLRI within that address family from that peer MUST include a path 349 identifier, as described in [Add-Paths]. The path identifiers have no 350 significance to the receiving peer. If the combination of NLRI and 351 path identifier in an advertisement from a peer is unique (does not 352 match an existing route in the RIB-IN from that peer) then the route 353 is added to the RIB-IN. If the combination of NLRI and path 354 identifier in a received advertisement is the same as an existing 355 route in the RIB-IN from the peer then the new route replaces the 356 existing one. If the combination of NLRI and path identifier in a 357 received withdrawal matches an existing route in the RIB-IN from the 358 peer then that route shall be removed from the RIB-IN. 360 A BGP UPDATE message from an Add-Paths peer may advertise and 361 withdraw more than one NLRI belonging to one or more address 362 families. In this case Add-Paths may be supported for some of the 363 address families and not others. In this situation the receiving BGP 364 router should not expect that all of the path identifiers in the 365 UPDATE message will be the same. 367 4.3. 
Advertising Multiple Paths 369 [Add-Paths] specifies how to encode the advertisement of multiple 370 paths towards the same NLRI over an iBGP session, but provides no 371 details about which set of multiple paths should be advertised. In 372 this section, four path selection algorithms are described and 373 compared with each other. These 4 algorithms are considered to be the 374 most useful across the widest range of deployment scenarios. The list 375 of possible path selection algorithms is much larger and for the 376 interested reader Appendix A provides information about other path 377 selection modes that were considered in historical versions of this 378 document. 380 In comparing any two path selection algorithms the following factors 381 should be taken into account: 383 Control Plane Load: When a router receives multiple paths for a 384 prefix from an iBGP client it has to store more paths in its Adj-RIB- 385 Ins. 387 Control Plane Stress: Coping with multiple iBGP paths has two 388 implications for the computation that a router has to handle. First, 389 it has to compute the paths to send to its peers, i.e. more than the 390 best path. Second, it also has to handle the potential churn related 391 to the exchange of those multiple paths. 393 MED/IGP oscillations: BGP sometimes suffers from routing oscillations 394 when the physical topology differs from the logical topology, or when 395 the MED attribute is used. This is due to the limited path 396 visibility when a single path is advertised and Route Reflection is 397 used. Increasing the path visibility by advertising multiple paths 398 can help solve this issue. 400 Path optimality: When a single path is advertised, border routers do 401 not always receive the optimal path. As an example, Route Reflectors 402 typically send a single path chosen based on their own IGP tie-break 403 (although modifications to this are proposed in [BGP-ORR]).
404 Increasing path visibility would also help routers to learn the path 405 that is best suited for them w.r.t. the IGP tie-break. 407 Backup path optimality: Multiple paths advertisement gives routers 408 the opportunity to have a backup path. However, some backup paths 409 are better than others. Indeed, when a link failure occurs, if a 410 router already knows its post-convergence path, the BGP re- 411 convergence is straightforward and traffic is less impacted by the 412 transient use of non-best forwarding paths. 414 Convergence time: Advertising multiple paths in iBGP has an impact on 415 the convergence time of the BGP system. More paths need to be 416 exchanged, but on the other hand, the routing information is 417 propagated faster. With an increased path visibility, there is less 418 path exploration during the convergence. With the availability 419 of backup paths, convergence time in case of failure is also reduced. 421 Target application: Depending on the application type, the number of 422 paths to advertise for a prefix will vary. For example, for fast 423 connectivity restoration, it may be sufficient to advertise only 2 424 paths to a peer so that it will have the best path and the optimal 425 backup path. For load balancing purposes, it may be desirable to 426 advertise more paths, but inclusion of the optimal backup path in the 427 set may be less critical. For route oscillation elimination, it is 428 required to advertise all group-best paths for a prefix. 430 4.3.1. Path Selection Modes 432 The following subsections describe the 4 main path selection modes 433 considered in this draft. Each mode is considered either MANDATORY or 434 OPTIONAL. A MANDATORY mode MUST be supported by any implementation 435 that claims compliance with this document. An OPTIONAL mode may be 436 supported by some but not all implementations.
438 The path selection mode and any parameters applicable to the mode 439 MUST be configurable per AFI/SAFI and per peer and SHOULD be 440 configurable per prefix. To illustrate the value of this flexibility, 441 consider a prefix P that belongs to an address family F requiring 442 path IDs to be included with every NLRI (e.g. due to the Add-Paths 443 capability negotiation with the peer). If P is one of a number of 444 prefixes that would not benefit from the advertisement of multiple 445 paths then it is perfectly valid to send only the best path. 447 4.3.1.1. Advertise All Paths 449 A simple rule for advertising multiple paths in iBGP is to advertise 450 to iBGP peers all received paths minus those blocked by export 451 filters or applicable split horizon rules. This solution is easy to 452 implement, but the counterpart is that all those paths need to be 453 stored by all routers that receive them, which can be quite 454 expensive. If a path to a prefix P is advertised to N border 455 routers, with a Full Mesh of iBGP sessions, all routers have N paths 456 in their Adj-RIB-Ins. If Route Reflection is used and each client is 457 connected to 2 Route Reflectors, it may learn up to 2*N paths. 459 This solution gives a perfect path visibility to all routers, thus 460 limiting churn and losses of connectivity in case of failure. Indeed, 461 this allows routers to select their optimal primary path, and to 462 switch on their optimal backup path in case of failure. 464 However, as more paths are exchanged, the number of BGP messages 465 disseminated during the initial iBGP convergence can be high, and 466 convergence may be slower. 468 Routing oscillations are prevented with this rule, because a router 469 won't need to withdraw a previously advertised path when its best 470 path changes. 472 This path selection mode is OPTIONAL. 474 4.3.1.2. Advertise N Paths 476 Another solution is for a router to advertise a maximum of N paths to 477 iBGP peers. 
Here, the computational cost is the selection of the N 478 paths. Indeed, there must be a ranking of the paths in order to 479 advertise the most interesting ones. A way for a router to select N 480 paths is to run N times its decision process. At each iteration of 481 the process only those paths not selected during a previous iteration 482 and those with a different NEXT_HOP and BGP Identifier (or Originator 483 ID) combination from previously-selected paths are eligible for 484 consideration. In this mode the paths actually advertised to a peer 485 are the eligible paths (up to N) minus those blocked by export 486 filters or applicable split horizon rules. The memory cost is 487 bounded: a router receives a maximum of N paths for each prefix from 488 each peer. With N equal to 2, all routers know at least two paths and 489 can provide local recovery in case of failure. If multipath routing 490 is to be deployed in the AS, N can be increased to provide more 491 alternate paths to the routers. 493 Path optimality and backup path optimality are not guaranteed, i.e. 494 it is possible that the optimal path of a router (w.r.t. IGP tie- 495 break) is not contained in the set of paths advertised by its Route 496 Reflector. However, as the number of paths that it receives is higher 497 than without Add-Paths, it is possible that the chosen nexthop is 498 closer to the router in terms of IGP cost than the nexthop that would 499 have been chosen without Add-Paths. 501 This solution helps to reduce routing oscillations, but not in all 502 cases. Indeed, path visibility is still constrained by the maximum 503 number of paths, and configurations with routing oscillations still 504 exist. 506 This path selection mode is MANDATORY. The default value of N MUST be 507 2. The value of N MUST be configurable and MAY be upper bounded by 508 an implementation. 
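As a minimal sketch of the iterative selection described in this section, the Python fragment below runs a simplified ranking up to N times and, after each pick, removes every candidate that shares the picked path's NEXT_HOP and router identity. The names Path, preference_key, select_n_paths and export_ok are hypothetical and introduced only for illustration. The ranking is a deliberately reduced stand-in for the full BGP decision process (for instance, it compares MED across all paths regardless of neighbor AS), and the addresses are documentation examples.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Path:
    """Hypothetical, reduced view of one BGP path for a single prefix."""
    next_hop: str
    router_id: str        # ORIGINATOR_ID if present, else the peer's BGP Identifier
    local_pref: int = 100
    as_path_len: int = 1
    med: int = 0
    igp_cost: int = 0


def preference_key(path: Path) -> tuple:
    # Assumption: a simplified ordering (higher LOCAL_PREF first, then shorter
    # AS path, lower MED, lower IGP cost, lowest router ID as final tie-break).
    return (-path.local_pref, path.as_path_len, path.med,
            path.igp_cost, path.router_id)


def select_n_paths(paths: List[Path], n: int = 2,
                   export_ok: Callable[[Path], bool] = lambda p: True) -> List[Path]:
    """Pick up to n diverse paths, then drop those blocked on export."""
    candidates = list(paths)
    selected: List[Path] = []
    while candidates and len(selected) < n:
        best = min(candidates, key=preference_key)
        selected.append(best)
        picked = (best.next_hop, best.router_id)
        # Only paths with a different NEXT_HOP / router combination stay eligible.
        candidates = [p for p in candidates
                      if (p.next_hop, p.router_id) != picked]
    # Export filters and split horizon are modelled by the export_ok callback.
    return [p for p in selected if export_ok(p)]


if __name__ == "__main__":
    a = Path("192.0.2.1", "192.0.2.1", med=10)
    a_dup = Path("192.0.2.1", "192.0.2.1", med=15)   # same next-hop/router as a
    b = Path("192.0.2.2", "192.0.2.2", med=20)
    for p in select_n_paths([b, a_dup, a]):
        print(p.next_hop, p.med)
```

With the default n=2 the caller obtains the best path plus one diverse alternative, which corresponds to the default value of N stated in this section. Filtering after selection means a blocked path still consumes one of the N slots, matching the "eligible paths (up to N) minus those blocked" wording.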
510 The default value of 2 ensures the availability of a backup path (if 511 2 or more paths have been received) while maintaining minimal impact 512 on memory and churn. If Add-N with N equal to 2 is insufficient to 513 meet another objective (e.g. loadsharing or MED/IGP oscillation) 514 there is always a large enough value of N that can be selected, if N is 515 configurable, to meet that objective. 517 4.3.1.3. Advertise All AS-Wide Best Paths 519 Another choice is to consider the set of paths with the same AS-wide 520 preference [Basu-ibgp-osc], i.e. the paths that all routers would 521 select based on the rules of the decision process that are not 522 router-dependent (i.e. Local-preference, ASPath length and MED 523 rules). Thus, for a given router, those paths only differ by the IGP 524 cost to the nexthop or by the tie-breaking rules. The paths actually 525 advertised to a peer are the set of AS-wide best paths minus those 526 blocked by export filters or applicable split horizon rules. 528 The computational cost is reduced, as a router only has to send the 529 paths remaining before applying the IGP tie-breaking rule. However, 530 it is difficult to predict how many paths will be stored, as it 531 depends on the number of eBGP sessions on which this prefix is 532 advertised with the best AS-wide preference. 534 With this rule, the routing system is optimal: all routers can choose 535 their best path (or best paths if multipath is used) based on their 536 router-specific preferences, i.e. the IGP cost to the nexthop. Hot 537 potato routing is respected. Also, MED oscillations are prevented, 538 because the path visibility among the AS-wide preferred paths is 539 total. 541 The existence of a backup path is not guaranteed. If only one path 542 with the AS-wide best attributes exists, there is no backup path 543 disseminated.
However, if such a path exists, it is optimal as it 544 has the same AS-wide preference as the primary. 546 This path selection mode is OPTIONAL. 548 4.3.1.4. Advertise ALL AS-Wide Best and Next-Best Paths (Double 549 AS Wide) 551 This variant of "Advertise All AS Wide Best Paths" trades off the 552 number of paths being propagated within the iBGP system for post- 553 convergence alternate path availability and routing stability. A BGP 554 speaker running this mode will select, as candidates for 555 advertisement, its AS Wide Best paths, plus all the AS Wide Best 556 paths obtained when removing the first ones from consideration. The 557 paths actually advertised to a peer are the double-AS-wide candidate 558 paths minus those blocked by export filters or applicable split 559 horizon rules. 561 Under this mode, a BGP speaker knows multiple AS-Wide best paths or 562 the AS-Wide best path and all the second AS-Wide best paths, so that 563 routing optimality and backup path availability are ensured. Note 564 that the post-convergence paths will be known by each BGP node in an 565 AS supporting this mode. 567 The computational complexity of this mode is relatively low as it 568 requires running the usual BGP Decision Process up to and including 569 the MED rule. The set of paths remaining after that step forms the AS- 570 Wide best paths. Next, a best path selection algorithm is run up to 571 and including the MED rule, based on the paths that are not in the 572 set of AS-Wide best paths. 574 The number of paths for a prefix p, known by a given router of the 575 AS, is the number of AS-Wide best and second AS-Wide best paths found 576 at the Borders of the AS. 578 MED Oscillations are avoided by this mode, both for the primary and 579 alternate paths being picked under this mode. 581 This path selection mode is OPTIONAL. 583 4.3.1.5.
Advertise Used Multipaths 585 Many BGP implementations support BGP Multipath, allowing a BGP router 586 to use multiple BGP next-hops for forwarding towards a prefix/NLRI 587 when the corresponding paths are considered equally preferred. In 588 cases where the deployment of Add-Paths is mostly aimed at providing 589 multiple paths for load balancing with BGP Multipath, a natural 590 approach for a BGP speaker supporting Add-paths is to advertise the 591 paths that are selected by its BGP multipath selection algorithm. 593 BGP Multipath selection algorithms can vary depending on the 594 implementation and configuration options. An Add-Paths mode based on 595 BGP multipath is considered practical because it lets the BGP path 596 propagation be aligned with the load balancing objectives expressed 597 by the operator configuring BGP multipath. 599 In some deployment scenarios, it is likely that such a mode leads to 600 the selection and advertisement of a large number of paths for some 601 NLRI, and hence should be controlled as per the mechanism described 602 in section 4.3.2. In case the number of multipaths exceeds the upper 603 bound on the number of advertised paths the ones that should be 604 advertised are those with the highest degree of preference by the BGP 605 decision process. This can be achieved if the advertising router has 606 strictly ordered all of its paths. 608 This path selection mode is OPTIONAL. 610 4.3.2. Derived Modes from Bounding the Number of Advertised Paths 612 For some of the modes discussed in section 4.3.1 the number of paths 613 selected by the algorithm (M) is not predictable in advance, and 614 depends on factors such as network topology. For such modes, 615 implementations MAY support the ability to limit the number of 616 advertised paths to some value N that is less than M. 618 It must be noted that the resulting derivative mode may no longer 619 meet the properties stated in section 4.3.1 (which assumes N=M). 
   This is particularly true for the MED oscillation avoidance
   property. The use of such bounds thus needs to be considered
   carefully in deployments where MED oscillation avoidance is a key
   goal of deploying Add-Paths. If fast recovery is the main objective,
   then it is reasonable and sufficient to set N to 2. If the main goal
   is improved load balancing, then limiting N to the number of ECMP
   paths supported by the forwarding planes of the receiving routers is
   also a reasonable practice.

4.3.3. Derived Modes from Adding N More Paths

   Some modes discussed in section 4.3.1 may result in only one or a few
   selected paths, depending on network topology and/or router
   configuration, and this small number of paths may not meet minimum
   requirements for backup path or load balancing purposes. When using
   such modes, implementations MAY support the ability to add N more
   paths to the set returned by the basic selection algorithm described
   in section 4.3.1. The N more paths should be the N next-best paths,
   as determined by the BGP decision process.

   It must be noted that the resulting derivative mode may no longer
   meet the properties stated in section 4.3.1 (which assumes N=0).

5. Deployment Considerations

   This section proposes a potential strategy for introducing Add-Paths
   into an existing network and discusses considerations related to
   scalability, routing consistency and routing churn.

5.1. Introducing Add-Paths into an Existing Network

   There are many possible ways that Add-Paths can be introduced into an
   existing deployed network. It is not a practical goal for this
   document to list all of these options and discuss the pros and cons
   of each one. It is, however, valuable to consider an example
   migration strategy that may be relatively common among layer 3
   service providers that currently use route reflectors for scaling.
   This example migration strategy is attractive for several reasons:

   1. It involves incremental steps that allow the impact of Add-Paths
      to be carefully evaluated before proceeding to the next step.

   2. It recognizes the fact that many routers will require at least a
      software upgrade to support Add-Paths, and that it will not be
      practical to upgrade all of these routers at once.

   3. It reduces convergence time (in stages) with a relatively
      moderate increase in router memory and CPU demands.

   The example migration strategy assumes a starting point of a deployed
   network with one or more RR clusters. None of the routers in the
   network support Add-Paths without an upgrade, but some do support
   best-external. Two of the clusters in this network are shown in
   Figure 3. In cluster 2, PE1, PE2, RRy and RRz are configured for
   best-external. This makes RRy and RRz aware of all external paths
   received by PEs in cluster 2 and ensures that RRy and RRz can
   advertise a path to the RRs in cluster 1 even when the best overall
   route happens to be learned from cluster 1. It does not, however,
   allow other clusters to be aware of more than one path per prefix
   learned by cluster 2.

   ==========               ==================
   =        =               =                =
   =    +---+               +---+      +---+ =
   =    |RR |---------------|RR |  <-BE|   | =
   =    |a  |               |y  |------|PE1| =
   =    |   |               |   |      |   | =
   =    +---+               +---+      +---+ =
   =      | =               = |  \    /      =
   =      | =               = |   \  /       =
   =      | =               = |    \/        =
   =      | =               = |    /\        =
   =      | =               = |   /  \       =
   =      | =               = |  /    \      =
   =    +---+               +---+      +---+ =
   =    |RR |---------------|RR |------|   | =
   =    |b  |               |z  |  <-BE|PE2| =
   =    |   |               |   |      |   | =
   =    +---+               +---+      +---+ =
   =        =               =                =
   ==========               ==================
   RR Cluster 1             RR Cluster 2

               Figure 3: RR Cluster Before Add-Paths

   The following sequence of steps occurs in the example migration
   strategy:

   1. The route reflectors are upgraded in each cluster, one by one, to
      support Add-Paths.
      This allows the intra-cluster and (eventually) inter-cluster
      RR-to-RR sessions to start using Add-Paths. All RRs are
      configured to use the Add-N, N=2 path selection algorithm. The
      effect of this step is to slightly reduce convergence time when
      the best and second-best paths for a prefix are learned by a
      single cluster (such as cluster 2 in Figure 3).

   2. The clients are upgraded in each cluster, one by one, to support
      Add-Paths. On the RRs, Add-Paths is configured to use the Add-N,
      N=2 path selection algorithm towards upgraded client peers. At
      this step, clients are configured in the receive-only Add-Paths
      mode. This means that best-external continues to operate as
      before in the client-to-RR direction. The effect of this step is
      to ensure that all clients have two paths per prefix for ECMP or
      fast failover, assuming at least 2 paths are available.

   3. The clients are re-configured to use Add-Paths in the transmit
      direction towards their RR peers. This causes Add-Paths to
      replace the best-external behavior. The effect of this step is to
      free up CPU and memory resources related to the storage of paths
      that are third best or worse. If a cluster such as the one in
      Figure 3 had 50 clients, and 10 of these learned an external
      route for the same prefix, then, with best-external, the RRs in
      that cluster would need to store up to 12 paths for that prefix
      (10 from clients and 2 from non-client peers). This would be true
      even if the 2 best overall paths came from another cluster.
      Contrast this with the use of Add-Paths in the client-to-RR
      direction: for the same case, the route reflectors need only
      store the 2 paths learned from non-client peers.

5.2. Scalability Considerations

   In terms of scalability, we note that advertising multiple paths per
   prefix requires more memory and state than the current behavior of
   advertising the best path only.
   A BGP speaker that does not implement Add-Paths maintains send state
   information in its prefix data structure per neighbor as a way to
   determine that the prefix has been advertised to the neighbor. With
   Add-Paths, this information has to be replicated for each path that
   needs to be advertised. Mathematically, if the "send state" size per
   prefix is 's' bytes, the number of neighbors is 'n', and the number
   of paths being advertised is 'p', then the current memory
   requirement for BGP "send state" is n * s bytes per prefix; with
   Add-Paths, it becomes n * s * p bytes. In practice, this value may
   be reduced with implementation optimizations similar to attribute
   sharing. Receiving multiple paths per prefix also requires more
   memory and state, since each path is a separate entry in the
   Adj-RIBs-In.

5.3. Routing Consistency Considerations

   As discussed in previous sections, Add-Paths can help routers select
   more optimal paths, and it can help deal with certain route
   oscillation conditions arising from incomplete knowledge of the
   available paths. But depending on the path selection algorithm and
   how it is used, Add-Paths is not immune to its own cases of routing
   inconsistency. If the BGP routers within an AS do not make
   consistent routing decisions about how to reach a particular
   destination, route oscillations may occur, and these route
   oscillations may result in traffic loss.

   Optimizing an Add-Paths deployment for scalability may run counter to
   routing consistency goals, and in these circumstances operators have
   to decide the correct tradeoff for their particular deployment. For
   example, the Advertise All Paths mode, if applied to many prefixes,
   is far from ideal from a scalability perspective, but it does
   guarantee routing consistency and correctness.
   A path selection mode that allows better control over scalability is
   the Advertise N paths mode, but this is susceptible to routing
   inconsistency. First, if the N paths do not include the best path
   from each neighbor AS group, then route oscillation cannot be
   precluded. Second, if the advertising router (e.g. an RR) advertises
   N paths to peer_n and M paths to peer_m, and N < M, care must be
   exercised to ensure that all paths advertised to peer_n are included
   in the paths advertised to peer_m. This can be assured as long as
   the advertising router has strictly ordered all of its paths.

5.4. Consistency between Advertised Paths and Forwarding Paths

   When using Add-Paths, routers may advertise paths that they have not
   selected as best, and that they are thus not using for traffic
   forwarding. This is generally not an issue if encapsulation is used
   in the AS as described in [RFC4364] and all forwarding decisions,
   including those of the tunnel egress router, are based on label
   information, i.e. if only the ingress router performs an IP FIB
   lookup. In this situation, the dataplane path followed by the
   packets is the one intended by the ingress router, and corresponds
   to the control plane path it selected.

   On the other hand, if Add-Paths is used in a network without
   encapsulation, some scenarios can result in forwarding deflection or
   loops. Such forwarding anomalies already occur without Add-Paths,
   when the routers on the forwarding path do not have a synchronized
   view of the best path. They will deflect the traffic to their own
   local view of the best path, and, when multiple deflections occur,
   forwarding loops can occur. With Add-Paths, the issue can be
   exacerbated due to routers advertising non-best paths. As discussed
   above, encapsulation can help with this issue, but only to the extent
   that it allows downstream routers to forward without an IP FIB
   lookup.
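
   The difference between the encapsulated and non-encapsulated cases
   above can be illustrated with a small simulation. The following is a
   minimal sketch, not taken from any implementation: the four-router
   line topology (E1 - A - B - E2), the router names, the tables
   best_egress and igp_next_hop, and the function forward are all
   hypothetical and chosen only for illustration. Router A has selected
   egress E2 while router B has selected egress E1, so the two routers
   do not share a synchronized view of the best path.

```python
# Hypothetical four-router line topology: E1 - A - B - E2.
# best_egress[r]: the egress router that r selected as best for the prefix.
best_egress = {"A": "E2", "B": "E1", "E1": "E1", "E2": "E2"}

# igp_next_hop[(r, egress)]: IGP next hop from router r toward that egress.
igp_next_hop = {
    ("A", "E1"): "E1", ("A", "E2"): "B",
    ("B", "E1"): "A", ("B", "E2"): "E2",
}


def forward(ingress, encapsulated):
    """Trace a packet from the ingress router; return (path, outcome)."""
    node, target = ingress, best_egress[ingress]
    path, seen = [node], set()
    while node != target:
        if (node, target) in seen:
            # Same router making the same decision again: forwarding loop.
            return path, "loop"
        seen.add((node, target))
        node = igp_next_hop[(node, target)]
        path.append(node)
        if not encapsulated:
            # Plain IP forwarding: every hop performs its own lookup and
            # may deflect the packet toward its own best egress.
            target = best_egress[node]
    return path, "delivered"


print(forward("A", encapsulated=True))   # B forwards toward A's chosen egress
print(forward("A", encapsulated=False))  # B deflects the packet back via A
```

   In this sketch, the encapsulated packet keeps the egress chosen by
   the ingress for its whole journey, while the non-encapsulated packet
   is re-resolved at every hop, which is what allows the deflection
   between A and B to turn into a loop.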

   A first example of such an issue is when the Local-Pref of
   non-primary paths received over iBGP sessions is modified. The
   ingress router may then select as best a path that the egress does
   not prefer, and the egress router will deflect the traffic.

   Another example is when the best path is selected based on a
   tie-breaking rule. When the ingress and the egress base their path
   selection on the router-id of the neighbor that advertised the path
   to them, the result may be different for each of them. This specific
   issue is described and solved in [draft-pmohapat].

5.5. Routing Churn

   As noted in section 3.3, using Add-Paths between IBGP peers can help
   to reduce routing churn with EBGP peers. This benefit does, however,
   come at the cost of potentially increased churn between the IBGP
   Add-Paths peers. In a non-Add-Paths deployment, a change in the
   preference order of non-best paths requires no updates to be sent to
   peers. But when a router has Add-Paths peers, changes in non-best
   path preference may no longer be invisible, and increased route
   churn may be observable. Choosing the right path selection mode and
   parameters (for example, not setting N unnecessarily large in the
   Add-N mode) is important to minimizing this additional churn.

6. Security Considerations

   TBD

7. Acknowledgments

   This document was prepared using 2-Word-v2.0.template.dot.

8. Contributors

   Virginie Van den Schrieck
   Email: v.vandenschrieck@gmail.com

   Rohit Gupta
   Apple Inc
   Email: rxgupta@apple.com

9. IANA Considerations

   TBD

10. References

10.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2. Informative References

   [Add-Paths] Walton, D., Retana, A., Chen, E., and J. Scudder,
             "Advertisement of Multiple Paths in BGP",
             draft-ietf-idr-add-paths-07, June 17, 2012.

   [draft-pmohapat] Mohapatra, P., Fernando, R., Filsfils, C., and R.
             Raszuk, "Fast Connectivity Restoration Using BGP
             Add-path", draft-pmohapat-idr-fast-conn-restore-02.txt,
             October 3, 2011.

   [oscillation] Walton, D., Retana, A., Chen, E., and J. Scudder, "BGP
             Persistent Route Oscillation Solutions",
             draft-walton-bgp-route-oscillation-stop-06.txt, June 14,
             2012.

   [Basu-ibgp-osc] Basu, A., Ong, C., Rasala, A., Shepherd, B., and G.
             Wilfong, "Route oscillations in iBGP with Route
             Reflection", SIGCOMM 2002.

   [BGP-ORR] Raszuk, R., Cassar, C., Aman, E., and B. Decraene, "BGP
             Optimal Route Reflection",
             draft-raszuk-bgp-optimal-route-reflection-01, March 11,
             2011.

   [RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
             Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
             Networks (VPNs)", RFC 4364, February 2006.

Appendix A. Other Path Selection Modes

A.1. Advertise Neighbor-AS Group Best Path

   [oscillation] proposes that a router group its paths by the neighbor
   AS from which they were learned, and advertise the best path in each
   of those groups.

   The control plane stress induced by this solution is the computation
   of the per-neighbor-AS path groups, and the application of the
   decision process to each of them. The control plane load is bounded
   by the number of neighboring ASes advertising a prefix, which cannot
   be known a priori.

   Path optimality and backup path optimality are not guaranteed, as the
   paths advertised are not all the AS-wide preferred paths. Backup path
   availability is not guaranteed either. Indeed, if only one AS
   advertises a prefix, even on multiple eBGP sessions, only one of the
   paths may be selected and advertised.

A.2. Best LocPref/Second LocPref

   This selection method consists of grouping the paths by Local
   Preference. A router sends to its peers all paths with the highest
   Local Preference.
   If there is only a single path with the highest Local Preference, it
   also sends all paths with the second best Local Preference.

   This method ensures that all routers know all paths with the best
   local preference. As local preferences are often related to the type
   of peering of the peer the path comes from, this ensures that, in
   case of failure, routers have a backup path of equivalent quality.
   This prevents, for example, a router from temporarily switching to a
   peer path while an alternate path from a customer is available but
   hidden at the border of the AS. Such a situation could result in a
   temporary withdrawal of the prefix on some eBGP sessions when the
   router selects the path via the peer.

   The advertisement of the second Local Preference occurs when there is
   no alternate path with the same quality as the best path. This way,
   fast convergence is still ensured. The backup path is optimal, as it
   has the second AS-wide preference, which becomes the AS-wide best
   preference upon failure of the primary one.

   Sending all the paths with a given Local Preference also has a
   positive impact on routing optimality. Indeed, this allows border
   routers to have increased path visibility and to choose their best
   path based on their own criteria.

   The computational cost of this solution is reduced when there are
   several paths with the best local preference. In this case, it is
   sufficient to stop the decision process after the first rule to
   obtain the set of paths to be advertised. When it is necessary to
   advertise the paths with the second local preference, the additional
   cost is that of applying the first rule of the decision process a
   second time, which is still reasonable. The memory cost depends on
   the number of paths with the best local preference.

A.3. Advertise Paths at decisive step -1

   When the goal is to provide fast recovery by advertising candidate
   post-reconvergence paths, one can choose to stop the decision process
   just before the step where only one path remains. If the decision
   process reaches the IGP tie-break, all remaining paths are
   advertised. This way, routers advertise as many paths as possible
   with a quality as similar as possible.

   This path selection is an intermediate solution between the two
   preceding ones. Here, instead of stopping the decision process at
   the local preference step or the IGP step, we stop it before the rule
   that removes the best potential backup paths. This way, we minimize
   the number of paths to advertise while guaranteeing the presence of a
   backup path. Primary and backup path optimality is ensured, as all
   paths with the same AS-wide preference as the best paths are included
   in the set of paths advertised.

Authors' Addresses

   Jim Uttaro
   AT&T
   200 S. Laurel Avenue
   Middletown, NJ 07748 USA
   Email: uttaro@att.com

   Pierre Francois
   IMDEA Networks
   Avenida del Mar Mediterraneo
   Leganes 28919
   Spain
   Email: pierre.francois@imdea.org

   Pradosh Mohapatra
   Cumulus Networks
   Email: pmohapat@cumulusnetworks.com

   Roberto Fragassi
   Alcatel-Lucent
   600 Mountain Avenue
   Murray Hill, New Jersey
   Email: roberto.fragassi@alcatel-lucent.com

   Adam Simpson
   Alcatel-Lucent
   600 March Road
   Ottawa, Ontario K2K 2E6
   Canada
   Email: adam.simpson@alcatel-lucent.com

   Keyur Patel
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA 95134 USA
   Email: keyupate@cisco.com