idnits 2.17.1 draft-ietf-idr-add-paths-guidelines-07.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- == There are 5 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 611 has weird spacing: '...ltipath selec...' -- The document date (Dec 3, 2014) is 3424 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC 4271' is mentioned on line 153, but not defined == Missing Reference: 'RFC-2119' is mentioned on line 175, but not defined == Missing Reference: 'RFC4364' is mentioned on line 808, but not defined == Unused Reference: 'RFC2119' is defined on line 912, but no explicit reference was found in the text == Unused Reference: 'RFC4271' is defined on line 939, but no explicit reference was found in the text == Outdated reference: A later version (-15) exists of draft-ietf-idr-add-paths-07 Summary: 0 errors (**), 0 flaws (~~), 9 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 1 IDR Working Group J. Uttaro 2 Internet-Draft AT&T 3 Intended status: Standards Track 4 Expires: Jun 3, 2015 P. Francois 5 IMDEA Networks 7 K. Patel 8 Cisco Systems 10 P. Mohapatra 11 Cumulus Networks 13 J. Haas 14 Juniper Networks 16 A. Simpson 17 R. Fragassi 18 Alcatel-Lucent 20 Dec 3, 2014 22 Best Practices for Advertisement of Multiple Paths in IBGP 23 draft-ietf-idr-add-paths-guidelines-07.txt 25 Status of this Memo 27 This Internet-Draft is submitted in full conformance with the 28 provisions of BCP 78 and BCP 79. 30 Internet-Drafts are working documents of the Internet Engineering 31 Task Force (IETF), its areas, and its working groups. Note that 32 other groups may also distribute working documents as Internet- 33 Drafts. 35 Internet-Drafts are draft documents valid for a maximum of six months 36 and may be updated, replaced, or obsoleted by other documents at any 37 time. It is inappropriate to use Internet-Drafts as reference 38 material or to cite them other than as "work in progress." 40 The list of current Internet-Drafts can be accessed at 41 http://www.ietf.org/ietf/1id-abstracts.txt 43 The list of Internet-Draft Shadow Directories can be accessed at 44 http://www.ietf.org/shadow.html 46 This Internet-Draft will expire on Jun 3, 2015. 48 Copyright Notice 50 Copyright (c) 2012 IETF Trust and the persons identified as the 51 document authors. All rights reserved. 53 This document is subject to BCP 78 and the IETF Trust's Legal 54 Provisions Relating to IETF Documents 55 (http://trustee.ietf.org/license-info) in effect on the date of 56 publication of this document. Please review these documents 57 carefully, as they describe your rights and restrictions with respect 58 to this document. 
Code Components extracted from this document must 59 include Simplified BSD License text as described in Section 4.e of 60 the Trust Legal Provisions and are provided without warranty as 61 described in the Simplified BSD License. 63 Abstract 65 Add-Paths is a BGP enhancement that allows a BGP router to advertise 66 multiple distinct paths for the same prefix/NLRI. This provides a 67 number of potential benefits, including reduced routing churn, faster 68 convergence and better loadsharing. 70 This document provides recommendations to implementers of Add-Paths 71 so that network operators have the tools needed to address their 72 specific applications and to manage the scalability impact of Add- 73 Paths. A router implementing Add-Paths may learn many paths for a 74 prefix and must decide which of these to advertise to peers. This 75 document analyses different algorithms for making this selection and 76 provides recommendations based on the target application. 78 Table of Contents 80 1. Introduction...................................................4 81 2. Terminology....................................................4 82 3. Add-Paths Applications.........................................5 83 3.1. Fast Connectivity Restoration.............................5 84 3.2. Load Balancing............................................7 85 3.3. Churn Reduction...........................................7 86 3.4. Suppression of MED-Related Persistent Route Oscillation...7 87 4. Implementation Guidelines......................................8 88 4.1. Capability Advertisement..................................8 89 4.2. Receiving Multiple Paths..................................9 90 4.3. Advertising Multiple Paths................................9 91 4.3.1. Path Selection Modes................................11 92 4.3.1.1. Advertise N Paths..............................11 93 4.3.1.2. Advertise All Paths............................12 94 4.3.1.3. 
Advertise All AS-Wide Best Paths...............13 95 4.3.1.4. Advertise ALL AS-Wide Best and Next-Best Paths 96 (Double AS Wide)........................................13 97 4.3.1.5. Advertise Used Multipaths......................14 98 4.3.2. Derived Modes from Bounding the Number of Advertised 99 Paths......................................................15 100 4.3.3. Derived Modes from Adding N More Paths..............15 101 5. Deployment Considerations.....................................15 102 5.1. Introducing Add-Paths into an Existing Network...........16 103 5.2. Scalability Considerations...............................18 104 5.3. Routing Consistency Considerations.......................18 105 5.4. Consistency between Advertised Paths and Forwarding Paths19 106 5.5. Interactions with Route Filtering........................20 107 5.6. Routing Churn............................................20 108 6. Security Considerations.......................................21 109 7. Acknowledgments...............................................21 110 8. Contributors..................................................21 111 9. IANA Considerations...........................................21 112 10. References...................................................21 113 10.1. Normative References....................................21 114 10.2. Informative References..................................21 115 Appendix A. Other Path Selection Modes...........................23 116 A.1. Advertise Neighbor-AS Group Best Path....................23 117 A.2. Best LocPref/Second LocPref..............................23 118 A.3. Advertise Paths at decisive step -1......................24 120 1. Introduction 122 The BGP Add-Paths capability enhances current BGP implementations by 123 allowing a BGP router to exchange with its BGP peers more than one 124 path for the same destination/NLRI. The base BGP standard [RFC 4271] 125 does not provide for such a capability. 
If a BGP router learns 126 multiple paths for the same NLRI (from multiple peers), it selects 127 only one as its best path and advertises the best path to its peers. 128 The primary goal of Add-Paths is to increase the visibility of paths 129 within an iBGP system. This has the effect of improving robustness 130 in case of failure, reducing the number of BGP messages exchanged 131 during such an event, and offering the potential for faster re- 132 convergence. Through careful selection of the paths to be advertised, 133 Add-Paths can also prevent routing oscillations. 135 The purpose of this document is to provide the necessary 136 recommendations to the implementers of Add-Paths so that network 137 operators have the tools needed to address their specific 138 applications and to manage the scalability impact of Add-Paths while 139 maintaining routing consistency. A router implementing Add-Paths may 140 learn many paths for a prefix and must decide which of these to 141 advertise to peers. This document analyses different algorithms for 142 making this selection and provides recommendations based on the 143 target application. 145 2. Terminology 147 In this document the following terms are used: 149 Add-Paths peer: A peer with which the local system has agreed 150 to receive and/or send NLRI with path identifiers. 152 Primary path: A path toward a prefix that is considered a best path 153 by the BGP decision process [RFC 4271] and actively used for 154 forwarding traffic to that prefix. A router may have multiple primary 155 paths for a prefix if it implements multipath. 157 Diverse path: A BGP path associated with a different BGP next-hop and 158 BGP router than some other set of paths. The BGP router associated 159 with a path is inferred from the ORIGINATOR_ID attribute or, if there 160 is none, the BGP Identifier of the peer that advertised the path. 162 Backup path: A diverse path with respect to the primary paths toward 163 a prefix. 
The backup path can be used to forward traffic to the 164 destination if the primary paths fail. 166 Optimal backup path: The backup path that will be selected as the new 167 best path for a prefix when all primary paths are removed/withdrawn. 169 AS-Wide preferred paths: All paths that are considered as best when 170 applying rules of the BGP decision process up to the IGP tie-break. 172 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 173 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 174 document are to be interpreted as described in [RFC-2119]. 176 3. Add-Paths Applications 178 [draft-pmohapat] presents the applications that would benefit from 179 multiple paths advertisement in iBGP. They are summarized in the 180 following subsections. 182 3.1. Fast Connectivity Restoration 184 With the dissemination of backup paths, fast connectivity restoration 185 and convergence can be achieved. If a router has a backup path, it 186 can directly select that path as best upon failure of the primary 187 path. This minimizes packet loss in the dataplane. Sending multiple 188 paths in iBGP allows routers to receive backup paths when path 189 visibility is not sufficient with classical BGP. This is especially 190 useful when Route Reflection is used. 192 Consider a network such as the one depicted in Figure 1 and suppose 193 that none of the routers support Add-Paths. AS1 receives from AS3 2 194 paths (A and B) to a particular destination XYZ. Suppose path A is 195 preferred over path B due to path A having a lower MED (multi-exit 196 discriminator). 198 AS1 uses a route reflector RR1 to reduce the scale of its IBGP mesh. 199 If the routers in AS1 are not configured for best-external then RR1 200 knows about only path A during steady state because router B 201 suppresses/withdraws its advertisement of path (B) to RR1. 
If the 202 routers in AS1 do support best-external then RR1 may have both paths 203 in its Adj-RIB-IN, but regardless of the best-external configuration 204 RR1 can only advertise its best path A to its peers, including router 205 D. 207 ======== ===================== 208 = +---+ +---+ +---+ 209 = |RTR|________|RTR| |RTR| 210 = | E | | A | | C | 211 = +---+Path A->+---+ AS1 +---+ 212 = = = \ / = 213 = = = \ / = 214 = = = \ / = 215 = = = \ / = 216 = AS3 = = +---+ = 217 = = = |RR | = 218 = = = | 1 | = 219 = = = +---+ = 220 = = = / \ = 221 = = = / \ = 222 = = = / \ = 223 = = = / \ = 224 = +---+Path B->+---+ +---+ 225 = |RTR| ______|RTR| |RTR| 226 = | F | | B | | D | 227 = +---+ +---+ +---+ 228 ======== ===================== 230 Figure 1: Example Topology 232 Under these circumstances consider the steps required to restore 233 traffic from router D to destination XYZ when the link between Router 234 A and Router E fails. (Assume that router A sets next-hop to self when 235 advertising path A and that router B is not configured for best- 236 external). 238 1. Router A sends a BGP UPDATE message withdrawing its advertisement 239 of path (A). 241 2. RR1 receives the withdrawal, and propagates it to its other client 242 peers, routers B, C and D. 244 3. When router B receives the withdrawal of path (A) it reruns its 245 decision process and selects path (B) as its new best path. Router 246 B advertises path (B) to RR1. 248 4. RR1 reruns its decision process and selects path (B) as its new 249 best path. RR1 advertises path (B) to client peers A, C and D. 251 5. Router D reruns its decision process, determines path (B) to be 252 the best path, and updates its forwarding table. After this step 253 traffic from router D to destination XYZ is restored (the traffic 254 path has changed from A to B). 256 With the use of Add-Paths, the convergence time for the above path 257 failure example can be reduced considerably. 
The main reason for the 258 improvement is that Add-Paths allows router D to be aware of more 259 than one path to destination XYZ prior to the failure of the best 260 path (A). In steady-state (with no failures) router B decides, as 261 before, that path (A) is its best path but because of its Add-Paths 262 (or best-external) configuration it also advertises path (B) to RR1. 263 Using Add-Paths RR1 can advertise both learned paths to its IBGP 264 peers, including router D. Now consider again the scenario where the 265 link between Router A and Router E fails. In this case, with Add- 266 Paths, fewer steps are required to achieve re-convergence: 268 1. Router A sends a BGP UPDATE message withdrawing its advertisement 269 of path (A). 271 2. RR1 receives the withdrawal, and propagates it to its other client 272 peers, routers B, C and D. 274 3. Router D receives the withdrawal, reruns the decision process and 275 updates the forwarding entry for destination XYZ. 277 3.2. Load Balancing 279 Increased path diversity allows routers to install several paths in 280 their forwarding tables in order to load balance traffic across those 281 paths. 283 3.3. Churn Reduction 285 When Add-Paths is used in an AS, the availability of additional 286 backup paths means failures can be recovered locally with much less 287 path exploration in iBGP and therefore fewer updates disseminated in 288 eBGP. When the preferred backup path is the post-convergence path, 289 churn is minimized. 291 3.4. Suppression of MED-Related Persistent Route Oscillation 293 As described in [oscillation], Add-Paths is a valuable tool in 294 helping to stop persistent route oscillations caused by comparison of 295 paths based on MED in topologies where route reflectors or the 296 confederation structure hide some paths. 
With the appropriate path 297 selection algorithm Add-Paths stops these route oscillations because 298 the same set of paths is consistently advertised by the route 299 reflector or the confederation border router and the routers 300 receiving this set of paths make stable routing decisions about the 301 best path. 303 4. Implementation Guidelines 305 This section discusses recommendations for the implementation of Add- 306 Paths. The following topics are addressed: 308 . Considerations related to Add-Paths capability negotiation 310 . Receiving BGP routes from Add-Paths peers 312 . Advertising BGP routes to Add-Paths peers. This section 313 discusses various path selection algorithms, which are the 314 procedures available to an Add-Paths speaker for deciding which 315 set of paths to advertise to an Add-Paths peer for particular 316 prefixes. 318 4.1. Capability Advertisement 320 +---+ +---+ 321 |RTR|___________|RTR| 322 | A | <-BGP-> | B | 323 +---+ +---+ 325 Figure 2: BGP Peering Example 327 In Figure 2, in order for router A to receive multiple paths per 328 NLRI from peer B, for a particular address family (AFI=x, SAFI=y), 329 the BGP capabilities advertisements during session setup must 330 indicate that peer B wants to send multiple paths for AFI=x, SAFI=y 331 and that router A is willing to receive multiple paths for AFI=x, 332 SAFI=y. Similarly, in order for router A to send multiple paths per 333 NLRI to peer B, for a particular address family (AFI=x, SAFI=y), the 334 BGP capabilities advertisements must indicate that router A wants to 335 send multiple paths for AFI=x, SAFI=y and peer B is willing to 336 receive multiple paths for AFI=x, SAFI=y. Refer to [Add-Paths] for 337 details of the Add-Paths capabilities advertisement. 339 The capabilities of the local router MUST be configurable per peer 340 and per address family, and SHOULD support the ability to configure 341 send-only operation or receive-only operation. 
The default mode of 342 operation is to both send and receive. 344 4.2. Receiving Multiple Paths 346 Currently, per standard BGP behavior, if a BGP router receives an 347 advertisement of an NLRI and path from a specific peer and that peer 348 subsequently advertises the same NLRI with different path information 349 (e.g. a different NEXT_HOP and/or different path attributes) the new 350 path effectively overwrites the existing path. 352 When Add-Paths has been negotiated with the peer, the newly 353 advertised path should be stored in the RIB-IN along with all of the 354 paths previously advertised (and not withdrawn) by the peer. 356 When an Add-Paths speaker has negotiated to receive multiple paths 357 for (AFIx, SAFIy) from a peer all advertisements and withdrawals of 358 NLRI within that address family from that peer MUST include a path 359 identifier, as described in [Add-Paths]. The path identifiers have no 360 significance to the receiving peer. If the combination of NLRI and 361 path identifier in an advertisement from a peer is unique (does not 362 match an existing route in the RIB-IN from that peer) then the route 363 is added to the RIB-IN. If the combination of NLRI and path 364 identifier in a received advertisement is the same as an existing 365 route in the RIB-IN from the peer then the new route replaces the 366 existing one. If the combination of NLRI and path identifier in a 367 received withdrawal matches an existing route in the RIB-IN from the 368 peer then that route shall be removed from the RIB-IN. 370 A BGP UPDATE message from an Add-Paths peer may advertise and 371 withdraw more than one NLRI belonging to one or more address 372 families. In this case Add-Paths may be supported for some of the 373 address families and not others. In this situation the receiving BGP 374 router should not expect that all of the path identifiers in the 375 UPDATE message will be the same. 377 4.3. 
Advertising Multiple Paths 379 [Add-Paths] specifies how to encode the advertisement of multiple 380 paths towards the same NLRI over an iBGP session, but provides no 381 details about which set of multiple paths should be advertised. In 382 this section, five path selection algorithms are described and 383 compared with each other. These five algorithms are considered to be the 384 most useful across the widest range of deployment scenarios. The list 385 of possible path selection algorithms is much larger and for the 386 interested reader Appendix A provides information about other path 387 selection modes that were considered in historical versions of this 388 document. 390 In comparing any two path selection algorithms the following factors 391 should be taken into account: 393 Control Plane Load: When a router receives multiple paths for a 394 prefix from an iBGP client it has to store more paths in its Adj-RIB- 395 Ins. 397 Control Plane Stress: Coping with multiple iBGP paths has two 398 implications on the computation that a router has to handle. First, 399 it has to compute the paths to send to its peers, i.e. more than the 400 best path. Second, it also has to handle the potential churn related 401 to the exchange of those multiple paths. 403 MED/IGP oscillations: BGP sometimes suffers from routing oscillations 404 when the physical topology differs from the logical topology, or when 405 the MED attribute is used. This is due to the limited path 406 visibility when a single path is advertised and Route Reflection is 407 used. Increasing the path visibility by advertising multiple paths 408 can help solve this issue. 410 Path optimality: When a single path is advertised, border routers do 411 not always receive the optimal path. As an example, Route Reflectors 412 typically send a single path chosen based on their own IGP tie- 413 breaking procedure (although modifications to this are proposed in 414 [BGP-ORR]). 
Increasing path visibility would also help routers to 415 learn the path that is best suited for them w.r.t. the IGP tie- 416 breaking. 418 Backup path optimality: Multiple paths advertisement gives routers 419 the opportunity to have a backup path. However, some backup paths 420 are better than others. Indeed, when a link failure occurs, if a 421 router already knows its post-convergence path, the BGP re- 422 convergence is straightforward and traffic is less impacted by the 423 transient use of non-best forwarding paths. 425 Convergence time: Advertising multiple paths in iBGP has an impact on 426 the convergence time of the BGP system. More paths need to be 427 exchanged, but on the other hand, the routing information is 428 propagated faster. With an increased path visibility, there is less 429 path exploration during the convergence. With the availability 430 of backup paths, convergence time in case of failure is also reduced. 432 Target application: Depending on the application type, the number of 433 paths to advertise for a prefix will vary. For example, for fast 434 connectivity restoration, it may be sufficient to advertise only 2 435 paths to a peer so that it will have the best path and the optimal 436 backup path. For load balancing purposes, it may be desirable to 437 advertise more paths, but inclusion of the optimal backup path in the 438 set may be less critical. For route oscillation elimination, it is 439 required to advertise all group-best paths for a prefix. 441 4.3.1. Path Selection Modes 443 The following subsections describe the 5 main path selection modes 444 considered in this draft. Each mode is considered either MANDATORY or 445 OPTIONAL. A MANDATORY mode MUST be supported by any implementation 446 that claims compliance with this document. An OPTIONAL mode may be 447 supported by some but not all implementations. 
449 The path selection mode and any parameters applicable to the mode 450 MUST be configurable per AFI/SAFI and per peer and SHOULD be 451 configurable per prefix. To illustrate the value of this flexibility, 452 consider a prefix P that belongs to an address family F requiring 453 path IDs to be included with every NLRI (e.g. due to the Add-Paths 454 capability negotiation with the peer). If P is one of a number of 455 prefixes that would not benefit from the advertisement of multiple 456 paths then it is perfectly valid to send only the best path. 458 4.3.1.1. Advertise N Paths 460 With the 'Advertise N Paths' mode (Add-N for short) a router 461 advertises up to N paths per prefix towards an Add-Paths peer. The 462 computational cost of this mode is the selection of the N paths. 463 There must be a ranking of the paths in order to ensure consistency 464 in the set of paths advertised to different Add-Paths peers. The 465 recommended way for a router to consistently select N paths is to run 466 its decision process N times and consider at each iteration only the 467 paths that meet all of the following criteria: 469 (a) not selected during a previous iteration 471 (b) diverse with respect to previously selected paths (see section 472 2 for the definition of a diverse path) 474 (c) not rejected by route filters or split horizon advertisement 475 rules 477 The memory cost of this path selection mode is bounded: a router 478 receives a maximum of N paths for each prefix from each peer. With N 479 equal to 2, all routers know at least two paths and can provide local 480 recovery in case of failure. If multipath routing is to be deployed 481 in the AS, N can be increased to provide more alternate paths to the 482 routers. 484 Path optimality and backup path optimality are not guaranteed, i.e. 485 it is possible that the optimal path of a router (w.r.t. IGP tie- 486 breaking) is not contained in the set of paths advertised by its 487 Route Reflector. 
However, as the number of paths that it receives is 488 higher than without Add-Paths, it is possible that the chosen nexthop 489 is closer to the router in terms of IGP cost than the nexthop that 490 would have been chosen without Add-Paths. 492 This solution helps to reduce routing oscillations, but not in all 493 cases. Indeed, path visibility is still constrained by the maximum 494 number of paths, and configurations with routing oscillations still 495 exist. 497 This path selection mode is MANDATORY. The default value of N MUST be 498 2. The value of N MUST be configurable and MAY be upper bounded by 499 an implementation. 501 The default value of 2 ensures the availability of a backup path (if 502 2 or more paths have been received) while maintaining minimal impact 503 on memory and churn. If Add-N with N equal to 2 is insufficient to 504 meet another objective (e.g. loadsharing or MED/IGP oscillation) 505 there is always a large enough value of N that can be selected, if N is 506 configurable, to meet that objective. 508 4.3.1.2. Advertise All Paths 510 A simple rule for advertising multiple paths in iBGP is to advertise 511 to iBGP peers all received paths minus those blocked by export 512 filters or applicable split horizon rules. This solution is easy to 513 implement, but the drawback is that all those paths need to be 514 stored by all routers that receive them, which can be quite 515 expensive. If a path to a prefix P is advertised to N border 516 routers, with a Full Mesh of iBGP sessions, all routers have N paths 517 in their Adj-RIB-Ins. If Route Reflection is used and each client is 518 connected to 2 Route Reflectors, it may learn up to 2*N paths. 520 This solution gives perfect path visibility to all routers, thus 521 limiting churn and losses of connectivity in case of failure. Indeed, 522 this allows routers to select their optimal primary path, and to 523 switch to their optimal backup path in case of failure. 
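As a rough illustration of how 'Advertise All Paths' differs from the Add-N procedure of section 4.3.1.1, the two selection rules can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not part of any specification: the Path record, its single 'preference' integer standing in for the full BGP decision process, the 'permitted' flag standing in for export filters and split horizon rules, and all function names are hypothetical. Diversity is interpreted here as differing in both BGP next-hop and BGP router from every previously selected path. The addresses come from the documentation range 192.0.2.0/24.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Path:
    """Hypothetical, heavily simplified view of one BGP path for a prefix."""
    next_hop: str          # BGP next-hop
    router_id: str         # ORIGINATOR_ID, else BGP Identifier of the advertising peer
    preference: int        # assumption: stand-in for the decision process, lower wins
    permitted: bool = True # False if blocked by an export filter or split horizon


def best(paths: List[Path]) -> Path:
    """Strict, deterministic ranking so every peer sees a consistent set."""
    return min(paths, key=lambda p: (p.preference, p.router_id, p.next_hop))


def is_diverse(path: Path, selected: List[Path]) -> bool:
    """Assumed reading of 'diverse': new next-hop AND new router vs. all selected."""
    return all(path.next_hop != s.next_hop and path.router_id != s.router_id
               for s in selected)


def advertise_all(paths: List[Path]) -> List[Path]:
    """Advertise All Paths: every received path that is not filtered out."""
    return [p for p in paths if p.permitted]


def advertise_n(paths: List[Path], n: int = 2) -> List[Path]:
    """Add-N: run the decision process up to N times over the eligible paths."""
    selected: List[Path] = []
    for _ in range(n):
        candidates = [p for p in paths
                      if p not in selected            # (a) not selected earlier
                      and is_diverse(p, selected)     # (b) diverse
                      and p.permitted]                # (c) not filtered
        if not candidates:
            break
        selected.append(best(candidates))
    return selected


if __name__ == "__main__":
    received = [
        Path("192.0.2.1", "192.0.2.101", 10),
        Path("192.0.2.1", "192.0.2.101", 20),   # same next-hop and router as the first
        Path("192.0.2.2", "192.0.2.102", 30),
        Path("192.0.2.3", "192.0.2.103", 5, permitted=False),
    ]
    print(len(advertise_all(received)), "paths with Advertise All Paths")
    print(len(advertise_n(received, 2)), "paths with Add-N, N=2")
```

In this toy input, Add-N skips the second path because it shares a next-hop and router with the first, whereas Advertise All Paths sends every permitted path, which is where the additional memory cost discussed above comes from.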
525 However, as more paths are exchanged, the number of BGP messages 526 disseminated during the initial iBGP convergence can be high, and 527 convergence may be slower. 529 Routing oscillations are prevented with this rule, because a router 530 won't need to withdraw a previously advertised path when its best 531 path changes. 533 This path selection mode is OPTIONAL. 535 4.3.1.3. Advertise All AS-Wide Best Paths 537 Another choice is to consider the set of paths with the same AS-wide 538 preference [Basu-ibgp-osc], i.e. the paths that all routers would 539 select based on the rules of the decision process that are not 540 router-dependent (i.e. Local-preference, ASPath length and MED 541 rules). Thus, for a given router, those paths only differ by the IGP 542 cost to the nexthop or by the tie-breaking rules. The paths actually 543 advertised to a peer are the set of AS-wide best paths minus those 544 blocked by export filters or applicable split horizon rules. 546 The computational cost is reduced, as a router only has to send the 547 paths remaining before applying the IGP tie-breaking rule. However, 548 it is difficult to predict how many paths will be stored, as it 549 depends on the number of eBGP sessions on which this prefix is 550 advertised with the best AS-wide preference. 552 With this rule, the routing system is optimal: all routers can choose 553 their best path (or best paths if multipath is used) based on their 554 router-specific preferences, i.e. the IGP cost to the nexthop. Hot 555 potato routing is respected. Also, MED oscillations are prevented, 556 because the path visibility among the AS-wide preferred paths is 557 total. 559 The existence of a backup path is not guaranteed. If only one path 560 with the AS-wide best attributes exists, there is no backup path 561 disseminated. However, if such a path exists, it is optimal as it 562 has the same AS-wide preference as the primary path. 564 This path selection mode is OPTIONAL. 566 4.3.1.4. 
Advertise ALL AS-Wide Best and Next-Best Paths (Double 567 AS Wide) 569 This variant of "Advertise All AS Wide Best Paths" trades-off the 570 number of paths being propagated within the iBGP system for post- 571 convergence alternate paths availability and routing stability. A BGP 572 speaker running this mode will select, as candidates for 573 advertisement, its AS Wide Best paths, plus all the AS Wide Best 574 paths obtained when removing the first ones from consideration. The 575 paths actually advertised to a peer are the double-AS wide candidate 576 paths minus those blocked by export filters or applicable split 577 horizon rules. 579 Under this mode, a BGP speaker knows multiple AS-Wide best paths or 580 the AS-Wide best path and all the second AS-Wide best paths, so that 581 routing optimality and backup path availability are ensured. Note 582 that the post-convergence paths will be known by each BGP node in an 583 AS supporting this mode. 585 The computation complexity of this mode is relatively low as it 586 requires the router to run the usual BGP Decision Process up to and 587 including the MED rule. The set of paths remaining after that step 588 form the AS-Wide best paths. Next, a best path selection algorithm 589 is run up to and including the MED rule, based on the paths that are 590 not in the set of AS-Wide best paths. 592 The number of paths for a prefix p, known by a given router of the 593 AS, is the number of AS-Wide best and second AS-Wide best paths found 594 at the Borders of the AS. 596 MED Oscillations are avoided by this mode, both for the primary and 597 alternate paths being picked under this mode. 599 This path selection mode is OPTIONAL. 601 4.3.1.5. Advertise Used Multipaths 603 Many BGP implementations support BGP Multipath, allowing a BGP router 604 to use multiple BGP next-hops for forwarding towards a prefix/NLRI 605 when the corresponding paths are considered equally preferred. 
In 606 cases where the deployment of Add-Paths is mostly aimed at providing 607 multiple paths for load balancing with BGP Multipath, a natural 608 approach for a BGP speaker supporting Add-paths is to advertise the 609 paths that are selected by its BGP multipath selection algorithm. 611 BGP Multipath selection algorithms can vary depending on the 612 implementation and configuration options. An Add-Paths mode based on 613 BGP multipath is considered practical because it lets the BGP path 614 propagation be aligned with the load balancing objectives expressed 615 by the operator configuring BGP multipath. 617 In some deployment scenarios, it is likely that such a mode leads to 618 the selection and advertisement of a large number of paths for some 619 NLRI, and hence should be controlled as per the mechanism described 620 in section 4.3.2. In case the number of multipaths exceeds the upper 621 bound on the number of advertised paths the ones that should be 622 advertised are those with the highest degree of preference by the BGP 623 decision process. This can be achieved if the advertising router has 624 strictly ordered all of its paths. 626 This path selection mode is OPTIONAL. 628 4.3.2. Derived Modes from Bounding the Number of Advertised Paths 630 For some of the modes discussed in section 4.3.1 the number of paths 631 selected by the algorithm (M) is not predictable in advance, and 632 depends on factors such as network topology. For such modes, 633 implementations MAY support the ability to limit the number of 634 advertised paths to some value N that is less than M. 636 It must be noted that the resulting derivative mode may no longer 637 meet the properties stated in section 4.3.1 (which assumes N=M). This 638 is particularly true for the MED oscillation avoidance property. The 639 use of such bounds thus needs to be considered carefully in 640 deployments where MED oscillation avoidance is a key goal of 641 deploying Add-path. 
   If fast recovery is the main objective, then it is reasonable and
   sufficient to set N to 2.  If the main goal is improved load
   balancing, then limiting N to the number of ECMP paths supported by
   the forwarding planes of the receiving routers is also a reasonable
   practice.

4.3.3.  Derived Modes from Adding N More Paths

   Some modes discussed in section 4.3.1 may result in only one or a few
   selected paths, depending on network topology and/or router
   configuration, and this small number of paths may not meet minimum
   requirements for backup path or load balancing purposes.  When using
   such modes, implementations MAY support the ability to add N more
   paths to the set returned by the basic selection algorithm as
   described in section 4.3.1.  The N more paths should be the N
   next-best paths, as determined by the BGP decision process.

   It must be noted that the resulting derivative mode may no longer
   meet the properties stated in section 4.3.1 (which assumes N=0).

5.  Deployment Considerations

   This section proposes a potential strategy for introducing Add-Paths
   into an existing network and discusses considerations related to
   scalability, routing consistency and routing churn.

5.1.  Introducing Add-Paths into an Existing Network

   There are many possible ways that Add-Paths can be introduced into an
   existing deployed network.  It is not a practical goal for this
   document to list all of these options and discuss the pros and cons
   of each one.  It is, however, valuable to consider an example
   migration strategy that may be relatively common among layer 3
   service providers that currently use route reflectors for scaling.
   This example migration strategy is attractive for several reasons:

   1.  It involves incremental steps that allow the impact of Add-Paths
       to be carefully evaluated before proceeding to the next step.

   2.
       It recognizes the fact that many routers will require at least a
       software upgrade to support Add-Paths, and it will not be
       practical to upgrade all of these routers at once.

   3.  It reduces convergence time (in stages) with a relatively
       moderate increase in router memory and CPU demands.

   The example migration strategy assumes a starting point of a deployed
   network with one or more RR clusters.  None of the routers in the
   network support Add-Paths without an upgrade, but some do support
   best-external.  Two of the clusters in this network are shown in
   Figure 3.  In cluster 2, PE1, PE2, RRy and RRz are configured for
   best-external.  This makes RRy and RRz aware of all external paths
   received by PEs in cluster 2 and ensures that RRy and RRz can
   advertise a path to the RRs in cluster 1 if it happens that the best
   overall route is learned from cluster 1.  It does not, however, allow
   other clusters to be aware of more than one path per prefix learned
   by cluster 2.

      ==========               ==================
      =        =               =                =
      =    +---+               +---+      +---+ =
      =    |RR |---------------|RR |  <-BE|   | =
      =    |a  |               |y  |------|PE1| =
      =    |   |               |   |      |   | =
      =    +---+               +---+      +---+ =
      =      | =               = |  \    /      =
      =      | =               = |   \  /       =
      =      | =               = |    \/        =
      =      | =               = |    /\        =
      =      | =               = |   /  \       =
      =      | =               = |  /    \      =
      =    +---+               +---+      +---+ =
      =    |RR |---------------|RR |------|   | =
      =    |b  |               |z  |  <-BE|PE2| =
      =    |   |               |   |      |   | =
      =    +---+               +---+      +---+ =
      =        =               =                =
      ==========               ==================
      RR Cluster 1               RR Cluster 2

               Figure 3: RR Cluster Before Add-Paths

   The following sequence of steps occurs in the example migration
   strategy:

   1.  The route reflectors are upgraded in each cluster, one by one, to
       support Add-Paths.  This allows the intra- and (eventually)
       inter-cluster RR-to-RR sessions to start using Add-Paths.  All
       RRs are configured to use the Add-N, N=2 path selection
       algorithm.
       The effect of this step is to slightly reduce convergence time
       when the best and second-best paths for a prefix are learned by a
       single cluster (such as cluster 2 in Figure 3).

   2.  The clients are upgraded in each cluster, one by one, to support
       Add-Paths.  On the RRs, Add-Paths is configured to use the Add-N,
       N=2 path selection algorithm towards upgraded client peers.  At
       this step, clients are configured in the receive-only Add-Paths
       mode.  This means that best-external continues to operate as
       before in the client-to-RR direction.  The effect of this step is
       to ensure that all clients have two paths per prefix for ECMP or
       fast failover, assuming at least 2 paths are available.

   3.  The clients are re-configured to use Add-Paths in the transmit
       direction towards their RR peers.  This causes Add-Paths to
       replace the best-external behavior.  The effect of this step is
       to free up CPU and memory resources related to the storage of
       paths that are third best or worse.  If a cluster such as the one
       in Figure 3 had 50 clients, and 10 of these learned an external
       route for the same prefix, then with best-external the RRs in
       that cluster would need to store up to 12 paths for that prefix
       (10 from clients plus 2 from non-client peers).  This would be
       true even if the 2 best overall paths came from another cluster.
       Contrast this with the use of Add-Paths in the client-to-RR
       direction.  For the same case, the route reflectors need only
       store the 2 paths learned from non-client peers.

5.2.  Scalability Considerations

   In terms of scalability, we note that advertising multiple paths per
   prefix requires more memory and state than the current behavior of
   advertising the best path only.  A BGP speaker that does not
   implement Add-Paths maintains send state information in its prefix
   data structure per neighbor as a way to determine that the prefix has
   been advertised to the neighbor.
   With Add-Paths, this information has to be replicated for each path
   that needs to be advertised.  Mathematically, if the "send state"
   size per prefix is 's' bytes, the number of neighbors is 'n', and the
   number of paths being advertised is 'p', then the current memory
   requirement for BGP "send state" is n * s bytes; with Add-Paths, it
   becomes n * s * p bytes.  In practice, this value may be reduced with
   implementation optimizations similar to attribute sharing.  Receiving
   multiple paths per prefix also requires more memory and state since
   each path is a separate entry in the Adj-RIB-Ins.

5.3.  Routing Consistency Considerations

   As discussed in previous sections, Add-Paths can help routers select
   more optimal paths, and it can help deal with certain route
   oscillation conditions arising from incomplete knowledge of the
   available paths.  But depending on the path selection algorithm and
   how it is used, Add-Paths is not immune to its own cases of routing
   inconsistencies.  If the BGP routers within an AS do not make
   consistent routing decisions about how to reach a particular
   destination, route oscillations may occur, and these route
   oscillations may result in traffic loss.

   Optimizing an Add-Paths deployment for scalability may run counter to
   routing consistency goals, and in these circumstances operators have
   to decide the correct tradeoff for their particular deployment.  For
   example, the Advertise All Paths mode, if applied to many prefixes,
   is far from ideal from a scalability perspective, but it does
   guarantee routing consistency and correctness.  A path selection mode
   that allows better control over scalability is the Advertise N Paths
   mode, but this is susceptible to routing inconsistency.  First, if
   the N paths do not include the best path from each neighbor AS group,
   then route oscillation cannot be precluded.  Second, if the
   advertising router (e.g.
   an RR) advertises N paths to peer_n and M paths to peer_m, and N < M,
   care must be exercised to ensure that all paths advertised to peer_n
   are included in the paths advertised to peer_m.  This can be assured
   as long as the advertising router has strictly ordered all of its
   paths.

5.4.  Consistency between Advertised Paths and Forwarding Paths

   When using Add-Paths, routers may advertise paths that they have not
   selected as best, and that they are thus not using for traffic
   forwarding.  This is generally not an issue if encapsulation is used
   in the AS as described in [RFC4364] and all forwarding decisions,
   including those made by the tunnel egress router, are based on label
   information, i.e. if only the ingress router performs an IP FIB
   lookup.  In this situation, the dataplane path followed by the
   packets is the one intended by the ingress router, and it corresponds
   to the control plane path that the ingress router selected.

   On the other hand, if Add-Paths is used in a network without
   encapsulation, some scenarios can result in forwarding deflection or
   loops.  Such forwarding anomalies already occur without Add-Paths,
   when the routers on the forwarding path do not have a synchronized
   view of the best path.  They will deflect the traffic to their own
   local view of the best path, and, when multiple deflections occur,
   forwarding loops can occur.  With Add-Paths, the issue can be
   exacerbated due to routers advertising non-best paths.  As discussed
   above, encapsulation can help with this issue, but only to the extent
   that it allows downstream routers to forward without an IP FIB
   lookup.

   A first example of such an issue is when the Local-Pref of
   non-primary paths received over iBGP sessions is modified.  The
   ingress router may thus select as best a path that is not preferred
   by the egress, and the egress router will thus deflect the traffic.
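
   The Local-Pref example above can be sketched in a few lines of
   Python.  The path names P1 and P2, the Local-Pref values, and the
   best_path helper are hypothetical; the sketch models only the
   Local-Pref step of the decision process and assumes all values are
   distinct.

```python
def best_path(local_prefs):
    """Return the path with the highest Local-Pref (values distinct)."""
    return max(local_prefs, key=local_prefs.get)


# View of the egress router: P1 is its best path, P2 is a non-best
# path that it nevertheless advertises with Add-Paths.
egress_view = {"P1": 200, "P2": 100}

# View of the ingress router: a hypothetical import policy on the iBGP
# session raises the Local-Pref of the non-primary path P2.
ingress_view = dict(egress_view)
ingress_view["P2"] = 300

ingress_choice = best_path(ingress_view)
egress_choice = best_path(egress_view)

# Without encapsulation, the egress router performs its own IP lookup
# and forwards on its own best path, so a mismatch means deflection.
deflected = ingress_choice != egress_choice
print(ingress_choice, egress_choice, deflected)
```

   In this sketch the ingress router selects P2 while the egress router
   still forwards on P1, which is the deflection the text describes.
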

   Another example is when the best path is selected based on a
   tie-breaking rule.  When the ingress and the egress base their path
   selection on the router-id of the neighbor that advertised the path
   to them, the result may be different for each of them.  This specific
   issue is described and solved in [draft-pmohapat].

   In general, if the network forwards on a hop-by-hop basis and does
   not make use of encapsulation, it is necessary to advertise the best
   path.  The second path that is advertised should be the second best
   path using one of the path selection modes described previously.
   Additional paths are discretionary, with the presumption that they
   can be forwarded on a hop-by-hop basis.

   Similarly, if the network uses encapsulation, the best path should be
   advertised for consistency, and the second best path should be
   advertised for fast routing convergence.  All further paths and their
   choice for selection are completely discretionary; the destination is
   presumed to be reachable via encapsulation.

5.5.  Interactions with Route Filtering

   As noted in the previous section, modification of advertised paths
   may lead to inconsistent route selection.  This is true even when the
   Add-Paths feature is not in use.  Similarly, route filtering, when
   used carelessly for iBGP, may result in inconsistent route selection
   in an AS, with the possibility of introducing forwarding loops.

   The Add-Paths feature has additional considerations for route
   filtering, since the receiver of multiple paths is unable to
   determine by inspection of the received NLRI which path corresponds
   to the sender's active path for the prefix.  The sender SHOULD send
   the best path when sending multiple paths for a destination.  The
   receiver must take care, when rejecting destinations, not to discard
   the best path while permitting alternate paths.
   A failure on the part of either the sender or the receiver to
   distribute or receive the best path may result in inconsistent route
   selection.

   An implementation MAY support the ability to suppress advertisement
   of all alternate paths when the export policy would otherwise
   suppress the best path.

5.6.  Routing Churn

   As noted in section 3.3, using Add-Paths between IBGP peers can help
   to reduce routing churn with EBGP peers.  This benefit does, however,
   come at the cost of potentially increased churn between the IBGP
   Add-Paths peers.  In a non-Add-Paths deployment, a change in the
   preference order of non-best paths requires no updates to be sent to
   peers.  But when a router has Add-Paths peers, changes in non-best
   path preference may no longer be invisible, and increased route churn
   may be observable.  Choosing the right path selection mode and
   parameters (for example, not setting N unnecessarily large in the
   Add-N mode) is important to minimizing this additional churn.

6.  Security Considerations

   TBD

7.  Acknowledgments

   This document was prepared using 2-Word-v2.0.template.dot.

8.  Contributors

   Virginie Van den Schrieck
   Email: v.vandenschrieck@gmail.com

   Rohit Gupta
   Apple Inc
   Email: rxgupta@apple.com

9.  IANA Considerations

   TBD

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2.  Informative References

   [Add-Paths]  Walton, D., Retana, A., Chen, E., and J. Scudder,
              "Advertisement of Multiple Paths in BGP",
              draft-ietf-idr-add-paths-07, June 17, 2012.

   [draft-pmohapat]  Mohapatra, P., Fernando, R., Filsfils, C., and R.
              Raszuk, "Fast Connectivity Restoration Using BGP
              Add-path", draft-pmohapat-idr-fast-conn-restore-02.txt,
              Oct 3, 2011.

   [oscillation]  Walton, D., Retana, A., Chen, E., and J. Scudder, "BGP
              Persistent Route Oscillation Solutions",
              draft-walton-bgp-route-oscillation-stop-06.txt, June 14,
              2012.

   [Basu-ibgp-osc]  Basu, A., Ong, C., Rasala, A., Shepherd, B., and G.
              Wilfong, "Route oscillations in iBGP with Route
              Reflection", Sigcomm 2002.

   [BGP-ORR]  Raszuk, R., Cassar, C., Aman, E., and B. Decraene, "BGP
              Optimal Route Reflection",
              draft-raszuk-bgp-optimal-route-reflection-01, March 11,
              2011.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

Appendix A.  Other Path Selection Modes

A.1.  Advertise Neighbor-AS Group Best Path

   [oscillation] proposes that a router group its paths based on the
   neighbor AS from which they were learned, and advertise the best path
   in each of those groups.

   The control plane stress induced by this solution is the computation
   of the per-neighbor-AS path groups and the application of the
   decision process to each of them.  The control plane load is bounded
   by the number of neighboring ASes advertising a prefix, which cannot
   be known a priori.

   Path optimality and backup path optimality are not guaranteed, as the
   paths advertised are not all the AS-wide preferred paths.  Backup
   path availability is not guaranteed.  Indeed, if only one AS
   advertises this prefix, even on multiple eBGP sessions, only one of
   the paths may be selected and advertised.

A.2.  Best LocPref/Second LocPref

   This selection method consists of grouping the paths by Local
   Preference.  A router sends to its peers all paths with the highest
   Local Preference.  If there is only a single path with the highest
   Local Preference, it also sends all paths with the second best Local
   Preference.

   This method ensures that all routers know all paths with the best
   local preference.
   As local preferences are often related to the type of peering of the
   peer the path comes from, this ensures that in case of failure,
   routers have a backup path of equivalent quality.  This prevents, for
   example, a router from switching temporarily to a peer path while an
   alternate path from a customer is available but hidden at the border
   of the AS.  Such a situation could result in a temporary withdrawal
   of the prefix on some eBGP sessions when the router selects the path
   via the peer.

   The advertisement of the second Local Preference occurs when there is
   no alternate path with the same quality as the best path.  This way,
   fast convergence is still ensured.  The backup path is optimal, as it
   has the second AS-wide preference, which becomes the AS-wide best
   preference upon failure of the primary one.

   Sending all the paths with a given Local Preference also has a
   positive impact on routing optimality.  Indeed, this allows border
   routers to have increased path visibility and to choose their best
   path based on their own criteria.

   The computational cost of this solution is reduced when there are
   several paths with the best local preference.  In this case, it is
   sufficient to stop the decision process after the first rule to have
   the set of paths to be advertised.  When it is necessary to advertise
   the paths with the second local preference, the additional cost is to
   apply the first rule of the decision process a second time, which is
   still reasonable.  The memory cost depends on the number of paths
   with the best local preference.

A.3.  Advertise Paths at decisive step -1

   When the goal is to provide fast recovery by advertising candidate
   post-reconvergence paths, one can choose to stop the decision process
   just before the step where only one path remains.  If the decision
   process comes to the IGP tie-break, all remaining paths are
   advertised.
   This way, routers advertise as many paths as possible with a quality
   as similar as possible.

   This path selection is an intermediary solution between the two
   preceding ones.  Here, instead of stopping the decision process at
   the local preference step or the IGP step, we stop it before the rule
   that removes the best potential backup paths.  This way, we minimize
   the number of paths to advertise while guaranteeing the presence of a
   backup path.  Primary and backup path optimality is ensured, as all
   paths with the same AS-wide preference as the best paths are included
   in the set of paths advertised.

Authors' Addresses

   Jim Uttaro
   AT&T
   200 S. Laurel Avenue
   Middletown, NJ 07748 USA
   Email: uttaro@att.com

   Pierre Francois
   Institute IMDEA Networks
   Avda. del Mar Mediterraneo, 22
   Leganese 28918
   ES
   Email: pierre.francois@imdea.org

   Pradosh Mohapatra
   Cumulus Networks
   Email: pmohapat@cumulusnetworks.com

   Roberto Fragassi
   Alcatel-Lucent
   600 Mountain Avenue
   Murray Hill, New Jersey
   Email: roberto.fragassi@alcatel-lucent.com

   Adam Simpson
   Alcatel-Lucent
   600 March Road
   Ottawa, Ontario K2K 2E6
   Canada
   Email: adam.simpson@alcatel-lucent.com

   Keyur Patel
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA 95134 USA
   Email: keyupate@cisco.com

   Jeffrey Haas
   Juniper Networks
   1194 N. Mathilda Ave.
   Sunnyvale, CA 94089
   USA
   Email: jhaas@juniper.net