Network Working Group                                        J. Uttaro
Internet Draft                                                    AT&T
Intended status: Standards Track                   V. Van den Schrieck
Oct 22, 2010                                               P. Francois
Expires: Apr 22, 2011                                        UCLouvain
                                                           R. Fragassi
                                                            A. Simpson
                                                        Alcatel-Lucent
                                                          P. Mohapatra
                                                         Cisco Systems


      Best Practices for Advertisement of Multiple Paths in BGP
            draft-uttaro-idr-add-paths-guidelines-03.txt

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

This Internet-Draft will expire on April 22, 2011.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the BSD License.

Abstract

Add-Paths is a BGP enhancement that allows a BGP router to advertise multiple distinct paths for the same prefix/NLRI. This provides a number of potential benefits, including reduced routing churn, faster convergence and better load sharing.

This document provides recommendations to implementers of Add-Paths so that network operators have the tools needed to address their specific applications and to manage the scalability impact of Add-Paths. A router implementing Add-Paths may learn many paths for a prefix and must decide which of these to advertise to peers. This document analyses different algorithms for making this selection and provides recommendations based on the target application.

Table of Contents

   1. Introduction
   2. Terminology
   3. Add-Paths Applications
      3.1. Fast Connectivity Restoration
      3.2. Load Balancing
      3.3. Churn Reduction
      3.4. Suppression of MED-Related Persistent Route Oscillation
   4. Implementation Guidelines
      4.1. Capability Negotiation
      4.2. Receiving Multiple Paths
      4.3. Advertising Multiple Paths
         4.3.1. Path Selection Modes
            4.3.1.1. Advertise All Paths
            4.3.1.2. Advertise N Paths
            4.3.1.3. Advertise All AS-Wide Best Paths
            4.3.1.4. Advertise All AS-Wide Best and Next-Best Paths (Double AS Wide)
         4.3.2. Derived Modes from Bounding the Number of Advertised Paths
   5. Scalability and Routing Consistency Considerations
      5.1. Scalability Considerations
      5.2. Routing Consistency Considerations
      5.3. Consistency between Advertised Paths and Forwarding Paths
   6. Security Considerations
   7. IANA Considerations
   8. Conclusions
   9. References
      9.1. Normative References
      9.2. Informative References
   10. Acknowledgments
   Appendix A. Other Path Selection Modes
      A.1. Advertise Neighbor-AS Group Best Path
      A.2. Best LocPref/Second LocPref
      A.3. Advertise Paths at decisive step -1

1. Introduction

The BGP Add-Paths capability enhances current BGP implementations by allowing a BGP router to exchange more than one path for the same destination/NLRI with its BGP peers. The base BGP standard [RFC4271] does not provide such a capability. If a BGP router learns multiple paths for the same NLRI (from multiple peers), it selects only one as its best path and advertises only that best path to its peers. The primary goal of Add-Paths is to increase the visibility of paths within an iBGP system. This improves robustness in case of failure, reduces the number of BGP messages exchanged during such an event, and offers the potential for faster re-convergence. Through careful selection of the paths to be advertised, Add-Paths can also prevent routing oscillations.

The purpose of this document is to provide the necessary recommendations to the implementers of Add-Paths so that network operators have the tools needed to address their specific applications and to manage the scalability impact of Add-Paths while maintaining routing consistency. A router implementing Add-Paths may learn many paths for a prefix and must decide which of these to advertise to peers. This document analyses different algorithms for making this selection and provides recommendations based on the target application.

2. Terminology

In this document, the following terms are used:

Add-Paths peer: A peer with which the local system has agreed to receive and/or send NLRI with path identifiers.

Primary path: A path toward a prefix that is considered a best path by the BGP decision process [RFC4271] and is actively used for forwarding traffic to that prefix. A router may have multiple primary paths for a prefix if it implements multipath.

Backup path: One of the non-best paths toward a prefix.

Optimal backup path: The backup path that will be selected as the new best path for a prefix when all primary paths are removed or withdrawn.

AS-Wide preferred paths: All paths that are considered best when applying the rules of the BGP decision process up to the IGP tie-break.

Path diversity: The property that a router has several paths for a given prefix and each one is associated with a unique BGP next hop (and BGP router).

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3. Add-Paths Applications

[draft-pmohapat] presents the applications that would benefit from the advertisement of multiple paths in iBGP. They are summarized in the following subsections.

3.1. Fast Connectivity Restoration

With the dissemination of backup paths, fast connectivity restoration and convergence can be achieved. If a router has a backup path, it can directly select that path as best upon failure of the primary path. This minimizes packet loss in the data plane. Sending multiple paths in iBGP allows routers to receive backup paths when path visibility is not sufficient with classical BGP. This is especially useful when Route Reflection is used.

Consider a network such as the one depicted in Figure 1 and suppose that none of the routers support Add-Paths. From AS1 there are 3 paths (A, B and C) to a particular destination XYZ: two of the paths are via AS3 and one is via AS2. In this example, Path A is preferred over Path B because Path A has a lower MED (Multi-Exit Discriminator).

AS1 uses a route reflector, RR1, to reduce the scale of its iBGP mesh. During steady state, RR1 knows about (has in its RIB-IN) only 2 of the 3 paths. Router B suppresses the advertisement of its best external path (B) to RR1, an iBGP peer, because its best overall path is A, learnt from router A (via RR1). RR1 chooses path A as the overall best because its IGP cost to router A is lower than its IGP cost to router C. During normal conditions, router D has even less knowledge of the available paths to destination XYZ: it knows only about path (A), the best path from RR1's perspective.
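The steady-state outcome above can be reproduced with a toy model of the relevant decision steps. The following is a minimal Python sketch and is not part of the draft. The MED and IGP cost values are hypothetical, the earlier decision steps (LOCAL_PREF, AS_PATH length, ORIGIN) are assumed to tie, and the names `Path` and `best_path` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str          # label from Figure 1 ("A", "B", "C")
    neighbor_as: int   # neighboring AS the path was learned from
    med: int           # MULTI_EXIT_DISC (hypothetical value)
    ebgp: bool         # True if learned over eBGP by the evaluating router
    igp_cost: int      # IGP cost to the exit point (hypothetical value)


def best_path(paths):
    """Toy decision process: MED, then eBGP over iBGP, then lowest IGP cost.

    Earlier steps (LOCAL_PREF, AS_PATH length, ORIGIN) are assumed to tie.
    """
    # MED step: drop a path if a path from the same neighbor AS has a lower MED.
    survivors = [p for p in paths
                 if not any(q.neighbor_as == p.neighbor_as and q.med < p.med
                            for q in paths)]
    # Prefer eBGP-learned paths over iBGP-learned ones.
    if any(p.ebgp for p in survivors):
        survivors = [p for p in survivors if p.ebgp]
    # Lowest IGP cost to the exit point wins (no further tie-breaks modeled).
    return min(survivors, key=lambda p: p.igp_cost)


# Router B: path A is reflected by RR1 (iBGP), path B is its own eBGP path.
at_router_b = [Path("A", 3, 10, False, 10), Path("B", 3, 20, True, 0)]
print(best_path(at_router_b).name)   # A: path B loses at the MED step

# RR1 only hears about A and C, because router B's best path is iBGP-learned.
at_rr1 = [Path("A", 3, 10, False, 5), Path("C", 2, 0, False, 8)]
print(best_path(at_rr1).name)        # A: different neighbor ASes, so IGP cost decides
```

Removing path A from `at_router_b` makes router B select its external path B. Router B does exactly this once path A is withdrawn.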

   ========        =====================
   =  +---+        +---+           +---+
   =  |RTR|________|RTR|           |RTR|
   =  | E |        | A |           | C |\ <-Path C
   =  +---+Path A->+---+    AS1    +---+ \
   =      =        =   \           /   =  \ =======
   =      =        =    \         /    =    +---+ =
   =      =        =     \       /     =    |RTR| =
   =      =        =      \     /      =    | G | =
   = AS3  =        =       +---+       =    +---+ =
   =      =        =       |RR |       =    =     =
   =      =        =       | 1 |       =    = AS2 =
   =      =        =       +---+       =    =======
   =      =        =      /     \      =
   =      =        =     /       \     =
   =      =        =    /         \    =
   =      =        =   /           \   =
   =  +---+Path B->+---+           +---+
   =  |RTR|________|RTR|           |RTR|
   =  | F |        | B |           | D |
   =  +---+        +---+           +---+
   ========        =====================

                  Figure 1: Example Topology

Consider now the steps required to restore traffic from router D to destination XYZ when the link between Router A and Router E fails.

1. Router A sends a BGP UPDATE message withdrawing its advertisement of path (A).

2. RR1 receives the withdrawal and propagates it to its other client peers, routers B, C and D.

3. When router B receives the withdrawal of path (A), it reruns its decision process and selects path (B) as its new best path. Router B advertises path (B) to RR1.

4. RR1 reruns its decision process and selects path (B) as its new best path. RR1 advertises path (B) to client peers A, C and D.

5. Router D reruns its decision process, determines path (B) to be the best path, and updates its forwarding table. After this step, traffic from router D to destination XYZ is restored (the traffic path has changed from A to B).

With the use of Add-Paths, the convergence time for the above path failure example can be reduced considerably. The main reason for the improvement is that Add-Paths allows router D to be aware of more than one path to destination XYZ prior to the failure of the best path (A).
In steady state (with no failures), router B decides, as before, that path (A) is its best path, but it also advertises path (B) to RR1. Path (B) happens to be both its next-best overall path and its best "external" path. With Add-Paths, RR1 now has knowledge of all 3 paths to destination XYZ and can advertise more than just the best path (A) to its peers. Suppose RR1 is allowed to advertise up to 3 paths for destination XYZ. In this case, with the appropriate path selection algorithm, it will advertise paths (A), (B) and (C) to router D. Now consider again the scenario where the link between Router A and Router E fails. With Add-Paths, fewer steps are required to achieve re-convergence:

1. Router A sends a BGP UPDATE message withdrawing its advertisement of path (A).

2. RR1 receives the withdrawal and propagates it to its other client peers, routers B, C and D.

3. Router D receives the withdrawal, reruns the decision process and updates the forwarding entry for destination XYZ.

3.2. Load Balancing

Increased path diversity allows routers to install several paths in their forwarding tables in order to load balance traffic across those paths.

3.3. Churn Reduction

When Add-Paths is used in an AS, the availability of additional backup paths means that failures can be recovered locally with much less path exploration in iBGP. As a result, fewer UPDATE messages are disseminated in eBGP. When the preferred backup path is the post-convergence path, churn is minimized.

3.4. Suppression of MED-Related Persistent Route Oscillation

As described in [oscillation], Add-Paths is a valuable tool for stopping persistent route oscillations caused by comparing paths based on MED in topologies where route reflectors or the confederation structure hide some paths.
With the appropriate path selection algorithm, Add-Paths stops these route oscillations. The route reflector or the confederation border router consistently advertises the same set of paths, so the routers receiving this set make stable decisions about the best path.

4. Implementation Guidelines

In this section, we discuss recommendations for the implementation of Add-Paths. We first discuss the BGP capability negotiation related to the use of Add-Paths among iBGP peers, as well as its configuration aspects. Next, we provide an overview of RIB-IN management issues for the support of Add-Paths. Finally, we discuss the properties of various algorithms for selecting the paths to be advertised by a BGP speaker supporting Add-Paths. The goal of this last section is to recommend, in future revisions of the draft, a default path selection mode, as well as the minimal set of modes to be supported by a BGP speaker supporting Add-Paths.

4.1. Capability Negotiation

   +---+           +---+
   |RTR|___________|RTR|
   | A |  <-BGP->  | B |
   +---+           +---+

          Figure 2: BGP Peering Example

In Figure 2, consider router A receiving multiple paths per NLRI from peer B for a particular address family (AFI=x, SAFI=y). For this to happen, the BGP capabilities advertisements during session setup must indicate that peer B wants to send multiple paths for AFI=x, SAFI=y and that router A is willing to receive multiple paths for AFI=x, SAFI=y. Similarly, for router A to send multiple paths per NLRI to peer B for AFI=x, SAFI=y, the BGP capabilities advertisements must indicate two things. Router A must want to send multiple paths for AFI=x, SAFI=y, and peer B must be willing to receive multiple paths for AFI=x, SAFI=y. Refer to [Add-Paths] for details of the Add-Paths capabilities advertisement.
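The two directional conditions above reduce to a simple check per address family. The following Python sketch assumes the Send/Receive encoding that was later standardized for the ADD-PATH capability in RFC 7911 (1 = receive, 2 = send, 3 = both). This draft itself defers encoding details to [Add-Paths]. The function and variable names are hypothetical.

```python
# Send/Receive values of the ADD-PATH capability as later standardized in RFC 7911.
RECEIVE = 1
SEND = 2
BOTH = 3  # equals RECEIVE | SEND, so the values can be tested as bit flags

IPV4_UNICAST = (1, 1)  # AFI 1 (IPv4), SAFI 1 (unicast)
IPV6_UNICAST = (2, 1)  # AFI 2 (IPv6), SAFI 1 (unicast)


def may_send_multiple(sender: dict, receiver: dict, afi_safi: tuple) -> bool:
    """True if `sender` may send multiple paths per NLRI to `receiver` for afi_safi.

    Each dict maps (AFI, SAFI) to the Send/Receive value that router advertised;
    a missing entry means no Add-Paths capability for that address family.
    """
    wants_to_send = bool(sender.get(afi_safi, 0) & SEND)
    willing_to_receive = bool(receiver.get(afi_safi, 0) & RECEIVE)
    return wants_to_send and willing_to_receive


# Figure 2: router A is receive-only for IPv4 unicast; peer B offers both directions.
router_a = {IPV4_UNICAST: RECEIVE, IPV6_UNICAST: BOTH}
peer_b = {IPV4_UNICAST: BOTH}

print(may_send_multiple(peer_b, router_a, IPV4_UNICAST))  # True
print(may_send_multiple(router_a, peer_b, IPV4_UNICAST))  # False
print(may_send_multiple(peer_b, router_a, IPV6_UNICAST))  # False
```

Each direction is evaluated independently. This is why router A can receive multiple paths from peer B without ever sending them.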

The capabilities of the local router shall be configurable per peer and per address family, with the ability to configure send-only or receive-only operation. The default mode of operation shall be to both send and receive.

4.2. Receiving Multiple Paths

Under standard BGP behavior, a BGP router may receive an advertisement of an NLRI and path from a specific peer, and that peer may later advertise the same NLRI with different path information (e.g., a different NEXT_HOP and/or different path attributes). In that case, the new path effectively overwrites the existing path.

When Add-Paths has been negotiated with the peer, the newly advertised path should be stored in the RIB-IN along with all of the paths previously advertised (and not withdrawn) by the peer.

When the Add-Paths receive capability for (AFI=x, SAFI=y) has been negotiated with a peer, all advertisements and withdrawals of NLRI within that address family by that peer shall include a path identifier, as described in [Add-Paths]. The path identifier has no significance to the receiving peer other than identifying a route in combination with the NLRI:

- If the combination of NLRI and path identifier in an advertisement from a peer is unique (does not match an existing route in the RIB-IN from that peer), then the route is added to the RIB-IN.

- If the combination of NLRI and path identifier in a received advertisement is the same as an existing route in the RIB-IN from the peer, then the new route replaces the existing one.

- If the combination of NLRI and path identifier in a received withdrawal matches an existing route in the RIB-IN from the peer, then that route shall be removed from the RIB-IN.

A BGP UPDATE message from a peer sending NLRI with path identifiers may advertise and withdraw more than one NLRI belonging to one or more address families. In this case, Add-Paths may be supported for some of the address families and not others.
In this situation, the receiving BGP router should not expect that all of the path identifiers in the UPDATE message will be the same.

4.3. Advertising Multiple Paths

[Add-Paths] specifies how to encode the advertisement of multiple paths towards the same NLRI over an iBGP session. It does not specify which set of paths should be advertised. This section describes and compares four path selection algorithms. These four algorithms are considered to be the most useful across the widest range of deployment scenarios. The list of possible path selection algorithms is of course much larger. For the interested reader, Appendix A provides information about other path selection modes that were considered in earlier versions of this document.

When comparing any two path selection algorithms, the following factors should be taken into account:

Control Plane Load: When a router receives multiple paths for a prefix from an iBGP client, it has to store more paths in its Adj-RIBs-In.

Control Plane Stress: Coping with multiple iBGP paths has two implications for the computation that a router has to perform. First, it has to compute the paths to send to its peers, i.e., more than the best path. Second, it has to handle the potential churn related to the exchange of those multiple paths.

MED/IGP oscillations: BGP sometimes suffers from routing oscillations when the physical topology differs from the logical topology, or when the MED attribute is used. These oscillations are caused by the limited path visibility that results when a single path is advertised and Route Reflection is used. Increasing the path visibility by advertising multiple paths can help solve this issue.

Path optimality: When a single path is advertised, border routers do not always receive the optimal path.
For example, Route Reflectors send a single path chosen based on their own IGP tie-break. Increasing path visibility also helps routers learn the path that is best suited for them with respect to their own IGP tie-break.

Backup path optimality: Advertising multiple paths gives routers the opportunity to have a backup path. However, some backup paths are better than others. When a link failure occurs, BGP re-convergence is straightforward if the router already knows its post-convergence path, and traffic is less affected by the transient use of non-best forwarding paths.

Convergence time: Advertising multiple paths in iBGP affects the convergence time of the BGP system. More paths need to be exchanged, but the routing information is propagated faster. With increased path visibility, there is less path exploration during convergence. In addition, the availability of backup paths reduces convergence time in case of failure.

Target application: The number of paths to advertise for a prefix depends on the application type. For fast connectivity restoration, it may be sufficient to advertise only 2 paths to a peer so that it has the best path and the optimal backup path. For load balancing, it may be desirable to advertise more paths, but including the optimal backup path in the set may be less critical. For route oscillation elimination, all group-best paths for a prefix must be advertised.

4.3.1. Path Selection Modes

The following subsections describe the four main path selection modes considered in this draft. Each mode is considered either MANDATORY or OPTIONAL. A MANDATORY mode should be present in any implementation that claims compliance with [Add-Paths]. An OPTIONAL mode may be supported by some but not all implementations.

The path selection mode and any parameters applicable to the mode MUST be configurable per AFI/SAFI and per peer, and SHOULD be configurable per prefix.

4.3.1.1. Advertise All Paths

A simple rule for advertising multiple paths in iBGP is to advertise all received paths to iBGP peers, provided they pass export filters. This solution is easy to implement. The drawback is that all those paths need to be stored by all routers that receive them, which can be quite expensive. If prefix P is learned by N border routers and iBGP sessions form a full mesh, then all routers have N paths in their Adj-RIBs-In. If Route Reflection is used and each client is connected to 2 Route Reflectors, a client may learn up to 2*N paths.

This solution gives complete path visibility to all routers, which limits churn and loss of connectivity in case of failure. Each router can select its optimal primary path and switch to its optimal backup path in case of failure.

However, as more paths are exchanged, the number of BGP messages disseminated during the initial iBGP convergence can be high, and convergence may be slower.

This rule prevents routing oscillations, because a router does not need to withdraw a previously advertised path when its best path changes.

Routers that support Add-Paths MAY support this path selection mode. It is an OPTIONAL mode.

4.3.1.2. Advertise N Paths

Another solution is for a router to advertise a maximum of N paths to iBGP peers. Here, the computational cost lies in selecting the N paths, because the router must rank the paths in order to advertise the most interesting ones. One way for a router to select N paths is to run its decision process N times.
At each iteration of the process, a path is eligible for consideration only if it was not selected in a previous iteration and does not share a NEXT_HOP or BGP Identifier (or ORIGINATOR_ID) with any previously selected path. The memory cost is bounded: a router receives a maximum of N paths for each prefix from each peer. With N equal to 2, all routers know at least two paths (provided that at least two exist) and can provide local recovery in case of failure. If multipath routing is to be deployed in the AS, N can be increased to provide more alternate paths to the routers.

Path optimality and backup path optimality are not guaranteed. However, because path diversity is better, the next hops of the chosen primary and backup paths are more likely to be closer to the router than with classical BGP.

This solution helps to reduce routing oscillations, but not in all cases. Path visibility is still constrained by the maximum number of paths, and some configurations still lead to routing oscillations.

Routers that support Add-Paths MUST support this path selection mode. The default value of N must be 2. The value of N MUST be configurable and MAY be upper-bounded by an implementation.

The default value of 2 ensures the availability of a backup path (if 2 or more paths have been received) while keeping the impact on memory and churn to a minimum. If Add-N with N equal to 2 is insufficient to meet another objective (e.g., load sharing or avoiding MED/IGP oscillation), and N is configurable, a large enough value of N can always be selected to meet that objective.

4.3.1.3. Advertise All AS-Wide Best Paths

Another choice is to advertise all paths with the same AS-wide preference [Basu-ibgp-osc], i.e., the paths that all routers would select based on the rules of the decision process that are not router-dependent (the LOCAL_PREF, AS_PATH length, ORIGIN and MED rules).
Thus, for a given router, those paths differ only by the IGP cost to the next hop or by the tie-breaking rules.

The computational cost is reduced, because a router only has to send the paths that remain before the IGP tie-breaking rule is applied. However, it is difficult to predict how many paths will be stored. The number depends on how many eBGP sessions advertise this prefix with the best AS-wide preference.

With this rule, the routing system is optimal: all routers can choose their best path (or best paths if multipath is used) based on their router-specific preferences, i.e., the IGP cost to the next hop. Hot-potato routing is respected. MED oscillations are also prevented, because path visibility among the AS-wide preferred paths is total.

The existence of a backup path is not guaranteed. If only one path with the AS-wide best attributes exists, no backup path is disseminated. However, if such a backup path exists, it is optimal, because it has the same AS-wide preference as the primary path.

Routers that support Add-Paths MAY support this path selection mode. It is an OPTIONAL mode.

4.3.1.4. Advertise All AS-Wide Best and Next-Best Paths (Double AS Wide)

This variant of "Advertise All AS-Wide Best Paths" propagates more paths within the iBGP system in exchange for better availability of post-convergence alternate paths and greater routing stability. A BGP speaker running this mode selects for advertisement its AS-Wide best paths. It also selects all the AS-Wide best paths obtained when those first paths are removed from consideration.

Under this mode, a BGP speaker knows either multiple AS-Wide best paths, or the AS-Wide best path plus all the second AS-Wide best paths. This ensures routing optimality and backup path availability. Note that each BGP node in an AS supporting this mode will know the post-convergence paths.
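The router-independent part of the decision process used by these two modes can be sketched as follows. This Python sketch is an illustration under stated assumptions:

- It models only the LOCAL_PREF, AS_PATH length, ORIGIN and MED steps.
- MED is compared only between paths from the same neighboring AS, as in [RFC4271].
- Vendor options that change how MED is compared are ignored.
- The names are hypothetical, and the addresses come from a documentation range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    local_pref: int
    as_path_len: int
    origin: int        # 0 = IGP, 1 = EGP, 2 = INCOMPLETE; lower is preferred
    neighbor_as: int
    med: int


def as_wide_best(paths):
    """Router-independent decision steps, up to and including the MED rule."""
    if not paths:
        return []
    top_lp = max(p.local_pref for p in paths)
    cands = [p for p in paths if p.local_pref == top_lp]
    shortest = min(p.as_path_len for p in cands)
    cands = [p for p in cands if p.as_path_len == shortest]
    best_origin = min(p.origin for p in cands)
    cands = [p for p in cands if p.origin == best_origin]
    # MED is compared only between paths from the same neighboring AS.
    return [p for p in cands
            if not any(q.neighbor_as == p.neighbor_as and q.med < p.med
                       for q in cands)]


def double_as_wide(paths):
    """AS-wide best paths plus the AS-wide best of the paths left over."""
    first = as_wide_best(paths)
    rest = [p for p in paths if p not in first]
    return first + as_wide_best(rest)


paths = [
    Path("192.0.2.1", 100, 2, 0, 65001, 10),
    Path("192.0.2.2", 100, 2, 0, 65001, 20),  # loses to 192.0.2.1 on MED
    Path("192.0.2.3", 100, 2, 0, 65002, 50),  # MED not compared: other neighbor AS
    Path("192.0.2.4", 90, 1, 0, 65003, 0),    # lower LOCAL_PREF
]
print([p.next_hop for p in as_wide_best(paths)])
# ['192.0.2.1', '192.0.2.3']
print([p.next_hop for p in double_as_wide(paths)])
# ['192.0.2.1', '192.0.2.3', '192.0.2.2']
```

The path via 192.0.2.4 appears in neither set because it loses at the LOCAL_PREF step. The path via 192.0.2.2 becomes a second AS-wide best path once 192.0.2.1 and 192.0.2.3 are set aside.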

The computational complexity of this mode is relatively low. It requires running the usual BGP decision process up to and including the MED rule, and the paths remaining after that step form the AS-Wide best paths. Next, the same selection, again up to and including the MED rule, is run over the paths that are not in the set of AS-Wide best paths.

The number of paths for a prefix p known by a given router of the AS is the number of AS-Wide best and second AS-Wide best paths found at the borders of the AS.

This mode avoids MED oscillations for both the primary and the alternate paths it selects.

Routers that support Add-Paths MAY support this path selection mode. It is an OPTIONAL mode.

4.3.2. Derived Modes from Bounding the Number of Advertised Paths

For some of the modes discussed in Section 4.3.1, the number of paths selected by the algorithm (M) cannot be predicted in advance and depends on factors such as network topology. For such modes, implementations MAY support the ability to limit the number of advertised paths to some value N that is less than M.

Note that the resulting derived mode may no longer have the properties stated in Section 4.3.1, which assume N=M. This is particularly true for the MED oscillation avoidance property. Such bounds therefore need to be considered carefully in deployments where MED oscillation avoidance is a key goal of deploying Add-Paths. If fast recovery is the main objective, then it is reasonable and sufficient to set N to 2. If the main goal is improved load balancing, then limiting N to the number of ECMP paths supported by the forwarding planes of the receiving routers is also a reasonable practice.

5. Scalability and Routing Consistency Considerations

Introducing Add-Paths into a network can have important implications for nodal and network scalability, and for routing consistency and correctness.

5.1. Scalability Considerations

In terms of scalability, advertising multiple paths per prefix requires more memory and state than the current behavior of advertising the best path only. A BGP speaker that does not implement Add-Paths keeps per-neighbor send state in its prefix data structure to record whether the prefix has been advertised to each neighbor. With Add-Paths, this information has to be replicated for each path that is advertised. Let the "send state" size per prefix be 's' bytes, the number of neighbors be 'n', and the number of paths being advertised be 'p'. The current memory requirement for BGP "send state" is then n * s bytes per prefix; with Add-Paths, it becomes n * s * p bytes per prefix. In practice, implementation optimizations similar to attribute sharing may reduce this value. Receiving multiple paths per prefix also requires more memory and state, since each path is a separate entry in the Adj-RIBs-In.

5.2. Routing Consistency Considerations

As discussed in previous sections, Add-Paths can help routers select better paths. It can also help deal with certain route oscillation conditions arising from incomplete knowledge of the available paths. However, depending on the path selection algorithm and how it is used, Add-Paths can introduce routing inconsistencies of its own. If the BGP routers within an AS do not make consistent routing decisions about how to reach a particular destination, route oscillations may occur, and these oscillations may result in traffic loss.
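The Advertise N Paths mode can be configured with different bounds for different peers. One way to keep the per-peer advertisements consistent in that case is to derive every peer's set from one strict ranking of paths. The Python sketch below is a hypothetical illustration. It ranks paths by repeated best-path selection with the NEXT_HOP and router-identifier exclusion described in Section 4.3.1.2. It then gives each peer a prefix of that single ranking, so the set sent to a peer bounded at N is always contained in the set sent to a peer bounded at M > N. The `pref` field stands in for the full BGP decision process and is assumed to be distinct for each path. All names and addresses are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    router_id: str   # BGP Identifier, or ORIGINATOR_ID when reflected
    pref: int        # hypothetical stand-in for the full decision process (higher wins)


def rank_paths(paths):
    """Order paths by rerunning best-path selection, as in Advertise N Paths.

    After each pick, paths sharing its NEXT_HOP or router identifier are
    no longer eligible, so every ranked path adds next-hop diversity.
    """
    ranking, eligible = [], list(paths)
    while eligible:
        best = max(eligible, key=lambda p: p.pref)  # placeholder decision process
        ranking.append(best)
        eligible = [p for p in eligible
                    if p.next_hop != best.next_hop and p.router_id != best.router_id]
    return ranking


def advertise_n(ranking, n):
    """Paths to advertise to a peer configured for at most n paths."""
    return ranking[:n]


paths = [
    Path("192.0.2.1", "10.0.0.1", 300),
    Path("192.0.2.1", "10.0.0.9", 250),  # same NEXT_HOP as the best path: skipped
    Path("192.0.2.2", "10.0.0.2", 200),
    Path("192.0.2.3", "10.0.0.3", 100),
]
ranking = rank_paths(paths)
to_peer_n = advertise_n(ranking, 2)   # N = 2, the default
to_peer_m = advertise_n(ranking, 3)   # M = 3
print([p.next_hop for p in to_peer_n])   # ['192.0.2.1', '192.0.2.2']
print(set(to_peer_n) <= set(to_peer_m))  # True
```

If `pref` values could tie, a deterministic tie-break would be needed to keep the ordering strict.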
Optimizing an Add-Paths deployment for scalability may run counter to routing consistency goals; in these circumstances, operators have to decide on the right tradeoff for their particular deployment. For example, the Advertise All Paths mode, if applied to many prefixes, is far from ideal from a scalability perspective, but it does guarantee routing consistency and correctness. The Advertise N Paths mode allows better control over scalability but is susceptible to routing inconsistency. First, if the N paths do not include the best path from each neighbor-AS group, route oscillation cannot be precluded. Second, if the advertising router (e.g., a route reflector) advertises N paths to peer_n and M paths to peer_m, with N < M, care must be taken to ensure that all paths advertised to peer_n are included in the paths advertised to peer_m. This can be assured as long as the advertising router strictly orders all of its paths and advertises to each peer the top paths of that single ordering.

5.3. Consistency between Advertised Paths and Forwarding Paths

When using Add-Paths, routers may advertise paths that they have not selected as best and are therefore not using to forward traffic. If two levels of encapsulation are used in the network, as described in [RFC4364], this is not an issue, because only the ingress router performs a lookup in its BGP-derived FIB. The traffic is encapsulated all the way to the egress link, and no other router on the forwarding path needs to perform a BGP lookup. The data-plane path followed by the packets is the one intended by the ingress router and corresponds to the control-plane path it advertises.

However, in networks that use Add-Paths without double encapsulation, some scenarios can result in forwarding deflections or loops.
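The deflection mechanism can be made concrete with a toy model. The minimal Python sketch below assumes that every router tunnels traffic (one level of encapsulation) directly to the egress router of the path it selected, and that this egress decapsulates the packet and performs its own BGP lookup. The router names, the per-router choices, and the function name trace are hypothetical, and IGP forwarding between routers is not modeled.

```python
from typing import Dict, List, Tuple


def trace(ingress: str, chosen_egress: Dict[str, str]) -> Tuple[List[str], str]:
    """Follow a packet through routers that each perform their own BGP lookup.

    chosen_egress maps each router to the egress router of the path it selected as
    best (hypothetical input). The packet is tunneled to that egress, which
    decapsulates it and looks it up again; a router that selected itself delivers
    the packet on its own eBGP link.
    """
    visited = [ingress]
    current = ingress
    while True:
        egress = chosen_egress[current]
        if egress == current:
            return visited, "delivered"
        if egress in visited:
            # The packet returns to a router it already traversed: forwarding loop.
            visited.append(egress)
            return visited, "loop"
        # Tunnel the packet to the selected egress, which repeats the lookup.
        visited.append(egress)
        current = egress


consistent = {"I": "E1", "E1": "E1", "E2": "E2"}
deflected = {"I": "E1", "E1": "E2", "E2": "E2"}  # E1 advertised a path it does not prefer
looping = {"I": "E1", "E1": "E2", "E2": "E1"}

print(trace("I", consistent))  # -> (['I', 'E1'], 'delivered')
print(trace("I", deflected))   # -> (['I', 'E1', 'E2'], 'delivered')
print(trace("I", looping))     # -> (['I', 'E1', 'E2', 'E1'], 'loop')
```

In the second case, the ingress uses a path that E1 advertised through Add-Paths without preferring it, so E1 deflects the traffic to E2. In the third, two egress routers that each prefer the other's exit create a forwarding loop.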
Such forwarding anomalies already occur without Add-Paths when the routers on the forwarding path do not use the same next hop as the ingress router: they deflect the traffic toward their own next hop, and when multiple deflections occur, forwarding loops can appear. With Add-Paths, the issue can be exacerbated because routers advertise non-best paths, and it can arise even when one level of encapsulation is used: both the ingress and the egress routers perform a BGP lookup, so the egress router can deflect the traffic.

A first example of such an issue arises when the Local-Pref of paths received over iBGP sessions is modified. The ingress router may then select as best a path that the egress router does not prefer, and the egress router will deflect the traffic.

Another example arises when the best path is selected by a tie-breaking rule. When the ingress and the egress routers base their path selection on the router-id of the neighbor that advertised the path to them, they may reach different results. This specific issue is described and solved in [draft-pmohapat].

6. Security Considerations

TBD

7. IANA Considerations

TBD

8. Conclusions

TBD

9. References

9.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2. Informative References

[Add-Paths] Walton, D., Retana, A., Chen, E., and J. Scudder, "Advertisement of Multiple Paths in BGP", work in progress, February 6, 2010.

[draft-pmohapat] Mohapatra, P., Fernando, R., Filsfils, C., and R. Raszuk, "Fast Connectivity Restoration Using BGP Add-path", draft-pmohapat-idr-fast-conn-restore-00.txt (work in progress), September 2008.

[oscillation] Walton, D., Retana, A., Chen, E., and J. Scudder, "BGP Persistent Route Oscillation Solutions", draft-walton-bgp-route-oscillation-stop-03.txt (work in progress), May 10, 2010.
[Basu-ibgp-osc] Basu, A., Ong, C., Rasala, A., Shepherd, B., and G. Wilfong, "Route Oscillations in iBGP with Route Reflection", ACM SIGCOMM 2002.

[RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.

[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, February 2006.

10. Acknowledgments

This document was prepared using 2-Word-v2.0.template.dot.

Appendix A. Other Path Selection Modes

A.1. Advertise Neighbor-AS Group Best Path

[oscillation] proposes that a router group its paths by the neighbor AS from which they were learned and advertise the best path in each group.

The control-plane stress induced by this solution comes from computing the per-neighbor-AS path groups and applying the decision process to each of them. The control-plane load is bounded by the number of neighboring ASes advertising a prefix, which cannot be known a priori.

Neither path optimality nor backup path optimality is guaranteed, as the advertised paths are not necessarily all of the AS-wide preferred paths. Backup path availability is not guaranteed either: if only one neighbor AS advertises the prefix, even over multiple eBGP sessions, only one of the paths may be selected and advertised.

A.2. Best LocPref/Second LocPref

This selection method consists of grouping the paths by Local Preference. A router sends its peers all paths with the highest Local Preference. If there is only a single path with the highest Local Preference, it also sends all paths with the second-highest Local Preference.

This method ensures that all routers know all paths with the best Local Preference. As Local Preference often reflects the type of peering relationship with the neighbor the path comes from, this ensures that, in case of failure, routers have a backup path of equivalent quality.
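The selection rule of this mode can be sketched as follows. This minimal Python sketch assumes a hypothetical path record that carries only an identifier and a Local Preference; all other attributes and the rest of the decision process are ignored, and the function name best_and_second_locpref is illustrative.

```python
from typing import List, NamedTuple


class Path(NamedTuple):
    # Hypothetical minimal path record for illustration.
    path_id: int
    local_pref: int


def best_and_second_locpref(paths: List[Path]) -> List[Path]:
    """Select the paths to advertise in the Best LocPref/Second LocPref mode."""
    if not paths:
        return []
    # Distinct Local Preference values, highest first.
    prefs = sorted({p.local_pref for p in paths}, reverse=True)
    selected = [p for p in paths if p.local_pref == prefs[0]]
    # A lone best-LocPref path has no backup of equal quality: add the next tier too.
    if len(selected) == 1 and len(prefs) > 1:
        selected += [p for p in paths if p.local_pref == prefs[1]]
    return selected


several_best = [Path(1, 200), Path(2, 200), Path(3, 100)]
single_best = [Path(1, 200), Path(2, 100), Path(3, 100), Path(4, 50)]
print([p.path_id for p in best_and_second_locpref(several_best)])  # -> [1, 2]
print([p.path_id for p in best_and_second_locpref(single_best)])   # -> [1, 2, 3]
```

Only the first decision rule is evaluated, once or twice, which is what keeps the computational cost of this mode low.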
Knowing all best-Local-Preference paths prevents, for example, a router from temporarily switching to a path learned from a peer while an alternate path learned from a customer is available but hidden at the border of the AS. Such a situation could result in a temporary withdrawal of the prefix on some eBGP sessions when the router selects the path via the peer.

Paths with the second-highest Local Preference are advertised only when there is no alternate path of the same quality as the best path. This way, fast convergence is still ensured. The backup path is optimal, as it has the second-best AS-wide preference, which becomes the AS-wide best preference upon failure of the primary path.

Sending all the paths with a given Local Preference also has a positive impact on routing optimality, since it gives border routers increased path visibility and lets them choose their best path based on their own criteria.

The computational cost of this solution is low when there are several paths with the best Local Preference: it is sufficient to stop the decision process after the first rule to obtain the set of paths to be advertised. When the paths with the second-highest Local Preference must also be advertised, the additional cost is applying the first rule of the decision process a second time, which is still reasonable. The memory cost depends on the number of paths with the best Local Preference.

A.3. Advertise Paths at decisive step -1

When the goal is to provide fast recovery by advertising candidate post-reconvergence paths, one can choose to stop the decision process just before the step at which only one path would remain. If the decision process reaches the IGP tie-break, all remaining paths are advertised. This way, routers advertise as many paths as possible, with qualities as similar as possible.

This path selection mode is an intermediate solution between the two preceding ones.
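A minimal Python sketch of this mode follows. It assumes a simplified, hypothetical decision process made of the steps that precede the IGP tie-break in [RFC4271] (Local Preference, AS_PATH length, ORIGIN, and eBGP over iBGP), with MED omitted; the path records and function names are illustrative only.

```python
from typing import Callable, List, NamedTuple


class Path(NamedTuple):
    # Hypothetical path record; only the attributes used below are modeled.
    path_id: int
    local_pref: int
    as_path_len: int
    origin: int  # 0 = IGP, 1 = EGP, 2 = INCOMPLETE (lower is preferred)
    is_ebgp: bool


def keep_best(paths: List[Path], key: Callable[[Path], int]) -> List[Path]:
    """Apply one decision step: keep only the paths with the lowest key."""
    best = min(key(p) for p in paths)
    return [p for p in paths if key(p) == best]


# Assumed, simplified steps that precede the IGP tie-break. MED is omitted because
# its comparison depends on the neighbor AS, which this sketch does not model.
STEPS_BEFORE_IGP: List[Callable[[Path], int]] = [
    lambda p: -p.local_pref,          # highest LOCAL_PREF
    lambda p: p.as_path_len,          # shortest AS_PATH
    lambda p: p.origin,               # lowest ORIGIN
    lambda p: 0 if p.is_ebgp else 1,  # eBGP over iBGP
]


def advertise_at_decisive_step_minus_one(paths: List[Path]) -> List[Path]:
    candidates = list(paths)
    if len(candidates) <= 1:
        return candidates
    for step in STEPS_BEFORE_IGP:
        survivors = keep_best(candidates, step)
        if len(survivors) == 1:
            # This step would be decisive: advertise the paths left before it.
            return candidates
        candidates = survivors
    # The IGP tie-break was reached with several paths left: advertise them all.
    return candidates


paths = [
    Path(1, 200, 2, 0, True),
    Path(2, 200, 2, 0, False),
    Path(3, 200, 3, 0, True),
    Path(4, 100, 1, 0, True),
]
print([p.path_id for p in advertise_at_decisive_step_minus_one(paths)])  # -> [1, 2]
```

For these sample paths, AS_PATH length narrows the candidates to paths 1 and 2, and the eBGP-over-iBGP step would leave only path 1, so both paths 1 and 2 are advertised.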
Instead of stopping the decision process at the Local Preference step or at the IGP step, it stops just before the rule that would remove the best potential backup paths. This minimizes the number of paths to advertise while guaranteeing the presence of a backup path. Primary and backup path optimality is ensured, as all paths with the same AS-wide preference as the best paths are included in the set of advertised paths.

Authors' Addresses

Jim Uttaro
AT&T
200 S. Laurel Avenue
Middletown, NJ 07748 USA
Email: uttaro@att.com

Virginie Van den Schrieck
UCLouvain
Place Ste Barbe, 2
Louvain-la-Neuve 1348 BE
Email: virginie.vandenschrieck@uclouvain.be
URI: http://inl.info.ucl.ac.be/vvandens

Pierre Francois
UCLouvain
Place Ste Barbe, 2
Louvain-la-Neuve 1348 BE
Email: pierre.francois@uclouvain.be
URI: http://inl.info.ucl.ac.be/pfr

Roberto Fragassi
Alcatel-Lucent
600 Mountain Avenue
Murray Hill, New Jersey
Email: roberto.fragassi@alcatel-lucent.com

Adam Simpson
Alcatel-Lucent
600 March Road
Ottawa, Ontario K2K 2E6
Canada
Email: adam.simpson@alcatel-lucent.com

Pradosh Mohapatra
Cisco Systems
170 W. Tasman Drive
San Jose, CA 95134 USA
Email: pmohapat@cisco.com