Network Working Group                                       P. Mohapatra
Internet-Draft                                             Cisco Systems
Intended status: Standards Track                             R. Fernando
Expires: March 31, 2009                                 Juniper Networks
                                                             C. Filsfils
                                                           Cisco Systems
                                                               R. Raszuk
                                                        Juniper Networks
                                                      September 27, 2008


            Fast Connectivity Restoration Using BGP Add-path
                 draft-pmohapat-idr-fast-conn-restore-00

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on March 31, 2009.

Abstract

   A BGP route defines an association of an address prefix with an "exit
   point" from the current Autonomous System (AS).  If the exit point
   becomes unreachable due to a failure, the route becomes invalid.
   This usually triggers an exchange of BGP control messages after which
   a new BGP route for the given prefix is installed.  However,
   connectivity can be restored more quickly if the router maintains
   precomputed BGP backup routes.  It can then switch to a backup route
   immediately upon learning that an exit point is unreachable, without
   needing to wait for the BGP control messages exchange.  This document
   specifies the procedures to be used by BGP to maintain and distribute
   the precomputed backup routes.
   Maintaining these additional routes is also useful in promoting load
   balancing, performing maintenance without causing traffic loss, and
   in reducing churn in the BGP control plane.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Basic Idea
   3.  Design Considerations
     3.1.  Ensuring Loop-Free Path Selection in an AS
       3.1.1.  Border routers announcing single path
       3.1.2.  Border routers announcing multiple paths
       3.1.3.  Confederations
     3.2.  Keeping Path Attributes Independent of Decision Process
   4.  Border router attr_set attribute
   5.  Calculation of Best and Backup Paths
   6.  Advertising Multiple Paths
   7.  Deployment Considerations
   8.  Applications
     8.1.  Fast Connectivity Restoration
     8.2.  Load Balancing
     8.3.  Churn Reduction
       8.3.1.  Inter-domain Churn Reduction
       8.3.2.  Intra-Domain Churn Reduction
     8.4.  Graceful Maintenance
   9.  Acknowledgements
   10. IANA Considerations
   11. Security Considerations
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   Within an autonomous system, the availability of multiple routes to a
   given destination, where each of the routes has a different "exit
   point" from the local AS, provides the following benefits:

   o  Fault tolerance: Knowledge of multiple "exit points" leads to a
      reduction in restoration time after failure.  For instance, a
      border router on receiving multiple paths to the same destination
      could decide to precompute a backup path and have it ready so that
      when the primary path becomes invalid, it could use the backup to
      quickly restore connectivity.  Currently the restoration time is
      dependent on BGP protocol re-convergence that includes a set of
      withdraw and advertisement messages in the network before a new
      best path can be learnt.

   o  Load balancing: The availability of multiple paths to reach the
      same destination enables load balancing of traffic provided the
      paths for the given destination satisfy certain constraints.

   o  Churn reduction: The advertisement of multiple routes, in certain
      scenarios (Section 8.3.2), could lead to less churn in the network
      upon a failure, since the presence of multiple paths helps contain
      the failure to the local AS where the failure occurs.

   o  Graceful maintenance: The availability of alternate exit points
      allows one to bring down a router for maintenance without causing
      significant traffic loss.

   Unfortunately, the border routers in an AS do not receive multiple
   paths for all prefixes.  The reason is three-fold:

   o  The current BGP specification [RFC4271] requires a router to
      advertise only its best path for a destination to its peers.
      The availability of multiple paths requires simultaneous
      distribution of multiple routes for a given prefix by a BGP
      speaker.  We refer to this property of the network as "path
      diversity".

   o  When a router selects an IBGP learnt path as best, it does not
      announce any path for that prefix to IBGP though it may have EBGP
      learnt paths available.  This loss of information leads to added
      churn and increases convergence time if the preferred path goes
      away.  A mechanism to advertise the best-external path to IBGP is
      proposed in [I-D.marques-idr-best-external].

   o  Most service providers deploy one of the scaling techniques like
      route reflectors [RFC4456] or confederations [RFC5065] inside the
      AS and avoid an IBGP full mesh.  Thus even when multiple paths
      exist, the aggregation points (route reflectors or confederation
      border routers) advertise only the best path (as per the BGP base
      protocol).

   As a result of this behavior, the ingress border routers of an AS do
   not receive the additional paths necessary to provide the benefits
   cited above, e.g. to perform a local recovery during network failures
   or to achieve load balancing in steady state across multiple exit
   points.

   The mechanism to extend BGP to allow a given BGP speaker to advertise
   multiple paths simultaneously for a destination is defined in
   [I-D.walton-bgp-add-paths].  This document describes the use of
   this generic technique and certain additional procedures and
   implementation guidelines to enable the above applications.

   More specifically, this document describes extensions to the BGP
   decision process to select backup paths in a manner that ensures the
   important property of consistent route selection within an AS.  It
   also introduces a new BGP attribute, attr_set, that border routers
   should use to advertise multiple EBGP learnt paths for a given
   destination.
   To illustrate the applications, the document presents use case
   scenarios for each.

   One implication of multiple path advertisement is the associated
   cost, namely the performance overhead of processing and memory
   overhead of storing additional paths.  It is anticipated that the
   benefits listed above outweigh the cost in most scenarios.  Be that
   as it may, it is also expected that there will be configuration knobs
   provided to limit the number of additional paths propagated within an
   AS.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Basic Idea

   This document proposes two main additions to the BGP procedures:

   1.  The decision process is modified to determine backup paths along
       with the best path selection when multiple paths for a
       destination are available.

   2.  In addition to using these backup paths for fast connectivity
       restoration locally, BGP speakers also advertise these paths to
       IBGP to increase the overall path diversity.

       As alluded to in Section 1, BGP speakers that are the aggregation
       points (route reflectors or confederation border routers) need to
       announce backup paths to increase the path diversity at the
       ingress routers of an IBGP network (see Figure 2).  It may also
       be useful, in certain cases, for a border router to advertise
       multiple paths received via EBGP for a destination when it is
       redundantly connected and is transparently passing the NEXT_HOP
       field unchanged instead of setting it to self (see Figure 4).  To
       this end, the document defines a new attribute, attr_set, that
       the border routers should advertise to ensure path selection
       consistency.

   The following sections elaborate on these points.

3.  Design Considerations

3.1.  Ensuring Loop-Free Path Selection in an AS

   It is critical that BGP speakers within an AS have an eventually
   consistent routing view of destinations and do not make conflicting
   decisions regarding best path selection that would otherwise cause
   forwarding loops.  The current BGP protocol ensures this property by
   defining a decision process that takes the attributes of paths as
   input and determines a degree of preference of the paths by applying
   a constant function.  A consistent view of attributes is disseminated
   through IBGP.  Thus each BGP speaker within the AS determines the
   same degree of preference of the paths after applying the constant
   function independently.  (The one exception is where the IGP metric
   plays the tie breaking role.  In this case, different routers may
   choose different next hops that are closer to them, but loop freedom
   is guaranteed.)

   When the above mechanism is extended to select backup paths for the
   applications cited in this document, it is equally important to
   maintain the same consistency property for the backup paths, i.e.
   there should be no loops created when routers use the backup path in
   forwarding.  The rest of the document goes into the details of this
   for various scenarios.

3.1.1.  Border routers announcing single path

   In scenarios where all border routers advertise a single external
   path (their best path or best-external path) into IBGP, a consistent
   routing view of best path and backup paths can be created across the
   AS with the current BGP selection rules.

3.1.2.  Border routers announcing multiple paths

   There are scenarios where border routers need to advertise the best
   and backup EBGP learnt paths with NEXT_HOP unchanged to IBGP.  If the
   border router sets next hop to self, the paths become
   indistinguishable and hence advertisement of only the best path is
   sufficient.
   An example scenario is depicted in Figure 4.

   By using the add-path ([I-D.walton-bgp-add-paths]) extensions, the
   border routers could advertise multiple such EBGP-learnt paths.  But
   doing so can potentially create an inconsistency between the paths
   that the sending and receiving routers select for forwarding.  In
   other words, the routers in the IBGP mesh can make independent and
   separate decisions on the route selection since some of the values
   that play a role in the tie breaking steps of the decision process at
   the sender are not available to the rest of the BGP speakers of the
   AS.  These are mainly (1) the interior cost, i.e. the metric to reach
   the external next hop, (2) the BGP identifier of the peer, and (3)
   the peer IP address.  Due to this reduction in information, there can
   be inconsistency in the routing view within an AS.

   Additionally, [RFC5004] proposes an extension to avoid best path
   transitions at the border router between external paths based on a
   temporal order of receiving the paths.  This can also create an
   inconsistency across the BGP speakers in the path selection.

   This document proposes two modifications to ensure consistency:

   a.  Border routers SHOULD NOT apply the modification to the selection
       rules as proposed in [RFC5004] to avoid best path transitions for
       the parallel EBGP connection scenario where the border router
       wishes to transitively transmit the NEXT_HOP value unchanged.

   b.  To overcome the "information reduction" problem described above,
       the document specifies an attribute called "border router
       attr_set attribute" that encodes the properties of each path
       advertised that would otherwise not be included using the normal
       attributes in a BGP UPDATE message (see Section 4).

3.1.3.  Confederations

   When an AS employs confederations and the confederation border
   routers advertise multiple paths, there is no way to distinguish the
   originator (the actual egress border router originating the prefix to
   the AS).  To ensure consistent path selection, the confederation
   border routers should create the ORIGINATOR_ID attribute as described
   in [RFC4456] that carries the BGP identifier of the originator of the
   route to the local AS.

3.2.  Keeping Path Attributes Independent of Decision Process

   In addition to providing consistency in path selection, the solution
   should satisfy the following important property: the attributes
   associated with a particular path should be invariant when a
   different path is advertised or withdrawn.  Other things being equal,
   it is best to avoid the potential churn introduced by the feedback
   loops that would occur if path attributes were changed at the sender
   as a result of running the decision process.  Thus we do not use any
   attributes with semantics like "this is my second best path", "this
   is my third best path", etc.  This requirement precludes use of
   marking or other means of indicating path ordering from the sender's
   perspective since a change in the ordering requires re-advertising
   most of the paths.

4.  Border router attr_set attribute

   The border router attr_set attribute is an optional non-transitive
   attribute that is composed of a set of Type-Length-Value (TLV)
   encodings.  The type code of the attribute is to be assigned by IANA.
   Each TLV contains an attribute of the path from the border router
   that is not otherwise sent as part of the UPDATE message.
   The TLV is structured as follows:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Attr Type   |    Length     |
   |   (1 octet)   |   (1 octet)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |
   |             Value             |
   |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

           Figure 1: Border router attr_set attribute format

   a.  Attr Type (1 octet): It identifies the type of the attribute that
       is encoded by the border router.  Unknown types are to be ignored
       and skipped upon receipt.  This document defines the following
       types:

       *  Interior Cost: Attr Type = 1

       *  Peer BGP Identifier: Attr Type = 2

       *  IPv4 Peer Address: Attr Type = 3

       *  IPv6 Peer Address: Attr Type = 4

   b.  Length (1 octet): the total number of octets of the Value field.

   c.  Value (variable): The value field encodes the attribute of the
       corresponding type.  For the "Interior Cost" type, it encodes the
       four-octet metric value.  For the "Peer BGP Identifier" type, it
       encodes the four-octet router identifier of the neighbor for the
       path.  For the "IPv4 Peer Address" type, the 4-octet BGP IPv4
       peering address is encoded.  For the "IPv6 Peer Address" type,
       the 16-octet BGP IPv6 peering address is encoded.

   A brief description of how a BGP speaker constructs the attribute is
   provided in Section 6.

5.  Calculation of Best and Backup Paths

   /----------------------------------------\
   |               +----+        IBGP       |
                   | r1 |
   |               +----+                   |
                     .
   |                 .                      |
                     .
   |               +----+                   |
                   | RR |
   |            .  +----+  .                |
              .              .
   |        .                  .            |
          +----+            +----+
   |      | r3 |            | r4 |          |
          +----+            +----+
   |        |                  |            |
   \        | P1               | P2         /
    ----------------------------------------
            |                  |
       EBGP |                  | EBGP
              ...............
             /               \
               Destination a
             \               /
              ...............

                      Figure 2: Basic RR topology

   /----------------------------------------\
   |                                        |
          +----+           +----+
   |      | r1 |...........| r2 |           |
          +----+           +----+
   |         .                .             |
             .    AS 65502    .
   |         .                .             |
             .    +----+      .
   |         .....|CBR2|.......             |
                  +----+
   |                 |                      |
                     | CONFED EBGP
   |              +----+                    |
             .....|CBR1|.......
   |         .    +----+      .             |
             .                .
   |         .    AS 65501    .             |
             .                .
   |      +----+           +----+           |
          | r3 |...........| r4 |
   |      +----+           +----+           |
             |P1              |P2
   |         |                |             |
    ----------------------------------------
             |                |
        EBGP |                | EBGP
              ...............
             /               \
               Destination a
             \               /
              ...............

                    Figure 3: Confederation topology

   /----------------------------------------\
   |               +----+        IBGP       |
                   | r1 |
   |               +----+                   |
                     .
   |                 .                      |
                     .
   |               +----+                   |
                   | RR |
   |            .  +----+  .                |
              .              .
   |        .                  .            |
          +----+            +----+
   |      | r3 |            | r4 |          |
          +----+            +----+
   |       |  |                             |
   \     P1|  |P2                           /
    ----------------------------------------
           |  |
       EBGP|  |EBGP

     ...............
    /               \
      Destination a
    \               /
     ...............

           Figure 4: Border router with parallel eBGP links

   The decision process as described in [RFC4271] is followed to
   determine the overall best path for a destination.  In addition, the
   following rule SHOULD be inserted into the tie breaking rules of the
   BGP decision process after step f) (Section 9.1.2.2 of [RFC4271]) and
   after the CLUSTER_LIST length check step (Section 9 of [RFC4456]): a
   BGP speaker SHOULD apply the tie breaking steps (steps (e), (f), and
   (g) as defined in [RFC4271]) with the values encoded in the "border
   router attr_set" attribute.

   Note that the above step effectively compares multiple paths that are
   advertised by the same egress border router (since the BGP Identifier
   comparison step earlier would have eliminated paths from different
   egress border routers).

   Consider the network in Figure 4.  r3 learns two paths P1 and P2 for
   destination a and wishes to advertise both to the IBGP mesh with the
   NEXT_HOP value unchanged.
   We need to ensure that both r3 and the other ingress routers in the
   network (r1, r4) make a consistent route selection for the best and
   the backup paths for destination a.  The current tie breaking rules
   [step f) comparison of router ID or ORIGINATOR_ID and step g)
   comparison of peer address] are insufficient since at the ingress
   routers, both the paths will be received with the same values for
   each of the above parameters.  Hence an additional tie breaking rule
   comparing the original values that the border router itself used to
   tie break the paths is required.

   Once the best path is chosen, eliminate that path and all paths that
   have the same BGP Identifier or NEXT_HOP as the chosen best path.
   Note that as specified in [RFC4456], if the path carries the
   ORIGINATOR_ID attribute, that should be treated as the BGP
   Identifier.  Then rerun the best path procedure to choose the backup
   path.  The tie breaking rules of the BGP decision process for second
   best path selection are also modified as described above.

   This mechanism can be recursively used to calculate multiple backup
   paths if desired.

6.  Advertising Multiple Paths

   The technique outlined in [I-D.walton-bgp-add-paths] is used to
   advertise best and backup paths selected with the rules described in
   Section 5.  For the purposes of the applications cited in this
   document, the "Path Identifier" is always treated as an opaque value
   with no semantics.

   When an egress border router chooses to advertise multiple paths
   learnt via EBGP to IBGP, it SHOULD include the attr_set attribute as
   defined in Section 4 for each of the paths.  The attribute is
   constructed by encoding the following properties of the path in TLV
   format:

   o  The interior cost to reach the NEXT_HOP of the path, encoded with
      type 1.

   o  The BGP identifier of the EBGP peer from which it received the
      path, encoded with type 2.

   o  The peer address of the EBGP peer from which it received the path,
      encoded with either type 3 or type 4.

7.  Deployment Considerations

   To ensure consistency in the path selection process across all the
   routers in an AS, the deployment considerations of the individual
   scaling technology employed in the network should be inherited and
   applied.  For example, as specified in [RFC4456], the intra-cluster
   IGP metric values should be better than the inter-cluster IGP metric
   values.  The analogous considerations specified in [RFC5065] should
   be applied to confederations.

8.  Applications

8.1.  Fast Connectivity Restoration

   Consider the network in Figure 2.  All 4 routers indicated are part
   of a single AS.  r3 and r4 are the border routers.  Suppose r3 and r4
   receive paths P1 and P2 for the same prefix.  Also assume that P1 is
   the preferred exit.

   There are two scenarios to consider:

   o  case 1: P1 is the preferred exit for all routers within the AS
      (including r4).  In this case, if r4 follows [RFC4271], r4
      withdraws P2 from the IBGP cloud.

   o  case 2: P2 is the exit preferred by r4.  In this case, if RR
      follows [RFC4271], RR gets both paths, chooses one and sends it to
      r1.

   In both the cases above, 'r1' holds only a single path, and it
   receives the alternate path (P2) only after a failure makes P1
   unavailable.

   However, if both paths were available to 'r1' and all other border
   routers in the network, then they could precompute backup paths and
   keep them ready to restore connectivity upon being notified of a
   failure.  The failure notification could be triggered due to a link
   failure between 'r3' and its EBGP neighbor.  This failure could be
   propagated to other routers in r3's AS either via IGP or BGP,
   resulting in invalidating on all these routers their primary paths
   that were advertised by that neighbor to r3 (and that r3 subsequently
   re-advertised into IBGP).
   Once these paths are invalidated, all these routers could switch to
   the precomputed backup paths, without waiting for any additional BGP
   advertisements.

8.2.  Load Balancing

   In the above network, not only can the additional path be used as a
   standby best, but it can also be used in steady state to load balance
   traffic across the two exit points.

8.3.  Churn Reduction

   There are two aspects to reducing churn: inter-domain and intra-
   domain.

8.3.1.  Inter-domain Churn Reduction

   Consider the network diagram in Figure 5.

                +----+
                | r5 |
                +----+
                  |  EBGP
            -------------
                  |
                +----+
                | r1 |
                +----+
               .      .
             .          .
         +----+        +----+
         | r3 |        | r4 |
         +----+        +----+
           | P1          | P2

                               Figure 5

   'r5' is an EBGP peer of 'r1'.  Today, if path P1 goes away, due to
   the non-availability of other paths, 'r1' sends a withdraw to r5 thus
   triggering a churn in the Internet.  This could be significant if
   there are multiple prefixes involved.  On the other hand, if r1 had
   an alternate path (with identical attributes), then this churn could
   be entirely avoided by r1 performing a local repair.

8.3.2.  Intra-Domain Churn Reduction

   Since advertising multiple paths in general increases the path
   diversity at the border routers, some of the control plane churn in
   terms of a stream of advertisements, withdraws, and re-advertisements
   can be reduced, thus improving the stability of the network.

            AS 2
             |
             |
          +----+          +----+
          | r1 |          | r2 |
          +----+          +----+
             .              .
               .          .
                 .      .
                 +----+
                 | RR |
               . +----+ .
             .            .
           .                .
        +----+            +----+
        | r3 |            | r4 |
        +----+            +----+
            \              /
             \            /   eBGP dual-homing
                AS 1(a)

                               Figure 6

   Assuming router r3's path is the best path in the AS, RR advertises
   the corresponding route information to the IBGP network.
   If r3 goes down (or the peering link [r3, AS1] fails and r3 did not
   change the next hop to itself), the following will be the sequence of
   updates from router r1 to AS 2:

   o  Initial update for all prefixes when r1 chooses best path,

   o  Withdraws for all prefixes when r1 detects failure,

   o  Re-advertisement of all prefixes when the RR chooses router r4's
      path as the new best path and advertises to r1.

   With both the paths advertised and received on router r1, the
   sequence of updates reduces to:

   o  Initial update for all prefixes when r1 chooses best path,

   o  Re-advertisement of all prefixes when r1 detects failure and
      chooses router r4's path as the new best path.

8.4.  Graceful Maintenance

   [I-D.decraene-bgp-graceful-shutdown-requirements] defines
   requirements for graceful maintenance of routers in a service
   provider network.  Current BGP operations treat this as a sudden link
   or node failure and attempt to reconverge, which can take on the
   order of seconds or minutes.

   With the procedures defined in this document, since alternate paths
   are available at the ingress routers, taking down egress routers from
   the network does not result in a network-wide reconvergence event.

9.  Acknowledgements

   The authors would like to thank Enke Chen for the many discussions
   resulting in this work.  In addition, the authors would also like to
   acknowledge valuable review and suggestions from Eric Rosen, Yakov
   Rekhter, and John Scudder on this document.

10.  IANA Considerations

   This document defines a new BGP optional non-transitive attribute
   type, called Border Router Attr_set attribute.  The attribute type is
   to be assigned by IANA.

   This document introduces Attr TLVs within the above attribute.  The
   type space for these should be set up by IANA as a registry of
   1-octet attr types.  These should be assigned on a first-come-first-
   served basis.
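
   As a non-normative illustration of the 1-octet type space and of the
   TLV layout in Section 4, the following Python sketch encodes and
   decodes the value portion of the attribute.  The function names
   (encode_tlv, encode_attr_set, decode_attr_set) and the decision to
   reject a known type that carries an unexpected length are
   assumptions made for this example and are not defined by this
   document.  The attribute type code itself is not shown, since it is
   yet to be assigned by IANA.

```python
import ipaddress
import struct

# Attr Type values defined in Section 4 of this document.
INTERIOR_COST = 1
PEER_BGP_ID = 2
IPV4_PEER_ADDR = 3
IPV6_PEER_ADDR = 4

# Value length in octets for each known type (Section 4).
VALUE_LEN = {INTERIOR_COST: 4, PEER_BGP_ID: 4,
             IPV4_PEER_ADDR: 4, IPV6_PEER_ADDR: 16}


def encode_tlv(attr_type: int, value: bytes) -> bytes:
    """Encode one TLV: 1-octet type, 1-octet length, then the value."""
    if not 0 <= attr_type <= 255 or len(value) > 255:
        raise ValueError("type and length must each fit in one octet")
    return struct.pack("!BB", attr_type, len(value)) + value


def encode_attr_set(interior_cost: int, peer_bgp_id: str,
                    peer_addr: str) -> bytes:
    """Hypothetical helper: TLVs for one EBGP-learnt path (Section 6)."""
    addr = ipaddress.ip_address(peer_addr)
    addr_type = IPV4_PEER_ADDR if addr.version == 4 else IPV6_PEER_ADDR
    return b"".join([
        encode_tlv(INTERIOR_COST, struct.pack("!I", interior_cost)),
        # The BGP identifier is written here in dotted-quad form.
        encode_tlv(PEER_BGP_ID, ipaddress.IPv4Address(peer_bgp_id).packed),
        encode_tlv(addr_type, addr.packed),
    ])


def decode_attr_set(data: bytes) -> dict:
    """Parse the TLVs; unknown types are ignored and skipped."""
    out = {}
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated TLV header")
        attr_type, length = struct.unpack_from("!BB", data, pos)
        pos += 2
        value = data[pos:pos + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        pos += length
        if attr_type not in VALUE_LEN:
            continue  # unknown type: skip, as Section 4 requires
        if length != VALUE_LEN[attr_type]:
            # Assumption of this sketch: treat a wrong length as an error.
            raise ValueError("unexpected length for known type")
        out[attr_type] = value
    return out
```

   For example, encode_attr_set(10, "192.0.2.1", "198.51.100.7") yields
   three TLVs of six octets each, and decode_attr_set() recovers the
   same three values even if a TLV of an unregistered type is appended.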

   This document defines the following attr types that should be
   assigned in the registry:

      Attr                   Type
      -------------------    -----
      Interior Cost            1
      Peer BGP Identifier      2
      IPv4 Peer Address        3
      IPv6 Peer Address        4

11.  Security Considerations

   There are no additional security risks introduced by this design.

12.  References

12.1.  Normative References

   [I-D.decraene-bgp-graceful-shutdown-requirements]
              Decraene, B., Francois, P., Pelsser, C., Ahmad, Z., and A.
              Armengol, "Requirements for the graceful shutdown of BGP
              sessions",
              draft-decraene-bgp-graceful-shutdown-requirements-00 (work
              in progress), February 2008.

   [I-D.marques-idr-best-external]
              Marques, P., Fernando, R., Chen, E., and P. Mohapatra,
              "Advertisement of the best-external route to IBGP",
              draft-marques-idr-best-external-00 (work in progress),
              July 2008.

   [I-D.walton-bgp-add-paths]
              Walton, D., "Advertisement of Multiple Paths in BGP",
              draft-walton-bgp-add-paths-06 (work in progress),
              July 2008.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, April 2006.

   [RFC5065]  Traina, P., McPherson, D., and J. Scudder, "Autonomous
              System Confederations for BGP", RFC 5065, August 2007.

12.2.  Informative References

   [RFC5004]  Chen, E. and S. Sangli, "Avoid BGP Best Path Transitions
              from One External to Another", RFC 5004, September 2007.

Authors' Addresses

   Pradosh Mohapatra
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   USA

   Email: pmohapat@cisco.com


   Rex Fernando
   Juniper Networks
   1194 N. Mathilda Ave
   Sunnyvale, CA  94089
   USA

   Email: rex@juniper.net


   Clarence Filsfils
   Cisco Systems
   Brussels,
   Belgium

   Email: cfilsfil@cisco.com


   Robert Raszuk
   Juniper Networks
   1194 N. Mathilda Ave
   Sunnyvale, CA  94089
   USA

   Email: raszuk@juniper.net

Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.