idnits 2.17.1

draft-pmohapat-idr-fast-conn-restore-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).

     Found 'SHOULD not' in this paragraph:

     a.  Border routers SHOULD not apply the modification to the selection
     rules as proposed in [RFC5004] to avoid best path transitions for
     parallel EBGP connection scenario where the border router wishes to
     transitively transmit the NEXT_HOP value unchanged.

  -- The document seems to contain a disclaimer for pre-RFC5378 work, and may
     have content which was first submitted before 10 November 2008.  The
     disclaimer is necessary when there are original authors that you have
     been unable to contact, or if some do not wish to grant the BCP78 rights
     to the IETF Trust.  If you are able to get all authors (current and
     original) to grant those rights, you can and should remove the
     disclaimer; otherwise, the disclaimer is needed and you can ignore this
     comment.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 10, 2011) is 4789 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Downref: Normative reference to an Informational draft:
     draft-ietf-grow-bgp-graceful-shutdown-requirements (ref.
     'I-D.ietf-grow-bgp-graceful-shutdown-requirements')

  == Outdated reference: A later version (-15) exists of
     draft-ietf-idr-add-paths-04

  == Outdated reference: A later version (-05) exists of
     draft-ietf-idr-best-external-03

  -- Possible downref: Normative reference to a draft: ref.
     'I-D.ietf-idr-best-external'


     Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                       P. Mohapatra
Internet-Draft                                               R. Fernando
Intended status: Standards Track                             C. Filsfils
Expires: September 11, 2011                                    R. Raszuk
                                                           Cisco Systems
                                                          March 10, 2011


            Fast Connectivity Restoration Using BGP Add-path
                draft-pmohapat-idr-fast-conn-restore-01

Abstract

   A BGP route defines an association of an address prefix with an "exit
   point" from the current Autonomous System (AS).  If the exit point
   becomes unreachable due to a failure, the route becomes invalid.
   This usually triggers an exchange of BGP control messages after which
   a new BGP route for the given prefix is installed.  However,
   connectivity can be restored more quickly if the router maintains
   precomputed BGP backup routes.  It can then switch to a backup route
   immediately upon learning that an exit point is unreachable, without
   needing to wait for the exchange of BGP control messages.  This
   document specifies the procedures to be used by BGP to maintain and
   distribute the precomputed backup routes.  Maintaining these
   additional routes is also useful in promoting load balancing,
   performing maintenance without causing traffic loss, and reducing
   churn in the BGP control plane.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 11, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Basic Idea
   3.  Design Considerations
     3.1.  Ensuring Loop-Free Path Selection in an AS
       3.1.1.  Border routers announcing single path
       3.1.2.  Border routers announcing multiple paths
       3.1.3.  Confederations
     3.2.  Keeping Path Attributes Independent of Decision Process
   4.  Edge_Discriminator attribute
   5.  Calculation of Best and Backup Paths
   6.  Advertising Multiple Paths
   7.  Deployment Considerations
   8.  Applications
     8.1.  Fast Connectivity Restoration
     8.2.  Load Balancing
     8.3.  Churn Reduction
       8.3.1.  Inter-domain Churn Reduction
       8.3.2.  Intra-Domain Churn Reduction
     8.4.  Graceful Maintenance
   9.  Acknowledgements
   10. IANA Considerations
   11. Security Considerations
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   Within an autonomous system, the availability of multiple routes to a
   given destination, where each of the routes has a different "exit
   point" from the local AS, provides the following benefits:

   o  Fault tolerance: Knowledge of multiple "exit points" leads to a
      reduction in restoration time after failure.  For instance, a
      border router on receiving multiple paths to the same destination
      could decide to precompute a backup path and have it ready so that
      when the primary path becomes invalid, it could use the backup to
      quickly restore connectivity.  Currently, the restoration time
      depends on BGP re-convergence, which involves a set of withdraw
      and advertisement messages in the network before a new best path
      can be learnt.

   o  Load balancing: The availability of multiple paths to reach the
      same destination enables load balancing of traffic, provided the
      paths for the given destination satisfy certain constraints.

   o  Churn reduction: The advertisement of multiple routes, in certain
      scenarios (Section 8.3.2), could lead to less churn in the network
      upon a failure, since the presence of multiple paths helps contain
      the failure to the local AS where the failure occurs.

   o  Graceful maintenance: The availability of alternate exit points
      allows one to bring down a router for maintenance without causing
      significant traffic loss.

   Unfortunately, the border routers in an AS do not receive multiple
   paths for all prefixes.  The reason is three-fold:

   o  The current BGP specification [RFC4271] requires a router to
      advertise only the best path for a destination to other speakers.
      The availability of multiple paths requires simultaneous
      distribution of multiple routes for a given prefix by a BGP
      speaker.  We refer to this property of the network as "path
      diversity".

   o  When a router selects an IBGP learnt path as best, it does not
      announce any path for that prefix to IBGP even though it may have
      EBGP learnt paths available.  This loss of information leads to
      added churn and increases convergence time if the preferred path
      goes away.  A mechanism to advertise the best-external path to
      IBGP is proposed in [I-D.ietf-idr-best-external].

   o  Most service providers deploy one of the scaling techniques like
      route reflectors [RFC4456] or confederations [RFC5065] inside the
      AS and avoid an IBGP full mesh.  Thus even when multiple paths
      exist, the aggregation points (route reflectors or confederation
      border routers) advertise only the best path (as per the BGP base
      protocol).

   As an effect of this behavior, the ingress border routers to an AS do
   not receive the additional paths necessary to provide the benefits
   cited above, e.g., performing a local recovery during network
   failures or achieving load balancing in steady state across multiple
   exit points.

   The mechanism to extend BGP to allow a given BGP speaker to advertise
   multiple paths simultaneously for a destination is defined in
   [I-D.ietf-idr-add-paths].  This document describes the use of this
   generic technique and certain additional procedures and
   implementation guidelines to enable the above applications.

   More specifically, this document describes extensions to the BGP
   decision process to select backup paths in a manner that ensures the
   important property of consistent route selection within an AS.  It
   also introduces a new BGP attribute, Edge_Discriminator, that border
   routers should use to advertise multiple EBGP learnt paths for a
   given destination.  To better describe the applications, the
   document illustrates use case scenarios for each.

   One implication of multiple path advertisement is the associated
   cost, namely the performance overhead of processing and the memory
   overhead of storing additional paths.  It is anticipated that the
   benefits listed above outweigh the cost in most scenarios.
   Nevertheless, it is also expected that there will be configuration
   knobs provided to limit the number of additional paths propagated
   within an AS.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Basic Idea

   This document proposes two main additions to the BGP procedures:

   1.  The decision process is modified to determine backup paths along
       with the best path selection when multiple paths for a
       destination are available.

   2.  In addition to using these backup paths for fast connectivity
       restoration locally, BGP speakers also advertise these paths to
       IBGP to increase the overall path diversity.

       As alluded to in Section 1, BGP speakers that are the aggregation
       points (route reflectors or confederation border routers) need to
       announce backup paths to increase the path diversity at the
       ingress routers of an IBGP network (see Figure 2).  It may also
       be useful, in certain cases, for a border router to advertise
       multiple paths received via EBGP for a destination when it is
       redundantly connected and passes the NEXT_HOP field unchanged
       instead of setting it to self (see Figure 4).  To this end, the
       document defines a new attribute, Edge_Discriminator, that the
       border routers should advertise to ensure path selection
       consistency.

   The following sections elaborate on these points.

3.  Design Considerations

3.1.  Ensuring Loop-Free Path Selection in an AS

   It is critical that BGP speakers within an AS eventually have a
   consistent routing view of destinations and do not make conflicting
   decisions regarding best path selection that would otherwise cause
   forwarding loops.  The current BGP protocol ensures this property by
   defining a decision process that takes the attributes of paths as
   input and determines a degree of preference of the paths by applying
   a constant function.  A consistent view of attributes is disseminated
   through IBGP.  Thus each BGP speaker within the AS determines the
   same degree of preference of the paths after applying the constant
   function independently.  (The one exception is where the IGP metric
   plays the tie-breaking role.  In this case, different routers may
   choose different next hops that are closer to them, but loop freedom
   is guaranteed.)

   When the above mechanism is extended to select backup paths for the
   applications cited in this document, it is equally important to
   maintain the same consistency property for the backup paths, i.e.,
   there should be no loops created when routers use the backup path in
   forwarding.  The rest of the document goes into the details of this
   for various scenarios.

3.1.1.  Border routers announcing single path

   In scenarios where all border routers advertise a single external
   path (their best path or best-external path) into IBGP, a consistent
   routing view of best path and backup paths can be created across the
   AS with the current BGP selection rules.

3.1.2.  Border routers announcing multiple paths

   There are scenarios where border routers need to advertise the best
   and backup EBGP learnt paths with NEXT_HOP unchanged to IBGP.  If the
   border router sets next hop to self, the paths become
   indistinguishable and hence advertisement of only the best path is
   sufficient.  An example scenario is depicted in Figure 4.

   By using the add-path ([I-D.ietf-idr-add-paths]) extensions, the
   border routers could advertise multiple such EBGP-learnt paths.  But
   doing so can potentially create an inconsistency between the paths
   that the sending and receiving routers select for forwarding.  In
   other words, the routers in the IBGP mesh can make independent and
   separate decisions on the route selection since some of the values
   that play a role in the tie-breaking steps of the decision process at
   the sender are not available to the rest of the BGP speakers of the
   AS.  These are mainly (1) the interior cost, i.e., the metric to
   reach the external next hop, (2) the BGP identifier of the peer, and
   (3) the peer IP address.  Due to this reduction in information, there
   can be inconsistency in the routing view within an AS.

   Additionally, [RFC5004] proposes an extension to avoid best path
   transitions at the border router between external paths based on a
   temporal order of receiving the paths.  This can also create an
   inconsistency across the BGP speakers in the path selection.

   This document proposes two modifications to ensure consistency:

   a.  Border routers SHOULD NOT apply the modification to the selection
       rules proposed in [RFC5004] to avoid best path transitions in the
       parallel EBGP connection scenario where the border router wishes
       to transitively transmit the NEXT_HOP value unchanged.

   b.  To overcome the "information reduction" problem described above,
       the document specifies an attribute called "Edge_Discriminator
       attribute" that encodes the properties of each path advertised
       that would otherwise not be included using the normal attributes
       in a BGP UPDATE message (see Section 4).

3.1.3.  Confederations

   When an AS employs confederations and the confederation border
   routers advertise multiple paths, there is no way to distinguish the
   originator (the actual egress border router originating the prefix to
   the AS).  To ensure consistent path selection, the confederation
   border routers should create the ORIGINATOR_ID attribute as described
   in [RFC4456] that carries the BGP identifier of the originator of the
   route to the local AS.

3.2.  Keeping Path Attributes Independent of Decision Process

   In addition to providing consistency in path selection, the solution
   should satisfy the following important property: the attributes
   associated with a particular path should be invariant when a
   different path is advertised or withdrawn.  Other things being equal,
   it is best to avoid the potential churn introduced by the feedback
   loops that would occur if path attributes were changed at the sender
   as a result of running the decision process.  Thus we do not use any
   attributes with semantics like "this is my second best path", "this
   is my third best path", etc.  This requirement precludes use of
   marking or other means of indicating path ordering from the sender's
   perspective since a change in the ordering requires re-advertising
   most of the paths.

4.  Edge_Discriminator attribute

   The Edge_Discriminator attribute is an optional non-transitive
   attribute that is composed of a set of Type-Length-Value (TLV)
   encodings.  The type code of the attribute is to be assigned by
   IANA.  Each TLV contains an attribute of the path from the border
   router that is not otherwise sent as part of the UPDATE message.
   The TLV is structured as follows:

        0                   1
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |   Attr Type   |    Length     |
       |   (1 octet)   |   (1 octet)   |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                               |
       |             Value             |
       |                               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 1: Edge_Discriminator attribute format

   a.  Attr Type (1 octet): It identifies the type of the attribute that
       is encoded by the border router.  Unknown types are to be ignored
       and skipped upon receipt.  This document defines the following
       types:

       *  Interior Cost: Attr Type = 1

       *  Peer BGP Identifier: Attr Type = 2

       *  IPv4 Peer Address: Attr Type = 3

       *  IPv6 Peer Address: Attr Type = 4

   b.  Length (1 octet): the total number of octets of the Value field.

   c.  Value (variable): The value field encodes the attribute of the
       corresponding type.  For the "Interior Cost" type, it encodes the
       four-octet metric value.  For the "Peer BGP Identifier" type, it
       encodes the four-octet router identifier of the neighbor for the
       path.  For the "IPv4 Peer Address" type, the 4-octet BGP IPv4
       peering address is encoded.  For the "IPv6 Peer Address" type,
       the 16-octet BGP IPv6 peering address is encoded.

   A brief description of how a BGP speaker constructs the attribute is
   provided in Section 6.

5.  Calculation of Best and Backup Paths

      /----------------------------------------\
      |                 +----+         IBGP    |
                        | r1 |
      |                 +----+                 |
                          .
      |                   .                    |
                          .
      |                 +----+                 |
                        | RR |
      |               . +----+ .               |
                   .              .
      |         .                    .         |
            +----+                  +----+
      |     | r3 |                  | r4 |     |
            +----+                  +----+
      |       |                       |        |
      \       | P1                    | P2     /
       ----------------------------------------
              |                       |
         EBGP |                       | EBGP
                   ...............
                  /               \
                    Destination a
                  \               /
                   ...............

                      Figure 2: Basic RR topology

      /----------------------------------------\
      |                                        |
               +----+           +----+
      |        | r1 |...........| r2 |         |
               +----+           +----+
      |          .                 .           |
                 .    AS 65502     .
      |          .                 .           |
                 .     +----+      .
      |           .....|CBR2|.......           |
                       +----+
      |                  |                     |
                         | CONFED EBGP
      |                +----+                  |
                  .....|CBR1|.......
      |          .     +----+      .           |
                 .                 .
      |          .    AS 65501     .           |
                 .                 .
      |        +----+           +----+         |
               | r3 |...........| r4 |
      |        +----+           +----+         |
                 |P1              |P2
      |          |                |            |
       ----------------------------------------
                 |                |
            EBGP |                | EBGP
                  ...............
                 /               \
                   Destination a
                 \               /
                  ...............

                    Figure 3: Confederation topology

      /----------------------------------------\
      |                 +----+         IBGP    |
                        | r1 |
      |                 +----+                 |
                          .
      |                   .                    |
                          .
      |                 +----+                 |
                        | RR |
      |               . +----+ .               |
                   .              .
      |         .                    .         |
            +----+                  +----+
      |     | r3 |                  | r4 |     |
            +----+                  +----+
      |      |  |                              |
      \    P1|  |P2                            /
       ----------------------------------------
             |  |
         EBGP|  |EBGP

        ...............
       /               \
         Destination a
       \               /
        ...............

            Figure 4: Border router with parallel EBGP links

   The decision process as described in [RFC4271] is followed to
   determine the overall best path for a destination.  In addition, the
   following rule SHOULD be inserted into the tie-breaking rules of the
   BGP decision process after step (f) (Section 9.1.2.2 of [RFC4271])
   and after the CLUSTER_LIST length check step (Section 9 of
   [RFC4456]): a BGP speaker SHOULD apply the tie-breaking steps (steps
   (e), (f), and (g) as defined in [RFC4271]) with the values encoded in
   the Edge_Discriminator attribute.
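
   As an illustration only, the following Python sketch shows one way a
   receiver might decode the TLVs of Figure 1 and derive a comparison
   key for the added tie-breaking rule.  It is not part of the
   specification.  The function names, the dictionary keys, and the
   choice to order peer addresses by address family and then numeric
   value are assumptions made for this example; only the TLV layout and
   the Attr Type values 1 to 4 come from Section 4.

```python
import ipaddress
import struct

# Attr Type values defined in Section 4.
INTERIOR_COST, PEER_BGP_ID, IPV4_PEER_ADDR, IPV6_PEER_ADDR = 1, 2, 3, 4


def encode_tlv(attr_type: int, value: bytes) -> bytes:
    """One TLV: 1-octet Attr Type, 1-octet Length, then the Value."""
    return struct.pack("!BB", attr_type, len(value)) + value


def encode_edge_discriminator(cost: int, peer_bgp_id: str, peer_addr: str) -> bytes:
    """Hypothetical helper: build the TLV set for one EBGP-learnt path."""
    addr = ipaddress.ip_address(peer_addr)
    addr_type = IPV4_PEER_ADDR if addr.version == 4 else IPV6_PEER_ADDR
    return (
        encode_tlv(INTERIOR_COST, struct.pack("!I", cost))
        + encode_tlv(PEER_BGP_ID, ipaddress.IPv4Address(peer_bgp_id).packed)
        + encode_tlv(addr_type, addr.packed)
    )


def decode_edge_discriminator(data: bytes) -> dict:
    """Walk the TLVs; unknown Attr Types are skipped, as Section 4 requires."""
    fields, offset = {}, 0
    while offset + 2 <= len(data):
        attr_type, length = struct.unpack_from("!BB", data, offset)
        value = data[offset + 2 : offset + 2 + length]
        offset += 2 + length
        if attr_type == INTERIOR_COST:
            fields["interior_cost"] = int.from_bytes(value, "big")
        elif attr_type == PEER_BGP_ID:
            fields["peer_bgp_id"] = int.from_bytes(value, "big")
        elif attr_type == IPV4_PEER_ADDR:
            fields["peer_addr"] = ipaddress.IPv4Address(value)
        elif attr_type == IPV6_PEER_ADDR:
            fields["peer_addr"] = ipaddress.IPv6Address(value)
    return fields


def tie_break_key(data: bytes) -> tuple:
    """Lower key is preferred: interior cost (step e), peer BGP
    Identifier (step f), then peer address (step g).  Ordering across
    address families is an assumption of this sketch."""
    f = decode_edge_discriminator(data)
    addr = f["peer_addr"]
    return (f["interior_cost"], f["peer_bgp_id"], addr.version, int(addr))
```

   With two paths that differ only in peer address, for example
   encode_edge_discriminator(10, "192.0.2.1", "198.51.100.1") and the
   same call with "198.51.100.5", the first yields the smaller key, so
   every speaker that receives both attributes ranks the two paths the
   same way.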

   Note that the above step effectively compares multiple paths that are
   advertised by the same egress border router (since the BGP Identifier
   comparison step earlier would have eliminated paths from different
   egress border routers).

   Consider the network in Figure 4.  r3 learns two paths P1 and P2 for
   destination a and wishes to advertise both to the IBGP mesh with the
   NEXT_HOP value unchanged.  We need to ensure that both r3 and the
   other ingress routers in the network (r1, r4) make a consistent route
   selection for the best and the backup paths for destination a.  The
   current tie-breaking rules (step (f), comparison of router ID or
   ORIGINATOR_ID, and step (g), comparison of peer address) are
   insufficient since at the ingress routers, both paths will be
   received with the same values for each of the above parameters.
   Hence an additional tie-breaking rule is required that compares the
   original values that the border router itself used to tie break the
   paths.

   Once the best path is chosen, eliminate that path and all paths that
   have the same BGP Identifier or NEXT_HOP as the chosen best path.
   Note that as specified in [RFC4456], if the path carries the
   ORIGINATOR_ID attribute, that should be treated as the BGP
   Identifier.  Then rerun the best path procedure to choose the backup
   path.  The tie-breaking rules of the BGP decision process for second
   best path selection are also modified as described above.

   This mechanism can be recursively used to calculate multiple backup
   paths if desired.

6.  Advertising Multiple Paths

   The technique outlined in [I-D.ietf-idr-add-paths] is used to
   advertise best and backup paths selected with the rules described in
   Section 5.  For the purposes of the applications cited in this
   document, the "Path Identifier" is always treated as an opaque value
   with no semantics.
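
   The selection of the paths to be advertised (Section 5) can be
   sketched as follows.  This is a non-normative illustration in
   Python: the Path record, the "preference" field standing in for the
   complete decision process, and the max_backups parameter are all
   hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Path:
    name: str
    bgp_id: str      # BGP Identifier, or ORIGINATOR_ID when the path carries one
    next_hop: str
    preference: int  # stand-in for the full decision process; lower wins


def best_path(paths: List[Path]) -> Optional[Path]:
    """Placeholder for the (modified) BGP decision process."""
    return min(paths, key=lambda p: (p.preference, p.name), default=None)


def best_and_backups(paths: List[Path], max_backups: int = 1) -> List[Path]:
    """Return the best path followed by up to max_backups backup paths."""
    selected: List[Path] = []
    candidates = list(paths)
    while candidates and len(selected) <= max_backups:
        best = best_path(candidates)
        selected.append(best)
        # Eliminate the chosen path and every path that shares its
        # BGP Identifier or NEXT_HOP, then rerun the selection.
        candidates = [
            p for p in candidates
            if p.bgp_id != best.bgp_id and p.next_hop != best.next_hop
        ]
    return selected
```

   Raising max_backups corresponds to the recursive use of the
   mechanism: each pass removes the exit point just selected, so every
   additional backup uses a different egress router and next hop.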

   When an egress border router chooses to advertise multiple paths
   learnt via EBGP to IBGP, it SHOULD include the Edge_Discriminator
   attribute as defined in Section 4 for each of the paths.  The
   attribute is constructed by encoding the following properties of the
   path in TLV format:

   o  The interior cost to reach the NEXT_HOP of the path, encoded with
      type 1.

   o  The BGP identifier of the EBGP peer from which it received the
      path, encoded with type 2.

   o  The peer address of the EBGP peer from which it received the path,
      encoded with either type 3 or type 4.

7.  Deployment Considerations

   To ensure consistency in the path selection process across all the
   routers in an AS, the deployment considerations of the scaling
   technology employed in the network should be applied.  For example,
   as specified in [RFC4456], the intra-cluster IGP metric values should
   be better than the inter-cluster IGP metric values.  Similar
   considerations, as specified in [RFC5065], should be applied when
   confederations are used.

8.  Applications

8.1.  Fast Connectivity Restoration

   Consider the network in Figure 2.  All four routers indicated are
   part of a single AS.  r3 and r4 are the border routers.  Suppose r3
   and r4 receive paths P1 and P2 for the same prefix.  Also assume that
   P1 is the preferred exit.

   There are two scenarios to consider:

   o  case 1: P1 is the preferred exit for all routers within the AS
      (including r4).  In this case, if r4 follows [RFC4271], r4
      withdraws P2 from the IBGP cloud.

   o  case 2: P2 is the preferred exit for r4.  In this case, if RR
      follows [RFC4271], RR gets both paths, chooses one and sends it to
      r1.

   In both the cases above, 'r1' holds only a single path and receives
   the alternate path (P2) only after a failure makes P1 unavailable.

   However, if both paths were available to 'r1' and all other border
   routers in the network, then they could precompute backup paths and
   keep them ready to restore connectivity upon being notified of a
   failure.  The failure notification could be triggered by a link
   failure between 'r3' and its EBGP neighbor.  This failure could be
   propagated to other routers in r3's AS either via IGP or BGP,
   invalidating, on all these routers, the primary paths that were
   advertised by that neighbor to r3 (and that r3 subsequently
   re-advertised into IBGP).  Once these paths are invalidated, all
   these routers could switch to the precomputed backup paths, without
   waiting for any additional BGP advertisements.

8.2.  Load Balancing

   In the above network, the additional path can be used not only as a
   standby best path but also in steady state to load balance traffic
   across the two exit points.

8.3.  Churn Reduction

   There are two aspects to reducing churn: inter-domain and
   intra-domain.

8.3.1.  Inter-domain Churn Reduction

   Consider the network diagram in Figure 5.

                       +----+
                       | r5 |
                       +----+
                         | EBGP
                   -------------
                         |
                       +----+
                       | r1 |
                       +----+
                       .    .
                     .        .
                +----+          +----+
                | r3 |          | r4 |
                +----+          +----+
                  | P1            | P2

                               Figure 5

   'r5' is an EBGP peer of 'r1'.  Today, if path P1 goes away, due to
   the non-availability of other paths, 'r1' sends a withdraw to r5,
   thus triggering churn in the Internet.  This could be significant if
   there are multiple prefixes involved.  On the other hand, if r1 had
   an alternate path (with identical attributes), then this churn could
   be entirely avoided by r1 performing a local repair.

8.3.2.  Intra-Domain Churn Reduction

   Since advertising multiple paths in general increases the path
   diversity at the border routers, some of the control plane churn in
   terms of a stream of advertisements, withdraws, and re-advertisements
   can be reduced, thus improving the stability of the network.

                AS 2
                  |
                  |
                +----+               +----+
                | r1 |               | r2 |
                +----+               +----+
                    .                 .
                      .             .
                        .         .
                          +----+
                          | RR |
                        . +----+ .
                      .             .
                    .                 .
                +----+               +----+
                | r3 |               | r4 |
                +----+               +----+
                    \                 /
                     \               / EBGP dual-homing
                          AS 1(a)

                               Figure 6

   Assuming router r3's path is the best path in the AS, RR advertises
   the corresponding route information to the IBGP network.  If r3 goes
   down (or the peering link [r3, AS1] fails and r3 did not change the
   next hop to itself), the following will be the sequence of updates
   from router r1 to AS 2:

   o  Initial update for all prefixes when r1 chooses best path,

   o  Withdraws for all prefixes when r1 detects failure,

   o  Re-advertisement of all prefixes when the RR chooses router r4's
      path as the new best path and advertises to r1.

   With both the paths advertised and received on router r1, the
   sequence of updates reduces to:

   o  Initial update for all prefixes when r1 chooses best path,

   o  Re-advertisement of all prefixes when r1 detects failure and
      chooses router r4's path as the new best path.

8.4.  Graceful Maintenance

   [I-D.ietf-grow-bgp-graceful-shutdown-requirements] defines
   requirements for graceful maintenance of routers in a service
   provider network.  Current BGP operations treat this as a sudden link
   or node failure and trigger a reconvergence that can take on the
   order of seconds or minutes.

   With the procedures defined in this document, since alternate paths
   are available at the ingress routers, taking down egress routers from
   the network does not result in a network-wide reconvergence event.

9.  Acknowledgements

   The authors would like to thank Enke Chen for the many discussions
   resulting in this work.  In addition, the authors would also like to
   acknowledge valuable review and suggestions from Eric Rosen, Yakov
   Rekhter, and John Scudder on this document.

10.  IANA Considerations

   This document defines a new BGP optional non-transitive attribute
   type, called Edge_Discriminator attribute.  The attribute type is to
   be assigned by IANA.

   This document introduces Attr TLVs within the above attribute.  The
   type space for these should be set up by IANA as a registry of
   1-octet attr types.  These should be assigned on a First Come First
   Served basis.

   This document defines the following attr types that should be
   assigned in the registry:

      Attr                    Type
      -------------------     -----
      Interior Cost             1
      Peer BGP Identifier       2
      IPv4 Peer Address         3
      IPv6 Peer Address         4

11.  Security Considerations

   There are no additional security risks introduced by this design.

12.  References

12.1.  Normative References

   [I-D.ietf-grow-bgp-graceful-shutdown-requirements]
              Takeda, T., Decraene, B., Francois, P., Pelsser, C.,
              Ahmad, Z., and A. Armengol, "Requirements for the graceful
              shutdown of BGP sessions",
              draft-ietf-grow-bgp-graceful-shutdown-requirements-07
              (work in progress), January 2011.

   [I-D.ietf-idr-add-paths]
              Walton, D., Retana, A., Chen, E., and J. Scudder,
              "Advertisement of Multiple Paths in BGP",
              draft-ietf-idr-add-paths-04 (work in progress),
              August 2010.

   [I-D.ietf-idr-best-external]
              Marques, P., Fernando, R., Chen, E., and P. Mohapatra,
              "Advertisement of the best external route in BGP",
              draft-ietf-idr-best-external-03 (work in progress),
              March 2011.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, April 2006.

   [RFC5065]  Traina, P., McPherson, D., and J. Scudder, "Autonomous
              System Confederations for BGP", RFC 5065, August 2007.

12.2.  Informative References

   [RFC5004]  Chen, E. and S. Sangli, "Avoid BGP Best Path Transitions
              from One External to Another", RFC 5004, September 2007.

Authors' Addresses

   Pradosh Mohapatra
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   USA

   Email: pmohapat@cisco.com


   Rex Fernando
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   USA

   Email: rex@cisco.com


   Clarence Filsfils
   Cisco Systems
   Brussels,
   Belgium

   Email: cfilsfil@cisco.com


   Robert Raszuk
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   USA

   Email: raszuk@cisco.com