idnits 2.17.1

draft-dickson-idr-second-best-backup-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update
     this to the boilerplate described in the IETF Trust License Policy
     document (see https://trustee.ietf.org/license-info), which is
     required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 14.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748
     on line 765.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line
     776.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line
     783.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line
     789.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([5]), which it shouldn't.
     Please replace those with straight textual mentions of the documents
     in question.

  == There are 2 instances of lines with non-RFC6890-compliant IPv4
     addresses in the document.  If these are example addresses, they
     should be changed.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match
     the current year

  == Line 690 has weird spacing: '...|backup for 1...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If
     you have contacted all the original authors and they are all willing
     to grant the BCP78 rights to the IETF Trust, then this is fine, and
     you can ignore this comment.  If not, you may need to add the
     pre-RFC5378 disclaimer.  (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (February 25, 2008) is 5904 days in the past.  Is
     this intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative
     references to lower-maturity documents in RFCs)

  ** Downref: Normative reference to an Informational RFC: RFC 3345 (ref.
     '1')

  ** Downref: Normative reference to an Informational RFC: RFC 4264 (ref.
     '2')

     Summary: 4 errors (**), 0 flaws (~~), 4 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information
     about the items above.

--------------------------------------------------------------------------------

idr                                                           B. Dickson
Internet-Draft                                       Afilias Canada, Inc
Expires: August 28, 2008                               February 25, 2008

 Enhanced BGP Capabilities for Exchanging Second-best and Back-up Paths
                draft-dickson-idr-second-best-backup-01

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 28, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2008).

Abstract

   This Internet Draft describes an enhanced way to exchange prefix
   information, to permit multiple copies of a prefix with different
   paths to be announced and withdrawn.

   This negotiated capability provides faster local (inter-AS) and
   global (intra-AS) convergence, reduces path-hunting, and improves
   route-reflector behaviour, including eliminating both persistent
   oscillations and BGP "wedgies".

   Additional prefix instances have new optional BGP attributes, to
   control path selection.

   Withdrawal of prefixes will require new attributes to disambiguate
   prefix instances.

   Benefits are seen both when deployed intra-AS, and on inter-AS
   peering.

Author's Note

   This Internet Draft is intended to result in this draft or a related
   draft(s) being placed on the Standards Track for idr.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [5].

   Intended Status: Proposed Standard.

Table of Contents

   1.  Background
     1.1.  Localized Information
     1.2.  The Withdrawal Problem
     1.3.  The Uniqueness Problem
   2.  Proposed Changes
     2.1.  New Negotiated Option: USE_SECOND_BEST_AND_BACKUP
     2.2.  New Optional Path Attribute: SECOND_BEST
     2.3.  New Optional Path Attribute: BACKUP_ONLY
     2.4.  New Optional Path Attribute: BACKUP_ONLY_SECOND_BEST
     2.5.  New Update Format
     2.6.  New Withdraw Format
   3.  Modifications to BGP Behavior
     3.1.  Changes to Path Selection Rules
     3.2.  Second Best - Basic Method
     3.3.  Second Best - Route Reflector
     3.4.  Second Best - Inter-AS Hybrid Method
     3.5.  Backup Only - Basic Method
     3.6.  Backup Only - Route Reflector
     3.7.  IBGP vs EBGP
   4.  Implementation Guidelines
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgements
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Appendix A.  Path-Hunting Examples
   Appendix B.  Persistent Oscillation Examples
   Appendix C.  BGP Wedgie Examples
   Author's Address
   Intellectual Property and Copyright Statements

1.  Background

   Even when all the best current practices are observed, operational
   problems may be experienced when running a BGP network.

   These include slow convergence due to "path-hunting", persistent
   oscillations [1], and BGP "wedgies" [2].

   Standardization of MRAI timers helps, as does RFC 5004 [4].

   These RFCs identify the above issues as needing further work.

1.1.  Localized Information

   The problems listed above occur as a result of additional information
   not being available (either on a transient basis, or permanently).

   In the case of "path hunting", the information needed for achieving a
   stable final state is eventually received, but until it is,
   sub-optimal forwarding will occur, and possibly even transient
   routing loops.

   The "problem" mechanisms involved are:

   o  the suppression of announcement of "second-best" paths, because of
      IBGP-received "best" paths;

   o  the suppression, by route-reflectors, of IBGP non-best paths (i.e.
      those normally seen directly by IBGP peers);

   o  the suppression of announcement of "second-best" paths, because of
      EBGP-received "best" paths;

   o  the lack of an explicit global mechanism for de-preferring
      announcements via "back-up" providers.

   When a prefix+path received is better than the local "best", the new
   "best" is normally sent.

   However, once a new "best" is received, the side-effect is to force
   the speaker to WITHDRAW the previous best path within the same
   "regime" (IBGP mesh or EBGP peers).

   When we consider the extra (e.g. suppressed) information, with
   special rules on what to send and how to treat it, the specified
   problems may go away, or be reduced in scope, duration, or
   likelihood.

1.2.  The Withdrawal Problem

   When a prefix (plus path) is withdrawn, the desired stable state is
   for the next-best path for that prefix (if one exists) to be chosen
   at each BGP speaker per its local policy.

   If that second-best path is already on hand, the delay and
   intermediate states can be reduced or entirely avoided.  This is
   especially true for both intra-AS and inter-AS "path hunting".
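
   As a non-normative illustration of why having the second-best path
   on hand helps, the sketch below keeps the best and second-best
   instance of one prefix as received from one neighbor, and promotes
   the latter when the former is withdrawn.  The names NeighborPaths
   and withdraw_best are hypothetical and not part of this proposal;
   paths are represented as plain AS-path strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NeighborPaths:
    """Paths for one prefix as received from one neighbor (hypothetical model)."""
    best: Optional[str] = None
    second_best: Optional[str] = None

    def withdraw_best(self) -> Optional[str]:
        """Withdraw the best path and promote second-best if one is on hand.

        Returns the new best path, or None if the prefix is no longer
        reachable via this neighbor.
        """
        self.best = self.second_best
        self.second_best = None
        return self.best


# The neighbor previously sent two instances of the prefix.
paths = NeighborPaths(best="4 1", second_best="2 1")
# No further update is needed from the neighbor to pick the replacement.
new_best = paths.withdraw_best()
```

   In this sketch the replacement path is available as soon as the
   withdrawal is processed, without waiting for a further update.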

   To avoid inconsistent behavior, routing loops, and
   routing-information loops, the second-best path received from a
   neighbor should not be selected as a best path locally while that
   neighbor's best path is still present.

   The second-best path from a neighbor MUST be considered as a
   candidate for best path only when the previous best path from that
   neighbor is withdrawn.  When this occurs, the path in question is
   promoted to "best" status.

1.3.  The Uniqueness Problem

   Currently, for each prefix, only one path for that prefix is ever
   announced from one peer to another (except in the instance of Route
   Reflectors).  Because of this uniqueness property, a withdrawal of a
   prefix does not require path information.  This also means that a
   change of best path is accomplished via an update for a prefix with
   the new path information.

   If, however, more than one path for a given prefix were sent, then
   any attempt to withdraw a prefix+path would require that the specific
   path for the prefix being withdrawn be supplied in the withdrawal
   update message.

   In an environment where multiple paths per prefix are possible, but
   only one path per prefix is maintained, two steps would be involved
   in changing the "best" path.  In no particular order, these would be
   the withdrawal of the old prefix+path, and the announcement of the
   new prefix+path.

2.  Proposed Changes

2.1.  New Negotiated Option: USE_SECOND_BEST_AND_BACKUP

   This is a new BGP Capabilities value, which can be optionally
   included in the capabilities negotiation.  The specific value is a
   code-point to be assigned by IANA.

   When negotiated:

   o  Update messages MUST be in the new format

   o  Updates without any of the new optional attributes are considered
      BEST

   o  For each prefix, at most one of each type (BEST, SECOND_BEST,
      BACKUP_ONLY, BACKUP_ONLY_SECOND_BEST) may be sent

2.2.  New Optional Path Attribute: SECOND_BEST

   This is a new BGP Path Attribute type.  It MAY be used only if the
   USE_SECOND_BEST_AND_BACKUP capability has been negotiated.  The type
   value is a new code point to be assigned by IANA.

   This is an Optional, Non-Transitive, Non-Extended, Non-Partial
   attribute.  Of the "attr flag bits" (from BGP [3]), only the Optional
   bit is set; the Transitive, Partial, and Extended Length bits are
   zero.  The length is 1, and the value is 1.

2.3.  New Optional Path Attribute: BACKUP_ONLY

   This is a new BGP Path Attribute type.  The type value is a new code
   point to be assigned by IANA.  This is an Optional, Transitive,
   Non-Extended, Non-Partial attribute, with the "attr flag bits" (from
   BGP [3]) set to appropriate values.  The length is 1, and the value
   is 1.

2.4.  New Optional Path Attribute: BACKUP_ONLY_SECOND_BEST

   This is a new BGP Path Attribute type.  It MAY be used only if the
   USE_SECOND_BEST_AND_BACKUP capability has been negotiated.  The type
   value is a new code point to be assigned by IANA.

   This is an Optional, Non-Transitive, Non-Extended, Non-Partial
   attribute.  Of the "attr flag bits" (from BGP [3]), only the Optional
   bit is set; the Transitive, Partial, and Extended Length bits are
   zero.  The length is 1, and the value is 1.

2.5.  New Update Format

   Update messages are identical to the existing format, with the
   exception of the new Withdrawal format, and the new optional Path
   Attributes (SECOND_BEST, BACKUP_ONLY, and BACKUP_ONLY_SECOND_BEST).
   If BGP capability USE_SECOND_BEST_AND_BACKUP has been negotiated, any
   Update MAY have Path Attributes which include SECOND_BEST,
   BACKUP_ONLY, and/or BACKUP_ONLY_SECOND_BEST.  More than one instance
   of a given prefix, with distinct values of Path Attributes, MAY be
   sent between BGP speakers.

   At most four instances may be sent, one of each type: one with none
   of the new attributes (BEST), one with SECOND_BEST only, one with
   BACKUP_ONLY only, and one with BACKUP_ONLY_SECOND_BEST (that is, both
   second-best and backup-only).

   Two prefix paths are considered identical if they differ only in the
   presence or absence of any of the new attributes.  An Update which
   contains a path that differs only in these attributes will result in
   the path information for the prefix being modified.

2.6.  New Withdraw Format

   Since it is no longer possible to identify which instance of a
   prefix is affected by an update containing a withdrawal, a new format
   for Withdrawals is needed.  For simplicity of implementations, this
   consists of four Withdrawal sections, one for each of the types
   (BEST, SECOND_BEST, BACKUP_ONLY, BACKUP_ONLY_SECOND_BEST).  They
   occur in REVERSE order, to simplify state transitions if/when a
   "BEST" path is withdrawn.  Each Withdrawal section has the same
   format as the original Withdrawal section.

      +-----------------------------------------------------+
      |   Withdrawn Routes Length (2 octets)                |
      +-----------------------------------------------------+
      |   Withdrawn Routes (variable)                       |
      +-----------------------------------------------------+
      |   Total Path Attribute Length (2 octets)            |
      +-----------------------------------------------------+
      |   Path Attributes (variable)                        |
      +-----------------------------------------------------+
      |   Network Layer Reachability Information (variable) |
      +-----------------------------------------------------+

                                 Figure 1

   Withdrawn Routes Length:  This 2-octet unsigned integer indicates
      the total length of the Withdrawn Routes field in octets.  Its
      value allows the length of the Network Layer Reachability
      Information field to be determined, as specified below.

      A value of 0 indicates that no routes are being withdrawn from
      service, and that the WITHDRAWN ROUTES field is not present in
      this UPDATE message.

   Withdrawn Routes Field:  This field now consists of four sub-fields
      and their respective lengths.
      The value for Withdrawn Routes Length above must be the sum of
      the four lengths, plus 8 (the sum of the lengths of the Subfield
      Lengths).

      The format and sequence of the subfields is as follows:

   +----------------------------------------------------------------+
   |   Withdrawn BACKUP_ONLY_SECOND_BEST Routes Length (2 octets)   |
   +----------------------------------------------------------------+
   |   Withdrawn BACKUP_ONLY_SECOND_BEST Routes (variable)          |
   +----------------------------------------------------------------+
   |   Withdrawn BACKUP_ONLY Routes Length (2 octets)               |
   +----------------------------------------------------------------+
   |   Withdrawn BACKUP_ONLY Routes (variable)                      |
   +----------------------------------------------------------------+
   |   Withdrawn SECOND_BEST Routes Length (2 octets)               |
   +----------------------------------------------------------------+
   |   Withdrawn SECOND_BEST Routes (variable)                      |
   +----------------------------------------------------------------+
   |   Withdrawn BEST Routes Length (2 octets)                      |
   +----------------------------------------------------------------+
   |   Withdrawn BEST Routes (variable)                             |
   +----------------------------------------------------------------+

                                 Figure 2

   Withdrawn Routes Subfield Lengths:  These 2-octet unsigned integers
      indicate the total length of their respective Withdrawn Routes
      subfields in octets.

   Withdrawn Routes Subfields:  Each of these is a variable-length field
      that contains a list of IP address prefixes for the routes that
      are being withdrawn from service.  Each IP address prefix is
      encoded as a 2-tuple of the form <length, prefix>, whose fields
      are described below:

               +---------------------------+
               |   Length (1 octet)        |
               +---------------------------+
               |   Prefix (variable)       |
               +---------------------------+

3.  Modifications to BGP Behavior

3.1.  Changes to Path Selection Rules

   The path selection rules for BGP (section 9.1.2.2 of BGP4 [3]) are
   changed as follows:

   o  The following rule is placed before step (a): If paths with and
      without BACKUP_ONLY (or BACKUP_ONLY_SECOND_BEST) are both
      available, those with BACKUP_ONLY/BACKUP_ONLY_SECOND_BEST are
      eliminated.

   o  The following rule is a modification to step (c): Step (c) is
      first performed INCLUDING paths with SECOND_BEST.  If, at the end
      of the first attempt at step (c), only paths with SECOND_BEST
      remain, re-run step (c), this time EXCLUDING the paths with
      SECOND_BEST.  After this modified version of step (c), the
      remaining paths MUST NOT have the SECOND_BEST attribute.  In other
      words, Step (c) MUST remove any SECOND_BEST paths.

   o  The remainder of the usual BGP path selection rules are applied as
      normal.

   o  If the final path selected has the BACKUP_ONLY/
      BACKUP_ONLY_SECOND_BEST attribute, the attribute BACKUP_ONLY MUST
      be set.

   The path selection rules for the "Second Best" path are as follows:

   o  The already-selected "best" path is removed from the set of paths
      to compare.

   o  The same rules are applied as for the "best" path.

   o  The selected path is advertised with the attribute SECOND_BEST
      applied.

   o  If the selected path had the BACKUP_ONLY attribute, the attribute
      BACKUP_ONLY_SECOND_BEST MUST be set.

   The prefix instances for consideration of second-best path are the
   REMAINDER of non-SECOND_BEST instances, and the SECOND_BEST instance
   received on the in-RIB from which the best path was selected (if one
   exists).  Only one SECOND_BEST instance received may be considered
   for the local (and out-RIB) SECOND_BEST path.

3.2.  Second Best - Basic Method

   Once the capability for doing so has been negotiated between a pair
   of BGP speakers, each sends the best two paths for each prefix.
   The path information will include the additional SECOND_BEST
   attribute on the second best path.

   When the current "best" path is withdrawn, the withdrawal MAY be
   propagated without having to perform a full BGP table path selection.
   The current "second best" path in the local-RIB is promoted to
   "best".  This is because the alternate candidates have already been
   evaluated and "second-best" has already been selected.

   Whenever an AS consists of a mesh of BGP speakers who have negotiated
   this capability, the withdrawal will propagate through the entire AS.
   This will either have no effect, or result in a change in "best"
   without requiring non-local information to choose the new "best"
   path.

3.3.  Second Best - Route Reflector

   The "best" and "second best" are reflected.  The same mechanism is
   used for determining both best and second-best per prefix.  Updates
   must be reflected whenever the choice of either or both of the "best"
   or "second best" changes.  Withdrawals may be propagated immediately.

3.4.  Second Best - Inter-AS Hybrid Method

   When a withdrawal of the current best path is received from a peer
   doing USE_SECOND_BEST_AND_BACKUP, and the rules for sending updates
   require that an update for this prefix be sent to a peer who does not
   support USE_SECOND_BEST_AND_BACKUP, the current second-best instance
   of the prefix is sent to that peer in an Update.  The neighbor does
   not need the withdrawal, since the new path replaces the old path.

   When the selection of best path results in the selection of a path
   with BACKUP_ONLY, the path is sent as the best path.  This is the
   only time where a BACKUP_ONLY path is sent as BEST, without
   preserving the BACKUP_ONLY attribute.

3.5.  Backup Only - Basic Method

   The main reason for establishing the BACKUP_ONLY attribute is to
   permit the global implementation of actual "backup only"
   announcements.
   It is not to facilitate change of policies, or to circumvent local
   policies; instead it is to make possible the implementation of
   policies where those have been negotiated by two or more parties.

   Currently, there are several documented scenarios in the "Wedgies"
   RFC [2] where the mutually desired policy is either unable to be
   implemented, or does not deterministically reach the desired state.

   Use of the BACKUP_ONLY attribute on announcements sent to a backup
   provider permits these problems to be resolved.

   The same prefix is announced to both the primary and backup provider.
   When announced to the primary provider, the BACKUP_ONLY attribute is
   NOT set.  When announced to the backup provider, the BACKUP_ONLY
   attribute IS set.

   The propagation of the BACKUP_ONLY instance will be limited by the
   availability of multiple paths and the use of SECOND_BEST peerings.

   In Figure 10 (of Appendix C), the BACKUP_ONLY instance will be seen
   by the backup provider, and be passed with both SECOND_BEST and
   BACKUP_ONLY to the backup provider's transit provider.  The latter
   will prefer any other instance without BACKUP_ONLY, even if it has
   applied a LOCAL_PREFERENCE to the received prefix instance.  Should
   the other instance be withdrawn, the BACKUP_ONLY will be selected and
   subsequently propagated.  The withdrawal will also eventually result
   in an Update with the BACKUP_ONLY attribute but WITHOUT the
   SECOND_BEST attribute (since the prefix will now only be reachable
   via the backup provider).

3.6.  Backup Only - Route Reflector

   Route Reflectors operate the same as always.  The BACKUP_ONLY
   attribute MUST be preserved during reflection.  Thus, if "Second
   Best" is in operation, then the BACKUP_ONLY attribute of both best
   and second-best MUST be preserved on both instances.
   And, if "Second Best" is not in use, then the selected "best" prefix,
   if it has BACKUP_ONLY set, MUST be reflected with BACKUP_ONLY as
   well.

3.7.  IBGP vs EBGP

   The same rules apply for EBGP->EBGP, EBGP->IBGP, IBGP->EBGP, and
   IBGP->IBGP.  If a particular peering has had
   USE_SECOND_BEST_AND_BACKUP negotiated, then whenever an update for a
   particular prefix results in a new selection of either or both of
   best and second-best, the new selections (and possible withdrawal of
   old selections) are sent to the appropriate peers.  Additionally,
   updates which have BACKUP_ONLY MAY be sent.

4.  Implementation Guidelines

   In order to encourage effective implementation schemes, and to
   demonstrate some of the benefits of deployment, here are some
   suggestions for facilitating fast propagation of path changes, which
   are anticipated to improve behavior.  This applies in particular to
   Path Hunting issues.

   In-RIB-SBAB (many) -> RIB-SBAB -> out-RIB-SBAB
                            |  \
                            v   `-> out-RIB (to non-SBAB peers)
                         RIB -> FIB

   +----------+---------+----------+-----------------+
   |  PREFIX  | IN-SBAB | OUT-SBAB | *PATH-info-ptr  |
   +----------+---------+----------+-----------------+

                                 Figure 4

   Here IN-SBAB and OUT-SBAB are 4-bit fields indicating which
   SECOND_BEST_AND_BACKUP (SBAB) attributes apply (BEST, SECOND_BEST,
   BACKUP_ONLY, BACKUP_ONLY_SECOND_BEST).  IN-SBAB holds the attributes
   received from a peer; OUT-SBAB holds, for ONLY those prefixes
   selected for inclusion into the RIB-SBAB, the corresponding
   attributes.

   For example, if all external peers have NOT negotiated SBAB, those
   prefixes would have SBAB binary values of 1000.  Each In-RIB-SBAB
   would have at most one instance.  And for each prefix, at most one
   In-RIB-SBAB would be selected as best, and have its corresponding
   OUT-SBAB set to binary value 1000.
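
   A minimal sketch of the table entry in Figure 4, for illustration
   only.  The text above fixes only the binary value 1000 for BEST; the
   bit positions chosen here for the other three types, the names Sbab
   and RibEntry, and the example prefix and path are assumptions.

```python
from dataclasses import dataclass
from enum import IntFlag


class Sbab(IntFlag):
    """4-bit SBAB field.

    Only BEST = 0b1000 comes from the text; the remaining bit
    assignments are assumed for illustration.
    """
    BEST = 0b1000
    SECOND_BEST = 0b0100
    BACKUP_ONLY = 0b0010
    BACKUP_ONLY_SECOND_BEST = 0b0001


@dataclass
class RibEntry:
    prefix: str
    in_sbab: Sbab    # attributes as received from the peer
    out_sbab: Sbab   # attributes after local selection
    path_info: str   # stands in for *PATH-info-ptr


# A prefix learned from a peer that has NOT negotiated SBAB and that
# was selected locally as best (example values are made up).
entry = RibEntry("198.51.100.0/24", Sbab.BEST, Sbab.BEST, "65001 65002")
bits = format(int(entry.in_sbab), "04b")
```

   One flag per type keeps the check "is this instance BEST?" to a
   single bit test, which suits the parallel withdrawal flooding
   described in this section.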

   This forward-chaining allows for processing of SBAB updates to
   determine whether withdrawals need to be flooded to peers, and if so,
   what SBAB attribute to apply to the withdrawals that are flooded.
   This flooding MAY be performed in parallel to normal BGP table update
   processing.

   For clarity, it should be pointed out that:

   o  The process for the step RIB-SBAB to RIB is "select prefixes
      marked 'best'".

   o  The process for the step RIB-SBAB to out-RIB is also "select
      prefixes marked 'best'".

   o  The process for the step RIB-SBAB to out-RIB-SBAB is the same as
      ordinary RIB to out-RIB, except for preservation of SBAB
      attributes (if any).

5.  Security Considerations

   No additional security considerations beyond those already present in
   BGP are introduced.

6.  IANA Considerations

   IANA will need to assign a new code point for BGP Capabilities for
   USE_SECOND_BEST_AND_BACKUP.  IANA will need to assign new code points
   for BGP Attribute Types for SECOND_BEST, BACKUP_ONLY and
   BACKUP_ONLY_SECOND_BEST.

7.  Acknowledgements

   The author wishes to acknowledge the helpful guidance of Joe Abley,
   Tony Li, and Yakov Rekhter.  The author also wishes to acknowledge
   the insight gained from his Scottish Deerhound, Skylar, winning a
   Reserve Best-in-Show.  (The selection method of "second best" comes
   from the Reserve system used at the group and best-in-show levels of
   dog shows.)

8.  References

8.1.  Normative References

   [1]  McPherson, D., Gill, V., Walton, D., and A. Retana, "Border
        Gateway Protocol (BGP) Persistent Route Oscillation Condition",
        RFC 3345, August 2002.

   [2]  Griffin, T. and G. Huston, "BGP Wedgies", RFC 4264,
        November 2005.

   [3]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4
        (BGP-4)", RFC 4271, January 2006.

   [4]  Chen, E. and S. Sangli, "Avoid BGP Best Path Transitions from
        One External to Another", RFC 5004, September 2007.

8.2.  Informative References

   [5]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

Appendix A.  Path-Hunting Examples

   (These will be included in a subsequent version of this ID.)

Appendix B.  Persistent Oscillation Examples

   Consider the example in Figure 5 where

   o  R1, R2, R3, R4, and R5 belong to one AS.

   o  R1 is a route reflector with R2 and R3 as its clients.

   o  R4 is a route reflector with R5 as its client.

   o  The IGP metrics are as listed.

   o  External paths (a), (b), and (c) are as described in Figure 6.

            +----+      1      +----+
            | R1 |-------------| R4 |
            +----+             +----+
             |  \                 |
             |   \                |
            3|    \ 2             | 6
             |     \              |
             |      \             |
          +----+   +----+      +----+
          | R2 |   | R3 |      | R5 |
          +----+   +----+      +----+
             |        |           |
            (a)      (b)         (c)

                                 Figure 5

            Path   AS_PATH   MED
             a      1 3       10
             b      2 3        1
             c      2 3        0

                                 Figure 6

   With the addition of "second best", we have:

   R1 has the following:

   Path  AS_PATH  MED  IGP-metric
    a     1 3      10      3    (received:best) (best)
    b     2 3       1      2    (received:best)
    c     2 3       0      7    (received:best) (second_best - not sent)

   R4 has the following:

   Path  AS_PATH  MED  IGP-metric
    a     1 3      10      4    (received:best) (best - not sent)
    c     2 3       0      6    (received:best) (second_best)

   This results in R1 having:

   Path  AS_PATH  MED  IGP-metric
    a     1 3      10      3    (received:best) (best)
    b     2 3       1      2    (received:best)
    c     2 3       0      7    (received:second_best)
                                (second_best - not sent)

   By including the second_best in the best path calculation, the
   persistent oscillation problem is resolved.

Appendix C.  BGP Wedgie Examples

   The following examples from RFC 4264 [2] show the effects of the
   proposed changes in resolving "wedgie" issues.

         +----+                +----+
         |AS 3|----------------|AS 4|
         +----+ peer      peer +----+
           |provider             |provider
           |                     |
           |customer             |
         +----+                  |
         |AS 2|                  |
         +----+                  |
           |provider             |
           |                     |
           |customer             |customer
           +-------+  +----------+
             backup|  |primary
                  +----+
                  |AS 1|
                  +----+

                                Figure 10

   In Figure 10 above, the announcement via the backup link is sent with
   BACKUP_ONLY.

   o  AS 4 sends the "best" (the direct link to AS 1) to AS 3.

   o  AS 2 sends its "best", which is the BACKUP_ONLY path from AS 1, to
      AS 3, also with BACKUP_ONLY (since it is a transitive attribute).

   o  AS 3 and AS 4 exchange their respective "best" paths.

   o  AS 3 prefers the path "4 1" over "2 1" because "2 1" is
      BACKUP_ONLY.

   o  AS 3 sends a revised BACKUP_ONLY update to AS 4 as SECOND_BEST.

   o  AS 3 sends the new "best" to AS 2.

   o  AS 2 sends a revised BACKUP_ONLY update to AS 3 as SECOND_BEST.

   This state will be reached regardless of the sequence of disconnects
   and reconnects.

   Link failures will also result in propagation of withdrawals of
   "best", and the SECOND_BEST promotions will result in immediate
   correct behavior.

         +----+                +----+
         |AS 3|----------------|AS 4|
         +----+ peer      peer +----+
           |provider             |provider
           |                     |
           |customer             |customer
         +----+                +----+
         |AS 2|                |AS 5|
         +----+                +----+
           |provider             |provider
           |                     |
           |customer             |customer
           +-------+  +----------+
             backup|  |primary for 192.9.200.0/25
            primary|  |backup for 192.9.200.128/25
                  +----+
                  |AS 1|
                  +----+

                                Figure 11

   In Figure 11 above, the announcements via the backup links will work
   the same as in Example 1.
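
   The choice made by AS 3 in the Figure 10 walk-through follows from
   the rule that Section 3.1 places before step (a).  The sketch below
   is a non-normative illustration: the Path record and function names
   are hypothetical, and the rest of BGP path selection is modeled
   simply as "shortest AS path", which is an assumption made to keep
   the example short.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Path:
    as_path: Tuple[int, ...]
    backup_only: bool = False


def drop_backup_if_alternative(paths: List[Path]) -> List[Path]:
    """Rule placed before step (a): if paths with and without
    BACKUP_ONLY are both available, eliminate the BACKUP_ONLY ones."""
    primary = [p for p in paths if not p.backup_only]
    return primary if primary else list(paths)


def select_best(paths: List[Path]) -> Path:
    # Stand-in for the remaining selection steps (assumption):
    # prefer the shortest AS path among the surviving candidates.
    candidates = drop_backup_if_alternative(paths)
    return min(candidates, key=lambda p: len(p.as_path))


# AS 3's view of AS 1's prefix in Figure 10.
via_as4 = Path(as_path=(4, 1))
via_as2 = Path(as_path=(2, 1), backup_only=True)

best = select_best([via_as4, via_as2])
# After the "4 1" path is withdrawn, only the backup remains.
fallback = select_best([via_as2])
```

   Because the BACKUP_ONLY filter runs first, the outcome does not
   depend on the order in which the two paths were learned.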

         +----+                +----+
         |AS 3|----------------|AS 4|
         +----+ peer      peer +----+
           ||provider            |provider
           |+-----------+        |
           |customer    |customer|
         +----+       +----+     |
         |AS 2|-------|AS 5|     |
         +----+ peer  +----+     |
           |provider    |provider|
           |            |        |
           |customer  +-+customer|customer
           +-------+  |+---------+
             backup|  ||primary
                  +----+
                  |AS 1|
                  +----+

                                Figure 12

   In Figure 12 above, the announcements via both backup links will
   result in:

   o  AS 2 selecting its best path via "3 4 1" (the only path it hears
      from AS 3)

   o  AS 2 hearing two paths from AS 5:

      *  its "second best" path "5 3 4 1"

      *  another path marked SECOND_BEST and BACKUP_ONLY with path "5 1"

   o  AS 2 hearing a BACKUP_ONLY directly from AS 1

   Any announcement that AS 3 hears from AS 2 or AS 5 will always be
   marked BACKUP_ONLY.  Thus, any combination of break/restore on any
   links, in any order, will always result in the desired state being
   reached.

Author's Address

   Brian Dickson
   Afilias Canada, Inc
   4141 Yonge St,
   Suite 204
   North York, ON  M2P 2A8
   Canada

   Email: brian.peter.dickson@gmail.com
   URI:   www.afilias.info

Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).