idr                                                           B. Dickson
Internet-Draft                                       Afilias Canada, Inc
Expires: January 14, 2009                                  July 13, 2008


       Enhanced BGP Capabilities for Exchanging Second-Best Paths
                   draft-dickson-add-paths-ordered-01

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on January 14, 2009.

Copyright Notice

Copyright (C) The IETF Trust (2008).

Abstract

This Internet-Draft describes an enhanced format for encoding prefix information that permits multiple copies of a prefix, each with a different path, to be announced and withdrawn.

Prefix instances using the new format include both unique identifiers and ordinals to control path selection.

Withdrawal of prefixes requires a slight modification to disambiguate prefix instances.

Author's Note

This Internet-Draft is intended to result in this draft, or one or more related drafts, being placed on the Standards Track in the idr working group.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [4].

Intended Status: Proposed Standard.

Table of Contents

   1. Background
      1.1. The Best Path Chaining and the Best Path Tree
      1.2. The Withdrawal Problem
      1.3. The Uniqueness Property
   2. Proposed Changes
      2.1. How to Identify a Path
      2.2. Extended NLRI Encodings
      2.3. ADD_PATH_ORDERED Capability
   3. Security Considerations
   4. IANA Considerations
   5. Acknowledgements
   6. References
      6.1. Normative References
      6.2. Informative References
   Author's Address
   Intellectual Property and Copyright Statements

1. Background

Even when all the best current practices are observed, operational problems may be experienced when running a BGP network.

These include slow convergence due to "path-hunting" and persistent route oscillations [1].

Standardization of MRAI (MinRouteAdvertisementInterval) timers in RFC 4271 [2] helps to limit path-hunting, and oscillations can be worked around with RFC 5004 [3].

However, both of these RFCs identify the above issues as needing further work.

1.1. The Best Path Chaining and the Best Path Tree

In a stable system of BGP speakers, the selected best paths for any given prefix should form a spanning tree. At each node, the selected best path points further up the tree. The root of the tree is the destination, i.e., the originator of the prefix. The path from any leaf to the root forms a "chain" of best paths.

Path attributes may be modified over time in any number of ways, at arbitrary places in this tree. When this happens, individual segments of the tree may conceptually "stretch" or "shrink". These changes may have no effect on the overall set of best-path choices, or they may cause a cascade effect "below" that point in the tree, with nodes migrating to new locations in a new version of the tree.
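To make the tree model concrete, the following minimal Python sketch assumes that each speaker's selected best path for one prefix can be reduced to a pointer to the neighbor it points at, with the originator as the root. The names `best_path_chain` and `nodes_below`, and the example topology, are hypothetical illustrations rather than part of BGP or of this proposal. `nodes_below` returns the speakers whose chain passes through a given node, i.e. the part of the tree a change at that node could cascade into.

```python
from typing import Dict, List, Optional, Set

# Hypothetical model: speaker -> neighbor its best path points to (None at the root).
BestPathTree = Dict[str, Optional[str]]


def best_path_chain(parent: BestPathTree, node: str) -> List[str]:
    """Follow best-path pointers from `node` up to the originator."""
    chain = [node]
    seen = {node}
    while parent[node] is not None:
        node = parent[node]
        if node in seen:
            # In a stable system this cannot happen; the pointers must form a tree.
            raise ValueError("best-path pointers form a loop, not a tree")
        seen.add(node)
        chain.append(node)
    return chain


def nodes_below(parent: BestPathTree, node: str) -> Set[str]:
    """Every speaker (other than `node`) whose chain of best paths passes through `node`."""
    return {n for n in parent if n != node and node in best_path_chain(parent, n)}


# Hypothetical topology: O originates the prefix.
parent: BestPathTree = {"O": None, "A": "O", "B": "A", "C": "A", "D": "B"}
print(best_path_chain(parent, "D"))       # ['D', 'B', 'A', 'O']
print(sorted(nodes_below(parent, "A")))   # ['B', 'C', 'D']
```

Recomputing each chain is quadratic in the number of speakers; that is acceptable for an illustration but not how a real implementation would track the tree.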

However, each node selects its best path locally, and every time a node changes its selection of best path, that change is visible to its peers and may in turn affect their own choice of best path. This propagation of changes is not instantaneous and, because the actual connectivity between nodes is not tree-like, it can and does result in race conditions.

Depending on connectivity, peering policy, and initial conditions, the behavior may border on that of systems best described through chaos theory. The time to reach a stable state, while generally bounded, is often far from fast, not necessarily predictable, and not necessarily consistent.

1.2. The Withdrawal Problem

Under normal circumstances, a change in attributes for a prefix will "flow" along the tree of best paths without significantly disrupting the structure of the tree itself. Even when a node selects a new best path (and thus re-attaches itself to the tree in a new location), it typically continues to pass the new attributes along the branch of the tree for which it is the root.

However, under certain circumstances, its choice of a new best path requires it to WITHDRAW the prefix from those peers, effectively severing the branch. Much of the path-hunting behavior is triggered by the after-effects of this truncation.

When a withdrawal effectively severs a branch of the tree, all the nodes in the severed branch need to find new paths to the root. The problem is that it takes some time for them to learn this fact.

In the meantime, the nodes in the severed branch may continue to use, and propagate, paths that are technically infeasible.

The idea is to fast-track the flooding of the infeasibility of paths throughout all parts of the tree below a given link, so as to minimize the use of infeasible paths.

1.3. The Uniqueness Property

Currently, for each prefix, only one path is ever announced from one peer to another (ignoring Route Reflectors). Because of this property, uniqueness, a withdrawal of a prefix does not require path information. This also means that a change of best path is accomplished via an update for the prefix carrying the new path information.

If, however, more than one path for a given prefix were sent, then any attempt to withdraw a prefix+path would require some mechanism to distinguish between prefix instances.

In an environment where multiple path announcements per prefix are possible, but only one "best" path per prefix is maintained, two steps would be involved in changing the "best" path, in no particular order: the withdrawal of the old prefix+path, and the announcement of the new prefix+path.

2. Proposed Changes

What is being proposed is to maintain the "best N" paths for each prefix, and to send all of these rather than just the "best" path.

The supposition is that pruning all infeasible branches, while maintaining information on the next N best paths, allows for fast removal of all (possibly best) paths that depend on infeasible paths, and fast reconvergence using pre-computed alternate paths. The N-best mechanism is expected to act as a stop-gap until full BGP path selection generates a new set of "best N" paths, not to replace it.

2.1. How to Identify a Path

As defined in RFC 4271 [2], a path refers to the information reported in the path attribute field of an UPDATE message. As the procedures specified in RFC 4271 allow only the advertisement of one path for a particular address prefix, a path for an address prefix from a BGP peer can be keyed on the address prefix alone.
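The following minimal Python sketch illustrates why the uniqueness property matters once several paths per prefix are sent: a per-peer table keyed on (prefix, Path Identifier) can withdraw one instance without touching the others, and the remaining "best N" entries act as pre-computed alternates. The class `AdjRibIn`, the `Path` fields, and especially the rule that a lower Ordinal is more preferred are assumptions made for illustration; this document does not define how Ordinals rank paths.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Path:
    path_id: int              # Path Identifier chosen by the advertising speaker
    ordinal: int              # Path Ordinal; ASSUMED here: lower value = more preferred
    as_path: Tuple[int, ...]  # stand-in for the full set of path attributes


class AdjRibIn:
    """Hypothetical per-peer table keyed on (prefix, Path Identifier)."""

    def __init__(self) -> None:
        self.paths: Dict[Tuple[str, int], Path] = {}

    def announce(self, prefix: str, path: Path) -> None:
        # A known (prefix, path_id) replaces that instance; a new path_id adds one.
        self.paths[(prefix, path.path_id)] = path

    def withdraw(self, prefix: str, path_id: int) -> None:
        # Without the Path Identifier it would be ambiguous which instance to remove.
        self.paths.pop((prefix, path_id), None)

    def best_n(self, prefix: str, n: int) -> List[Path]:
        candidates = [p for (pfx, _), p in self.paths.items() if pfx == prefix]
        # Rank by the assumed Ordinal order, breaking ties by Path Identifier.
        return sorted(candidates, key=lambda p: (p.ordinal, p.path_id))[:n]


rib = AdjRibIn()
for pid, ordinal, as_path in [(1, 1, (64500,)),
                              (2, 2, (64501, 64500)),
                              (3, 3, (64502, 64501, 64500))]:
    rib.announce("192.0.2.0/24", Path(pid, ordinal, as_path))

print([p.path_id for p in rib.best_n("192.0.2.0/24", 2)])  # [1, 2]
rib.withdraw("192.0.2.0/24", 1)   # the old best path becomes infeasible
print([p.path_id for p in rib.best_n("192.0.2.0/24", 2)])  # [2, 3]
```

After the withdrawal, the former second-best path is already present and is promoted without waiting for a new announcement, which is the fast-reconvergence behavior the "best N" proposal is after.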

In order for a BGP speaker to advertise multiple paths for the same address prefix, a new identifier (termed "Path Identifier" hereafter) needs to be introduced so that a particular path for an address prefix can be identified by the combination of the address prefix and the Path Identifier.

Depending on the application and the configuration of a particular peer, the Path Identifier for a path can be an AS number, a BGP Identifier, or an opaque number, with which a path is associated by the BGP speaker that advertises the path.

2.2. Extended NLRI Encodings

In order to carry the Path Identifier in an UPDATE message, the existing NLRI encodings specified in [RFC4271, RFC2858] are extended as follows:

      +-----------------------------+
      | Path Identifier (4 octets)  |
      +-----------------------------+
      | Path Ordinal (1 octet)      |
      +-----------------------------+
      | Length (1 octet)            |
      +-----------------------------+
      | Prefix (variable)           |
      +-----------------------------+

                  Figure 1

and the NLRI encoding specified in [RFC3107] is extended as follows:

      +-----------------------------+
      | Path Identifier (4 octets)  |
      +-----------------------------+
      | Path Ordinal (1 octet)      |
      +-----------------------------+
      | Length (1 octet)            |
      +-----------------------------+
      | Label (3 octets)            |
      +-----------------------------+
      .............................
      +-----------------------------+
      | Prefix (variable)           |
      +-----------------------------+

                  Figure 2

Update messages are otherwise identical to the existing format. If the BGP capability ADD_PATH_ORDERED has been negotiated, every Update MUST use the extended NLRI encodings defined above. More than one instance of a given prefix, with distinct values of Path Attributes, MAY be sent between BGP speakers.
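A minimal Python sketch of the Figure 1 encoding follows, assuming that the Length and Prefix fields keep their RFC 4271 meaning (Length in bits, followed by only as many octets as are needed to hold those bits) and that the caller knows whether the address family is IPv4 or IPv6. The function names `encode_nlri` and `decode_nlri` are hypothetical, and the labeled format of Figure 2 is not handled.

```python
import ipaddress
import struct
from typing import List, Tuple


def encode_nlri(path_id: int, ordinal: int, prefix: str) -> bytes:
    """Encode one prefix instance as Path Identifier, Path Ordinal, Length, Prefix."""
    if not 1 <= ordinal <= 255:
        raise ValueError("Path Ordinal must be a non-zero 8-bit value")
    net = ipaddress.ip_network(prefix)
    nbytes = (net.prefixlen + 7) // 8  # only the octets the prefix length needs
    header = struct.pack("!IBB", path_id, ordinal, net.prefixlen)
    return header + net.network_address.packed[:nbytes]


def decode_nlri(data: bytes, addr_len: int = 4) -> List[Tuple[int, int, str]]:
    """Decode a run of Figure 1 entries (addr_len 4 for IPv4, 16 for IPv6)."""
    entries = []
    offset = 0
    while offset < len(data):
        if offset + 6 > len(data):
            raise ValueError("truncated NLRI header")
        path_id, ordinal, length = struct.unpack_from("!IBB", data, offset)
        offset += 6
        nbytes = (length + 7) // 8
        if offset + nbytes > len(data) or nbytes > addr_len:
            raise ValueError("bad prefix length")
        raw = data[offset:offset + nbytes].ljust(addr_len, b"\x00")
        offset += nbytes
        addr = ipaddress.IPv4Address(raw) if addr_len == 4 else ipaddress.IPv6Address(raw)
        entries.append((path_id, ordinal, f"{addr}/{length}"))
    return entries


wire = encode_nlri(7, 1, "192.0.2.0/24") + encode_nlri(9, 2, "192.0.2.0/24")
print(wire[:9].hex())     # 000000070118c00002
print(decode_nlri(wire))  # [(7, 1, '192.0.2.0/24'), (9, 2, '192.0.2.0/24')]
```

The two entries carry the same prefix and are told apart only by their Path Identifiers, which is what makes a per-instance withdrawal possible.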

At most N instances may be sent, where N is the value sent along with the ADD_PATH_ORDERED capability.

Two prefix paths are considered identical if they differ only in the value of the Ordinal. An Update that contains a path differing from the previous path with the same value of UNIQ (the Path Identifier) results in the path information for that prefix and UNIQ being modified.

The Ordinal must be non-zero, but the rules governing the Ordinal values used are specific to the RFCs that refer to this document. For example, BGP Equal-Cost Multipath may allow two paths with the same Ordinal to be used. Similarly, BGP N-best Paths may require per-prefix Ordinals to be unique.

2.3. ADD_PATH_ORDERED Capability

The ADD_PATH_ORDERED Capability is a new BGP capability [RFC2842]. The Capability Code for this capability is specified in the IANA Considerations section of this document. The Capability Length field of this capability is variable. The Capability Value field consists of zero or more tuples of the following form:

      +------------------------------------------------+
      | Address Family Identifier (2 octets)           |
      +------------------------------------------------+
      | Subsequent Address Family Identifier (1 octet) |
      +------------------------------------------------+
      | Maximum Ordinal Value (1 octet)                |
      +------------------------------------------------+

                  Figure 3

The meaning and use of the fields are as follows:

   Address Family Identifier (AFI): This field carries the identity of the Network Layer protocol for which the BGP speaker intends to advertise multiple paths. Presently defined values for this field are specified in [IANA-AFI].

   Subsequent Address Family Identifier (SAFI): This field provides additional information about the type of the Network Layer Reachability Information carried in the attribute. Presently defined values for this field are specified in [IANA-SAFI].

   Maximum Ordinal Value (MOV): This field specifies the maximum value the speaker will send in the Ordinal field of any Update. It does not mean that the speaker will necessarily send any particular Ordinal value within that range, nor that more than one Ordinal value will be used. The value is an unsigned 8-bit value greater than zero.

When advertising the ADD_PATH_ORDERED Capability to a peer, a BGP speaker conveys to the peer that the speaker is capable of receiving multiple paths, as well as the single path, from the peer for the address families that the speaker supports. When a tuple is included in the capability, it indicates that the BGP speaker intends to advertise multiple paths for the AFI/SAFI identified by that tuple. If the ADD_PATH_ORDERED Capability is also received from the peer, the speaker then follows the procedures for advertising multiple paths to the peer for that AFI/SAFI.

3. Security Considerations

No additional security considerations beyond those already present in BGP are introduced.

4. IANA Considerations

IANA will need to assign a new BGP Capability Code for ADD_PATH_ORDERED.

5. Acknowledgements

The author wishes to acknowledge the helpful guidance of Joe Abley, Tony Li, and Yakov Rekhter. The author thanks the following for feedback during the review and revision process: Joel M. Halpern and Tony Li. The author has based much of this document on an expired Internet-Draft, "draft-walton-bgp-add-paths-05", and has used substantial portions of that draft verbatim. The original authors of that draft were Daniel Walton, Alvaro Retana, and Enke Chen, of Cisco Systems.

The author also wishes to acknowledge the insight gained from his Scottish Deerhound, Skylar, winning a Reserve Best-in-Show.
(The selection method of "second best" comes from the Reserve system used at the group and best-in-show levels of dog shows.)

6. References

6.1. Normative References

   [1] McPherson, D., Gill, V., Walton, D., and A. Retana, "Border Gateway Protocol (BGP) Persistent Route Oscillation Condition", RFC 3345, August 2002.

   [2] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [3] Chen, E. and S. Sangli, "Avoid BGP Best Path Transitions from One External to Another", RFC 5004, September 2007.

6.2. Informative References

   [4] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

Author's Address

   Brian Dickson
   Afilias Canada, Inc
   4141 Yonge St,
   Suite 204
   North York, ON M2P 2A8
   Canada

   Email: brian.peter.dickson@gmail.com
   URI: www.afilias.info

Full Copyright Statement

Copyright (C) The IETF Trust (2008).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgment

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).