Network Working Group                                           K. Patel
Internet-Draft                                                C. Appanna
Intended status: Standards Track                            P. Mohapatra
Expires: February 9, 2011                                  Cisco Systems
                                                              J. Scudder
                                                        Juniper Networks
                                                               J. Uttaro
                                                                    AT&T
                                                          August 8, 2010


           Root cause notification to solve BGP path hunting
                     draft-keyupate-bgp-rcn-00.txt

Abstract

   Whenever a prefix is withdrawn using the BGP withdrawal mechanism, it
   triggers a number of updates in certain scenarios before the prefix
   is completely withdrawn from the entire BGP network. This phenomenon
   is popularly known as _path exploration_ or _path hunting_ and occurs
   because of the path vector property of BGP. It results in a series
   of unwanted or redundant transitions that overload the BGP network.

   This document describes a mechanism to help limit the amount of such
   path exploration by defining two optional transitive path attributes
   for BGP: SPEAKERID_PATH and ROOT_CAUSE.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 9, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Reference Diagram
   3.  SPEAKERID_PATH attribute
   4.  ROOT_CAUSE attribute
   5.  Operation
     5.1.  Sending SPEAKERID_PATH attribute
     5.2.  Sending ROOT_CAUSE attribute
       5.2.1.  At the point of occurrence
       5.2.2.  At an intermediate point
     5.3.  Receiving ROOT_CAUSE Attribute
     5.4.  Usage of BGP Aggregates
     5.5.  BGP Confederation
     5.6.  BGP Inactive Timer
   6.  Acknowledgements
   7.  IANA Considerations
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Introduction

   Whenever a prefix is withdrawn using the BGP withdrawal mechanism, it
   triggers a number of updates in certain scenarios before the prefix
   is completely withdrawn from the entire BGP network. This phenomenon
   is popularly known as _path exploration_ or _path hunting_ and occurs
   because of the path vector property of BGP. It results in a series
   of unwanted or redundant transitions that overload the BGP network
   ([I-D.li-bgp-stability]).

   These redundant transitions can end up triggering route dampening
   ([RFC2439]), if it is deployed in the network. Additionally, route
   dampening itself is known to cause path exploration in the network
   due to the delay it introduces ([I-D.li-bgp-stability]). The two
   effects reinforce each other and create a spiral of BGP instability.
   Both the generation of unwanted update messages and the triggering
   of route dampening can adversely affect BGP convergence time.

   The problem lies in the way the BGP path vector is defined. With a
   link state protocol, each router stores a complete view of the entire
   network and derives reachability information from that view.
   In the event of a flap, each router can correctly determine all paths
   that suffer from the same root cause. This approach does not scale
   to the large networks in which BGP operates. By design, BGP
   advertises to its neighbors, with each prefix, only the path it is
   using, expressed in terms of ASes. Unfortunately, this information
   is coarse even in a simple topology, as the number of possible paths
   through the routers is quite large. When a route becomes
   unreachable, the BGP selection process may end up choosing an
   alternative path that is actually not available, because the detailed
   route information is not included. After a series of such
   transitions, the BGP speaker resolves this abnormality and settles on
   a correct available path based on receiver-side loop detection.

   This document proposes a mechanism to identify unreachable paths for
   which BGP withdrawals have not been received and to prevent them from
   being selected as preferred paths. This helps avoid unnecessary
   route flapping within the network. A new optional transitive path
   attribute, SPEAKERID_PATH, is tagged in BGP announcements as the
   prefix travels through the network, essentially creating more
   granular information about routers in the path. When a prefix is
   withdrawn, another optional transitive attribute, ROOT_CAUSE, is
   attached to the implicit or explicit withdraws that are generated at
   different points in the network. This attribute is created once at
   the point of occurrence of the fault and is attached unchanged to the
   resulting UPDATE messages throughout the network. At a receiving
   speaker, the ROOT_CAUSE attribute is matched against the
   SPEAKERID_PATH attributes of available paths to help identify and
   avoid those that are unreachable because they are affected by the
   same root cause.

   Path exploration caused by new prefix advertisements is not discussed
   in this document.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Reference Diagram

                                +-------+
                                |  AS5  |
             ...................R11   R13..................
             .                  |       |                 .
             .                  +--R12--+                 .
             .                      .                     .
             .                      .                     .
             .                      .                     .
             .                      .                     .
          +--R4---+             +---R7--+             +--R10--+
          |       |             |       |             |       |
         R2      R3.............R5     R6.............R8     R9
         .|  AS2  |             |  AS3  |             |  AS4  |
         .+-------+             +-------+             +-------+
         .
         .
         .
         .
      +--R1---+
      |       |
      |  AS1  |
      |       |
      +-------+

   The figure above describes a topology that leads to the classic path
   hunting problem. In steady state, AS5 has 3 paths for prefixes
   received from AS1:

                        +----------+---------+
                        | Path     | AS_PATH |
                        +----------+---------+
                        | p1(best) | 2 1     |
                        | p2       | 3 2 1   |
                        | p3       | 4 3 2 1 |
                        +----------+---------+

   When the link between AS1 and AS2 goes down, it leads to the
   following series of events and actions at AS5:

   +------+---------------------+--------------------------------------+
   | Step | Event               | Action                               |
   +------+---------------------+--------------------------------------+
   | 1    | Recv withdraw of p1 | Select p2 as best                    |
   |      |                     | Send AS_PATH (5 3 2 1) upstream      |
   | --   | --                  | --                                   |
   | 2    | Recv withdraw of p2 | Select p3 as best                    |
   |      |                     | Send AS_PATH (5 4 3 2 1) upstream    |
   | --   | --                  | --                                   |
   | 3    | Recv withdraw of p3 | Prefixes have no path                |
   |      |                     | Send withdraw for the prefixes       |
   |      |                     | upstream                             |
   +------+---------------------+--------------------------------------+

   Even this trivial example creates unnecessary churn in the network
   until the end state is reached.

3.  SPEAKERID_PATH attribute

   SPEAKERID_PATH is an optional transitive attribute that is very
   similar in encoding and operation to the AS_PATH attribute.
   It is composed of a sequence of SPEAKERID path segments. Each
   segment is represented by a triple (type, length, value). The format
   is as follows:

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |     Type      |    Length     |
      |   (1 octet)   |   (1 octet)   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ~                               ~
      ~             Value             ~
      ~                               ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The type is a 1-octet field with the following value defined:

      Value      Type definition

      1          AS_ID_SEQUENCE: ordered set of AS and Speaker ID pairs
                 that a route in the UPDATE message has traversed.

   The length is a 1-octet field, containing the number of such pairs.
   Thus when the type is 1, the value contains one or more entries of
   the following:

      +---------------------------------+
      |          AS (4 bytes)           |
      +---------------------------------+
      |      SPEAKER-ID (4 bytes)       |
      +---------------------------------+

   The use and meaning of these fields are as follows:

   AS:  The AS is a four-octet field that indicates the AS number of
      the BGP speaker. If this ASN is from the public ASN space, it
      must have been assigned by the appropriate authority (use of ASN
      values from the private ASN space is strongly discouraged). Note
      that when a four-octet AS supporting speaker (NEW) announces an
      UPDATE to a two-octet AS supporting speaker (OLD), it encodes
      AS_TRANS as a two-octet AS in the AS_PATH attribute instead of its
      own AS ([I-D.ietf-idr-rfc4893bis]). When encoding the
      SPEAKERID_PATH attribute, however, it MUST put its own four-octet
      AS in this field regardless of whether the neighbor to whom the
      UPDATE message is being sent is an OLD or NEW speaker.

   SPEAKER-ID:  The SPEAKER-ID is a four-octet field that indicates
      the router-id of the BGP speaker. If the router-id is from the
      public address space, it must have been assigned by the
      appropriate authority.
      (Use of a private IP address as a router-id is strongly
      discouraged.)

4.  ROOT_CAUSE attribute

   ROOT_CAUSE is an optional transitive attribute that is composed of
   one or more triples (type, length, value). The format is as follows:

       0                   1
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |     Type      |    Length     |
      |   (1 octet)   |   (1 octet)   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ~                               ~
      ~             Value             ~
      ~                               ~
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The type is a 1-octet field with the following value defined:

      Value      Type definition

      1          AS_ID_CONN: AS and router-ID pairs from both sides of
                 the connection that is the point of occurrence for the
                 withdraw.

   The length is a 1-octet field, containing the length in octets of the
   value field. When the type is 1, the value contains the following:

      +--------------+
      |Flags(1 octet)|
      +---------------------------------+
      |        left AS (4 bytes)        |
      +---------------------------------+
      |    left SPEAKER-ID (4 bytes)    |
      +---------------------------------+
      |       right AS (4 bytes)        |
      +---------------------------------+
      |   right SPEAKER-ID (4 bytes)    |
      +---------------------------------+

5.  Operation

5.1.  Sending SPEAKERID_PATH attribute

   When a BGP speaker supporting the mechanism described in this
   document propagates a route it learned from another BGP speaker's
   UPDATE message, it modifies the route's SPEAKERID_PATH attribute by
   prepending its own (AS, router-ID) pair as the last pair of the
   sequence (following the AS_PATH convention in [RFC4271], the last
   pair added occupies the leftmost position in the protocol message).
   If there is no such attribute, the local system creates the
   attribute, creates a new segment of type AS_ID_SEQUENCE in it, and
   places its own pair into that segment. If the act of prepending
   would cause an overflow in the existing segment (i.e., more than 255
   pairs), the speaker MUST prepend a new segment of type
   AS_ID_SEQUENCE and prepend its own pair to this new segment.
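
   The prepend rule above, together with the segment encoding of
   Section 3, can be sketched in Python as follows. This is only an
   illustration under stated assumptions, not part of the
   specification: the function names are hypothetical, the newest pair
   is assumed to go in the leftmost position (as AS_PATH does), fields
   are assumed to be in network byte order, and the enclosing path
   attribute header is omitted because no type code has been assigned.

```python
import struct
from typing import List, Tuple

AS_ID_SEQUENCE = 1   # segment type defined in Section 3
MAX_PAIRS = 255      # Length is one octet counting (AS, SPEAKER-ID) pairs

Pair = Tuple[int, int]   # (AS number, SPEAKER-ID), each a 32-bit value
Segment = List[Pair]     # contents of one AS_ID_SEQUENCE segment


def prepend_pair(path: List[Segment], asn: int, speaker_id: int) -> List[Segment]:
    """Return a copy of `path` with the local pair added at the front.

    Assumption: the newest pair goes leftmost, as AS_PATH does.
    """
    pair = (asn, speaker_id)
    segments = [list(seg) for seg in path]
    if not segments or len(segments[0]) >= MAX_PAIRS:
        # No attribute yet, or the first segment is full: start a new segment.
        return [[pair]] + segments
    segments[0].insert(0, pair)
    return segments


def encode_segments(path: List[Segment]) -> bytes:
    """Encode each segment as Type, Length, then 8 octets per pair."""
    out = bytearray()
    for seg in path:
        out += struct.pack("!BB", AS_ID_SEQUENCE, len(seg))
        for asn, speaker_id in seg:
            out += struct.pack("!II", asn, speaker_id)
    return bytes(out)


# Hypothetical values: documentation ASNs, router-ids written as integers.
path = prepend_pair([], 64496, 0xC0000201)      # first speaker creates the attribute
path = prepend_pair(path, 64497, 0xC0000202)    # next speaker prepends its own pair
wire = encode_segments(path)
```

   Here the second speaker's pair ends up ahead of the first speaker's
   pair in a single two-pair segment. A 256th prepend would open a new
   segment instead of growing the existing one.
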
   This operation should be performed regardless of whether the peer is
   IBGP or EBGP.

5.2.  Sending ROOT_CAUSE attribute

5.2.1.  At the point of occurrence

   A BGP speaker originates the ROOT_CAUSE attribute into an UPDATE
   message in one of the following scenarios:

   o  A session with a peer AS goes down, or the associated link goes
      down, and the received prefixes need to be withdrawn or their
      bestpath changes.

   o  It receives withdraws for some prefixes without the ROOT_CAUSE
      attribute, and they in turn need to be either withdrawn from the
      ASes upstream or re-advertised with new paths.

   While originating the attribute, the speaker encodes the router-ID
   and AS of each side of the session.

5.2.2.  At an intermediate point

   Any speaker receiving a withdrawal UPDATE message with a ROOT_CAUSE
   attribute should preserve the attribute and announce the resulting
   UPDATE message with the same attribute value. The resulting message
   can be an explicit withdraw for a prefix or an implicit withdraw.

   Any speaker receiving a reachable UPDATE message with a ROOT_CAUSE
   attribute should preserve the attribute but not announce it in the
   resulting UPDATE message unless that message is an explicit
   withdrawal.

5.3.  Receiving ROOT_CAUSE Attribute

   Whenever a BGP speaker receives an UPDATE message and processes its
   withdrawn prefixes, it does the following:

   o  Remove the BGP path of the withdrawn prefix.

   o  Find all the other paths that match the ROOT_CAUSE information
      carried with the removed path; as described in Section 1, the
      ROOT_CAUSE attribute is matched against the SPEAKERID_PATH
      attribute of each available path. Place these paths on an
      Inactive timer for an Inactive time interval (see Section 5.6).
      Do not select these paths in BGP bestpath selection.

5.4.  Usage of BGP Aggregates

   Whenever a BGP speaker creates an aggregate route from more specific
   routes, it will not inherit any BGP SPEAKERID_PATH information from
   the more specific routes used for aggregation.
   Instead, it will create its own SPEAKERID_PATH attribute when it
   announces the aggregate route to its BGP peers, i.e., the attribute
   will contain one segment with only its own (AS, router-id) pair.

5.5.  BGP Confederation

   A BGP Confederation speaker that peers with EBGP peers and receives
   routes from them will exchange BGP Route Originator attributes as
   well. Whenever a Special Withdrawal message is received, the
   following is done:

   o  Remove the path announced by the peer that sent the Special
      Withdrawal message.

   o  Do not select any other BGP paths whose Route Originator attribute
      matches the one received in the Special Withdrawal.

   o  If there are no alternate paths available, forward the Special
      Withdrawal message (with the original Route Originator attribute).

5.6.  BGP Inactive Timer

   The BGP Inactive timer is used to suppress path information from
   being used in BGP bestpath selection. This prevents BGP from
   selecting alternate paths for which withdrawals have not yet been
   received. A BGP speaker should remove suppressed paths when they are
   withdrawn. A BGP speaker must make all suppressed paths eligible for
   BGP bestpath selection again if they have not been withdrawn by the
   time the Inactive timer expires. The timeout for the Inactive timer
   should be large enough to allow the withdrawal information to
   propagate across the AS.

6.  Acknowledgements

   The authors would like to thank Robert Raszuk and Pedro Marques for
   their input.

7.  IANA Considerations

   IANA shall assign codepoints for the SPEAKERID_PATH and ROOT_CAUSE
   attributes. These codepoints will come from the "BGP Path
   Attributes" registry.

8.  Security Considerations

   This extension to BGP does not change the underlying security issues.

9.  References

9.1.  Normative References

   [I-D.ietf-idr-rfc4893bis]
              Vohra, Q. and E. Chen, "BGP Support for Four-octet AS
              Number Space", draft-ietf-idr-rfc4893bis-01 (work in
              progress), October 2009.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2439]  Villamizar, C., Chandra, R., and R. Govindan, "BGP Route
              Flap Damping", RFC 2439, November 1998.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

9.2.  Informative References

   [I-D.li-bgp-stability]
              Huston, G. and T. Li, "BGP Stability Improvements",
              draft-li-bgp-stability-01 (work in progress), June 2007.

Authors' Addresses

   Keyur Patel
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA 95134
   USA

   Email: keyupate@cisco.com


   Chandra Appanna
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA 95134
   USA

   Email: chandra@cisco.com


   Pradosh Mohapatra
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA 95134
   USA

   Email: pmohapat@cisco.com


   John Scudder
   Juniper Networks
   1194 N. Mathilda Ave
   Sunnyvale, CA 94089
   USA

   Email: jgs@juniper.net


   James Uttaro
   AT&T
   200 S. Laurel Ave
   Middletown, NJ 07748
   USA

   Email: uttaro@att.com
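
   As a non-normative illustration of the receive procedure in Section
   5.3 and the Inactive timer in Section 5.6, the Python sketch below
   replays the Section 2 example at AS5. Several things here are
   assumptions rather than statements of this document: the class and
   function names are hypothetical, the matching rule (both ends of the
   failed connection appear as adjacent pairs in a path's
   SPEAKERID_PATH) is one possible reading of "matching", router Rn is
   given the made-up SPEAKER-ID n, and the originating router R1 is
   assumed to place its own pair in the attribute.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pair = Tuple[int, int]  # (AS number, SPEAKER-ID)


@dataclass(frozen=True)
class RootCause:
    """The two ends of the failed connection (AS_ID_CONN, Section 4)."""
    left: Pair
    right: Pair


@dataclass
class Path:
    name: str
    speakerid_path: List[Pair]     # pairs in attribute order, segments flattened
    inactive_until: float = 0.0    # time at which the Inactive timer expires


def traverses(path: Path, cause: RootCause) -> bool:
    """Assumed matching rule: both ends of the connection are adjacent pairs."""
    hops = path.speakerid_path
    ends = {cause.left, cause.right}
    return any({a, b} == ends for a, b in zip(hops, hops[1:]))


def on_withdraw(paths: List[Path], withdrawn: str, cause: RootCause,
                now: float, interval: float) -> List[Path]:
    """Remove the withdrawn path and suppress the others hit by the same cause."""
    remaining = [p for p in paths if p.name != withdrawn]
    for p in remaining:
        if traverses(p, cause):
            p.inactive_until = now + interval
    return remaining


def eligible(paths: List[Path], now: float) -> List[Path]:
    """Paths that may take part in bestpath selection at time `now`."""
    return [p for p in paths if now >= p.inactive_until]


# Paths held by AS5 in Section 2 (hypothetical SPEAKER-IDs, see above).
p1 = Path("p1", [(2, 4), (2, 2), (1, 1)])
p2 = Path("p2", [(3, 7), (3, 5), (2, 3), (2, 2), (1, 1)])
p3 = Path("p3", [(4, 10), (4, 8), (3, 6), (3, 5), (2, 3), (2, 2), (1, 1)])
cause = RootCause(left=(2, 2), right=(1, 1))   # the R2-R1 connection fails

remaining = on_withdraw([p1, p2, p3], "p1", cause, now=0.0, interval=30.0)
```

   With these inputs, eligible(remaining, 0.0) is empty, so AS5 would
   have no usable alternate path and would not step through p2 and p3
   as in the table of Section 2. If p2 and p3 have not been withdrawn
   when the interval elapses, eligible(remaining, 30.0) returns both
   again, as Section 5.6 requires.
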