idnits 2.17.1

draft-ietf-idr-best-external-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking
     for downward references.

  ** The abstract seems to contain references ([RFC4456], [RFC4271],
     [RFC5065], [RFC1771]), which it shouldn't.  Please replace those with
     straight textual mentions of the documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does
     not match the current year

  -- The document date (January 3, 2012) is 4495 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>'
     and '<CODE ENDS>' lines.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative
     references to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 1771 (Obsoleted by RFC 4271)

  ** Downref: Normative reference to an Informational RFC: RFC 3345

     Summary: 4 errors (**), 0 flaws (~~), 2 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information
     about the items above.
--------------------------------------------------------------------------------

Network Working Group                                         P. Marques
Internet-Draft
Expires: July 6, 2012                                        R. Fernando
                                                                 E. Chen
                                                            P. Mohapatra
                                                           Cisco Systems
                                                              H. Gredler
                                                        Juniper Networks
                                                         January 3, 2012


            Advertisement of the best external route in BGP
                    draft-ietf-idr-best-external-05

Abstract

   The current BGP-4 protocol specification [RFC4271] states that the
   selection process chooses the best path for a given route which is
   added to the Loc-Rib and advertised to all peers.

   Previous versions [RFC1771] of the specification defined a different
   rule for Internal BGP Updates.  Given that Internal paths are not
   readvertised to Internal peers, it was specified that the best of the
   external paths, as determined by the path selection tie breaking
   algorithm, would be advertised to Internal peers.

   This document extends that procedure to operate in environments where
   Route Reflection [RFC4456] or Confederations [RFC5065] are used and
   explains why advertising the additional routing information can
   improve convergence time without causing routing loops.

   Additional benefits include reduction of inter-domain churn and
   avoidance of permanent route oscillation.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 6, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  Generalization
   4.  Algorithm for selection of the Adj-RIB-OUT path
   5.  Advertisement Rules
   6.  Consistency between routing and forwarding
   7.  Applications
   8.  Fast Connectivity Restoration
   9.  Inter-Domain Churn Reduction
   10. Reducing Persistent IBGP oscillation
   11. Deployment Considerations
   12. Acknowledgments
   13. IANA Considerations
   14. Security Considerations
   15. References
   Authors' Addresses

1.  Introduction

   Earlier versions of the BGP-4 protocol specification [RFC1771]
   prescribed different route advertisement rules for Internal and
   External peers.  While the overall best path would be advertised to
   External peers, Internal peers would be advertised the best of the
   externally received paths.

   This Internal advertisement rule was never implemented as specified
   and was later dropped from the protocol.  There is a trade-off
   between advertising the "best-external" route and the behavior that
   became the common standard: not advertising any route when the
   selected best path is received from an Internal peer.  By not
   advertising information in this case it is possible to reduce state
   both in the local BGP speaker and in the network overall.  Early BGP
   implementations were very concerned with reducing state as they were
   limited to relatively low memory footprints (e.g. 16 MB).  There is
   also the possible concern regarding advertising a path different
   from the path that has been selected for forwarding.

   However, advertising the best external route, when different from the
   best route, introduces additional information into an IBGP mesh which
   may be of value for several purposes including:

   o  Faster restoration of connectivity.  The additional paths may be
      used to fail over in case the primary path becomes invalid or is
      withdrawn.

   o  Reducing inter-domain churn and traffic black-holing due to the
      readily available alternate path.

   o  Reducing the potential for situations of permanent IBGP route
      oscillation [RFC3345].

   o  Improving selection of lower MED routes from the same neighboring
      AS.

   This document defines procedures to select the best external route
   for each peer.  It also uses a few scenarios to describe how the
   above benefits are realized with best external route announcement.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Generalization

   The BGP-4 protocol [RFC1771] has been extended with two alternative
   mechanisms that provide ways to reduce the operational complexity of
   route distribution within an AS: Route Reflection [RFC4456] and
   Confederations [RFC5065].  It is important to be able to express
   route advertisement rules in the context of both of these mechanisms.

   When Route Reflection is used, Internal peers are further classified
   depending on the reflection cluster they belong to.  Non-client
   internal peers form one BGP peering mesh.  Each set of RR clients
   with the same "cluster-id" configuration forms a separate mesh.

   When selecting the path to add to the Adj-RIB-OUT, this document
   specifies that paths that originate from the same mesh MAY be
   excluded from consideration.  This results in an Adj-RIB-OUT
   selection per mesh (the set of non-client peers or a specific
   cluster).

   Similarly, when BGP Confederations are used, each confederated AS is
   a BGP mesh.  As with the Route Reflection scenario, when selecting
   the path to add to the Adj-RIB-OUT, routes from the same mesh MAY be
   excluded.

4.  Algorithm for selection of the Adj-RIB-OUT path

   The objective of this protocol extension is to improve the quality of
   the routing information known to a particular BGP mesh with minimum
   additional cost in terms of processing and state.

   Towards that goal, it is useful to define a total order between the
   Adj-RIB-In routes which provides both the same overall best path as
   the algorithm defined in the current BGP-4 specification [RFC4271] as
   well as an ordering of alternate routes.
   Using this total order it is then computationally efficient to
   select the path for a specific Adj-RIB-OUT by excluding the routes
   that have been received from the BGP mesh corresponding to the peer
   (or set of peers).

   In order to achieve this, it is helpful to introduce the concept of
   a path group.  A group is the set of paths that compare as equal
   through all the steps prior to the MED comparison step (as defined in
   Section 9.1.2.2 of RFC 4271 [RFC4271]) and have been received from
   the same neighbor AS.

   Paths are ordered within a group via MED or subsequent route
   selection rules.

   In pseudo-code:

      function compare(path_1, path_2) {
          cmp_result cmp = selection_steps_before_med(path_1, path_2);
          if (cmp != cmp_result.equal) {
              return cmp;
          }
          if (neighbor_as(path_1) == neighbor_as(path_2)) {
              return selection_steps_after_med(path_1, path_2);
          }

          if (is_group_best(path_1)) {
              if (!is_group_best(path_2)) {
                  return cmp_result.greater_than;
              }
              return selection_steps_after_med(path_1, path_2);
          } else {
              if (is_group_best(path_2)) {
                  return cmp_result.less_than;
              }
              /* Compare the best paths of respective groups */
              return compare(group_best(path_1), group_best(path_2));
          }
      }

   As an example, the following set of received routes:

                      +------+----+-----+--------+
                      | Path | AS | MED | rtr_id |
                      +------+----+-----+--------+
                      | a    | 1  | 10  | 10     |
                      | b    | 2  | 5   | 1      |
                      | c    | 1  | 5   | 5      |
                      | d    | 2  | 20  | 20     |
                      | e    | 2  | 30  | 30     |
                      | f    | 3  | 10  | 20     |
                      +------+----+-----+--------+

                          Path Attribute Table

   would yield the following order (from the most to the least
   preferred):

      b < d < e < c < a < f

   In this example, comparison of the best path within each group
   provides the sequence (b < c < f).
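
   The ordering listed in the example can be reproduced with a short
   Python sketch.  It is a simplified reading of the example, not a
   transcription of the pseudo-code: it assumes that every path ties on
   the steps before MED, that paths within a group are ranked by MED and
   then by router ID, and that groups are ranked by the router ID of
   their best path.  The names Path, total_order, first_not_from and the
   "mesh" label are hypothetical and exist only for illustration, as is
   the choice of marking path b as learned from the IBGP mesh.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    neighbor_as: int
    med: int
    router_id: int
    mesh: str = "external"  # hypothetical label: where the path was learned


def total_order(paths):
    """Return paths group by group, most preferred first.

    Assumes all paths already tie on the steps before MED.
    """
    # A group is the set of paths received from the same neighbor AS.
    groups = {}
    for p in paths:
        groups.setdefault(p.neighbor_as, []).append(p)
    # Within a group: lowest MED first, then lowest router ID.
    ranked = [sorted(g, key=lambda p: (p.med, p.router_id))
              for g in groups.values()]
    # Across groups MED is not comparable, so rank each group by the
    # router ID of its best path.
    ranked.sort(key=lambda g: g[0].router_id)
    return [p for g in ranked for p in g]


def first_not_from(order, mesh):
    """First path in the order that was not learned from the given mesh."""
    return next((p for p in order if p.mesh != mesh), None)


PATHS = [
    Path("a", 1, 10, 10),
    Path("b", 2, 5, 1, mesh="ibgp"),  # assumption made for this example only
    Path("c", 1, 5, 5),
    Path("d", 2, 20, 20),
    Path("e", 2, 30, 30),
    Path("f", 3, 10, 20),
]

ORDER = total_order(PATHS)
print(" < ".join(p.name for p in ORDER))
print(first_not_from(ORDER, "ibgp").name)
```

   Run as written, the sketch prints "b < d < e < c < a < f" followed
   by "d", the first path in that order that was not learned from the
   hypothetical "ibgp" mesh.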
   The remaining paths are ordered in relation to their respective
   group best.

   The first path in the ordering above is the best overall path for a
   given NLRI.  When selecting a path for a particular Adj-RIB-Out (or
   set of RIB-Outs) an implementation MAY choose to select the first
   path in the global order which was not received from the same BGP
   mesh (as defined above) as the target peer (or peers).

5.  Advertisement Rules

   1.  When advertising a route to a non-client Internal peer, a BGP
       speaker MAY choose to select the first path in order that did not
       originate from the same BGP mesh (i.e. the set of non-client
       Internal peers) whenever the best overall path has been received
       from this mesh and would be suppressed by the Internal BGP
       non-readvertisement rule.

   2.  When advertising a route to a Route Reflection client peer, in
       case the overall best path has been received from the same
       cluster, a BGP speaker MUST be able to advertise the best overall
       path to all the members of the cluster other than the originator,
       unless "client-to-client" reflection is disabled.  The
       implementation MAY choose to advertise an alternate path to the
       specific peer that originates the best overall path by excluding
       from consideration all paths with the same originator-id.

   3.  When "client-to-client" reflection is disabled and the cluster is
       operating as a mesh, a Route Reflector MAY opt to advertise to
       the cluster the preferred path from the set of paths not received
       from the cluster.  While this deployment mode is currently
       uncommon, it can be a practical way to guarantee path diversity
       inside the cluster.

   4.  A confederation border router MAY choose to advertise an
       alternate path towards its Internal BGP mesh or towards a
       confederation member AS following the same procedure as defined
       above.

6.  Consistency between routing and forwarding

   The internal update advertisement rules contained in the original
   BGP-4 specification [RFC1771] can lead to situations where traffic is
   forwarded through a route other than the route advertised by BGP.

   Inconsistencies between forwarding and routing are highly
   undesirable.  Service providers use BGP with the dual objective of
   learning reachability information and expressing policy over network
   resources.  The latter assumes that forwarding follows routing
   information.

   Consider the Autonomous System presented in figure 1, where r1 ... r4
   are members of a single IBGP mesh and routes a, b, and c are received
   from external peers.

                  AS 1 (c)
                     |
                   +----+           +----+
                   | r1 |...........| r2 |
                   +----+           +----+
                      .
                      .
                      .
                      .
                      .
                      .
                   +----+           +----+
                   | r3 |...........| r4 | --- ebgp --- AS X
                   +----+           +----+
                    /  \
                   /    \
              AS 1 (a)  AS 2 (b)

                        Inconsistency in Routing

                      +------+----+-----+--------+
                      | Path | AS | MED | rtr_id |
                      +------+----+-----+--------+
                      | a    | 1  | 10  | 1      |
                      | b    | 2  | 5   | 10     |
                      | c    | 1  | 5   | 5      |
                      +------+----+-----+--------+

                          Path Attribute Table

   Following the rules as specified in RFC 1771 [RFC1771], router r3
   will select path (b) received from AS 2 as its overall best to
   install in the Loc-Rib, since path (b) is preferable to path (c), the
   lowest MED route from AS 1.  However, for the purposes of Internal
   Update route selection, it will ignore the presence of path (c), and
   elect (a) as its advertisement, via the router-id tie-breaking rule.

   In this scenario, router r4 will receive (c) from r1 and (a) from r3.
   It will pick the lowest MED route (c) and advertise it out via IBGP
   to AS X.
   However, at this point routing is inconsistent with forwarding, as
   traffic received from AS X will be forwarded towards AS 2, while the
   IBGP advertisement is being made for an AS 1 path.

   Routing policies are typically specified in terms of neighboring
   AS-es.  In the situation above, assuming that AS 1 is a network for
   which this AS provides transit services while AS 2 and AS X are peer
   networks, one can easily see how the inconsistency between routing
   and forwarding would lead to transit being inadvertently provided
   between AS X and AS 2.  This could lead to persistent forwarding
   loops.

   Inconsistency between routing and forwarding may happen whenever a
   BGP speaker chooses to advertise an external route into IBGP that is
   different from the overall best route and its overall best is
   external.

7.  Applications

8.  Fast Connectivity Restoration

   When two exits are available to reach a particular destination and
   one is preferred over the other, the availability of an alternate
   path provides fast connectivity restoration when the primary path
   fails.

   Restoration can be quick since the alternate path is already at hand.
   The border router could precompute the backup route and preinstall it
   in the FIB, ready to be switched to when the primary goes away.  Note
   that this requires the border router that is the backup to also
   preinstall the secondary path and switch to it on failure.

9.  Inter-Domain Churn Reduction

   Within an AS, the non-availability of a backup path leads to a border
   router sending a withdraw upstream when the primary fails.  This
   leads to inter-domain churn and packet loss for the time the network
   takes to converge to the alternate path.  Having the alternate path
   reduces the churn and eliminates packet loss.

10.  Reducing Persistent IBGP oscillation

   Advertising the best external route, according to the algorithm
   described in this document, will reduce the possibility of route
   oscillation by introducing additional information into the IBGP
   system.

   For a permanent oscillation condition to occur, it is necessary that
   a circular dependency between paths occurs such that the selection of
   a new best path by a router, in response to a received IBGP
   advertisement, causes the withdrawal of information that another
   router depends on in order to generate the original event.

   In vanilla BGP, when only the best overall route is advertised, as in
   most implementations, oscillation can occur whenever there are 2 or
   more clusters/sub-AS-es such that at least one cluster has more than
   one path that can potentially contribute to the dependency.

11.  Deployment Considerations

   The mechanism specified in the draft allows a BGP speaker to
   advertise a route that is not the best route used for forwarding.
   This is a departure from the current behavior.  However, consistency
   in the path selection process across the AS is still guaranteed since
   the ingress routers will not choose the best-external route as the
   best route for a destination in steady state (for the same reason
   that the BGP speaker announcing the best-external route chose an IBGP
   route as best instead of the externally learnt route).  Though it is
   possible to alter this assurance by defining route policies on IBGP
   sessions, use of such policies in IBGP is not recommended, especially
   with best-external announcement turned on in the network.  It is also
   worth noting that such inconsistency in routing and forwarding is
   mitigated in a tunneled network.

12.  Acknowledgments

   This document greatly benefits from the comments of Yakov Rekhter,
   John Scudder, Eric Rosen, Jenny Yuan, Jay Borkenhagen, Salkat Ray and
   Jakob Heitz.

13.  IANA Considerations

   This document has no actions for IANA.

14.  Security Considerations

   There are no additional security risks introduced by this design.

15.  References

   [RFC1771]  Rekhter, Y. and T. Li, "A Border Gateway Protocol 4
              (BGP-4)", RFC 1771, March 1995.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3345]  McPherson, D., Gill, V., Walton, D., and A. Retana,
              "Border Gateway Protocol (BGP) Persistent Route
              Oscillation Condition", RFC 3345, August 2002.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, April 2006.

   [RFC5065]  Traina, P., McPherson, D., and J. Scudder, "Autonomous
              System Confederations for BGP", RFC 5065, August 2007.

Authors' Addresses

   Pedro Marques

   Email: pedro.r.marques@gmail.com


   Rex Fernando
   Cisco Systems
   170 W. Tasman Dr.
   San Jose, CA  95134
   US

   Email: rex@cisco.com


   Enke Chen
   Cisco Systems
   170 W. Tasman Dr.
   San Jose, CA  95134
   US

   Email: enkechen@cisco.com


   Pradosh Mohapatra
   Cisco Systems
   170 W. Tasman Dr.
   San Jose, CA  95134
   US

   Email: pmohapat@cisco.com


   Hannes Gredler
   Juniper Networks
   1194 N. Mathilda Ave.
   Sunnyvale, CA  94089
   US

   Email: hannes@juniper.net