Network Working Group                                         P. Marques
Internet-Draft                                               R. Fernando
Intended status: Standards Track                        Juniper Networks
Expires: September 26, 2009                                      E. Chen
                                                            P. Mohapatra
                                                           Cisco Systems
                                                          March 25, 2009


            Advertisement of the best external route in BGP
                 draft-marques-idr-best-external-01.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.  This document may contain material
   from IETF Documents or IETF Contributions published or made publicly
   available before November 10, 2008.  The person(s) controlling the
   copyright in some of this material may not have granted the IETF
   Trust the right to allow modifications of such material outside the
   IETF Standards Process.  Without obtaining an adequate license from
   the person(s) controlling the copyright in such materials, this
   document may not be modified outside the IETF Standards Process, and
   derivative works of it may not be created outside the IETF Standards
   Process, except to format it for publication as an RFC or to
   translate it into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 26, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   The base BGP specifications prevent a BGP speaker from advertising
   any route that is not the best route for a BGP destination.  This
   document specifies a modification of this rule.  Routes are divided
   into two categories, "external" and "internal".  A specification is
   provided for choosing a "best external route" (for a particular value
   of the Network Layer Reachability Information).  A BGP speaker is
   then allowed to advertise its "best external route" to its internal
   BGP peers, even if that is not the best route for the destination.
   The document explains why advertising the best external route can
   improve convergence time without causing routing loops.  Additional
   benefits include reduction of inter-domain churn and avoidance of
   permanent route oscillation.  The document also generalizes the
   notions of "internal" and "external" so that they can be applied to
   Route Reflector Clusters and Autonomous System Confederations.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Algorithm for selection of best external route
   3.  Advertisement Rules
   4.  Consistency between routing and forwarding
   5.  Applications
     5.1.  Fast Connectivity Restoration
     5.2.  Inter-Domain Churn Reduction
     5.3.  Reducing Persistent IBGP oscillation
   6.  Acknowledgments
   7.  IANA Considerations
   8.  Security Considerations
   9.  Normative References
   Authors' Addresses

1.  Introduction

   The base BGP specifications prevent a BGP speaker from advertising
   any route that is not the best route for a BGP destination.  This
   document specifies a modification of this rule.  Routes are divided
   into two categories, "external" and "internal".  A specification is
   provided for choosing a "best external route" (for a particular value
   of the Network Layer Reachability Information).  A BGP speaker is
   then allowed to advertise its "best external route" to its internal
   BGP peers, even if that is not the best route for the destination.
   The document explains why advertising the best external route can
   improve convergence time without causing routing loops.  Additional
   benefits include reduction of inter-domain churn and avoidance of
   permanent route oscillation.

   The document also generalizes the notions of "internal" and
   "external" so that they can be applied to Route Reflector Clusters
   [RFC4456] and Autonomous System Confederations [RFC5065].  More
   specifically, two routers in the same route reflector cluster that
   have an IBGP session between them are defined to be "internal"
   peers, whereas two routers in different clusters that have an IBGP
   session between them are defined to be "external" peers.  Similarly,
   two routers in the same member AS of a confederation that have an
   IBGP session between them are "internal" peers, whereas two routers
   in different member ASs of a confederation that have a confederation
   EBGP session between them are defined to be "external" peers.  The
   definition of "best external route" follows from these definitions:
   it is the most preferred route among those received from "external"
   neighbors.

   Advertising the best external route, when it differs from the best
   route, injects additional information into an IBGP mesh that may be
   of value for several purposes, including:

   o  Faster restoration of connectivity, by providing additional paths
      that may be used for failover in case the primary path becomes
      invalid or is withdrawn.

   o  Reducing inter-domain churn and traffic blackholing, since an
      alternate path is readily available.

   o  Reducing the potential for permanent IBGP route oscillation, as
      discussed for some scenarios in [RFC3345].

   o  Improving selection of lower MED routes from the same neighboring
      AS.

   This document defines procedures to select the best external route
   for each destination.  It also uses example scenarios to describe
   how the above benefits are realized by announcing the best external
   route.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Algorithm for selection of best external route

   Given that the intent in advertising an external route, when the best
   route for the same destination is an internal route, is to provide
   additional information into the IBGP mesh in which the route
   participates, it is desirable to take into account the routes
   received from internal neighbors in the selection process.

   We propose a route selection algorithm that establishes a total order
   among routes and that selects the same best route as the decision
   process specified in [RFC4271].

   To achieve this, we introduce the concept of a route group.  For a
   given NLRI, suppose the BGP decision process has run through all the
   steps prior to the MED comparison step (as defined in Section
   9.1.2.2 of [RFC4271]).  Consider the set of routes that are still
   under consideration at that point.  Partition this set into disjoint
   route groups, where two routes are in the same group if and only if
   they have the same neighbor AS.

   Routes are ordered within a group by MED and then by the subsequent
   route selection rules.

   The order of all routes for the same destination is determined by the
   order of the best route in each group.

   As an example, consider the following set of received routes:

                       Path    AS    MED    rtr_id
                       ----    --    ---    ------
                        a       1     10      10
                        b       2      5       1
                        c       1      5       5
                        d       2     20      20
                        e       2     30      30
                        f       3     10      20

                   Figure 1: Path Attribute Table

   This set yields the following order (from the most to the least
   preferred, where "x < y" means that x is preferred over y):

      b < d < e < c < a < f

   In this example, the route groups are {a, c} (AS 1), {b, d, e}
   (AS 2), and {f} (AS 3).  Comparing the best route of each group
   (c, b, and f) yields the sequence (b < c < f); because MED is not
   compared between groups, this comparison is decided by the
   subsequent rules, here the router ID.  The remaining routes are
   ordered in relation to their respective group best.
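
   The grouping and ordering steps above can be illustrated with a
   minimal, non-normative Python sketch.  It assumes that every route
   has already tied on all decision steps preceding the MED comparison,
   and it models only two of the later tie-breakers, "prefer EBGP" and
   "lowest router ID" (rtr_id); the full process in [RFC4271] has more
   steps.  The names Route, after_med_key, and total_order are
   hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    neighbor_as: int  # AS from which the route was learned
    med: int
    rtr_id: int
    ebgp: bool = True  # assumption: True if learned from an "external" peer


def after_med_key(route):
    # Simplified stand-in for the rules that follow the MED step:
    # prefer EBGP-learned routes, then the lowest router ID.
    return (0 if route.ebgp else 1, route.rtr_id)


def total_order(routes):
    """Return routes from most to least preferred using route groups."""
    # Partition the candidates into groups keyed by neighbor AS.
    groups = {}
    for route in routes:
        groups.setdefault(route.neighbor_as, []).append(route)
    # Within a group MED is comparable, so order by MED first.
    ordered = [
        sorted(group, key=lambda r: (r.med, after_med_key(r)))
        for group in groups.values()
    ]
    # Order the groups by comparing only their best routes; MED is
    # never compared across different neighbor ASes.
    ordered.sort(key=lambda group: after_med_key(group[0]))
    return [route for group in ordered for route in group]


# The routes of Figure 1.
routes = [
    Route("a", 1, 10, 10),
    Route("b", 2, 5, 1),
    Route("c", 1, 5, 5),
    Route("d", 2, 20, 20),
    Route("e", 2, 30, 30),
    Route("f", 3, 10, 20),
]
print(" < ".join(r.name for r in total_order(routes)))
```

   Running the sketch prints "b < d < e < c < a < f", the order given
   above.  Sorting groups by their head alone is what makes the result a
   total order even though MED is only meaningful within a group.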

   The first route in the above ordering is the best route for the
   destination.  Eliminating the best route and executing the above
   steps again yields a new total order of the remaining routes.  The
   route to be advertised to a particular domain is selected by choosing
   the most preferred route that is external to that particular domain
   in the above order.  Note that whenever the overall best route is
   external, this algorithm automatically selects it.

3.  Advertisement Rules

   1.  In an AS domain, if a router has installed an internal route as
       best, it should advertise its "best external route" (as defined
       in Section 2) to its internal neighbors.

   2.  In a Cluster domain, if a router (route reflector) has installed
       an external route as best, it should advertise its "best internal
       route" to its external neighbors.  (Advertising to internal
       neighbors is unchanged.)  Similarly, if the route reflector has
       installed an internal route as best, it should advertise its
       "best external route" to its internal (client) peers.  For the
       reflector to be able to advertise the best external route into
       the cluster, it is necessary that client-to-client reflection be
       disabled, since its advertisement may otherwise contain the best
       route within the cluster domain.

   3.  In a Confederation Member domain, if a router (confederation
       border router) has installed an internal route as best, it
       advertises its best external route to its internal neighbors.
       However, if it has installed an external route as best, it
       advertises its best internal route to its external neighbors.

4.  Consistency between routing and forwarding

   The BGP protocol, as defined in [RFC1771], specifies that a BGP
   speaker shall advertise to its internal peers the route with the
   highest degree of preference among routes to the same destination
   received from external neighbors.
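
   The difference between the [RFC1771] rule quoted above and the
   selection procedure of Section 2 can be illustrated with a minimal,
   non-normative Python sketch that uses the attributes of router r3 in
   the example of this section (Figure 3): paths a and b are learned
   over EBGP and path c over IBGP from r1.  The sketch rests on stated
   assumptions: all routes tie on every decision step before MED, the
   only later tie-breakers modelled are "prefer EBGP" and "lowest router
   ID", "external" means learned over EBGP (the AS domain case), and
   Section 2 is read as repeatedly taking the most preferred remaining
   route, returning it if it is external and otherwise eliminating it
   and recomputing the order.  All names (Route, total_order,
   rfc1771_internal_update, best_external) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    neighbor_as: int
    med: int
    rtr_id: int
    ebgp: bool  # assumption: "external" means learned over EBGP


def after_med_key(route):
    # Simplified rules after the MED step: EBGP first, then lowest router ID.
    return (0 if route.ebgp else 1, route.rtr_id)


def total_order(routes):
    # Route groups keyed by neighbor AS; MED is only compared within a group.
    groups = {}
    for route in routes:
        groups.setdefault(route.neighbor_as, []).append(route)
    ordered = [sorted(g, key=lambda r: (r.med, after_med_key(r)))
               for g in groups.values()]
    ordered.sort(key=lambda g: after_med_key(g[0]))
    return [route for group in ordered for route in group]


def rfc1771_internal_update(routes):
    # RFC 1771 rule: ignore internally learned routes entirely, then pick
    # the most preferred of the externally learned ones.
    external = [r for r in routes if r.ebgp]
    return total_order(external)[0] if external else None


def best_external(routes):
    # Section 2 as read here: internal routes stay in the ordering; the
    # head is eliminated and the order recomputed until an external
    # route comes first.
    candidates = list(routes)
    while candidates:
        head = total_order(candidates)[0]
        if head.ebgp:
            return head
        candidates = [r for r in candidates if r is not head]
    return None


# Routes known to r3 (Figure 3).
r3_routes = [
    Route("a", 1, 10, 1, ebgp=True),
    Route("b", 2, 5, 10, ebgp=True),
    Route("c", 1, 5, 5, ebgp=False),
]
print("overall best:", total_order(r3_routes)[0].name)
print("RFC 1771 advertises:", rfc1771_internal_update(r3_routes).name)
print("Section 2 selects:", best_external(r3_routes).name)
```

   With these inputs the sketch reports b as the overall best route, a
   as the route the [RFC1771] rule advertises into IBGP, and b as the
   route selected under Section 2.  Keeping the internal path c in the
   ordering is what lets the MED comparison within AS 1 rule out a, so
   the advertised route matches the route r3 forwards on.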

   This section discusses problems with the approach described in
   [RFC1771]; the algorithm in Section 2 offers an alternative way to
   select a best external route that can be advertised into an IBGP
   mesh.

   The internal update advertisement rules contained in the original
   BGP-4 specification [RFC1771] can lead to situations where traffic is
   forwarded through a route other than the route advertised by BGP.

   Inconsistencies between forwarding and routing are highly
   undesirable.  Service providers use BGP with the dual objective of
   learning reachability information and expressing policy over network
   resources.  The latter assumes that forwarding follows routing
   information.

   Consider the autonomous system presented in Figure 2, where r1 ... r4
   are members of a single IBGP mesh and routes a, b, and c are received
   from external peers.

                   AS 1 (c)
                      |
                   +----+           +----+
                   | r1 |...........| r2 |
                   +----+           +----+
                      .
                      .
                      .
                      .
                      .
                      .
                   +----+           +----+
                   | r3 |...........| r4 | --- ebgp --- AS X
                   +----+           +----+
                    /  \
                   /    \
             AS 1 (a)  AS 2 (b)

                 Figure 2: Inconsistency in Routing

                       Path    AS    MED    rtr_id
                       ----    --    ---    ------
                        a       1     10       1
                        b       2      5      10
                        c       1      5       5

                 Figure 3: Path Attribute Table - 2

   Following the rules specified in [RFC1771], router r3 will select
   path (b), received from AS 2, as its overall best to install in the
   Loc-RIB, since path (b), learned via EBGP, is preferable to path (c),
   the lowest MED route from AS 1, which r3 learns via IBGP.  However,
   for the purposes of internal update route selection, it will ignore
   the presence of path (c) and elect (a) as its advertisement, via the
   router-id tie-breaking rule.

   In this scenario, router r4 will receive (c) from r1 and (a) from r3.
   It will pick the lowest MED route (c) and advertise it via EBGP to
   AS X.
   However, at this point routing is inconsistent with forwarding:
   traffic received from AS X will be forwarded towards AS 2, while the
   EBGP advertisement is being made for an AS 1 path.

   Routing policies are typically specified in terms of neighboring
   ASes.  In the situation above, assuming that AS 1 is a network for
   which this AS provides transit services while AS 2 and AS X are peer
   networks, one can easily see how the inconsistency between routing
   and forwarding would lead to transit being inadvertently provided
   between AS X and AS 2.  This could lead to persistent forwarding
   loops.

   Inconsistency between routing and forwarding may happen whenever a
   BGP speaker chooses to advertise into IBGP an external route that is
   different from its overall best route while that overall best route
   is itself external.

5.  Applications

5.1.  Fast Connectivity Restoration

   When two exits are available to reach a particular destination and
   one is preferred over the other, the availability of an alternate
   path allows fast restoration of connectivity when the primary path
   fails.

   Restoration can be quick since the alternate path is already at hand.
   The border router could precompute the backup route and preinstall it
   in the FIB, ready to be switched to when the primary goes away.  Note
   that this requires the border router that provides the backup path to
   also preinstall the secondary path and switch to it on failure.

5.2.  Inter-Domain Churn Reduction

   Within an AS, the unavailability of a backup path leads a border
   router to send a withdrawal upstream when the primary path fails.
   This causes inter-domain churn and packet loss for the time the
   network takes to converge to the alternate path.  Having the
   alternate path available reduces the churn and eliminates this packet
   loss.

5.3.  Reducing Persistent IBGP oscillation

   Advertising the best external route according to the algorithm
   described in this document reduces the possibility of route
   oscillation by introducing additional information into the IBGP
   system.

   For a permanent oscillation condition to occur, there must be a
   circular dependency between paths such that the selection of a new
   best path by a router, in response to a received IBGP advertisement,
   causes the withdrawal of information that another router depends on
   in order to generate the original event.

   In vanilla BGP, when only the best overall route is advertised, as in
   most implementations, oscillation can occur whenever there are two or
   more clusters or sub-ASes such that at least one cluster has more
   than one path that can potentially contribute to the dependency.

6.  Acknowledgments

   This document greatly benefits from the comments of Yakov Rekhter,
   John Scudder, Eric Rosen, and Jenny Yuan.

7.  IANA Considerations

   This document has no actions for IANA.

8.  Security Considerations

   There are no additional security risks introduced by this design.

9.  Normative References

   [RFC1771]  Rekhter, Y. and T. Li, "A Border Gateway Protocol 4
              (BGP-4)", RFC 1771, March 1995.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3345]  McPherson, D., Gill, V., Walton, D., and A. Retana,
              "Border Gateway Protocol (BGP) Persistent Route
              Oscillation Condition", RFC 3345, August 2002.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, April 2006.

   [RFC5065]  Traina, P., McPherson, D., and J. Scudder, "Autonomous
              System Confederations for BGP", RFC 5065, August 2007.

Authors' Addresses

   Pedro Marques
   Juniper Networks
   1194 N. Mathilda Ave
   Sunnyvale, CA  94089
   USA

   Phone:
   Email: roque@juniper.net


   Rex Fernando
   Juniper Networks
   1194 N. Mathilda Ave
   Sunnyvale, CA  94089
   USA

   Phone:
   Email: rex@juniper.net


   Enke Chen
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   USA

   Phone:
   Email: enkechen@cisco.com


   Pradosh Mohapatra
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   USA

   Phone:
   Email: pmohapat@cisco.com