Network Working Group                                         P. Marques
Internet-Draft                                               R. Fernando
Intended status: Standards Track                                 E. Chen
Expires: August 8, 2010                                     P. Mohapatra
                                                           Cisco Systems
                                                        February 4, 2010


            Advertisement of the best external route in BGP
                  draft-ietf-idr-best-external-01.txt

Abstract

   The base BGP specifications prevent a BGP speaker from advertising
   any route that is not the best route for a BGP destination.  This
   document specifies a modification of this rule.  Routes are divided
   into two categories, "external" and "internal".  A specification is
   provided for choosing a "best external route" (for a particular value
   of the Network Layer Reachability Information).  A BGP speaker is
   then allowed to advertise its "best external route" to its internal
   BGP peers, even if that is not the best route for the destination.
   The document explains why advertising the best external route can
   improve convergence time without causing routing loops.  Additional
   benefits include reduction of inter-domain churn and avoidance of
   permanent route oscillation.  The document also generalizes the
   notions of "internal" and "external" so that they can be applied to
   Route Reflector Clusters and Autonomous System Confederations.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 8, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Algorithm for selection of best external route
   3.  Advertisement Rules
   4.  Consistency between routing and forwarding
   5.  Applications
     5.1.  Fast Connectivity Restoration
     5.2.  Inter-Domain Churn Reduction
     5.3.  Reducing Persistent IBGP oscillation
   6.  Acknowledgments
   7.  IANA Considerations
   8.  Security Considerations
   9.  Normative References
   Authors' Addresses

1.  Introduction

   The base BGP specifications prevent a BGP speaker from advertising
   any route that is not the best route for a BGP destination.  This
   document specifies a modification of this rule.  Routes are divided
   into two categories, "external" and "internal".  A specification is
   provided for choosing a "best external route" (for a particular value
   of the Network Layer Reachability Information).  A BGP speaker is
   then allowed to advertise its "best external route" to its internal
   BGP peers, even if that is not the best route for the destination.
   The document explains why advertising the best external route can
   improve convergence time without causing routing loops.  Additional
   benefits include reduction of inter-domain churn and avoidance of
   permanent route oscillation.

   The document also generalizes the notions of "internal" and
   "external" so that they can be applied to Route Reflector Clusters
   [RFC4456] and Autonomous System Confederations [RFC5065].  More
   specifically, two routers in the same route reflector cluster having
   an IBGP session between them are defined to be "internal" peers,
   whereas two routers in different clusters having an IBGP session are
   defined to be "external" peers.  Similarly, two routers in the same
   member AS of a confederation having an IBGP session between them are
   "internal" peers, whereas two routers in different member ASes of a
   confederation having a confed EBGP session between them are defined
   to be "external" peers.  The definition of "best external route"
   follows from these definitions: it is the most preferred route among
   those received from the "external" neighbors.

   Advertising the best external route, when different from the best
   route, introduces additional information into an IBGP mesh, which may
   be of value for several purposes, including:

   o  Faster restoration of connectivity, by providing additional paths
      that may be used to fail over in case the primary path becomes
      invalid or is withdrawn.

   o  Reducing inter-domain churn and traffic blackholing due to the
      readily available alternate path.

   o  Reducing the potential for situations of permanent IBGP route
      oscillation, as discussed in some scenarios [RFC3345].

   o  Improving selection of lower MED routes from the same neighboring
      AS.

   This document defines procedures to select the best external route
   for each destination.
   It also describes, with the help of certain scenarios, how the above
   benefits are realized with best external route announcement.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Algorithm for selection of best external route

   Given that the intent in advertising an external route, when the best
   route for the same destination is an internal route, is to provide
   additional information into the IBGP mesh in which a router is
   participating, it is desirable to take into account the routes
   received from internal neighbors in the selection process.

   We propose a route selection algorithm that defines a total order
   between routes and which selects the same best route as the one
   currently specified [RFC4271].

   In order to achieve this, we need to introduce the concept of route
   group.  For a given NLRI, suppose the BGP decision process has run
   through all the steps prior to the MED comparison step (as defined in
   section 9.1.2.2 of [RFC4271]).  Look at the set of routes that are
   still under consideration at that time.  Now partition this set into
   a number of disjoint route groups, where two routes are in the same
   group if and only if the neighbor AS of each route is the same.

   Routes are ordered within a group via MED or subsequent route
   selection rules.

   The order of all routes for the same destination is determined by the
   order of the best route in each group.
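
   The grouping and ordering steps above can be illustrated with a
   minimal sketch.  It is not a definitive implementation.  The names
   Route, total_order and FIGURE_1 are hypothetical, and the sketch
   assumes that the "subsequent route selection rules" reduce to a
   single lowest-router-id comparison; a real decision process applies
   further tie-breakers (for example IGP cost) before router id.  The
   sample data are the attribute values listed in Figure 1.

```python
from collections import defaultdict
from typing import NamedTuple


class Route(NamedTuple):
    """A route that survived every decision step before MED comparison."""
    name: str
    neighbor_as: int
    med: int
    router_id: int


def total_order(routes):
    """Return routes from most to least preferred.

    Assumption: tie-breaking after MED is reduced to "lowest router id
    wins"; this is a simplification for illustration only.
    """
    # Partition into route groups keyed by neighbor AS.
    groups = defaultdict(list)
    for route in routes:
        groups[route.neighbor_as].append(route)

    # Within a group MED is comparable: lowest MED first, then router id.
    for members in groups.values():
        members.sort(key=lambda r: (r.med, r.router_id))

    # Groups are ordered by their best route. MED is not compared across
    # neighbor ASes, so only the later rule (router id) applies here.
    ordered_groups = sorted(groups.values(), key=lambda g: g[0].router_id)

    # Each group's remaining routes follow directly after its best route.
    return [route for group in ordered_groups for route in group]


# Attribute values taken from Figure 1.
FIGURE_1 = [
    Route("a", 1, 10, 10),
    Route("b", 2, 5, 1),
    Route("c", 1, 5, 5),
    Route("d", 2, 20, 20),
    Route("e", 2, 30, 30),
    Route("f", 3, 10, 20),
]

if __name__ == "__main__":
    print(" < ".join(r.name for r in total_order(FIGURE_1)))
    # prints: b < d < e < c < a < f
```

   Sorting the groups by their best member, rather than sorting all
   routes with one global key, keeps MED comparisons confined to routes
   that share a neighbor AS.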

   As an example, the following set of received routes:

                   Path    AS    MED    rtr_id
                   a       1     10     10
                   b       2     5      1
                   c       1     5      5
                   d       2     20     20
                   e       2     30     30
                   f       3     10     20

                   Figure 1: Path Attribute Table

   would yield the following order (from the most to the least
   preferred):

      b < d < e < c < a < f

   In this example, comparison of the best route within each group
   provides the sequence (b < c < f).  The remaining routes are ordered
   in relation to their respective group best.

   The first route in the above ordering is indeed the best route for a
   given destination.  Eliminating the best route and executing the
   above steps leads us to a new total order of the routes.  The route
   to be advertised to a particular domain is selected by choosing the
   most preferred route that is external to that particular domain in
   the above order.  Note that whenever the overall best route is
   external it will automatically be selected by this algorithm.

3.  Advertisement Rules

   1.  In an AS domain, if a router has installed an internal route as
       best, it should advertise its "best external route" (as defined
       in this document) to its internal neighbors.

   2.  In a Cluster domain, if a router (route reflector) has installed
       an external route as best, it should advertise its "best internal
       route" to its external neighbors.  (Advertising to internal
       neighbors is unchanged.)  Similarly, if the route reflector has
       installed an internal route as best, it should advertise its
       "best external route" to its internal (client) peers.  In order
       for the reflector to be able to advertise the best external route
       into the cluster, it is necessary that client-to-client
       reflection be disabled, since its advertisement may otherwise
       contain the best route within the cluster domain.

   3.
       In a Confederation Member domain, if a router (confederation
       border router) has installed an internal route as best, it
       advertises its best external route to its internal neighbors.
       However, if it has installed an external route as best, it
       advertises its best internal route to its external neighbors.

4.  Consistency between routing and forwarding

   BGP, as defined in [RFC1771], specifies that a BGP speaker shall
   advertise to its internal peers the route with the highest degree of
   preference among routes to the same destination received from
   external neighbors.

   This section discusses problems present with the approach described
   in [RFC1771]; Section 2 offers an alternative algorithm to select a
   best external route which can be advertised to an IBGP mesh.

   The internal update advertisement rules contained in the original
   BGP-4 specification [RFC1771] can lead to situations where traffic is
   forwarded through a route other than the route advertised by BGP.

   Inconsistencies between forwarding and routing are highly
   undesirable.  Service providers use BGP with the dual objective of
   learning reachability information and expressing policy over network
   resources.  The latter assumes that forwarding follows routing
   information.

   Consider the autonomous system presented in Figure 2, where r1 ... r4
   are members of a single IBGP mesh and routes a, b, and c are received
   from external peers.

                 AS 1 (c)
                    |
                 +----+           +----+
                 | r1 |...........| r2 |
                 +----+           +----+
                    .
                    .
                    .
                    .
                    .
                    .
                 +----+           +----+
                 | r3 |...........| r4 | --- ebgp --- AS X
                 +----+           +----+
                 /    \
                /      \
           AS 1 (a)   AS 2 (b)

                 Figure 2: Inconsistency in Routing

                   Path    AS    MED    rtr_id
                   a       1     10     1
                   b       2     5      10
                   c       1     5      5

                 Figure 3: Path Attribute Table - 2

   Following the rules as specified in [RFC1771], router r3 will select
   path (b) received from AS 2 as its overall best to install in the
   Loc-RIB, since path (b) is preferable to path (c), the lowest MED
   route from AS 1.  However, for the purposes of Internal Update route
   selection, it will ignore the presence of path (c), and elect (a) as
   its advertisement, via the router-id tie-breaking rule.

   In this scenario, router r4 will receive (c) from r1 and (a) from r3.
   It will pick the lowest MED route (c) and advertise it out via EBGP
   to AS X.  However, at this point routing is inconsistent with
   forwarding, as traffic received from AS X will be forwarded towards
   AS 2, while the EBGP advertisement is being made for an AS 1 path.

   Routing policies are typically specified in terms of neighboring
   ASes.  In the situation above, assuming that AS 1 is a network for
   which this AS provides transit services while AS 2 and AS X are peer
   networks, one can easily see how the inconsistency between routing
   and forwarding would lead to transit being inadvertently provided
   between AS X and AS 2.  This could lead to persistent forwarding
   loops.

   Inconsistency between routing and forwarding may happen whenever a
   BGP speaker chooses to advertise an external route into IBGP that is
   different from the overall best route and its overall best is
   external.

5.  Applications

5.1.  Fast Connectivity Restoration

   When two exits are available to reach a particular destination and
   one is preferred over the other, the availability of an alternate
   path provides fast connectivity restoration when the primary path
   fails.

   Restoration can be quick since the alternate path is already at
   hand.  The border router could precompute the backup route and
   preinstall it in the FIB, ready to be switched to when the primary
   goes away.  Note that this requires the border router that is the
   backup to also preinstall the secondary path and switch to it on
   failure.

5.2.  Inter-Domain Churn Reduction

   Within an AS, the unavailability of a backup path leads to a border
   router sending a withdraw upstream when the primary fails.  This
   leads to inter-domain churn and packet loss for the time the network
   takes to converge to the alternate path.  Having the alternate path
   reduces the churn and eliminates packet loss.

5.3.  Reducing Persistent IBGP oscillation

   Advertising the best external route, according to the algorithm
   described in this document, will reduce the possibility of route
   oscillation by introducing additional information into the IBGP
   system.

   For a permanent oscillation condition to occur, it is necessary that
   a circular dependency between paths occurs such that the selection of
   a new best path by a router, in response to a received IBGP
   advertisement, causes the withdrawal of information that another
   router depends on in order to generate the original event.

   In vanilla BGP, when only the best overall route is advertised, as in
   most implementations, oscillation can occur whenever there are 2 or
   more clusters/sub-ASes such that at least one cluster has more than
   one path that can potentially contribute to the dependency.

6.  Acknowledgments

   This document greatly benefits from the comments of Yakov Rekhter,
   John Scudder, Eric Rosen, and Jenny Yuan.

7.  IANA Considerations

   This document has no actions for IANA.

8.  Security Considerations

   There are no additional security risks introduced by this design.

9.  Normative References

   [RFC1771]  Rekhter, Y. and T.
              Li, "A Border Gateway Protocol 4 (BGP-4)", RFC 1771,
              March 1995.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3345]  McPherson, D., Gill, V., Walton, D., and A. Retana,
              "Border Gateway Protocol (BGP) Persistent Route
              Oscillation Condition", RFC 3345, August 2002.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, April 2006.

   [RFC5065]  Traina, P., McPherson, D., and J. Scudder, "Autonomous
              System Confederations for BGP", RFC 5065, August 2007.

Authors' Addresses

   Pedro Marques
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   USA

   Phone:
   Email: roque@cisco.com


   Rex Fernando
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   USA

   Phone:
   Email: rex@cisco.com


   Enke Chen
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   USA

   Phone:
   Email: enkechen@cisco.com


   Pradosh Mohapatra
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA  95134
   USA

   Phone:
   Email: pmohapat@cisco.com