Internet Engineering Task Force                                  E. Chen
Internet Draft                                                   J. Yuan
Updates: 4271 (if approved)                           Palo Alto Networks
Intended status: Standards Track                         August 12, 2021
Expires: February 13, 2022

              Deterministic Route Redistribution into BGP
                      draft-chen-bgp-redist-03.txt

Abstract

   In this document we present several examples of non-deterministic
   routing behavior involving route redistribution into BGP. In order
   to eliminate such non-deterministic behavior, we propose an
   enhancement to BGP route selection that would take into account the
   administrative distance under certain conditions. We also recommend
   that the LOCAL_PREF value be reduced for the redistributed backup
   route, and be calculated automatically based on the administrative
   distance.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 13, 2022.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

1. Introduction

   A routing protocol usually downloads its best (or active) route to
   the routing table, also known as Routing Information Base (RIB),
   which in turn selects the best (or active) route to program the
   forwarding table.

   When comparing routes from different routing protocols, RIB typically
   uses the "administrative distance" [ADMIN-DIS] (abbreviated as
   "admin-distance" hereafter) as the tie breaker. The convention is
   that a route with a lower admin-distance is more preferred, and that
   is assumed in this document when specific admin-distance values are
   given as examples. The admin-distance associated with a route in RIB
   is commonly used to implement various routing schemes such as
   designating primary and backup routes in a network.

   On the other hand, the route selection in BGP [RFC4271] involves
   comparing the LOCAL_PREF, AS_PATH and other BGP attributes. The
   bestpath in BGP usually becomes the candidate for downloading to the
   RIB, and for advertising to BGP neighbors.

   It is common to redistribute routes from other routing protocols
   (such as "static routing" [STATIC-R]) into BGP for route propagation.
   This topic is briefly discussed in [Sect. 9.4, RFC4271]. A
   redistributed route is usually assigned a fixed LOCAL_PREF value, and
   has an empty AS_PATH attribute.
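
   The admin-distance convention and the redistribution behavior
   described above can be illustrated with a minimal Python sketch. The
   type and function names (RibRoute, BgpPath, rib_active,
   redistribute), the example prefix, and the default LOCAL_PREF of 100
   are assumptions made for illustration only; they are not defined by
   this document or by [RFC4271].

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed default LOCAL_PREF, matching the value used in the examples.
DEFAULT_LOCAL_PREF = 100


@dataclass(frozen=True)
class RibRoute:
    """Hypothetical RIB entry; lower admin_distance is more preferred."""
    prefix: str
    protocol: str
    admin_distance: int


@dataclass(frozen=True)
class BgpPath:
    """Hypothetical BGP path carrying only the attributes discussed here."""
    prefix: str
    local_pref: int
    as_path: Tuple[int, ...]
    redistributed: bool


def rib_active(routes: List[RibRoute]) -> Optional[RibRoute]:
    """Return the active route: the one with the lowest admin-distance."""
    if not routes:
        return None
    return min(routes, key=lambda r: r.admin_distance)


def redistribute(route: RibRoute) -> BgpPath:
    """Model redistribution: fixed LOCAL_PREF and an empty AS_PATH."""
    return BgpPath(route.prefix, DEFAULT_LOCAL_PREF, (), True)


static = RibRoute("192.0.2.0/24", "static", 150)
ebgp = RibRoute("192.0.2.0/24", "ebgp", 20)
active = rib_active([static, ebgp])
redist = redistribute(static)
```

   In this sketch the EBGP route is active in the RIB because 20 is
   lower than 150, while the redistributed static route carries the
   same LOCAL_PREF as any other path and an empty AS_PATH, which is
   what makes it attractive to BGP route selection.
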

   The interaction between RIB and BGP follows these general rules:

   o A local route may be redistributed into BGP only if it is active
     in RIB based on the admin-distance.

   o Only the bestpath in BGP is downloaded to RIB.

   Currently the admin-distance does not play any role in BGP route
   selection. Due to the lack of such correlation between RIB and BGP,
   when a backup route (based on the admin-distance) is redistributed
   into BGP as shown in the next section, routing may converge to
   different paths depending on the order of path arrivals. Such
   non-deterministic routing behavior is clearly detrimental to network
   operations.

   In order to eliminate such non-deterministic behavior, we propose an
   enhancement to BGP route selection that would take into account the
   admin-distance under certain conditions. We also recommend that the
   LOCAL_PREF value be reduced for the redistributed backup route, and
   be calculated automatically based on the admin-distance.

   The proposed enhancement and recommendation are backward compatible,
   and can be deployed on an individual router basis.

   Although static routing is used in the examples in this document,
   the proposed enhancement and recommendation also apply when a route
   is redistributed from other routing protocols into BGP.

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2. The Problem

   In this section several examples are presented to illustrate the
   non-deterministic routing behavior involving route redistribution
   into BGP.

2.1. On a Single Router

   Consider an example in which there are two paths for the same
   destination on a single router.
   As shown in the following table, the primary path A is received from
   an external BGP neighbor, and the backup path B is a static route
   and is configured for redistribution into BGP.

      Path   Type     Admin_Distance   LOCAL_PREF   AS_PATH
      -----------------------------------------------------
       A     EBGP           20            100        65535
       B     Static        150            100         --

   Depending on the order of path arrivals, the path that arrives first
   would be selected as the bestpath in both RIB and BGP.

   More specifically, if Path A is received in BGP and is downloaded to
   RIB first, it would remain as the best in RIB (due to the
   admin-distance) even when Path B shows up in RIB later. In this case
   Path A would be the best one in both RIB and BGP.

   If Path B shows up in RIB and is redistributed into BGP first, it
   would remain as the best in BGP (because it is a local route or has
   a shorter AS_PATH) even when Path A is received in BGP later. In
   this case Path B would be the best one in both RIB and BGP.

2.2. Network-wide Behavior

   Consider the following example in which Routers R1, R2 and R3 are
   part of a provider network and IBGP sessions are maintained among
   them. There are two customer connections, a primary connection on R1
   and a backup connection on R2. The customer route X is statically
   routed on both R1 and R2, and is redistributed into BGP. On R2, the
   backup path for X is configured with a less preferred admin-distance
   than the one for IBGP paths.

                          +----+
                          | R3 |
                          +----+
                          /    \
                         / ibgp \
                  +----+          +----+
                  | R1 |----------| R2 |
                  +----+          +----+
                    |               |
                    |               |
                    |               |
                    X               X

   While R1 consistently selects the local static route as the best one,
   the route selection on R2 would be non-deterministic. As shown in
   the following table, there are potentially two BGP paths A and B for
   X on R2, with Path A learned from R1 and Path B locally
   redistributed.

      Path   Type     Admin_Distance   LOCAL_PREF   AS_PATH
      -----------------------------------------------------
       A     IBGP          200            100         --
       B     Static        210            100         --

   Depending on the order of arrivals of these two paths, the path that
   arrives first would be selected as the bestpath in both RIB and BGP.

   More specifically, if Path A is received in BGP and is downloaded to
   RIB first, it would remain as the best in RIB (due to the
   admin-distance) even when Path B shows up in RIB later. In this case
   Path A would be the best one in both RIB and BGP.

   If Path B shows up in RIB and is redistributed into BGP first, it
   would remain as the best in BGP (because it is a local route or has
   a lower IGP metric) even when Path A is received in BGP later. In
   this case Path B would be the best one in both RIB and BGP.

   The non-deterministic route selection on R2 may cause other nodes
   (like R3) to converge to different paths as well. The routing
   behavior in the network would be non-deterministic, and inconsistent
   with the intended routing design.

   A network using BGP route reflection [RFC4456] (or BGP confederation
   [RFC5065]) may experience additional cases of network-wide
   "non-deterministic" routing behavior. For example, in the following
   figure, when both R1 and R2 advertise their respective local routes
   to the route reflector (RR) simultaneously, the RR would use the "IGP
   metric" to choose the bestpath between the two IBGP paths. As a
   result the network may or may not converge to the primary path.

                          +----+
                          | RR |
                          +----+
                          /    \
                         /      \
                  +----+          +----+
                  | R1 |          | R2 |
                  +----+          +----+
                    |               |
                    |               |
                    |               |
                    X               X

3. The Proposed Solution

   In order to eliminate the non-deterministic routing behavior
   involving route redistribution into BGP, we propose an enhancement to
   BGP route selection that would take into account the admin-distance
   under certain conditions.
   We also recommend that the LOCAL_PREF value be reduced for the
   redistributed backup route, and calculated automatically based on
   the admin-distance.

3.1. Enhancement to BGP Route Selection

   To make the choice of the route that is sourced and advertised to
   the network deterministic on a single router, we propose that the
   following procedure be added prior to the step that compares the
   degrees of preference of routes and identifies the route that has
   the highest degree of preference, as described in Section 9.1.2 of
   [RFC4271] for BGP route selection:

      When comparing a locally redistributed route with another route
      that is either locally aggregated or received from an external
      neighbor, favor the one with a more preferred admin-distance. The
      admin-distance for a BGP route is obtained as follows:

         For a locally redistributed route, it is inherited from the
         route being redistributed from RIB.

         For a non-redistributed route, it is of the same value as the
         admin-distance assigned to the route for the purpose of RIB
         installation (regardless of whether it is actually installed
         in RIB).

   It should be noted that IBGP paths are deliberately excluded from the
   algorithm. As the admin-distance is not propagated by BGP, involving
   IBGP paths in the admin-distance comparison can easily result in
   unintended routing behavior and even route churn. To influence
   route selection in a network, use the LOCAL_PREF attribute as
   described in the next section.

3.2. Setting the LOCAL_PREF Value

   When a non-BGP route is designated as a backup route in the network,
   it should be assigned a less preferred admin-distance than the value
   for IBGP routes. When such a route is redistributed into BGP, the
   LOCAL_PREF value for the redistributed route SHOULD be set lower than
   the LOCAL_PREF values of the primary route and other more preferred
   routes.
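
   The division of labor between the comparison step of Section 3.1 and
   the LOCAL_PREF setting described above can be sketched in Python as
   follows. This is a minimal illustration rather than a definitive
   implementation: the Path type, the source labels ("redistributed",
   "aggregate", "ebgp", "ibgp") and the function name are hypothetical.
   A result of None means that the step does not apply or does not
   decide, and selection continues with the existing rules, starting
   with LOCAL_PREF.

```python
from dataclasses import dataclass
from typing import Optional

# Route sources that take part in the admin-distance comparison when the
# other route is locally redistributed. IBGP is deliberately absent.
COMPARABLE_SOURCES = ("aggregate", "ebgp")


@dataclass(frozen=True)
class Path:
    """Hypothetical BGP path; lower admin_distance is more preferred."""
    name: str
    source: str          # "redistributed", "aggregate", "ebgp" or "ibgp"
    admin_distance: int


def step_applies(a: Path, b: Path) -> bool:
    """True when one path is redistributed and the other is comparable."""
    return (
        (a.source == "redistributed" and b.source in COMPARABLE_SOURCES)
        or (b.source == "redistributed" and a.source in COMPARABLE_SOURCES)
    )


def prefer_by_admin_distance(a: Path, b: Path) -> Optional[Path]:
    """Return the path favored by admin-distance, or None if undecided."""
    if not step_applies(a, b) or a.admin_distance == b.admin_distance:
        return None
    return a if a.admin_distance < b.admin_distance else b


# Values taken from the table in Section 2.1.
ebgp_a = Path("A", "ebgp", 20)
static_b = Path("B", "redistributed", 150)

# Values taken from the table in Section 2.2.
ibgp_a = Path("A", "ibgp", 200)
backup_b = Path("B", "redistributed", 210)
```

   With the values of Section 2.1, the step favors Path A whichever
   path is considered first, so the outcome no longer depends on the
   order of arrival. With the values of Section 2.2, the step returns
   None because an IBGP path is involved, which is why the backup route
   needs a lower LOCAL_PREF in order to lose to the IBGP path.
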

   Assuming the default LOCAL_PREF value is assigned to the primary
   route, then the LOCAL_PREF value for the redistributed backup route
   can be calculated automatically as described by the following
   pseudo-code:

      if (redist_admin_distance > ibgp_admin_distance) {
          offset = redist_admin_distance - ibgp_admin_distance;
          if (default_local_pref > offset)
              calculated_local_pref = default_local_pref - offset;
          else
              calculated_local_pref = 0;
      }

   in which

      o "redist_admin_distance" is the admin-distance of the route
        being redistributed.

      o "ibgp_admin_distance" is the admin-distance for IBGP routes on
        the local router.

      o "default_local_pref" is the default LOCAL_PREF value in the
        network.

      o "calculated_local_pref" is the calculated LOCAL_PREF value for
        the redistributed route.

   Clearly, in order for the calculated LOCAL_PREF value to truly
   reflect the intended routing design, the admin-distance needs to be
   assigned properly. Guidance on assigning the admin-distance is
   provided in the next section.

   This algorithm would not apply if the "default_local_pref" is not
   assigned to the primary route, in which case manual configuration
   should be used.

   In addition to lowering the LOCAL_PREF value, it may be necessary to
   modify the parameters for the aforementioned redistributed route
   pertaining to any vendor-specific route selection criteria preceding
   the LOCAL_PREF comparison. For example, the "weight" parameter
   exists in a number of implementations, in which case the "weight" for
   the aforementioned redistributed route should be made equal to the
   default "weight" for IBGP routes.

3.3. Admin-distance Assignment

   In order to achieve the desired routing scheme using the LOCAL_PREF
   calculated from the admin-distance, coordination would be necessary
   for the admin-distance assignment when the same destination is
   redistributed from multiple routers in a network.

   While the default LOCAL_PREF value is usually consistent in a
   network, the default admin-distance for IBGP routes can vary from one
   node to another in a multi-vendor network.

   The coordination of the admin-distance assignment can be simplified
   by examining the "role" that a non-BGP route is supposed to play
   (such as being the primary, the secondary or the tertiary), and then
   associating an "offset" with the route based on its role. Among the
   routes involved, the less preferred a route is, the higher the offset
   should be. Then the admin-distance for the route can be assigned as
   (ibgp_admin_distance + offset), and the desired LOCAL_PREF value
   would be automatically calculated using the algorithm described in
   the previous section.

   In the example shown in the following table, there are three non-BGP
   paths for the same destination on separate routers A, B and C in the
   network and they are designated as the primary, the secondary and the
   tertiary. The default LOCAL_PREF value is 100 in the network, and the
   "ibgp_admin_distance" is 200 on the router with the secondary path,
   and 170 on the router with the tertiary path.

   The desired LOCAL_PREF values for the redistributed routes are
   obtained using the algorithm and procedures described in this
   document.

      Router    Role        Offset   Admin_Distance   LOCAL_PREF
      --------------------------------------------------------------
        A       Primary       -           50          100 (Default)
        B       Secondary    10        200 + 10       100 - 10
        C       Tertiary     20        170 + 20       100 - 20

3.4. Configuration Option

   Configuration can be used to achieve the equivalent outcome by
   setting the appropriate LOCAL_PREF value (and also the "weight"
   parameter if applicable) for the redistributed backup route. It can
   also be used to override the LOCAL_PREF value calculated based on the
   admin-distance value of the redistributed route as proposed in this
   document.

   When route redistribution is part of a more complex routing scheme
   beyond what can be automated with the proposed solution,
   configuration can also be used following the general principles
   discussed in this document.

4. IANA Considerations

   This document makes no requests of IANA.

5. Security Considerations

   The solution proposed in this document does not change the underlying
   security or confidentiality issues inherent in the existing BGP
   [RFC4271].

6. Acknowledgments

   The authors would like to thank Naiming Shen, Acee Lindem and Robert
   Raszuk for inputs and discussions.

7. References

7.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

7.2. Informative References

   [STATIC-R] Static routing, Wikipedia,
              https://en.wikipedia.org/wiki/Static_routing.

   [ADMIN-DIS] Administrative distance, Wikipedia,
              https://en.wikipedia.org/wiki/Administrative_distance.

   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, DOI 10.17487/RFC4456, April 2006.

   [RFC5065]  Traina, P., McPherson, D., and J.
              Scudder, "Autonomous System Confederations for BGP",
              RFC 5065, DOI 10.17487/RFC5065, August 2007.

Authors' Addresses

   Enke Chen
   Palo Alto Networks

   Email: enchen@paloaltonetworks.com

   Jenny Yuan
   Palo Alto Networks

   Email: jyuan@paloaltonetworks.com