idnits 2.17.1 draft-ietf-grow-bgp-wedgies-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1.a on line 17. -- Found old boilerplate from RFC 3978, Section 5.5 on line 420. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 397. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 404. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 410. ** The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure Acknowledgement. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. ** The document uses RFC 3667 boilerplate or RFC 3978-like boilerplate instead of verbatim RFC 3978 boilerplate. After 6 May 2005, submission of drafts without verbatim RFC 3978 boilerplate is not accepted. The following non-3978 patterns matched text found in the document. That text should be removed or replaced: This document is an Internet-Draft and is subject to all provisions of Section 3 of RFC 3667. By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79. 
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) == There are 4 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 206 has weird spacing: '...|backup for 1...' == Line 289 has weird spacing: '...lt from unexp...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (October 5, 2004) is 7143 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) ** Obsolete normative reference: RFC 1771 (ref. '1') (Obsoleted by RFC 4271) Summary: 7 errors (**), 0 flaws (~~), 5 warnings (==), 7 comments (--). 
Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Grow T. Griffin 2 Internet-Draft Intel 3 Expires: April 5, 2005 G. Huston 4 APNIC 5 October 5, 2004 7 BGP Wedgies 8 draft-ietf-grow-bgp-wedgies-00.txt 10 Status of this Memo 12 This document is an Internet-Draft and is subject to all provisions 13 of section 3 of RFC 3667. By submitting this Internet-Draft, each 14 author represents that any applicable patent or other IPR claims of 15 which he or she is aware have been or will be disclosed, and any of 16 which he or she become aware will be disclosed, in accordance with 17 RFC 3668. 19 Internet-Drafts are working documents of the Internet Engineering 20 Task Force (IETF), its areas, and its working groups. Note that 21 other groups may also distribute working documents as 22 Internet-Drafts. 24 Internet-Drafts are draft documents valid for a maximum of six months 25 and may be updated, replaced, or obsoleted by other documents at any 26 time. It is inappropriate to use Internet-Drafts as reference 27 material or to cite them other than as "work in progress." 29 The list of current Internet-Drafts can be accessed at 30 http://www.ietf.org/ietf/1id-abstracts.txt. 32 The list of Internet-Draft Shadow Directories can be accessed at 33 http://www.ietf.org/shadow.html. 35 This Internet-Draft will expire on April 5, 2005. 37 Copyright Notice 39 Copyright (C) The Internet Society (2004). 41 Abstract 43 It has commonly been assumed that the Border Gateway Protocol (BGP) 44 is a tool for distributing reachability information in a manner that 45 creates forwarding paths in a deterministic manner. 
In this memo we 46 will describe a class of BGP configurations for which there is more 47 than one potential outcome, and where forwarding states other than 48 the intended state are equally stable, and that the stable state 49 where BGP converges may be selected by BGP in a non-deterministic 50 manner. These stable, but unintended, BGP states are termed here 51 "BGP Wedgies". 53 1. Introduction 55 It has commonly been assumed that the Border Gateway Protocol (BGP) 56 [1] is a tool for distributing reachability information in a manner 57 that creates forwarding paths in a deterministic manner. This is a 58 'problem statement' memo that describes a class of BGP configurations 59 for which there is more than one stable forwarding state. In this 60 class of configurations forwarding states other than the intended 61 state are equally stable, and the stable state where BGP converges 62 may be selected by BGP in a non-deterministic manner. 64 These stable, but unintended, BGP states are termed here "BGP 65 Wedgies". 67 2. Describing BGP Routing Policy 69 BGP routing policies generally reflect each network administrator's 70 objective to optimize their position with respect to their network's 71 cost, performance and reliability. 73 With respect to cost optimization, the local network's default 74 routing policy often reflects a local preference to prefer routes 75 learned from a customer to routes learned from some form of peering 76 exchange. In the same vein the local network is often configured to 77 prefer routes learned from a peer or a customer over those learned 78 from a directly connected upstream transit provider. These 79 preferences may be expressed via a local preference configuration 80 setting, where the local preference overrides the AS path length 81 metric of the base BGP operation. 
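The preference ordering described above can be illustrated with a minimal sketch. It is not part of the draft: the local-preference numbers, the Route type and the best_route function are hypothetical names chosen for illustration, and only the first two steps of route selection (local preference, then AS path length) are modelled.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Hypothetical local-preference values (higher wins); not taken from the draft.
LOCAL_PREF = {"customer": 300, "peer": 200, "provider": 100}


@dataclass(frozen=True)
class Route:
    learned_from: str           # "customer", "peer" or "provider"
    as_path: Tuple[int, ...]    # AS numbers, nearest AS first


def best_route(routes: Sequence[Route]) -> Route:
    """Pick by highest local preference, then by shortest AS path."""
    return min(routes, key=lambda r: (-LOCAL_PREF[r.learned_from], len(r.as_path)))


customer_route = Route("customer", (2, 7, 8, 1))   # longer AS path
provider_route = Route("provider", (4, 1))         # shorter AS path
chosen = best_route([customer_route, provider_route])
print(chosen.learned_from)
```

In this sketch the customer-learned route is selected even though its AS path is longer, because AS path length is only consulted when local preferences are equal.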
83 In terms of engineering reliability in the inter-domain routing 84 environment it is commonly the case that a service provider may enter 85 into arrangements with two or more upstream transit providers, 86 passing routes to both providers, and receiving traffic from both 87 sources. If the path to one upstream fails the traffic will switch 88 to other links, and once the path is recovered, the traffic should 89 switch back. 91 In such situations of multiple upstream providers it is also 92 commonplace to place a relative preference on the providers, so that 93 one connection is regarded as a preferred, or "primary" connection, 94 and other connections are regarded as less preferred, or "backup" 95 connections. The intent is typically that the backup connections 96 will be used for traffic only for the duration of a failure in the 97 primary connection. 99 It is possible to express this primary / backup policy using local AS 100 path prepending, where the AS path is artificially lengthened towards 101 the backup providers, using additional instances of the local AS. 102 This is not a deterministic selection algorithm, as the selected 103 primary provider may in turn be using AS path prepending to its 104 backup upstream provider, and in certain cases the path through the 105 backup provider may still be selected as the shortest AS path length. 107 An alternative approach to routing policy specification uses BGP 108 communities [2]. In this case the provider publishes a set of 109 community values that allows the client to select the provider's 110 local preference. The client can use a community to mark a route as 111 "backup only" towards the backup provider, and "primary preferred" to 112 the primary provider. In this case the local preference overrides 113 the AS path length metric, so that if the route is marked "backup 114 only", the route will be selected only when there is no other source 115 of the route. 117 3. 
BGP Wedgies 119 The richness of local policy expression through the use of 120 communities, when coupled with the behavior of a distance vector 121 protocol like BGP, leads to the observation that certain 122 configurations have more than one "solution", or more than one stable 123 BGP state. An example of such a situation is indicated in Figure 1. 125 +----+ +----+ 126 |AS 3|----------------|AS 4| 127 +----+ peer peer +----+ 128 |provider |provider 129 | | 130 |customer | 131 +----+ | 132 |AS 2| | 133 +----+ | 134 |provider | 135 | | 136 |customer |customer 137 +-------+ +----------+ 138 backup| |primary 139 +----+ 140 |AS 1| 141 +----+ 143 Figure 1 145 In this case AS1 has marked its advertisement of prefixes to AS2 as 146 "backup only", and its advertisement of prefixes to AS4 as "primary". 147 AS3 will hear AS4's advertisement across the peering link, and pick 148 up AS1's prefixes with the path "AS4, AS1". AS3 will advertise this 149 to AS2. AS2 will hear two paths to AS1, the first is by the direct 150 connection to AS1, and the second is via the path "AS3, AS4, AS1". 151 AS2 will prefer the longer path as the directly connected routes are 152 marked "backup only", and AS2's local preference decision will prefer 153 the AS3 advertisement over the AS1 advertisement. 155 This is the intended outcome of AS1's policy settings, where no 156 traffic passes from AS2 to AS1, and AS2 reaches AS1 via a path that 157 transits AS3 and AS4. 159 This intended outcome is achieved as long as AS1 announces its routes 160 on the primary path, to AS4, before announcing its backup routes to 161 AS2. 163 If the AS1-AS4 path is broken, AS4 will withdraw its advertisement of 164 AS1's routes to AS3, which in turn will send a withdrawal to AS2. 165 AS2 will then select the backup path to AS1. AS2 will advertise 166 this path to AS3, and AS3 will advertise this path to AS4. Again, 167 this is part of the intended operation of the primary / backup policy 168 setting. 
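The Figure 1 behavior can be traced with a small simulation. This is a minimal sketch and not part of the draft. It assumes hypothetical local-preference values, an export rule under which routes learned from a peer or provider are passed only to customers, and a fixed processing order of AS2, AS3, AS4. The names exported, converge and link are illustrative only. The sketch runs through the intended state and the failover described above, and then the restoration of the primary link and the reset of the AS1-AS2 session, which are the cases the text turns to next.

```python
# Business relationships from Figure 1: REL[a][b] is what neighbour b is to a.
REL = {
    1: {2: "provider", 4: "provider"},
    2: {1: "customer", 3: "provider"},
    3: {2: "customer", 4: "peer"},
    4: {1: "customer", 3: "peer"},
}
# Hypothetical local-preference values (higher wins); not from the draft.
PREF = {"customer": 300, "peer": 200, "provider": 100}
BACKUP_PREF = 50          # assumed preference AS2 gives "backup only" routes
BACKUP_LINKS = {(2, 1)}   # (receiver, sender) pairs tagged "backup only"


def link(a, b):
    return frozenset((a, b))


def exported(best, sender, receiver):
    """Return the AS path `sender` advertises to `receiver`, or None."""
    if sender == 1:
        return (1,)                       # AS1 originates the prefix
    entry = best.get(sender)
    if entry is None:
        return None
    path, learned_from = entry
    # Assumed export rule: peer/provider routes go to customers only.
    if REL[sender][learned_from] != "customer" and REL[sender][receiver] != "customer":
        return None
    return path


def converge(best, links):
    """Re-run route selection at AS2, AS3, AS4 until nothing changes."""
    changed = True
    while changed:
        changed = False
        for asn in (2, 3, 4):
            candidates = []
            for nbr in REL[asn]:
                if link(asn, nbr) not in links:
                    continue
                path = exported(best, nbr, asn)
                if path is None or asn in path:   # nothing heard, or AS path loop
                    continue
                pref = BACKUP_PREF if (asn, nbr) in BACKUP_LINKS else PREF[REL[asn][nbr]]
                candidates.append((-pref, len(path), (asn,) + path, nbr))
            new = None
            if candidates:
                _, _, path, nbr = min(candidates)
                new = (path, nbr)
            if new != best.get(asn):
                best[asn] = new
                changed = True
    return {asn: (entry[0] if entry else None) for asn, entry in best.items()}


best = {}
links = {link(2, 3), link(3, 4), link(1, 4)}   # primary announced first
converge(best, links)
links.add(link(1, 2))                          # then the backup session
intended = converge(best, links)
links.discard(link(1, 4))                      # primary link fails
failover = converge(best, links)
links.add(link(1, 4))                          # primary link restored
wedged = converge(best, links)
links.discard(link(1, 2))                      # AS1 resets its session with AS2
converge(best, links)
links.add(link(1, 2))
recovered = converge(best, links)
print(intended, failover, wedged, recovered, sep="\n")
```

Each printed dictionary maps an AS number to its selected AS path towards AS1. Under these assumptions the state after restoration differs from the intended state although the topology and policies are identical, and the intended state returns only after the backup session is reset.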
170 When connectivity between AS4 and AS1 is restored the BGP state will 171 not revert to the original state. AS4 will learn the primary path to 172 AS1, and readvertise this to AS3 using the path "AS4, AS1". AS3, 173 using a default preference of preferring customer-advertised routes 174 over peer routes, will continue to prefer the "AS2, AS1" path. AS3 175 will not pass any updates to AS2. After the restoration of the 176 circuit, traffic from AS3 to AS1 and from AS2 to AS1 will be presented 177 to AS1 via the backup path, even though the primary path via AS4 is 178 in service. 180 The intended forwarding state can only be restored by AS1 181 deliberately bringing down its eBGP session with AS2, even though it 182 is carrying traffic. This will cause the BGP state to revert to the 183 intended configuration. 185 It is often the case that an AS will attempt to balance incoming 186 traffic across multiple providers, again using the primary / backup 187 mechanism. For some prefixes one link is configured as the primary 188 link, and the others as the backup links, while for other prefixes 189 another link is selected as the primary link. An example is shown in 190 Figure 2. 192 +----+ +----+ 193 |AS 3|----------------|AS 4| 194 +----+ peer peer +----+ 195 |provider |provider 196 | | 197 |customer |customer 198 +----+ +----+ 199 |AS 2| |AS 5| 200 +----+ +----+ 201 |provider |provider 202 | | 203 |customer |customer 204 +-------+ +----------+ 205 backup| |primary for 192.9.200.0/25 206 primary| |backup for 192.9.200.128/25 207 +----+ 208 |AS 1| 209 +----+ 211 Figure 2 213 The intended configuration has all incoming traffic for addresses in 214 the range 192.9.200.0/25 via the link from AS5, and all incoming 215 traffic for addresses in the range 192.9.200.128/25 from AS2. 217 In this case if the link between AS3 and AS4 is reset, AS3 will learn 218 both routes from AS2, and AS4 will learn both routes from AS5. 
As 219 these customer routes are preferred over peer routes, when the link 220 between AS3 and AS4 is restored, neither AS will alter its routing 221 behavior with respect to AS1's routes. This situation is now wedged, 222 in that there is no eBGP peering that can be reset that will flip BGP 223 back to the intended state. This is an instance of a BGP Wedgie. 225 The remedy is that AS1 has to withdraw the backup advertisements 226 on both paths and then operate for an interval without backup, and 227 then readvertise the prefixes. The length of the interval cannot be 228 readily determined in advance, as it has to be sufficiently long so 229 as to allow AS2 and AS5 to learn of an alternate path to AS1. At 230 this stage the backup routes can be readvertised. 232 4. Multi-Party BGP Wedgies 234 This situation can be more complex when three or more parties provide 235 upstream transit services to an AS. An example is indicated in 236 Figure 3. 238 +----+ +----+ 239 |AS 3|----------------|AS 4| 240 +----+ peer peer +----+ 241 ||provider |provider 242 |+-----------+ | 243 |customer |customer | 244 +----+ +----+ | 245 |AS 2|-------|AS 5| | 246 +----+ peer +----+ | 247 |provider |provider | 248 | | | 249 |customer +-+customer |customer 250 +-------+ |+----------+ 251 backup| ||primary 252 +----+ 253 |AS 1| 254 +----+ 256 Figure 3 258 In this example the intended state is that AS2 and AS5 are both 259 backup providers, and AS4 is the primary provider. When the link 260 between AS1 and AS4 breaks and is subsequently restored, AS3 will 261 continue to direct traffic to AS1 via AS2 or AS5. In this case a 262 single reset of the link between AS2 and AS1 will not restore the 263 original intended BGP state, as the BGP-selected best route to AS1 264 will switch to AS5, and AS2 and AS3 will learn a path to AS1 via AS5. 266 What AS1 is observing is incoming traffic on the backup link from 267 AS2. 
Resetting this connection will not restore traffic back to the 268 primary path, but instead will switch incoming traffic over to AS5. 269 The action required to correct the situation is to simultaneously 270 reset both the link to AS2, and also the link to AS5. This is not 271 necessarily an intuitive solution, as at any point in time only one 272 of these links will be carrying backup traffic, yet both BGP sessions 273 need to be brought down at the same time in order to commence 274 restoration of the intended primary and backup state. 276 5. BGP and Determinism 278 BGP does not behave deterministically in all cases, and, as a 279 consequence, there is intended and unintended non-determinism in BGP. 280 For example, the default final tie break in some implementations of 281 BGP is to prefer the longest-lived route. To achieve determinism in 282 this last step it would be necessary to use a comparison operator 283 that has a predictable outcome, such as a comparison of router 284 identifiers. This class of non-deterministic behavior is termed here 285 "intended" non-determinism, in that the policy interactions are, to 286 some extent, predictable by network administrators. 288 BGP is also able to generate outcomes that can be described as 289 "unintended non-determinism" that can result from unexpected policy 290 interactions. These outcomes do not represent misconfiguration in 291 the standard sense, since all policies may look completely rational 292 locally, but their interaction across multiple routing entities can 293 cause unintended outcomes, and BGP may reach a state that includes 294 such unintended outcomes in a non-deterministic manner. 296 Unintended non-determinism in BGP would not be as critical an issue 297 if all stable routings were guaranteed to be consistent with the 298 policy writer's intent. However, this is not always the case. 
The 299 above examples indicate that the operation of BGP allows multiple 300 stable states to exist from a single configuration state, where some 301 of these states are not consistent with the policy writer's intent. 302 These particular examples can be described as a form of "route 303 pinning", where the route is pinned to a non-preferred path. 305 The challenge for the network administrator is to ensure that an 306 intended state is maintained. Under certain circumstances this can 307 only be achieved by deliberate service disruption, involving the 308 withdrawal of routes being used to forward traffic, and 309 re-advertising routes in a certain sequence in order to induce an 310 intended BGP state. However, the knowledge that is required by any 311 single network administrator in order to understand the 312 reason why BGP has stabilized to an unintended state requires BGP 313 policy configuration knowledge of remote networks. In effect there 314 is insufficient local information for any single network 315 administrator to correctly identify the root cause of the unintended 316 BGP state, nor is there sufficient information to allow any single 317 network administrator to undertake a sequence of steps to rectify the 318 situation back to the intended routing state. 320 It is reasonable to anticipate that the density of interconnection 321 will increase, and also that the capability for policy-based preference 322 setting of learned and re-advertised routes will become more 323 expressive. It is therefore reasonable to anticipate that the 324 incidence of unintended BGP states will increase, and the ability to 325 understand the necessary sequence of route withdrawals and 326 re-advertisements will become more challenging to determine in 327 advance. 329 Whether this could lead to the BGP routing system reaching a point where 330 each network consistently cannot direct traffic in a deterministic 331 manner is at this stage a matter of speculation. 
BGP Wedgies are an 332 illustration that a sufficiently complex interconnection topology, 333 coupled with a sufficiently expressive set of policy constructs, can 334 lead to a number of stable BGP states, rather than a single intended 335 state. As the topology complexity increases it is not possible to 336 deterministically predict which state the BGP routing system may 337 converge to. Paradoxically, the demands of inter-domain traffic 338 engineering appear to require both greater levels of expressive 339 capability in policy-based routing directives and operation across 340 denser interconnectivity topologies in a deterministic manner. This 341 may not be a sustainable outcome in BGP-based routing systems. 343 6. Security Considerations 345 BGP is a relaying protocol, where route information is received, 346 processed and forwarded. BGP contains no specific mechanisms to 347 prevent the unauthorized modification of the information by a 348 forwarding agent, allowing routing information to be modified, 349 deleted or false information to be inserted without the knowledge of 350 the originator of the routing information or any of the recipients. 352 The memo proposes no modifications to the BGP protocol, nor does it 353 propose any changes to the manner of deployment of BGP, and therefore 354 introduces no new factors in terms of the security and integrity of 355 inter-domain routing. 357 The memo illustrates that in attempting to create policy-based 358 outcomes relating to path selection for incoming traffic it is 359 possible to generate BGP configurations where there are multiple 360 stable outcomes, rather than a single outcome. Furthermore, of these 361 instances of multiple outcomes, there are cases where the BGP 362 selection of a particular outcome is not a deterministic selection. 364 7. References 366 7.1 Normative References 368 [1] Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 (BGP-4)", 369 RFC 1771, March 1995. 
371 7.2 Informative References 373 [2] Chandrasekeran, R., Traina, P. and T. Li, "BGP Communities 374 Attribute", RFC 1997, August 1996. 376 Authors' Addresses 378 Tim Griffin 379 Intel Research Cambridge 381 EMail: Tim.Griffin@intel.com 383 Geoff Huston 384 Asia Pacific Network Information Centre 386 EMail: gih@apnic.net 388 Intellectual Property Statement 390 The IETF takes no position regarding the validity or scope of any 391 Intellectual Property Rights or other rights that might be claimed to 392 pertain to the implementation or use of the technology described in 393 this document or the extent to which any license under such rights 394 might or might not be available; nor does it represent that it has 395 made any independent effort to identify any such rights. Information 396 on the procedures with respect to rights in RFC documents can be 397 found in BCP 78 and BCP 79. 399 Copies of IPR disclosures made to the IETF Secretariat and any 400 assurances of licenses to be made available, or the result of an 401 attempt made to obtain a general license or permission for the use of 402 such proprietary rights by implementers or users of this 403 specification can be obtained from the IETF on-line IPR repository at 404 http://www.ietf.org/ipr. 406 The IETF invites any interested party to bring to its attention any 407 copyrights, patents or patent applications, or other proprietary 408 rights that may cover technology that may be required to implement 409 this standard. Please address the information to the IETF at 410 ietf-ipr@ietf.org. 
412 Disclaimer of Validity 414 This document and the information contained herein are provided on an 415 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 416 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 417 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 418 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 419 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 420 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 422 Copyright Statement 424 Copyright (C) The Internet Society (2004). This document is subject 425 to the rights, licenses and restrictions contained in BCP 78, and 426 except as set forth therein, the authors retain all their rights. 428 Acknowledgment 430 Funding for the RFC Editor function is currently provided by the 431 Internet Society.