GROW                                                          T. Griffin
Internet-Draft                                   University of Cambridge
Expires: September 27, 2004                                    G. Huston
                                                                   APNIC
                                                          March 29, 2004


                              BGP Wedgies
                   draft-ietf-grow-bgp-wedgies-01.txt

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of section 3 of RFC 3667.  By submitting this Internet-Draft, each
   author represents that any applicable patent or other IPR claims of
   which he or she is aware have been or will be disclosed, and any of
   which he or she becomes aware will be disclosed, in accordance with
   RFC 3668.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 27, 2004.

Copyright Notice

   Copyright (C) The Internet Society (2004).

Abstract

   It has commonly been assumed that the Border Gateway Protocol (BGP)
   is a tool for distributing reachability information in a way that
   creates forwarding paths deterministically.  In this memo we
   describe a class of BGP configurations for which there is more than
   one potential outcome: forwarding states other than the intended
   state are equally stable, and the stable state to which BGP
   converges may be selected in a non-deterministic manner.
   These stable, but unintended, BGP states are termed here "BGP
   Wedgies".

1. Introduction

   It has commonly been assumed that the Border Gateway Protocol (BGP)
   [RFC1771] is a tool for distributing reachability information in a
   way that creates forwarding paths deterministically.  This is a
   'problem statement' memo that describes a class of BGP
   configurations for which there is more than one stable forwarding
   state.  In this class of configurations, forwarding states other
   than the intended state are equally stable, and the stable state to
   which BGP converges may be selected in a non-deterministic manner.

   These stable, but unintended, BGP states are termed here "BGP
   Wedgies".

2. Describing BGP Routing Policy

   BGP routing policies generally reflect each network administrator's
   objective to optimize their position with respect to their network's
   cost, performance and reliability.

   With respect to cost optimization, the local network's default
   routing policy often reflects a local preference for routes learned
   from a customer over routes learned from some form of peering
   exchange.  In the same vein, the local network is often configured
   to prefer routes learned from a peer or a customer over those
   learned from a directly connected upstream transit provider.  These
   preferences may be expressed via a local preference configuration
   setting, where the local preference overrides the AS path length
   metric of the base BGP operation.

   In terms of engineering reliability in the inter-domain routing
   environment, it is common for a service provider to enter into
   arrangements with two or more upstream transit providers, passing
   routes to both providers and receiving traffic from both.  If the
   path to one upstream fails, traffic will switch to the other links,
   and once the path is recovered, traffic should switch back.
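
   The cost-based ranking described above, in which local preference
   is compared before AS path length, can be illustrated with a minimal
   Python sketch.  It is not the full BGP decision process, which has
   further tie-breaking steps; the LOCAL_PREF values, the Route type
   and the select_best() function are hypothetical, and the prefix
   (192.0.2.0/24) and AS numbers (64500-64502) are taken from the
   documentation ranges.

```python
from dataclasses import dataclass

# Hypothetical local-preference values encoding the default policy
# described above: customer routes > peer routes > transit provider routes.
LOCAL_PREF = {"customer": 300, "peer": 200, "provider": 100}


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple[int, ...]
    learned_from: str  # "customer", "peer" or "provider"


def preference_key(route: Route) -> tuple[int, int]:
    # Local preference is compared first; a shorter AS path only breaks ties.
    return (LOCAL_PREF[route.learned_from], -len(route.as_path))


def select_best(routes: list[Route]) -> Route:
    return max(routes, key=preference_key)


if __name__ == "__main__":
    via_provider = Route("192.0.2.0/24", (64500,), "provider")
    via_customer = Route("192.0.2.0/24", (64501, 64502, 64500), "customer")
    best = select_best([via_provider, via_customer])
    print(best.learned_from, best.as_path)  # customer (64501, 64502, 64500)
```

   Here the three-hop customer route wins over the one-hop transit
   route because its local preference is higher; AS path length is
   consulted only between routes of equal local preference.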

   In such situations of multiple upstream providers, it is also
   commonplace to place a relative preference on the providers, so that
   one connection is regarded as the preferred, or "primary",
   connection, and other connections are regarded as less preferred, or
   "backup", connections.  The intent is typically that the backup
   connections will be used for traffic only for the duration of a
   failure in the primary connection.

   It is possible to express this primary / backup policy using local
   AS path prepending, where the AS path is artificially lengthened
   towards the backup providers using additional instances of the local
   AS.  This is not a deterministic selection algorithm, as the
   selected primary provider may in turn be using AS path prepending
   towards its own backup upstream provider, and in certain cases the
   path through the backup provider may still have the shortest AS path
   length and so be selected.

   An alternative approach to routing policy specification uses BGP
   communities [RFC1997].  In this case the provider publishes a set of
   community values that allows the client to select the provider's
   local preference setting.  The client can use a community to mark a
   route as "backup only" towards the backup provider, and "primary
   preferred" towards the primary provider, assuming both providers
   support community values with such semantics.  In this case the
   local preference overrides the AS path length metric, so that if the
   route is marked "backup only", the provider will select it only when
   there is no other source of the route.

3. BGP Wedgies

   The richness of local policy expression through the use of
   communities, when coupled with the behavior of a path vector
   protocol such as BGP, leads to the observation that certain
   configurations have more than one "solution", or more than one
   stable BGP state.  An example of such a situation is indicated in
   Figure 1.
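
   The claim that one configuration can have more than one stable state
   can be checked by brute force on the topology shown in Figure 1
   (AS1 is a customer of AS2 and of AS4, AS2 is a customer of AS3, and
   AS3 and AS4 are peers).  The Python sketch below enumerates every
   assignment of loop-free paths to AS2, AS3 and AS4 and keeps those in
   which each AS is already using the best route its neighbors offer.
   The encoding is an assumption made for illustration only: REL,
   LOCAL_PREF, BACKUP_PREF and BACKUP_SESSIONS are hypothetical names,
   the preference values are arbitrary, the "backup only" community is
   modeled as AS2 giving AS1's announcement the lowest local
   preference, ties are broken on AS path length, and routes are
   exported using the common convention that customer-learned routes go
   to all neighbors while peer- and provider-learned routes go only to
   customers.

```python
from itertools import product

ORIGIN = 1  # AS1 originates the prefixes

# Hypothetical encoding of Figure 1: the relationship of each neighbor,
# as seen from the AS used as the outer key.
REL = {
    1: {2: "provider", 4: "provider"},
    2: {1: "customer", 3: "provider"},
    3: {2: "customer", 4: "peer"},
    4: {1: "customer", 3: "peer"},
}

# Assumed local-preference values: customer > peer > provider.  AS1's
# "backup only" announcement to AS2 is ranked below everything else.
LOCAL_PREF = {"customer": 300, "peer": 200, "provider": 100}
BACKUP_PREF = 50
BACKUP_SESSIONS = {(2, 1)}  # (receiving AS, sending AS)


def exports(sender, receiver, path):
    """Assumed export rule: customer-learned routes go to every neighbor,
    peer- and provider-learned routes only to customers."""
    if sender == ORIGIN:
        return True
    learned_from = REL[sender][path[1]]
    return learned_from == "customer" or REL[sender][receiver] == "customer"


def pref(node, neighbor):
    if (node, neighbor) in BACKUP_SESSIONS:
        return BACKUP_PREF
    return LOCAL_PREF[REL[node][neighbor]]


def best_route(node, state):
    """Best loop-free path on offer to `node`, given its neighbors' choices."""
    offers = []
    for nbr in REL[node]:
        path = (ORIGIN,) if nbr == ORIGIN else state[nbr]
        if path is None or node in path or not exports(nbr, node, path):
            continue
        # Local preference first; a shorter AS path breaks ties.
        offers.append(((pref(node, nbr), -len(path)), (node,) + path))
    return max(offers)[1] if offers else None


def simple_paths(node, visited=()):
    """All loop-free AS paths from `node` to the origin."""
    if node == ORIGIN:
        return [(ORIGIN,)]
    visited = visited + (node,)
    return [(node,) + tail
            for nbr in REL[node] if nbr not in visited
            for tail in simple_paths(nbr, visited)]


def stable_states():
    """Yield every path assignment in which no AS wants to change its route."""
    ases = [a for a in REL if a != ORIGIN]
    choices = [simple_paths(a) + [None] for a in ases]
    for combo in product(*choices):
        state = dict(zip(ases, combo))
        if all(best_route(a, state) == state[a] for a in ases):
            yield state


if __name__ == "__main__":
    for state in stable_states():
        print(state)
    # {2: (2, 1), 3: (3, 2, 1), 4: (4, 1)}        <- wedged
    # {2: (2, 3, 4, 1), 3: (3, 4, 1), 4: (4, 1)}  <- intended
```

   Both printed states are stable under the same configuration: in the
   first, AS2 and AS3 use the backup path "AS2, AS1"; in the second,
   AS2 reaches AS1 via AS3 and AS4.  Nothing in the configuration
   itself selects between them; which one BGP settles in depends on the
   order of events.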

      +----+                +----+
      |AS 3|----------------|AS 4|
      +----+ peer      peer +----+
        |provider             |provider
        |                     |
        |customer             |
      +----+                  |
      |AS 2|                  |
      +----+                  |
        |provider             |
        |                     |
        |customer             |customer
        +-------+  +----------+
          backup|  |primary
               +----+
               |AS 1|
               +----+

                            Figure 1

   In this case AS1 has marked its advertisement of prefixes to AS2 as
   "backup only", and its advertisement of prefixes to AS4 as
   "primary".  AS3 will hear AS4's advertisement across the peering
   link, and pick up AS1's prefixes with the path "AS4, AS1".  AS3 will
   advertise this to AS2.  AS2 will hear two paths to AS1: the first is
   via the direct connection to AS1, and the second is via the path
   "AS3, AS4, AS1".  AS2 will prefer the longer path, as the directly
   connected routes are marked "backup only", and AS2's local
   preference decision will prefer the AS3 advertisement over the AS1
   advertisement.

   This is the intended outcome of AS1's policy settings, in which no
   traffic passes directly from AS2 to AS1, and AS2 reaches AS1 via a
   path that transits AS3 and AS4.

   This intended outcome is achieved as long as AS1 announces its
   routes on the primary path, to AS4, before announcing its backup
   routes to AS2.

   If the AS1 - AS4 path is broken, causing a BGP session failure
   between AS1 and AS4, then AS4 will withdraw its advertisement of
   AS1's routes to AS3, which in turn will send a withdrawal to AS2.
   AS2 will then select the backup path to AS1.  AS2 will advertise
   this path to AS3, and AS3 will advertise this path to AS4.  Again,
   this is part of the intended operation of the primary / backup
   policy setting.

   When connectivity between AS4 and AS1 is restored, the BGP state
   will not revert to the original state.  AS4 will learn the primary
   path to AS1, and readvertise this to AS3 using the path "AS4, AS1".
   AS3, using a default policy of preferring customer-advertised routes
   over peer routes, will continue to prefer the "AS2, AS1" path.  AS3
   will not pass any updates to AS2.  After the restoration of the
   circuit, traffic from AS3 to AS1 and from AS2 to AS1 will be
   presented to AS1 via the backup path, even though the primary path
   via AS4 is in service.

   The intended forwarding state can only be restored by AS1
   deliberately bringing down its eBGP session with AS2, even though
   that session is carrying traffic.  This will cause the BGP state to
   revert to the intended configuration.

   It is often the case that an AS will attempt to balance incoming
   traffic across multiple providers, again using the primary / backup
   mechanism.  For some prefixes one link is configured as the primary
   link and the others as backup links, while for other prefixes
   another link is selected as the primary link.  An example is shown
   in Figure 2.

      +----+                +----+
      |AS 3|----------------|AS 4|
      +----+ peer      peer +----+
        |provider             |provider
        |                     |
        |customer             |customer
      +----+                +----+
      |AS 2|                |AS 5|
      +----+                +----+
        |provider             |provider
        |                     |
        |customer             |customer
        +-------+  +----------+
          backup|  |primary for 192.9.200.0/25
         primary|  |backup for 192.9.200.128/25
               +----+
               |AS 1|
               +----+

                            Figure 2

   The intended configuration has all incoming traffic for addresses in
   the range 192.9.200.0/25 arriving via the link from AS5, and all
   incoming traffic for addresses in the range 192.9.200.128/25
   arriving via the link from AS2.

   In this case, if the link between AS3 and AS4 is reset, AS3 will
   learn both routes from AS2, and AS4 will learn both routes from AS5.
   As these customer routes are preferred over peer routes, when the
   link between AS3 and AS4 is restored, neither AS will alter its
   routing behavior with respect to AS1's routes.
   This situation is now wedged, in that there is no single eBGP
   session that can be reset to flip BGP back to the intended state.
   This is an instance of a BGP Wedgie.

   To restore the intended state, AS1 has to withdraw the backup
   advertisements on both paths, operate for an interval without
   backup, and then readvertise the backup prefix advertisements.  The
   length of the interval cannot be readily determined in advance, as
   it has to be sufficiently long to allow AS2 and AS5 to learn of an
   alternate path to AS1.  At this stage the backup routes can be
   readvertised.

4. Multi-Party BGP Wedgies

   This situation can be more complex when three or more parties
   provide upstream transit services to an AS.  An example is indicated
   in Figure 3.

      +----+                +----+
      |AS 3|----------------|AS 4|
      +----+ peer      peer +----+
        ||provider            |provider
        |+----------+         |
        |customer   |customer |
      +----+      +----+      |
      |AS 2|------|AS 5|      |
      +----+ peer +----+      |
        |provider   |provider |
        |           |         |
        |customer +-+customer |customer
        +-------+ |+----------+
          backup| ||primary
               +----+
               |AS 1|
               +----+

                            Figure 3

   In this example the intended state is that AS2 and AS5 are both
   backup providers, and AS4 is the primary provider.  When the link
   between AS1 and AS4 breaks and is subsequently restored, AS3 will
   continue to direct traffic to AS1 via AS2 or AS5.  In this case a
   single reset of the link between AS2 and AS1 will not restore the
   original intended BGP state, as the BGP-selected best route to AS1
   will switch to AS5, and AS2 and AS3 will learn a path to AS1 via
   AS5.

   AS1 observes incoming traffic on the backup link from AS2.
   Resetting this connection will not restore traffic to the primary
   path, but instead will switch incoming traffic over to AS5.
   The action required to correct the situation is to reset both the
   link to AS2 and the link to AS5 simultaneously.  This is not
   necessarily an intuitive solution, as at any point in time only one
   of these links will be carrying backup traffic, yet both BGP
   sessions need to be brought down at the same time in order to
   commence restoration of the intended primary and backup state.

5. BGP and Determinism

   BGP does not behave deterministically in all cases, and there is
   both intended and unintended non-determinism in BGP.  For example,
   the default final tie-break in some implementations of BGP is to
   prefer the longest-lived route.  To achieve determinism in this last
   step it would be necessary to use a comparison operator that has a
   predictable outcome, such as a comparison of router identifiers.
   This class of non-deterministic behavior is termed here "intended"
   non-determinism, in that the policy interactions are, to some
   extent, predictable by network administrators.

   BGP is also able to generate outcomes that can be described as
   "unintended non-determinism", which can result from unexpected
   policy interactions.  These outcomes do not represent
   misconfiguration in the standard sense, since all policies may look
   completely rational locally, but their interaction across multiple
   routing entities can cause unintended outcomes, and BGP may reach a
   state that includes such unintended outcomes in a non-deterministic
   manner.

   Unintended non-determinism in BGP would not be as critical an issue
   if all stable routings were guaranteed to be consistent with the
   policy writer's intent.  However, this is not always the case.  The
   above examples indicate that the operation of BGP allows multiple
   stable states to exist from a single configuration state, where some
   of these states are not consistent with the policy writer's intent.
   These particular examples can be described as a form of "route
   pinning", where the route is pinned to a non-preferred path.

   The challenge for the network administrator is to ensure that an
   intended state is maintained.  Under certain circumstances this can
   only be achieved by deliberate service disruption, involving the
   withdrawal of routes being used to forward traffic, and
   re-advertising routes in a certain sequence in order to induce an
   intended BGP state.  However, understanding why BGP has stabilized
   to an unintended state requires knowledge of the BGP policy
   configuration of remote networks.  In effect, there is insufficient
   local information for any single network administrator to correctly
   identify the root cause of the unintended BGP state, nor is there
   sufficient information to allow any single network administrator to
   undertake a sequence of steps to restore the intended routing state.

   It is reasonable to anticipate that the density of interconnection
   will increase, and that the capability for policy-based preference
   setting of learned and re-advertised routes will become more
   expressive.  It is therefore reasonable to anticipate that the
   incidence of unintended BGP states will increase, and that the
   necessary sequence of route withdrawals and re-advertisements will
   become more difficult to determine in advance.

   Whether this could lead to the BGP routing system reaching a point
   where networks consistently cannot direct traffic in a deterministic
   manner is, at this stage, a matter of speculation.
   BGP Wedgies illustrate that a sufficiently complex interconnection
   topology, coupled with a sufficiently expressive set of policy
   constructs, can lead to a number of stable BGP states, rather than a
   single intended state.  As the topology complexity increases, it is
   not possible to deterministically predict which state the BGP
   routing system may converge to.  Paradoxically, the demands of
   inter-domain traffic engineering appear to require both greater
   expressive capability in policy-based routing directives and
   deterministic operation across denser interconnection topologies.
   This may not be a sustainable outcome in BGP-based routing systems.

6. Security Considerations

   BGP is a relaying protocol, where route information is received,
   processed and forwarded.  BGP contains no specific mechanisms to
   prevent the unauthorized modification of the information by a
   forwarding agent, allowing routing information to be modified or
   deleted, or false information to be inserted, without the knowledge
   of the originator of the routing information or any of the
   recipients.

   This memo proposes no modifications to the BGP protocol, nor does it
   propose any changes to the manner of deployment of BGP, and
   therefore introduces no new factors in terms of the security and
   integrity of inter-domain routing.

   The memo illustrates that in attempting to create policy-based
   outcomes relating to path selection for incoming traffic, it is
   possible to generate BGP configurations where there are multiple
   stable outcomes, rather than a single outcome.  Furthermore, among
   these instances of multiple outcomes, there are cases where BGP's
   selection of a particular outcome is not deterministic.

7. References

7.1 Normative References

   [RFC1771]  Rekhter, Y. and T. Li, "A Border Gateway Protocol 4
              (BGP-4)", RFC 1771, March 1995.

7.2 Informative References

   [RFC1997]  Chandrasekeran, R., Traina, P. and T. Li, "BGP Communities
              Attribute", RFC 1997, August 1996.

Authors' Addresses

   Tim Griffin
   University of Cambridge

   EMail: Timothy.Griffin@cl.cam.ac.uk


   Geoff Huston
   Asia Pacific Network Information Centre

   EMail: gih@apnic.net

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2004).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.