idnits 2.17.1 draft-ietf-idr-bgp-analysis-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** There are 19 instances of too long lines in the document, the longest one being 4 characters in excess of 72. ** The abstract seems to contain references ([RFC1264], [RFC1774]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 370 has weird spacing: '...extreme insta...' == Line 489 has weird spacing: '... AS4 is a mul...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (April 2003) is 7653 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC1774' is mentioned on line 45, but not defined == Outdated reference: A later version (-26) exists of draft-ietf-idr-bgp4-19 -- Obsolete informational reference (is this intentional?): RFC 1105 (Obsoleted by RFC 1163) -- Duplicate reference: RFC1105, mentioned in 'RFC1163', was also mentioned in 'RFC1105'. -- Obsolete informational reference (is this intentional?): RFC 1105 (ref. 'RFC1163') (Obsoleted by RFC 1163) -- Obsolete informational reference (is this intentional?): RFC 1264 (Obsoleted by RFC 4794) -- Duplicate reference: RFC1105, mentioned in 'RFC1267', was also mentioned in 'RFC1163'. -- Obsolete informational reference (is this intentional?): RFC 1105 (ref. 'RFC1267') (Obsoleted by RFC 1163) -- Obsolete informational reference (is this intentional?): RFC 1519 (Obsoleted by RFC 4632) -- Obsolete informational reference (is this intentional?): RFC 1771 (Obsoleted by RFC 4271) -- Obsolete informational reference (is this intentional?): RFC 2460 (Obsoleted by RFC 8200) -- Obsolete informational reference (is this intentional?): RFC 2842 (Obsoleted by RFC 3392) Summary: 5 errors (**), 0 flaws (~~), 6 warnings (==), 13 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 INTERNET-DRAFT David Meyer 2 draft-ietf-idr-bgp-analysis-02.txt Keyur Patel 3 Category Informational 4 Expires: October 2003 April 2003 6 BGP-4 Protocol Analysis 7 9 Status of this Document 11 This document is an Internet-Draft and is in full conformance with 12 all provisions of Section 10 of RFC2026. 14 Internet-Drafts are working documents of the Internet Engineering 15 Task Force (IETF), its areas, and its working groups. Note that 16 other groups may also distribute working documents as Internet- 17 Drafts. 19 Internet-Drafts are draft documents valid for a maximum of six months 20 and may be updated, replaced, or obsoleted by other documents at any 21 time. It is inappropriate to use Internet-Drafts as reference 22 material or to cite them other than as "work in progress." 24 The list of current Internet-Drafts can be accessed at 25 http://www.ietf.org/ietf/1id-abstracts.txt 27 The list of Internet-Draft Shadow Directories can be accessed at 28 http://www.ietf.org/shadow.html. 30 This document is a product of an individual. Comments are solicited 31 and should be addressed to the author(s). 33 Copyright Notice 35 Copyright (C) The Internet Society (2003). All Rights Reserved. 37 Abstract 39 The purpose of this report is to document how the requirements for 40 advancing a routing protocol from Draft Standard to full Standard 41 have been satisfied by Border Gateway Protocol version 4 (BGP-4). 43 This report satisfies the requirement for "the second report", as 44 described in Section 6.0 of RFC 1264 [RFC1264]. In order to fulfill 45 the requirement, this report augments RFC 1774 [RFC1774] and 46 summarizes the key features of BGP protocol, and analyzes the 47 protocol with respect to scaling and performance. 49 Table of Contents 51 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 52 2. Key Features and algorithms of the BGP protocol. . . . . . . . 4 53 2.1. Key Features. . . . . . . . . . . . . . . . . . . . . . . . 4 54 2.2. BGP Algorithms. . . . . . . . . . . . . . . . . . . . . . . 5 55 2.3. BGP Finite State Machine (FSM). . . . . . . . . . . . . . . 6 56 3. BGP Capabilities . . . . . . . . . . . . . . . . . . . . . . . 7 57 4. BGP Persistent Peer Oscillations . . . . . . . . . . . . . . . 8 58 5. BGP Performance characteristics and Scalability. . . . . . . . 8 59 5.1. Link bandwidth and CPU utilization. . . . . . . . . . . . . 8 60 5.1.1. CPU utilization. . . . . . . . . . . . . . . . . . . . . 9 61 5.1.2. Memory requirements. . . . . . . . . . . . . . . . . . . 11 62 6. BGP Policy Expressiveness and its Implications . . . . . . . . 12 63 6.1. Existence of Unique Stable Routings . . . . . . . . . . . . 13 64 6.2. Existence of Stable Routings. . . . . . . . . . . . . . . . 14 65 7. Applicability. . . . . . . . . . . . . . . . . . . . . . . . . 15 66 8. Acknowledgments. . . . . . . . . . . . . . . . . . . . . . . . 16 67 9. Informative References . . . . . . . . . . . . . . . . . . . . 17 68 10. Author's Addresses. . . . . . . . . . . . . . . . . . . . . . 18 69 11. Full Copyright Statement. . . . . . . . . . . . . . . . . . . 18 71 1. Introduction 73 BGP-4 is an inter-autonomous system routing protocol designed for 74 TCP/IP internets. Version 1 of the BGP protocol was published in RFC 75 1105 [RFC1105]. Since then BGP versions 2, 3, and 4 have been 76 developed. Version 2 was documented in RFC 1163 [RFC1163]. Version 3 77 is documented in RFC 1267 [RFC1267]. Version 4 is documented in the 78 [BGP4] (version 4 of BGP will hereafter be referred to as BGP). The 79 changes between versions are explained in Appendix A of [BGP4]. 80 Possible applications of BGP in the Internet are documented in RFC 81 1772 [RFC1772]. 83 2. Key Features and algorithms of the BGP protocol 85 This section summarizes the key features and algorithms of the BGP 86 protocol. BGP is an inter-autonomous system routing protocol; it is 87 designed to be used between multiple autonomous systems. BGP assumes 88 that routing within an autonomous system is done by an intra- 89 autonomous system routing protocol. BGP does not make any assumptions 90 about intra-autonomous system routing protocols deployed within the 91 various autonomous systems. Specifically, BGP does not require all 92 autonomous systems to run the same intra-autonomous system routing 93 protocol (i.e., interior gateway protocol or IGP). 95 Finally, note that BGP is a real inter-autonomous system routing 96 protocol, and as such it imposes no constraints on the underlying 97 Internet topology. The information exchanged via BGP is sufficient to 98 construct a graph of autonomous systems connectivity from which 99 routing loops may be pruned and many routing policy decisions at the 100 autonomous system level may be enforced. 102 2.1. Key Features 104 The key features of the protocol are the notion of path attributes 105 and aggregation of network layer reachability information (NLRI). 106 Path attributes provide BGP with flexibility and extensibility. Path 107 attributes are partitioned into well-known and optional. The 108 provision for optional attributes allows experimentation that may 109 involve a group of BGP routers without affecting the rest of the 110 Internet. New optional attributes can be added to the protocol in 111 much the same way that new options are added to, say, the Telnet 112 protocol [RFC854]. 114 One of the most important path attributes is the Autonomous System 115 Path, or AS_PATH. AS reachability information traverses the Internet, 116 this information is augmented by the list of autonomous systems that 117 have been traversed thus far, forming the AS_PATH. The AS_PATH allows 118 straightforward suppression of the looping of routing information. In 119 addition, the AS_PATH serves as a powerful and versatile mechanism 120 for policy-based routing. 122 BGP enhances the AS_PATH attribute to include sets of autonomous 123 systems as well as lists via the AS_SET attribute. This extended 124 format allows generated aggregate routes to carry path information 125 from the more specific routes used to generate the aggregate. It 126 should be noted however, that as of this writing, AS_SETs are rarely 127 used in the Internet [ROUTEVIEWS]. 129 2.2. BGP Algorithms 131 BGP uses an algorithm that is neither a pure distance vector 132 algorithm or a pure link state algorithm. It is instead a modified 133 distance vector algorithm that uses path information to avoid 134 traditional distance vector problems. Each route within BGP pairs 135 destination with path information to that destination. Path 136 information (also known as AS_PATH information) is stored within the 137 AS_PATH attribute in BGP. This allows BGP to reconstruct large 138 portions of overall topology whenever required. 140 BGP uses an incremental update strategy in order to conserve 141 bandwidth and processing power. That is, after initial exchange of 142 complete routing information, a pair of BGP routers exchanges only 143 changes to that information. Such an incremental update design 144 requires reliable transport between a pair of BGP routers to function 145 correctly. BGP solves this problem by using TCP for reliable 146 transport. 148 In addition to incremental updates, BGP has added the concept of 149 route aggregation so that information about groups of networks may be 150 aggregated and sent as a single Network Layer Reachability (NLRI). 152 Finally, note that BGP is a self-contained protocol. That is, BGP 153 specifies how routing information is exchanged both between BGP 154 speakers in different autonomous systems, and between BGP speakers 155 within a single autonomous system. 157 2.3. BGP Finite State Machine (FSM) 159 The BGP FSM is a set of rules that are applied to a BGP speaker's set 160 of configured peers for the BGP operation. A BGP implementation 161 requires that a BGP speaker must connect to and listen on TCP port 162 179 for accepting any new BGP connections from its peers. The BGP 163 Finite State Machine, or FSM, must be initiated and maintained for 164 each new incoming and outgoing peer connections. However, in steady 165 state operation, there will be only one BGP FSM per connection per 166 peer. 168 There may exist a temporary period where in a BGP peer may have 169 separate incoming and outgoing connections resulting into two 170 different BGP FSMs for a peer (instead of one). This can be resolved 171 following BGP connection collision rules defined in the [BGP4]. 173 Following are different states of BGP FSM for its peers: 175 IDLE: State when BGP peer refuses any incoming 176 connections. 178 CONNECT: State in which BGP peer is waiting for 179 its TCP connection to be completed. 181 ACTIVE: State in which BGP peer is trying to acquire a 182 peer by listening and accepting TCP connection. 184 OPENSENT: BGP peer is waiting for OPEN message from its 185 peer. 187 OPENCONFIRM: BGP peer is waiting for KEEPALIVE or NOTIFICATION 188 message from its peer. 190 ESTABLISHED: BGP peer connection is established and exchanges 191 UPDATE, NOTIFICATION, and KEEPALIVE messages with 192 its peer. 194 There are different BGP events that operate on above mentioned states 195 of BGP FSM for its peers. These BGP events are used for initiating and 196 terminating peer connections. They also assist BGP in identifying any 197 persistent peer connection oscillations and provide a mechanism 198 for controlling them. 200 Following are different BGP events: 202 Manual Start: Manually start the peer connection. 204 Manual Stop: Manually stop the peer connection. 206 Automatic Start: Local system automatically starts the peer 207 connection. 209 Manual start with 210 passive TCP flag: Local system administrator manually starts the 211 peer connection with peer in passive mode. 213 Automatic start 214 with passive TCP flag: Local system administrator automatically starts 215 the peer connection with peer in passive mode. 217 Automatic start 218 with bgp_stop_flap 219 option set: Local system administrator automatically starts 220 the peer connection with peer oscillation 221 damping enabled. 223 Automatic start with 224 bgp_stop_flap option 225 set and passive TCP 226 establishment 227 option set: Local system administrator automatically starts 228 the peer connection with peer oscillation 229 damping enabled and with peer in passive mode. 231 Automatic stop: Local system automatically stops the 232 BGP connection. 234 Both, Manual Start and Manual Stop are mandatory BGP events. All 235 other events are optional. 237 3. BGP Capabilities 239 The BGP Capability mechanism [RFC2842] provides an easy and flexible 240 way to introduce new features within the protocol. In particular, the 241 BGP capability mechanism allows peers to negotiate various optional 242 features during startup. This allows the base BGP protocol to contain 243 only essential functionality, while at the same time providing a 244 flexible mechanism for signaling protocol extensions. 246 4. BGP Persistent Peer Oscillations 248 Ideally, whenever a BGP speaker detects an error in any peer 249 connection, it shuts down the peer and changes its FSM state to IDLE. 250 BGP speaker requires a Start event to re-initiate its idle peer 251 connection. If the error remains persistent and BGP speaker generates 252 Start event automatically then it may result in persistent peer 253 flapping. However, although peer oscillation is found to be wide- 254 spread in BGP implementations, methods for preventing persistent peer 255 oscillations are outside the scope of base BGP protocol 256 specification. 258 5. BGP Performance characteristics and Scalability 260 In this section, we provide "order of magnitude" answers to the 261 questions of how much link bandwidth, router memory and router CPU 262 cycles the BGP protocol will consume under normal conditions. In 263 particular, we will address the scalability of BGP and its 264 limitations. 266 It is important to note that BGP does not require all the routers 267 within an autonomous system to participate in the BGP protocol. In 268 particular, only the border routers that provide connectivity between 269 the local autonomous system and their adjacent autonomous systems 270 need participate in BGP. The ability to constrain the set of BGP 271 speakers is one way to address scaling issues. 273 5.1. Link bandwidth and CPU utilization 275 Immediately after the initial BGP connection setup, BGP peers 276 exchange complete set of routing information. If we denote the total 277 number of routes in the Internet by N, the mean AS distance of the 278 Internet by M (distance at the level of an autonomous system, 279 expressed in terms of the number of autonomous systems), the total 280 number of autonomous systems in the Internet by A, and assume that 281 the networks are uniformly distributed among the autonomous systems, 282 then the worst case amount of bandwidth consumed during the initial 283 exchange between a pair of BGP speakers is 285 MR = O(N + M * A) 287 The following table illustrates the typical amount of bandwidth 288 consumed during the initial exchange between a pair of BGP speakers 289 based on the above assumptions (ignoring bandwidth consumed by the 290 BGP Header). For purposes of the estimates here, we will calculate MR 291 = 4 * (N + (M * A)). 293 # NLRI Mean AS Distance # AS's Bandwidth (MR) 294 ---------- ---------------- ------ ---------------- 295 40,000 15 400 184,000 bytes 296 100,000 10 10,000 800,000 bytes 297 120,000 10 15,000 1,080,000 bytes 298 140,000 15 20,000 1,760,000 bytes 300 [note that most of this bandwidth is consumed by the NLRI exchange] 302 BGP was created specifically to reduce the size of the set of NLRI 303 entries which have to be carried and exchanged by border routers. The 304 aggregation scheme, defined in RFC 1519 [RFC1519], describes the 305 provider-based aggregation scheme in use in today's Internet. 307 Due to the advantages of advertising a few large aggregate blocks 308 instead of many smaller class-based individual networks, it is 309 difficult to estimate the actual reduction in bandwidth and 310 processing that BGP has provided over BGP-3. If we simply enumerate 311 all aggregate blocks into their individual class-based networks, we 312 would not take into account "dead" space that has been reserved for 313 future expansion. The best metric for determining the success of 314 BGP's aggregation is to sample the number NLRI entries in the 315 globally connected Internet today and compare it to projected growth 316 rates before BGP was deployed. 318 At the time of this writing, the full set of exterior routes carried 319 by BGP is approximately 120,000 network entries [ROUTEVIEWS]. 321 5.1.1. CPU utilization 323 An important and fundamental feature of BGP is that BGP's CPU 324 utilization depends only on the stability of the Internet. If the 325 Internet is stable, then the only link bandwidth and router CPU 326 cycles consumed by BGP are due to the exchange of the BGP KEEPALIVE 327 messages. The KEEPALIVE messages are exchanged only between peers. 328 The suggested frequency of the exchange is 30 seconds. The KEEPALIVE 329 messages are quite short (19 octets), and require virtually no 330 processing. As a result, the bandwidth consumed by the KEEPALIVE 331 messages is about 5 bits/sec. Operational experience confirms that 332 the overhead (in terms of bandwidth and CPU) associated with the 333 KEEPALIVE messages should be viewed as negligible. 335 During periods of Internet instability, changes to the reachability 336 information are passed between routers in UPDATE messages. If we 337 denote the number of routing changes per second by C, then in the 338 worst case the amount of bandwidth consumed by the BGP can be 339 expressed as O(C * M). The greatest overhead per UPDATE message 340 occurs when each UPDATE message contains only a single network. It 341 should be pointed out that in practice routing changes exhibit strong 342 locality with respect to the AS path. That is, routes that change are 343 likely to have common AS path. In this case, multiple networks can be 344 grouped into a single UPDATE message, thus significantly reducing the 345 amount of bandwidth required (see also Appendix F.1 of [BGP4]). 347 Since in the steady state the link bandwidth and router CPU cycles 348 consumed by the BGP protocol are dependent only on the stability of 349 the Internet, it follows that BGP should have no scaling problems in 350 the areas of link bandwidth and router CPU utilization. This assumes 351 that as the Internet grows, the overall stability of the inter-AS 352 connectivity of the Internet can be controlled. In particular, while 353 the size of the IPv4 Internet routing table is bounded by O(2^32 * 354 M), (where M is a slow-moving function describing the AS 355 interconnectivity of the network), no such bound can be formulated 356 for the dynamic properties (i.e., stability) of BGP. Finally, since 357 the dynamic properties of the network cannot be quantitatively 358 bounded, stability must be addressed via heuristics such as BGP 359 Route Flap Damping [RFC2439]. Due to the nature of BGP, such damping 360 should be viewed as a matter local to an autonomous system matter 361 (see also Appendix F.2 of [BGP4]). 363 It may also be instructive to compare bandwidth and CPU requirements 364 of BGP with the Exterior Gateway Protocol (EGP). While with BGP the 365 complete information is exchanged only at the connection 366 establishment time, with EGP the complete information is exchanged 367 periodically (usually every 3 minutes). Note that both for BGP and 368 for EGP the amount of information exchanged is roughly on the order 369 of the number of networks reachable via a peer that sends the 370 information. Therefore, even if one assumes extreme instabilities of 371 BGP, its worst case behavior will be the same as the steady state 372 behavior of its predecessor, EGP. 374 Operational experience with BGP showed that the incremental update 375 approach employed by BGP provides qualitative improvement in both 376 bandwidth and CPU utilization when compared with complete periodic 377 updates used by EGP (see also presentation by Dennis Ferguson at the 378 Twentieth IETF, March 11-15, 1991, St. Louis). 380 5.1.2. Memory requirements 382 To quantify the worst case memory requirements for BGP, we denote the 383 total number of networks in the Internet by N, the mean AS distance 384 of the Internet by M (distance at the level of an autonomous system, 385 expressed in terms of the number of autonomous systems), the total 386 number of autonomous systems in the Internet by A, and the total 387 number of BGP speakers that a system is peering with by K (note that 388 K will usually be dominated by the total number of the BGP speakers 389 within a single autonomous system). Then the worst case memory 390 requirements (MR) can be expressed as 392 MR = O((N + M * A) * K) 394 It is interesting to note that prior to the introduction of BGP in 395 the NSFNET Backbone, memory requirements on the NSFNET Backbone 396 routers running EGP were on the order of O(N *K). 398 Since a mean AS distance M is a slow moving function of the 399 interconnectivity ("meshiness") of the Internet, for all practical 400 purposes the worst case router memory requirements are on the order 401 of the total number of networks in the Internet times the number of 402 peers the local system is peering with. We expect that the total 403 number of networks in the Internet will grow much faster than the 404 average number of peers per router. As a result, BGP's memory scaling 405 properties are linearly related to the total number of networks in 406 the Internet. 408 The following table illustrates typical memory requirements of a 409 router running BGP. It is assumed that each network is encoded as 410 four bytes, each AS is encoded as two bytes, and each networks is 411 reachable via some fraction of all of the peers (# BGP peers/per 412 net). For purposes of the estimates here, we will calculate MR = 413 ((N*4) + (M*A)*2) * K. 415 # Networks Mean AS Distance # AS's # BGP peers/per net Memory Req (MR) 416 ---------- ---------------- ------ ------------------- -------------- 417 100,000 20 3,000 20 1,040,000 418 100,000 20 15,000 20 1,040,000 419 120,000 10 15,000 100 75,000,000 420 140,000 15 20,000 100 116,000,000 422 In analyzing BGP's memory requirements, we focus on the size of the 423 forwarding table (and ignoring implementation details). In 424 particular, we derive upper bounds for the size of the forwarding 425 table. For example, at the time of this writing, the forwarding 426 tables of a typical backbone router carry on the order of 120,000 427 entries. Given this number, one might ask whether it would be 428 possible to have a functional router with a table that will have 429 1,000,000 entries. Clearly the answer to this question is independent 430 of BGP. Interestingly, in his review of the BGP protocol for the BGP 431 review committee in March of 1990, Paul Tsuchiya noted that "BGP does 432 not scale well. This is not really the fault of BGP. It is the fault 433 of the flat IP address space. Given the flat IP address space, any 434 routing protocol must carry network numbers in its updates." The 435 introduction of the provider based aggregation schemes (e.g., RFC 436 1519 [RFC1519]) have sought to address this issue, to the extent 437 possible, within the context of current addressing architectures. 439 6. BGP Policy Expressiveness and its Implications 441 BGP is unique among deployed IP routing protocols in that routing is 442 determined using semantically rich routing policies. Although routing 443 policies are usually the first thing that comes to a network 444 operator's mind concerning BGP, it is important to note that the 445 languages and techniques for specifying BGP routing policies are not 446 actually a part of the protocol specification (see RFC 2622 [RFC2622] 447 for an example of such a policy language). In addition, the BGP 448 specification contains few restrictions, either explicitly or 449 implicitly, on routing policy languages. These languages have 450 typically been developed by vendors and have evolved through 451 interactions with network engineers in an environment lacking vendor- 452 independent standards. 454 The complexity of typical BGP configurations, at least in provider 455 networks, has been steadily increasing. Router vendors typically 456 provided hundreds of special commands for use in the configuration of 457 BGP, and this command set is continually expanding. For example, BGP 458 communities [RFC1997] allow policy writers to selectively attach tags 459 to routes and use these to signal policy information to other BGP- 460 speaking routers. Many providers allow customers, and sometimes 461 peers, to send communities that determine the scope and preference of 462 their routes. These developments have more and more given the task of 463 writing BGP configurations aspects associated with open-ended 464 programming. This has allowed network operators to encode complex 465 policies in order to address many unforeseen situations, and has 466 opened the door for a great deal of creativity and experimentation in 467 routing policies. This policy flexibility is one of the main reasons 468 why BGP is so well suited to the commercial environment of the 469 current Internet. 471 However, this rich policy expressiveness has come with a cost that is 472 often not recognized. In particular, it is possible to construct 473 locally defined routing policies that can lead to unexpected global 474 routing anomalies such as (unintended) nondeterminism and to protocol 475 divergence. If the interacting policies causing such anomalies are 476 defined in different autonomous systems, then these problems can be 477 very difficult to debug and correct. In the following sections, we 478 describe two such cases relating to the existence (or lack thereof) 479 of stable routings. 481 6.1. Existence of Unique Stable Routings 483 One can easily construct sets of policies for which BGP can not 484 guarantee that stable routings are unique. This can be illustrated by 485 the following simple example. Consider the example of four Autonomous 486 Systems, AS1, AS2, AS3, and AS4. AS1 and AS2 are peers, and they 487 provide transit for AS3 and AS4 respectively, Suppose further that 488 AS3 provides transit for AS4 (in this case AS3 is a customer of AS1, 489 and AS4 is a multihomed customer of both AS3 and AS4). AS4 may want 490 to use the link to AS3 as a "backup" link, and sends AS3 a community 491 value that AS3 has configured to lower the preference of AS4's routes 492 to a level below that of its upstream provider, AS1. The intended 493 "backup routing" to AS4 is illustrated here: 495 AS1 ------> AS2 496 /|\ | 497 | | 498 | | 499 | \|/ 500 AS3 ------- AS4 502 That is, the AS3-AS4 link is intended to be used only when the 503 AS2-AS4 link is down (for outbound traffic, AS4 simply gives routes 504 from AS2 a higher local preference). This is a common scenario in 505 today's Internet. But note that this configuration has another stable 506 solution: 508 AS1 ------- AS2 509 | | 510 | | 511 | | 512 \|/ \|/ 513 AS3 ------> AS4 515 In this case, AS3 does not translate the "depref my route" community 516 received from AS4 into a "depref my route" community for AS1, and so 517 if AS1 hears the route to AS4 that transits AS3 it will prefer that 518 route (since AS3 is a customer). This state could be reached, for 519 example, by starting in the "correct" backup routing shown first, 520 bringing down the AS2-AS4 BGP session, and then bringing it back up. 521 In general, BGP has no way to prefer the "intended" solution over the 522 anomalous one, and which is picked will depend on the unpredictable 523 order of BGP messages. 525 While this example is relatively simple, many operators may fail to 526 recognize that the true source of the problem is that the BGP 527 policies of ASes can interact in unexpected ways, and that these 528 interactions can result in multiple stable routings. One can imagine 529 that the interactions could be much more complex in the real 530 Internet. We suspect that such anomalies will only become more common 531 as BGP continues to evolve with richer policy expressiveness. For 532 example, extended communities provide an even more flexible means of 533 signaling information within and between autonomous systems than is 534 possible with RFC 1997 communities. At the same time, applications of 535 communities by network operators are evolving to address complex 536 issues of inter-domain traffic engineering. 538 6.2. Existence of Stable Routings 540 One can also construct a set of policies for which BGP can not 541 guarantee that a stable routing exists (or worse, that a stable 542 routing will ever be found). For example, RFC 3345 [RFC3345] 543 documents several scenarios that lead to route oscillations 544 associated with the use of the Multi-Exit Discriminator or MED, 545 attribute. Route oscillation will happen in BGP when a set of 546 policies has no solution. That is, when there is no stable routing 547 that satisfies the constraints imposed by policy, then BGP has no 548 choice by to keep trying. In addition, BGP configurations can have a 549 stable routing, yet the protocol may not be able to find it; BGP can 550 "get trapped" down a blind alley that has no solution. 552 Protocol divergence is not, however, a problem associated solely with 553 use of the MED attribute. This potential exists in BGP even without 554 the use of the MED attribute. Hence, like the unintended 555 nondeterminism described in the previous section, this type of 556 protocol divergence is an unintended consequence of the unconstrained 557 nature of BGP policy languages. 559 7. Applicability 561 In this section we answer the question of which environments is BGP 562 well suited, and for which environments it is not suitable. This 563 question is partially answered in Section 2 of RFC 1771 [RFC1771], 564 which states: 566 "To characterize the set of policy decisions that can be enforced 567 using BGP, one must focus on the rule that an AS advertises to its 568 neighbor ASs only those routes that it itself uses. This rule 569 reflects the "hop-by-hop" routing paradigm generally used 570 throughout the current Internet. Note that some policies cannot 571 be supported by the "hop-by-hop" routing paradigm and thus require 572 techniques such as source routing to enforce. For example, BGP 573 does not enable one AS to send traffic to a neighbor AS intending 574 that the traffic take a different route from that taken by traffic 575 originating in the neighbor AS. On the other hand, BGP can 576 support any policy conforming to the "hop-by-hop" routing 577 paradigm. Since the current Internet uses only the "hop-by-hop" 578 routing paradigm and since BGP can support any policy that 579 conforms to that paradigm, BGP is highly applicable as an inter-AS 580 routing protocol for the current Internet." 582 One of the important points here is that the BGP protocol contains 583 only the functionality that is essential, while at the same time 584 providing a flexible mechanism within the protocol that allow us to 585 extend its functionality. For example, BGP capabilities provide an 586 easy and flexible way to introduce new features within the protocol. 587 Finally, since BGP was designed with flexibility and extensibility in 588 mind, new and/or evolving requirements can be addressed via existing 589 mechanisms. 591 To summarize, BGP is well suitable as an inter-autonomous system 592 routing protocol for the IPv4 Internet that is based on IP [RFC791] 593 as the Internet Protocol and "hop-by-hop" routing paradigm. Finally, 594 BGP is equally applicable to IPv6 [RFC2460] internets. 596 8. Acknowledgments 598 We would like to thank Paul Traina for authoring previous versions of 599 this document. Tim Griffin and Randy Presuhn also provided many 600 insightful comments on earlier versions of this document. 602 9. Informative References 604 [BGP4] Rekhter, Y., T. Li., and S. Hares, Editors, "A 605 Border Gateway Protocol 4 (BGP-4)", 606 draft-ietf-idr-bgp4-19.txt. Work in progress. 608 [RFC791] "INTERNET PROTOCOL", DARPA INTERNET PROGRAM 609 PROTOCOL SPECIFICATION, RFC 791, September, 610 1981. 612 [RFC854] Postel, J. and J. Reynolds, "TELNET PROTOCOL 613 SPECIFICATION", RFC 854, May, 1983. 615 [RFC1105] Lougheed, K., and Y. Rekhter, "Border Gateway 616 Protocol BGP", RFC 1105, June 1989. 618 [RFC1163] Lougheed, K., and Rekhter, Y, "Border Gateway 619 Protocol BGP", RFC 1105, June 1990. 621 [RFC1264] Hinden, R., "Internet Routing Protocol 622 Standardization Criteria", RFC 1264, October 1991. 624 [RFC1267] Lougheed, K., and Rekhter, Y, "Border Gateway 625 Protocol 3 (BGP-3)", RFC 1105, October 1991. 627 [RFC1519] Fuller, V., Li. T., Yu J., and K. Varadhan, 628 "Classless Inter-Domain Routing (CIDR): an 629 Address Assignment and Aggregation Strategy", RFC 630 1519, September 1993. 632 [RFC1771] Rekhter, Y., and T. Li, "A Border Gateway 633 Protocol 4 (BGP-4)", RFC 1771, March 1995. 635 [RFC1772] Rekhter, Y., and P. Gross, Editors, "Application 636 of the Border Gateway Protocol in the Internet", 637 RFC 1772, March 1995. 639 [RFC1997] Chandra. R, and T. Li, "BGP Communities 640 Attribute", RFC 1997, August, 1996. 642 [RFC2439] Villamizar, C., Chandra, R., and R. Govindan, 643 "BGP Route Flap Damping", RFC 2439, November 644 1998. 646 [RFC2460] Deering, S, and R. Hinden, "Internet Protocol, 647 Version 6 (IPv6) Specification", RFC 2460, 648 December, 1998. 650 [RFC2622] C. Alaettinoglu, et al., "Routing Policy 651 Specification Language (RPSL)" RFC 2622, May, 652 1998. 654 [RFC2842] Chandra, R. and J. Scudder, "Capabilities 655 Advertisement with BGP-4", RFC 2842, May 2000. 657 [RFC3345] McPherson, D., Gill, V., Walton, D., and 658 A. Retana, "Border Gateway Protocol (BGP) Persistent 659 Route Oscillation Condition", RFC 3345, 660 August, 2002. 662 [ROUTEVIEWS] Meyer, D., "The Route Views Project", 663 http://www.routeviews.org 665 10. Author's Addresses 667 David Meyer 668 Email: dmm@maoz.com 670 Keyur Patel 671 Cisco Systems 672 Email: keyupate@cisco.com 674 11. Full Copyright Statement 676 Copyright (C) The Internet Society (2003). All Rights Reserved. 678 This document and translations of it may be copied and furnished to 679 others, and derivative works that comment on or otherwise explain it 680 or assist in its implementation may be prepared, copied, published 681 and distributed, in whole or in part, without restriction of any 682 kind, provided that the above copyright notice and this paragraph are 683 included on all such copies and derivative works. However, this 684 document itself may not be modified in any way, such as by removing 685 the copyright notice or references to the Internet Society or other 686 Internet organizations, except as needed for the purpose of 687 developing Internet standards in which case the procedures for 688 copyrights defined in the Internet Standards process must be 689 followed, or as required to translate it into languages other than 690 English. 692 The limited permissions granted above are perpetual and will not be 693 revoked by the Internet Society or its successors or assigns. 695 This document and the information contained herein is provided on an 696 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 697 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 698 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 699 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 700 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.