idnits 2.17.1 draft-ietf-idr-bgp-analysis-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 19 instances of too long lines in the document, the longest one being 4 characters in excess of 72. ** The abstract seems to contain references ([RFC1264], [RFC1774]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 374 has weird spacing: '...extreme insta...' == Line 493 has weird spacing: '... AS4 is a mul...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (May 2003) is 7652 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC1774' is mentioned on line 45, but not defined == Outdated reference: A later version (-26) exists of draft-ietf-idr-bgp4-20 -- Obsolete informational reference (is this intentional?): RFC 1105 (Obsoleted by RFC 1163) -- Duplicate reference: RFC1105, mentioned in 'RFC1163', was also mentioned in 'RFC1105'. -- Obsolete informational reference (is this intentional?): RFC 1105 (ref. 'RFC1163') (Obsoleted by RFC 1163) -- Obsolete informational reference (is this intentional?): RFC 1264 (Obsoleted by RFC 4794) -- Duplicate reference: RFC1105, mentioned in 'RFC1267', was also mentioned in 'RFC1163'. -- Obsolete informational reference (is this intentional?): RFC 1105 (ref. 'RFC1267') (Obsoleted by RFC 1163) -- Obsolete informational reference (is this intentional?): RFC 1519 (Obsoleted by RFC 4632) -- Obsolete informational reference (is this intentional?): RFC 1771 (Obsoleted by RFC 4271) -- Obsolete informational reference (is this intentional?): RFC 2460 (Obsoleted by RFC 8200) -- Obsolete informational reference (is this intentional?): RFC 2842 (Obsoleted by RFC 3392) Summary: 3 errors (**), 0 flaws (~~), 6 warnings (==), 13 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 INTERNET-DRAFT David Meyer 2 draft-ietf-idr-bgp-analysis-03.txt Keyur Patel 3 Category Informational 4 Expires: November 2003 May 2003 6 BGP-4 Protocol Analysis 7 9 Status of this Document 11 This document is an Internet-Draft and is in full conformance with 12 all provisions of Section 10 of RFC2026. 14 Internet-Drafts are working documents of the Internet Engineering 15 Task Force (IETF), its areas, and its working groups. Note that 16 other groups may also distribute working documents as Internet- 17 Drafts. 19 Internet-Drafts are draft documents valid for a maximum of six months 20 and may be updated, replaced, or obsoleted by other documents at any 21 time. It is inappropriate to use Internet-Drafts as reference 22 material or to cite them other than as "work in progress." 24 The list of current Internet-Drafts can be accessed at 25 http://www.ietf.org/ietf/1id-abstracts.txt 27 The list of Internet-Draft Shadow Directories can be accessed at 28 http://www.ietf.org/shadow.html. 30 This document is a product of an individual. Comments are solicited 31 and should be addressed to the author(s). 33 Copyright Notice 35 Copyright (C) The Internet Society (2003). All Rights Reserved. 37 Abstract 39 The purpose of this report is to document how the requirements for 40 advancing a routing protocol from Draft Standard to full Standard 41 have been satisfied by Border Gateway Protocol version 4 (BGP-4). 43 This report satisfies the requirement for "the second report", as 44 described in Section 6.0 of RFC 1264 [RFC1264]. In order to fulfill 45 the requirement, this report augments RFC 1774 [RFC1774] and 46 summarizes the key features of BGP protocol, and analyzes the 47 protocol with respect to scaling and performance. 49 Table of Contents 51 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 52 2. Key Features and algorithms of the BGP protocol. . . . . . . . 4 53 2.1. Key Features. . . . . . . . . . . . . . . . . . . . . . . . 4 54 2.2. BGP Algorithms. . . . . . . . . . . . . . . . . . . . . . . 5 55 2.3. BGP Finite State Machine (FSM). . . . . . . . . . . . . . . 6 56 3. BGP Capabilities . . . . . . . . . . . . . . . . . . . . . . . 7 57 4. BGP Persistent Peer Oscillations . . . . . . . . . . . . . . . 8 58 5. BGP Performance characteristics and Scalability. . . . . . . . 8 59 5.1. Link bandwidth and CPU utilization. . . . . . . . . . . . . 8 60 5.1.1. CPU utilization. . . . . . . . . . . . . . . . . . . . . 9 61 5.1.2. Memory requirements. . . . . . . . . . . . . . . . . . . 11 62 6. BGP Policy Expressiveness and its Implications . . . . . . . . 12 63 6.1. Existence of Unique Stable Routings . . . . . . . . . . . . 13 64 6.2. Existence of Stable Routings. . . . . . . . . . . . . . . . 14 65 7. Applicability. . . . . . . . . . . . . . . . . . . . . . . . . 15 66 8. Intellectual Property. . . . . . . . . . . . . . . . . . . . . 16 67 9. Acknowledgments. . . . . . . . . . . . . . . . . . . . . . . . 16 68 10. Security Considerations . . . . . . . . . . . . . . . . . . . 17 69 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 17 70 12. References. . . . . . . . . . . . . . . . . . . . . . . . . . 17 71 12.1. Informative References . . . . . . . . . . . . . . . . . . 17 72 13. Author's Addresses. . . . . . . . . . . . . . . . . . . . . . 18 73 14. Full Copyright Statement. . . . . . . . . . . . . . . . . . . 19 75 1. Introduction 77 BGP-4 is an inter-autonomous system routing protocol designed for 78 TCP/IP internets. Version 1 of the BGP protocol was published in RFC 79 1105 [RFC1105]. Since then BGP versions 2, 3, and 4 have been 80 developed. Version 2 was documented in RFC 1163 [RFC1163]. Version 3 81 is documented in RFC 1267 [RFC1267]. Version 4 is documented in the 82 [BGP4] (version 4 of BGP will hereafter be referred to as BGP). The 83 changes between versions are explained in Appendix A of [BGP4]. 84 Possible applications of BGP in the Internet are documented in RFC 85 1772 [RFC1772]. 87 2. Key Features and algorithms of the BGP protocol 89 This section summarizes the key features and algorithms of the BGP 90 protocol. BGP is an inter-autonomous system routing protocol; it is 91 designed to be used between multiple autonomous systems. BGP assumes 92 that routing within an autonomous system is done by an intra- 93 autonomous system routing protocol. BGP does not make any assumptions 94 about intra-autonomous system routing protocols deployed within the 95 various autonomous systems. Specifically, BGP does not require all 96 autonomous systems to run the same intra-autonomous system routing 97 protocol (i.e., interior gateway protocol or IGP). 99 Finally, note that BGP is a real inter-autonomous system routing 100 protocol, and as such it imposes no constraints on the underlying 101 Internet topology. The information exchanged via BGP is sufficient to 102 construct a graph of autonomous systems connectivity from which 103 routing loops may be pruned and many routing policy decisions at the 104 autonomous system level may be enforced. 106 2.1. Key Features 108 The key features of the protocol are the notion of path attributes 109 and aggregation of network layer reachability information (NLRI). 110 Path attributes provide BGP with flexibility and extensibility. Path 111 attributes are partitioned into well-known and optional. The 112 provision for optional attributes allows experimentation that may 113 involve a group of BGP routers without affecting the rest of the 114 Internet. New optional attributes can be added to the protocol in 115 much the same way that new options are added to, say, the Telnet 116 protocol [RFC854]. 118 One of the most important path attributes is the Autonomous System 119 Path, or AS_PATH. AS reachability information traverses the Internet, 120 this information is augmented by the list of autonomous systems that 121 have been traversed thus far, forming the AS_PATH. The AS_PATH allows 122 straightforward suppression of the looping of routing information. In 123 addition, the AS_PATH serves as a powerful and versatile mechanism 124 for policy-based routing. 126 BGP enhances the AS_PATH attribute to include sets of autonomous 127 systems as well as lists via the AS_SET attribute. This extended 128 format allows generated aggregate routes to carry path information 129 from the more specific routes used to generate the aggregate. It 130 should be noted however, that as of this writing, AS_SETs are rarely 131 used in the Internet [ROUTEVIEWS]. 133 2.2. BGP Algorithms 135 BGP uses an algorithm that is neither a pure distance vector 136 algorithm or a pure link state algorithm. It is instead a modified 137 distance vector algorithm that uses path information to avoid 138 traditional distance vector problems. Each route within BGP pairs 139 destination with path information to that destination. Path 140 information (also known as AS_PATH information) is stored within the 141 AS_PATH attribute in BGP. This allows BGP to reconstruct large 142 portions of overall topology whenever required. 144 BGP uses an incremental update strategy in order to conserve 145 bandwidth and processing power. That is, after initial exchange of 146 complete routing information, a pair of BGP routers exchanges only 147 changes to that information. Such an incremental update design 148 requires reliable transport between a pair of BGP routers to function 149 correctly. BGP solves this problem by using TCP for reliable 150 transport. 152 In addition to incremental updates, BGP has added the concept of 153 route aggregation so that information about groups of networks may be 154 aggregated and sent as a single Network Layer Reachability (NLRI). 156 Finally, note that BGP is a self-contained protocol. That is, BGP 157 specifies how routing information is exchanged both between BGP 158 speakers in different autonomous systems, and between BGP speakers 159 within a single autonomous system. 161 2.3. BGP Finite State Machine (FSM) 163 The BGP FSM is a set of rules that are applied to a BGP speaker's set 164 of configured peers for the BGP operation. A BGP implementation 165 requires that a BGP speaker must connect to and listen on TCP port 166 179 for accepting any new BGP connections from its peers. The BGP 167 Finite State Machine, or FSM, must be initiated and maintained for 168 each new incoming and outgoing peer connections. However, in steady 169 state operation, there will be only one BGP FSM per connection per 170 peer. 172 There may exist a temporary period where in a BGP peer may have 173 separate incoming and outgoing connections resulting into two 174 different BGP FSMs for a peer (instead of one). This can be resolved 175 following BGP connection collision rules defined in the [BGP4]. 177 Following are different states of BGP FSM for its peers: 179 IDLE: State when BGP peer refuses any incoming 180 connections. 182 CONNECT: State in which BGP peer is waiting for 183 its TCP connection to be completed. 185 ACTIVE: State in which BGP peer is trying to acquire a 186 peer by listening and accepting TCP connection. 188 OPENSENT: BGP peer is waiting for OPEN message from its 189 peer. 191 OPENCONFIRM: BGP peer is waiting for KEEPALIVE or NOTIFICATION 192 message from its peer. 194 ESTABLISHED: BGP peer connection is established and exchanges 195 UPDATE, NOTIFICATION, and KEEPALIVE messages with 196 its peer. 198 There are different BGP events that operate on above mentioned states 199 of BGP FSM for its peers. These BGP events are used for initiating and 200 terminating peer connections. They also assist BGP in identifying any 201 persistent peer connection oscillations and provide a mechanism 202 for controlling them. 204 Following are different BGP events: 206 Manual Start: Manually start the peer connection. 208 Manual Stop: Manually stop the peer connection. 210 Automatic Start: Local system automatically starts the peer 211 connection. 213 Manual start with 214 passive TCP flag: Local system administrator manually starts the 215 peer connection with peer in passive mode. 217 Automatic start 218 with passive TCP flag: Local system administrator automatically starts 219 the peer connection with peer in passive mode. 221 Automatic start 222 with bgp_stop_flap 223 option set: Local system administrator automatically starts 224 the peer connection with peer oscillation 225 damping enabled. 227 Automatic start with 228 bgp_stop_flap option 229 set and passive TCP 230 establishment 231 option set: Local system administrator automatically starts 232 the peer connection with peer oscillation 233 damping enabled and with peer in passive mode. 235 Automatic stop: Local system automatically stops the 236 BGP connection. 238 Both, Manual Start and Manual Stop are mandatory BGP events. All 239 other events are optional. 241 3. BGP Capabilities 243 The BGP Capability mechanism [RFC2842] provides an easy and flexible 244 way to introduce new features within the protocol. In particular, the 245 BGP capability mechanism allows peers to negotiate various optional 246 features during startup. This allows the base BGP protocol to contain 247 only essential functionality, while at the same time providing a 248 flexible mechanism for signaling protocol extensions. 250 4. BGP Persistent Peer Oscillations 252 Ideally, whenever a BGP speaker detects an error in any peer 253 connection, it shuts down the peer and changes its FSM state to IDLE. 254 BGP speaker requires a Start event to re-initiate its idle peer 255 connection. If the error remains persistent and BGP speaker generates 256 Start event automatically then it may result in persistent peer 257 flapping. However, although peer oscillation is found to be wide- 258 spread in BGP implementations, methods for preventing persistent peer 259 oscillations are outside the scope of base BGP protocol 260 specification. 262 5. BGP Performance characteristics and Scalability 264 In this section, we provide "order of magnitude" answers to the 265 questions of how much link bandwidth, router memory and router CPU 266 cycles the BGP protocol will consume under normal conditions. In 267 particular, we will address the scalability of BGP and its 268 limitations. 270 It is important to note that BGP does not require all the routers 271 within an autonomous system to participate in the BGP protocol. In 272 particular, only the border routers that provide connectivity between 273 the local autonomous system and their adjacent autonomous systems 274 need participate in BGP. The ability to constrain the set of BGP 275 speakers is one way to address scaling issues. 277 5.1. Link bandwidth and CPU utilization 279 Immediately after the initial BGP connection setup, BGP peers 280 exchange complete set of routing information. If we denote the total 281 number of routes in the Internet by N, the mean AS distance of the 282 Internet by M (distance at the level of an autonomous system, 283 expressed in terms of the number of autonomous systems), the total 284 number of autonomous systems in the Internet by A, and assume that 285 the networks are uniformly distributed among the autonomous systems, 286 then the worst case amount of bandwidth consumed during the initial 287 exchange between a pair of BGP speakers is 289 MR = O(N + M * A) 291 The following table illustrates the typical amount of bandwidth 292 consumed during the initial exchange between a pair of BGP speakers 293 based on the above assumptions (ignoring bandwidth consumed by the 294 BGP Header). For purposes of the estimates here, we will calculate MR 295 = 4 * (N + (M * A)). 297 # NLRI Mean AS Distance # AS's Bandwidth (MR) 298 ---------- ---------------- ------ ---------------- 299 40,000 15 400 184,000 bytes 300 100,000 10 10,000 800,000 bytes 301 120,000 10 15,000 1,080,000 bytes 302 140,000 15 20,000 1,760,000 bytes 304 [note that most of this bandwidth is consumed by the NLRI exchange] 306 BGP was created specifically to reduce the size of the set of NLRI 307 entries which have to be carried and exchanged by border routers. The 308 aggregation scheme, defined in RFC 1519 [RFC1519], describes the 309 provider-based aggregation scheme in use in today's Internet. 311 Due to the advantages of advertising a few large aggregate blocks 312 instead of many smaller class-based individual networks, it is 313 difficult to estimate the actual reduction in bandwidth and 314 processing that BGP has provided over BGP-3. If we simply enumerate 315 all aggregate blocks into their individual class-based networks, we 316 would not take into account "dead" space that has been reserved for 317 future expansion. The best metric for determining the success of 318 BGP's aggregation is to sample the number NLRI entries in the 319 globally connected Internet today and compare it to projected growth 320 rates before BGP was deployed. 322 At the time of this writing, the full set of exterior routes carried 323 by BGP is approximately 120,000 network entries [ROUTEVIEWS]. 325 5.1.1. CPU utilization 327 An important and fundamental feature of BGP is that BGP's CPU 328 utilization depends only on the stability of the Internet. If the 329 Internet is stable, then the only link bandwidth and router CPU 330 cycles consumed by BGP are due to the exchange of the BGP KEEPALIVE 331 messages. The KEEPALIVE messages are exchanged only between peers. 332 The suggested frequency of the exchange is 30 seconds. The KEEPALIVE 333 messages are quite short (19 octets), and require virtually no 334 processing. As a result, the bandwidth consumed by the KEEPALIVE 335 messages is about 5 bits/sec. Operational experience confirms that 336 the overhead (in terms of bandwidth and CPU) associated with the 337 KEEPALIVE messages should be viewed as negligible. 339 During periods of Internet instability, changes to the reachability 340 information are passed between routers in UPDATE messages. If we 341 denote the number of routing changes per second by C, then in the 342 worst case the amount of bandwidth consumed by the BGP can be 343 expressed as O(C * M). The greatest overhead per UPDATE message 344 occurs when each UPDATE message contains only a single network. It 345 should be pointed out that in practice routing changes exhibit strong 346 locality with respect to the AS path. That is, routes that change are 347 likely to have common AS path. In this case, multiple networks can be 348 grouped into a single UPDATE message, thus significantly reducing the 349 amount of bandwidth required (see also Appendix F.1 of [BGP4]). 351 Since in the steady state the link bandwidth and router CPU cycles 352 consumed by the BGP protocol are dependent only on the stability of 353 the Internet, it follows that BGP should have no scaling problems in 354 the areas of link bandwidth and router CPU utilization. This assumes 355 that as the Internet grows, the overall stability of the inter-AS 356 connectivity of the Internet can be controlled. In particular, while 357 the size of the IPv4 Internet routing table is bounded by O(2^32 * 358 M), (where M is a slow-moving function describing the AS 359 interconnectivity of the network), no such bound can be formulated 360 for the dynamic properties (i.e., stability) of BGP. Finally, since 361 the dynamic properties of the network cannot be quantitatively 362 bounded, stability must be addressed via heuristics such as BGP 363 Route Flap Damping [RFC2439]. Due to the nature of BGP, such damping 364 should be viewed as a matter local to an autonomous system matter 365 (see also Appendix F.2 of [BGP4]). 367 It may also be instructive to compare bandwidth and CPU requirements 368 of BGP with the Exterior Gateway Protocol (EGP). While with BGP the 369 complete information is exchanged only at the connection 370 establishment time, with EGP the complete information is exchanged 371 periodically (usually every 3 minutes). Note that both for BGP and 372 for EGP the amount of information exchanged is roughly on the order 373 of the number of networks reachable via a peer that sends the 374 information. Therefore, even if one assumes extreme instabilities of 375 BGP, its worst case behavior will be the same as the steady state 376 behavior of its predecessor, EGP. 378 Operational experience with BGP showed that the incremental update 379 approach employed by BGP provides qualitative improvement in both 380 bandwidth and CPU utilization when compared with complete periodic 381 updates used by EGP (see also presentation by Dennis Ferguson at the 382 Twentieth IETF, March 11-15, 1991, St. Louis). 384 5.1.2. Memory requirements 386 To quantify the worst case memory requirements for BGP, we denote the 387 total number of networks in the Internet by N, the mean AS distance 388 of the Internet by M (distance at the level of an autonomous system, 389 expressed in terms of the number of autonomous systems), the total 390 number of autonomous systems in the Internet by A, and the total 391 number of BGP speakers that a system is peering with by K (note that 392 K will usually be dominated by the total number of the BGP speakers 393 within a single autonomous system). Then the worst case memory 394 requirements (MR) can be expressed as 396 MR = O((N + M * A) * K) 398 It is interesting to note that prior to the introduction of BGP in 399 the NSFNET Backbone, memory requirements on the NSFNET Backbone 400 routers running EGP were on the order of O(N *K). 402 Since a mean AS distance M is a slow moving function of the 403 interconnectivity ("meshiness") of the Internet, for all practical 404 purposes the worst case router memory requirements are on the order 405 of the total number of networks in the Internet times the number of 406 peers the local system is peering with. We expect that the total 407 number of networks in the Internet will grow much faster than the 408 average number of peers per router. As a result, BGP's memory scaling 409 properties are linearly related to the total number of networks in 410 the Internet. 412 The following table illustrates typical memory requirements of a 413 router running BGP. It is assumed that each network is encoded as 414 four bytes, each AS is encoded as two bytes, and each networks is 415 reachable via some fraction of all of the peers (# BGP peers/per 416 net). For purposes of the estimates here, we will calculate MR = 417 ((N*4) + (M*A)*2) * K. 419 # Networks Mean AS Distance # AS's # BGP peers/per net Memory Req (MR) 420 ---------- ---------------- ------ ------------------- -------------- 421 100,000 20 3,000 20 1,040,000 422 100,000 20 15,000 20 1,040,000 423 120,000 10 15,000 100 75,000,000 424 140,000 15 20,000 100 116,000,000 426 In analyzing BGP's memory requirements, we focus on the size of the 427 forwarding table (and ignoring implementation details). In 428 particular, we derive upper bounds for the size of the forwarding 429 table. For example, at the time of this writing, the forwarding 430 tables of a typical backbone router carry on the order of 120,000 431 entries. Given this number, one might ask whether it would be 432 possible to have a functional router with a table that will have 433 1,000,000 entries. Clearly the answer to this question is independent 434 of BGP. Interestingly, in his review of the BGP protocol for the BGP 435 review committee in March of 1990, Paul Tsuchiya noted that "BGP does 436 not scale well. This is not really the fault of BGP. It is the fault 437 of the flat IP address space. Given the flat IP address space, any 438 routing protocol must carry network numbers in its updates." The 439 introduction of the provider based aggregation schemes (e.g., RFC 440 1519 [RFC1519]) have sought to address this issue, to the extent 441 possible, within the context of current addressing architectures. 443 6. BGP Policy Expressiveness and its Implications 445 BGP is unique among deployed IP routing protocols in that routing is 446 determined using semantically rich routing policies. Although routing 447 policies are usually the first thing that comes to a network 448 operator's mind concerning BGP, it is important to note that the 449 languages and techniques for specifying BGP routing policies are not 450 actually a part of the protocol specification (see RFC 2622 [RFC2622] 451 for an example of such a policy language). In addition, the BGP 452 specification contains few restrictions, either explicitly or 453 implicitly, on routing policy languages. These languages have 454 typically been developed by vendors and have evolved through 455 interactions with network engineers in an environment lacking vendor- 456 independent standards. 458 The complexity of typical BGP configurations, at least in provider 459 networks, has been steadily increasing. Router vendors typically 460 provided hundreds of special commands for use in the configuration of 461 BGP, and this command set is continually expanding. For example, BGP 462 communities [RFC1997] allow policy writers to selectively attach tags 463 to routes and use these to signal policy information to other BGP- 464 speaking routers. Many providers allow customers, and sometimes 465 peers, to send communities that determine the scope and preference of 466 their routes. These developments have more and more given the task of 467 writing BGP configurations aspects associated with open-ended 468 programming. This has allowed network operators to encode complex 469 policies in order to address many unforeseen situations, and has 470 opened the door for a great deal of creativity and experimentation in 471 routing policies. This policy flexibility is one of the main reasons 472 why BGP is so well suited to the commercial environment of the 473 current Internet. 475 However, this rich policy expressiveness has come with a cost that is 476 often not recognized. In particular, it is possible to construct 477 locally defined routing policies that can lead to unexpected global 478 routing anomalies such as (unintended) nondeterminism and to protocol 479 divergence. If the interacting policies causing such anomalies are 480 defined in different autonomous systems, then these problems can be 481 very difficult to debug and correct. In the following sections, we 482 describe two such cases relating to the existence (or lack thereof) 483 of stable routings. 485 6.1. Existence of Unique Stable Routings 487 One can easily construct sets of policies for which BGP can not 488 guarantee that stable routings are unique. This can be illustrated by 489 the following simple example. Consider the example of four Autonomous 490 Systems, AS1, AS2, AS3, and AS4. AS1 and AS2 are peers, and they 491 provide transit for AS3 and AS4 respectively, Suppose further that 492 AS3 provides transit for AS4 (in this case AS3 is a customer of AS1, 493 and AS4 is a multihomed customer of both AS3 and AS4). AS4 may want 494 to use the link to AS3 as a "backup" link, and sends AS3 a community 495 value that AS3 has configured to lower the preference of AS4's routes 496 to a level below that of its upstream provider, AS1. The intended 497 "backup routing" to AS4 is illustrated here: 499 AS1 ------> AS2 500 /|\ | 501 | | 502 | | 503 | \|/ 504 AS3 ------- AS4 506 That is, the AS3-AS4 link is intended to be used only when the 507 AS2-AS4 link is down (for outbound traffic, AS4 simply gives routes 508 from AS2 a higher local preference). This is a common scenario in 509 today's Internet. But note that this configuration has another stable 510 solution: 512 AS1 ------- AS2 513 | | 514 | | 515 | | 516 \|/ \|/ 517 AS3 ------> AS4 519 In this case, AS3 does not translate the "depref my route" community 520 received from AS4 into a "depref my route" community for AS1, and so 521 if AS1 hears the route to AS4 that transits AS3 it will prefer that 522 route (since AS3 is a customer). This state could be reached, for 523 example, by starting in the "correct" backup routing shown first, 524 bringing down the AS2-AS4 BGP session, and then bringing it back up. 525 In general, BGP has no way to prefer the "intended" solution over the 526 anomalous one, and which is picked will depend on the unpredictable 527 order of BGP messages. 529 While this example is relatively simple, many operators may fail to 530 recognize that the true source of the problem is that the BGP 531 policies of ASes can interact in unexpected ways, and that these 532 interactions can result in multiple stable routings. One can imagine 533 that the interactions could be much more complex in the real 534 Internet. We suspect that such anomalies will only become more common 535 as BGP continues to evolve with richer policy expressiveness. For 536 example, extended communities provide an even more flexible means of 537 signaling information within and between autonomous systems than is 538 possible with RFC 1997 communities. At the same time, applications of 539 communities by network operators are evolving to address complex 540 issues of inter-domain traffic engineering. 542 6.2. Existence of Stable Routings 544 One can also construct a set of policies for which BGP can not 545 guarantee that a stable routing exists (or worse, that a stable 546 routing will ever be found). For example, RFC 3345 [RFC3345] 547 documents several scenarios that lead to route oscillations 548 associated with the use of the Multi-Exit Discriminator or MED, 549 attribute. Route oscillation will happen in BGP when a set of 550 policies has no solution. That is, when there is no stable routing 551 that satisfies the constraints imposed by policy, then BGP has no 552 choice by to keep trying. In addition, BGP configurations can have a 553 stable routing, yet the protocol may not be able to find it; BGP can 554 "get trapped" down a blind alley that has no solution. 556 Protocol divergence is not, however, a problem associated solely with 557 use of the MED attribute. This potential exists in BGP even without 558 the use of the MED attribute. Hence, like the unintended 559 nondeterminism described in the previous section, this type of 560 protocol divergence is an unintended consequence of the unconstrained 561 nature of BGP policy languages. 563 7. Applicability 565 In this section we answer the question of which environments is BGP 566 well suited, and for which environments it is not suitable. This 567 question is partially answered in Section 2 of RFC 1771 [RFC1771], 568 which states: 570 "To characterize the set of policy decisions that can be enforced 571 using BGP, one must focus on the rule that an AS advertises to its 572 neighbor ASs only those routes that it itself uses. This rule 573 reflects the "hop-by-hop" routing paradigm generally used 574 throughout the current Internet. Note that some policies cannot 575 be supported by the "hop-by-hop" routing paradigm and thus require 576 techniques such as source routing to enforce. For example, BGP 577 does not enable one AS to send traffic to a neighbor AS intending 578 that the traffic take a different route from that taken by traffic 579 originating in the neighbor AS. On the other hand, BGP can 580 support any policy conforming to the "hop-by-hop" routing 581 paradigm. Since the current Internet uses only the "hop-by-hop" 582 routing paradigm and since BGP can support any policy that 583 conforms to that paradigm, BGP is highly applicable as an inter-AS 584 routing protocol for the current Internet." 586 One of the important points here is that the BGP protocol contains 587 only the functionality that is essential, while at the same time 588 providing a flexible mechanism within the protocol that allow us to 589 extend its functionality. For example, BGP capabilities provide an 590 easy and flexible way to introduce new features within the protocol. 591 Finally, since BGP was designed with flexibility and extensibility in 592 mind, new and/or evolving requirements can be addressed via existing 593 mechanisms. 595 To summarize, BGP is well suitable as an inter-autonomous system 596 routing protocol for the IPv4 Internet that is based on IP [RFC791] 597 as the Internet Protocol and "hop-by-hop" routing paradigm. Finally, 598 BGP is equally applicable to IPv6 [RFC2460] internets. 600 8. Intellectual Property 602 The IETF takes no position regarding the validity or scope of any 603 intellectual property or other rights that might be claimed to 604 pertain to the implementation or use of the technology described in 605 this document or the extent to which any license under such rights 606 might or might not be available; neither does it represent that it 607 has made any effort to identify any such rights. Information on the 608 IETF's procedures with respect to rights in standards-track and 609 standards-related documentation can be found in BCP-11. Copies of 610 claims of rights made available for publication and any assurances of 611 licenses to be made available, or the result of an attempt made to 612 obtain a general license or permission for the use of such 613 proprietary rights by implementors or users of this specification can 614 be obtained from the IETF Secretariat. 616 The IETF invites any interested party to bring to its attention any 617 copyrights, patents or patent applications, or other proprietary 618 rights which may cover technology that may be required to practice 619 this standard. Please address the information to the IETF Executive 620 Director. 622 9. Acknowledgments 624 We would like to thank Paul Traina for authoring previous versions of 625 this document. Tim Griffin and Randy Presuhn also provided many 626 insightful comments on earlier versions of this document. 628 10. Security Considerations 630 This document presents an analysis of the BGP protocol and as such 631 presents no new security implications for BGP. 633 11. IANA Considerations 635 This document presents an analysis of the BGP protocol and hence 636 presents no new IANA considerations. 638 12. References 640 12.1. Informative References 642 [BGP4] Rekhter, Y., T. Li., and S. Hares, Editors, "A 643 Border Gateway Protocol 4 (BGP-4)", 644 draft-ietf-idr-bgp4-20.txt. Work in progress. 646 [RFC791] "INTERNET PROTOCOL", DARPA INTERNET PROGRAM 647 PROTOCOL SPECIFICATION, RFC 791, September, 648 1981. 650 [RFC854] Postel, J. and J. Reynolds, "TELNET PROTOCOL 651 SPECIFICATION", RFC 854, May, 1983. 653 [RFC1105] Lougheed, K., and Y. Rekhter, "Border Gateway 654 Protocol BGP", RFC 1105, June 1989. 656 [RFC1163] Lougheed, K., and Rekhter, Y, "Border Gateway 657 Protocol BGP", RFC 1105, June 1990. 659 [RFC1264] Hinden, R., "Internet Routing Protocol 660 Standardization Criteria", RFC 1264, October 1991. 662 [RFC1267] Lougheed, K., and Rekhter, Y, "Border Gateway 663 Protocol 3 (BGP-3)", RFC 1105, October 1991. 665 [RFC1519] Fuller, V., Li. T., Yu J., and K. Varadhan, 666 "Classless Inter-Domain Routing (CIDR): an 667 Address Assignment and Aggregation Strategy", RFC 668 1519, September 1993. 670 [RFC1771] Rekhter, Y., and T. Li, "A Border Gateway 671 Protocol 4 (BGP-4)", RFC 1771, March 1995. 673 [RFC1772] Rekhter, Y., and P. Gross, Editors, "Application 674 of the Border Gateway Protocol in the Internet", 675 RFC 1772, March 1995. 677 [RFC1997] Chandra. R, and T. Li, "BGP Communities 678 Attribute", RFC 1997, August, 1996. 680 [RFC2439] Villamizar, C., Chandra, R., and R. Govindan, 681 "BGP Route Flap Damping", RFC 2439, November 682 1998. 684 [RFC2460] Deering, S, and R. Hinden, "Internet Protocol, 685 Version 6 (IPv6) Specification", RFC 2460, 686 December, 1998. 688 [RFC2622] C. Alaettinoglu, et al., "Routing Policy 689 Specification Language (RPSL)" RFC 2622, May, 690 1998. 692 [RFC2842] Chandra, R. and J. Scudder, "Capabilities 693 Advertisement with BGP-4", RFC 2842, May 2000. 695 [RFC3345] McPherson, D., Gill, V., Walton, D., and 696 A. Retana, "Border Gateway Protocol (BGP) Persistent 697 Route Oscillation Condition", RFC 3345, 698 August, 2002. 700 [ROUTEVIEWS] Meyer, D., "The Route Views Project", 701 http://www.routeviews.org 703 13. Author's Addresses 705 David Meyer 706 Email: dmm@maoz.com 708 Keyur Patel 709 Cisco Systems 710 Email: keyupate@cisco.com 712 14. Full Copyright Statement 714 Copyright (C) The Internet Society (2003). All Rights Reserved. 716 This document and translations of it may be copied and furnished to 717 others, and derivative works that comment on or otherwise explain it 718 or assist in its implementation may be prepared, copied, published 719 and distributed, in whole or in part, without restriction of any 720 kind, provided that the above copyright notice and this paragraph are 721 included on all such copies and derivative works. However, this 722 document itself may not be modified in any way, such as by removing 723 the copyright notice or references to the Internet Society or other 724 Internet organizations, except as needed for the purpose of 725 developing Internet standards in which case the procedures for 726 copyrights defined in the Internet Standards process must be 727 followed, or as required to translate it into languages other than 728 English. 730 The limited permissions granted above are perpetual and will not be 731 revoked by the Internet Society or its successors or assigns. 733 This document and the information contained herein is provided on an 734 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 735 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 736 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 737 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 738 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.