idnits 2.17.1 draft-ietf-idr-bgp-analysis-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 20 instances of too long lines in the document, the longest one being 4 characters in excess of 72. ** The abstract seems to contain references ([RFC1264], [RFC1774]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 491 has weird spacing: '... AS4 is a mul...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (April 2003) is 7675 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC1774' is mentioned on line 46, but not defined == Outdated reference: A later version (-26) exists of draft-ietf-idr-bgp4-19 ** Obsolete normative reference: RFC 1105 (Obsoleted by RFC 1163) -- Duplicate reference: RFC1105, mentioned in 'RFC1163', was also mentioned in 'RFC1105'. ** Obsolete normative reference: RFC 1105 (ref. 'RFC1163') (Obsoleted by RFC 1163) ** Obsolete normative reference: RFC 1264 (Obsoleted by RFC 4794) -- Duplicate reference: RFC1105, mentioned in 'RFC1267', was also mentioned in 'RFC1163'. ** Obsolete normative reference: RFC 1105 (ref. 'RFC1267') (Obsoleted by RFC 1163) ** Obsolete normative reference: RFC 1519 (Obsoleted by RFC 4632) ** Obsolete normative reference: RFC 1771 (Obsoleted by RFC 4271) ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200) ** Obsolete normative reference: RFC 2842 (Obsoleted by RFC 3392) ** Downref: Normative reference to an Informational RFC: RFC 3345 -- Possible downref: Non-RFC (?) normative reference: ref. 'ROUTEVIEWS' Summary: 15 errors (**), 0 flaws (~~), 5 warnings (==), 6 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 INTERNET-DRAFT David Meyer 3 draft-ietf-idr-bgp-analysis-01.txt Keyur Patel 4 Category Informational 5 Expires: October 2003 April 2003 7 BGP-4 Protocol Analysis 8 10 Status of this Document 12 This document is an Internet-Draft and is in full conformance with 13 all provisions of Section 10 of RFC2026. 15 Internet-Drafts are working documents of the Internet Engineering 16 Task Force (IETF), its areas, and its working groups. Note that 17 other groups may also distribute working documents as Internet- 18 Drafts. 20 Internet-Drafts are draft documents valid for a maximum of six months 21 and may be updated, replaced, or obsoleted by other documents at any 22 time. It is inappropriate to use Internet-Drafts as reference 23 material or to cite them other than as "work in progress." 25 The list of current Internet-Drafts can be accessed at 26 http://www.ietf.org/ietf/1id-abstracts.txt 28 The list of Internet-Draft Shadow Directories can be accessed at 29 http://www.ietf.org/shadow.html. 31 This document is a product of an individual. Comments are solicited 32 and should be addressed to the author(s). 34 Copyright Notice 36 Copyright (C) The Internet Society (2003). All Rights Reserved. 38 Abstract 40 The purpose of this report is to document how the requirements for 41 advancing a routing protocol from Draft Standard to full Standard 42 have been satisfied by Border Gateway Protocol version 4 (BGP-4). 44 This report satisfies the requirement for "the second report", as 45 described in Section 6.0 of RFC 1264 [RFC1264]. In order to fulfill 46 the requirement, this report augments RFC 1774 [RFC1774] and 47 summarizes the key features of BGP protocol, and analyzes the 48 protocol with respect to scaling and performance. 50 Table of Contents 52 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 53 2. Key Features and algorithms of the BGP protocol. . . . . . . . 4 54 2.1. Key Features. . . . . . . . . . . . . . . . . . . . . . . . 4 55 2.2. BGP Algorithms. . . . . . . . . . . . . . . . . . . . . . . 5 56 2.3. BGP Finite State Machine (FSM). . . . . . . . . . . . . . . 6 57 3. BGP Capabilities . . . . . . . . . . . . . . . . . . . . . . . 7 58 4. BGP Persistent Peer Oscillations . . . . . . . . . . . . . . . 8 59 5. BGP Performance characteristics and Scalability. . . . . . . . 8 60 5.1. Link bandwidth and CPU utilization. . . . . . . . . . . . . 8 61 5.1.1. CPU utilization. . . . . . . . . . . . . . . . . . . . . 9 62 5.1.2. Memory requirements. . . . . . . . . . . . . . . . . . . 11 63 6. BGP Policy Expressiveness and its Implications . . . . . . . . 12 64 6.1. Existence of Unique Stable Routings . . . . . . . . . . . . 13 65 6.2. Existence of Stable Routings. . . . . . . . . . . . . . . . 14 66 7. Applicability. . . . . . . . . . . . . . . . . . . . . . . . . 15 67 8. Acknowledgments. . . . . . . . . . . . . . . . . . . . . . . . 16 68 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17 69 10. Author's Address. . . . . . . . . . . . . . . . . . . . . . . 18 70 11. Full Copyright Statement. . . . . . . . . . . . . . . . . . . 18 72 1. Introduction 74 BGP-4 is an inter-autonomous system routing protocol designed for 75 TCP/IP internets. Version 1 of the BGP protocol was published in RFC 76 1105 [RFC1105]. Since then BGP versions 2, 3, and 4 have been 77 developed. Version 2 was documented in RFC 1163 [RFC1163]. Version 3 78 is documented in RFC 1267 [RFC1267]. Version 4 is documented in the 79 [BGP4] (version 4 of BGP will hereafter be referred to as BGP). The 80 changes between versions are explained in Appendix A of [BGP4]. 81 Possible applications of BGP in the Internet are documented in RFC 82 1772 [RFC1772]. 84 2. Key Features and algorithms of the BGP protocol 86 This section summarizes the key features and algorithms of the BGP 87 protocol. BGP is an inter-autonomous system routing protocol; it is 88 designed to be used between multiple autonomous systems. BGP assumes 89 that routing within an autonomous system is done by an intra- 90 autonomous system routing protocol. BGP does not make any assumptions 91 about intra-autonomous system routing protocols deployed within the 92 various autonomous systems. Specifically, BGP does not require all 93 autonomous systems to run the same intra-autonomous system routing 94 protocol (i.e., interior gateway protocol or IGP). 96 Finally, note that BGP is a real inter-autonomous system routing 97 protocol, and as such it imposes no constraints on the underlying 98 Internet topology. The information exchanged via BGP is sufficient to 99 construct a graph of autonomous systems connectivity from which 100 routing loops may be pruned and many routing policy decisions at the 101 autonomous system level may be enforced. 103 2.1. Key Features 105 The key features of the protocol are the notion of path attributes 106 and aggregation of network layer reachability information (NLRI). 107 Path attributes provide BGP with flexibility and extensibility. Path 108 attributes are partitioned into well-known and optional. The 109 provision for optional attributes allows experimentation that may 110 involve a group of BGP routers without affecting the rest of the 111 Internet. New optional attributes can be added to the protocol in 112 much the same way that new options are added to, say, the Telnet 113 protocol [RFC854]. 115 One of the most important path attributes is the AS_PATH. AS 116 reachability information traverses the Internet, this information is 117 augmented by the list of autonomous systems that have been traversed 118 thus far, forming the AS_PATH. The AS_PATH allows straightforward 119 suppression of the looping of routing information. In addition, the 120 AS_PATH serves as a powerful and versatile mechanism for policy-based 121 routing. 123 BGP enhances the AS_PATH attribute to include sets of autonomous 124 systems as well as lists via the AS_SET attribute. This extended 125 format allows generated aggregate routes to carry path information 126 from the more specific routes used to generate the aggregate. It 127 should be noted however, that as of this writing, AS_SETs are rarely 128 used in the Internet [ROUTEVIEWS]. 130 2.2. BGP Algorithms 132 BGP uses an algorithm that is neither a pure distance vector 133 algorithm or a pure link state algorithm. It is instead a modified 134 distance vector algorithm that uses path information to avoid 135 traditional distance vector problems. Each route within BGP pairs 136 destination with path information to that destination. Path 137 information (also known as AS_PATH information) is stored within the 138 AS_PATH attribute in BGP. This allows BGP to reconstruct large 139 portions of overall topology whenever required. 141 BGP uses an incremental update strategy in order to conserve 142 bandwidth and processing power. That is, after initial exchange of 143 complete routing information, a pair of BGP routers exchanges only 144 changes to that information. Such an incremental update design 145 requires reliable transport between a pair of BGP routers to function 146 correctly. BGP solves this problem by using TCP for reliable 147 transport. 149 In addition to incremental updates, BGP has added the concept of 150 route aggregation so that information about groups of networks may be 151 aggregated and sent as a single Network Layer Reachability (NLRI). 153 Finally, note that BGP is a self-contained protocol. That is, BGP 154 specifies how routing information is exchanged both between BGP 155 speakers in different autonomous systems, and between BGP speakers 156 within a single autonomous system. 158 2.3. BGP Finite State Machine (FSM) 160 The BGP FSM is a set of rules that are applied to a BGP speaker's set 161 of configured peers for the BGP operation. A BGP implementation 162 requires that a BGP speaker must connect to and listen on TCP port 163 179 for accepting any new BGP connections from it's peers. The BGP 164 Finite State Machine, or FSM, must be initiated and maintained for 165 each new incoming and outgoing peer connections. However, in steady 166 state operation, there will be only one BGP FSM per connection per 167 peer. 169 There may exist a temporary period where in a BGP peer may have 170 separate incoming and outgoing connections resulting into two 171 different BGP FSMs for a peer (instead of one). This can be resolved 172 following BGP connection collision rules defined in the [BGP4]. 174 Following are different states of BGP FSM for its peers: 176 IDLE: State when BGP peer refuses any incoming 177 connections. 179 CONNECT: State in which BGP peer is waiting for 180 its TCP connection to be completed. 182 ACTIVE: State in which BGP peer is trying to acquire a 183 peer by listening and accepting TCP connection. 185 OPENSENT: BGP peer is waiting for OPEN message from its 186 peer. 188 OPENCONFIRM: BGP peer is waiting for KEEPALIVE or NOTIFICATION 189 message from its peer. 191 ESTABLISHED: BGP peer connection is established and exchanges 192 UPDATE, NOTIFICATION, and KEEPALIVE messages with 193 its peer. 195 There are different BGP events that operates on above mentioned states 196 of BGP FSM for its peers. These BGP events are used for initiating and 197 terminating peer connections. They also assist BGP in identifying any 198 persistent peer connection oscillations and provides mechanism 199 for controlling it. 201 Following are different BGP events: 203 Manual Start: Manually start the peer connection. 205 Manual Stop: Manually stop the peer connection. 207 Automatic Start: Local system automatically starts the peer 208 connection. 210 Manual start with 211 passive TCP flag: Local system administrator manually starts the 212 peer connection with peer in passive mode. 214 Automatic start 215 with passive TCP flag: Local system administrator automatically starts 216 the peer connection with peer in passive mode. 218 Automatic start 219 with bgp_stop_flap 220 option set: Local system administrator automatically starts 221 the peer connection with peer oscillation 222 damping enabled 224 Automatic start with 225 bgp_stop_flap option 226 set and passive TCP 227 establishment 228 option set: Local system administrator automatically starts 229 the peer connection with peer oscillation 230 damping enabled and with peer in passive mode. 232 Automatic stop: Local system automatically stops the 233 BGP connection. 235 Both, Manual Start and Manual Stop are mandatory BGP events. All 236 other events are optional. 238 3. BGP Capabilities 240 The BGP Capability mechanism [RFC2842] provides easy and flexible way 241 to introduce new features within the protocol. In particular, the BGP 242 capability mechanism allows peers to negotiate various optional 243 features during startup. This allows the base BGP protocol to contain 244 only essential functionality, while at the same time providing a 245 flexible mechanism for signaling protocol extensions. 247 4. BGP Persistent Peer Oscillations 249 Ideally, whenever a BGP speaker detects an error in any peer 250 connection, it shuts down the peer and changes its FSM state to IDLE. 251 BGP speaker requires a Start event to re-initiate its idle peer 252 connection. If error remains persistent and BGP speaker generates 253 Start event automatically then it may result in persistent peer 254 flapping. However, although peer oscillation is found to be wide- 255 spread in BGP implementations, methods for preventing persistent peer 256 oscillations are outside the scope of base BGP protocol 257 specification. 259 5. BGP Performance characteristics and Scalability 261 In this section, we provide "order of magnitude" answers to the 262 questions of how much link bandwidth, router memory and router CPU 263 cycles the BGP protocol will consume under normal conditions. In 264 particular, we will address the scalability of BGP and its 265 limitations. 267 It is important to note that BGP does not require all the routers 268 within an autonomous system to participate in the BGP protocol. In 269 particular, only the border routers that provide connectivity between 270 the local autonomous system and their adjacent autonomous systems 271 need participate in BGP. The ability to constraint the set of BGP 272 speakers is one way to address scaling issues. 274 5.1. Link bandwidth and CPU utilization 276 Immediately after the initial BGP connection setup, BGP peers 277 exchange complete set of routing information. If we denote the total 278 number of routes in the Internet by N, the mean AS distance of the 279 Internet by M (distance at the level of an autonomous system, 280 expressed in terms of the number of autonomous systems), the total 281 number of autonomous systems in the Internet by A, and assume that 282 the networks are uniformly distributed among the autonomous systems, 283 then the worst case amount of bandwidth consumed during the initial 284 exchange between a pair of BGP speakers is 286 MR = O(N + M * A) 288 The following table illustrates the typical amount of bandwidth 289 consumed during the initial exchange between a pair of BGP speakers 290 based on the above assumptions (ignoring bandwidth consumed by the 291 BGP Header). For purposes of the estimates here, we will calculate MR 292 = 4 * (N + (M * A)). 294 # NLRI Mean AS Distance # AS's Bandwidth (MR) 295 ---------- ---------------- ------ ---------------- 296 40,000 15 400 184,000 bytes 297 100,000 10 10,000 800,000 bytes 298 120,000 10 15,000 1,080,000 bytes 299 140,000 15 20,000 1,760,000 bytes 301 [note that most of this bandwidth is consumed by the NLRI exchange] 303 BGP was created specifically to reduce the size of the set of NLRI 304 entries which have to be carried and exchanged by border routers. The 305 aggregation scheme, defined in RFC 1519 [RFC1519], describes the 306 provider-based aggregation scheme in use in today's Internet. 308 Due to the advantages of advertising a few large aggregate blocks 309 instead of many smaller class-based individual networks, it is 310 difficult to estimate the actual reduction in bandwidth and 311 processing that BGP has provided over BGP-3. If we simply enumerate 312 all aggregate blocks into their individual class-based networks, we 313 would not take into account "dead" space that has been reserved for 314 future expansion. The best metric for determining the success of 315 BGP's aggregation is to sample the number NLRI entries in the 316 globally connected Internet today and compare it to projected growth 317 rates before BGP was deployed. 319 At the time of this writing, the full set of exterior routes carried 320 by BGP approximately 120,000 network entries [ROUTEVIEWS]. 322 5.1.1. CPU utilization 324 An important and fundamental feature of BGP is that BGP's CPU 325 utilization depends only on the stability of the Internet. If the 326 Internet is stable, then the only link bandwidth and router CPU 327 cycles consumed by BGP are due to the exchange of the BGP KEEPALIVE 328 messages. The KEEPALIVE messages are exchanged only between peers. 329 The suggested frequency of the exchange is 30 seconds. The KEEPALIVE 330 messages are quite short (19 octets), and require virtually no 331 processing. As a result, the bandwidth consumed by the KEEPALIVE 332 messages is about 5 bits/sec. Operational experience confirms that 333 the overhead (in terms of bandwidth and CPU) associated with the 334 KEEPALIVE messages should be viewed as negligible. 336 During periods of Internet instability, changes to the reachability 337 information are passed between routers in UPDATE messages. If we 338 denote the number of routing changes per second by C, then in the 339 worst case the amount of bandwidth consumed by the BGP can be 340 expressed as O(C * M). The greatest overhead per UPDATE message 341 occurs when each UPDATE message contains only a single network. It 342 should be pointed out that in practice routing changes exhibit strong 343 locality with respect to the AS path. That is routes that change are 344 likely to have common AS path. In this case multiple networks can be 345 grouped into a single UPDATE message, thus significantly reducing the 346 amount of bandwidth required (see also Appendix F.1 of [BGP4]). 348 Since in the steady state the link bandwidth and router CPU cycles 349 consumed by the BGP protocol are dependent only on the stability of 350 the Internet, it follows that BGP should have no scaling problems in 351 the areas of link bandwidth and router CPU utilization. This assumes 352 that as the Internet grows, the overall stability of the inter-AS 353 connectivity of the Internet can be controlled. In particular, while 354 the size of the IPv4 Internet routing table is bounded by O(2^32 * 355 M), (where M is a slow-moving function describing the AS 356 interconnectivity of the network), no such bound can be formulated 357 for the dynamic properties (i.e., stability) of BGP. Finally, since 358 the dynamic properties of the network cannot be quantitatively 359 bounded, stability must be addressed via heuristics such as BGP 360 Route Flap Dampening [RFC2439]. Due to the nature of BGP, such 361 dampening should be viewed as a local to an autonomous system matter 362 (see also Appendix F.2 of [BGP4]). 364 It may also be instructive to compare bandwidth and CPU requirements 365 of BGP with EGP. While with BGP the complete information is exchanged 366 only at the connection establishment time, with EGP the complete 367 information is exchanged periodically (usually every 3 minutes). Note 368 that both for BGP and for EGP the amount of information exchanged is 369 roughly on the order of the networks reachable via a peer that sends 370 the information. Therefore, even if one assumes extreme instabilities 371 of BGP, its worst case behavior will be the same as the steady state 372 behavior of it's predecessor, EGP. 374 Operational experience with BGP showed that the incremental update 375 approach employed by BGP provides qualitative improvement in both 376 bandwidth and CPU utilization when compared with complete periodic 377 updates used by EGP (see also presentation by Dennis Ferguson at the 378 Twentieth IETF, March 11-15, 1991, St.Louis). 380 5.1.2. Memory requirements 382 To quantify the worst case memory requirements for BGP, we denote the 383 total number of networks in the Internet by N, the mean AS distance 384 of the Internet by M (distance at the level of an autonomous system, 385 expressed in terms of the number of autonomous systems), the total 386 number of autonomous systems in the Internet by A, and the total 387 number of BGP speakers that a system is peering with by K (note that 388 K will usually be dominated by the total number of the BGP speakers 389 within a single autonomous system). Then the worst case memory 390 requirements (MR) can be expressed as 392 MR = O((N + M * A) * K) 394 It is interesting to note that prior to the introduction of BGP in 395 the NSFNET Backbone, memory requirements on the NSFNET Backbone 396 routers running EGP were on the order of O(N *K). Therefore, the 397 extra overhead in memory incurred by modern routers running BGP is 398 less than 7 percent. 400 Since a mean AS distance M is a slow moving function of the 401 interconnectivity ("meshiness") of the Internet, for all practical 402 purposes the worst case router memory requirements are on the order 403 of the total number of networks in the Internet times the number of 404 peers the local system is peering with. We expect that the total 405 number of networks in the Internet will grow much faster than the 406 average number of peers per router. As a result, BGP's memory 407 scaling properties are linearly related to the total number of 408 networks in the Internet. 410 The following table illustrates typical memory requirements of a 411 router running BGP. It is assumed that each network is encoded as 412 four bytes, each AS is encoded as two bytes, and each networks is 413 reachable via some fraction of all of the peers (# BGP peers/per 414 net). For purposes of the estimates here, we will calculate MR = 415 ((N*4) + (M*A)*2) * K. 417 # Networks Mean AS Distance # AS's # BGP peers/per net Memory Req (MR) 418 ---------- ---------------- ------ ------------------- -------------- 419 100,000 20 3,000 20 1,040,000 420 100,000 20 15,000 20 1,040,000 421 120,000 10 15,000 100 75,000,000 422 140,000 15 20,000 100 116,000,000 424 In analyzing BGP's memory requirements, we focus on the size of the 425 forwarding table (and ignoring implementation details). In 426 particular, we derive upper bounds for the size of the forwarding 427 table. For example, at the time of this writing, the forwarding 428 tables of a typical backbone router carries on the order of 120,000 429 entries. Given this number, one might ask whether it would be 430 possible to have a functional router with a table that will have 431 1,000,000 entries. Clearly the answer to this question is independent 432 of BGP. Interestingly, in his review of the BGP protocol for the BGP 433 review committee in March of 1990, Paul Tsuchiya noted that "BGP does 434 not scale well. This is not really the fault of BGP. It is the fault 435 of the flat IP address space. Given the flat IP address space, any 436 routing protocol must carry network numbers in its updates." The 437 introduction of the provider based aggregation schemes (e.g., CIDR 438 [RFC1519]) have sought to address this issue, to the extent possible, 439 within the context of current addressing architectures. 441 6. BGP Policy Expressiveness and its Implications 443 BGP is unique among deployed IP routing protocols in that routing is 444 determined using semantically rich routing policies. Although 445 routing policies are usually the first thing that comes to a network 446 operator's mind concerning BGP, it is important to note that the 447 languages and techniques for specifying BGP routing policies are not 448 actually a part of the protocol specification (see RFC 2622 [RFC2622] 449 for an example of such a policy language). In addition, the BGP 450 specification contains few restrictions, either explicitly or 451 implicitly, on routing policy languages. These languages have 452 typically been developed by vendors and have evolved through 453 interactions with network engineers in an environment lacking vendor- 454 independent standards. 456 The complexity of typical BGP configurations, at least in provider 457 networks, has been steadily increasing. Router vendors typically 458 provided hundreds of special commands for use in the configuration of 459 BGP, and this command set is continually expanding. For example, BGP 460 communities [RFC1997] allow policy writers to selectively attach tags 461 to routes and use these to signal policy information to other BGP- 462 speaking routers. Many providers allow customers, and sometimes 463 peers, to send communities that determine the scope and preference of 464 their routes. These developments have more and more given the task of 465 writing BGP configurations aspects associated with open-ended 466 programming. This has allowed network operators to encode complex 467 policies in order to address many unforeseen situations, and has 468 opened the door for a great deal of creativity and experimentation in 469 routing policies. This policy flexibility is one of the main reasons 470 why BGP is so well suited to the commercial environment of the 471 current Internet. 473 However, this rich policy expressiveness has come with a cost that is 474 often not recognized. In particular, it is possible to construct 475 locally defined routing policies that can lead to unexpected global 476 routing anomalies such as (unintended) nondeterminism and to protocol 477 divergence. If the interacting policies causing such anomalies are 478 defined in different autonomous systems, then these problems can be 479 very difficult to debug and correct. In the following sections, we 480 describe two such cases relating to the existence (or lack thereof) 481 of stable routings. 483 6.1. Existence of Unique Stable Routings 485 One can easily construct sets of policies for which BGP can not 486 guarantee that stable routings are unique. This can be illustrated by 487 the following simple example. Consider the example of four Autonomous 488 Systems, AS1, AS2, AS3, and AS4. AS1 and AS2 are peers, and they 489 provide transit for AS3 and AS4 respectively, Suppose further that 490 AS3 provides transit for AS4 (in this case AS3 is a customer of AS1, 491 and AS4 is a multihomed customer of both AS3 and AS4). AS4 may want 492 to use the link to AS3 as a "backup" link, and sends AS3 a community 493 value that AS3 has configured to lower the preference of AS4's routes 494 to a level below that of its upstream provider, AS1. The intended 495 "backup routing" to AS4 is illustrated here: 497 AS1 ------> AS2 498 /|\ | 499 | | 500 | | 501 | \|/ 502 AS3 ------- AS4 504 That is, the AS3-AS4 link is intended to be used only when the 505 AS2-AS4 link is down (for outbound traffic, AS4 simply gives routes 506 from AS2 a higher local preference). This is a common scenario in 507 today's Internet. But note that this configuration has another 508 stable solution: 510 AS1 ------- AS2 511 | | 512 | | 513 | | 514 \|/ \|/ 515 AS3 ------> AS4 517 In this case, AS3 does not translate the "depref my route" community 518 received from AS4 into a "depref my route" community for AS1, and so 519 if AS1 hears the route to AS4 that transits AS3 it will prefer that 520 route (since AS3 is a customer). This state could be reached, for 521 example, by starting in the "correct" backup routing shown first, 522 bringing down the AS2-AS4 BGP session, and then bringing it back up. 523 In general, BGP has no way to prefer the "intended" solution over the 524 anomalous one, and which is picked will depend on the unpredictable 525 order of BGP messages. 527 While this example is relatively simple, many operators may fail to 528 recognize that the true source of the problem is that the BGP 529 policies of ASes can interact in unexpected ways, and that these 530 interactions can result in multiple stable routings. One can imagine 531 that the interactions could be much more complex in the real 532 Internet. We suspect that such anomalies will only become more common 533 as BGP continues to evolve with richer policy expressiveness. For 534 example, extended communities provide an even more flexible means of 535 signaling information within and between autonomous systems than is 536 possible with RFC 1997 communities. At the same time, applications of 537 communities by network operators are evolving to address complex 538 issues of inter-domain traffic engineering. 540 6.2. Existence of Stable Routings 542 One can also construct a set of policies for which BGP can not 543 guarantee that a stable routing exists (or worse, that a stable 544 routing will ever be found). For example, RFC 3345 [RFC3345] 545 documents several scenarios that lead to route oscillations 546 associated with the use of MEDs. Route oscillation will happen in BGP 547 when a set of policies has no solution. That is, when there is no 548 stable routing that satisfies the constraints imposed by policy, then 549 BGP has no choice by to keep trying. In addition, BGP configurations 550 can have a stable routing, yet the protocol may not be able to find 551 it; BGP can "get trapped" down a blind alley that has no solution. 553 Protocol divergence is not, however, a problem associated solely with 554 use of the MED attribute. This potential exists in BGP even without 555 the use of the MED attribute. Hence, like the unintended 556 nondeterminism described in the previous section, this type of 557 protocol divergence is an unintended consequence of the unconstrained 558 nature of BGP policy languages. 560 7. Applicability 562 In this section we answer the question of which environments is BGP 563 well suited, and for which environments it is not suitable. This 564 question is partially answered in Section 2 of RFC 1771 [RFC1771], 565 which states: 567 "To characterize the set of policy decisions that can be enforced 568 using BGP, one must focus on the rule that an AS advertises to its 569 neighbor ASs only those routes that it itself uses. This rule 570 reflects the "hop-by-hop" routing paradigm generally used 571 throughout the current Internet. Note that some policies cannot 572 be supported by the "hop-by-hop" routing paradigm and thus require 573 techniques such as source routing to enforce. For example, BGP 574 does not enable one AS to send traffic to a neighbor AS intending 575 that the traffic take a different route from that taken by traffic 576 originating in the neighbor AS. On the other hand, BGP can 577 support any policy conforming to the "hop-by-hop" routing 578 paradigm. Since the current Internet uses only the "hop-by-hop" 579 routing paradigm and since BGP can support any policy that 580 conforms to that paradigm, BGP is highly applicable as an inter-AS 581 routing protocol for the current Internet." 583 One of the important points here is that the BGP protocol contains 584 only the functionality that is essential, while at the same time 585 providing a flexible mechanism within the protocol that allow us to 586 extend its functionality. For example, BGP capabilities provide an 587 easy and flexible way to introduce new features within the protocol. 588 Finally, since BGP was designed with flexibility and extensibility in 589 mind, new and/or evolving requirements can be addressed via existing 590 mechanisms. 592 To summarize, BGP is well suitable as an inter-autonomous system 593 routing protocol for the IPv4 Internet that is based on IP [RFC791] 594 as the Internet Protocol and "hop-by-hop" routing paradigm. Finally, 595 BGP is equally applicable to IPv6 [RFC2460] internets. 597 8. Acknowledgments 599 We would like to thank Paul Traina for authoring previous versions of 600 this document. Tim Griffin also provided many insightful comments on 601 earlier versions of this document. 603 9. References 605 [BGP4] Rekhter, Y., T. Li., and Hares. S, Editors, "A 606 Border Gateway Protocol 4 (BGP-4)", 607 draft-ietf-idr-bgp4-19.txt. Work in progress. 609 [RFC791] "INTERNET PROTOCOL", DARPA INTERNET PROGRAM 610 PROTOCOL SPECIFICATION, RFC 791, September, 611 1981. 613 [RFC854] Postel, J. and Reynolds, J., "TELNET PROTOCOL 614 SPECIFICATION", RFC 854, May, 1983. 616 [RFC1105] Lougheed, K., and Rekhter, Y, "Border Gateway 617 Protocol BGP", RFC 1105, June 1989. 619 [RFC1163] Lougheed, K., and Rekhter, Y, "Border Gateway 620 Protocol BGP", RFC 1105, June 1990. 622 [RFC1264] Hinden, R., "Internet Routing Protocol 623 Standardization Criteria", RFC 1264, October 1991. 625 [RFC1267] Lougheed, K., and Rekhter, Y, "Border Gateway 626 Protocol 3 (BGP-3)", RFC 1105, October 1991. 628 [RFC1519] Fuller, V., Li. T., Yu J., and K. Varadhan, 629 "Classless Inter-Domain Routing (CIDR): an 630 Address Assignment and Aggregation Strategy", RFC 631 1519, September 1993. 633 [RFC1771] Rekhter, Y., and T. Li, "A Border Gateway 634 Protocol 4 (BGP-4)", RFC 1771, March 1995. 636 [RFC1772] Rekhter, Y., and P. Gross, Editors, "Application 637 of the Border Gateway Protocol in the Internet", 638 RFC 1772, March 1995. 640 [RFC1997] Chandra. R, and T. Li, "BGP Communities 641 Attribute", RFC 1997, August, 1996. 643 [RFC2439] Villamizar, C., Chandra, R., and Govindan, R., 644 "BGP Route Flap Damping", RFC 2439, November 645 1998. 647 [RFC2460] Deering, S, and R. Hinden, "Internet Protocol, 648 Version 6 (IPv6) Specification", RFC 2460, 649 December, 1998. 651 [RFC2622] C. Alaettinoglu, et al., "Routing Policy 652 Specification Language (RPSL)" RFC 2622, May, 653 1998. 655 [RFC2842] Chandra, R. and J. Scudder, "Capabilities 656 Advertisement with BGP-4", RFC 2842, May 2000. 658 [RFC3345] McPherson, D., Gill, V., Walton, D., and 659 Retana, A, "Border Gateway Protocol (BGP) Persistent 660 Route Oscillation Condition", RFC 3345, 661 August, 2002. 663 [ROUTEVIEWS] Meyer, David, "The Route Views Project", 664 http://www.routeviews.org 666 10. Author's Address 668 David Meyer 669 Email: dmm@maoz.com 671 Keyur Patel 672 Cisco Systems 673 Email: keyupate@cisco.com 675 11. Full Copyright Statement 677 Copyright (C) The Internet Society (2003). All Rights Reserved. 679 This document and translations of it may be copied and furnished to 680 others, and derivative works that comment on or otherwise explain it 681 or assist in its implementation may be prepared, copied, published 682 and distributed, in whole or in part, without restriction of any 683 kind, provided that the above copyright notice and this paragraph are 684 included on all such copies and derivative works. However, this 685 document itself may not be modified in any way, such as by removing 686 the copyright notice or references to the Internet Society or other 687 Internet organizations, except as needed for the purpose of 688 developing Internet standards in which case the procedures for 689 copyrights defined in the Internet Standards process must be 690 followed, or as required to translate it into languages other than 691 English. 693 The limited permissions granted above are perpetual and will not be 694 revoked by the Internet Society or its successors or assigns. 696 This document and the information contained herein is provided on an 697 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 698 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 699 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 700 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 701 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.