idnits 2.17.1 draft-iab-bgparch-00.txt: ** The Abstract section seems to be numbered Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document is more than 15 pages and seems to lack a Table of Contents. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Introduction section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 872: '... protocol design SHOULD therefore cons...' Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 598 has weird spacing: '...ability also ...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. 
If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (February 2001) is 8471 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- -- Looks like a reference, but probably isn't: '1' on line 12 == Missing Reference: 'RFC 1338' is mentioned on line 144, but not defined ** Obsolete undefined reference: RFC 1338 (Obsoleted by RFC 1519) == Missing Reference: 'Labowitz 2000' is mentioned on line 388, but not defined == Unused Reference: 'Bates 2000' is defined on line 896, but no explicit reference was found in the text == Unused Reference: 'Chen 2000' is defined on line 899, but no explicit reference was found in the text == Unused Reference: 'Labowitz' is defined on line 906, but no explicit reference was found in the text Summary: 9 errors (**), 0 flaws (~~), 6 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Architecture Board G. Huston 2 Internet Draft Telstra 3 Document: draft-iab-bgparch-00.txt February 2001 4 Category: Informational 6 Architectural Requirements for Inter-Domain Routing in the Internet 8 Status of this Memo 10 This document is an Internet-Draft and is in full conformance with 11 all provisions of Section 10 of RFC2026 [1]. 13 Internet-Drafts are working documents of the Internet Engineering 14 Task Force (IETF), its areas, and its working groups. Note that 15 other groups may also distribute working documents as Internet- 16 Drafts. 
Internet-Drafts are draft documents valid for a maximum of 17 six months and may be updated, replaced, or obsoleted by other 18 documents at any time. It is inappropriate to use Internet-Drafts 19 as reference material or to cite them other than as "work in 20 progress." 22 The list of current Internet-Drafts can be accessed at 23 http://www.ietf.org/ietf/1id-abstracts.txt 25 The list of Internet-Draft Shadow Directories can be accessed at 26 http://www.ietf.org/shadow.html. 28 1. Abstract 30 This draft examines the various longer term trends visible within 31 the characteristics of the Internet's BGP table and identifies a 32 number of operational practices and protocol factors which 33 contribute to these trends. The potential impacts of these practices 34 and protocol properties on the scaling properties of the inter- 35 domain routing space are examined. 37 These impacts include the potential for exhaustion of the existing 38 Autonomous System number space, increasing convergence times for 39 selection of stable alternate paths following withdrawal of route 40 announcements, the stability of table entries, and the average 41 prefix length of entries in the BGP table. The larger long term 42 issue is that of an increasingly dense inter-connectivity mesh 43 between AS's, causing a finer degree of granularity of inter-domain 44 policy and finer levels of control to undertake inter-domain traffic 45 engineering. 47 Various approaches to a refinement of the inter-domain routing 48 protocol and associated operating practices that may provide 49 superior scaling properties are identified as an area for further 50 investigation. 52 2. Network Scale and Inter-Domain Routing 54 Are there inherent scaling limitations in the technology of the 55 Internet, or its architecture of deployment, that may impact on the 56 ability of the Internet to meet escalating levels of demand? There 57 are a number of potential areas to search for such limitations.
58 These include the capacity of transmission systems, packet switching 59 capacity, the continued availability of protocol addresses, and the 60 capability of the routing system to produce a stable view of the 61 overall topology of the network. In this study we will look at this 62 latter capability with a view to identifying some aspects of the 63 scaling properties of the Internet's routing system. 65 The basic structure of the Internet is a collection of networks, or 66 Autonomous Systems (AS's) which are interconnected to form a 67 connected domain. Each AS uses an interior routing system to 68 maintain a coherent view of the topology within the AS, and uses an 69 exterior routing system to maintain adjacency information with 70 neighboring AS's and thereby create a view of the connectivity of 71 the entire system. This network-wide connectivity is described in 72 the routing table used by the BGP4 protocol. Each entry in the table 73 refers to a distinct route. The attributes of the route are used to 74 determine the best path from the local AS to the AS that is 75 originating the route. Determining the 'best path' in this case is 76 determining which routing advertisement and associated next hop 77 address is the most preferred. The BGP routing system is not aware 78 of finer level of topology within the local AS or within any remote 79 AS. From this perspective BGP can be seen as a connectivity 80 maintenance protocol, and the BGP routing table, a description of 81 the current connectivity of the Internet, using an AS as the basic 82 element of computation. 84 There is an associated dimension of policy determination within the 85 routing table. If an AS advertises a route to a neighboring AS, the 86 local AS is offering to accept traffic from the neighboring AS which 87 is ultimately destined to addresses described by the advertised 88 routing entry. 
If the local AS does not originate the route, then 89 the inference is that the local AS is willing to undertake the role 90 of transit provider for this traffic on behalf of some third party. 91 Similarly, an AS may or may not choose to accept a route from a 92 neighbor. Accepting a route implies that under some circumstances, 93 as determined by the local route selection parameters, the local AS 94 will use the neighboring AS to reach addresses spanned by the 95 route. The BGP routing domain maintains a coherent view of the 96 connectivity of the inter-AS domain, where connectivity is expressed 97 as a preference for 'shortest paths' to reach any destination 98 address, modulated by the connectivity policies expressed by each 99 AS, and coherence is expressed as a global constraint that none of 100 the paths contains loops or dead ends. The elements of the BGP 101 routing domain are routing entries, expressed as a span of 102 addresses. All addresses advertised within each routing entry share 103 a common origin AS and a common connectivity policy. The total size 104 of the BGP table is therefore a metric of the number of distinct 105 routes within the Internet, where each route describes a contiguous 106 set of addresses which share a common origin AS and a common 107 reachability policy. 109 When the scaling properties of the Internet were studied in the 110 early 1990s two critical factors identified in the study were, not 111 surprisingly, routing and addressing [RFC 1287]. As more devices 112 connect to the Internet they consume addresses, and the associated 113 function of maintaining reachability information for these 114 addresses, with an assumption of an associated growth in the number 115 of distinct provider networks and the number of distinct 116 connectivity policies, implies ever larger routing tables.
The work 117 in studying the limitations of the 32 bit IPv4 address space 118 produced a number of outcomes, including the specification of IPv6 119 [RFC ], as well as the refinement of techniques of network 120 address translation [RFC ] intended to allow some degree of 121 transparent interaction between two networks using different address 122 realms. Growth in the routing system is not directly addressed by 123 these approaches, as the routing space is the cross product of the 124 complexity of the inter-AS topology of the network, multiplied by 125 the number of distinct connectivity policies multiplied by the 126 degree of fragmentation of the address space. For example, use of 127 NAT may reduce the pressure on the number of public addresses 128 required by a single connected network, but it does not necessarily 129 imply that the network's connectivity policies can be subsumed 130 within the aggregated policy of a single upstream provider. 132 When a network advertises a block of addresses into the exterior 133 routing space this entry is generally carried across the entire 134 exterior routing domain of the Internet. To measure the common 135 characteristics of the global routing table, it is necessary to 136 establish a point in the default-free part of the exterior routing 137 domain and examine the BGP routing table that is visible at that 138 point. 140 3. Measurements of the total size of the BGP Table 142 Measurements of the size of the routing table were somewhat sporadic 143 to start, and a number of measurements were taken at approximately 144 monthly intervals from 1988 until 1992 by Merit [RFC 1338]. This 145 effort was resumed in 1994 by Erik-Jan Bos at Surfnet in the 146 Netherlands, who commenced measuring the size of the BGP table at 147 hourly intervals.
This measurement technique was adopted by 148 the author in 1997, using a measurement point located at the edge of 149 AS 1221 in Australia, again using an hourly interval for the 150 measurement. The initial measurements were of the number of routing 151 entries contained within the set of selected best paths. These 152 measurements were expanded to include the number of AS numbers, 153 number of AS paths, and a set of measurements relating to the prefix 154 size of routing table entries. 156 We now have a view of the dynamics of the Internet's routing table 157 growth which spans some 13 years, and a very detailed view spanning 158 the most recent seven years [Huston 2001]. Looking at just the total 159 size of the BGP routing table over this period, it is possible to 160 identify four distinct phases of inter-AS routing practice in the 161 Internet. 163 3.1 Pre-CIDR Growth 165 The initial characteristics of the routing table size from 1988 166 until April 1994 show definite characteristics of exponential 167 growth. If continued unchecked, this growth would have led to 168 saturation of the available BGP routing table space in the non- 169 default routers of the time within a small number of years. 171 Estimates of the time at which this would have happened varied 172 somewhat from study to study, but the overall general theme of these 173 observations was that the growth rates of the BGP routing table were 174 exceeding the growth in hardware and software capability of the 175 deployed network, and that at some point in the mid-90's, the BGP 176 table size would have grown to the point where it was larger than 177 the capabilities of available equipment to support. 179 3.2 CIDR Deployment 181 The response from the engineering community was the introduction of 182 a hierarchy into the inter-domain routing system.
The intent of the 183 hierarchical routing structure was to allow a provider to merge the 184 routing entries for its customers into a single routing entry which 185 spanned its entire customer base. The practical aspect of this 186 change was the introduction of routing protocols which dispensed 187 with the requirement for the Class A, B and C address delineation, 188 replacing this scheme with a routing system that carried an address 189 prefix and an associated prefix length. This approach was termed 190 Classless Inter-Domain Routing (CIDR). 192 A concerted effort was undertaken in 1994 and 1995 to deploy CIDR 193 routing in the Internet, based on encouraging deployment of the 194 CIDR-capable version of the BGP protocol, BGP4 [RFC ]. The 195 effects of this effort are visible in the history of the routing 196 table, where the routing table remained constant for some 14 months 197 at 20,000 entries in 1994 and 1995. 199 The intention of CIDR was one of hierarchical provider address 200 aggregation, where a network provider is allocated an address block 201 from an address registry, and the provider announces this entire 202 block into the exterior routing domain as a single entry with a 203 single routing policy. Customers of the provider use a sub- 204 allocation from this address block, and these smaller routing 205 elements are aggregated by the provider and not directly passed into 206 the exterior routing domain. During 1994 the size of the routing 207 table remained relatively constant at some 20,000 entries as the 208 growth in the number of providers announcing address blocks was 209 matched by a corresponding reduction in the number of address 210 announcements as a result of CIDR aggregation. 212 3.3 CIDR Growth 214 For the next four years until the start of 1998, CIDR proved 215 effective in damping unconstrained growth in the BGP routing table.
216 During this period, the BGP table grew at an approximately linear 217 rate, adding some 10,000 entries per year. 219 A close examination of the table reveals a greater level of 220 stability in the routing system at this time. The short term 221 (hourly) variation in the number of announced routes reduced, both 222 as a percentage of the number of announced routes, and also in 223 absolute terms. One of the other benefits of using large aggregate 224 address blocks is that an instability at the edge of the network is 225 not immediately propagated into the routing core. The instability at 226 the last hop is absorbed at the point at which an aggregate route is 227 used in place of a collection of more specific routes. This, coupled 228 with widespread adoption of BGP route flap damping, has been very 229 effective in reducing the short term instability in the routing 230 space during this period. 232 3.4 Current Growth 234 In late 1998 the trend of growth in the BGP table size changed 235 radically, and the growth for the past two years is again showing 236 all the signs of a re-establishment of exponential growth. It 237 appears that CIDR is unable to keep pace with the levels of growth 238 of the Internet, and some additional factors are becoming apparent 239 in the Internet which have led to a growth pattern in the total size 240 of the BGP table which has some elements of compound growth rather 241 than linear growth. A best fit of the data for the period from 242 January 1999 until December 2000 indicates a compound growth model 243 of 42% growth per year. 245 An initial observation is that this growth pattern points to some 246 weakening of the hierarchical model of connectivity and routing 247 within the Internet. To identify the characteristics of this recent 248 trend it is necessary to look at a number of related characteristics 249 of the routing table. 251 4.
Related Measurements derived from BGP Table 253 The level of analysis of the BGP routing table has been extended in 254 an effort to identify the factors contributing to this growth, and 255 to determine whether this leads to some limiting factors in the 256 potential size of the routing space. Analysis includes measuring the 257 number of AS's in the routing system, and the number of distinct AS 258 paths, the range of addresses spanned by the table and average span 259 of each routing entry. 261 4.1 AS Number Consumption 263 Each network that is multi-homed within the topology of the Internet 264 and wishes to express a distinct external routing policy must use a 265 unique AS number to associate its advertised addresses with such a 266 policy. In general, each network is associated with a single AS, and 267 the number of AS's in the default-free routing table tracks the 268 number of entities that have unique routing policies. There are some 269 exceptions to this, including large global transit providers with 270 varying regional policies, where multiple AS's are associated with a 271 single network, but such exceptions are relatively uncommon. 273 The number of unique AS's present in the BGP table has been tracked 274 since late 1996, and the trend of AS number deployment over the past 275 four years is also one which matches a compound growth model with a 276 growth rate of 51% per year. As of the start of 2001 there were some 277 9,500 AS's visible in the BGP table. At a continued rate of growth 278 of 51% p.a., the 16 bit AS number space will be fully deployed by 279 August 2005. Work is underway within the IETF to modify the BGP 280 protocol to carry AS numbers in a 32 bit field. [I-D Chen & Rekhter 281 work in progress 2000] While the protocol modifications are 282 relatively straightforward, the major responsibility rests with the 283 operations community to devise a transition plan that will allow 284 gradual transition into this larger AS number space. 
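The exhaustion date quoted in Section 4.1 can be checked with a simple compound growth calculation. The following is a minimal sketch only, under stated assumptions: the function name and constants are illustrative and not part of the original analysis, the starting point is the rounded figure of "some 9,500" visible AS's, the count of visible AS's is used as a stand-in for deployed AS numbers, and all 65,536 values of the 16 bit space are treated as usable (reserved and private values are ignored).

```python
import math


def years_until_exhaustion(current: float, capacity: float, annual_growth: float) -> float:
    """Years for `current` to reach `capacity` under compound growth.

    `annual_growth` is a fraction, e.g. 0.51 for 51% per year.
    Solves current * (1 + g)^t = capacity for t.
    """
    return math.log(capacity / current) / math.log(1.0 + annual_growth)


# Illustrative constants; the AS count is the rounded figure from the text.
AS_SPACE_16_BIT = 2 ** 16   # every 16 bit value treated as usable (assumption)
VISIBLE_AS_2001 = 9_500     # AS's visible in the BGP table at the start of 2001
GROWTH_PER_YEAR = 0.51      # compound growth rate from the best fit model

years = years_until_exhaustion(VISIBLE_AS_2001, AS_SPACE_16_BIT, GROWTH_PER_YEAR)
print(f"16 bit AS space reached about {years:.1f} years after the start of 2001")
```

With these rounded inputs the result falls in the second half of 2005, which is broadly in line with the August 2005 estimate above. The exact month is sensitive to the starting count and to how many values of the AS number space are actually assignable.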
286 4.2 Address Consumption 288 It is also possible to track the total amount of address space 289 advertised within the BGP routing table. At the start of 2001 the 290 routing table encompassed 1,081,131,733 addresses, or some 25.17% of 291 the total IPv4 address space. This has grown from 1,019,484,655 292 addresses in November 1999. However, there are a number of /8 293 prefixes which are periodically announced and withdrawn from the BGP 294 table, and if the effects of these prefixes are removed, a compound 295 growth model against the previous 12 months of data of this metric 296 yields a best fit model of growth of 7% per year in the total number 297 of addresses spanned by the routing table. 299 Compared to the 42% growth in the number of routing advertisements, 300 it would appear that much of the growth of the Internet in terms of 301 growth in the number of connected devices is occurring behind 302 various forms of NAT gateways. In terms of solving the perceived 303 finite nature of the address space identified just under a decade 304 ago, the Internet appears so far to have embraced the approach of 305 using NATs, irrespective of their various perceived functional 306 shortcomings. [RFC 2993] This also supports the observation of 307 smaller address fragments supporting distinct policies in the BGP 308 table, as such small address blocks may encompass arbitrarily large 309 networks located behind one or more NAT gateways. 311 4.3 Granularity of Table Entries 313 The intent of CIDR aggregation was to support the use of large 314 aggregate address announcements in the BGP routing table. To check 315 whether this is still the case the average span of each BGP 316 announcement has been tracked for the past 12 months. The data 317 indicates a decline in the average span of a BGP advertisement from 318 16,000 individual addresses in November 1999 to 12,100 in December 319 2000.
This corresponds to an increase in the average prefix length 320 from /18.03 to /18.44. Separate observations of the average prefix 321 length used to route traffic in operational networks in late 2000 322 indicate an average length of 18.1 [Lothberg 2000]. This trend is 323 cause for concern as it implies the increasing spread of traffic 324 over greater numbers of increasingly finer forwarding table entries. 325 This, in turn, has implications for the design of high speed core 326 routers, particularly when extensive use is made of a small number 327 of very high speed cached forwarding entries within the switching 328 subsystem of a router's design. 330 A similar observation can be made regarding the number of addresses 331 advertised per AS. In December 1999 each AS advertised an average of 332 161,900 addresses (equivalent to a prefix length of /14.69), and in 333 January 2001 this average had fallen to 115,800 addresses (an 334 equivalent prefix length of /15.18). 336 This points to increasingly finer levels of routing detail being 337 announced into the global routing domain, which in turn supports the 338 observation that the efficiencies of hierarchical routing structures 339 are no longer being realized within the deployed Internet, and 340 instead increasingly finer levels of routing detail are being 341 announced globally in the BGP tables. The most likely cause of this 342 trend of finer levels of routing granularity is an increasingly 343 dense interconnection mesh, where more networks are moving from a 344 single-homed connection with hierarchical addressing and routing 345 into multi-homed connections without any hierarchical structure. The 346 spur for this increasingly dense connectivity mesh in the Internet 347 may well be the declining unit costs of communications bearer 348 services coupled with a common perception that richer sets of 349 adjacencies yield greater levels of service resilience.
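The fractional prefix lengths quoted in Section 4.3 appear consistent with converting an average address span into an equivalent IPv4 prefix length, that is, 32 minus the base-2 logarithm of the span. The sketch below assumes that conversion; the function name is illustrative, and the spans are the rounded values from the text, so small differences in the last decimal place are to be expected.

```python
import math


def equivalent_prefix_length(average_span: float) -> float:
    """Equivalent IPv4 prefix length for a block of `average_span` addresses.

    A /n prefix spans 2^(32 - n) addresses, so n = 32 - log2(span).
    """
    return 32 - math.log2(average_span)


# Rounded average spans taken from the text of Section 4.3.
samples = [
    ("per advertisement, November 1999", 16_000),
    ("per advertisement, December 2000", 12_100),
    ("per AS, December 1999", 161_900),
    ("per AS, January 2001", 115_800),
]

for label, span in samples:
    print(f"{label}: {span:,} addresses is about /{equivalent_prefix_length(span):.2f}")
```

Note that this is the prefix length of the average span, which is not the same quantity as the arithmetic mean of the individual prefix lengths in the table.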
351 4.4 Prefix Length Distribution 353 In addition to looking at the average prefix length, the analysis of 354 the BGP table also includes an examination of the number of 355 advertisements of each prefix length. 357 An extensive program commenced in the mid-nineties to move away from 358 intense use of the Class C space and to encourage providers to 359 advertise larger address blocks, as part of the CIDR effort. This 360 has been reinforced by the address registries, which have used provider 361 allocation blocks that correspond to a prefix length of /19 and, 362 more recently, /20. 364 These measures were introduced in the mid-90's when there were some 365 20,000 - 30,000 entries in the BGP table. Some six years later in 366 January 2001 it is interesting to note that of the 104,000 entries 367 in the routing table, some 59,000 entries have a /24 prefix. In 368 absolute terms the /24 prefix set is the fastest growing set in the 369 BGP routing table. The routing entries of these smaller address 370 blocks also show a much higher level of change on an hourly basis. 371 While a large number of BGP routing points perform route flap 372 damping, nevertheless there is still a very high level of 373 announcements and withdrawals of these entries in this particular 374 area of the routing table when viewed using a perspective of route 375 updates per prefix length. Given that the number of these small 376 prefixes is growing rapidly, there is cause for some concern that 377 the total level of BGP flux, in terms of the number of announcements 378 and withdrawals per second, may be increasing, despite the pressures 379 from flap damping. This concern is coupled with the observation 380 that, in terms of BGP stability under scaling pressure, it is not 381 the absolute size of the BGP table which is of prime importance, but 382 the rate of dynamic path recomputations that occur in the wake of 383 announcements and withdrawals.
Withdrawals are of particular concern 384 due to the number of transient intermediate states that the BGP 385 distance vector algorithm explores in processing a withdrawal. 386 Current experimental observations indicate a typical convergence 387 time of some 2 minutes to propagate a route withdrawal across the 388 BGP domain. [Labowitz 2000] An increase in the density of the BGP 389 mesh, coupled with an increase in the rate of such dynamic changes, 390 does have serious implications in maintaining the overall stability 391 of the BGP system as it continues to grow. The registry allocation 392 policies also have had some impact on the routing table prefix 393 distribution. The original registry practice was to use a minimum 394 allocation unit of a /19, and the 10,000 prefix entries in the /17 395 to /19 range are a consequence of this policy decision. More 396 recently the allocation policy now allows for a minimum allocation 397 unit of a /20 prefix, and the /20 prefix is used by some 4,300 398 entries as of January 2001, and in relative terms is one of the 399 fastest growing prefix sets. The number of entries corresponding to 400 very small address blocks (smaller than a /24), while small in 401 number as a proportion of the total BGP routing table, is the 402 fastest growing in relative terms. The number of /25 through /32 403 prefixes in the routing table is growing faster, in terms of 404 percentage change, than any other area of the routing table. If 405 prefix length filtering were in widespread use, the practice of 406 announcing a very small address block with a distinct routing policy 407 would have no particular beneficial outcome, as the address block 408 would not be passed throughout the global BGP routing domain and the 409 propagation of the associated policy would be limited in scope. 
The 410 growth of the number of these small address blocks, and the 411 diversity of AS paths associated with these routing entries, points 412 to a relatively limited use of prefix length filtering in today's 413 Internet. In the absence of any corrective pressure in the form of 414 widespread adoption of prefix length filtering, the very rapid 415 growth of global announcements of very small address blocks is 416 likely to continue. In percentage terms, the set of prefixes 417 spanning /25 to /32 shows the largest growth rates. 419 4.5 Aggregation and Holes 421 With the CIDR routing structure it is possible to advertise a more 422 specific prefix of an existing aggregate. The purpose of this more 423 specific announcement is to punch a 'hole' in the policy of the 424 larger aggregate announcement, creating a different policy for the 425 specifically referenced address prefix. 427 Another use of this mechanism is to perform a rudimentary form of 428 load balancing and mutual backup for multi-homed networks. In this 429 model a network may advertise the same aggregate advertisement along 430 each connection, but then advertise a set of specific advertisements 431 for each connection, altering the specific advertisements such that 432 the load on each connection is approximately balanced. The two forms 433 of holes can be readily discerned in the routing table - while the 434 approach of policy differentiation uses an AS path which is 435 different from the aggregate advertisement, the load balancing and 436 mutual backup configuration uses the same AS path for both the 437 aggregate and the specific advertisements. While it is difficult to 438 understand whether the use of such more specific advertisements was 439 intended to be an exception to a more general rule or not within the 440 original intent of CIDR deployment, there appears to be very 441 widespread use of this mechanism within the routing table.
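The distinction between the two forms of holes described above can be sketched as a comparison of each more specific prefix against its closest covering aggregate. This is an illustration only and not the measurement tool used for this study: the table layout (a mapping from prefix to AS path), the function name, and the sample prefixes and AS numbers are all hypothetical, and the quadratic search is adequate only for small examples.

```python
import ipaddress


def classify_more_specifics(table):
    """Classify each more specific prefix against its closest covering aggregate.

    `table` maps an IPv4 prefix string to an AS path tuple (hypothetical layout).
    Returns {prefix: (covering_aggregate, kind)}, where kind is "policy" when the
    AS path differs from the aggregate's and "load-balancing/backup" when it is
    the same. Prefixes with no covering aggregate are omitted.
    """
    nets = {ipaddress.ip_network(p): path for p, path in table.items()}
    result = {}
    for net, path in nets.items():
        # Covering aggregates are shorter prefixes that contain this prefix.
        covering = [agg for agg in nets
                    if agg.prefixlen < net.prefixlen and net.subnet_of(agg)]
        if not covering:
            continue
        # The closest covering aggregate is the one with the longest prefix.
        parent = max(covering, key=lambda agg: agg.prefixlen)
        kind = "load-balancing/backup" if nets[parent] == path else "policy"
        result[str(net)] = (str(parent), kind)
    return result


# Hypothetical sample table using private addresses and documentation AS numbers.
sample = {
    "10.0.0.0/16": (64500, 64510),
    "10.0.1.0/24": (64500, 64510),   # same AS path as the aggregate
    "10.0.2.0/24": (64501, 64502),   # different AS path from the aggregate
    "10.1.0.0/16": (64500, 64511),   # no covering aggregate in the table
}

for prefix, (aggregate, kind) in sorted(classify_more_specifics(sample).items()):
    print(f"{prefix} inside {aggregate}: {kind}")
```

A production analysis would more likely use a prefix trie to find covering aggregates efficiently, and would need to decide how to treat prefixes seen with several AS paths.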
Some 442 41,600 advertisements, or 41% of the routing table, are being used to 443 punch policy holes in existing aggregate announcements. Of these the 444 overall majority of some 35,000 routes use distinct AS paths, so 445 that it does appear that this is evidence of finer levels of 446 granularity of connection policy in a densely interconnected space. 447 While long term data is not available for the relative level of such 448 advertisements as a proportion of the full routing table, the growth 449 level does strongly indicate that policy differentiation at a fine 450 level within existing provider aggregates is a significant driver of 451 overall table growth. 453 5. Current State of inter-AS routing in the Internet 455 The resumption of compound growth trends within the BGP table, and 456 the associated aspects of finer granularity of routing entries 457 within the table form adequate grounds for consideration of 458 potential refinements to the Internet's exterior routing protocols 459 and potential refinements to current operating practices of inter-AS 460 connectivity. With the exception of the 16 bit AS number space, 461 there is no particular finite limit to any aspect of the BGP table. 462 The motivation for such activity is that a long term pattern of 463 continued growth at current rates may once again pose a potential 464 condition where the capacity of the available processors may be 465 exceeded by some aspect of the Internet routing table. 467 5.1 A denser interconnectivity mesh 469 The decreasing unit cost of communications bearers in many parts of 470 the Internet is creating a rapidly expanding market in exchange 471 points and other forms of inter-provider peering. The deployment 472 model of a single-homed network with a single upstream provider is 473 rapidly being supplanted by a model of extensive interconnection at 474 the edges of the Internet.
The deployment model underlying 475 CIDR assumed a different structure, more akin to a strict 476 hierarchy of supply providers. The business imperatives driving this 477 denser mesh of interconnection in the Internet are irresistible, and 478 the casualty in this case is the CIDR-induced dampened growth of the 479 BGP routing table. 481 5.2 Multi-Homed small networks and service resiliency 483 It would appear that one of the major drivers of the recent growth 484 of the BGP table is that small networks advertised as a /24 485 prefix entry in the routing table are multi-homing with a number of 486 peers and a number of upstream providers. In the appropriate 487 environment where there are a number of networks in relatively close 488 proximity, using peer relationships can reduce total connectivity 489 costs, as compared to using a single upstream service provider. 490 Equally significantly, multi-homing with a number of upstream 491 providers is seen as a means of improving the overall availability 492 of the service. In essence, multi-homing is seen as an acceptable 493 substitute for upstream service resiliency. This has a potential 494 side-effect that when multi-homing is seen as a preferable 495 substitute for upstream provider resiliency, the upstream provider 496 cannot command a price premium for providing resiliency as an 497 attribute of the provided service, and therefore has little 498 incentive to spend the additional money required to engineer 499 resiliency into the network. The actions of the network's multi- 500 homed clients then become self-fulfilling. One way to characterize 501 this behavior is that service resiliency in the Internet is becoming 502 the responsibility of the customer, not the service provider. 504 In such an environment resiliency still exists, but rather than 505 being a function of the bearer or switching subsystem, resiliency is 506 provided through the function of the BGP routing system.
The
507 question is not whether this is feasible or desirable in the
508 individual case, but whether the BGP routing system can scale
509 adequately to continue to undertake this role.

511 5.3 Traffic Engineering via Routing

513 Further driving this growth in the routing table is the use of
514 selective advertisement of smaller prefixes along different paths in
515 an effort to undertake traffic engineering within a multi-homed
516 environment. While there is considerable effort being undertaken to
517 develop traffic engineering tools within a single network using MPLS
518 as the base flow management tool, inter-provider tools to achieve
519 similar outcomes are considerably more complex when using such
520 switching techniques.

522 At this stage the only tool being used for inter-provider traffic
523 engineering is that of the BGP routing table, further exacerbating
524 the growth and stability pressures being placed on the BGP routing
525 domain.

527 5.4 Lack of common operational practices

529 There is considerable evidence of a lack of uniformity of
530 operational practices within the inter-domain routing space. This
531 includes the use and setting of prefix filters, the use and setting
532 of route damping parameters and the level of verification undertaken on
533 BGP advertisements by both the advertiser and the recipient. There
534 is some degree of 'noise' in the routing table where advertisements
535 appear to be propagated well beyond their intended domain of
536 applicability, and also where withdrawals and advertisements are not
537 being adequately damped close to the origin of the route flap. This
538 diversity of operating practices also extends to policies of
539 accepting advertisements which are more specific prefixes within
540 existing provider blocks.
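
As an illustration only, the more specific advertisements described
above can be picked out of a table snapshot mechanically. The sketch
below is not part of the draft: the function name find_more_specifics
and the sample prefixes are hypothetical, and it assumes the table is
available as a plain list of prefix strings. It uses only Python's
standard ipaddress module.

```python
from ipaddress import ip_network


def find_more_specifics(prefixes):
    """Return (specific, covering) pairs for every prefix that sits inside
    another, shorter prefix in the same list.

    'covering' is the longest (most specific) enclosing prefix found.
    Hypothetical helper for illustration; not defined by the draft.
    """
    # Sort so that shorter prefixes (potential aggregates) come first.
    nets = sorted(
        {ip_network(p) for p in prefixes},
        key=lambda n: (n.version, n.prefixlen, int(n.network_address)),
    )
    pairs = []
    for i, net in enumerate(nets):
        covering = None
        for cand in nets[:i]:
            if (
                cand.version == net.version
                and cand.prefixlen < net.prefixlen
                and net.subnet_of(cand)
            ):
                # Candidates are visited in order of increasing length,
                # so the last match is the most specific enclosing prefix.
                covering = cand
        if covering is not None:
            pairs.append((str(net), str(covering)))
    return pairs


# Assumed sample data: one aggregate, two nested more specifics, and
# one unrelated prefix.
table = ["10.0.0.0/8", "10.1.0.0/16", "10.1.2.0/24", "192.0.2.0/24"]
for specific, covering in find_more_specifics(table):
    print(specific, "is a more specific of", covering)
```

A recipient's filtering policy could then decide, per pair, whether
the more specific entry is accepted or dropped in favor of the
covering block. That decision is exactly where operating practices
currently differ.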
542 5.5 CIDR and Hierarchical Routing

544 The current growth factors at play in the BGP table are not easily
545 susceptible to another round of CIDR deployment pressure within the
546 operator community. The denser interconnectivity mesh, the
547 increasing use of multi-homing with smaller address prefixes, the
548 extension of the use of BGP to perform roles related to inter-domain
549 traffic engineering and the lack of common operating practices all
550 point to a continuation of the trend of growth in the total size of
551 the BGP routing table, with this growth most apparent with
552 advertisements of smaller address blocks, and an increasing trend
553 for these small advertisements to be punching a connectivity policy
554 'hole' in an existing provider aggregate advertisement.

556 It may be appropriate to consider how to operate an Internet with a
557 BGP routing table which has millions of small entries, rather than
558 the expectation of a hierarchical routing space with at most tens of
559 thousands of larger entries in the global routing table.

561 6. Future Requirements for the Exterior Routing System

563 It is beyond the scope of this document to define a scaleable inter-
564 domain routing environment and associated routing protocols and
565 operating practices. A more modest goal is to look at the attributes
566 of routing systems as understood and identify those aspects of such
567 systems which may be applicable to the inter-domain environment as a
568 potential set of requirements for inter-domain routing tools.

570 6.1 Scalability

572 The overall intent is scalability of the routing environment.
573 Scalability can be expressed in many dimensions, including number of
574 discrete network layer reachability entries, number of discrete
575 route policy entries, level of dynamic change over a unit of time of
576 these entries, time to converge to a coherent view of the
577 connectivity of the network following changes, and so on.
579 The basic objective behind this expressed requirement for
580 scalability is that the most likely near to medium term trend in the
581 structure of the Internet is a continuation of the pattern of dense
582 interconnectivity between a large number of discrete network
583 entities, with little impetus behind hierarchical aggregating
584 structures. It is not an objective to place any particular metrics
585 on scalability within this examination of requirements, aside from
586 indicating that a prudent view would encompass a scale of
587 connectivity in the inter-domain space that is at least two orders
588 of magnitude larger than comparable metrics of the current
589 environment.

591 6.2 Stability and Predictability

593 Any routing system should behave in a stable and predictable
594 fashion. What is implied by the predictability requirement is
595 that under identical environmental conditions the routing
596 system should converge to the same state. Stability implies that the
597 routing state should be maintained for as long as the environmental
598 conditions remain constant. Stability also implies a qualitative
599 property that minor variations in the network's state should not
600 cause large scale instability across the entire network while a new
601 stable routing state is reached. Instead, routing changes should be
602 propagated only as far as necessary to reach a new stable state, so
603 that the global requirement for stability implies some degree of
604 locality in the behavior of the system.

606 6.3 Convergence

608 Any routing system should have adequate convergence properties. By
609 adequate it is meant that within a finite time following a change
610 in the external environment, the routing system will have reached a
611 shared common description of the network's topology which accurately
612 describes the current state of the network and which is stable.
In
613 this case finite time implies a time which is bounded by some
614 upper limit, and this upper limit reflects the requirements of the
615 routing system. In the case of the Internet the upper bound on
616 convergence time is currently of the order of hundreds of
617 seconds. A more useful upper bound for convergence is of the
618 order of tens of seconds or lower.

620 It is not a requirement to be able to undertake full convergence of
621 the inter-domain routing system in the sub-second timescale.

623 6.4 Routing Overhead

625 The greater the amount of information passed within the routing
626 system, and the greater the frequency of such information exchanges,
627 the greater the level of expectation that the routing system can
628 maintain an accurate view of the connectivity of the network.
629 Equally, the greater the amount of information passed within the
630 routing system, and the higher the frequency of information
631 exchange, the higher the level of overhead consumed by operation of
632 the routing system. There is an element of design compromise in a
633 routing system to pass enough information across the system to allow
634 each routing element to have adequate local information to reach a
635 coherent local view of the network, yet ensure that the total
636 routing overhead is low.

638 7. Architectural approaches to a scaleable Exterior Routing Protocol

640 This document does not attempt to define an inter-domain routing
641 protocol that possesses all the attributes as listed above, but a
642 number of architectural considerations can be identified that would
643 form an integral part of the protocol design process.

645 7.1 Policy opaqueness vs policy transparency

647 The two major approaches to routing protocols are distance vector
648 and link state.
650 In a distance vector protocol a routing node gathers information
651 from its neighbors, applies local policy to this information and
652 then distributes this updated information to its neighbors. In this
653 model the nature of the local policy applied to the routing
654 information is not necessarily visible to the node's neighbors, and
655 the process of converting received route advertisements into
656 re-advertised routes uses a local policy process whose
657 policy rules are not visible externally. This scenario can be
658 described as 'policy opaque'. The side-effect of such an environment
659 is that a third party cannot remotely compute which routes a network
660 may accept and which may be re-advertised to each neighbor.

662 In link state protocols a routing node effectively broadcasts its
663 local adjacencies, and the policies it has with respect to these
664 adjacencies, to all nodes within the link state domain. Every node can
665 perform an identical computation upon this set of adjacencies and
666 associated policies in order to compute the local forwarding table.
667 The essential attribute of this environment is that the routing node
668 has to announce its routing policies, in order to allow a remote
669 node to compute which routes will be accepted from which neighbor,
670 and which routes will be advertised to each neighbor and what, if
671 any, attributes are placed on the advertisement. Within an interior
672 routing domain the local policies are in effect metrics of each link
673 and these policies can be announced within the routing domain without
674 any consequent impact.

676 In the exterior routing domain it is not the case that
677 interconnection policies between networks are always fully
678 transparent. Various permutations of supplier / customer
679 relationships and peering relationships have associated policy
680 qualifications which are not publicly announced for business
681 competitive reasons.
The current diversity of interconnection
682 arrangements appears to be predicated on policy opaqueness, and to
683 mandate a change to a model of open interconnection policies may be
684 contrary to operational business imperatives.

686 An inter-domain routing tool should be able to support models of
687 interconnection where the policy associated with the interconnection
688 is not visible to any third party. This consideration would appear
689 to favor the continued use of a distance vector approach to inter-
690 domain routing which, in turn, has implications for the convergence
691 properties and stability of the inter-domain routing environment.

693 7.2 The number of routing objects

695 The current issues with the trend behaviors of the BGP space can be
696 coarsely summarized as the growth in the number of distinct routing
697 objects and the increased level of dynamic behaviors of these objects
698 (in the form of announcements and withdrawals).

700 This entails evaluating possible measures that can address the
701 growth rate in the number of objects in the inter-domain routing
702 table, and separately examining measures that can reduce the level
703 of dynamic change in the routing table. The current routing
704 architecture defines a basic unit of a route object as an
705 originating AS number and an address prefix.

707 In looking at the growth rate in the number of route objects, the
708 salient observation is that the number of route objects is the
709 byproduct of the density of the interconnection mesh and the number
710 of discrete points where policy is imposed on route objects. One
711 approach to reduce the growth in the number of objects is to allow
712 each object to describe larger segments of infrastructure. Such an
713 approach could use a single route object to describe a set of
714 address prefixes, or a collection of ASs, or a combination of the
715 two.
The most direct form of extension would be to preserve the
716 assumption that each routing object represents an indivisible policy
717 entity. However, given that one of the drivers of the increasing
718 number of route objects is a proliferation of discrete route
719 objects, it is not immediately apparent that this form of
720 aggregation will prove capable of addressing the growth in the
721 number of route objects.

723 If single route objects are to be used that encompass a set of
724 address prefixes and a collection of ASs, then it appears necessary
725 to define additional attributes within the route object to further
726 qualify the policies associated with the object in terms of specific
727 prefixes, specific ASs and specific policy semantics that may be
728 considered as policy exceptions to the overall aggregate.

730 Another approach to reduce the number of route objects is to reduce
731 the scope of advertisement of each routing object, allowing the
732 object to be removed and proxy aggregated into some larger object
733 once the logical scope of the object has been reached. This approach
734 would entail the addition of route attributes which could be used to
735 define the circumstances where a specific route object would be
736 subsumed by an aggregate route object without impacting the policy
737 objectives associated with the original set of advertisements.

739 7.3 Inter-domain Traffic Engineering

741 Attempting to place greater levels of detail into route objects is
742 intended to address the dual role of the current BGP system as both
743 an inter-domain connectivity maintenance protocol and as an implicit
744 traffic engineering tool.

746 In the current environment, advertisement of more specific prefixes
747 with unique policy is intended to create a traffic engineering
748 response, where incoming traffic to an AS may be balanced across
749 multiple paths.
The outcome is that the control of the relative
750 profile of load is placed with the originating AS. The way this is
751 achieved is by using limited knowledge of the remote AS's route
752 selection policy to explicitly limit the number of egress choices
753 available to a remote AS. The most common route selection policy is
754 the preference for more specific prefixes over larger address
755 blocks. By advertising specific prefixes along specific neighbor AS
756 connections with specific route attributes, traffic destined to
757 these addresses is passed through the selected transit paths. This
758 limitation of choice allows the originating AS to override the
759 potential policy choices of all other ASs, imposing its traffic
760 import policies at a higher level than the remote AS's egress
761 policies.

763 An alternative approach is the use of a class of traffic engineering
764 attributes which are attached to an aggregate route object, allowing
765 each remote AS to respond to the route object in a manner that
766 equates to the current specific prefix response, but without the
767 multiplicity of specific prefix route objects. However, even this
768 approach uses route objects to communicate traffic engineering
769 policy, and the same risk remains that the route table is used to
770 carry fine-detailed traffic path policies.

772 An alternative direction is to separate the functions of
773 connectivity maintenance and traffic engineering, using the routing
774 protocol to identify a number of viable paths from a source AS to a
775 destination AS, and use a distinct collection of traffic engineering
776 tools to allow a traffic source AS to make egress path selections
777 that match the desired traffic service profile for the traffic.

779 There is one critical difference between traffic engineering
780 approaches as used in intra-domain environments and the current
781 inter-domain operating practices.
Whereas the intra-domain
782 environment uses the ingress network element to make the appropriate
783 path choice to the egress point, inter-domain traffic
784 engineering has the opposite intent, where a downstream AS (or egress
785 point) is attempting to constrain the path choice of an upstream AS
786 (or ingress point). If explicit traffic engineering were undertaken
787 within the inter-domain space, it is highly likely that the current
788 structure would be altered. Instead of the downstream element
789 attempting to constrain the path choices of an upstream element, a
790 probable approach is the downstream element placing a number of
791 advisory constraints on the upstream elements, and the upstream
792 elements using a combination of these advisory constraints, dynamic
793 information relating to path service characteristics and local
794 policies to make an egress choice.

796 From the perspective of the inter-domain routing environment, such
797 measures offer the potential to remove the advertisement of specific
798 routes for traffic engineering purposes. However, there is a need to
799 add traffic engineering information into advertised route blocks,
800 requiring the definition of the syntax and semantics of traffic
801 engineering attributes that can be attached to route objects.

803 7.4 Hierarchical Routing Models

805 The CIDR routing model assumed a hierarchy of providers, where at
806 each level in the hierarchy the routing policies and address space
807 of networks at the lower level of hierarchy were subsumed by the
808 next level up (or 'upstream') provider. The connectivity policy
809 assumed by this model is also a hierarchical model, where horizontal
810 connections within a single level of the hierarchy are not visible
811 beyond the networks of the two parties.
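
The subsumption of customer address space by an upstream provider can
be illustrated with a short sketch. It is not part of the draft: the
function name upstream_view, the 'exceptions' parameter and the
sample prefixes (taken from a documentation address range) are
assumptions made for illustration. It relies only on
collapse_addresses from Python's standard ipaddress module.

```python
from ipaddress import collapse_addresses, ip_network


def upstream_view(customer_prefixes, exceptions=()):
    """Return the prefixes a provider would advertise upstream.

    Under a strict hierarchy only the collapsed aggregate(s) are visible.
    'exceptions' (a hypothetical parameter) lists customer prefixes that
    must stay visible as more specifics, for example a multi-homed
    customer with its own connectivity policy.
    """
    # Adjacent customer blocks collapse into the smallest covering set.
    aggregates = list(
        collapse_addresses(ip_network(p) for p in customer_prefixes)
    )
    # Prefixes that punch a policy 'hole' in the aggregate.
    holes = sorted(
        {ip_network(p) for p in exceptions},
        key=lambda n: (n.prefixlen, int(n.network_address)),
    )
    adverts = aggregates + [h for h in holes if h not in aggregates]
    return [str(n) for n in adverts]


# Assumed sample data: three customers that together fill one /24.
customers = ["198.51.100.0/26", "198.51.100.64/26", "198.51.100.128/25"]

print(upstream_view(customers))
print(upstream_view(customers, exceptions=["198.51.100.64/26"]))
```

In the first call the three customer entries are hidden behind a
single aggregate. In the second, one customer's prefix remains
visible alongside the aggregate, which is the pattern the preceding
sections identify as a driver of table growth.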
813 A number of external factors are increasing the density of
814 interconnection, including decreasing unit costs of communications
815 services and the increasing use of exchange points to augment point-
816 to-point connectivity models with point-to-multipoint facilities.

818 The outcome of these external factors is a significant reduction in
819 the hierarchical nature of the inter-domain space. The outcome of
820 this characteristic of the Internet in terms of the routing space is
821 the increasing number of distinct route policies that are associated
822 with each multi-homed network within the Internet.

824 One way to limit the proliferation of such policies across the
825 entire inter-domain space is to associate attributes with such
826 advertisements that specify the conditions whereby a remote transit
827 AS may proxy-aggregate this route object with other route objects.

829 7.5 Extend or Replace BGP

831 A final consideration is whether these requirements can
832 best be met by an approach of a set of upward-compatible extensions
833 to BGP, or by a replacement to BGP. No recommendation is made here,
834 and this is a topic requiring further investigation.

836 The general approach in extending BGP appears to lie in increasing
837 the number of supported transitive route attributes, allowing the
838 route originator greater control in specifying the level of
839 propagation of the route and the intended outcome in terms of policy
840 and traffic engineering. It would also be necessary to allow BGP
841 sessions to negotiate an enhanced capability to improve the
842 convergence behavior of the protocol. Whether such changes can
843 produce a scaleable and useful outcome in terms of inter-domain
844 routing remains, at this stage, an open question.

846 An alternative approach is that of a replacement protocol, and such
847 an approach may well be based on the adoption of a link-state
848 behavior.
The issues of policy opaqueness and link-state protocols
849 have been described above. The other major issue with such an
850 approach is the need to limit the extent of link state flooding,
851 where the inter-domain space would need some further levels of
852 imposed structure similar to intra-domain areas. Such structure may
853 well imply the need for an additional set of operator inter-
854 relationships such as mutual transit, and this may prove challenging
855 to adapt to existing practices.

857 8. Security Considerations

859 Any adopted inter-domain routing protocol needs to be secure against
860 disruption. Disruption comes from two primary sources:
861 - Accidental misconfiguration
862 - Malicious attacks

864 Given past experience with routing protocols, both can be
865 significant sources of harm.

867 Given that it is not reasonable to guarantee the security of all the
868 routers involved in the global Internet interdomain routing system,
869 there is also every reason to believe that malicious attacks may
870 come from peer routers, in addition to coming from external sources.

872 A protocol design SHOULD therefore consider how to minimize the
873 damage to the overall routing computation that can be caused by a
874 single or small set of misbehaving routers.

876 The routing system itself needs to be resilient against accidental
877 or malicious advertisements of a route object by a route server not
878 entitled to generate such an advertisement. This implies several
879 things, including the need for cryptographic validation of
880 announcements, cryptographic protection of routing messages and an
881 accurate and trusted database of routing assignments via which
882 authorization can be checked.

884 9. References

886 [RFC 1287] "Towards the Future Internet Architecture", D. Clark,
887 L. Chapin, V. Cerf, R. Braden, R. Hobby, RFC 1287, December 1991.
889 [RFC 1338] "Supernetting: an Address Assignment and Aggregation
890 Strategy", V. Fuller, T. Li, J. Yu, K. Varadhan, RFC 1338,
891 June 1992.

893 [RFC 2993] "Architectural Implications of NAT", T. Hain, RFC 2993,
894 November 2000.

896 [Bates 2000] "The CIDR Report", T. Bates, updated weekly at
897 http://www.employees.org/~tbates/cidr-report.html

899 [Chen 2000] "BGP Support for four-octet AS number space", E. Chen,
900 Y. Rekhter, work in progress (currently published as an Internet
901 Draft: draft-chen-as4bytes-00.txt), November 2000.

903 [Huston 2001] "BGP Table Report" updated hourly at
904 http://www.telstra.net/ops/bgp

906 [Labowitz] bgp convergence

908 [Lothberg 2000] Peter Lothberg, personal communication.

910 [1] Bradner, S., "The Internet Standards Process -- Revision 3", BCP
911 9, RFC 2026, October 1996.

913 10. Acknowledgements

915 The author acknowledges the assistance of Brian Carpenter, Harald
916 Alvestrand and Steve Bellovin in preparing this document.

918 11. Author's Addresses

920 Geoff Huston
921 Telstra
922 5/490 Northbourne Ave
923 Dickson ACT 2602
924 AUSTRALIA

926 EMail: gih@telstra.net

928 Full Copyright Statement

930 "Copyright (C) The Internet Society (date). All Rights Reserved.
931 This document and translations of it may be copied and furnished to
932 others, and derivative works that comment on or otherwise explain it
933 or assist in its implementation may be prepared, copied, published
934 and distributed, in whole or in part, without restriction of any
935 kind, provided that the above copyright notice and this paragraph
936 are included on all such copies and derivative works.
However, this
937 document itself may not be modified in any way, such as by removing
938 the copyright notice or references to the Internet Society or other
939 Internet organizations, except as needed for the purpose of
940 developing Internet standards in which case the procedures for
941 copyrights defined in the Internet Standards process must be
942 followed, or as required to translate it into