idnits 2.17.1 draft-narten-radir-problem-statement-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** You're using the IETF Trust Provisions' Section 6.b License Notice from 12 Sep 2009 rather than the newer Notice from 28 Dec 2009. (See https://trustee.ietf.org/license-info/) Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document seems to contain a disclaimer for pre-RFC5378 work, and may have content which was first submitted before 10 November 2008. The disclaimer is necessary when there are original authors that you have been unable to contact, or if some do not wish to grant the BCP78 rights to the IETF Trust. If you are able to get all authors (current and original) to grant those rights, you can and should remove the disclaimer; otherwise, the disclaimer is needed and you can ignore this comment. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (February 17, 2010) is 5172 days in the past. Is this intentional? 
Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'GSE' is mentioned on line 109, but not defined == Missing Reference: 'ROAD' is mentioned on line 109, but not defined -- Obsolete informational reference (is this intentional?): RFC 3344 (Obsoleted by RFC 5944) -- Obsolete informational reference (is this intentional?): RFC 3775 (Obsoleted by RFC 6275) Summary: 1 error (**), 0 flaws (~~), 3 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group T. Narten 3 Internet-Draft IBM 4 Intended status: Informational February 17, 2010 5 Expires: August 21, 2010 7 On the Scalability of Internet Routing 8 draft-narten-radir-problem-statement-05.txt 10 Abstract 12 There has been much discussion over the last several years about the overall 13 scalability of the Internet routing system. Some have argued that 14 the resources required to maintain routing tables in the core of the 15 Internet are growing faster than available technology will be able to 16 keep up with. Others disagree with that assessment. This document 17 attempts to describe the factors that are placing pressure on the 18 routing system and the growth trends behind those factors. 20 Status of this Memo 22 This Internet-Draft is submitted to IETF in full conformance with the 23 provisions of BCP 78 and BCP 79. 25 Internet-Drafts are working documents of the Internet Engineering 26 Task Force (IETF), its areas, and its working groups. Note that 27 other groups may also distribute working documents as Internet- 28 Drafts. 30 Internet-Drafts are draft documents valid for a maximum of six months 31 and may be updated, replaced, or obsoleted by other documents at any 32 time.
It is inappropriate to use Internet-Drafts as reference 33 material or to cite them other than as "work in progress." 35 The list of current Internet-Drafts can be accessed at 36 http://www.ietf.org/ietf/1id-abstracts.txt. 38 The list of Internet-Draft Shadow Directories can be accessed at 39 http://www.ietf.org/shadow.html. 41 This Internet-Draft will expire on August 21, 2010. 43 Copyright Notice 45 Copyright (c) 2010 IETF Trust and the persons identified as the 46 document authors. All rights reserved. 48 This document is subject to BCP 78 and the IETF Trust's Legal 49 Provisions Relating to IETF Documents 50 (http://trustee.ietf.org/license-info) in effect on the date of 51 publication of this document. Please review these documents 52 carefully, as they describe your rights and restrictions with respect 53 to this document. Code Components extracted from this document must 54 include Simplified BSD License text as described in Section 4.e of 55 the Trust Legal Provisions and are provided without warranty as 56 described in the BSD License. 58 This document may contain material from IETF Documents or IETF 59 Contributions published or made publicly available before November 60 10, 2008. The person(s) controlling the copyright in some of this 61 material may not have granted the IETF Trust the right to allow 62 modifications of such material outside the IETF Standards Process. 63 Without obtaining an adequate license from the person(s) controlling 64 the copyright in such materials, this document may not be modified 65 outside the IETF Standards Process, and derivative works of it may 66 not be created outside the IETF Standards Process, except to format 67 it for publication as an RFC or to translate it into languages other 68 than English. 70 Table of Contents 72 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 73 2. Terms and Definitions . . . . . . . . . . . . . . . . . . . . 4 74 3. Background . . . . . . . . . . . . . . . . . . . . . . . 
. . . 6 75 3.1. Technical Aspects . . . . . . . . . . . . . . . . . . . . 7 76 3.2. Business Considerations . . . . . . . . . . . . . . . . . 7 77 3.3. Alignment of Incentives . . . . . . . . . . . . . . . . . 8 78 3.4. Table Growth Targets . . . . . . . . . . . . . . . . . . . 9 79 4. Pressures on Routing Table Size . . . . . . . . . . . . . . . 10 80 4.1. Traffic Engineering . . . . . . . . . . . . . . . . . . . 10 81 4.2. Multihoming . . . . . . . . . . . . . . . . . . . . . . . 11 82 4.3. End Site Renumbering . . . . . . . . . . . . . . . . . . . 12 83 4.4. Acquisitions and Mergers . . . . . . . . . . . . . . . . . 12 84 4.5. RIR Address Allocation Policies . . . . . . . . . . . . . 12 85 4.6. Dual Stack Pressure on the Routing Table . . . . . . . . . 13 86 4.7. Internal Customer Routes . . . . . . . . . . . . . . . . . 14 87 4.8. IPv4 Address Exhaustion . . . . . . . . . . . . . . . . . 14 88 5. Pressures on Control Plane Load . . . . . . . . . . . . . . . 15 89 5.1. Interconnection Richness . . . . . . . . . . . . . . . . . 15 90 5.2. Multihoming . . . . . . . . . . . . . . . . . . . . . . . 15 91 5.3. Traffic Engineering . . . . . . . . . . . . . . . . . . . 15 92 5.4. Questionable Operational Practices? . . . . . . . . . . . 16 93 5.4.1. Rapid shuffling of prefixes . . . . . . . . . . . . . 16 94 5.4.2. Anti-Route Hijacking . . . . . . . . . . . . . . . . . 16 95 5.4.3. Operational Ignorance . . . . . . . . . . . . . . . . 16 96 5.5. RIR Policy . . . . . . . . . . . . . . . . . . . . . . . . 17 97 6. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 98 7. Security Considerations . . . . . . . . . . . . . . . . . . . 20 99 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21 100 9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 22 101 10. Informative References . . . . . . . . . . . . . . . . . . . . 23 102 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 24 104 1. 
Introduction 106 Prompted in part by the October 2006 IAB workshop on Routing & 107 Addressing [RFC4984], there has been a renewed focus on the topic of 108 routing scalability within the Internet. The issue itself is not 109 new, with discussions dating back at least 10-15 years [GSE, ROAD]. 111 This document attempts to describe the "pain points" affecting 112 the routing system, with the aim of describing the essential aspects 113 so that the community has a way of evaluating whether proposed 114 changes to the routing system actually address or impact existing 115 pain points in a significant manner. 117 2. Terms and Definitions 119 Control Plane: The routing setup protocols, their associated state 120 and the activity needed to create and maintain the data structures 121 used to forward packets from one network to another. The term is 122 defined broadly to include all protocols and activities needed to 123 construct and maintain the forwarding tables used to forward 124 packets. 126 Control Plane Load: The actual load associated with operating the 127 Control Plane. The higher the control plane load, the higher the 128 cost of operating the control plane (in terms of hardware, 129 bandwidth, power, etc.). The terms "routing load" and "control 130 plane load" are used interchangeably throughout this document. 132 Control Plane Cost: The overall cost associated with operating the 133 Control Plane. The cost consists of capital costs (for hardware), 134 bandwidth costs (for the control plane signalling) and any other 135 ongoing operational cost associated with operating and maintaining 136 the control plane. 138 Default Free Zone (DFZ): That part of the Internet where routers 139 maintain full routing tables. Many routers maintain only partial 140 tables, having explicit routes for "local" destinations (i.e., 141 prefixes) plus a "default" for everything else.
For such routers, 142 building and maintaining routing tables is relatively simple 143 because the amount of information learned and maintained can be 144 small. In contrast, routers in the DFZ maintain complete 145 information about all reachable destinations, which at the time of 146 this writing number in the hundreds of thousands of entries. 148 Routing Information Base (RIB): The data structures a router 149 maintains that hold the information about destinations and paths 150 to those destinations. The amount of state information maintained 151 is dependent on a number of factors, including the number of 152 individual prefixes learned from peers, the number of BGP peers, 153 the number of distinct paths interconnecting destinations, etc. 154 In addition to maintaining information about active paths used for 155 forwarding, the RIB may also include information about unused 156 ("backup") paths. 158 Forwarding Information Base (FIB): The actual table consulted while 159 making forwarding decisions for individual packets. The FIB is a 160 compact, optimized subset of the RIB, containing only the 161 information needed to actually forward individual packets, i.e., 162 mapping a packet's destination address to an outgoing interface 163 and next-hop. The FIB only stores information about paths 164 actually used for forwarding; it typically does not store 165 information about backup paths. The FIB is typically constructed 166 from specialized hardware components, which have different (and 167 higher) cost properties than the hardware typically used to 168 maintain the RIB. 170 Traffic Engineering (TE): In this document, "traffic engineering" 171 refers to the current practice of inbound, inter-AS traffic 172 engineering. 
TE is accomplished by injecting additional, more- 173 specific routes into the routing system and/or increasing the 174 frequency of routing updates in order to arrange for inbound 175 traffic at the boundary of an Autonomous system (AS) to travel 176 over a different path than it otherwise would. 178 Provider Aggregatable (PA) address space: Address space that an end 179 site obtains from an upstream ISP's address block. The main 180 benefit of PA address space is that reachability to all of a 181 provider's customers can be achieved by advertising a single 182 "provider aggregate" address prefix into the DFZ, rather than 183 needing to announce individual prefixes for each customer. An 184 important disadvantage is that when a customer changes providers, 185 the customer must renumber their site into addresses belonging to 186 the new provider and return the previously used addresses to the 187 former provider. 189 Provider Independent (PI) address space: Address space that an end 190 site obtains directly from a Regional Internet Registry (RIR) for 191 addressing its devices. The main advantage (for the end site) is 192 that it does not need to renumber its site when changing 193 providers, since it continues to use its PI block. However, PI 194 address blocks are not aggregatable and thus each individual PI 195 assignment results in an individual prefix being injected into the 196 DFZ. 198 Site: Any topologically and administratively distinct entity that 199 connects to the Internet. A site can range from a large 200 enterprise or ISP to a small home site. 202 3. Background 204 Within the DFZ, both the size of the RIB and FIB and the overall 205 update rate have historically increased at a greater than linear 206 rate. Specifically: 208 o The number of individual prefixes that are being propagated into 209 the DFZ over time has been and continues to increase at a faster- 210 than-linear rate. The term "super-linear" has been used to 211 characterize the growth. 
The exact nature of the growth is much 212 debated (e.g., quadratic, polynomial, etc.), but growth is clearly 213 faster than linear. The reasons behind the rate increase are 214 varied and discussed below. Because each individual prefix 215 requires resources to process, any increase in the number of 216 prefixes produces a corresponding increase in the control plane load 217 of the routing system. Each individual prefix that appears in 218 routing updates requires state in the RIB (and possibly the FIB) 219 and consumes processing and other resources when updates related 220 to the prefix are received. 222 o The overall rate of routing updates is increasing [1], requiring 223 routers to process updates at an increased rate or converge more 224 slowly if they cannot. The rate increase of the control plane 225 load is driven by a number of factors (discussed below). Further 226 study is needed to better understand the factors behind the 227 increasing update rate. For example, it appears that a 228 disproportionate share of observed updates originates from a 229 small percentage of the total number of advertised prefixes. 231 The super-linear growth in the routing load presents a scalability 232 challenge for current and/or future routers. While there appears to 233 be general agreement that we will be able to build routers (i.e., 234 hardware & software) actually capable of handling the control plane 235 load, both today and going forward, there is considerable debate 236 about the cost. In particular, will ISPs that currently maintain 237 routes as part of the DFZ (or would like to) be 238 able to afford to do so, or will only the largest (and a shrinking 239 number) of top-tier ISPs be able to afford the investment and cost of 240 operating the control plane while being a part of the DFZ?
242 Finally, the scalability challenge is aggravated by the lack of any 243 firm architectural upper bound on the growth rate of the 244 routing load and a weakening of the social constraints that have 245 historically helped restrain the growth rate. Going forward, there is 246 considerable uncertainty (some would say doubt) about whether future growth 247 rates will continue to be sufficiently constrained so that router 248 development can keep up at an acceptable price point. 250 3.1. Technical Aspects 252 The technical challenge of building routers relates to the resources 253 needed to process a larger and increasingly dynamic amount of routing 254 information. More specifically, routers must maintain an increasing 255 amount of associated state information in the RIB, they must be 256 capable of populating a growing FIB, they must perform forwarding 257 lookups at line rates (while accessing the FIB) and they must be able 258 to initialize the RIB and FIB after system restart. All of these 259 activities must take place within acceptable time frames (i.e., paths 260 for individual destinations must converge and stabilize within an 261 acceptable time period). Finally, the hardware needed to achieve 262 this cannot have unreasonable power consumption or cooling demands. 264 3.2. Business Considerations 266 While the IETF does not (and cannot) concern itself with business 267 models or the profitability of the ISP community, the cost of running 268 the routing subsystem as a whole is directly influenced by the 269 routing architecture of the Internet, which clearly is the IETF's 270 business. Thus, it is useful to consider the overall business 271 environment that underlies operation of the DFZ routing 272 infrastructure. The DFZ is run entirely by the private sector with 273 no overall governmental oversight or regulatory framework to oversee 274 or even influence what routes are propagated where, who must carry 275 them, etc.
ISPs decide (on their own) which routing updates to 276 accept and how (if at all) to process them. Thus, there is no 277 overall authority that can limit the number of prefixes that are 278 injected into the DFZ or that can ensure that any particular prefixes are 279 accepted at all. Today, the system functions because the entities 280 that comprise the DFZ are (generally) able to accept the 281 prefixes that are being advertised and some loose best practices have 282 emerged that are generally followed (e.g., minimum prefix sizes that 283 are routed coupled with RIR policies that place limitations on who 284 may obtain PI prefixes). 286 In general, the Internet would benefit if the cost of the (routing) 287 infrastructure did not grow too rapidly as the Internet grows, since 288 a lower infrastructure cost makes it possible to provide Internet 289 service at a lower cost to a larger number of users. That said, some 290 types of Internet growth tie directly to revenue opportunities or 291 cost savings for an ISP (e.g., adding more users/customers, 292 increasing bandwidth, technological advances, providing new or 293 additional services, etc.). Upgrading or changing infrastructure is 294 most feasible (and expected) when supported by a workable cost 295 recovery model. Hence limiting the cost of self-induced scaling is a 296 nice-to-have benefit, but not a requirement. 298 On the other hand, it is problematic when the infrastructure cost for 299 an ISP grows (rapidly) due to factors outside of its own control, 300 e.g., resulting from overall Internet growth external to the ISP. If 301 an ISP that does not add new customers, upgrade the bandwidth for 302 its customers, or provide new services needs to upgrade or replace 303 its infrastructure in unexpected ways, then it has no natural 304 cost recovery mechanism. This is in essence what is happening with 305 the scaling of the global routing table.
An ISP that is part of the 306 DFZ may need to upgrade its routers to handle an increased routing 307 load just to maintain the same level of service with respect to its 308 current customers and services. 310 Even if it is technically possible to build routers capable of 311 meeting the technical and operational requirements, it is also 312 necessary that the overall cost to build, maintain and deploy such 313 equipment meet reasonable business expectations. ISPs, after all, 314 are run as businesses. As such, they must be able to develop 315 and execute viable business plans that provide an acceptable return 316 on investment (i.e., one acceptable to investors). 318 3.3. Alignment of Incentives 320 Today's growth pattern is influenced by the scaling properties of the 321 current routing system. If the routing system had better scaling 322 properties, we would be able to support and enable more widespread usage 323 of such services as multihoming and traffic engineering. The current 324 system simply would not be able to handle the routing load if 325 everyone were to choose to multihome. There are millions of 326 potential end sites that would benefit from being able to multihome. 327 This compares with the few hundred thousand prefixes being carried 328 today. Broader availability of multihoming is limited by barriers 329 imposed by operational practices that try to strike a balance between 330 the amount of multihoming and preservation of routing slots. It is 331 desirable that the routing and addressing system exert the least 332 possible back pressure on end user applications and deployment 333 scenarios, to enable the broadest possible use of the Internet. 335 One aspect of the current architecture is a misalignment of cost and 336 benefit. Injecting individual prefixes into the DFZ creates a small 337 amount of "pain" for those routers that are part of the DFZ.
Each 338 individual prefix adds only a small cost to the routing load, but the 339 aggregate sum of all prefixes is significant, and leads to the key 340 issue at hand. Those that inject prefixes into the DFZ do not 341 generally pay the cost associated with the individual prefix -- it is 342 carried by the routers in the DFZ. But the originator of the prefix 343 receives the benefit. Hence, there is a misalignment of incentives 344 between those receiving the benefit and those bearing the cost of 345 providing the benefit. Consequently, incentives are not aligned 346 properly to produce a natural feedback loop to balance the cost and 347 benefit of maintaining routing tables. 349 3.4. Table Growth Targets 351 A precise target for the rate of table size or routing update 352 increase that should reasonably be supported going forward is 353 difficult to state in quantitative terms. One target might simply be 354 to keep growth at a stable but manageable rate so that 355 the required increases in router capability can roughly be covered by 356 improvements in technology (e.g., increased processor speeds, 357 reductions in component costs, etc.). 359 However, it is highly desirable to significantly bring down (or even 360 reverse) the growth rate in order to meet user expectations for 361 specific services. As discussed below, there are numerous pressures 362 to deaggregate routes. These pressures come from users seeking 363 specific, tangible service improvements that provide "business- 364 critical" value. Today, some of those services simply cannot be 365 supported to the degree that future demand can reasonably be expected 366 to require, because of the negative implications for DFZ table growth. Hence, 367 valuable services are available to some, but not all potential 368 customers.
As the need for such services becomes increasingly 369 important, it will be difficult to deny such services to large 370 numbers of users, especially when some "lucky" sites are able to use 371 the service and others are not. 373 4. Pressures on Routing Table Size 375 There are a number of factors behind the increase in the quantity of 376 prefixes appearing in the DFZ. From a theoretical perspective, the 377 number of prefixes in the DFZ can be minimized through aggressive 378 aggregation [RFC4632]. In practice, strict adherence to the CIDR 379 principles is difficult. 381 4.1. Traffic Engineering 383 Traffic engineering (TE) is the act of arranging for certain Internet 384 traffic to use or avoid certain network paths (that is, TE attempts 385 to place traffic where capacity exists, or where some set of 386 parameters of the path is more favorable to the traffic being placed 387 there). 389 Outbound TE is typically accomplished by using interior 390 gateway protocol (IGP) metrics to choose the shortest exit for two 391 equally good BGP paths. Adjustment of IGP metrics controls how much 392 traffic flows over different internal paths to specific exit points. 393 Additional traffic can be moved by 394 applying some policy to depreference or filter certain routes from 395 specific BGP peers. Because outbound TE is achieved via a site's own 396 IGP, outbound TE does not impact routing outside of a site. 398 Inbound TE is performed by announcing a more-specific route along the 399 preferred path that "catches" the desired traffic and channels it 400 away from the path it would take otherwise (i.e., via a larger 401 aggregate). At the BGP level, if the address range requiring TE is a 402 portion of a larger address aggregate, network operators implementing 403 TE are forced to de-aggregate otherwise aggregatable prefixes in 404 order to steer the traffic of the particular address range to 405 specific paths.
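The steering effect described above follows from the longest-prefix-match rule applied at forwarding time. A minimal sketch (the prefixes and path labels below are purely illustrative, not taken from any real table):

```python
import ipaddress

# Toy forwarding table: a covering aggregate plus a TE more-specific
# (hypothetical prefixes and path labels, for illustration only).
fib = {
    ipaddress.ip_network("198.51.100.0/22"): "via provider A (aggregate)",
    ipaddress.ip_network("198.51.100.0/24"): "via provider B (TE more-specific)",
}

def lookup(dst):
    """Longest-prefix match: among routes covering dst, the most specific wins."""
    addr = ipaddress.ip_address(dst)
    best = max((net for net in fib if addr in net), key=lambda n: n.prefixlen)
    return fib[best]

print(lookup("198.51.100.7"))  # caught by the /24, so provider B
print(lookup("198.51.102.9"))  # only the /22 covers it, so provider A
```

Withdrawing the /24 would let all traffic fall back to the aggregate; conversely, keeping it advertised is exactly what turns otherwise aggregatable space into an extra DFZ prefix.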
407 TE is performed by both ISPs and customer networks, for three primary 408 reasons: 410 o to match traffic with network capacity, or to spread the traffic 411 load across multiple links (frequently referred to as "load 412 balancing") 414 o to reduce costs by shifting traffic to lower cost paths or by 415 balancing the incoming and outgoing traffic volume to maintain 416 appropriate peering relations 418 o to enforce certain forms of policy (e.g., to prevent government 419 traffic from transiting through other countries) 421 TE impacts route scaling in two ways. First, inbound TE can result 422 in additional prefixes being advertised into the DFZ. Second, 423 network operators usually achieve traffic engineering by "tweaking" 424 the processing of routing protocols, e.g., 425 by sending updates at an increased rate. In addition, some devices 426 attempt to automatically find better paths and then advertise those 427 preferences through BGP, though the extent to which such tools are in 428 use and contributing to the control plane load is unknown. 430 In today's highly competitive environment, providers require TE to 431 maintain good performance and low cost in their networks. 433 4.2. Multihoming 435 Multihoming refers generically to the case in which a site is served 436 by more than one ISP [RFC4116]. Multihoming is used to provide 437 backup paths (i.e., to remove single points of failure), to achieve 438 load-sharing, and to achieve policy or performance objectives (e.g., 439 to use lower latency or higher bandwidth paths). Multihoming may 440 also be a requirement due to contract or law. 442 Multihoming can be accomplished using either PI or PA address space. 443 A multihomed site advertises its site prefix into the routing system 444 of each of its providers. For PI space, the site's PI space is used, 445 and the prefix is propagated throughout the DFZ.
For PA space, the 446 PA site prefix may (or may not) be propagated throughout the DFZ, 447 with the details depending on what type of multihoming is sought. 449 If the site uses PA space, the PA site prefix allocated from one of 450 its providers (which we'll call the Primary Provider) is used. The PA 451 site prefix will be aggregatable by the Primary Provider but not the 452 others. To achieve multihoming with properties comparable to those of 453 the PI case described above, the PA site prefix 454 will need to be injected into the routing system of all of its ISPs, 455 and throughout the DFZ. In addition, because of the longest-match 456 forwarding rule, the Primary Provider must advertise both its 457 aggregate and the individual PA site prefix; otherwise, the path via 458 the Primary Provider (as advertised via the aggregate) will never be 459 selected due to the longest match rule. For the type of multihoming 460 described here, where the PA site prefix is propagated throughout the 461 DFZ, the use of PI vs. PA space has no impact on the control plane 462 load. The increased load is due entirely to the need to propagate 463 the site's individual prefix throughout the DFZ. 465 The demand for multihoming is increasing [2]. The increase in 466 multihoming demand is due to the increased reliance on the Internet 467 for mission- and business-critical applications (where businesses 468 require 7x24 availability for their services) and the general 469 decrease in the cost of Internet connectivity. 471 4.3. End Site Renumbering 473 It is generally considered painful and costly to renumber a site, 474 with the cost proportional to the size and complexity of the network 475 and, most importantly, to the degree that addresses are stored in 476 places that are difficult in practice to update. When using PA 477 space, a site must renumber when changing providers.
Larger sites 478 object to this cost and view the requirement to renumber as akin to 479 being held "hostage" to the provider from which PA space was 480 obtained. Consequently, many sites desire PI space. Having PI space 481 provides independence from any one provider and makes it easier to 482 switch providers (for whatever reason). However, each individual PI 483 prefix must be propagated throughout the DFZ and adds to the control 484 plane load. 486 It should be noted that while larger sites may also want to 487 multihome, the cost of renumbering drives some sites to seek PI 488 space, even though they do not multihome. 490 4.4. Acquisitions and Mergers 492 Acquisitions and mergers take place for business reasons, which 493 usually have little to do with the network topologies of the impacted 494 organizations. When a business sells off part of itself, the assets 495 may include networks, attached devices, etc. A company that 496 purchases or merges with other organizations may quickly find that 497 its network assets are numbered out of many different and 498 unaggregatable address blocks. Consequently, an individual 499 organization may find itself unable to announce a single prefix for 500 all of its networks without renumbering a significant portion of 501 its network. 503 Likewise, selling off part of a business may involve selling part of 504 a network as well, resulting in the fragmentation of one address 505 block into two (or more) smaller blocks. Because the resultant 506 blocks belong to different organizations, they can no longer be 507 advertised by a single aggregate and the resultant fragments may need 508 to be advertised individually into the DFZ. 510 4.5. RIR Address Allocation Policies 512 ISPs and multihoming end sites obtain address space from RIRs. As an 513 entity grows, it needs additional address space and requests more 514 from its RIR.
In order to be able to obtain additional address space 515 that can be aggregated with the previously-allocated address space, 516 the RIR must keep a reserve of space that the requester can grow into 517 in the future. But any reserved address space cannot be used for any 518 other purpose (i.e., assigned to another organization). Hence, there 519 is an inherent conflict between holding address space in reserve to 520 allow for the future growth of an existing allocation holder and 521 using address space efficiently. In IPv4, there has been a heavy 522 emphasis on conserving address space and obtaining efficient 523 utilization. Consequently, insufficient space has been held in 524 reserve to allow for the growth of all sites and some allocations 525 have had to be made from discontiguous address blocks. That is, some 526 sites have received discontiguous address blocks because their growth 527 needs exceeded the amount of space held in reserve for them. 529 In IPv6, the vast address space allows a much greater emphasis 530 to be placed on preserving future aggregation than was 531 possible in IPv4. 533 4.6. Dual Stack Pressure on the Routing Table 535 The recommended IPv6 deployment model is dual-stack, where IPv4 and 536 IPv6 are run in parallel across the same links. This has two 537 implications for routing. First, although alternative scenarios are 538 possible, it seems likely that many routers will be supporting both 539 IPv4 and IPv6 simultaneously and will thus be managing both IPv4 and 540 IPv6 routing tables within a single router. Second, for sites 541 connected via both IPv4 and IPv6, both IPv4 and IPv6 prefixes will 542 need to be propagated into the routing system. Consequently, dual- 543 stack routers will maintain both an IPv4 and an IPv6 route to reach the 544 same destination.
It is possible to make some simple estimates of the approximate size
of the IPv6 tables that would be needed if all sites reachable via
IPv4 today were also reachable via IPv6.  In theory, each autonomous
system (AS) needs only a single aggregate route.  This provides a
lower bound on the size of the fully-realized IPv6 routing table.
(As of Feb 2010, [3] states there are 33,548 active ASes in the
routing system.)

A single IPv6 aggregate will not allow for inbound traffic
engineering.  End sites will need to advertise a number of smaller
prefixes into the DFZ if they desire to gain finer-grained control
over their IPv6 inbound traffic.  This will increase the size of the
IPv6 routing table beyond the lower bound discussed above.  There is
reason to expect the IPv6 routing table will be smaller than the
current IPv4 table, however, because the larger initial assignments
to end sites will minimize the de-aggregation that occurs when a site
must go back to its upstream address provider or RIR and receive a
second, non-contiguous assignment.

It is possible to extrapolate, from the current IPv4 Internet routing
table, what the size of the IPv6 Internet routing table would be if
widespread IPv6 adoption occurred.  Each active AS (33,548) would
require at least one aggregate.  In addition, the IPv6 Internet table
would also carry more-specific prefixes for traffic engineering.
Assume that the IPv6 Internet table will carry the same number of
more-specifics as the IPv4 Internet table.  In this case, one can
take the number of IPv4 Internet routes and subtract the number of
CIDR aggregates that they could easily be aggregated down to.  As of
Feb 2010, the 313,626 routes can be easily aggregated down to 193,844
CIDR aggregates [3].  That difference yields 119,782 extra more-
specific prefixes.
Thus, if each active AS (33,548) required one aggregate, and an
additional 119,782 more-specifics were required, then the IPv6
Internet table would be 153,330 prefixes.

4.7.  Internal Customer Routes

In addition to the Internet routing table, networks must also support
their internal routing table.  Internal routes are defined as more-
specific routes that are not advertised to the DFZ.  These primarily
consist of prefixes that are more-specifics of a provider aggregate
(PA) and are assigned to single-homed customers.  The DFZ need only
carry the PA aggregate in order to deliver traffic to the provider.
However, the provider's routers require the more-specific route to
deliver traffic to the end site.

Internal routes can also come from more-specific prefixes advertised
by multihomed customers with the "no-export" BGP community.  This is
useful when the desired fine-grained control of traffic can be
confined to the neighboring network.

For a large ISP, the internal IPv4 table can be between 50,000 and
150,000 routes.  During the dot-com boom, some ISPs carried more
internal prefixes than there were routes in the Internet table.  Thus
the size of the internal routing table can have a significant impact
on scalability and should not be discounted.

4.8.  IPv4 Address Exhaustion

The IANA and RIR free pool of IPv4 addresses will be exhausted within
a few years.  As the free pool shrinks, the size of the remaining
unused blocks will also shrink, and unused blocks previously held in
reserve for expansion of existing allocations, or otherwise not used
due to their smaller size, will be allocated for use.  Consequently,
as the community looks to use every piece of available address space
(no matter how small), there will be increasing pressure to advertise
additional prefixes in the DFZ.

5.
Pressures on Control Plane Load

This section describes a number of trends and pressures that are
contributing to the overall routing load.  The previous section
described pressures that are increasing the size of the routing
table.  Even if the size could be bounded, the amount of work needed
to maintain paths for a given set of prefixes appears to be
increasing.

5.1.  Interconnection Richness

The degree of interconnectedness between ASes has increased in recent
years.  That is, the Internet as a whole is becoming "flatter", with
an increasing number of possible paths interconnecting sites [4].  As
the number of possible paths increases, the amount of computation
needed to find a best path also increases.  This computation comes
into effect whenever a change in path characteristics occurs, whether
from a new path becoming available, an existing path failing, or a
change in the attributes associated with a potential path.  Thus,
even if the total number of prefixes were to stay constant, an
increase in interconnection richness implies an increase in the
resources needed to maintain routing tables.

5.2.  Multihoming

Multihoming places pressure on the routing system in two ways.
First, an individual prefix for a multihomed site (whether PI or PA)
must be propagated into the routing system so that other sites can
find a good path to the site.  Even if the site's prefix comes out of
a PA block, an individual prefix for the site needs to be advertised
so that the most desirable path to the site can be chosen when the
path through the aggregate is sub-optimal.  Second, a multihomed site
will be connected to the Internet in more than one place, increasing
the overall level of interconnection richness.  If an outage occurs
on any of the circuits connecting the site to the Internet, those
changes will be propagated into the routing system.
In contrast, a singly-homed site numbered out of a provider aggregate
places no additional control plane load on the DFZ, as the details of
the site's connectivity status are kept internal to the provider to
which it connects.

5.3.  Traffic Engineering

The mechanisms used to achieve multihoming and inbound Traffic
Engineering are the same.  In both cases, a specific prefix is
advertised into the routing system to "catch" traffic and route it
over a different path than the one over which it would otherwise be
carried.  When multihoming, the specific prefix is one that differs
from that of its ISP or is a more-specific of the ISP's PA.  Traffic
Engineering is achieved by taking one prefix, dividing it into a
number of smaller and more-specific ones, and advertising them in
order to gain finer-grained control over the paths used to carry
traffic covered by those prefixes.

Traffic Engineering increases the number of prefixes carried in the
routing system.  In addition, when a circuit fails (or the routing
attributes associated with the circuit change), additional load is
placed on the routing system because multiple prefixes are
potentially impacted by the change, as opposed to just one.

5.4.  Questionable Operational Practices?

Some operators are believed to engage in operational practices that
increase the load on the routing system.

5.4.1.  Rapid Shuffling of Prefixes

Some networks try to assert fine-grained control of inbound traffic
by modifying route announcements frequently in order to migrate
traffic to less-loaded links quickly.  The goal is to achieve higher
utilization of multiple links.  In addition, some route selection
devices actively measure link or path utilization and attempt to
optimize inbound traffic by withholding or depreferencing certain
prefixes in their advertisements.
In short, any system that actively measures load and modifies route
advertisements in real time increases the load on the routing system,
as any change in what is advertised must ripple through the entire
routing system.

5.4.2.  Anti-Route Hijacking

In order to reduce the threat of accidental (or intentional)
hijacking of its address space by an unauthorized third party, some
sites advertise their space as a set of smaller prefixes rather than
as one aggregate.  That way, if someone else advertises a path for
the larger aggregate (or a small piece of the aggregate), it will be
ignored in favor of the more-specific announcements.  This increases
both the number of prefixes advertised and the number of updates.

5.4.3.  Operational Ignorance

It is believed that some undesirable practices result from operator
ignorance, where operators are unaware of the impact their actions
have on the DFZ.

The default behavior of most BGP configurations is to automatically
propagate all learned routes.  That is, one must take explicit
configuration steps to prevent the automatic propagation of learned
routes.  In addition, it is often significant work to figure out how
to (safely) aggregate routes (and which ones to aggregate) in order
to reduce the number of advertisements propagated elsewhere.  While
vendors could provide additional configuration "knobs" to reduce
leakage, the implementation of additional features increases
complexity, and some operators may fear that a new configuration will
break their existing routing setup.  Finally, leaking routes
unnecessarily does not generally harm those responsible for the
misconfiguration; hence, there may be little incentive to change such
behavior.

5.5.
RIR Policy

RIR address policy has a direct impact on the control plane load
because address policy determines who is eligible for a PI assignment
(which impacts how many are given out in practice) and the size of
the assignment (which impacts how much address space can be
aggregated within a single assignment).  If PI assignments for end
sites did not exist, those end sites would not advertise their own
prefixes directly into the global routing system; instead, their
address blocks would be covered by their providers' aggregates.  That
said, RIRs have adopted PI policies in response to community demand,
for reasons described elsewhere (e.g., to support multihoming and to
avoid the need to renumber).  In short, RIR policy can be seen as a
symptom rather than a root cause.

6.  Summary

As discussed in previous sections, in the current operating
environment, an ISP may experience an overall increase in routing
load due entirely to external factors outside of its control.  These
external pressures can make it increasingly difficult for ISPs to
recover the control-plane-related costs associated with the growth of
the Internet.  Moreover, real business and user needs are creating
increasing pressure to use techniques that increase the control plane
load for ISPs operating within the DFZ.  While the system largely
works today, there is a real risk that the current cost and incentive
structures will be unable to keep control plane costs manageable
(within the context of then-available routing hardware) over the
coming decades.  The Internet would strongly benefit from a routing
and addressing model designed with this in mind.  Thus, in the
absence of a business model that better supports such cost recovery,
there is a need for an approach to routing and addressing that
fulfills the following criteria:

1.
Provides sufficient benefits to the party bearing the costs of
deploying and maintaining the technology to recover the cost of
doing so.

2.  Reduces the growth rate of the DFZ control plane load.  In the
current architecture, this load is dominated by routing, which
depends on:

A.  The number of individual prefixes in the DFZ.

B.  The update rate associated with those prefixes.

Any change to the control plane architecture must result in a
reduction in the overall control plane load; it should not simply
shift the load from one place in the system to another without
reducing the load as a whole.

3.  Allows any end site wishing to multihome to do so.

4.  Supports ISP and enterprise TE needs.

5.  Allows end sites to switch providers while minimizing
configuration changes to internal end site devices.

6.  Provides end-to-end convergence/restoration of service at least
comparable to that provided by the current architecture.

This document has purposefully been scoped to focus on the growth of
the routing control plane load of operating the DFZ.  Other problems
that may seem related, but do not directly impact route scaling, are
not considered to be "in scope" at this time.  For example, Mobile IP
[RFC3344] [RFC3775] and NEMO [RFC3963] place no pressure on the
routing system.  They are layered on top of existing IP, using
tunneling to forward packets via a care-of address.  Hence,
"improving" these technologies (e.g., by having them leverage a
solution to the multihoming problem), while a laudable goal, is not
considered a necessary goal.

7.  Security Considerations

This document does not introduce any security considerations.

8.  IANA Considerations

This document contains no IANA actions.

9.
Acknowledgments

The initial version of this document was produced by the Routing and
Addressing Directorate (http://www.ietf.org/IESG/content/radir.html).
The membership of the directorate at that time included Marla
Azinger, Vince Fuller, Vijay Gill, Thomas Narten, Erik Nordmark,
Jason Schiller, Peter Schoenmaker, and John Scudder.

Comments should be sent to rrg@iab.org or to radir@ietf.org.

10.  Informative References

[RFC3344]  Perkins, C., "IP Mobility Support for IPv4", RFC 3344,
           August 2002.

[RFC3775]  Johnson, D., Perkins, C., and J. Arkko, "Mobility Support
           in IPv6", RFC 3775, June 2004.

[RFC3963]  Devarapalli, V., Wakikawa, R., Petrescu, A., and P.
           Thubert, "Network Mobility (NEMO) Basic Support Protocol",
           RFC 3963, January 2005.

[RFC4116]  Abley, J., Lindqvist, K., Davies, E., Black, B., and V.
           Gill, "IPv4 Multihoming Practices and Limitations",
           RFC 4116, July 2005.

[RFC4632]  Fuller, V. and T. Li, "Classless Inter-domain Routing
           (CIDR): The Internet Address Assignment and Aggregation
           Plan", BCP 122, RFC 4632, August 2006.

[RFC4984]  Meyer, D., Zhang, L., and K. Fall, "Report from the IAB
           Workshop on Routing and Addressing", RFC 4984,
           September 2007.

[1]

[2]

[3]

[4]

Author's Address

Thomas Narten
IBM

Email: narten@us.ibm.com