idnits 2.17.1 draft-narten-radir-problem-statement-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** You're using the IETF Trust Provisions' Section 6.b License Notice from 12 Sep 2009 rather than the newer Notice from 28 Dec 2009. (See https://trustee.ietf.org/license-info/) Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document seems to contain a disclaimer for pre-RFC5378 work, and may have content which was first submitted before 10 November 2008. The disclaimer is necessary when there are original authors that you have been unable to contact, or if some do not wish to grant the BCP78 rights to the IETF Trust. If you are able to get all authors (current and original) to grant those rights, you can and should remove the disclaimer; otherwise, the disclaimer is needed and you can ignore this comment. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (February 17, 2010) is 5172 days in the past. Is this intentional? 
Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'GSE' is mentioned on line 109, but not defined == Missing Reference: 'ROAD' is mentioned on line 109, but not defined -- Obsolete informational reference (is this intentional?): RFC 3344 (Obsoleted by RFC 5944) -- Obsolete informational reference (is this intentional?): RFC 3775 (Obsoleted by RFC 6275) Summary: 1 error (**), 0 flaws (~~), 3 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group T. Narten 3 Internet-Draft IBM 4 Intended status: Informational February 17, 2010 5 Expires: August 21, 2010 7 On the Scalability of Internet Routing 8 draft-narten-radir-problem-statement-05.txt 10 Abstract 12 There has been much discussion over the last several years about the overall 13 scalability of the Internet routing system. Some have argued that 14 the resources required to maintain routing tables in the core of the 15 Internet are growing faster than available technology will be able to 16 keep up with. Others disagree with that assessment. This document 17 attempts to describe the factors that are placing pressure on the 18 routing system and the growth trends behind those factors. 20 Status of this Memo 22 This Internet-Draft is submitted to IETF in full conformance with the 23 provisions of BCP 78 and BCP 79. 25 Internet-Drafts are working documents of the Internet Engineering 26 Task Force (IETF), its areas, and its working groups. Note that 27 other groups may also distribute working documents as Internet- 28 Drafts. 30 Internet-Drafts are draft documents valid for a maximum of six months 31 and may be updated, replaced, or obsoleted by other documents at any 32 time.
It is inappropriate to use Internet-Drafts as reference 33 material or to cite them other than as "work in progress." 35 The list of current Internet-Drafts can be accessed at 36 http://www.ietf.org/ietf/1id-abstracts.txt. 38 The list of Internet-Draft Shadow Directories can be accessed at 39 http://www.ietf.org/shadow.html. 41 This Internet-Draft will expire on August 21, 2010. 43 Copyright Notice 45 Copyright (c) 2010 IETF Trust and the persons identified as the 46 document authors. All rights reserved. 48 This document is subject to BCP 78 and the IETF Trust's Legal 49 Provisions Relating to IETF Documents 50 (http://trustee.ietf.org/license-info) in effect on the date of 51 publication of this document. Please review these documents 52 carefully, as they describe your rights and restrictions with respect 53 to this document. Code Components extracted from this document must 54 include Simplified BSD License text as described in Section 4.e of 55 the Trust Legal Provisions and are provided without warranty as 56 described in the BSD License. 58 This document may contain material from IETF Documents or IETF 59 Contributions published or made publicly available before November 60 10, 2008. The person(s) controlling the copyright in some of this 61 material may not have granted the IETF Trust the right to allow 62 modifications of such material outside the IETF Standards Process. 63 Without obtaining an adequate license from the person(s) controlling 64 the copyright in such materials, this document may not be modified 65 outside the IETF Standards Process, and derivative works of it may 66 not be created outside the IETF Standards Process, except to format 67 it for publication as an RFC or to translate it into languages other 68 than English. 70 Table of Contents 72 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 73 2. Terms and Definitions . . . . . . . . . . . . . . . . . . . . 4 74 3. Background . . . . . . . . . . . . . . . . . . . . . . . 
. . . 6 75 3.1. Technical Aspects . . . . . . . . . . . . . . . . . . . . 7 76 3.2. Business Considerations . . . . . . . . . . . . . . . . . 7 77 3.3. Alignment of Incentives . . . . . . . . . . . . . . . . . 8 78 3.4. Table Growth Targets . . . . . . . . . . . . . . . . . . . 9 79 4. Pressures on Routing Table Size . . . . . . . . . . . . . . . 10 80 4.1. Traffic Engineering . . . . . . . . . . . . . . . . . . . 10 81 4.2. Multihoming . . . . . . . . . . . . . . . . . . . . . . . 11 82 4.3. End Site Renumbering . . . . . . . . . . . . . . . . . . . 12 83 4.4. Acquisitions and Mergers . . . . . . . . . . . . . . . . . 12 84 4.5. RIR Address Allocation Policies . . . . . . . . . . . . . 12 85 4.6. Dual Stack Pressure on the Routing Table . . . . . . . . . 13 86 4.7. Internal Customer Routes . . . . . . . . . . . . . . . . . 14 87 4.8. IPv4 Address Exhaustion . . . . . . . . . . . . . . . . . 14 88 5. Pressures on Control Plane Load . . . . . . . . . . . . . . . 15 89 5.1. Interconnection Richness . . . . . . . . . . . . . . . . . 15 90 5.2. Multihoming . . . . . . . . . . . . . . . . . . . . . . . 15 91 5.3. Traffic Engineering . . . . . . . . . . . . . . . . . . . 15 92 5.4. Questionable Operational Practices? . . . . . . . . . . . 16 93 5.4.1. Rapid shuffling of prefixes . . . . . . . . . . . . . 16 94 5.4.2. Anti-Route Hijacking . . . . . . . . . . . . . . . . . 16 95 5.4.3. Operational Ignorance . . . . . . . . . . . . . . . . 16 96 5.5. RIR Policy . . . . . . . . . . . . . . . . . . . . . . . . 17 97 6. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 98 7. Security Considerations . . . . . . . . . . . . . . . . . . . 20 99 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21 100 9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 22 101 10. Informative References . . . . . . . . . . . . . . . . . . . . 23 102 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 24 104 1. 
Introduction 106 Prompted in part by the October 2006 IAB workshop on Routing & 107 Addressing [RFC4984], there has been a renewed focus on the topic of 108 routing scalability within the Internet. The issue itself is not 109 new, with discussions dating back at least 10-15 years [GSE, ROAD]. 111 This document attempts to describe the "pain points" affecting 112 the routing system, with the aim of describing the essential aspects 113 so that the community has a way of evaluating whether proposed 114 changes to the routing system actually address or impact existing 115 pain points in a significant manner. 117 2. Terms and Definitions 119 Control Plane: The routing setup protocols, their associated state 120 and the activity needed to create and maintain the data structures 121 used to forward packets from one network to another. The term is 122 defined broadly to include all protocols and activities needed to 123 construct and maintain the forwarding tables used to forward 124 packets. 126 Control Plane Load: The actual load associated with operating the 127 Control Plane. The higher the control plane load, the higher the 128 cost of operating the control plane (in terms of hardware, 129 bandwidth, power, etc.). The terms "routing load" and "control 130 plane load" are used interchangeably throughout this document. 132 Control Plane Cost: The overall cost associated with operating the 133 Control Plane. The cost consists of capital costs (for hardware), 134 bandwidth costs (for the control plane signalling) and any other 135 ongoing operational cost associated with operating and maintaining 136 the control plane. 138 Default Free Zone (DFZ): That part of the Internet where routers 139 maintain full routing tables. Many routers maintain only partial 140 tables, having explicit routes for "local" destinations (i.e., 141 prefixes) plus a "default" for everything else.
For such routers, 142 building and maintaining routing tables is relatively simple 143 because the amount of information learned and maintained can be 144 small. In contrast, routers in the DFZ maintain complete 145 information about all reachable destinations, which at the time of 146 this writing number in the hundreds of thousands of entries. 148 Routing Information Base (RIB): The data structures a router 149 maintains that hold the information about destinations and paths 150 to those destinations. The amount of state information maintained 151 is dependent on a number of factors, including the number of 152 individual prefixes learned from peers, the number of BGP peers, 153 the number of distinct paths interconnecting destinations, etc. 154 In addition to maintaining information about active paths used for 155 forwarding, the RIB may also include information about unused 156 ("backup") paths. 158 Forwarding Information Base (FIB): The actual table consulted while 159 making forwarding decisions for individual packets. The FIB is a 160 compact, optimized subset of the RIB, containing only the 161 information needed to actually forward individual packets, i.e., 162 mapping a packet's destination address to an outgoing interface 163 and next-hop. The FIB only stores information about paths 164 actually used for forwarding; it typically does not store 165 information about backup paths. The FIB is typically constructed 166 from specialized hardware components, which have different (and 167 higher) cost properties than the hardware typically used to 168 maintain the RIB. 170 Traffic Engineering (TE): In this document, "traffic engineering" 171 refers to the current practice of inbound, inter-AS traffic 172 engineering. 
TE is accomplished by injecting additional, more- 173 specific routes into the routing system and/or increasing the 174 frequency of routing updates in order to arrange for inbound 175 traffic at the boundary of an Autonomous system (AS) to travel 176 over a different path than it otherwise would. 178 Provider Aggregatable (PA) address space: Address space that an end 179 site obtains from an upstream ISP's address block. The main 180 benefit of PA address space is that reachability to all of a 181 provider's customers can be achieved by advertising a single 182 "provider aggregate" address prefix into the DFZ, rather than 183 needing to announce individual prefixes for each customer. An 184 important disadvantage is that when a customer changes providers, 185 the customer must renumber their site into addresses belonging to 186 the new provider and return the previously used addresses to the 187 former provider. 189 Provider Independent (PI) address space: Address space that an end 190 site obtains directly from a Regional Internet Registry (RIR) for 191 addressing its devices. The main advantage (for the end site) is 192 that it does not need to renumber its site when changing 193 providers, since it continues to use its PI block. However, PI 194 address blocks are not aggregatable and thus each individual PI 195 assignment results in an individual prefix being injected into the 196 DFZ. 198 Site: Any topologically and administratively distinct entity that 199 connects to the Internet. A site can range from a large 200 enterprise or ISP to a small home site. 202 3. Background 204 Within the DFZ, both the size of the RIB and FIB and the overall 205 update rate have historically increased at a greater than linear 206 rate. Specifically: 208 o The number of individual prefixes that are being propagated into 209 the DFZ over time has been and continues to increase at a faster- 210 than-linear rate. The term "super-linear" has been used to 211 characterize the growth. 
The exact nature of the growth is much 212 debated (e.g., quadratic, polynomial, etc.), but growth is clearly 213 faster than linear. The reasons behind the rate increase are 214 varied and discussed below. Because each individual prefix 215 requires resources to process, any increase in the number of 216 prefixes produces a corresponding increase in the control plane load 217 of the routing system. Each individual prefix that appears in 218 routing updates requires state in the RIB (and possibly the FIB) 219 and consumes processing and other resources when updates related 220 to the prefix are received. 222 o The overall rate of routing updates is increasing [1], requiring 223 routers to process updates at an increased rate or converge more 224 slowly if they cannot. The rate increase of the control plane 225 load is driven by a number of factors (discussed below). Further 226 study is needed to better understand the factors behind the 227 increasing update rate. For example, it appears that a 228 disproportionate share of observed updates originates from a 229 small percentage of the total number of advertised prefixes. 231 The super-linear growth in the routing load presents a scalability 232 challenge for current and/or future routers. While there appears to 233 be general agreement that we will be able to build routers (i.e., 234 hardware & software) actually capable of handling the control plane 235 load, both today and going forward, there is considerable debate 236 about the cost. In particular, will ISPs that currently maintain 237 routes as part of the DFZ (or would like to) be 238 able to afford to do so, or will only the largest (and a shrinking 239 number) of top-tier ISPs be able to afford the investment and cost of 240 operating the control plane while being a part of the DFZ?
242 Finally, the scalability challenge is aggravated by the lack of any 243 firm architectural upper bound on the growth rate of the 244 routing load and a weakening of the social constraints that have 245 historically helped restrain the growth rate. Going forward, there is 246 considerable uncertainty (some would say doubt) about whether future growth 247 rates will continue to be sufficiently constrained so that router 248 development can keep up at an acceptable price point. 250 3.1. Technical Aspects 252 The technical challenge of building routers relates to the resources 253 needed to process a larger and increasingly dynamic amount of routing 254 information. More specifically, routers must maintain an increasing 255 amount of associated state information in the RIB, they must be 256 capable of populating a growing FIB, they must perform forwarding 257 lookups at line rates (while accessing the FIB) and they must be able 258 to initialize the RIB and FIB after system restart. All of these 259 activities must take place within acceptable time frames (i.e., paths 260 for individual destinations must converge and stabilize within an 261 acceptable time period). Finally, the hardware needed to achieve 262 this cannot have unreasonable power consumption or cooling demands. 264 3.2. Business Considerations 266 While the IETF does not (and cannot) concern itself with business 267 models or the profitability of the ISP community, the cost of running 268 the routing subsystem as a whole is directly influenced by the 269 routing architecture of the Internet, which clearly is the IETF's 270 business. Thus, it is useful to consider the overall business 271 environment that underlies operation of the DFZ routing 272 infrastructure. The DFZ is run entirely by the private sector with 273 no overall governmental oversight or regulatory framework to oversee 274 or even influence what routes are propagated where, who must carry 275 them, etc.
ISPs decide (on their own) which routing updates to 276 accept and how (if at all) to process them. Thus, there is no 277 overall authority that can limit the number of prefixes that are 278 injected into the DFZ or that can ensure that any particular prefixes are 279 accepted at all. Today, the system functions because the entities 280 that comprise the DFZ are (generally) able to accept the 281 prefixes that are being advertised and some loose best practices have 282 emerged that are generally followed (e.g., minimum prefix sizes that 283 are routed coupled with RIR policies that place limitations on who 284 may obtain PI prefixes). 286 In general, the Internet would benefit if the cost of the (routing) 287 infrastructure did not grow too rapidly as the Internet grows, since 288 a lower infrastructure cost makes it possible to provide Internet 289 service at a lower cost to a larger number of users. That said, some 290 types of Internet growth tie directly to revenue opportunities or 291 cost savings for an ISP (e.g., adding more users/customers, 292 increasing bandwidth, technological advances, providing new or 293 additional services, etc.). Upgrading or changing infrastructure is 294 most feasible (and expected) when supported by a workable cost 295 recovery model. Hence limiting the cost of self-induced scaling is a 296 nice-to-have benefit, but not a requirement. 298 On the other hand, it is problematic when the infrastructure cost for 299 an ISP grows (rapidly) due to factors outside of its own control, 300 e.g., resulting from overall Internet growth external to the ISP. If 301 an ISP that does not add new customers, upgrade the bandwidth for 302 its customers, or provide new services needs to upgrade or replace 303 its infrastructure in unexpected ways, then it has no natural 304 cost recovery mechanism. This is in essence what is happening with 305 the scaling of the global routing table.
An ISP that is part of the 306 DFZ may need to upgrade its routers to handle an increased routing 307 load just to maintain the same level of service with respect to its 308 current customers and services. 310 Even if it is technically possible to build routers capable of 311 meeting the technical and operational requirements, it is also 312 necessary that the overall cost to build, maintain and deploy such 313 equipment meet reasonable business expectations. ISPs, after all, 314 are run as businesses. As such, they must be able to develop 315 and execute viable business plans that provide an acceptable return 316 on investment (i.e., one acceptable to investors). 318 3.3. Alignment of Incentives 320 Today's growth pattern is influenced by the scaling properties of the 321 current routing system. If the routing system had better scaling 322 properties, we would be able to support and enable more widespread usage 323 of such services as multihoming and traffic engineering. The current 324 system simply would not be able to handle the routing load if 325 everyone were to choose to multihome. There are millions of 326 potential end sites that would benefit from being able to multihome. 327 This compares with the few hundred thousand prefixes being carried 328 today. Broader availability of multihoming is limited by barriers 329 imposed by operational practices that try to strike a balance between 330 the amount of multihoming and preservation of routing slots. It is 331 desirable that the routing and addressing system exert the least 332 possible back pressure on end user applications and deployment 333 scenarios, to enable the broadest possible use of the Internet. 335 One aspect of the current architecture is a misalignment of cost and 336 benefit. Injecting individual prefixes into the DFZ creates a small 337 amount of "pain" for those routers that are part of the DFZ.
Each 338 individual prefix adds only a small cost to the routing load, but the 339 aggregate sum of all prefixes is significant, and leads to the key 340 issue at hand. Those that inject prefixes into the DFZ do not 341 generally pay the cost associated with the individual prefix -- it is 342 carried by the routers in the DFZ. But the originator of the prefix 343 receives the benefit. Hence, there is a misalignment of incentives 344 between those receiving the benefit and those bearing the cost of 345 providing the benefit. Consequently, incentives are not aligned 346 properly to produce a natural feedback loop to balance the cost and 347 benefit of maintaining routing tables. 349 3.4. Table Growth Targets 351 A precise target for the rate of table size or routing update 352 increase that should reasonably be supported going forward is 353 difficult to state in quantitative terms. One target might simply be 354 to keep growth at a stable but manageable rate so that 355 the required increases in router capability can roughly be covered by 356 improvements in technology (e.g., increased processor speeds, 357 reductions in component costs, etc.). 359 However, it is highly desirable to significantly bring down (or even 360 reverse) the growth rate in order to meet user expectations for 361 specific services. As discussed below, there are numerous pressures 362 to deaggregate routes. These pressures come from users seeking 363 specific, tangible service improvements that provide "business- 364 critical" value. Today, some of those services simply cannot be 365 supported to the degree that future demand can reasonably be expected 366 to require, because of the negative implications for DFZ table growth. Hence, 367 valuable services are available to some, but not all potential 368 customers.
As the need for such services becomes increasingly 369 important, it will be difficult to deny such services to large 370 numbers of users, especially when some "lucky" sites are able to use 371 the service and others are not. 373 4. Pressures on Routing Table Size 375 There are a number of factors behind the increase in the quantity of 376 prefixes appearing in the DFZ. From a theoretical perspective, the 377 number of prefixes in the DFZ can be minimized through aggressive 378 aggregation [RFC4632]. In practice, strict adherence to the CIDR 379 principles is difficult. 381 4.1. Traffic Engineering 383 Traffic engineering (TE) is the act of arranging for certain Internet 384 traffic to use or avoid certain network paths (that is, TE attempts 385 to place traffic where capacity exists, or where some set of 386 parameters of the path is more favorable to the traffic being placed 387 there). 389 Outbound TE is typically accomplished by using interior 390 gateway protocol (IGP) metrics to choose the shortest exit for two 391 equally good BGP paths. Adjustment of IGP metrics controls how much 392 traffic flows over different internal paths to specific exit points. 393 Additional traffic can be moved by 394 applying some policy to depreference or filter certain routes from 395 specific BGP peers. Because outbound TE is achieved via a site's own 396 IGP, outbound TE does not impact routing outside of a site. 398 Inbound TE is performed by announcing a more-specific route along the 399 preferred path that "catches" the desired traffic and channels it 400 away from the path it would take otherwise (i.e., via a larger 401 aggregate). At the BGP level, if the address range requiring TE is a 402 portion of a larger address aggregate, network operators implementing 403 TE are forced to de-aggregate otherwise aggregatable prefixes in 404 order to steer the traffic of the particular address range to 405 specific paths.
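The steering effect described above follows from the longest-prefix-match rule applied at forwarding time. A minimal sketch (the prefixes and path labels below are purely illustrative, not taken from any real table):

```python
import ipaddress

# Toy forwarding table: a covering aggregate plus a TE more-specific
# (hypothetical prefixes and path labels, for illustration only).
fib = {
    ipaddress.ip_network("198.51.100.0/22"): "via provider A (aggregate)",
    ipaddress.ip_network("198.51.100.0/24"): "via provider B (TE more-specific)",
}

def lookup(dst):
    """Longest-prefix match: among routes covering dst, the most specific wins."""
    addr = ipaddress.ip_address(dst)
    best = max((net for net in fib if addr in net), key=lambda n: n.prefixlen)
    return fib[best]

print(lookup("198.51.100.7"))  # caught by the /24, so provider B
print(lookup("198.51.102.9"))  # only the /22 covers it, so provider A
```

Withdrawing the /24 would let all traffic fall back to the aggregate; conversely, keeping it advertised is exactly what turns otherwise aggregatable space into an extra DFZ prefix.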
407 TE is performed by both ISPs and customer networks, for three primary 408 reasons: 410 o to match traffic with network capacity, or to spread the traffic 411 load across multiple links (frequently referred to as "load 412 balancing") 414 o to reduce costs by shifting traffic to lower cost paths or by 415 balancing the incoming and outgoing traffic volume to maintain 416 appropriate peering relations 418 o to enforce certain forms of policy (e.g., to prevent government 419 traffic from transiting through other countries) 421 TE impacts route scaling in two ways. First, inbound TE can result 422 in additional prefixes being advertised into the DFZ. Second, 423 network operators usually achieve traffic engineering by "tweaking" 424 the processing of routing protocols, e.g., 425 by sending updates at an increased rate. In addition, some devices 426 attempt to automatically find better paths and then advertise those 427 preferences through BGP, though the extent to which such tools are in 428 use and contributing to the control plane load is unknown. 430 In today's highly competitive environment, providers require TE to 431 maintain good performance and low cost in their networks. 433 4.2. Multihoming 435 Multihoming refers generically to the case in which a site is served 436 by more than one ISP [RFC4116]. Multihoming is used to provide 437 backup paths (i.e., to remove single points of failure), to achieve 438 load-sharing, and to achieve policy or performance objectives (e.g., 439 to use lower latency or higher bandwidth paths). Multihoming may 440 also be a requirement due to contract or law. 442 Multihoming can be accomplished using either PI or PA address space. 443 A multihomed site advertises its site prefix into the routing system 444 of each of its providers. For PI space, the site's PI space is used, 445 and the prefix is propagated throughout the DFZ.
For PA space, the 446 PA site prefix may (or may not) be propagated throughout the DFZ, 447 with the details depending on what type of multihoming is sought. 449 If the site uses PA space, the PA site prefix allocated from one of 450 its providers (which we'll call the Primary Provider) is used. The PA 451 site prefix will be aggregatable by the Primary Provider but not the 452 others. To achieve multihoming with properties comparable to those of 453 the PI case described above, the PA site prefix 454 will need to be injected into the routing system of all of its ISPs, 455 and throughout the DFZ. In addition, because of the longest-match 456 forwarding rule, the Primary Provider must advertise both its 457 aggregate and the individual PA site prefix; otherwise, the path via 458 the Primary Provider (as advertised via the aggregate) will never be 459 selected due to the longest match rule. For the type of multihoming 460 described here, where the PA site prefix is propagated throughout the 461 DFZ, the use of PI vs. PA space has no impact on the control plane 462 load. The increased load is due entirely to the need to propagate 463 the site's individual prefix throughout the DFZ. 465 The demand for multihoming is increasing [2]. The increase in 466 multihoming demand is due to the increased reliance on the Internet 467 for mission- and business-critical applications (where businesses 468 require 7x24 availability for their services) and the general 469 decrease in the cost of Internet connectivity. 471 4.3. End Site Renumbering 473 It is generally considered painful and costly to renumber a site, 474 with the cost proportional to the size and complexity of the network 475 and, most importantly, to the degree that addresses are stored in 476 places that are difficult in practice to update. When using PA 477 space, a site must renumber when changing providers.
Larger sites 478 object to this cost and view the requirement to renumber as akin to 479 being held "hostage" to the provider from which PA space was 480 obtained. Consequently, many sites desire PI space. Having PI space 481 provides independence from any one provider and makes it easier to 482 switch providers (for whatever reason). However, each individual PI 483 prefix must be propagated throughout the DFZ and adds to the control 484 plane load. 486 It should be noted that while larger sites may also want to 487 multihome, the cost of renumbering drives some sites to seek PI 488 space, even though they do not multihome. 490 4.4. Acquisitions and Mergers 492 Acquisitions and mergers take place for business reasons, which 493 usually have little to do with the network topologies of the impacted 494 organizations. When a business sells off part of itself, the assets 495 may include networks, attached devices, etc. A company that 496 purchases or merges with other organizations may quickly find that 497 its network assets are numbered out of many different and 498 unaggregatable address blocks. Consequently, an individual 499 organization may find itself unable to announce a single prefix for 500 all of its networks without renumbering a significant portion of 501 its network. 503 Likewise, selling off part of a business may involve selling part of 504 a network as well, resulting in the fragmentation of one address 505 block into two (or more) smaller blocks. Because the resultant 506 blocks belong to different organizations, they can no longer be 507 advertised by a single aggregate and the resultant fragments may need 508 to be advertised individually into the DFZ. 510 4.5. RIR Address Allocation Policies 512 ISPs and multihoming end sites obtain address space from RIRs. As an 513 entity grows, it needs additional address space and requests more 514 from its RIR.
In order to be able to obtain additional address space 515 that can be aggregated with the previously-allocated address space, 516 the RIR must keep a reserve of space that the requester can grow into 517 in the future. But any reserved address space cannot be used for any 518 other purpose (i.e., assigned to another organization). Hence, there 519 is an inherent conflict between holding address space in reserve to 520 allow for the future growth of an existing allocation holder and 521 using address space efficiently. In IPv4, there has been a heavy 522 emphasis on conserving address space and obtaining efficient 523 utilization. Consequently, insufficient space has been held in 524 reserve to allow for the growth of all sites and some allocations 525 have had to be made from discontiguous address blocks. That is, some 526 sites have received discontiguous address blocks because their growth 527 needs exceeded the amount of space held in reserve for them. 529 In IPv6, the vast address space allows a much greater emphasis 530 to be placed on preserving future aggregation than was 531 possible in IPv4. 533 4.6. Dual Stack Pressure on the Routing Table 535 The recommended IPv6 deployment model is dual-stack, where IPv4 and 536 IPv6 are run in parallel across the same links. This has two 537 implications for routing. First, although alternative scenarios are 538 possible, it seems likely that many routers will be supporting both 539 IPv4 and IPv6 simultaneously and will thus be managing both IPv4 and 540 IPv6 routing tables within a single router. Second, for sites 541 connected via both IPv4 and IPv6, both IPv4 and IPv6 prefixes will 542 need to be propagated into the routing system. Consequently, dual- 543 stack routers will maintain both an IPv4 and an IPv6 route to reach the 544 same destination.
It is possible to make some simple estimates of the approximate size
of the IPv6 tables that would be needed if all sites reachable via
IPv4 today were also reachable via IPv6.  In theory, each autonomous
system (AS) needs only a single aggregate route.  This provides a
lower bound on the size of the fully-realized IPv6 routing table.
(As of Feb 2010, [3] states there are 33,548 active ASes in the
routing system.)

A single IPv6 aggregate will not allow for inbound traffic
engineering.  End sites will need to advertise a number of smaller
prefixes into the DFZ if they desire to gain finer-grained control
over their IPv6 inbound traffic.  This will increase the size of the
IPv6 routing table beyond the lower bound discussed above.  There is
reason to expect the IPv6 routing table will be smaller than the
current IPv4 table, however, because the larger initial assignments
to end sites will minimize the de-aggregation that occurs when a site
must go back to its upstream address provider or RIR and receive a
second, non-contiguous assignment.

It is possible to extrapolate, from the current IPv4 Internet routing
table, what the size of the IPv6 Internet routing table would be if
widespread IPv6 adoption occurred.  Each active AS (33,548) would
require at least one aggregate.  In addition, the IPv6 Internet table
would also carry more-specific prefixes for traffic engineering.
Assume that the IPv6 Internet table will carry the same number of
more-specifics as the IPv4 Internet table.  In this case, one can
take the number of IPv4 Internet routes and subtract the number of
CIDR aggregates that they could easily be aggregated down to.  As of
Feb 2010, the 313,626 routes can be easily aggregated down to 193,844
CIDR aggregates [3].  That difference yields 119,782 extra more-
specific prefixes.
Thus, if each active AS (33,548) required one aggregate, and an
additional 119,782 more-specifics were required, then the IPv6
Internet table would be 153,330 prefixes.

4.7.  Internal Customer Routes

In addition to the Internet routing table, networks must also support
their internal routing table.  Internal routes are defined as more-
specific routes that are not advertised to the DFZ.  These primarily
consist of prefixes that are more-specifics of a provider aggregate
(PA) and are assigned to single-homed customers.  The DFZ need only
carry the PA aggregate in order to deliver traffic to the provider.
However, the provider's routers require the more-specific route to
deliver traffic to the end site.

Internal routes can also come from more-specific prefixes advertised
by multihomed customers with the "no-export" BGP community.  This is
useful when the desired fine-grained control of traffic can be
confined to the neighboring network.

For a large ISP, the internal IPv4 table can be between 50,000 and
150,000 routes.  During the dot-com boom, some ISPs carried more
internal prefixes than there were routes in the Internet table.  Thus
the size of the internal routing table can have a significant impact
on scalability and should not be discounted.

4.8.  IPv4 Address Exhaustion

The IANA and RIR free pool of IPv4 addresses will be exhausted within
a few years.  As the free pool shrinks, the size of the remaining
unused blocks will also shrink, and unused blocks previously held in
reserve for expansion of existing allocations, or otherwise not used
due to their smaller size, will be allocated for use.  Consequently,
as the community looks to use every piece of available address space
(no matter how small), there will be increasing pressure to advertise
additional prefixes in the DFZ.

5.
Pressures on Control Plane Load

This section describes a number of trends and pressures that are
contributing to the overall routing load.  The previous section
described pressures that are increasing the size of the routing
table.  Even if the size could be bounded, the amount of work needed
to maintain paths for a given set of prefixes appears to be
increasing.

5.1.  Interconnection Richness

The degree of interconnectedness between ASes has increased in recent
years.  That is, the Internet as a whole is becoming "flatter", with
an increasing number of possible paths interconnecting sites [4].  As
the number of possible paths increases, the amount of computation
needed to find a best path also increases.  This computation comes
into effect whenever a change in path characteristics occurs, whether
from a new path becoming available, an existing path failing, or a
change in the attributes associated with a potential path.  Thus,
even if the total number of prefixes were to stay constant, an
increase in interconnection richness implies an increase in the
resources needed to maintain routing tables.

5.2.  Multihoming

Multihoming places pressure on the routing system in two ways.
First, an individual prefix for a multihomed site (whether PI or PA)
must be propagated into the routing system so that other sites can
find a good path to the site.  Even if the site's prefix comes out of
a PA block, an individual prefix for the site needs to be advertised
so that the most desirable path to the site can be chosen when the
path through the aggregate is sub-optimal.  Second, a multihomed site
will be connected to the Internet in more than one place, increasing
the overall level of interconnection richness.  If an outage occurs
on any of the circuits connecting the site to the Internet, those
changes will be propagated into the routing system.
In contrast, a singly-homed site numbered out of a provider aggregate
places no additional control plane load on the DFZ, as the details of
the site's connectivity status are kept internal to the provider to
which it connects.

5.3.  Traffic Engineering

The mechanisms used to achieve multihoming and inbound Traffic
Engineering are the same.  In both cases, a specific prefix is
advertised into the routing system to "catch" traffic and route it
over a different path than the one over which it would otherwise be
carried.  When multihoming, the specific prefix is one that differs
from that of its ISP or is a more-specific of the ISP's PA.  Traffic
Engineering is achieved by taking one prefix, dividing it into a
number of smaller and more-specific ones, and advertising them in
order to gain finer-grained control over the paths used to carry
traffic covered by those prefixes.

Traffic Engineering increases the number of prefixes carried in the
routing system.  In addition, when a circuit fails (or the routing
attributes associated with the circuit change), additional load is
placed on the routing system because multiple prefixes are
potentially impacted by the change, as opposed to just one.

5.4.  Questionable Operational Practices?

Some operators are believed to engage in operational practices that
increase the load on the routing system.

5.4.1.  Rapid Shuffling of Prefixes

Some networks try to assert fine-grained control of inbound traffic
by modifying route announcements frequently in order to migrate
traffic to less-loaded links quickly.  The goal is to achieve higher
utilization of multiple links.  In addition, some route selection
devices actively measure link or path utilization and attempt to
optimize inbound traffic by withholding or depreferencing certain
prefixes in their advertisements.
In short, any system that actively measures load and modifies route
advertisements in real time increases the load on the routing system,
as any change in what is advertised must ripple through the entire
routing system.

5.4.2.  Anti-Route Hijacking

In order to reduce the threat of accidental (or intentional)
hijacking of its address space by an unauthorized third party, some
sites advertise their space as a set of smaller prefixes rather than
as one aggregate.  That way, if someone else advertises a path for
the larger aggregate (or a small piece of the aggregate), it will be
ignored in favor of the more-specific announcements.  This increases
both the number of prefixes advertised and the number of updates.

5.4.3.  Operational Ignorance

It is believed that some undesirable practices result from operator
ignorance, where operators are unaware of the impact their actions
have on the DFZ.

The default behavior of most BGP configurations is to automatically
propagate all learned routes.  That is, one must take explicit
configuration steps to prevent the automatic propagation of learned
routes.  In addition, it is often significant work to figure out how
to (safely) aggregate routes (and which ones to aggregate) in order
to reduce the number of advertisements propagated elsewhere.  While
vendors could provide additional configuration "knobs" to reduce
leakage, the implementation of additional features increases
complexity, and some operators may fear that a new configuration will
break their existing routing setup.  Finally, leaking routes
unnecessarily does not generally harm those responsible for the
misconfiguration; hence, there may be little incentive to change such
behavior.

5.5.
RIR Policy

RIR address policy has a direct impact on the control plane load
because address policy determines who is eligible for a PI assignment
(which impacts how many are given out in practice) and the size of
the assignment (which impacts how much address space can be
aggregated within a single assignment).  If PI assignments for end
sites did not exist, those end sites would not advertise their own
prefixes directly into the global routing system; instead, their
address blocks would be covered by their providers' aggregates.  That
said, RIRs have adopted PI policies in response to community demand,
for reasons described elsewhere (e.g., to support multihoming and to
avoid the need to renumber).  In short, RIR policy can be seen as a
symptom rather than a root cause.

6.  Summary

As discussed in previous sections, in the current operating
environment, an ISP may experience an overall increase in routing
load due entirely to external factors outside of its control.  These
external pressures can make it increasingly difficult for ISPs to
recover the control-plane-related costs associated with the growth of
the Internet.  Moreover, real business and user needs are creating
increasing pressure to use techniques that increase the control plane
load for ISPs operating within the DFZ.  While the system largely
works today, there is a real risk that the current cost and incentive
structures will be unable to keep control plane costs manageable
(within the context of then-available routing hardware) over the
coming decades.  The Internet would strongly benefit from a routing
and addressing model designed with this in mind.  Thus, in the
absence of a business model that better supports such cost recovery,
there is a need for an approach to routing and addressing that
fulfills the following criteria:

1.
Provides sufficient benefits to the party bearing the costs of
deploying and maintaining the technology to recover the cost of
doing so.

2.  Reduces the growth rate of the DFZ control plane load.  In the
current architecture, this load is dominated by routing, which
depends on:

A.  The number of individual prefixes in the DFZ.

B.  The update rate associated with those prefixes.

Any change to the control plane architecture must result in a
reduction in the overall control plane load; it should not simply
shift the load from one place in the system to another without
reducing the load as a whole.

3.  Allows any end site wishing to multihome to do so.

4.  Supports ISP and enterprise TE needs.

5.  Allows end sites to switch providers while minimizing
configuration changes to internal end site devices.

6.  Provides end-to-end convergence/restoration of service at least
comparable to that provided by the current architecture.

This document has purposefully been scoped to focus on the growth of
the routing control plane load of operating the DFZ.  Other problems
that may seem related, but do not directly impact route scaling, are
not considered to be "in scope" at this time.  For example, Mobile IP
[RFC3344] [RFC3775] and NEMO [RFC3963] place no pressure on the
routing system.  They are layered on top of existing IP, using
tunneling to forward packets via a care-of address.  Hence,
"improving" these technologies (e.g., by having them leverage a
solution to the multihoming problem), while a laudable goal, is not
considered a necessary goal.

7.  Security Considerations

This document does not introduce any security considerations.

8.  IANA Considerations

This document contains no IANA actions.

9.
Acknowledgments

The initial version of this document was produced by the Routing and
Addressing Directorate (http://www.ietf.org/IESG/content/radir.html).
The membership of the directorate at that time included Marla
Azinger, Vince Fuller, Vijay Gill, Thomas Narten, Erik Nordmark,
Jason Schiller, Peter Schoenmaker, and John Scudder.

Comments should be sent to rrg@iab.org or to radir@ietf.org.

10.  Informative References

[RFC3344]  Perkins, C., "IP Mobility Support for IPv4", RFC 3344,
           August 2002.

[RFC3775]  Johnson, D., Perkins, C., and J. Arkko, "Mobility Support
           in IPv6", RFC 3775, June 2004.

[RFC3963]  Devarapalli, V., Wakikawa, R., Petrescu, A., and P.
           Thubert, "Network Mobility (NEMO) Basic Support Protocol",
           RFC 3963, January 2005.

[RFC4116]  Abley, J., Lindqvist, K., Davies, E., Black, B., and V.
           Gill, "IPv4 Multihoming Practices and Limitations",
           RFC 4116, July 2005.

[RFC4632]  Fuller, V. and T. Li, "Classless Inter-domain Routing
           (CIDR): The Internet Address Assignment and Aggregation
           Plan", BCP 122, RFC 4632, August 2006.

[RFC4984]  Meyer, D., Zhang, L., and K. Fall, "Report from the IAB
           Workshop on Routing and Addressing", RFC 4984,
           September 2007.

[1]

[2]

[3]

[4]

Author's Address

Thomas Narten
IBM

Email: narten@us.ibm.com