idnits 2.17.1 

draft-ietf-ipngwg-gseaddr-00.txt:
  ** The Abstract section seems to be numbered


  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-27) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** There is 1 instance of too long lines in the document, the longest one
     being 1 character in excess of 72.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 559 has weird spacing: '...   is  designa...'

  == Line 561 has weird spacing: '...ntified  by  o...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

     No issues found here.

     Summary: 10 errors (**), 0 flaws (~~), 3 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                        Mike O'Dell
3	Internet-Draft                                        UUNET Technologies
4	                                                  1997/02/24 01:32:32GMT

6	          GSE - An Alternate Addressing Architecture for IPv6

8	                    <draft-ietf-ipngwg-gseaddr-00.txt>

10	1. Status of this Memo

12	   This document is an Internet-Draft.  Internet-Drafts are working
13	   documents of the Internet Engineering Task Force (IETF), its areas,
14	   and its working groups. Note that other groups may also distribute
15	   working documents as Internet-Drafts.

17	   Internet-Drafts are draft documents valid for a maximum of six months
18	   and may be updated, replaced, or obsoleted by other documents at any
19	   time.  It is inappropriate to use Internet-Drafts as reference
20	   material or to cite them other than as ``work in progress.''

22	   To learn the current status of any Internet-Draft, please check the
23	   1id-abstracts.txt listing contained in the Internet-Drafts Shadow
24	   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
25	   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
26	   ftp.isi.edu (US West Coast).

28	2. Abstract

30	   This document presents an alternative addressing architecture for
31	   IPv6 which controls global routing growth by very aggressive
32	   topological aggregation. It includes support for scalable multi-
33	   homing as a distinguished service.  It provides for future
34	   independent evolution of routing and forwarding models with
35	   essentially no impact on end systems.  Finally, it frees sites and
36	   service resellers from the tyranny of CIDR-based aggregation by
37	   providing transparent re-homing of both.

39	3. Introduction

41	   This alternative IPv6 addressing architecture addresses several
42	   scalability issues with the current IPv6 addressing proposals.

44	           Scaling of the global route computation

46	           Ease of re-homing (both leaf Sites and upstream Resellers)

48	           Economic scalability of Multi-homing

50	Internet-Draft                GSE for IPv6        1997/02/24 01:32:32GMT

52	   The current IPv6 addressing proposals address route and topology
53	   aggregation by continuing to rely on CIDR-style "Provider-based
54	   Addressing" coupled with a powerful new dynamic address assignment
55	   mechanism which is intended to make renumbering more palatable.

57	   However, CIDR-style provider-based aggregation breaks down in the
58	   face of the accelerating growth of multi-homed sites (leaf sites or
59	   regional networks).  Worse, renumbering an entire Site to accomplish
60	   a simple topological re-homing such as changing ISPs is a problem
61	   whose magnitude can only grow over time. It will become increasingly
62	   difficult to explain this renumbering requirement to customers,
63	   especially while a complete failure of this aggregation approach
64	   remains a distinct possibility.

66	   While the large IPv6 addresses provide for a huge increase in the
67	   number of end systems which can be accommodated, they also portend a
68	   huge increase in the number of routes required to reach them. Even if
69	   CIDR aggregation were to continue at current levels (maintaining
70	   current efficiency is relatively unlikely), this still presents a
71	   serious problem for the growth of the global route computations.

73	   This document presents a new proposal for using the 16 byte IPv6
74	   address which mitigates the route scaling problem and with it a
75	   number of collateral issues.  This model provides for aggressive
76	   topological aggregation while controlling the complexity of flat-
77	   routed regions.  It exploits and supports the dynamic address
78	   assignment machinery in IPv6 but makes the exact role of that
79	   machinery a decision local to a Site.  It is therefore subject to
80	   engineering cost and benefit analysis rather than being mandatory for
81	   simple Site re-homing situations.

83	   This new model also identifies the special work done by the global
84	   Internet infrastructure on behalf of multi-homed sites. Rather than
85	   continuing the current "Tragedy of the Commons", multi-homing is
86	   isolated into a specific mechanism whose cost is traceable to and
87	   incurred by only those sites wishing to subscribe to this capability.
88	   Again, this makes it possible for sites to make informed cost-benefit
89	   decisions about multi-homing.

91	4. Central Concepts of the Architecture

93	   The architecture is based upon a few central concepts.

95	           A strong distinction between Public and Private Topology

97	           A strong distinction between system identity and location

99	           GSE - Global, Site, and End-system address elements

103	           The deep similarity of Re-homing and Multi-homing

105	           Rewriting address prefixes at Site boundaries

107	           Very aggressive hierarchical network topology aggregation

109	           Optimizing actual forwarding paths by limited-scope
110	           cut-throughs

112	   This model draws a strong distinction between the Public Topology
113	   which forms the transit infrastructure of the Global Internet and a
114	   "Site" which can contain a rich but strictly private local network
115	   topology which cannot "leak" into the global routing machinery.  The
116	   Site is the fundamental unit of attachment to the Global Internet and
117	   is therefore strictly a leaf, even if possibly multi-homed.

119	   This model also draws a very strong distinction between the identity
120	   of a computer system and where it attaches to the Public
121	   Topology.  In IPv4 and current IPv6 models, these notions of identity
122	   and location are deeply co-mingled and this is the fundamental reason
123	   why simple topology changes have such wide-ranging impact on address
124	   assignment (if aggregation is to be maintained at all).

126	   The 16 byte IPv6 address is split into 3 pieces:

128	             0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
129	           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
130	           |  Routing Goop    | STP| End System Designator |
131	           +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
132	                  6+ bytes   ~2 bytes       8 bytes

134	   Routing Goop signifies where the Site attaches to the Global
135	   Internet.  The Site Topology Partition (STP) is Site-private "LAN
136	   segment" information.  The End System Designator (ESD) specifies an
137	   interface on an end-system.
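
   The three-way split can be illustrated with a short Python sketch.
   It assumes a 14-bit STP (just enough for the 16384 partitions
   described in Section 6), which leaves 50 bits of Routing Goop.  The
   names split_gse and join_gse are hypothetical, and any 2001:db8::
   addresses used with them are documentation examples, not GSE
   assignments.

```python
import ipaddress

ESD_BITS = 64                          # End System Designator: low 8 bytes
STP_BITS = 14                          # assumed: 2**14 = 16384 partitions
RG_BITS = 128 - STP_BITS - ESD_BITS    # Routing Goop: 50 bits ("6+ bytes")


def split_gse(addr: str) -> tuple[int, int, int]:
    """Split a 16-byte IPv6 address into its (RG, STP, ESD) fields."""
    value = int(ipaddress.IPv6Address(addr))
    esd = value & ((1 << ESD_BITS) - 1)
    stp = (value >> ESD_BITS) & ((1 << STP_BITS) - 1)
    rg = value >> (ESD_BITS + STP_BITS)
    return rg, stp, esd


def join_gse(rg: int, stp: int, esd: int) -> ipaddress.IPv6Address:
    """Reassemble the three fields into a 16-byte IPv6 address."""
    if not (0 <= rg < 1 << RG_BITS and 0 <= stp < 1 << STP_BITS
            and 0 <= esd < 1 << ESD_BITS):
        raise ValueError("field does not fit its width")
    return ipaddress.IPv6Address(
        (rg << (STP_BITS + ESD_BITS)) | (stp << ESD_BITS) | esd)
```

   For example, split_gse("2001:db8:1:2::42") yields an STP of 2 and an
   ESD of 0x42.  Re-homing a singly-homed Site changes only the first
   element of the tuple.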

139	   One surprising notion is that re-homing and multi-homing are very
140	   deeply related. Multi-homing can be viewed as several re-homings
141	   happening at once.  Achieving both painless re-homing and scalable
142	   multi-homing relies on the same set of fundamental mechanisms, each
143	   with a few distinct details.

145	   Rewriting IPv6 addresses by Site Border Routers is by far the most
146	   controversial, but also the most critical, part of this proposal.  To
147	   control the complexity of routing information which must be managed
148	   within a Site and to isolate end systems and interior routers from
149	   external topology changes, the RG of some addresses is modified by
150	   Site Border Routers.  Packets exiting a site have the RG for the Site
154	   egress point inserted into source addresses, while packets entering a
155	   Site have the RG in all destination addresses replaced with a
156	   canonical prefix signifying "within this Site" (the "Site-local
157	   prefix").
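
   The symmetric re-writing can be sketched in Python as follows.  This
   is an illustration under stated assumptions, not a specified
   algorithm.  It assumes a 14-bit STP (matching the 16384 partitions of
   Section 6) and therefore 50 bits of RG.  SITE_LOCAL_RG is a
   hypothetical choice of canonical "within this Site" prefix, taken
   here as the top 50 bits of the IPv6 site-local prefix fec0::.

```python
import ipaddress
from dataclasses import dataclass, replace

ESD_BITS, STP_BITS = 64, 14
LOW_BITS = ESD_BITS + STP_BITS        # STP + ESD: the part a Site owns
LOW_MASK = (1 << LOW_BITS) - 1

# HYPOTHETICAL canonical "within this Site" RG: top 50 bits of fec0::.
SITE_LOCAL_RG = int(ipaddress.IPv6Address("fec0::")) >> LOW_BITS


@dataclass(frozen=True)
class Packet:
    src: ipaddress.IPv6Address
    dst: ipaddress.IPv6Address


def with_rg(addr: ipaddress.IPv6Address, rg: int) -> ipaddress.IPv6Address:
    """Replace the Routing Goop of addr, keeping its STP and ESD."""
    return ipaddress.IPv6Address((rg << LOW_BITS) | (int(addr) & LOW_MASK))


def rewrite_outbound(pkt: Packet, egress_rg: int) -> Packet:
    # Leaving the Site: insert the egress point's RG into the source.
    return replace(pkt, src=with_rg(pkt.src, egress_rg))


def rewrite_inbound(pkt: Packet) -> Packet:
    # Entering the Site: whichever RG reached us becomes the canonical one.
    return replace(pkt, dst=with_rg(pkt.dst, SITE_LOCAL_RG))
```

   Interior systems only ever see SITE_LOCAL_RG in their own addresses,
   so the choice of egress (or a change of provider) is confined to the
   border router's egress_rg value.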

159	   One immediate result is that upper-layer protocols must use only the
160	   ESD for purposes such as pseudo-header checksums and the like.  The
161	   ESD is the invariant token; the RG is possibly transient topology
162	   information subject to change.
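
   Because only the ESD is invariant, a pseudo-header checksum computed
   over ESDs survives any re-writing of the upper 8 bytes.  The Python
   sketch below assumes a hypothetical pseudo-header modeled on the IPv6
   layout.  It consists of the two 8-byte ESDs, a 32-bit upper-layer
   length, three zero bytes, and the next-header value, summed with the
   RFC 1071 one's-complement algorithm.

```python
import ipaddress
import struct


def ones_complement_sum16(data: bytes) -> int:
    """RFC 1071 one's-complement sum of big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length input
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def gse_checksum(src: ipaddress.IPv6Address, dst: ipaddress.IPv6Address,
                 next_header: int, payload: bytes) -> int:
    """Checksum over a hypothetical ESD-only pseudo-header plus payload."""
    pseudo = (src.packed[8:] + dst.packed[8:]     # ESDs only; RG/STP ignored
              + struct.pack("!IxxxB", len(payload), next_header))
    return ~ones_complement_sum16(pseudo + payload) & 0xFFFF
```

   Changing the RG or STP of either address leaves the result unchanged.
   Changing an ESD or the payload does not.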

164	   Topology aggregation is accomplished by partitioning the Global
165	   Internet into a set of tree-shaped regions anchored by "Large
166	   Structures".  The Routing Goop in an address specifies a path from
167	   the root of the tree (the Large Structure) to a point in the
168	   topology; in the terminal case this is a Site.  Large Structures are
169	   chosen by their ability to aggregate topology and no particular
170	   advantage flows from "being one"; actually quite the contrary. Large
171	   Structures are responsible for subdividing the space under them and
172	   managing that delegation.  Large Structures provide a "forwarding
173	   token of last resort" which can always be used for selecting a valid
174	   next-hop when no other information is available.  This significantly
175	   limits the minimally-sufficient information required for a "default-
176	   free" router.  Any additional route information kept is the result of
177	   path optimizations from cut-throughs.
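
   A "default-free" forwarding decision under this model might look like
   the Python sketch below.  It assumes 50 bits of Routing Goop whose
   leftmost bits form the LSID.  The 10-bit LSID_BITS width is a
   hypothetical value, since no width is fixed here.  The sketch also
   assumes a table of cut-through routes keyed by left-anchored RG
   prefixes, and one last-resort entry per Large Structure.

```python
from typing import Dict, Optional, Tuple

RG_BITS = 50     # assumes a 14-bit STP (16384 STPs, Section 6)
LSID_BITS = 10   # HYPOTHETICAL width of the Large Structure Identifier


def lsid_of(rg: int) -> int:
    """The LSID is the leftmost LSID_BITS of the Routing Goop."""
    return rg >> (RG_BITS - LSID_BITS)


def next_hop(rg: int,
             cut_throughs: Dict[Tuple[int, int], str],
             lsid_routes: Dict[int, str]) -> Optional[str]:
    """Longest matching cut-through, else the Large Structure's route.

    cut_throughs maps (prefix, length) of left-anchored RG substrings,
    learned from collaborating regions, to a next hop.  lsid_routes holds
    the "forwarding token of last resort" for each Large Structure.
    """
    for length in range(RG_BITS, LSID_BITS, -1):   # most specific first
        hop = cut_throughs.get((rg >> (RG_BITS - length), length))
        if hop is not None:
            return hop
    return lsid_routes.get(lsid_of(rg))
```

   The LSID table alone is sufficient for correct, if not optimal,
   forwarding.  Every cut-through entry is a pure path optimization
   layered on top of it.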

179	   While it is useful to think of the Large Structures as trees, the
180	   collection is actually a DAG (Directed Acyclic Graph) because the
181	   trees can touch each other via cut-throughs.  By cross-propagating
182	   selected details via a cut-through, a locally-controlled region can
183	   learn of alternative paths to some destinations.  The distance this
184	   optimization information is propagated and the radius of the
185	   optimization region advertised are the business of the collaborating
186	   regions.

188	5. The Structure of End System Designators - the ESD

190	   End System Designators denote every computer system in the GSE
191	   Internet regardless of whether it is a host, router, or other network
192	   element.  While a given system can have more than one ESD, each ESD
193	   is globally unique.  This is critical for their utility to the
194	   upper-level protocols.  This uniqueness can be induced in several
195	   ways, as will be seen.

197	   A crucial design decision is whether an ESD identifies a system,
198	   invariant of its interfaces as in the XNS architecture, or an
199	   interface on a system as in the existing IPv4 and IPv6 architectures.

201	           An ESD designates an interface on a computer system and that
205	           interface can be either physical or virtual.

207	   When processing a GSE address, a computer system need only examine
208	   the ESD portion of the address to determine whether a packet is
209	   destined for that system.
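
   The receive-side test therefore reduces to comparing the low 8 bytes.
   In this Python sketch, is_for_me is a hypothetical helper.  The
   my_esds argument stands for whatever ESDs a system has configured on
   its interfaces, and an interface may carry several.

```python
import ipaddress
from typing import Collection


def is_for_me(dst: ipaddress.IPv6Address, my_esds: Collection[bytes]) -> bool:
    """Accept a packet whose destination ESD (low 8 bytes) is one of ours.

    The upper 8 bytes (RG and STP) are deliberately not examined, so the
    test is unaffected by border re-writing or by re-homing of the Site.
    """
    return dst.packed[8:] in my_esds
```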

211	   There are circumstances when it is quite useful to have "an address"
212	   for a computer system which is independent of any particular physical
213	   interface on that system. It has become commonplace in IPv4 practice
214	   to use a distinguished virtual interface to provide a system with
215	   such an "interface independent identity".  This technique affords the
216	   same architectural utility as XNS while still allowing the
217	   flexibility of the IPv4 "addressed interface" model, and so retains
218	   the successful IPv4/IPv6 model.

220	   NOTE: We remain intentionally vague about exactly what constitutes an
221	   "interface" and a "computer system".  The malleability of those
222	   notions in IPv4 has proven manifestly useful in practice.

224	   To summarize the ESD uniqueness characteristics:

226	           (1) an ESD is globally unique
227	           (2) an ESD designates an "interface" on "a computer system"
228	           (3) an Interface may have more than one ESD
229	               (current IPv6 already requires implementations to support
230	               multiple addresses per interface)
231	           (4) an ESD may not necessarily designate a particular
232	               physical computer (Neighbor Discovery continues to provide
233	               a level of virtual address translation and considerable
234	               cleverness can be disguised therein)

236	   There are two forms of ESD, both 8 bytes long, one a subcase of the
237	   other.

239	   With the impending onslaught of IEEE-1394 technology, 8-byte IEEE MAC
240	   addresses are simply a fait accompli,
241	   and many devices will be provided with a unique identity in that
242	   format at the time of manufacture.  The 8-byte IEEE MAC Address
243	   format includes the current 6-byte MAC Addresses as a proper
244	   subspace.  Using the 8-byte IEEE MAC address will be very convenient
245	   for many network builders.
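
   One way the 6-byte MAC space can sit inside the 8-byte format is
   IEEE's encapsulation.  It inserts the bytes 0xFF 0xFE between the
   3-byte company identifier and the 3-byte extension.  The Python
   sketch assumes that mapping.  The later IPv6 "modified EUI-64" form
   additionally inverts the universal/local bit, which this sketch does
   not do.

```python
def eui48_to_eui64(mac: str) -> bytes:
    """Embed a 6-byte IEEE MAC address in the 8-byte IEEE format.

    Assumes the IEEE encapsulation: company ID, then 0xFF 0xFE, then the
    extension.  Every 6-byte address has exactly one 8-byte image, and
    all images share the fixed middle bytes, so they form a proper
    subspace.
    """
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("expected a 6-byte MAC address")
    return raw[:3] + b"\xff\xfe" + raw[3:]
```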

247	   There are at least two issues with using *only* the IEEE 8-byte MAC
248	   addresses as ESDs:  There are point-to-point link interfaces which
249	   have no IEEE MAC address assigned for them, and the 8-byte IEEE MAC
250	   addresses assigned to the interfaces of a system are essentially
251	   random.  For some, there is also the issue of whether the IEEE MAC
252	   address is "unique enough" for the purposes at hand.

256	   We clearly need a space for generating ESDs for interfaces which
257	   don't come equipped with one.  Some have also suggested there might
258	   be great utility in enabling inverse lookups on just the ESD part of
259	   an address.  Assigning ESDs in semantic clusters (like current IPv4
260	   addresses) would be a significant aid to this end. Finally, if a
261	   network designer decides not to trust the uniqueness of the IEEE MAC
262	   addresses, he could always use the Dynamic Numbering machinery of
263	   IPv6 to assign ESDs.

265	   We propose that the IETF seek a large (7 bytes or greater) subspace
266	   of the IEEE 8-byte MAC space for allocation as IETF-NodeIDs in
267	   semantic clusters to provide a pool of addresses which can be used
268	   for any of the above reasons, as required.  However, it is expected
269	   that most network builders will exploit the intrinsic IEEE MAC
270	   addresses present in many network interfaces whenever possible.

272	   The IETF-NodeID space should be partitioned into two regions - one
273	   exactly isomorphic to the existing IPv4 address space to provide
274	   instant grandfathering of IPv4 addresses, and another space which is
275	   simply larger but allocated in a similar manner.

277	   A few comments on "global uniqueness" are in order because in
278	   previous discussions, some have asserted that unless "uniqueness" can
279	   be accomplished with absolute and complete mathematical perfection,
280	   any scheme using the concept is unworkable.  This extreme view
281	   is inconsistent with mass-market experience.

283	   IEEE MAC addresses are globally unique by nature of the delegation
284	   process where they are assigned to interfaces by the manufacturers.
285	   Both XNS and IPX rely on this uniqueness and it works very well in
286	   practice.  IETF-NodeID values will be globally unique by nature of
287	   the same kind of assignment mechanism.  IPv4 addresses must be
288	   globally unique for the Internet to function, and it does, mostly, by
289	   nature of exactly the same kind of assignment mechanism.

291	   While accidents and manufacturing defects do occasionally violate the
292	   uniqueness of IEEE MAC address assignment, humans routinely make
293	   errors in assigning IPv4 addresses to systems with equally mystifying
294	   results.  Given the reliance of IEEE-1394 Firewire interconnects on
295	   these unique MAC addresses, it is likely that the frequency of such
296	   occurrences (relative to the total number of objects with assigned
297	   addresses) will only decrease. The economic pressure to ensure this
298	   will be intense.

300	6. The Structure of a Site

302	   The GSE global routing architecture ultimately views a Site as a leaf
306	   of the topology and doesn't concern itself with the interior of this
307	   private topology.  However, the internal topology of a Site is
308	   extremely important to the management and operation of the Site so
309	   the GSE address architecture provides for a rich set of
310	   organizational alternatives with different cost-benefit tradeoffs.

312	   The GSE address structure provides for 16384 distinct Site Topology
313	   Partitions (STPs) within a Site.  This is the number of SEGMENTS in
314	   the internal topology, not hosts.  The number of attached hosts is
315	   limited strictly by available local network technology, and the
316	   Site's ability to buy enough machines to exhaust the available IEEE
317	   8-byte MAC address space, or the available 7-byte IETF-NodeID space.
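
   Assume the STP field is exactly wide enough for these 16384
   partitions.  The remaining field widths then follow directly and
   agree with the "6+ bytes" and "~2 bytes" labels in the address
   diagram of Section 4:

```latex
\begin{aligned}
\text{STP width} &= \log_2 16384 = 14 \text{ bits} \\
\text{RG width}  &= 128 - 64_{\text{ESD}} - 14_{\text{STP}} = 50 \text{ bits} = 6.25 \text{ bytes}
\end{aligned}
```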

319	   Using this structure, a single Site can develop an internal topology
320	   which is a very significant fraction of the total CIDR routes in the
321	   IPv4 Global Internet.

323	   An organization is not constrained to being structured as a single
324	   Site.  The trade-off is that the inter-Site topology must then be
325	   part of the Public Topology. While the individual Sites can retain
326	   considerable independence in topological structure and attachment to
327	   the Global Internet, they must be aware of changes between the
328	   constituent Sites and that re-homing of constituent Sites will
329	   potentially impact long-running sessions. That is the cost of
330	   exploiting the routing machinery available to the Public Topology.

332	   Given the generous flexibility available for organizing a Site, it is
333	   worthwhile to examine a few examples.  Note that none of these
334	   organizational approaches is exclusive.  A large Site might well mix
335	   these approaches to good effect and indeed the goal is to provide the
336	   designer of private Site topology with a broad spectrum of design
337	   alternatives.

339	   The simplest structure to imagine is a Site using all IEEE MAC
340	   Addresses with all the systems connected in a single Private Topology
341	   Partition (i.e., all the GSE addresses carry the same STP value which
342	   is assigned by the local network administration).  Given the
343	   sophistication of current LAN-switching technology, a Site like this
344	   could be both large and internally complex yet have simple IPv6
345	   addressing.  The complexity is absorbed into the LAN infrastructure
346	   and it appears to be only one partition from the GSE Site Topology
347	   view.  This structure has one very significant advantage:  long-
348	   running TCP sessions will survive arbitrary changes in the local
349	   topology.  This works, of course, because the single STP is a virtual
350	   topology with the real topology hidden by the LAN Switching
351	   machinery.

353	   The second Site model is like the one just described, except it would
357	   have multiple STPs with routers moving traffic between the segments.
358	   This is very close to the common IPv4 structure of a CIDR block being
359	   subnetted to assign a prefix to each STP.  This approach has the
360	   advantage of familiarity, but it has the disadvantage that long-lived
361	   TCP connections don't necessarily survive arbitrary changes to the
362	   private topology. This arises because even though the ESD is
363	   invariant, reachability will fail because a change in the STP of one
364	   of the systems is not propagated to the protocol stacks of its
365	   communicating peers when it moves.  The existing IPv6 dynamic
366	   address assignment machinery will serve to make such internal changes
367	   much less painful than with IPv4, however.

369	   One point worth noting is that even with multiple STPs routed within
370	   a Site, a "Private Topology Partition" need not correspond to a
371	   "physical" LAN cable.  The STP values could be used to label larger
372	   organizational structures like "Engineering" or "Finance".  This
373	   could reduce the likelihood that common internal topology changes
374	   break long-lived connections.

376	   The third Site model uses IETF-NodeID ESDs based on existing IPv4
377	   address assignments.  In this case, all the IPv4-style ESDs could be
378	   placed in a single STP and then routed internally on the IPv4 address
379	   in the lowest 4 bytes of the ESD.  It must be emphasized that the
380	   IPv4 address used in an IPv4-style ESD must be an officially-
381	   registered, public-use IPv4 address and NOT an RFC-1918 private-use
382	   address.  Using an RFC-1918 private-use address violates the global
383	   uniqueness properties required of an ESD.
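
   Forming an IPv4-style ESD and enforcing the public-address rule might
   look like the Python sketch below.  The 4-byte IPV4_NODEID_PREFIX is a
   hypothetical placeholder, since no IETF-NodeID allocation is specified
   here.  The sketch stores the IPv4 address in the lowest 4 bytes, as
   described above, and rejects the three RFC 1918 blocks.

```python
import ipaddress

# HYPOTHETICAL: no IETF-NodeID allocation is specified, so this prefix
# marking the IPv4-isomorphic region is a placeholder only.
IPV4_NODEID_PREFIX = bytes.fromhex("fedc0000")

# The three RFC 1918 private-use blocks, which may not appear in an ESD.
RFC1918 = [ipaddress.IPv4Network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def ipv4_style_esd(v4: str) -> bytes:
    """Build an 8-byte ESD carrying a public IPv4 address in its low 4 bytes."""
    addr = ipaddress.IPv4Address(v4)
    if any(addr in net for net in RFC1918):
        raise ValueError(f"{addr} is RFC 1918 private-use space")
    return IPV4_NODEID_PREFIX + addr.packed


def ipv4_of_esd(esd: bytes) -> ipaddress.IPv4Address:
    """Recover the IPv4 address a Site would route on internally."""
    if len(esd) != 8 or esd[:4] != IPV4_NODEID_PREFIX:
        raise ValueError("not an IPv4-style ESD")
    return ipaddress.IPv4Address(esd[4:])
```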

385	   In all of the multi-segment cases, an IETF-NodeID ESD could be used
386	   to designate any point-to-point link endpoint, the loopback addresses
387	   in routers, or any other IP-accessible network elements which don't
388	   naturally have an IEEE MAC address for forming an ESD.  And in all of
389	   the cases, IETF-NodeID ESDs could be used universally, although it
390	   is more appropriate to use the IEEE ESD form whenever possible.

392	   In all of the cases where the real topology is not completely
393	   virtualized by the LAN technology, there will be "Internal
394	   Renumbering" events caused by moving systems between infrastructure
395	   segments (STPs).  This will have the effect of killing long-running
396	   off-Site connections unless provisions are made to allow the systems
397	   (and the routing infrastructure) to carry the previous ESDs as
398	   synonyms for a while.  Given that most significant topology moves
399	   involve powering off the end system in question, this is hardly a
400	   hardship.  However, the powerful renumbering support already
401	   developed for IPv6 can make those other moves considerably less
402	   disruptive.

404	   Most importantly, external re-homing of a Site to the global
408	   infrastructure can be made completely transparent.

410	7. Dynamic Address Re-writing by Site Border Routers

412	   A critical component of this architecture is the modification of
413	   addresses when packets leave or enter a Site.  Re-writing source
414	   addresses to insert appropriate Routing Goop at the Site egress point
415	   was part of the 8+8 proposal, but this proposal extends this to re-
416	   writing destination addresses when inbound packets arrive at a Site
417	   Border Router.

419	   The reasons for both re-writings are the same: to insulate the
420	   interior of the Site from external topology changes and egress policy
421	   details.

423	   When a Site Border Router inserts the correct RG in the source
424	   address of outbound packets, it frees the end-systems in the Site
425	   from having to know the RG for the Site. This is especially important
426	   if the site is Multi-homed and the Site implements a complex egress
427	   selection policy.

429	   In the case of inbound packets, if the destination address were not
430	   converted to a canonical form, the Site interior routers would have
431	   to be aware of all the different RGs which could be used to reach the
432	   site, essentially creating aliasing of the destination addresses.  In
433	   the singly-homed case, this doesn't seem like a significant issue,
434	   but in complex Multi-homing scenarios there could be a significant
435	   problem managing this information.

437	   This symmetric re-writing essentially isolates the Site from the
438	   Global Internet just as the hard boundary between RG and STP
439	   components insulates the Global Internet from the Site topology.

441	8. The Structure of Routing Goop

443	   Routing Goop, or "RG" is the upper 6+ bytes of a GSE address.  This
444	   somewhat non-technical term was chosen because all the other
445	   alternatives seem to have various degrees of conceptual baggage which
446	   would be as much work to neutralize as the new notions are to explain
447	   in the first place.

449	   Fundamentally, RG is a Locator.  It encodes the topological
450	   connectivity of the Site containing the computer system identified by
451	   the ESD in the lower 8 bytes.  In the case of a singly-homed Site,
452	   re-homing to a new attachment to the Public Topology will change ONLY
453	   the RG in full GSE addresses for computer systems at that Site.  One
454	   example of such a re-homing would be a change of the Site's Internet
455	   Service Provider.  This change-over can be made essentially
459	   completely transparent to users both inside and outside the Site,
460	   although it does involve a practical limit on the transition duration
461	   relating to how long the departing ISP is willing to extend
462	   transitional courtesies.  During a changeover, though, all new
463	   connections will be initiated via the new ISP connection.

465	   This brings up the deep structure of the topology information carried
466	   in RG and how it is encoded.  More specifically, RG is a hierarchical
467	   locator which is a rooted path-expression of flat-routed regions
468	   which are tangent. Each element in the path-expression includes only
469	   enough detail to negotiate the flat-routed region.
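
   A single forwarding step on such a path-expression can be sketched in
   Python by treating RG as a tuple of labels with the LSID first.  The
   sketch ignores cut-throughs and assumes each region knows its own
   rooted path.  The string results ("up", "deliver", "down:<label>")
   are hypothetical names for the three possible decisions.

```python
from typing import Tuple

Path = Tuple[int, ...]  # (LSID, label within the LS, label within that, ...)


def route_step(here: Path, dest: Path) -> str:
    """One forwarding decision on a rooted path-expression.

    `here` is the path of the current region and `dest` the path carried
    in the Routing Goop.  Each label only has to be meaningful inside the
    flat-routed region containing it, so a region needs routes to its own
    children and a way "up", nothing more.
    """
    if dest[:len(here)] != here:
        return "up"                      # destination is outside this subtree
    if len(dest) == len(here):
        return "deliver"                 # destination is this region itself
    return f"down:{dest[len(here)]}"     # flat-route to the next label
```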

471	   It has been observed before that the graph of the Global Internet is
472	   not obviously a hierarchy, so how can this work?

474	   We start with the observation that every connected graph has at least
475	   one labeling which forms a spanning tree covering the nodes. The
476	   hierarchy is induced by a labeling function which partitions the
477	   global graph into regions and recursively into subregions.  This
478	   function is only globally visible at the top-level where an initial
479	   partitioning of the graph is used to form the first level of what
480	   will become the hierarchy.  Within each partition there is a local
481	   sub-partition function which assigns labels, and we proceed
482	   recursively. The nested recursions directly induce the hierarchy.
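
   The simplest possible labeling function, in which every region is a
   single node, is a breadth-first spanning tree.  The Python sketch
   below (hierarchical_labels is a hypothetical name) labels each node
   with its parent's label plus its own child index, so the labels nest
   exactly as the induced hierarchy does.  A real partition function
   would group many nodes into each region.

```python
from collections import deque
from typing import Dict, Hashable, List, Tuple


def hierarchical_labels(graph: Dict[Hashable, List[Hashable]],
                        root: Hashable) -> Dict[Hashable, Tuple[int, ...]]:
    """Label every node of a connected graph with a rooted path.

    Breadth-first search yields a spanning tree; each node's label is its
    parent's label extended by the node's index among the parent's tree
    children.  Edges that close cycles are simply not used for labeling.
    """
    labels = {root: ()}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        child_index = 0
        for neighbor in graph[node]:
            if neighbor not in labels:          # first visit: a tree edge
                labels[neighbor] = labels[node] + (child_index,)
                child_index += 1
                queue.append(neighbor)
    if len(labels) != len(graph):
        raise ValueError("graph is not connected")
    return labels
```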

484	   This decomposition of the Global Internet produces a recursive graph
485	   where each level is composed of a set of subgraphs which are
486	   explicitly connected (i.e., explicitly routed between the subgraphs)
487	   while the structure within each subgraph is assumed to be flat-routed
488	   (at least as seen at that level).

490	   From an abstract viewpoint, a hierarchical partitioning can be
491	   induced with an arbitrary choice of labeling function (as long as the
492	   function produces the minimally-required partitioning). However, we
493	   desire the partitions to have several important properties which
494	   affect the choice of labeling function.

496	   The general goal is to produce a global labeling which represents the
497	   topology as compactly as possible, yet allows rich connectivity while
498	   bounding the complexity of the discrete regions which are flat-
499	   routed.

501	   The top level objects in the GSE graph hierarchy are called "Large
502	   Structures".  These are objects chosen for their ability to naturally
503	   represent significant topological aggregation of substructure (not
504	   geographical, political, or geometric).  The number of Large
505	   Structures is explicitly limited to bound the complexity at the top
506	   level of the aggregation graph.

508	Internet-Draft                GSE for IPv6        1997/02/24 01:32:32GMT

510	   Within Large Structures, the (sub-)partition function is a trade-off
511	   between the flat-routing complexity within a region and minimizing
512	   total depth of the substructure.  This is driven by the internal
513	   topology of a Large Structure and the choices in different Large
514	   Structures will not necessarily be the same. This is why Routing Goop
515	   only has one hard bit boundary; Large Structures are free to
516	   internally subdivide as they choose. They are only required to
517	   encapsulate a significant portion of the Public Topology.

519	   One obvious candidate for Large Structures is large networks which
520	   already represent considerable aggregation based on existing CIDR
521	   deployment.  Another good candidate might be "Exchange Points".  The
522	   GSE model can accommodate both of these simultaneously, allowing
523	   IPv6-style "Network-anchored Prefixes" and "Exchange-anchored
524	   Prefixes" (as proposed by some) to coexist and be subsumed into
525	   a unified notion of "Aggregator-anchored Prefixes."  Of course, these
526	   aren't prefixes strictly in the IPv4 CIDR sense, but the left-
527	   anchored substrings of the Routing Goop are intuitively quite
528	   similar.

530	   Large Structures are assigned a Large Structure Identifier, known as
531	   an LSID.  The total number of LSIDs is intentionally limited as we
532	   assume the paths between Large Structures are only flat-routed.

534	   Two consenting Large Structures remain free to share a tangency below
535	   the top level and exchange routes so as to provide for improved
536	   routing between the two of them (formalizing cut-throughs in the
537	   natural hierarchy).  The goal is to provide for manageable complexity
538	   of the ultimate default-free zone (the top level of the global
539	   hierarchy) while allowing for controlled circumvention of the natural
540	   hierarchical paths.

544	   Bit-level structure of Routing Goop:

546	    0                   1                   2                   3
547	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
548	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
549	   | xxx | 13 Bits of LSID         |      Upper 16 bits of Goop    |
550	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

552	    3               4                   5                   6
553	    2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
554	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
555	   | Bottom 18 bits of Routing Goop    | 14 bits of Site Topology  |
556	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

558	   NOTE: The Routing Goop structure above assumes that the GSE proposal
559	   is designated by a 3-bit type of IPv6 address.  If a GSE address is
560	   identified by two upper bits, the LSID would expand to 14 bits.  If
561	   identified by one bit, the LSID would stay at 14 bits and the Upper
562	   16 bits of Goop would expand to 17 bits.
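
   The field layout above can be illustrated with a short Python sketch
   that packs and unpacks the upper 64 bits of a GSE address.  The text
   does not give the value of the format prefix ("xxx"), so here it is
   simply a caller-supplied field.  The names UpperHalf, pack, and
   unpack are illustrative only.  The 2-bit and 1-bit variants from the
   NOTE are included as alternate layouts.

```python
from dataclasses import dataclass

STP_BITS = 14  # Site Topology Partition width from the diagram

# prefix_bits -> (lsid_bits, goop_bits); goop_bits counts the upper plus
# the lower Routing Goop below the LSID (16 + 18 in the 3-bit case).
LAYOUTS = {
    3: (13, 16 + 18),
    2: (14, 16 + 18),
    1: (14, 17 + 18),
}


@dataclass(frozen=True)
class UpperHalf:
    prefix: int  # GSE format prefix ("xxx" in the diagram)
    lsid: int    # Large Structure Identifier
    goop: int    # Routing Goop below the LSID
    stp: int     # Site Topology Partition


def pack(f: UpperHalf, prefix_bits: int = 3) -> int:
    """Pack the fields, most significant first, into one 64-bit integer."""
    lsid_bits, goop_bits = LAYOUTS[prefix_bits]
    fields = [(f.prefix, prefix_bits), (f.lsid, lsid_bits),
              (f.goop, goop_bits), (f.stp, STP_BITS)]
    value = 0
    for field_value, width in fields:
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{field_value} does not fit in {width} bits")
        value = (value << width) | field_value
    return value


def unpack(value: int, prefix_bits: int = 3) -> UpperHalf:
    """Split a 64-bit integer back into its fields, least significant first."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value must fit in 64 bits")
    lsid_bits, goop_bits = LAYOUTS[prefix_bits]
    stp = value & ((1 << STP_BITS) - 1)
    value >>= STP_BITS
    goop = value & ((1 << goop_bits) - 1)
    value >>= goop_bits
    lsid = value & ((1 << lsid_bits) - 1)
    value >>= lsid_bits
    return UpperHalf(prefix=value, lsid=lsid, goop=goop, stp=stp)
```

   In every layout the widths sum to 64 bits, so unpack(pack(u)) == u
   for any in-range fields.  With the 3-bit layout, the 13 LSID bits
   bound the number of Large Structures at 2**13 = 8192.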

564	   Routing between two interior points of two different Large Structures
565	   is always possible based solely on the LSID. This provides a
566	   "forwarding strategy of last resort" for a router running "default-
567	   free".  From one point of view, the LSID partitions the Global
568	   Internet into a set of regions such that an interior router need only
569	   carry a "per-LSID default" pointing at an appropriate boundary router
570	   which knows how to handle traffic bound outside the containing
571	   Large Structure for a point in the other Large Structure.
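
   The "forwarding strategy of last resort" can be sketched in Python
   with the standard ipaddress module.  The sketch assumes the 3-bit
   format-prefix layout from the diagram, so the LSID sits in the 13
   bits after the prefix.  The prefix value 0b001, the router names,
   and the class itself are hypothetical.  Explicit routes (interior
   routes plus any cut-through routes) are tried first by longest-prefix
   match.  Only when none of them match does the router fall back to the
   per-LSID default for the destination's Large Structure.

```python
import ipaddress
from typing import Dict, List, Tuple

PREFIX_BITS, LSID_BITS = 3, 13  # assumed 3-bit format-prefix layout
LSID_SHIFT = 128 - PREFIX_BITS - LSID_BITS


def lsid_of(addr: ipaddress.IPv6Address) -> int:
    """Extract the LSID field that follows the format prefix."""
    return (int(addr) >> LSID_SHIFT) & ((1 << LSID_BITS) - 1)


class InteriorRouter:
    def __init__(self, local_lsid: int,
                 routes: List[Tuple[ipaddress.IPv6Network, str]],
                 lsid_defaults: Dict[int, str]):
        self.local_lsid = local_lsid
        self.routes = routes                # interior and cut-through routes
        self.lsid_defaults = lsid_defaults  # foreign LSID -> boundary router

    def next_hop(self, dst: ipaddress.IPv6Address) -> str:
        # Longest-prefix match over the explicit routes first.
        matches = [(net.prefixlen, hop) for net, hop in self.routes if dst in net]
        if matches:
            return max(matches, key=lambda m: m[0])[1]
        # Last resort: the per-LSID default toward the other Large Structure.
        lsid = lsid_of(dst)
        if lsid != self.local_lsid and lsid in self.lsid_defaults:
            return self.lsid_defaults[lsid]
        raise LookupError(f"no route to {dst}")


def gse_addr(lsid: int, rest: int = 0, prefix: int = 0b001) -> ipaddress.IPv6Address:
    """Build an example address; the prefix value 0b001 is hypothetical."""
    return ipaddress.IPv6Address(
        (prefix << (128 - PREFIX_BITS)) | (lsid << LSID_SHIFT) | rest)


router = InteriorRouter(
    local_lsid=5,
    routes=[(ipaddress.IPv6Network((gse_addr(5), 16)), "core"),
            (ipaddress.IPv6Network((gse_addr(9, 0xABCD << 96), 32)), "cut-through")],
    lsid_defaults={7: "br-west", 9: "br-east"})
```

   In this example, traffic for LSID 9 normally leaves via "br-east".
   Destinations under the /32 learned over the cut-through are the
   exception and take the cut-through instead.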

573	   If two Large Structures share a tangency somewhere below the top
574	   level, then some interior routers of both Large Structures will share
575	   routes to exploit the tangency for optimizing paths.  How this cut-
576	   through information is distributed within the two Large Structures is
577	   not revealed elsewhere in the global topology. The exact "shape" of
578	   the optimization region is controlled by the decisions about which
579	   routes to advertise across the cut-through.  These decisions are made
580	   by the collaborators and the optimized region need not be symmetric
581	   with respect to the cut-through.  The size of the optimization area
582	   is controlled by how far routes learned via the cut-through are
583	   propagated within the sub-graphs tangent via the cut-through. Again,
584	   this is a matter of engineering choices made by the collaborators
585	   operating the cut-through.

587	   While the LSID may appear similar to the Autonomous System Number
588	   currently used in IPv4 policy-based routing machinery, the LSID is
589	   quite distinct from the AS number and the two identifiers play very
590	   different roles.  AS Numbers will continue to be used for policy routing
591	   information exchange and must remain distinct from the LSID space.

595	9. The "Flow" of Routing Goop

597	   It is intuitively useful to think about Routing Goop as "flowing
598	   downhill" through the hierarchy from the topmost Large Structures,
599	   through the intermediate levels of the Public Topology, and
600	   ultimately down to the Site.  As the RG propagates downward, the
601	   prefix extends to the right, just like in IPv4 CIDR, with each
602	   extension navigating the nested flat-routed subgraphs, eventually
603	   terminating at the Site, which then descends invisibly into the
604	   Private Topology of that Site.

606	   The nested flat-routed areas correspond to transit subnetworks of the
607	   Large Structure.  One very important example of such subnets is the
608	   "reseller" or "wholesale transit customer" of a Large Structure.
609	   (Note that whether the Large Structure is a network or an exchange
610	   point doesn't matter.)  The reseller network provides transit for
611	   Sites, so it must be part of the Public Topology and appears as a
612	   substring within the Routing Goop, usually the right-most extension
613	   unless the reseller has further reseller customers.  In that case,
614	   the next level reseller will have his own extension to record his
615	   place in the Public Topology and to provide for navigating through it
616	   as well.
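
   The "downhill" growth of Routing Goop can be modeled as a left-anchored
   bit string that only ever grows to the right.  Each level (Large
   Structure, reseller, next-level reseller, Site) appends its own label
   to the prefix it received.  The sketch below is illustrative only.
   The label widths below the LSID are arbitrary choices, since each
   Large Structure subdivides as it sees fit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingGoop:
    value: int = 0   # left-anchored bits, stored right-aligned
    length: int = 0  # number of significant bits

    def extend(self, label: int, bits: int) -> "RoutingGoop":
        """Append one level's label, as a Large Structure or reseller would."""
        if not 0 <= label < (1 << bits):
            raise ValueError(f"label {label} does not fit in {bits} bits")
        return RoutingGoop((self.value << bits) | label, self.length + bits)

    def is_prefix_of(self, other: "RoutingGoop") -> bool:
        """True if other lies at or below self in the hierarchy."""
        if self.length > other.length:
            return False
        return other.value >> (other.length - self.length) == self.value


# Hypothetical path: format prefix and LSID, reseller, sub-reseller, Site.
large_structure = RoutingGoop().extend(0b001, 3).extend(42, 13)
reseller = large_structure.extend(0x5A, 8)
sub_reseller = reseller.extend(3, 6)
site = sub_reseller.extend(17, 20)
```

   Each extension leaves its parent's bits untouched.  This is what lets
   an upstream router treat everything below it as a single aggregate.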

618	   The overall picture can now be drawn as a forest of trees
619	   distributing Routing Goop down to the Sites, with each tree being a
620	   Large Structure and the Large Structures connected arbitrarily at the
621	   top level. This structure will be mirrored by the actual machinery
622	   for distributing Routing Goop to the Sites as will be discussed a bit
623	   later, but this mental image of the prefixes "flowing" from the
624	   anchoring Large Structures is critical to understanding the fundamental
625	   self-organizing abilities of the GSE model.

627	   While the GSE machinery is intended to be adequate for almost
628	   completely automated self-organization with respect to the
629	   construction and propagation of Routing Goop on an Internet-wide
630	   basis, we proceed for now closely following current practice
631	   (admitting manual configuration of certain information like Routing
632	   Goop) because of the additional complexity of the self-organization
633	   functions.  Initial deployment following current practice would not
634	   preclude eventual deployment of a fully self-organizing Global
635	   Internet.

637	10. The Distribution of Routing Goop

639	   There are two cases to consider for how Routing Goop gets
640	   distributed: source addresses and destination addresses.  In both
641	   cases RG is part of the address, one way or another, so we show how a
642	   full 16-byte address with the right RG gets created in these two
646	   cases.

648	10.1 RG for Source Addresses

650	   The initial RG of a source address is almost always the Site-local
651	   prefix.  If the destination address is not within the Site, the
652	   packet will leave the Site via one of possibly several Site Boundary
653	   Routers.  The egress Site Border Router will insert the correct RG in
654	   the source address based on the path the destination should use to
655	   return a packet to the sender.  Except in unusual circumstances this
656	   will be the RG which corresponds to the attachment path of that
657	   egress Site Boundary Router to the Global Internet.

659	   If the Site is multi-homed via just one Site Boundary Router, then
660	   the router is free to apply whatever local policy suits. It simply
661	   must fill in a valid RG path which leads back to a Site Boundary
662	   Router for that Site.  If the Site is multi-homed via more than one
663	   Site Boundary Router, which router provides egress is purely local
664	   policy and which RG gets applied is likewise local policy.

666	   The dynamic insertion of RG upon Site egress accomplishes a number of
667	   things.

669	   (1) It means that for most purposes, a computer system at a Site need
670	   not concern itself with egress policy matters which can be
671	   particularly tricky in Multi-homed Sites.

673	   (2) It means that computer systems are essentially not impacted at
674	   all by topological re-homing of the Site.

676	   (3) It means that more complex multi-homing scenarios with multiple
677	   Site Boundary Routers each with multiple connections to the Global
678	   Internet can execute arbitrarily complex path recovery policy without
679	   concern for how it might impact a computer system doing source
680	   address selection.

682	   (4) It means that while a computer system might forge the ESD in a
683	   source address, it CANNOT forge the point of injection into the
684	   Public Topology.  This is not strong authentication down to the
685	   particular computer system, but it is probably a strong deterrent to
686	   certain obnoxious activities due to the dramatically improved
687	   traceability.  We also note that the first-hop attachment router in
688	   the Public Topology is free to insert or override the RG if somehow
689	   an errant packet escapes a Site carrying invalid RG, thereby
690	   enforcing traceability. Of course, the Public first-hop router could
691	   always just drop a packet carrying inappropriate source RG as well.
692	   But to make it very clear, we put the burden of inserting correct RG
693	   in exiting source addresses squarely and solely on the Site and the
697	   Site Border Router.  Placing the task anywhere else would scale
698	   poorly.
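
   Two operations can be sketched as bit operations on a 128-bit
   address: the egress insertion of RG by the Site Border Router, and
   the optional override or drop at the Public first-hop router
   described in (4).  The sketch hypothetically assumes that the RG
   occupies the top 50 bits (format prefix, LSID, and remaining Routing
   Goop from the diagram), which leaves the STP and ESD in the low 78
   bits.  The function names and values are illustrative.

```python
import ipaddress
from typing import Optional

RG_BITS = 50                 # assumed: format prefix + LSID + Routing Goop
LOW_BITS = 128 - RG_BITS     # STP (14 bits) + ESD (64 bits)
LOW_MASK = (1 << LOW_BITS) - 1


def rg_of(addr: ipaddress.IPv6Address) -> int:
    return int(addr) >> LOW_BITS


def with_rg(addr: ipaddress.IPv6Address, rg: int) -> ipaddress.IPv6Address:
    """Replace the RG while keeping the STP and ESD bits unchanged."""
    if not 0 <= rg < (1 << RG_BITS):
        raise ValueError("RG out of range")
    return ipaddress.IPv6Address((rg << LOW_BITS) | (int(addr) & LOW_MASK))


def site_egress(src: ipaddress.IPv6Address, egress_rg: int) -> ipaddress.IPv6Address:
    """Site Border Router: stamp the RG of its own attachment path."""
    return with_rg(src, egress_rg)


def first_hop_check(src: ipaddress.IPv6Address, attachment_rg: int,
                    override: bool = True) -> Optional[ipaddress.IPv6Address]:
    """Public first-hop router: pass, override, or drop (None) bad source RG."""
    if rg_of(src) == attachment_rg:
        return src
    return with_rg(src, attachment_rg) if override else None
```

   Because only the top bits change, the host's STP and ESD survive
   re-homing untouched.  This is the property that items (1) and (2)
   rely on.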

700	   The Site Border Router acquires the necessary RG from the first-hop
701	   attachment router in the Public Topology.  Alternately, as an initial
702	   mechanism the RG could be statically configured, but the real goal is
703	   completely automated propagation down the tree so that an entire
704	   complex subtree can be rehomed without human intervention or service
705	   disruption.

707	10.2 RG for Destination Addresses

709	   Currently, an IPv6 address lookup for a DNS name returns the
710	   information in an "AAAA" record which is the full 16 bytes of the IPv6
711	   address.

713	   The GSE design proposes synthesizing the 16 bytes of information in a
714	   query response from two different sources: an "AAA" record and an
715	   "RG" record.  The "AAA" record carries the 8-byte ESD + ~2 byte STP
716	   for the DNS name in question and the "RG" record carries 6+ bytes of
717	   the appropriate Routing Goop.
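
   The proposed synthesis is essentially a concatenation.  The minimal
   sketch below assumes the exact field widths from the bit-level
   diagram rather than the approximate byte counts above: 50 bits of RG
   including the format prefix, 14 bits of STP, and 64 bits of ESD.
   synthesize is a hypothetical resolver-side helper, not a defined DNS
   API.

```python
import ipaddress

RG_BITS, STP_BITS, ESD_BITS = 50, 14, 64  # widths taken from the diagram


def synthesize(rg: int, stp: int, esd: int) -> ipaddress.IPv6Address:
    """Concatenate RG | STP | ESD into one 16-byte IPv6 address."""
    for name, value, width in (("rg", rg, RG_BITS), ("stp", stp, STP_BITS),
                               ("esd", esd, ESD_BITS)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    return ipaddress.IPv6Address(
        (rg << (STP_BITS + ESD_BITS)) | (stp << ESD_BITS) | esd)


# The same AAA data (STP + ESD) combined with two different RG records.
via_isp_a = synthesize(rg=0x1ABCDEF, stp=0x0042, esd=0x0200_5EFF_FE00_5301)
via_isp_b = synthesize(rg=0x2FEDCBA, stp=0x0042, esd=0x0200_5EFF_FE00_5301)
```

   The two results differ only in their top 50 bits, so the per-node
   AAA data never has to change when the RG does.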

719	   One interesting question is how the AAA record gets paired with an RG
720	   record in a given nameserver.  One simpleminded implementation would
721	   be to pair an RG record with a zone, but that has the problem of
722	   requiring all the systems in that zone to use the same Routing Goop
723	   and hence be in the same Site.

725	   A better scheme is to carry an "RG Name" in the "AAA" record which
726	   would allow a nameserver to concatenate an arbitrary RG prefix to the
727	   ESD+STP producing the full 16 byte response.  The "RG Name" would be
728	   a full DNS name which could be recursively translated (and the result
729	   cached).  Structured as an "upward delegation" with an appropriate
730	   Time-to-Live, a Site could import the Routing Goop information from
731	   their service provider completely automatically.  This capability
732	   will be used to great advantage in the discussions of re-homing which
733	   follow. [Interactions between RG TTL and zone TTL are an issue to be
734	   explored more.]
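
   Resolving an "RG Name" through an upward delegation, with caching,
   can be sketched using an in-memory stand-in for the zone data.
   Everything here is an assumption made for illustration.  This
   includes the record shape (either a concrete RG value or the name of
   another RG record, plus a TTL), the example names, and the choice to
   cache for the smallest TTL seen along the delegation chain, since the
   text leaves TTL interactions open.

```python
import time
from typing import Callable, Dict, Tuple, Union

# Hypothetical zone data: each RG name maps to (value, ttl_seconds), where
# value is either a concrete RG (int) or the name of another RG record
# (str), i.e. an "upward delegation" toward the provider.
Zone = Dict[str, Tuple[Union[int, str], float]]


class RGResolver:
    def __init__(self, zone: Zone,
                 clock: Callable[[], float] = time.monotonic,
                 max_depth: int = 8):
        self.zone = zone
        self.clock = clock
        self.max_depth = max_depth  # guard against delegation loops
        self.cache: Dict[str, Tuple[int, float]] = {}  # name -> (rg, expiry)

    def resolve(self, name: str) -> int:
        now = self.clock()
        cached = self.cache.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]
        target, ttl = name, float("inf")
        for _ in range(self.max_depth):
            value, record_ttl = self.zone[target]
            ttl = min(ttl, record_ttl)  # assumed policy: smallest TTL wins
            if isinstance(value, int):
                self.cache[name] = (value, now + ttl)
                return value
            target = value
        raise RuntimeError(f"delegation chain for {name!r} is too long")
```

   In this model, re-homing a Site means changing only the one
   delegation from the Site's RG name to the new provider's RG name.
   Cached answers continue to point along the old path until their TTL
   expires.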

736	   Alternately, one special case for an RG record could be a delegation
737	   to a Site Border Router which could supply the correct RG
738	   automatically, at least in single-homed cases, and possibly in
739	   multi-homed cases.

741	   The result of this structure is that individual zone entries for
742	   individual nodes (AAA records) do NOT change when a Site rehomes.
743	   The only thing which changes (logically) is the RG information which
744	   is composed with the node's AAA record to produce a full 16-byte
748	   response.  This means the general Dynamic DNS machinery is NOT
749	   required to support Site re-homing.

751	   One implication of the special Site-local Prefix RG for intra-Site
752	   traffic is that Sites will have to provide at least two "faces" on
753	   their nameservice - one that returns Site-local as the RG for queries
754	   from inside the site, and another that returns full RG responses for
755	   requests originating outside the Site.  This can be readily
756	   accomplished by inspecting the source address - if the source address
757	   contains the Site-local Prefix as RG, then return the same.
758	   Otherwise, return a fully-general RG-based response (possibly based
759	   on egress-path selection policy).
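
   A nameserver can choose between its two "faces" with a single
   comparison on the query's source address.  The minimal sketch below
   assumes the diagram's widths, which leave the RG as everything above
   the low 78 bits (STP plus ESD).  The public_rg parameter stands in
   for whatever egress-path selection policy the Site applies, and the
   function name is hypothetical.

```python
import ipaddress

STP_BITS, ESD_BITS = 14, 64        # assumed widths from the diagram
LOW_BITS = STP_BITS + ESD_BITS     # everything above this is RG


def answer(query_src: ipaddress.IPv6Address, stp: int, esd: int,
           site_local_rg: int, public_rg: int) -> ipaddress.IPv6Address:
    """Return the Site-local face to insiders and the public face otherwise."""
    if not (0 <= stp < (1 << STP_BITS) and 0 <= esd < (1 << ESD_BITS)):
        raise ValueError("STP or ESD out of range")
    inside = int(query_src) >> LOW_BITS == site_local_rg
    rg = site_local_rg if inside else public_rg
    return ipaddress.IPv6Address((rg << LOW_BITS) | (stp << ESD_BITS) | esd)
```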

761	10. Re-homing a Site

763	   When a Site changes its point of attachment to the Global Internet,
764	   it is said to "rehome". One of the significant criticisms of IPv4
765	   CIDR and IPv6 "Provider-based Addressing" is the requirement to
766	   "renumber" a Site when it rehomes.  One of the explicit goals of the
767	   GSE architecture is to eliminate, or at least mitigate, the impact of
768	   this.

770	   It is important to reiterate the notion that the Routing Goop of a
771	   GSE address is not just a Locator, but that it encodes a PATH from
772	   the top level of the global hierarchy down to the Site.  Changing
773	   that path is what makes Re-homing and Multi-homing essentially
774	   equivalent operations.  We proceed with the simple case first.

776	   When a Site wishes to rehome, it must establish a new attachment
777	   point to the Global Internet, and hence establish a new access path.
778	   Then it must start using that new path before the old path is
779	   removed.  The procedure is as follows:

781	   A Site establishes a connection with a new ISP and it becomes able to
782	   carry the traffic.  At that point, the Site alters the upward
783	   delegation of the DNS RG records.  Henceforth, all new connections
784	   made with the new translations will follow the new path to the Site.
785	   The new connection path is then made the preferred egress path and
786	   source addresses in packets exiting the Site immediately start being
787	   marked with the new return path.  The old connection should be
788	   maintained for some administratively determined grace period to allow
789	   DNS timeouts to transition new sessions to the new path and for
790	   long-running sessions to terminate.

792	   At first blush, it might appear that when the egress path for the
793	   Site switches over to the new path and the Site Border Router starts
794	   marking packets with the new RG, the return path for long-running
795	   sessions would automatically switch over to the new path.  Alas, this
799	   is not so because a long-running session will be using a destination
800	   address containing the old RG acquired when the session first
801	   started.

803	   Consideration was given to providing some kind of "path redirect"
804	   which would allow the other end to deal with "flying cutovers" of a
805	   running session, but the security implications of this mechanism are
806	   too far-reaching to consider as part of initial deployment.  If at
807	   some later point it becomes clear how to accomplish this safely, then
808	   it could be added. But the complexity, security risks, and the
809	   magnitude of the added value do not seem worthwhile at present
810	   (although the author would love to be convinced otherwise).

812	   Alternately, the Site could request a "Re-homing Courtesy" from their
813	   old ISP which would effectively make it a multi-homed Site for some
814	   period of time.  After multi-homing was established, the old
815	   connection could be taken down and the long-running sessions would
816	   continue to survive as long as the Site was multi-homed by way of the
817	   Re-homing Courtesy.

819	   Note that at no time did the re-homing affect anything internal to
820	   the Site's Private Topology.  The only change was the attachment to
821	   the Public Topology and the Routing Goop which records that
822	   attachment location.

824	11. Multi-homing a Site

826	   One of the curiosities of IPv4 is that the network does a lot more
827	   work for a multi-homed site but it is very hard to pin it down so
828	   that the instigator of the effort can compensate the workers.

830	   In the GSE model, Multi-homing is an explicit service which is
831	   performed for a Site by the agents of the Public Topology which
832	   provide the access for the Site.  This mechanism can be made more
833	   sophisticated, but the notion is most readily explained by
834	   considering a Site which is dual homed to two different ISPs and
835	   hence has two distinct access paths represented by two distinct blobs
836	   of Routing Goop.

838	   The Site is attached to each ISP via some link and we postulate some
839	   kind of keep-alive protocol which determines when reachability to the
840	   Site's border router is lost. The ISP routers serving the dual-homed
841	   Site are identified to each other (via static configuration
842	   information in the simplest case or a dynamic protocol in the more
843	   general case), and when a link to the Site is lost, the ISP router
844	   anchoring the dead link simply tunnels any traffic destined for the
845	   Site via the other ISP router.
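
   The router-pair behavior can be sketched with a keep-alive hold timer.
   When an ISP router has not heard from the Site's border router within
   the hold time, it tunnels toward its peer instead of delivering
   directly.  The hold time, the static peer configuration, and all
   names are hypothetical.  The sketch returns the hops taken instead of
   moving real packets.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ISPAccessRouter:
    name: str
    hold_time: float = 3.0       # hypothetical keep-alive hold time (seconds)
    last_keepalive: float = 0.0  # time the Site's border router was last heard
    # The other router serving the same dual-homed Site (static config here).
    peer: Optional["ISPAccessRouter"] = field(default=None, repr=False, compare=False)

    def keepalive(self, now: float) -> None:
        self.last_keepalive = now

    def link_up(self, now: float) -> bool:
        return now - self.last_keepalive <= self.hold_time

    def forward_to_site(self, now: float) -> List[str]:
        """Return the hops a packet for the Site takes from this router."""
        if self.link_up(now):
            return [f"{self.name} -> Site"]
        if self.peer is not None and self.peer.link_up(now):
            # Dead link: tunnel to the peer, which delivers over its own link.
            return [f"{self.name} => tunnel => {self.peer.name}"] + self.peer.forward_to_site(now)
        raise ConnectionError(f"Site unreachable via {self.name} and its peer")
```

   Traffic still arrives at whichever router its RG points to.  The
   tunnel only repairs the last hop, which is why the Site's addresses
   stay valid while one link is down.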

849	   This approach clearly requires coordination between the two serving
850	   ISPs. This is not a new constraint - multi-homing already requires
851	   considerable coordination between the Site and its providers.  Of
852	   course, creating a protocol for dynamically creating a "homing group"
853	   is probably a very worthwhile investment but it is not absolutely
854	   necessary at the outset.

856	   It should be obvious now that the "Re-homing Courtesy" in the
857	   previous section is simply doing the router-pair coordination with
858	   the new ISP for some period of time.

860	   [Note: Yakov and Bates are working on a draft for a Site-side
861	   implementation of aggregation-efficient multi-homing which may
862	   simplify this even further.]

864	12. Re-homing a Reseller

866	   Re-homing a Reseller is a slightly more general case of re-homing a
867	   Site, primarily characterized by more lead time, a longer grace
868	   period, and some necessary coordination with customer Sites to ensure
869	   that the Routing Goop propagates correctly.

871	   The Reseller will establish a new connection which will not only
872	   result in a new path for the Reseller's topology, but for that of his
873	   customer Sites. When the Reseller alters his upward delegation of
874	   Routing Goop, it will ripple downward to his customer Sites by nature
875	   of their upward delegations.  The downward ripple of Routing Goop via
876	   the upward delegations should cause the Site zone TTLs to be reduced
877	   appropriately to ensure caches expire well within the dual-homed
878	   transition grace period for the Reseller.

880	   This essentially rehomes all the Reseller's customer Sites at the
881	   same time the Reseller's infrastructure is re-homing and should be
882	   completely transparent except for long-lived sessions which do not
883	   terminate by the end of the grace period.

885	13. Multi-homing a Reseller

887	   There are two parts to multi-homing a Reseller - one part similar to
888	   the multi-homed Site case above, and one part which is quite
889	   different.

891	   For this discussion, assume a Reseller which is dual-homed and hence
892	   has two different Routing Goop prefixes (remember that each path to
893	   the top level of the hierarchy has a distinct prefix). The reseller
894	   can solicit multi-homed tunneling services from his two access point
895	   routers to provide alternate path service just like a multi-homed
896	   Site.  Which traffic arrives at any particular router, though, is
900	   influenced entirely by what routes are advertised out that particular
901	   connection via BGP5 (or IDRP).  This is rather different from the
902	   multi-homed Site case where the ESD is the object of interest and the
903	   RG simply gets the traffic to the Site boundary.

905	   The question arises, however, as to which prefix gets used for
906	   extending downward to his customer Sites.  The answer in the simplest
907	   case is to pick one and use it, making the Sites "natural" in the
908	   chosen prefix.  The alternate prefix can, of course, be advertised
909	   out the alternate path if desired.  But this work can be ascribed to
910	   the instigator and the superior attachment points can charge for this
911	   service.  (This is somewhat akin to charging for routes, but only
912	   routes which create a discontinuity in the routing space.)

914	14. A Comment on NAT Boxes

916	   In discussions about requiring destination address re-writing for
917	   inbound packets, Brian Carpenter remarked that with the advent of
918	   symmetric re-writing (both inbound and outbound), the GSE
919	   architecture is essentially "NAT that works."  To some, this would be
920	   the ultimate insult, but I think it is essentially correct.  NAT
921	   Boxes provide for isolating a Site from topology changes but severely
922	   compromise the end-to-end model.  GSE affords very similar
923	   operational topological isolation but without violating the end-to-
924	   end model, at least not nearly as much.  If a Site wishes the
925	   additional isolation afforded by NAT Boxes, a firewall will
926	   accomplish that task.

928	15. General Comments

930	   While some of GSE is a radical departure from IPv6 as we currently
931	   know it, in general it relies deeply on all the IPv6 underpinnings
932	   which contribute so much to the attractiveness of IPv6: Neighbor
933	   Discovery, all the dynamic configuration machinery designed to make
934	   renumbering palatable even using "provider-based addressing", and the
935	   flexibility of the "salami headers" which make tunneling and security
936	   attractive.  The general forwarding operations based on longest-
937	   match-under-prefix-mask and the policy-based routing machinery of
938	   BGP5/IDRP are also simply assumed.

940	16. Closing Comments and Acknowledgments

942	   This document presents a revision of the "8+8" addressing model which
943	   has been under construction by the author since before Fall of 1995,
944	   at least.  Conversations with a great many people have contributed to
945	   the design presented in this document.  A skeletal version of this
946	   proposal first appeared in some email from Dave Clark of MIT who
947	   planted the seed and provided the original moniker "8+8". A great
951	   many others have contributed ideas and observations, all of which
952	   went into the stew pot for the synthesis contained here.

954	   The original "8+8" draft cited the following individuals for a
955	   special thank-you: Vadim Antonov, Ran Atkinson, Scott Bradner, Brian
956	   Carpenter, Noel Chiappa, Steve Deering, Sean Doran, Joel Halpern,
957	   Christian Huitema, Tony Li, Peter Lothberg, Louis Mamakos, Radia
958	   Perlman, Yakov Rekhter, Paul Traina.

960	   This draft has benefited greatly from conversations with Masataka
961	   Ohta, who convinced the author of the importance of the IETF-NodeID
962	   in addition to the 8-byte IEEE MAC addresses, as well as Brian
963	   Carpenter, Scott Bradner, Ran Atkinson, all the people who so
964	   graciously provided invaluable comments on the original "8+8" draft,
965	   and of course Steve Deering, Bob Hinden, and the IPng Working Group.

967	17. Security Considerations

969	   More than can be imagined.

971	18. Author's Address

973	   Mike O'Dell
974	   UUNET Technologies, Inc.
975	   3060 Williams Drive
976	   Fairfax, VA 22031
977	   voice: 703-206-5890
978	   fax:   703-206-5471
979	   email: mo@uu.net