draft-ietf-ipngwg-gseaddr-00.txt
Network Working Group                                        Mike O'Dell
Internet-Draft                                        UUNET Technologies
                                                  1997/02/24 01:32:32GMT

          GSE - An Alternate Addressing Architecture for IPv6

1. Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   1id-abstracts.txt listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

2. Abstract

   This document presents an alternative addressing architecture for
   IPv6 which controls global routing growth by very aggressive
   topological aggregation. It includes support for scalable
   multi-homing as a distinguished service. It provides for future
   independent evolution of routing and forwarding models with
   essentially no impact on end systems. Finally, it frees sites and
   service resellers from the tyranny of CIDR-based aggregation by
   providing transparent re-homing of both.

3. Introduction

   This alternative IPv6 addressing architecture addresses several
   scalability issues with the current IPv6 addressing proposals.

      Scaling of the global route computation

      Ease of re-homing (both leaf Sites and upstream Resellers)

      Economic scalability of Multi-homing

Internet-Draft              GSE for IPv6          1997/02/24 01:32:32GMT

   The current IPv6 addressing proposals address route and topology
   aggregation by continuing to rely on CIDR-style "Provider-based
   Addressing" coupled with a powerful new dynamic address assignment
   mechanism which is intended to make renumbering more palatable.

   However, CIDR-style provider-based aggregation breaks down in the
   face of the accelerating growth of multi-homed sites (leaf sites or
   regional networks). Worse, renumbering an entire Site to accomplish
   a simple topological re-homing such as changing ISPs is a problem
   whose magnitude can only grow over time. It will become increasingly
   difficult to explain this renumbering requirement to customers with
   the spectre of a complete failure of this aggregation approach a
   distinct possibility.

   While the large IPv6 address provides for a huge increase in the
   number of end systems which can be accommodated, it also portends a
   huge increase in the number of routes required to reach them. Even if
   CIDR aggregation were to continue at current levels (maintaining
   current efficiency is relatively unlikely), this still presents a
   serious problem for the growth of the global route computations.

   This document presents a new proposal for using the 16 byte IPv6
   address which mitigates the route scaling problem and with it a
   number of collateral issues. This model provides for aggressive
   topological aggregation while controlling the complexity of
   flat-routed regions. It exploits and supports the dynamic address
   assignment machinery in IPv6 but makes the exact role of that
   machinery a decision local to a Site. It is therefore subject to
   engineering cost and benefit analysis rather than being mandatory for
   simple Site re-homing situations.

   This new model also identifies the special work done by the global
   Internet infrastructure on behalf of multi-homed sites. Rather than
   continuing the current "Tragedy of the Commons", the multi-homing is
   isolated into a specific mechanism which is then traceable to and
   incurred by only those sites wishing to subscribe to this capability.
   Again, this makes it possible for sites to make informed cost-benefit
   decisions about multi-homing.

4. Central Concepts of the Architecture

   The architecture is based upon a few central concepts.

      A strong distinction between Public and Private Topology

      A strong distinction between system identity and location

      GSE - Global, Site, and End-system address elements

      The deep similarity of Re-homing and Multi-homing

      Rewriting address prefixes at Site boundaries

      Very aggressive hierarchical network topology aggregation

      Optimizing actual forwarding paths by limited-scope
      cut-throughs

   This model draws a strong distinction between the Public Topology
   which forms the transit infrastructure of the Global Internet and a
   "Site" which can contain a rich but strictly private local network
   topology which cannot "leak" into the global routing machinery. The
   Site is the fundamental unit of attachment to the Global Internet and
   is therefore strictly a leaf, even if possibly multi-homed.

   This model also draws a very strong distinction between the identity
   of a computer system and where it attaches to the Public
   Topology. In IPv4 and current IPv6 models, these notions of identity
   and location are deeply co-mingled and this is the fundamental reason
   why simple topology changes have such wide-ranging impact on address
   assignment (if aggregation is to be maintained at all).

   The 16 byte IPv6 address is split into 3 pieces:

     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   |  Routing Goop   | STP | End System Designator |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
        6+ bytes     ~2 bytes       8 bytes

   Routing Goop signifies where the Site attaches to the Global
   Internet. The Site Topology Partition (STP) is Site-private "LAN
   segment" information. The End System Designator (ESD) specifies an
   interface on an end-system.

   One surprising notion is that re-homing and multi-homing are very
   deeply related. Multi-homing can be viewed as rather like several
   simultaneous re-homings happening at once. Achieving both painless
   re-homing and scalable multi-homing relies on the same set of
   fundamental mechanisms, each with a few distinct details.

   Rewriting IPv6 addresses by Site Border Routers is by far the most
   controversial, but also the most critical, part of this proposal. To
   control the complexity of routing information which must be managed
   within a Site and to isolate end systems and interior routers from
   external topology changes, the Routing Goop (RG) of some addresses is
   modified by Site Border Routers. Packets exiting a site have the RG
   for the Site egress point inserted into source addresses, while
   packets entering a Site have the RG in all destination addresses
   replaced with a canonical prefix signifying "within this Site" (the
   "Site-local prefix").

   One immediate result is that upper-layer protocols must use only the
   ESD for purposes such as pseudo-header checksums and the like. The
   ESD is the invariant token, the RG is possibly transient topology
   information subject to change.

   Topology aggregation is accomplished by partitioning the Global
   Internet into a set of tree-shaped regions anchored by "Large
   Structures".
   The Routing Goop in an address specifies a path from
   the root of the tree (the Large Structure) to a point in the
   topology; in the terminal case this is a Site. Large Structures are
   chosen by their ability to aggregate topology and no particular
   advantage flows from "being one"; actually quite the contrary. Large
   Structures are responsible for subdividing the space under them and
   managing that delegation. Large Structures provide a "forwarding
   token of last resort" which can always be used for selecting a valid
   next-hop when no other information is available. This significantly
   limits the minimally-sufficient information required for a
   "default-free" router. Any additional route information kept is the
   result of path optimizations from cut-throughs.

   While it is useful to think of the Large Structures as trees, the
   collection is actually a DAG (Directed Acyclic Graph) because the
   trees can touch each other via cut-throughs. By cross-propagating
   selected details via a cut-through, a locally-controlled region can
   learn of alternative paths to some destinations. The distance this
   optimization information is propagated and the radius of the
   optimization region advertised are the business of the collaborating
   regions.

5. The Structure of End System Designators - the ESD

   End System Designators denote every computer system in the GSE
   Internet regardless of whether it is a host, router, or other network
   element. While a given system can have more than one ESD, each ESD
   is globally unique. This is critical for their utility to the
   upper-level protocols. This uniqueness can be induced several ways
   as will be seen.

   A crucial design decision is whether an ESD identifies a system,
   invariant of its interfaces as in the XNS architecture, or an
   interface on a system as in the existing IPv4 and IPv6 architecture.

   An ESD designates an interface on a computer system and that
   interface can be either physical or virtual.

   When processing a GSE address, a computer system need only examine
   the ESD portion of the address to determine whether a packet is
   destined for that system.

   There are circumstances when it is quite useful to have "an address"
   for a computer system which is independent of any particular physical
   interface on that system. It has become commonplace in IPv4 practice
   to use a distinguished virtual interface to provide a system with
   such an "interface independent identity". This technique affords the
   same architectural utility as XNS while still allowing the
   flexibility of the IPv4 "addressed interface" model. The GSE model
   thus retains the successful IPv4/IPv6 model.

   NOTE: We remain intentionally vague about exactly what constitutes an
   "interface" and a "computer system". The malleability of those
   notions in IPv4 has proven manifestly useful in practice.

   To summarize the ESD uniqueness characteristics:

      (1) an ESD is globally unique
      (2) an ESD designates an "interface" on "a computer system"
      (3) an Interface may have more than one ESD
          (current IPv6 already requires implementations to support
          multiple addresses per interface)
      (4) an ESD may not necessarily designate a particular
          physical computer (Neighbor Discovery continues to provide
          a level of virtual address translation and considerable
          cleverness can be disguised therein)

   There are two forms of ESD, both 8 bytes long, one a subcase of the
   other.

   It is clear that, with the impending onslaught of IEEE-1394
   technology, 8-byte IEEE MAC addresses are simply a fait accompli
   and many devices will be provided with a unique identity in that
   format at the time of manufacture.
   The 8-byte IEEE MAC Address
   format includes the current 6-byte MAC Addresses as a proper
   subspace. Using the 8-byte IEEE MAC address will be very convenient
   for many network builders.

   There are at least two issues with using *only* the IEEE 8-byte MAC
   addresses as ESDs: There are point-to-point link interfaces which
   have no IEEE MAC address assigned for them, and the 8-byte IEEE MAC
   addresses assigned to the interfaces of a system are essentially
   random. For some, there is also the issue of whether the IEEE MAC
   address is "unique enough" for the purposes at hand.

   We clearly need a space for generating ESDs for interfaces which
   don't come equipped with one. Some have also suggested there might
   be great utility in enabling inverse lookups on just the ESD part of
   an address. Assigning ESDs in semantic clusters (like current IPv4
   addresses) would be a significant aid to this end. Finally if a
   network designer decides not to trust the uniqueness of the IEEE MAC
   addresses, he could always use the Dynamic Numbering machinery of
   IPv6 to assign ESDs.

   We propose that the IETF seek a large (7 bytes or greater) subspace
   of the IEEE 8-byte MAC space for allocation as IETF-NodeIDs in
   semantic clusters to provide a pool of addresses which can be used
   for any of the above reasons, as required. However, it is expected
   that most network builders will exploit the intrinsic IEEE MAC
   addresses present in many network interfaces whenever possible.

   The IETF-NodeID space should be partitioned into two regions - one
   exactly isomorphic to the existing IPv4 address space to provide
   instant grandfathering of IPv4 addresses, and another space which is
   simply larger but allocated in a similar manner.
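
   As a rough illustration of the IPv4-isomorphic region, the sketch
   below packs an IPv4 address into the lowest 4 bytes of an 8-byte
   ESD (the placement described in Section 6) and recovers it again.
   The upper 4 bytes are an assumption: this document does not assign
   any value to mark the region, so ASSUMED_IPV4_REGION and both
   function names are hypothetical.

```python
import ipaddress

# ASSUMPTION: hypothetical 4-byte marker for the IPv4-isomorphic region
# of the IETF-NodeID space. The draft assigns no such value.
ASSUMED_IPV4_REGION = bytes([0xFF, 0x00, 0x00, 0x00])


def esd_from_ipv4(addr: str) -> bytes:
    """Build an 8-byte ESD carrying the IPv4 address in its low 4 bytes."""
    return ASSUMED_IPV4_REGION + ipaddress.IPv4Address(addr).packed


def ipv4_from_esd(esd: bytes) -> str:
    """Recover the IPv4 address from an IPv4-style ESD."""
    if len(esd) != 8 or esd[:4] != ASSUMED_IPV4_REGION:
        raise ValueError("not an IPv4-style ESD under the assumed marker")
    return str(ipaddress.IPv4Address(esd[4:]))
```

   Because the mapping is one-to-one, an IPv4-style ESD inherits
   whatever uniqueness the underlying IPv4 assignment has, which is
   the sense in which existing addresses are grandfathered.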

   A few comments on "global uniqueness" are in order because in
   previous discussions, some have asserted that unless "uniqueness" can
   be accomplished with absolute and complete mathematical perfection,
   any scheme using the concept is unworkable. This extreme view is
   inconsistent with mass-market experience.

   IEEE MAC addresses are globally unique by nature of the delegation
   process where they are assigned to interfaces by the manufacturers.
   Both XNS and IPX rely on this uniqueness and it works very well in
   practice. IETF-NodeID values will be globally unique by nature of
   the same kind of assignment mechanism. IPv4 addresses must be
   globally unique for the Internet to function, and it does, mostly, by
   nature of exactly the same kind of assignment mechanism.

   While accidents and manufacturing defects do occasionally violate the
   uniqueness of IEEE MAC address assignment, humans routinely make
   errors in assigning IPv4 addresses to systems with equally mystifying
   results. Given the reliance of IEEE-1394 Firewire interconnects on
   these unique MAC addresses, it is likely that the frequency of these
   occurrences (relative to the total number of objects with assigned
   addresses) will only decrease. The economic pressure to ensure this
   will be intense.

6. The Structure of a Site

   The GSE global routing architecture ultimately views a Site as a leaf
   of the topology and doesn't concern itself with the interior of this
   private topology. However, the internal topology of a Site is
   extremely important to the management and operation of the Site so
   the GSE address architecture provides for a rich set of
   organizational alternatives with different cost-benefit tradeoffs.

   The GSE address structure provides for 16384 distinct Site Topology
   Partitions (STPs) within a Site.
   This is the number of SEGMENTS in
   the internal topology, not hosts. The number of attached hosts is
   limited strictly by available local network technology, and the
   Site's ability to buy enough machines to exhaust the available IEEE
   8-byte MAC address space, or the available 7-byte IETF-NodeID space.

   Using this structure, a single Site can develop an internal topology
   whose size is a very significant fraction of the total number of CIDR
   routes in the IPv4 Global Internet.

   An organization is not constrained to being structured as a single
   Site. The trade-off is that the inter-Site topology must then be
   part of the Public Topology. While the individual Sites can retain
   considerable independence in topological structure and attachment to
   the Global Internet, they must be aware of changes between the
   constituent Sites and that re-homing of constituent Sites will
   potentially impact long-running sessions. That is the cost of
   exploiting the routing machinery available to the Public Topology.

   Given the generous flexibility available for organizing a Site, it is
   worthwhile to examine a few examples. Note that none of these
   organizational approaches is exclusive. A large Site might well mix
   these approaches to good effect and indeed the goal is to provide the
   designer of private Site topology with a broad spectrum of design
   alternatives.

   The simplest structure to imagine is a Site using all IEEE MAC
   Addresses with all the systems connected in a single Private Topology
   Partition (i.e., all the GSE addresses carry the same STP value which
   is assigned by the local network administration). Given the
   sophistication of current LAN-switching technology, a Site like this
   could be both large and internally complex yet have simple IPv6
   addressing. The complexity is absorbed into the LAN infrastructure
   and it appears to be only one partition from the GSE Site Topology
   view.
   This structure has one very significant advantage:
   long-running TCP sessions will survive arbitrary changes in the local
   topology. This works, of course, because the single STP is a virtual
   topology with the real topology hidden by the LAN Switching
   machinery.

   The second Site model is like the one just described, except it would
   have multiple STPs with routers moving traffic between the segments.
   This is very close to the common IPv4 structure of a CIDR block being
   subnetted to assign a prefix to each STP. This approach has the
   advantage of familiarity, but it has the disadvantage that long-lived
   TCP connections don't necessarily survive arbitrary changes to the
   private topology. This arises because even though the ESD is
   invariant, reachability will fail because a change in the STP of one
   of the systems doesn't get injected into the protocol stack of the
   communicating systems when they move. The existing IPv6 dynamic
   address assignment machinery will serve to make such internal changes
   much less painful than with IPv4, however.

   One point worth noting is that even with multiple STPs routed within
   a Site, a "Private Topology Partition" need not correspond to a
   "physical" LAN cable. The STP values could be used to label larger
   organizational structures like "Engineering" or "Finance". This
   could reduce the likelihood that common internal topology changes
   break long-lived connections.

   The third Site model uses IETF-NodeID ESDs based on existing IPv4
   address assignments. In this case, all the IPv4-style ESDs could be
   placed in a single STP and then routed internally on the IPv4 address
   in the lowest 4 bytes of the ESD. It must be emphasized that the
   IPv4 address used in an IPv4-style ESD must be an
   officially-registered, public-use IPv4 address and NOT an RFC-1918
   private-use address.
   Using an RFC-1918 private-use address violates the global
   uniqueness properties required of an ESD.

   In all of the multi-segment cases, an IETF-NodeID ESD could be used
   to designate any point-to-point link endpoint, the loopback addresses
   in routers, or any other IP-accessible network elements which don't
   naturally have an IEEE MAC address for forming an ESD. And in all of
   the cases, IETF-NodeID ESDs could be used universally, although it
   is more appropriate to use the IEEE ESD form whenever possible.

   In all of the cases where the real topology is not completely
   virtualized by the LAN technology, there will be "Internal
   Renumbering" events caused by moving systems between infrastructure
   segments (STPs). This will have the effect of killing long-running
   off-Site connections unless provisions are made to allow the systems
   (and the routing infrastructure) to carry the previous ESDs as
   synonyms for a while. Given that most significant topology moves
   involve powering off the end system in question, this is hardly a
   hardship. However, the powerful renumbering support already
   developed for IPv6 can make those other moves considerably less
   impacting.

   Most importantly, external re-homing of a Site to the global
   infrastructure can be made completely transparent.

7. Dynamic Address Re-writing by Site Border Routers

   A critical component of this architecture is the modification of
   addresses when packets leave or enter a Site. Re-writing source
   addresses to insert appropriate Routing Goop at the Site egress point
   was part of the 8+8 proposal, but this proposal extends this to
   re-writing destination addresses when inbound packets arrive at a
   Site Border Router.

   The reasons for both re-writings are the same: to insulate the
   interior of the Site from external topology changes and egress policy
   details.
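
   The two re-writing operations can be sketched as bit manipulation
   on a 128-bit address held as an integer. This is only a sketch
   under stated assumptions: the 50-bit width of the RG (including
   the 3 type bits) follows the bit-level layout given in Section 8
   for the 3-bit address type, the value of the Site-local prefix is
   hypothetical because this document does not define one, and all
   names are invented for illustration.

```python
RG_BITS = 50                     # 3 type bits + 13 LSID bits + 34 more bits of Goop
LOW_BITS = 128 - RG_BITS         # 14-bit STP followed by the 64-bit ESD
LOW_MASK = (1 << LOW_BITS) - 1

# ASSUMPTION: hypothetical canonical "within this Site" prefix.
# The draft does not specify its value.
ASSUMED_SITE_LOCAL_RG = 0


def replace_rg(addr: int, rg: int) -> int:
    """Return addr with its upper RG_BITS replaced by rg; STP and ESD untouched."""
    if not 0 <= rg < (1 << RG_BITS):
        raise ValueError("RG does not fit in the Routing Goop field")
    return (rg << LOW_BITS) | (addr & LOW_MASK)


def rewrite_outbound(src: int, egress_rg: int) -> int:
    """Packet leaving the Site: insert the RG of the egress point into the source."""
    return replace_rg(src, egress_rg)


def rewrite_inbound(dst: int) -> int:
    """Packet entering the Site: replace the destination RG with the Site-local prefix."""
    return replace_rg(dst, ASSUMED_SITE_LOCAL_RG)
```

   Neither function reads or alters the low 78 bits, which is why the
   interior of the Site never needs to learn the RG in use.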

   When a Site Border Router inserts the correct RG in the source
   address of outbound packets, it frees the end-systems in the Site
   from having to know the RG for the Site. This is especially important
   if the site is Multi-homed and the Site implements a complex egress
   selection policy.

   In the case of inbound packets, if the destination address were not
   converted to a canonical form, the Site interior routers would have
   to be aware of all the different RG which could be used to reach the
   site, essentially creating aliasing of the destination addresses. In
   the singly-homed case, this doesn't seem like a significant issue,
   but in complex Multi-homing scenarios there could be a significant
   problem managing this information.

   This symmetric re-writing essentially isolates the Site from the
   Global Internet just as the hard boundary between RG and STP
   components insulates the Global Internet from the Site topology.

8. The Structure of Routing Goop

   Routing Goop, or "RG", is the upper 6+ bytes of a GSE address. This
   somewhat non-technical term was chosen because all the other
   alternatives seem to have various degrees of conceptual baggage which
   would be as much work to neutralize as the new notions are to explain
   in the first place.

   Fundamentally, RG is a Locator. It encodes the topological
   connectivity of the Site containing the computer system identified by
   the ESD in the lower 8 bytes. In the case of a singly-homed Site,
   re-homing to a new attachment to the Public Topology will change ONLY
   the RG in full GSE addresses for computer systems at that Site. One
   example of such a re-homing would be a change of the Site's Internet
   Service Provider.
   This change-over can be made essentially
   completely transparent to users both inside and outside the Site,
   although it does involve a practical limit on the transition duration
   relating to how long the departing ISP is willing to extend
   transitional courtesies. During a changeover, though, all new
   connections will be initiated via the new ISP connection.

   This brings up the deep structure of the topology information carried
   in RG and how it is encoded. More specifically, RG is a hierarchical
   locator which is a rooted path-expression of flat-routed regions
   which are tangent. Each element in the path-expression includes only
   enough detail to negotiate the flat-routed region.

   It has been observed before that the graph of the Global Internet is
   not obviously a hierarchy so how can this work?

   We start with the observation that every connected graph has at least
   one labeling which forms a spanning tree covering the nodes. The
   hierarchy is induced by a labeling function which partitions the
   global graph into regions and recursively into subregions. This
   function is only globally visible at the top-level where an initial
   partitioning of the graph is used to form the first level of what
   will become the hierarchy. Within each partition there is a local
   sub-partition function which assigns labels, and we proceed
   recursively. The nested recursions directly induce the hierarchy.

   This decomposition of the Global Internet produces a recursive graph
   where each level is composed of a set of subgraphs which are
   explicitly connected (i.e., explicitly routed between the subgraphs)
   while the structure within each subgraph is assumed to be flat-routed
   (at least as seen at that level).
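
   One way to picture the rooted path-expression is as the list of
   region labels met while descending through the nested partitions
   to a Site. The sketch below is illustrative only: the nested
   dictionary, the labels in it, and the function name are all
   hypothetical, and the draft does not prescribe any particular data
   structure or search.

```python
from typing import Optional

# ASSUMPTION: hypothetical nested partitioning. Each dict is a region
# whose keys label its sub-regions; a None value marks a Site (a leaf).
ASSUMED_TOPOLOGY = {
    "LS-A": {"reseller-1": {"site-x": None, "site-y": None}, "site-z": None},
    "LS-B": {"site-w": None},
}


def path_expression(region: dict, site: str, prefix: tuple = ()) -> Optional[tuple]:
    """Depth-first search returning the labels from the root down to `site`."""
    for label, sub in region.items():
        path = prefix + (label,)
        if label == site:
            return path
        if isinstance(sub, dict):
            found = path_expression(sub, site, path)
            if found is not None:
                return found
    return None
```

   Each label only has to be meaningful inside its parent region,
   which mirrors the statement that each element of the
   path-expression carries only enough detail to negotiate one
   flat-routed region.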

   From an abstract viewpoint, a hierarchical partitioning can be
   induced with an arbitrary choice of labeling function (as long as the
   function produces the minimally-required partitioning). However, we
   desire the partitions to have several important properties which
   affect the choice of labeling function.

   The general goal is to produce a global labeling which represents the
   topology as compactly as possible, yet allows rich connectivity while
   bounding the complexity of the discrete regions which are
   flat-routed.

   The top level objects in the GSE graph hierarchy are called "Large
   Structures". These are objects chosen for their ability to naturally
   represent significant topological aggregation of substructure (not
   geographical, political, or geometric). The number of Large
   Structures is explicitly limited to bound the complexity at the top
   level of the aggregation graph.

   Within Large Structures, the (sub-)partition function is a trade-off
   between the flat-routing complexity within a region and minimizing
   total depth of the substructure. This is driven by the internal
   topology of a Large Structure and the choices in different Large
   Structures will not necessarily be the same. This is why Routing Goop
   only has one hard bit boundary; Large Structures are free to
   internally subdivide as they choose. They are only required to
   encapsulate a significant portion of the Public Topology.

   One obvious candidate for Large Structures is large networks which
   already represent considerable aggregation based on existing CIDR
   deployment. Another good candidate might be "Exchange Points".
   The
   GSE model can accommodate both of these simultaneously, allowing
   IPv6-style "Network-anchored Prefixes" and "Exchange-anchored
   Prefixes" like that proposed by some to coexist and be subsumed into
   a unified notion of "Aggregator-anchored Prefixes." Of course, these
   aren't prefixes strictly in the IPv4 CIDR sense, but the
   left-anchored substrings of the Routing Goop are intuitively quite
   similar.

   Large Structures are assigned a Large Structure Identifier, known as
   an LSID. The total number of LSIDs is intentionally limited as we
   assume the paths between Large Structures are only flat-routed.

   Two consenting Large Structures remain free to share a tangency below
   the top level and exchange routes so as to provide for improved
   routing between the two of them (formalizing cut-throughs in the
   natural hierarchy). The goal is to provide for manageable complexity
   of the ultimate default-free zone (the top level of the global
   hierarchy) while allowing for controlled circumvention of the natural
   hierarchical paths.

   Bit-level structure of Routing Goop:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | xxx |     13 Bits of LSID     |     Upper 16 bits of Goop     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    3               4                   5                   6
    2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Bottom 18 bits of Routing Goop   | 14 bits of Site Topology  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   NOTE: The Routing Goop structure above assumes that GSE addresses
   are designated by a 3-bit IPv6 address type. If a GSE address is
   identified by two upper bits, the LSID would expand to 14 bits.
If identified by one bit, the LSID would stay at 14 bits and the Upper
16 bits of Goop would expand to 17 bits.

Routing between two interior points of two different Large Structures
is always possible based solely on the LSID. This provides a
"forwarding strategy of last resort" for a router running
"default-free". From one point of view, the LSID partitions the Global
Internet into a set of regions such that an interior router need only
carry a "per-LSID default" pointing at an appropriate boundary router
which knows how to handle traffic bound outside the containing
Large Structure for a point in the other Large Structure.

If two Large Structures share a tangency somewhere below the top
level, then some interior routers of both Large Structures will share
routes to exploit the tangency for optimizing paths. How this
cut-through information is distributed within the two Large
Structures is not revealed elsewhere in the global topology. The exact
"shape" of the optimization region is controlled by the decisions
about which routes to advertise across the cut-through. These
decisions are made by the collaborators and the optimized region need
not be symmetric with respect to the cut-through. The size of the
optimization area is controlled by how far routes learned via the
cut-through are propagated within the sub-graphs tangent via the
cut-through. Again, this is a matter of engineering choices made by
the collaborators operating the cut-through.

While the LSID may appear similar to the Autonomous System Number
currently used in IPv4 policy-based routing machinery, the LSID is
quite distinct from the AS number and the two identifiers play very
different roles. AS Numbers will continue to be used for policy
routing information exchange and must remain distinct from the LSID
space.

9. The "Flow" of Routing Goop

It is intuitively useful to think about Routing Goop as "flowing
downhill" through the hierarchy from the topmost Large Structures,
through the intermediate levels of the Public Topology, and
ultimately down to the Site. As the RG propagates downward, the
prefix extends to the right, just like in IPv4 CIDR, with each
extension navigating the nested flat-routed subgraphs, eventually
terminating at the Site, which then descends invisibly into the
Private Topology of that Site.

The nested flat-routed areas correspond to transit subnetworks of the
Large Structure. One very important example of such subnets is the
"reseller" or "wholesale transit customer" of a Large Structure.
(Note that whether the Large Structure is a network or an exchange
point doesn't matter.) The reseller network provides transit for
Sites, so must be part of the Public Topology and appears as a
substring within the Routing Goop, usually the right-most extension
unless the reseller has further reseller customers. In that case,
the next level reseller will have his own extension to record his
place in the Public Topology and to provide for navigating through it
as well.

The overall picture can now be drawn as a forest of trees
distributing Routing Goop down to the Sites, with each tree being a
Large Structure and the Large Structures connected arbitrarily at the
top level. This structure will be mirrored by the actual machinery
for distributing Routing Goop to the Sites as will be discussed a bit
later, but this mental image of the prefixes "flowing" from the
anchoring Large Structures is critical to understanding fundamental
self-organizing abilities in the GSE model.
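As a rough illustration of the downhill flow, the following Python
sketch builds Routing Goop by appending bits at each level of one
tree. Only the 3-bit type, the 13-bit LSID, and the 50-bit total come
from the bit-level diagram given earlier. The type value, the LSID,
the labels, and the per-level field widths are hypothetical, as are
all the names used in the code.

```python
from dataclasses import dataclass

TYPE_BITS = 3    # address type ("xxx" in the diagram)
LSID_BITS = 13   # Large Structure Identifier
RG_BITS = 50     # 3 + 13 + 16 + 18 bits, per the diagram


@dataclass(frozen=True)
class Prefix:
    """A left-anchored substring of Routing Goop."""
    value: int    # the prefix bits, right-aligned
    length: int   # number of significant bits

    def extend(self, label: int, width: int) -> "Prefix":
        """Append `width` bits holding `label` to the right of the prefix."""
        if not 0 <= label < (1 << width):
            raise ValueError("label does not fit in the given width")
        if self.length + width > RG_BITS:
            raise ValueError("no Routing Goop bits left to extend into")
        return Prefix((self.value << width) | label, self.length + width)

    def to_rg(self) -> int:
        """Left-align the prefix in the 50-bit RG field, zero padded."""
        return self.value << (RG_BITS - self.length)


def large_structure(addr_type: int, lsid: int) -> Prefix:
    """Root of one tree: the address type followed by the LSID."""
    return Prefix(addr_type, TYPE_BITS).extend(lsid, LSID_BITS)


# Hypothetical values: type 0b001, LSID 5, a reseller labelled 0x2A in an
# 8-bit field, and a Site labelled 7 in a 10-bit field below the reseller.
top = large_structure(0b001, 5)
reseller = top.extend(0x2A, 8)
site = reseller.extend(7, 10)
```

Each level only ever appends to what it received from above, which is
why re-anchoring the top of a tree changes every prefix beneath it
without any level having to renumber its own labels.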
While the GSE machinery is intended to be adequate for almost
completely automated self-organization with respect to the
construction and propagation of Routing Goop on an Internet-wide
basis, we proceed for now closely following current practice
(admitting manual configuration of certain information like Routing
Goop) because of the additional complexity of the self-organization
functions. Initial deployment following current practice would not
preclude eventual deployment of a fully self-organizing Global
Internet.

10. The Distribution of Routing Goop

There are two cases to consider for how Routing Goop gets
distributed: source addresses and destination addresses. In both
cases RG is part of the address, one way or another, so we show how a
full 16-byte address with the right RG gets created in these two
cases.

10.1 RG for Source Addresses

The initial RG of a source address is almost always the Site-local
prefix. If the destination address is not within the Site, the
packet will leave the Site via one of possibly several Site Boundary
Routers. The egress Site Border Router will insert the correct RG in
the source address based on the path the destination should use to
return a packet to the sender. Except in unusual circumstances this
will be the RG which corresponds to the attachment path of that
egress Site Boundary Router to the Global Internet.

If the Site is multi-homed via just one Site Boundary Router, then
the router is free to apply whatever local policy suits. It simply
must fill in a valid RG path which leads back to a Site Boundary
Router for that Site. If the Site is multi-homed via more than one
Site Boundary Router, which router provides egress is purely local
policy and which RG gets applied is likewise local policy.
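The egress rewrite described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the RG
is taken to be the upper 50 bits of the 16-byte address (per the
bit-level diagram), the value used for the Site-local prefix is a
placeholder rather than the real assignment, and the function names
are hypothetical.

```python
RG_BITS = 50
LOW_BITS = 128 - RG_BITS        # 14 bits of Site Topology + 64-bit ESD
LOW_MASK = (1 << LOW_BITS) - 1

# Placeholder only; the real Site-local prefix is defined elsewhere.
SITE_LOCAL_RG = 0x3FB << (RG_BITS - 10)


def rg_of(addr: int) -> int:
    """Return the Routing Goop field of a 128-bit address."""
    return addr >> LOW_BITS


def insert_rg(addr: int, rg: int) -> int:
    """Replace the RG field, leaving Site Topology and ESD untouched."""
    if not 0 <= rg < (1 << RG_BITS):
        raise ValueError("RG does not fit in 50 bits")
    return (rg << LOW_BITS) | (addr & LOW_MASK)


def egress_rewrite(src: int, dst: int, egress_rg: int) -> int:
    """Source address as a Site border router might emit it.

    `egress_rg` is whatever RG local policy selected, normally the one
    matching this router's attachment path to the Global Internet.
    """
    if rg_of(dst) == SITE_LOCAL_RG:
        return src              # traffic stays inside the Site
    return insert_rg(src, egress_rg)
```

Note that the host never sees `egress_rg`; which value is passed in is
entirely a matter for the border router and local policy.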
The dynamic insertion of RG upon Site egress accomplishes a number of
things.

(1) It means that for most purposes, a computer system at a Site need
not concern itself with egress policy matters which can be
particularly tricky in Multi-homed Sites.

(2) It means that computer systems are essentially not impacted at
all by topological re-homing of the Site.

(3) It means that more complex multi-homing scenarios with multiple
Site Boundary Routers each with multiple connections to the Global
Internet can execute arbitrarily complex path recovery policy without
concern for how it might impact a computer system doing source
address selection.

(4) It means that while a computer system might forge the ESD in a
source address, it CANNOT forge the point of injection into the
Public Topology. This is not strong authentication down to the
particular computer system, but it is probably a strong deterrent to
certain obnoxious activities due to the dramatically improved
traceability. We also note that the first-hop attachment router in
the Public Topology is free to insert or override the RG if somehow
an errant packet escapes a Site carrying invalid RG, thereby
enforcing traceability. Of course, the Public first-hop router could
always just drop a packet carrying inappropriate source RG as well.
But to make it very clear, we put the burden of inserting correct RG
in exiting source addresses squarely and solely on the Site and the
Site Border Router. Any other location of the task has bad
performance scaling.

The Site Border Router acquires the necessary RG from the first-hop
attachment router in the Public Topology.
Alternately, as an initial mechanism the RG could be statically
configured, but the real goal is completely automated propagation
down the tree so that an entire complex subtree can be rehomed
without human intervention or service disruption.

10.2 RG for Destination Addresses

Currently, an IPv6 address lookup for a DNS name returns the
information in an "AAAA" record which is the full 16 bytes of the IPv6
address.

The GSE design proposes synthesizing the 16 bytes of information in a
query response from two different sources: an "AAA" record and an
"RG" record. The "AAA" record carries the 8-byte ESD + ~2 byte STP
for the DNS name in question and the "RG" record carries 6+ bytes of
the appropriate Routing Goop.

One interesting question is how the AAA record gets paired with an RG
record in a given nameserver. One simpleminded implementation would
be to pair an RG record with a zone, but that has the problem of
requiring all the systems in that zone to use the same Routing Goop
and hence be in the same Site.

A better scheme is to carry an "RG Name" in the "AAA" record which
would allow a nameserver to concatenate an arbitrary RG prefix to the
ESD+STP producing the full 16-byte response. The "RG Name" would be
a full DNS name which could be recursively translated (and the result
cached). Structured as an "upward delegation" with an appropriate
Time-to-Live, a Site could import the Routing Goop information from
their service provider completely automatically. This capability
will be used to great advantage in the discussion of re-homing which
follows. [Interaction between RG TTL and zone TTL is an issue to be
explored more.]

Alternately, one special case for an RG record could be a delegation
to a Site Border Router which could supply the correct RG
automatically, at least in single-homed cases, and possibly in
multi-homed cases.
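To make the composition concrete, here is a minimal Python sketch of
how a nameserver might join an "AAA" record to the "RG" record it
names. It assumes the 50-bit RG / 14-bit STP / 64-bit ESD split from
the bit-level diagram. The record contents, the DNS names, and the
function name are hypothetical, and the recursive resolution and
caching of the RG Name are reduced to a plain dictionary lookup.

```python
import ipaddress

RG_BITS = 50
LOW_BITS = 128 - RG_BITS   # STP (14 bits) followed by the ESD (64 bits)

# Hypothetical Routing Goop for an old and a new attachment path.
OLD_RG = (0b001 << 47) | (5 << 34) | 0x2A
NEW_RG = (0b001 << 47) | (9 << 34) | 0x11

# "RG" records: RG Name -> 50 bits of Routing Goop.
RG = {"rg.site.example.": OLD_RG}

# "AAA" records: host name -> (STP + ESD bits, RG Name to prepend).
AAA = {
    "host.site.example.": (
        (0x0155 << 64) | 0x0123456789ABCDEF,
        "rg.site.example.",
    ),
}


def synthesize(name: str, aaa: dict, rg: dict) -> ipaddress.IPv6Address:
    """Compose the full 16-byte answer for `name`."""
    low, rg_name = aaa[name]
    return ipaddress.IPv6Address((rg[rg_name] << LOW_BITS) | low)


before = synthesize("host.site.example.", AAA, RG)

# Changing only the RG record changes the answer for every host that
# names it; the AAA entries themselves are never edited.
rehomed_rg = dict(RG)
rehomed_rg["rg.site.example."] = NEW_RG
after = synthesize("host.site.example.", AAA, rehomed_rg)
```

In this sketch a single RG record is shared by reference from each AAA
entry, so hosts in the same zone may still name different RG records
and therefore sit in different Sites.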
The result of this structure is that individual zone entries for
individual nodes (AAA records) do NOT change when a Site rehomes.
The only thing which changes (logically) is the RG information which
is composed with the node's AAA record to produce a full 16-byte
response. This means the general Dynamic DNS machinery is NOT
required to support Site re-homing.

One implication of the special Site-local Prefix RG for intra-Site
traffic is that Sites will have to provide at least two "faces" on
their nameservice - one that returns Site-local as the RG for queries
from inside the Site, and another that returns full RG responses for
requests originating outside the Site. This can be readily
accomplished by inspecting the source address - if the source address
contains the Site-local Prefix as RG, then return the same.
Otherwise, return a fully-general RG-based response (possibly based
on egress-path selection policy).

10. Re-homing a Site

When a Site changes its point of attachment to the Global Internet,
it is said to "rehome". One of the significant criticisms of IPv4
CIDR and IPv6 "Provider-based Addressing" is the requirement to
"renumber" a Site when it rehomes. One of the explicit goals of the
GSE architecture is to eliminate, or at least mitigate, the impact of
this.

It is important to reiterate the notion that the Routing Goop of a
GSE address is not just a Locator, but that it encodes a PATH from
the top level of the global hierarchy down to the Site. Changing
that path is what makes Re-homing and Multi-homing essentially
equivalent operations. We proceed with the simple case first.

When a Site wishes to rehome, it must establish a new attachment
point to the Global Internet, and hence establish a new access path.
Then it must start using that new path before the old path is
removed.
The procedure is as follows:

A Site establishes a connection with a new ISP and it becomes able to
carry the traffic. At that point, the Site alters the upward
delegation of the DNS RG records. Henceforth, all new connections
made with the new translations will follow the new path to the Site.
The new connection path is then made the preferred egress path and
source addresses in packets exiting the Site immediately start being
marked with the new return path. The old connection should be
maintained for some administratively determined grace period to allow
DNS timeouts to transition new sessions to the new path and for
long-running sessions to terminate.

At first blush, it might appear that when the egress path for the
Site switches over to the new path and the Site Border Router starts
marking packets with the new RG, the return path for long-running
sessions would automatically switch over to the new path. Alas, this
is not so because a long-running session will be using a destination
address containing the old RG acquired when the session first
started.

Consideration was given to providing some kind of "path redirect"
which would allow the other end to deal with "flying cutovers" of a
running session, but the security implications of this mechanism are
too far-reaching to consider as part of initial deployment. If at
some later point it becomes clear how to accomplish this safely, then
it could be added. But the complexity, security risks, and the
magnitude of the added value do not seem worthwhile at present
(although the author would love to be convinced otherwise).

Alternately, the Site could request a "Re-homing Courtesy" from their
old ISP which would effectively make it a multi-homed Site for some
period of time.
After multi-homing was established, the old connection could be taken
down and the long-running sessions would continue to survive as long
as the Site was multi-homed by way of the Re-homing Courtesy.

Note that at no time did the re-homing affect anything internal to
the Site's Private Topology. The only change was the attachment to
the Public Topology and the Routing Goop which records that
attachment location.

11. Multi-homing a Site

One of the curiosities of IPv4 is that the network does a lot more
work for a multi-homed site but it is very hard to pin it down so
that the instigator of the effort can compensate the workers.

In the GSE model, Multi-homing is an explicit service which is
performed for a Site by the agents of the Public Topology which
provide the access for the Site. This mechanism can be made more
sophisticated, but the notion is most readily explained by
considering a Site which is dual-homed to two different ISPs and
hence has two distinct access paths represented by two distinct blobs
of Routing Goop.

The Site is attached to each ISP via some link and we postulate some
kind of keep-alive protocol which determines when reachability to the
Site's border router is lost. The ISP routers serving the dual-homed
Site are identified to each other (via static configuration
information in the simplest case or a dynamic protocol in the more
general case), and when a link to the Site is lost, the ISP router
anchoring the dead link simply tunnels any traffic destined for the
Site via the other ISP router.

This approach clearly requires coordination between the two serving
ISPs. This is not a new constraint - multi-homing already requires
considerable coordination between the Site and its providers.
Of course, creating a protocol for dynamically creating a "homing
group" is probably a very worthwhile investment but it is not
absolutely necessary at the outset.

It should be obvious now that the "Re-homing Courtesy" in the
previous section is simply doing the router-pair coordination with
the new ISP for some period of time.

[Note: Yakov and Bates are working on a draft for a Site-side
implementation of aggregation-efficient multi-homing which may
simplify this even further.]

12. Re-homing a Reseller

Re-homing a Reseller is a slightly more general case of re-homing a
Site, primarily characterized by more lead time, a longer grace
period, and some necessary coordination with customer Sites to ensure
that the Routing Goop propagates correctly.

The Reseller will establish a new connection which will not only
result in a new path for the Reseller's topology, but for that of his
customer Sites. When the Reseller alters his upward delegation of
Routing Goop, it will ripple downward to his customer Sites by nature
of their upward delegations. The downward ripple of Routing Goop via
the upward delegations should cause the Site zone TTLs to be reduced
appropriately to ensure caches expire well within the dual-homed
transition grace period for the Reseller.

This essentially rehomes all the Reseller's customer Sites at the
same time the Reseller's infrastructure is re-homing and should be
completely transparent except for long-lived sessions which do not
terminate by the end of the grace period.

13. Multi-homing a Reseller

There are two parts to multi-homing a Reseller - one part similar to
the multi-homed Site case above, and one part which is quite
different.
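The part shared with the multi-homed Site case, where the router
anchoring a dead link tunnels traffic via its partner, can be
sketched as follows. This is a toy model under stated assumptions:
the keep-alive protocol is reduced to a boolean, the tunnel to a
method call, and the class, field, and router names are all
hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IspRouter:
    """One of the access routers serving a dual-homed customer."""
    name: str
    link_up: bool = True                    # outcome of the keep-alive
    partner: Optional["IspRouter"] = None   # other router in the pair

    def deliver(self, packet: str, tunneled: bool = False) -> str:
        """Deliver directly, or tunnel once via the partner router."""
        if self.link_up:
            return f"{self.name}: delivered {packet} over direct link"
        if self.partner is not None and not tunneled:
            # Never re-tunnel a packet that already arrived by tunnel,
            # so the pair cannot bounce it back and forth.
            return self.partner.deliver(packet, tunneled=True)
        return f"{self.name}: dropped {packet}"


# Static configuration identifying the two routers to each other.
isp_a = IspRouter("isp-a")
isp_b = IspRouter("isp-b")
isp_a.partner, isp_b.partner = isp_b, isp_a
```

With `isp_a.link_up` set to False, a packet handed to `isp_a` is
delivered by `isp_b`; with both links down it is dropped after one
tunnel hop.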
For this discussion, assume a Reseller which is dual-homed and hence
has two different Routing Goop prefixes (remember that each path to
the top level of the hierarchy has a distinct prefix). The reseller
can solicit multi-homed tunneling services from his two access point
routers to provide alternate path service just like a multi-homed
Site. Why traffic is coming to any particular router, though, is
influenced entirely by what routes are advertised out that particular
connection via BGP5 (or IDRP). This is rather different from the
multi-homed Site case where the ESD is the object of interest and the
RG simply gets the traffic to the Site boundary.

The question arises, however, as to which prefix gets used for
extending downward to his customer Sites. The answer in the simplest
case is to pick one and use it, making the Sites "natural" in the
chosen prefix. The alternate prefix can, of course, be advertised
out the alternate path if desired. But this work can be ascribed to
the instigator and the superior attachment points can charge for this
service. (This is somewhat akin to charging for routes, but only
routes which create a discontinuity in the routing space.)

14. A Comment on NAT Boxes

In discussions about requiring destination address re-writing for
inbound packets, Brian Carpenter remarked that with the advent of
symmetric re-writing (both inbound and outbound), the GSE
architecture is essentially "NAT that works." To some, this would be
the ultimate insult, but I think it is essentially correct. NAT
Boxes provide for isolating a Site from topology changes but severely
compromise the end-to-end model. GSE affords very similar
operational topological isolation but without violating the
end-to-end model, at least not nearly as much.
If a Site wishes the additional isolation afforded by NAT Boxes, a
firewall will accomplish that task.

15. General Comments

While some of GSE is a radical departure from IPv6 as we currently
know it, in general it relies deeply on all the IPv6 underpinnings
which contribute so much to the attractiveness of IPv6: Neighbor
Discovery, all the dynamic configuration machinery designed to make
renumbering palatable even using "provider-based addressing", and the
flexibility of the "salami headers" which make tunneling and security
attractive. The general forwarding operations based on
longest-match-under-prefix-mask and the policy-based routing
machinery of BGP5/IDRP are also simply assumed.

16. Closing Comments and Acknowledgments

This document presents a revision of the "8+8" addressing model which
has been under construction by the author since before Fall of 1995,
at least. Conversations with a great many people have contributed to
the design presented in this document. A skeletal version of this
proposal first appeared in some email from Dave Clark of MIT who
planted the seed and provided the original moniker "8+8". A great
many others have contributed ideas and observations, all of which
went into the stew pot for the synthesis contained here.

The original "8+8" draft cited the following individuals for a
special thank-you: Vadim Antonov, Ran Atkinson, Scott Bradner, Brian
Carpenter, Noel Chiappa, Steve Deering, Sean Doran, Joel Halpern,
Christian Huitema, Tony Li, Peter Lothberg, Louis Mamakos, Radia
Perlman, Yakov Rekhter, Paul Traina.

This draft has benefited greatly from conversations with Masataka
Ohta, who convinced the author of the importance of the IETF-NodeID
in addition to the 8-byte IEEE MAC addresses, as well as Brian
Carpenter, Scott Bradner, Ran Atkinson, all the people who so
graciously provided invaluable comments on the original "8+8" draft,
and of course Steve Deering, Bob Hinden, and the IPng Working Group.

17. Security Considerations

More than can be imagined.

18. Author's Address

Mike O'Dell
UUNET Technologies, Inc.
3060 Williams Drive
Fairfax, VA 22031
voice: 703-206-5890
fax: 703-206-5471
email: mo@uu.net