INTERNET-DRAFT

   Matt Crawford (Fermilab)
   Allison Mankin (ISI)
   Thomas Narten (IBM)
   John W. Stewart, III (Juniper)
   Lixia Zhang (UCLA)

   October, 1999

          Separating Identifiers and Locators in Addresses:
              An Analysis of the GSE Proposal for IPv6

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026 except that the right to
   produce derivative works is not granted.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   On February 27-28, 1997, the IPng Working Group held an interim
   meeting in Palo Alto, California to consider adopting Mike O'Dell's
   "GSE - An Alternate Addressing Architecture for IPv6" proposal
   [GSE]. In GSE, 16-byte IPv6 addresses are split into distinct
   portions for global routing, local routing and end-point
   identification. GSE includes the feature of configuring a node
   internal to a site with only the local routing and end-point
   identification portions of the address, thus hiding the full
   address from the node. When such a node generates a packet, only
   the low-order bytes of the source address are specified; the
   high-order bytes of the address are filled in by a border router
   when the packet leaves the site.

   It has often been said that IPv4 "got it wrong" by treating its
   addresses simultaneously as locators and identifiers.
   However, there has never been a detailed and comprehensive proposal
   for a scalable network protocol that separated the functions. As a
   result, it wasn't possible to do a serious analysis comparing and
   contrasting a "separated" architecture and an "overloaded"
   architecture. The GSE proposal serves as a vehicle for just such an
   analysis, and that is the purpose of this paper.

   We conclude that an architecture that clearly separates locators
   and identifiers in addresses introduces new issues and problems
   that do not have an easy or clear solution. Indeed, the alleged
   disadvantages of overloading addresses turn out to provide some
   significant benefits over the non-overloaded approach.

Contents

   Status of this Memo
   1.  Introduction
   2.  Definitions and Terminology
   3.  Addressing and Routing in IPv4
   4.  The GSE Proposal
   5.  Analysis: The Pros and Cons of Overloading Addresses
   6.  Conclusion
   7.  Security Considerations
   8.  Acknowledgments
   9.  References
   10. Authors' Addresses
   Appendix A: Increased Reliance on Domain Name System (DNS)
   Appendix B: Additional Issues Related Specifically to GSE
   Appendix C: Ideas Incorporated Into IPv6
   Appendix D: Reverse Mapping of Complete GSE Addresses

1. Introduction

   In October of 1996, Mike O'Dell published an Internet-Draft (dubbed
   "8+8") that proposed significant changes to the IPv6 unicast
   addressing architecture.
   The 8+8 proposal was the topic of considerable discussion at the
   December 1996 IETF meeting in San Jose. Because the proposal
   offered both potential benefits (e.g., enhanced routing
   scalability) and risks (e.g., changes to the basic IPv6
   architecture), the IPng Working Group held an interim meeting on
   February 27-28, 1997 to consider adopting the 8+8 proposal.

   Shortly before the interim meeting, an updated version of the
   Internet-Draft was produced. This version changed the name of the
   proposal from "8+8" to "GSE" to identify the three separate
   components of a unicast address: Global, Site and End-System
   Designator.

   The well-attended meeting generated high-caliber, focused technical
   discussions on the issues involved, with participation by almost
   all of the attendees. By the middle of the second day there was
   unanimous agreement that the GSE proposal as written presented too
   many risks and should not be adopted as the basis for IPv6. The
   proposal did, however, challenge the group to make several
   improvements to the then-existing IPv6 specifications (including
   increasing the aggregatability of addresses, having hard boundaries
   between routing and non-routing parts of the address, and easing
   the DNS aspects of renumbering).

   This document focuses primarily on the issue of separating unicast
   addresses into distinct portions for identification and location
   purposes, a separation that IPv4 does not make but that is
   fundamental to GSE. We start with a discussion of the current
   architecture of IPv4 addressing and its impact on route
   scalability, identification, multi-homing, etc. Next, the details
   of the GSE proposal are described. Finally, the fundamental issue
   of decomposing addresses into multiple separate functional parts is
   analyzed in the context of the GSE proposal.
   Here we detail some of the practical reasons why separating
   addresses into locators and identifiers poses a number of new
   challenges, making it clear that having such a separation is no
   panacea. An appendix contains a summary of the IPng Working Group's
   deliberations on GSE and the results for IPv6 addressing.

   Finally, this document's focus on unicast issues should not be
   interpreted to mean that the impact of separating identifying and
   locating functions on non-unicast aspects of routing and addressing
   is well understood or trivial to deal with. Specifically,
   understanding how multicast and anycast addressing [ANYCAST,
   RFC1884] fit into such a model requires further work.

2. Definitions and Terminology

   The following terminology is used throughout this document.

   Routing Goop --- A term defined by the GSE document. It refers to
      the first six bytes of a sixteen-byte IPv6 GSE address. The
      Routing Goop portion of an address identifies where a site
      connects to the public Internet. More generally, the term refers
      to the portion of an address's routing prefix that identifies
      where on the public Internet the site housing the address
      resides.

   Site Topology Partition --- A term defined by the GSE document that
      refers to the two bytes of a sixteen-byte IPv6 GSE address
      immediately to the right of the Routing Goop. The Site Topology
      Partition part of an address identifies which link within a site
      an address resides on.

   Routing Stuff --- The part of an address that identifies which link
      the address resides on. Within the context of GSE, the Routing
      Stuff comprises the Routing Goop and Site Topology Partition
      parts of an address (i.e., the leftmost eight bytes).

   identifier --- a value that indicates the sender of a packet, or
      the intended recipient of a packet.
      Within the context of GSE, the ESD (End-System Designator)
      portion (i.e., the rightmost eight bytes) of the address is an
      identifier.

   locator --- a field in a packet header that is used by the routing
      subsystem to deliver a packet to the link on which a destination
      resides. The terms locator and Routing Stuff are similar; we use
      Routing Stuff when referring to the specific locator in GSE.

3. Addressing and Routing in IPv4

   Before dealing with details of GSE, we present some background
   about how routing and addressing work in "classical IP" (i.e.,
   IPv4). We present this background because the GSE proposal makes a
   fairly major change to the base model. In order to properly
   evaluate GSE, one must understand what problems in IPv4 it claims
   to improve or fix.

   The structure and semantics of a network layer protocol's addresses
   are absolutely core to that protocol. Addressing substantially
   impacts the way packets are routed, the ability of a protocol to
   scale and the kinds of functionality higher layer protocols can
   count on. Indeed, addressing is intertwined with both routing and
   transport layer issues; a change in any one of these can impact
   another. Issues of administration and operation (e.g., address
   allocation/re-allocation and required renumbering), while not part
   of the pure exercise of engineering a network layer protocol, turn
   out to be critical to the scalability of that protocol in a global
   and commercial network. The interaction between addressing, routing
   and especially aggregation is particularly relevant to this
   document, so some time will be spent describing it.

   Addresses in IPv4 serve two purposes:

   1) Unique identification of an interface. A sending host tells the
      network the identity of the intended recipient by placing an IP
      address into the destination address field.
      In addition, the receiving host checks the destination address
      field of received packets to ensure that the packet is, in fact,
      for it.

   2) Location information of that interface. Routers use the
      packet's destination address in deciding where to forward the
      packet to get it closer to its ultimate destination. That is,
      addresses identify "where" the intended recipient is located
      within the Internet topology.

      For scalability, the location information contained in addresses
      must be aggregatable. In practice, this means that nodes
      topologically close to each other (e.g., connected to the same
      link, residing at the same site, or customers of the same ISP)
      must use addresses that share a common prefix.

   What is important to note is that these identification and location
   requirements have been met through the use of the same value,
   namely the IP address. As will be noted repeatedly in this
   document, the "overloading" of IPv4 addresses with multiple
   semantics has some undesirable implications. For example, the
   embedding of IPv4 addresses within transport protocol addresses
   that identify the end-point of a connection couples those transport
   protocols with routing to a degree. This entanglement is
   inconsistent with a (too) strictly layered model in which routing
   would be a completely independent function of the network layer and
   not directly impact the transport layer.

   Combining locator and identifier functions also complicates the
   support for mobility. In a mobile environment, the location of an
   end-station may change even though its identity stays the same;
   ideally, transport connections should be able to survive such
   changes. In IPv4, however, one cannot change the locator without
   also changing the identifier since the same packet field is used
   for both.
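
   The coupling between transport connections and addresses described
   above can be made concrete with a minimal sketch in Python. It is
   an illustration only and not part of any specification: the
   connection table, the helper name lookup() and the addresses
   (taken from documentation ranges) are assumptions made for this
   example. A connection is keyed by the addresses and ports of both
   ends, so when a mobile peer acquires a new address, the existing
   connection state can no longer be found.

```python
from typing import Dict, Optional, Tuple

# (local address, local port, remote address, remote port)
ConnKey = Tuple[str, int, str, int]

# Hypothetical per-host connection table; real stacks keep far more state.
connections: Dict[ConnKey, str] = {}


def lookup(local_ip: str, local_port: int,
           remote_ip: str, remote_port: int) -> Optional[str]:
    """Return the connection state matching an incoming packet, if any."""
    return connections.get((local_ip, local_port, remote_ip, remote_port))


# A server at 192.0.2.10 has an established connection with a mobile
# client whose current address is 198.51.100.7.
connections[("192.0.2.10", 80, "198.51.100.7", 40000)] = "established"

# Packets from the client's original address match the connection.
before_move = lookup("192.0.2.10", 80, "198.51.100.7", 40000)

# The client moves and is assigned 203.0.113.5. Its identity has not
# changed, but its packets no longer match any known connection.
after_move = lookup("192.0.2.10", 80, "203.0.113.5", 40000)

print(before_move)  # established
print(after_move)   # None
```

   Because the same value acts as both locator and identifier in this
   sketch, changing the client's location necessarily changes the key
   under which its connection is stored.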

   Consequently, there has been a train of thought for some time that
   having separate values for location and identification could be of
   significant benefit. The GSE proposal, among other things, attempts
   to make such a separation.

   This document frequently uses mobility as an example to demonstrate
   the pros and cons of separating the identifier from the locator.
   However, the reader should note the fundamental equivalence between
   the problems faced by mobile hosts and the problem faced by sites
   that change providers yet don't want to renumber their network.
   When a site changes providers, it moves topologically in much the
   same way a mobile node does when it moves from one place to
   another. Consequently, techniques that help or hinder mobility are
   often relevant to the issue of site renumbering.

3.1. The Need for Aggregation

   IPv4 has seen a number of different addressing schemes. Since the
   original specification, the two major additions have been
   subnetting and classless routing. The motivation for adding
   subnetting was to allow a collection of networks located at one
   site to be viewed from afar as a single IP network (i.e., to
   aggregate all of the individual networks into a single bigger
   network). The practical benefit of subnetting was that all of a
   site's hosts, even if scattered among tens or hundreds of LANs,
   could be represented by a single routing table entry in routers
   located far from the site. In contrast, prior to subnetting, a site
   with ten LANs would advertise ten separate network entries, and all
   routers would have to maintain ten separate entries, even though
   they contained essentially redundant information.

   The benefits of aggregation should be clear.
   The amount of work involved in constructing forwarding tables
   (i.e., selecting best routes and installing them into the switching
   subsystem) is dependent in part on the number of network routes
   (i.e., destinations) to which best paths are computed. If each site
   has 10 internal networks, and each of those networks is
   individually advertised to the global routing system, the
   complexity of computing forwarding tables can easily be an order of
   magnitude greater than if each site advertised a single entry that
   covered all of the addresses used within the site.

3.2. The Pre-CIDR Internet

   In the early days of the Internet, its topology and addressing were
   orthogonal. Specifically, when a site wanted to connect to the
   Internet, it approached the central Internet Assigned Numbers
   Authority (IANA) to obtain an address block and then approached a
   provider about procuring connectivity. This procedure for address
   allocation resulted in a system where the addresses used by one
   customer of a provider bore little relation to the addresses used
   by other customers of that same provider. In other words, though
   the actual topology of the Internet was mostly hierarchical, the
   addressing was not. An example of such a topology and addressing
   scheme is shown in Figure 1.

         +----------------+
         |                |------- Customer1 (192.2.2.0)
         |                |------- Customer2 (128.128.0.0)
         |   Provider A   |------- Customer3 (18.0.0.0)
         |                |------- Customer4 (193.3.3.0)
         |                |------- Customer5 (194.4.4.0)
         +----------------+
                 |
                 |
                 |
                 |
         +----------------+
         |   Provider B   |
         +----------------+

                            Figure 1

   Figure 1 shows Provider A having 5 customers, each with their own
   independently obtained network address. Providers A and B connect
   to each other.
   In order for Provider B to be able to send traffic to Customers1-5,
   Provider A must announce a separate route to Provider B for each of
   the 5 networks. That is, the routers within Provider B must have
   explicit routing entries for each of Provider A's customers
   -- 5 separate routes.

   Experience has shown that this approach scales very poorly. In the
   Default-Free Zone (DFZ) of the Public Internet, where routers must
   maintain routing entries for all reachable destinations, the cost
   of computing forwarding tables quickly becomes unacceptably large.
   A large part of the cost is related to the seemingly redundant
   computations that must be made for each individual network, even
   though many of them reside in the same topological location (e.g.,
   under the same provider). Looking at Figure 1, the problem is that
   Provider B performs 5 separate calculations to construct the
   forwarding table needed to reach each of A's customers, even though
   it is going to take the same path for all of them; in other words,
   there is an opportunity to do data abstraction.

   Figure 1 shows network numbers using the older "classful" notation.
   Since 1981, the first few bits of an address syntactically
   identified which parts of the address formed its "network" and
   "local" portions. There were a small number of Class A addresses
   (intended for very large sites), a medium number of Class B
   addresses (for medium-sized sites) and a very large number of Class
   C addresses (for very small sites). In practice, the actual size of
   real networks didn't match the original allocation of Class A, B,
   and C addresses. Class B addresses were bigger than most sites
   needed (and there weren't enough of them), and Class C addresses
   were too small (i.e., typical sites would need to get 10 or more C
   blocks to cover all of the hosts on their networks).
   Consequently, classless addressing was developed [CIDR], which made
   the boundaries between the network and local parts of an address
   more flexible. With classless addressing, a separate prefix-length
   (i.e., network mask) specifies how many of the left-most bits of an
   address identify the network part of the address.

3.3. CIDR and Provider-Based Addressing

   One of the reasons CIDR (Classless Inter-Domain Routing) and its
   associated provider-assigned address allocation policy were
   introduced was to help reduce the cost of computing a routing table
   and the size of the forwarding table computed from the routing
   table. To achieve this goal CIDR aggressively aggregates network
   addresses. Aggregating network addresses means "merging" multiple
   addresses into a single "bigger" one, that is, using a common
   prefix to provide location information for all addresses sharing
   that same prefix.

   With CIDR, sites that want to connect to the Internet approach a
   provider to procure both connectivity and a network address.

   Individual providers have a block of address space covered by one
   prefix and assign pieces of that space to customers. Consequently,
   customers of the same provider have addresses that share the same
   prefix. The combination of CIDR and provider-based addressing
   results in the ability of a provider to address many hundreds of
   sites while introducing just one network address into the global
   routing system. An example of such a topology and addressing scheme
   is shown in Figure 2.

         +----------------+
         |                |------- Customer1 (204.1.0.0/19)
         |                |------- Customer2 (204.1.32.0/23)
         |   Provider A   |------- Customer3 (204.1.34.0/24)
         |                |------- Customer4 (204.1.35.0/24)
         |                |------- Customer5 (204.1.36.0/23)
         +----------------+
                 |
                 | A announces
                 | 204.1/16 to B
                 |
         +----------------+
         |   Provider B   |
         +----------------+

                            Figure 2

   In Figure 2, Provider A has been assigned the classless block, or
   "aggregate", 204.1.0.0/16 (i.e., a prefix with the high-order 16
   bits denoting a single network). Provider A has 5 customers, each
   of which has been assigned a prefix subordinate to the aggregate.
   In order for Provider B to be able to reach Customers1-5, Provider
   A only needs to announce the single prefix 204.1.0.0/16, and
   Provider B's routers need only a single routing table entry to
   reach all of Provider A's customers. Note the important difference
   between the cases described in Figures 1 and 2: the latter example
   needs one routing table entry instead of five to reach the same
   number of destinations.

   CIDR was a critical step for the Internet: in the early 1990s the
   size of default-free routing tables required to support the
   classful Internet was almost more than the commercially-available
   hardware and software of the day could handle. The introduction of
   BGP4's classless routing and provider-based address allocation
   policies resulted in a significant decrease in the growth rate of
   the routing tables. At the same time, however, CIDR introduced some
   new weaknesses. First, the Internet addressing model had to shift
   from one of "address owning" to "address lending" [RFC2008]. In
   pre-CIDR days sites acquired addresses from a central authority
   independent of their provider, and a site could assume it "owned"
   the address block it was given.
   Owning addresses meant that once one had been given a set of
   network addresses, one could always use them; no matter where one's
   site connected to the Internet, the prefix for that network could
   be injected into the public routing system. Today, however, it is
   simply not possible for all individual sites to have their own
   prefixes injected into the DFZ; there would be too many of them.
   Consequently, if a site decides to change providers, it needs to
   renumber all of its nodes using address space given to it by the
   new provider. The "old" addresses it had used are returned to its
   previous provider. To understand why, suppose that Customer3 from
   Figure 2 changes its provider from Provider A to Provider C, but
   does not renumber. The picture would be as follows:

                           +----------------+
                           |                |---- Customer1 (204.1.0.0/19)
                           |                |---- Customer2 (204.1.32.0/23)
                           |   Provider A   |
           +---------------|                |---- Customer4 (204.1.35.0/24)
           |  A announces  |                |---- Customer5 (204.1.36.0/23)
           | 204.1/16 to B +----------------+
           |                        |
           |                        |
           |                        |
   +----------------+               |
   |   Provider B   |               |
   +----------------+               |
           |                        |
           |                        |
           |                        |
           | C announces            |
           | 204.1.34/24            |
           | to B          +----------------+
           +---------------|   Provider C   |---- Customer3 (204.1.34.0/24)
                           +----------------+

                            Figure 3

   In Figure 3, Providers A, B and C are all directly connected to
   each other. In order for Provider B to reach Customers 1, 2, 4 and
   5, Provider A still only announces the 204.1.0.0/16 aggregate.
   However, in order for Provider B to reach Customer3, Provider C
   must announce the prefix 204.1.34.0/24. Prefix 204.1.34.0/24 is
   called a "more-specific" of 204.1.0.0/16; another term used is that
   Customer3 and Provider C have "punched a hole" in Provider A's
   address block.
   From Provider B's view, the address space underneath 204.1.0.0/16
   is no longer cleanly aggregated into a single prefix and instead
   the aggregation has been broken because the addressing is
   inconsistent with the topology; in order to maintain reachability
   to Customers1-5, Provider B must carry two prefixes where it used
   to have to carry only one.

   The example in Figure 3 explains why sites must renumber if
   existing levels of aggregation are to be maintained. While a small
   number of new exceptions could be tolerated, and certain prefixes
   have been grandfathered, the reality in today's Internet is that
   there are thousands of providers, many with thousands of individual
   customers. It is generally accepted that renumbering of sites is
   essential for maintaining sufficient aggregation.

   The empirical cost of renumbering a site in order to maintain
   aggregation has been the subject of much discussion. The practical
   reality, however, is that forcing all sites to renumber is
   difficult given the size and wealth of companies that now depend on
   the Internet for running their business. Thus, although the
   technical community came to consensus that, with the current
   practice of provider-based addressing, address lending was
   necessary in order for the Internet to continue to operate and
   grow, the reality has been that some of CIDR's benefits have been
   lost because not all sites renumber. It is worth noting that a
   number of providers today do route filtering based, in part, on
   prefix length; as a result, a site which does not renumber may have
   only partial connectivity to the Internet. That is, a site may
   advertise a long prefix into the routing system, but there is no
   assurance that all parts of the Internet will accept the route;
   some simply ignore it.
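
   The effect of the more-specific route in Figure 3, and of filtering
   on prefix length, can be sketched with Python's standard ipaddress
   module. This is a simplified illustration and not a model of BGP:
   the table layout, the function names next_hop() and
   filter_long_prefixes(), the sample host addresses and the /19
   filter threshold are all assumptions made for the example.

```python
import ipaddress
from typing import Dict, Optional

Network = ipaddress.IPv4Network

# Provider B's view in Figure 3: Provider A's aggregate plus the "hole"
# punched by Customer3's prefix, which is now announced by Provider C.
table_b: Dict[Network, str] = {
    ipaddress.ip_network("204.1.0.0/16"): "Provider A",
    ipaddress.ip_network("204.1.34.0/24"): "Provider C",
}


def next_hop(table: Dict[Network, str], address: str) -> Optional[str]:
    """Longest-match lookup: the most specific covering prefix wins."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in table if ip in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


def filter_long_prefixes(table: Dict[Network, str],
                         max_len: int) -> Dict[Network, str]:
    """Drop routes longer than max_len (an assumed local filter policy)."""
    return {net: hop for net, hop in table.items()
            if net.prefixlen <= max_len}


# With both routes installed, traffic follows the more-specific prefix.
to_customer3 = next_hop(table_b, "204.1.34.9")
to_customer4 = next_hop(table_b, "204.1.35.9")

# A network that ignores prefixes longer than /19 only sees the aggregate.
filtered = filter_long_prefixes(table_b, 19)
to_customer3_filtered = next_hop(filtered, "204.1.34.9")

print(to_customer3)           # Provider C
print(to_customer4)           # Provider A
print(to_customer3_filtered)  # Provider A
```

   In this sketch, a network applying the filter forwards Customer3's
   traffic toward Provider A, which no longer connects that customer.
   This is one way to picture the partial connectivity described
   above.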

   One unfortunate characteristic of CIDR at an architectural level is
   that the pieces of the infrastructure that benefit from the
   aggregation (i.e., the providers which make up the DFZ) are not the
   pieces that incur the renumbering cost (i.e., the end sites). The
   logical corollary of this statement is that the pieces of the
   infrastructure that do incur cost to achieve aggregation (e.g.,
   sites which renumber when they change providers) don't directly see
   the benefit. (The word "directly" is used here because the
   continued operation of the Internet is a benefit, though it
   requires selflessness on the part of the site to recognize.) This
   can lead to a "tragedy of the commons" situation, where everyone
   agrees that some sites should renumber, but they themselves want to
   be one of those that do not.

3.4. Multi-Homed Sites and Aggregation

   As sites become more dependent on the Internet, they have begun to
   install additional connections to the Internet to improve
   robustness and performance. Such sites are called "multi-homed".
   Unfortunately, when a site connects to the Internet at multiple
   places, the impact on routing can be much like a site that switches
   providers but refuses to renumber.

   In the pre-CIDR days, multi-homed sites were typically known by
   only one network prefix, the prefix of their own address block.
   When that site's providers announced the site's network into the
   global routing system, a "shortest path" type of routing would
   occur so that pieces of the Internet closest to the first provider
   might use the first provider while other pieces of the Internet
   would use the second provider. This allowed sites to use the
   routing system itself to load balance traffic across their multiple
   connections. This type of multi-homing assumes that a site's prefix
   can be propagated throughout the DFZ, an assumption that is no
   longer universally true.
534 With CIDR, issues of addressing and aggregation complicate matters 535 significantly. At the highest level, there are three possible ways 536 to deal with multi-homed sites. The first possibility is to stay 537 with the pre-CIDR approach, allowing each multi-homed site to receive its 538 address block directly from a registry, independent of its providers. 539 The problem with this approach is that, because the address block is 540 obtained independently of either provider, it is not aggregatable and 541 therefore has a negative impact on the scaling of global routing. 543 The second approach is for a multi-homed site to receive an 544 allocation from one of its providers and just use that single prefix. 545 The site would advertise its prefix to all of the providers to which 546 it connects. There are two problems with this approach. First, 547 although the prefix is aggregatable by the provider which made the 548 allocation, it is not aggregatable by the other providers. To the 549 other providers, the site's prefix poses the same problem that a 550 provider-independent address would. Second, due to CIDR's rule for 551 longest-match routing, it turns out that the site's prefix is not 552 always aggregatable in practice even by the provider that made the 553 allocation, if shortest-path routing with load-spreading is desired. 554 Consider Figure 4. Provider C has two paths for reaching Customer1. 555 Provider A advertises 204.1/16, an aggregate which includes 556 Customer1. But Provider C will also receive an advertisement for 557 prefix 204.1.0/19 from Provider B, and because the prefix match 558 through B is longer, C will choose that path. In order for Provider 559 C to be able to choose between the two paths, Provider A would also 560 have to advertise the longer prefix 204.1.0/19 in addition to the 561 shorter 204.1/16.
At this point, from the routing perspective, the 562 situation is very similar to the general problem posed by the use of 563 provider-independent addresses. 565 It should be noted that the above example simplifies a very complex 566 issue. For example, consider the example in Figure 4 again. 567 Provider A could choose not to propagate a route entry for the longer 568 204.1.0/19 prefix, advertising only the shorter 204.1/16. In such 569 cases, Provider C would always select Provider B. Internally, 570 Provider A would continue to route traffic from its other customers 571 to Customer1 directly. If Provider A had a large enough customer 572 base, effective load sharing might be achieved.

574                         A advertises
575          +------------+ 204.1/16 to C   +------------+
576       ___| Provider A |-----------------| Provider C |
577      /   +------------+                 +------------+
578     /                        +----------/
579    /                        /
580  Customer1 ---             /   B advertises 204.1.0/19 to C
581  204.1.0.0/19 |           /
582               |     +------------+
583               ----- | Provider B |
584                     +------------+

586                         Figure 4

588 The third approach is for a multi-homed site to receive an allocation 589 from each of its providers and not advertise the prefix obtained from 590 one provider to any of its other providers. This approach has 591 advantages from the perspective of route scaling because both 592 allocations are aggregatable. Unfortunately, the approach doesn't 593 necessarily meet the demands of the multi-homed site. A site that 594 has a prefix from each of its providers faces a number of choices 595 about how to use that address space. Possibilities include: 597 1) The site can number a distinct set of hosts out of each of the 598 prefixes. Consider a configuration where a site is connected to 599 ISP-A and ISP-B. If the link to ISP-A goes down, then unless 600 the ISP-A prefix is announced to ISP-B (which breaks 601 aggregation), the hosts numbered out of the ISP-A prefix would 602 be unreachable.
604 2) The site could assign each host multiple addresses (i.e., one 605 address for each ISP connection). There are two problems with 606 this. First, it accelerates the consumption of the address 607 space. While this may be a problem for the (limited) IPv4 608 address space, it is not a significant issue in IPv6. Second, 609 when the connection to ISP-A goes down, addresses numbered out 610 of ISP-A's space become unreachable. Remote peers would have to 611 have sufficient intelligence to use the second address. For 612 example, when initiating a connection to a host, the DNS would 613 return multiple candidate addresses. Clients would need to try 614 them all before concluding that a destination is unreachable 615 (something not all network applications currently do). In 616 addition, a site's hosts would need a significant amount of 617 intelligence for choosing the source addresses they use. A host 618 shouldn't choose a source address corresponding to a link that 619 is down. At present, hosts do not have such sophistication. 621 In summary, how best to support multi-homing with IPv4/CIDR faces a 622 delicate balance between the scalability of routing versus the site's 623 requirements of robustness and load-sharing. At this point in time, 624 no solution has been discovered that satisfies the competing 625 requirements of route scaling and robustness/performance. It is 626 worth noting, however, that some people are beginning to study the 627 issue more closely and propose novel ideas [BATES]. 629 4. The GSE Proposal 631 This section provides a description of GSE with the intent of making 632 this document stand-alone with respect to the GSE "specification". 633 We begin by reviewing the motivation for GSE. Next we review the 634 salient technical details, and we conclude by listing the explicit 635 non-goals of the GSE proposal. 637 4.1. 
Motivation For GSE 639 The primary motivation for GSE was the concern that the chief initial 640 IPv6 global unicast address structure, provider-based [RFC 2073], was 641 fundamentally the same as IPv4 with CIDR and provider-based 642 aggregation. Provider-based addressing requires that sites renumber 643 when they switch providers, so that sites are always aggregated 644 within their provider's prefix. In practice, the cost of renumbering 645 (which can only grow as a site grows in size and becomes more 646 dependent on the Internet for day-to-day business) is high enough 647 that an increasing number of sites refuse to renumber when they 648 change providers. This cost is particularly relevant in cases where 649 end-users are asked to renumber because an upstream provider has 650 changed its transit provider (i.e., the end site is asked to renumber 651 for reasons outside of its control and for which it sees no direct 652 benefit). Consequently, the GSE draft asserts that IPv4 with CIDR 653 has not achieved the aggressive aggregation required for the route 654 computation functions of the DFZ of the Internet to scale for IPv4 655 and that the much larger address space of IPv6 simply exacerbates the 656 problem. 658 The GSE proposal does not propose to eliminate the need for 659 renumbering. Indeed, it asserts that end sites will have to renumber 660 more frequently in order to continue scaling the Internet. However, 661 GSE proposes to make the cost of renumbering small enough that sites 662 can be renumbered at essentially any time with little or no 663 disruption to its network connectivity, and in particular with no 664 impact on communications that are strictly within the site. 666 Finally, GSE attempts to address the problem of sites that have 667 multiple Internet connections. In CIDR, the pressure for better 668 multi-homing support can create exceptions to route aggregation and 669 result in poor scaling. 
That is, the public routing infrastructure 670 may have to carry multiple distinct routes for some demanding multi- 671 homed sites, one for each independent path. GSE recognizes the 672 "special work done by the global Internet infrastructure on behalf of 673 multi-homed sites" [GSE], and proposes a way for multi-homed sites to 674 gain certain benefit without impacting global scaling. This includes 675 a specific mechanism that providers can use to support multi-homed 676 sites, presumably at a cost that the site would consider when 677 deciding whether or not to become multi-homed. 679 4.2. GSE Address Format 681 The key departure of GSE from classical IP addressing (both v4 and 682 v6) was that rather than over-loading addresses with both locator and 683 identifier functions, it splits the address into two elements: the 684 high-order 8 bytes used for routing purposes (called "Routing Stuff" 685 throughout the rest of this document) and the low-order 8 bytes for 686 unique identification of an end-point. The structure of GSE 687 addresses is:

689       0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
690     +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
691     |  Routing Goop   | STP | End System Designator |
692     +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
693          6+ bytes    ~2 bytes        8 bytes

695                         Figure 5

697 4.2.1. Routing Stuff (RG and STP) 699 The Routing Goop (RG) identifies where within the public Internet 700 topology a site connects and is used to route datagrams to the site.
701 RG is structured as follows:

703                          1                   2                   3
704      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
705     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
706     | xxx |     13 Bits of LSID     |     Upper 16 bits of Goop     |
707     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

709      3               4
710      2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
711     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
712     |  Bottom 18 bits of Routing Goop   |
713     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

715                         Figure 6

717 The RG describes the location of a site's connection by identifying 718 smaller and smaller regions of topology until finally it identifies 719 the link which connects the site. Before interpreting the bits in 720 the RG, it is important to understand that routing with GSE depends 721 on decomposing the Internet's topology into a specific graph. At the 722 highest level, the topology is broken into Large Structures (LSs). 723 An LS is a region that can aggregate significant amounts of topology. 724 Examples of potential LSs are large providers and exchange points. 725 Within an LS the topology is further divided into another graph of 726 structures, with each LS dividing itself however it sees fit. This 727 division of the topology into smaller and smaller structures can 728 recurse for a number of levels, where the trade-off is "between the 729 flat-routing complexity within a region and minimizing total depth of 730 the substructure" [ESD]. 732 Having described the decomposition process, we now examine the bits 733 in the RG. After the 3-bit prefix identifying the address as having 734 a GSE format, the next 13 bits identify the LS. By limiting the 735 field to 13 bits, a ceiling is defined on the complexity of the top- 736 most routing level (i.e., what we currently call the DFZ).
In the 737 next 34 bits, a series of subordinate structure(s) are identified 738 until finally the leaf subordinate structure is identified, at which 739 point the remaining bits identify the individual link within that 740 leaf structure. 742 The remaining 14 bits of the Routing Stuff (i.e., the low-order 14 743 bits of the high-order 8 bytes) comprise the STP and are used for 744 routing structure within a site, similar to subnetting with IPv4. 746 These bits are not part of the Routing Goop per se. The distinction 747 between Routing Stuff and Routing Goop is that RG controls routing in 748 the Public Internet, while Routing Stuff includes the RG plus the 749 Site Topology Partition (STP). The STP is used for routing structure 750 within a site. 752 The GSE proposal formalized the ideas of sites and of public versus 753 private topology. In the first case, a site is a set of hosts, 754 routers and media under the same administrative control which have 755 zero or more connections to the Internet. A site can have an 756 arbitrarily complicated topology, but all of that complexity is 757 hidden from everyone outside of the site. A site only carries 758 packets which originated from, or are destined to, that site; in 759 other words, a site cannot be a transit network. A site is private 760 topology, while the transit networks form the public topology. 762 A datagram is routed through public topology using just the RG, but 763 within the destination site, routing is based on the Site Topology 764 Partition (STP). 766 4.2.2. End-System Designator 768 The End-System Designator (ESD) is an unstructured 8-byte field that 769 uniquely identifies an interface from all others. The most important 770 feature of the ESD is that it alone identifies an interface; the 771 Routing Stuff portion of an address, although used to help deliver a 772 packet to its destination, is not used to identify an end point. 
773 End-points of communication care about the ESD; as examples, TCP 774 peers could be identified by the source and destination ESDs alone 775 (together with port numbers), checksums would exclude the RG (the 776 sender doesn't even know its RG, as described later) and on receipt 777 of a packet only the ESD would be used in testing whether the packet 778 is intended for local delivery. 780 The leading contender for the role of a 64-bit globally unique ESD is 781 the recently defined "EUI-64" identifier [EUI64]. These identifiers 782 consist of a 24-bit "company_id" concatenated with a 40-bit 783 "extension". (Company_id is a new name for the "Organizationally 784 Unique Identifier" that forms the first half of an 802 MAC address). 785 Manufacturers are expected to assign locally unique values to the 786 extension field, guaranteeing global uniqueness for the complete 64- 787 bit identifier. A range of the EUI-64 space is reserved to cover 788 pre-existing 48-bit MAC addresses, and a defined mapping insures that 789 an ESD derived from a MAC address will not duplicate the ESD of a 790 device that has a built-in EUI-64. 792 In some cases, interfaces may not have an appropriate MAC address or 793 EUI-64 identifier. A globally unique ESD must then be obtained 794 through some alternate mechanism. Several possible mechanisms can be 795 imagined (e.g., the IANA could hand out addresses from the company_id 796 it has been allocated). Although we do not explore them in detail 797 here, we note that a global coordination structure is required here 798 to control the allocation of globally unique identifiers. 800 4.3. Address Rewriting by Border Routers 802 To obviate the need to renumber devices within sites because of 803 changing providers, the GSE design hides the global Routing Goop (RG) 804 from hosts in each site by having site border routers rewrite 805 addresses of the packets they forward across the boundary between the 806 site and public topology. 
Within a site, nodes need not know the RG 807 associated with their addresses. They simply use a designated 808 "Site-Local RG" value for internal addresses. When a packet is 809 forwarded to the public topology, the border router replaces the 810 Site-Local RG portion of the packet's source address with an 811 appropriate value. Likewise, when a packet from the public topology 812 is forwarded into a site, the border router replaces the RG part of 813 the destination address with the designated Site-Local RG. 815 To simplify discussion, the following text uses the singular term RG 816 as if a site could have only one RG value (i.e., one connection to 817 the Internet). In fact, a site could have multiple Internet 818 connections and consequently multiple RGs. 820 GSE's approach to easing renumbering isn't so much to ease 821 renumbering as to make it transparent to end users. The RG by which 822 a site is known is hidden from nodes within that site. Instead, the 823 RG for the site would be known only by the exit router, either 824 through static configuration or through a dynamic protocol with an 825 upstream provider. 827 Because end hosts don't know their RG, they don't know their entire 828 16-byte address, so they can't specify the full address in the source 829 fields of packets they originate. Consequently, when a datagram 830 leaves a site, the egress border router fills in the high-order 831 portion of the source address with the appropriate RG. 833 The point of keeping the RG hidden from nodes within the core of a 834 site is to insure the changeability of the RG without impacting the 835 site itself. It is expected that the RG would need to change 836 relatively frequently (e.g., several times a year) in order to 837 support sufficient aggregation as the topology of the Internet 838 changes. 
A change to a site's RG would only require a change at the 839 site's egress point, and it is quite possible that this change could be 840 accomplished through a dynamic protocol with the upstream provider. 841 In addition, the site's DNS records would need updating to properly 842 indicate the current RG value. 844 Hiding a site's RG from its internal nodes does not, however, mean 845 that changes to RG have no impact on end sites. Since the full 16- 846 byte address of a node isn't a stable value (the RG portion can 847 change), a stored address may contain an invalid RG and be unusable if 848 it isn't "refreshed" through some other means. For example, opening 849 a TCP connection, writing the address of the peer to a file and then 850 later trying to reestablish a connection to that peer may well fail. 851 For intra-site communication, however, it is expected that only the 852 Site-Local RG would be used (and stored), which would continue to work 853 for intra-site communication regardless of changes to the site's 854 external RG. This shields a site's intra-site traffic from any 855 instabilities resulting from renumbering. 857 In addition to rewriting source addresses of packets that leave a site, 858 destination addresses must be rewritten upon entering a site. To 859 understand the motivation behind this, consider a site with 860 connections to three Internet providers. Because each of those 861 connections has its own RG, each destination within the site would be 862 known by three different 16-byte addresses. As a result, intra-site 863 routers would have to carry a routing table three times larger than 864 expected. To work around this, GSE proposed replacing the RG in 865 inbound packets with the special "Site-Local RG" value to reduce 866 intra-site routing tables to the minimum necessary.
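The two rewriting steps can be sketched as follows. This is an illustration only, not part of the GSE proposal: addresses are modeled as 128-bit integers, the 50/14/64-bit split is taken from the figures in Section 4.2, the function names are invented, and SITE_LOCAL_RG is a placeholder because the text does not give the actual Site-Local RG value.

```python
RG_BITS, STP_BITS, ESD_BITS = 50, 14, 64  # field widths, per Section 4.2

# Placeholder only: the real Site-Local RG value is not given in the text.
SITE_LOCAL_RG = 0

def split(addr):
    """Split a 128-bit address (as an int) into (rg, stp, esd)."""
    esd = addr & ((1 << ESD_BITS) - 1)
    stp = (addr >> ESD_BITS) & ((1 << STP_BITS) - 1)
    rg = addr >> (ESD_BITS + STP_BITS)
    return rg, stp, esd

def join(rg, stp, esd):
    """Assemble a 128-bit address from its three fields."""
    return (rg << (STP_BITS + ESD_BITS)) | (stp << ESD_BITS) | esd

def egress(src, site_rg):
    """Outbound at the border router: fill in the site's public RG."""
    _, stp, esd = split(src)
    return join(site_rg, stp, esd)

def ingress(dst):
    """Inbound at the border router: replace the RG with Site-Local RG."""
    _, stp, esd = split(dst)
    return join(SITE_LOCAL_RG, stp, esd)

# A host inside the site knows only the Site-Local form of its address.
inside = join(SITE_LOCAL_RG, 0x2A, 0x0123456789ABCDEF)
outside = egress(inside, site_rg=0x12345)   # hypothetical public RG
print(split(outside) == (0x12345, 0x2A, 0x0123456789ABCDEF))
print(ingress(outside) == inside)
```

Both rewrites leave the STP and ESD untouched, which is why a change of RG at the border need not disturb addressing inside the site.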
868 In summary, when a node initiates a flow to a node at another site, 869 the initiating node is expected to know the full 16-byte address for 870 the destination through mechanisms such as a DNS query. The 871 initiating node does not, however, know its own RG, and uses the 872 Site-Local RG values in the RG part of the source address. When the 873 datagram reaches the exit border router, the router replaces the RG 874 of the packet's source address. When the datagram arrives at the 875 entry router at the destination site, the router replaces the RG 876 portion of the destination address with the distinguished "Site-Local 877 RG" value. When the destination host needs to send return traffic, 878 that host knows the full 16-byte address for the other host because 879 it appeared in the source address field of the arriving packet. 881 4.4. Renumbering and Rehoming Mid-Level ISPs 883 One of the most difficult-to-solve components of the renumbering 884 problem with CIDR is that of renumbering mid-level service providers. 885 Specifically, if SmallISP1 changes its transit provider from BigISP1 886 to BigISP2, then in order for the overall size of the routing tables 887 to stay the same, all of SmallISP1's customers would have to renumber 888 into address space covered by an aggregate of BigISP2. GSE deals 889 with this problem by handling the RG in DNS with indirection. 890 Specifically, a site's DNS server specifies the RG portion of its 891 addresses by referencing the "name" of its immediate provider, which 892 is a resolvable DNS name (this implies a new Resource Record type). 893 That provider may define some of the low-order bits of the RG and 894 then reference its immediate provider. This chain of reference 895 allows mid-level service providers to change transit providers, and 896 the customers of that mid-level will simply "inherit" the change in 897 RG. 
Note that this mechanism does not depend on the GSE address 898 format per se and can also be applied to IPv4 addressing. 900 4.5. Support for Multi-Homed Sites 902 GSE defines a specific mechanism for providers to use to support 903 multi-homed customers that gives those customers more reliability 904 than singly-homed sites, but without a negative impact on the scaling 905 of global routing. This mechanism is not specific to GSE and could 906 be applied to any multi-homing scenario where a site is known by 907 multiple prefixes (including provider-based addressing). Assume the 908 following topology:

910     Provider1      Provider2
911     +------+       +------+
912     |      |       |      |
913     | PBR1 |       | PBR2 |
914     +----x-+       +-x----+
915          |           |
916      RG1 |           | RG2
917          |           |
918       +--x-----------x--+
919       | SBR1       SBR2 |
920       |                 |
921       +-----------------+
922              Site

924            Figure 7

926 PBR1 is Provider1's border router while PBR2 is Provider2's border 927 router. SBR1 is the site's border router that connects to Provider1 928 while SBR2 is the site's border router that connects to Provider2. 929 Imagine, for example, that the line between Provider1 and the site 930 goes down. Any already existing flows that use a destination address 931 including RG1 would stop working. In addition, any addresses 932 returned from DNS queries that include RG1 would not be viable 933 addresses. If PBR1 and PBR2 knew about each other, however, then in 934 this case PBR1 could tunnel packets destined for RG1-prefixed 935 addresses to PBR2, thus keeping the communication working. (Note 936 that IP-in-IP encapsulation is necessary since routers between PBR1 937 and PBR2 would forward packets destined for addresses with PBR1's 938 prefix back towards PBR1.) 940 4.6. Explicit Non-Goals for GSE 942 It is worth noting explicitly that GSE did not attempt to address the 943 following issues: 945 1) Survival of TCP connections through renumbering events.
If a 946 site is renumbered, TCP connections using a previous address 947 will continue to work only as long as the previous address still 948 works (i.e., while it is still "valid" using RFC 1971 949 terminology). No attempt is made to have existing connections 950 switch to the new address. 952 2) It is not known how multicast can be made to work under GSE. 954 3) It is not known how mobility can be made to work under GSE. 956 4) The performance impact of having routers rewrite portions of the 957 source and destination address in packet headers requires 958 further study. 960 That GSE didn't address the above does not mean they cannot be 961 solved. Rather, the issues simply weren't studied in sufficient 962 depth. 964 5. Analysis: The Pros and Cons of Overloading Addresses 966 At this point we have given complete descriptions of two addressing 967 architectures: IPv4, which uses the overloading technique, and GSE, 968 which uses the separated technique. We now compare and contrast the 969 two techniques. 971 The following discussion is organized around three fundamental 972 points: 974 1) Identifiers indicate who the intended recipient of a packet is. 975 At the network layer, an identifier refers to an interface, at 976 the transport layer it refers to a process or other endpoint of 977 a "connection". 979 2) Identifiers must be mapped into a locator that the network layer 980 can use to actually deliver a packet to its intended 981 destination. 983 3) There must be a suitable way to adequately authenticate the user 984 of an identifier, so that communicating peers have sufficient 985 confidence that packets sent to or received from a particular 986 identifier correspond to the intended recipient. 988 5.1. Purpose of an Identifier 990 An identifier gives an entity the ability to refer to a communication 991 end point and to refer to the same endpoint over an extended period 992 of time. 
In terms of semantics, two or more packets sent to the same 993 identifier should be delivered to the same end point. Likewise, one 994 expects multiple packets received from the same identifier to have 995 been originated by the same sending entity. That is, a source 996 identifier indicates who the packet is from and a destination 997 identifier indicates who the packet is intended for. 999 In IPv4, when applications communicate, transport "identifiers" 1000 consist of addresses and port numbers. For the purposes of this 1001 discussion, we use the term "identifier" to mean the identifier of an 1002 interface. It is assumed that port numbers will be present when 1003 higher layer entities communicate; the exact port numbers used are 1004 not relevant to this discussion. 1006 In small networks, flat routing can be used to deliver packets to 1007 their destination based only on the destination identifier carried in 1008 a packet header (i.e., the identifier is the locator and is not 1009 required to have any structure). However, in such systems, a 1010 distinct route entry is required for every destination, an approach 1011 that does not scale. In larger networks, packet addresses include a 1012 locator that helps the network layer deliver a packet to its 1013 destination. Such a locator typically has a structure to keep 1014 routing tables small relative to the total number of reachable 1015 destinations. In IPv4, the identifier and locator are combined in a 1016 single address; it is not possible to separate the locator portion of 1017 an address from the identifier portion. In contrast, the ESD portion 1018 of a GSE address (which can easily be extracted from the address) 1019 serves as an identifier, while the Routing Stuff plays the role of a 1020 locator. 1022 Having a clear separation between the locator and the identifier 1023 portion of an address appears to provide protocols some additional 1024 flexibility. 
Once a packet has been delivered to its intended 1025 destination interface (i.e., node), for example, the locator has 1026 served its purpose and is no longer needed to further demultiplex a 1027 packet to its higher-layer end point. This means that if a packet is 1028 delivered to the correct destination node (that is the identifier 1029 carried in the packet address matches to one interface identifier of 1030 the node), the node will accept the packet, regardless of how the 1031 packet got there. The exact locator used does not matter, within 1032 most Internet circumstances, so long as it gets the packet delivered 1033 to its proper destination. 1035 The most obvious example that could benefit from the separation of 1036 locators and identifiers involves communication with a mobile host. 1037 Transport protocols such as TCP are unable to keep connections open 1038 if either of the two endpoint identifiers for an open connection 1039 changes. Fundamentally, the endpoint identifiers indicate the two 1040 endpoint entities that are communicating. If a node were to receive 1041 a packet from a node with which it had been communicating previously, 1042 but the identifier used by the sending node has changed, the 1043 recipient would be unable to distinguish this case from that of a 1044 packet received from a completely different node. 1046 In the specific case of TCP and IPv4, connections are identified 1047 uniquely by the tuple: (srcIPaddr, dstIPaddr, srcport, dstport). 1048 Because IPv4 addresses contain a combined locator/identifier, it is 1049 not possible to have a node's location change without also having its 1050 identifier change. Consequently, when a mobile node moves, its 1051 existing connections no longer work, in the absence of special 1052 protocols such as Mobile IP [MOBILITY]. 1054 In contrast, connections in GSE are identified by the ESDs rather 1055 than full IPv6 addresses. 
That is, connections are identified 1056 uniquely by the tuple: (srcESD, dstESD, srcport, dstport). 1057 Consequently, when demultiplexing incoming packets to their proper 1058 end point, TCP would ignore the Routing Stuff portions of addresses. 1059 Because the Routing Stuff portion of an address is ignored during 1060 demultiplexing operations, a mobile node is free to move -- and 1061 change its Routing Stuff -- without changing its identification. 1063 As a side note, it is a requirement in GSE that packets be 1064 demultiplexed to higher layer endpoints on ESDs alone, independent of 1065 the Routing Stuff. If a site is multi-homed, the packets it sends 1066 may exit the site at different egress border routers during the 1067 lifetime of a connection. Because each border router will place its 1068 own RG into the source addresses of outgoing packets, the receiving 1069 TCP must ignore (at least) the RG portion of addresses when 1070 demultiplexing received packets. The alternative would make TCP 1071 unable to cope with common routing changes, i.e., if the path 1072 changed, packets delivered correctly would be discarded by the 1073 receiving TCP rather than accepted. 1075 Not surprisingly, having separate locators and identifiers in 1076 addresses leads to additional problems as well. First, an identifier 1077 by itself provides only limited value. In order to actually deliver 1078 packets to a destination identifier, a corresponding locator must be 1079 known. The general problem of mapping identifiers into locators is 1080 non-trivial to solve, and is the topic of the next section. Second, 1081 because the Routing Stuff is ignored when packets are demultiplexed 1082 upward in the protocol stack, it becomes much easier for an intruder 1083 to masquerade as someone else. 1085 5.2. Mapping an Identifier to a Locator 1087 The idea of using addresses that cleanly separate location and 1088 identification information is not new.
However, there are several 1089 different flavors. In its pure form, a sender need only know the 1090 identifier of an end-point in order to send packets to it. When 1091 presented with a datagram to send, network software would be 1092 responsible for determining the locator associated with an identifier 1093 so that the packet can be delivered. A key question is: "who is 1094 responsible for finding the Routing Stuff associated with a given 1095 identifier"? There are a number of possibilities, each with a 1096 different set of implications: 1098 1) The network layer could be responsible for doing the mapping. 1099 The advantage of such a system is that an ESD could be stored 1100 essentially forever (e.g., in configuration files), but whenever 1101 it is actually used, network layer software would automatically 1102 perform the mapping to determine the appropriate Routing Stuff 1103 for the destination. Likewise, should an existing mapping 1104 become invalid, network layer software could dynamically 1105 determine the updated value. Unfortunately, building such a 1106 mapping mechanism that scales is difficult if not impossible 1107 with a flat identifier space (e.g., the ESD identifier). 1109 2) The transport layer could be responsible for doing the mapping. 1110 It could perform the mapping when a connection is first opened, 1111 periodically refreshing the binding for long-running 1112 connections. Implementing such a scheme would change the 1113 existing transport layer protocols TCP and UDP significantly. 1114 However, in the case of TCP, such a scheme would have the 1115 benefit that applications would probably not need to be 1116 modified. For UDP-based applications, this may not hold, since 1117 most UDP-based protocols are implemented within applications. 1119 3) Higher-layer software (e.g., the application itself) could be 1120 responsible for performing the mapping. 
This potentially 1121 increases the burden on application programmers significantly, 1122 especially if long-running connections are required to survive 1123 renumbering and/or deal with mobile nodes. 1125 The GSE proposal uses the last approach. The network and transport 1126 layers are always presented with both the Routing Stuff (RG + STP) 1127 and the ESD together in one IPv6 address. It is the job of neither 1128 layer to determine the Routing Stuff given only the ESD or to 1129 validate that the Routing Stuff is correct. When an application has 1130 data to send, it queries the DNS to obtain the IPv6 AAAA record for a 1131 destination. The returned AAAA record contains both the Routing 1132 Stuff and the ESD of the specified destination. While such an 1133 approach eliminates the need for the lower layers to be able to map 1134 ESDs into corresponding Routing Stuff, it also means that when 1135 presented with an address containing an incorrect (i.e., no longer 1136 valid) Routing Stuff, the network is unable to deliver the packet to 1137 its correct destination. Note that addresses containing invalid 1138 Routing Stuff will arise whenever cached addresses are used 1139 after the Routing Stuff of the address becomes invalid. This may 1140 happen if addresses are stored in configuration files, a mobile node 1141 moves to a new location, long-running applications (clients and 1142 servers) cache the result of DNS queries, a long-running connection 1143 attempts to continue operating during a site renumbering event, etc. 1144 Whatever the cause, the failures are fundamentally due to dynamic 1145 topological changes at the network layer, yet in GSE such failures 1146 are left to be dealt with at the application level (through DNS), 1147 because neither the transport nor the network level has the ability 1148 to re-map identifiers to corresponding locators.
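The stale-address failure described above can be made concrete with a toy model. Everything in it is an assumption made for illustration: the DNS is reduced to a dictionary, the name "host.example" and the numeric Routing Stuff values are invented, and "deliverable" simply means that the address carries the Routing Stuff currently registered for the name.

```python
ESD_BITS = 64  # low-order 8 bytes identify the interface

def make_addr(routing_stuff, esd):
    """Combine Routing Stuff (high 64 bits) and ESD (low 64 bits)."""
    return (routing_stuff << ESD_BITS) | esd

def esd_of(addr):
    return addr & ((1 << ESD_BITS) - 1)

def routing_stuff_of(addr):
    return addr >> ESD_BITS

# Toy stand-in for the DNS: name -> current 16-byte address.
dns = {"host.example": make_addr(0xAAAA, 0x1)}

# An application resolves the name once and stores the result.
cached = dns["host.example"]

# The site is renumbered: the Routing Stuff changes, the ESD does not.
dns["host.example"] = make_addr(0xBBBB, esd_of(cached))

def deliverable(addr, name):
    """True if addr carries the Routing Stuff currently valid for name."""
    return routing_stuff_of(addr) == routing_stuff_of(dns[name])

stale_ok = deliverable(cached, "host.example")
fresh_ok = deliverable(dns["host.example"], "host.example")
same_endpoint = esd_of(cached) == esd_of(dns["host.example"])
print(stale_ok, fresh_ok, same_endpoint)
```

In this model the cached address still names the right end point (the ESD is unchanged) but can no longer be delivered; only a fresh lookup by name, which is an application-level action, restores a usable locator.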

   To avoid the above problem, a network architecture must provide the
   ability to map an identifier to a locator. In IPv4, this mapping is
   trivial, since the identifier and locator are combined in a single
   quantity (i.e., the IPv4 address). GSE does not provide this mapping
   functionality directly. Instead, GSE assumes that a node's DNS name
   serves as its stable identifier, and uses normal DNS queries to map
   the DNS "identifier" into an IPv6 address. The IPv6 address contains
   the ESD identifier together with its Routing Stuff, that is, an
   initial binding/mapping between the identifier and locator. When
   this binding breaks (for example, due to dynamic topological
   changes), the ESD identifier cannot be mapped into a new locator by
   itself. Instead, one must fall back to the application level, hoping
   that another DNS query will repair the broken binding between
   identifier and locator that is needed for network delivery.

   The use of DNS to provide identifier-to-locator mapping contributes
   to GSE's apparent simplicity. However, there are two fundamental
   problems with this approach if the intention is to make it
   transparently easy to change locators over time. First, the burden
   of performing the mapping from identifier to locator is placed
   directly on the application, because lower layers (i.e., transport
   and network layers) cannot perform the mapping themselves due to
   layering violation concerns (i.e., TCP and UDP can't perform a DNS
   query). Second, following all RG changes, the DNS database must be
   promptly updated and all expired information must be flushed out of
   all DNS caches.
   This stringent timing requirement imposed by lower-level operation
   would represent a departure from the original DNS design, which
   provides name-to-address mappings that change only slowly over time,
   if at all, and which relies heavily on caching over relatively long
   time periods to scale well.

   The following subsections discuss a number of issues related to
   keeping track of or determining the locator associated with an
   identifier.

5.2.1. Scalable Mapping of Identifiers to Locators

   It is not difficult to construct a mapping from an identifier (such
   as an ESD) to a locator (as well as other information such as a
   name, cryptographic keys, etc.) provided one can structure the
   identifier space appropriately to support scalable lookups. In
   particular, identifiers must have sufficient structure to support
   the delegating mechanism of a distributed database such as DNS. On
   the other hand, no scalable mechanism is known for performing such a
   mapping on arbitrary identifiers taken from a flat space lacking any
   structure.

   Imposing a hierarchy on identifiers poses the following
   difficulties:

   -  It increases the size of the identifier. The exact size
      necessary to support sufficient hierarchy is unclear, though it
      is likely to be roughly the same as that used for the routing
      hierarchy. Analysis done during the original IPng debates
      [RFC1752] suggests that close to 48 bits of hierarchy are needed
      to identify all the possible sites 30-40 years from now.

   -  The assignment of identifiers must be tied to the delegation
      structure. That is, the site that "owns" an identifier is the
      one responsible for maintaining the identifier-to-locator
      mapping information about it.

   -  Due to the requirement of tying an identifier to the delegation
      structure, the identifier of a node cannot be burned in during
      manufacturing.
      Instead, a mechanism is needed to allow a node to learn its
      identifier. To be practical, such a mechanism would need to be
      automated and avoid the need for manual configuration.

5.2.2. Insufficient Hierarchy Space in ESDs

   In the case of GSE's 8-byte ESD, the size of the identifier is not
   large enough to contain sufficient hierarchy to both create DNS-like
   delegation points and support stateless address autoconfiguration.
   Stateless address autoconfiguration [RFC1971] already assumes that
   an interface's 6-byte link-layer (i.e., MAC) address can be appended
   to a link's routing prefix to produce a globally unique IPv6
   address. With GSE, only two bytes would be available for hierarchy
   and delegation.

   It is also the case that the sorts of built-in identifiers now found
   in computing hardware, such as "EUI-48" and "EUI-64" addresses
   [IEEE802, IEEE1212], do not have the structure required for this
   delegation. Such identifiers have only two levels of hierarchy; the
   top level typically identifies a manufacturer, with the remaining
   part of the address being the equivalent of a serial number unique
   to the manufacturer. The delegation of the two-level hierarchy
   (i.e., equipment manufacturer) does not correspond to the
   administration under which the end user operates. Hence, stateless
   autoconfiguration [RFC1971] cannot create addresses with the
   necessary hierarchical property in the ESD portion of an address.

   Finally, imposing a required hierarchical structure on identifiers
   such as an ESD would also introduce a new administrative burden and
   a new or expanded registry system to manage ESD space (i.e., to
   ensure that ESDs are globally unique).
   While the procedures for assigning ESDs, which need only
   organizational and not topological significance, would be simpler
   than the procedures for managing IPv4 addresses, it seems a laudable
   goal to avoid the problem altogether if possible. In addition, it
   would likely increase the complexity of connecting new nodes to the
   Internet, a result inconsistent with the goals of stateless address
   autoconfiguration [RFC1971].

   The topic of mapping full 16-byte GSE addresses to a locator or
   other information is discussed in Appendix D.

5.3. Authentication of Identifiers

   The true value of a globally unique identifier lies not in its
   uniqueness but in the ability to use the same identifier repeatedly
   and have it refer to the same end point. That is, there is an
   expectation that repeated and subsequent use of the same identifier
   results in continued communication with the same end point. To be
   useful then, a valid identifier must either be easily
   distinguishable from a fraudulent one, or the system must have a way
   to prevent identifiers from being used in an unauthorized manner.

   The remainder of this section discusses how identifier
   authentication is done in both IPv4 and GSE, and shows how
   overloading an address with both an identifier and a locator
   provides a significant degree of automatic identifier
   authentication. In contrast, there is essentially no identifier
   authentication in GSE. It should be noted that the actual strength
   of authentication that would be considered sufficient is a topic in
   its own right, and we do not cover it here. Instead, we focus on the
   relative strengths of the two schemes.

   The following discussion assumes an absence of cryptographic
   authentication to bind an identifier to an end site. Many of the
   concerns described below would become non-issues if an appropriate
   cryptographic infrastructure were available.
   Section 5.5 discusses this issue in more detail.

5.3.1. Identifier Authentication in IPv4

   As described earlier, an IPv4 address simultaneously plays two
   roles: a unique identifier and a locator. Using an overloaded
   address as an identifier has the side-effect of ensuring that (for
   all practical purposes) the identifier is globally unique.
   Furthermore, because the same number is used both to identify an
   interface and to deliver data to that interface, it is impossible
   for some interface A to use the identification of another interface
   B in an attempt to receive data destined to B without being
   detected, unless the routing system is compromised.

   When both interfaces A and B claim the same unicast address, an
   (uncompromised) routing subsystem generally delivers packets to only
   one of them. The other node will quickly realize that something is
   wrong (since communication using the duplicate address fails) and
   take corrective actions, either correcting a misconfiguration or
   otherwise detecting and thwarting the intruder. To understand how
   the routing subsystem prevents the same address from being used in
   multiple locations, there are two cases to consider, depending on
   whether the two interfaces using duplicate addresses are attached to
   the same or to different links.

   When two interfaces on the same link use the same address, a node
   (host or router) sending traffic to the duplicate address will in
   practice send all packets to one of the nodes. On Ethernets, for
   example, the sender will use ARP (or Neighbor Discovery in IPv6) to
   determine the link-layer address corresponding to the destination
   address. When multiple ARP replies for the target IP address are
   received, the most recently received response replaces whatever is
   already in the cache.
   Consequently, the destinations with which a node using a duplicate
   IP address can communicate depend on what its neighboring nodes have
   in their ARP caches. In most cases, such communication failures
   become apparent relatively quickly, since it is unlikely that
   communication can proceed correctly on both nodes.

   It is also the case that a number of ARP implementations (e.g.,
   BSD-derived implementations) log warning messages when an ARP
   request is received from a node using the same address as the
   machine receiving the ARP request.

   The previous discussion describes the operation of ARP in the
   absence of intruders or other malicious users. ARP has a number of
   security vulnerabilities that make it trivial for an intruder to
   intercept and selectively process traffic that traverses a link,
   provided the intruder is attached to the link the traffic of
   interest traverses. For example, an intruder could intercept all
   traffic to an address by being the last to return an ARP response,
   and then selectively relay the traffic (after examining and/or
   modifying it) to its intended recipient. This is a classic
   man-in-the-middle attack.

   When two interfaces on different links use the same address, the
   routing subsystem generally delivers packets to only one of the
   nodes because only one of the links has the right subnet
   corresponding to the IP address. Consequently, the node using the
   address on the "wrong" link will generally never receive any packets
   sent to it and will be unable to communicate with anyone. For
   obvious reasons, this condition is usually detected quickly.

   It should be noted that although an address containing a combined
   identifier and locator can be forged, the routing subsystem
   significantly limits communication using the forged address.
   First, return traffic will be sent to the correct destination and
   not the originator of the forged address. This alone prevents
   certain types of spoofing attacks. For example, if a destination
   receives an unexpected packet corresponding to a TCP connection that
   it is unaware of, it may return a TCP segment resetting the
   connection. Second, routers performing ingress filtering can refuse
   to forward traffic whose source address does not match the expected
   addresses (from a topology perspective) for sources located within a
   particular region [RFC2267]. To effectively masquerade as someone
   else requires subverting the intermediate routing subsystem.

   To summarize, the routing subsystem in IPv4 provides a limited (but
   quite significant) defense against arbitrary hijacking of packets to
   an improper destination. We do not claim that this defense is
   sufficient against all types of attacks by a determined intruder.
   However, it does provide some degree of defense against accidental
   misconfigurations (e.g., assigning an improper address to an
   interface) and does erect hurdles that prevent an arbitrary node
   from impersonating another node. The more dangerous attack,
   subverting the routing subsystem by injecting unauthorized routes,
   can be traced and detected by appropriate tools.

5.3.2. Identifier Authentication in GSE

   In GSE, it is not possible for the routing subsystem to provide any
   enforcement on the authenticity of identifiers with respect to their
   corresponding Routing Stuff, since the Routing Stuff and ESD
   portions of an address are by definition completely orthogonal
   quantities. Thus, even the limited protection offered by IPv4 is not
   immediately available.

   An interesting question is whether any such protection is needed.
   One argument is that address-based authentication is so inherently
   weak as to be useless, thus the increased vulnerability of a
   GSE-like scheme is not significant. Where authentication is desired,
   the use of something based on cryptography is necessary (e.g., IPsec
   [RFC2401]).

   There are at least two arguments against this line of thought.
   First, the lack of protection comparable to IPv4 may lead to a new
   set of (poorly understood) security threats; Section 5.5 below
   describes one possible threat. These threats must be dealt with at
   the transport (or lower) layer because the threats are to the
   integrity of the transport layer itself. Attempting to solve them at
   higher layers (e.g., via IPsec [RFC2401] and IKE [RFC2409]) results
   in a potential layering circularity, where the security mechanisms
   rely on a correctly functioning transport, but the transport relies
   on those same security mechanisms to provide a service. Whether such
   a mechanism can be designed is an area of future work.

   Second, requiring that basic threats to the transport layer be dealt
   with using cryptographic techniques significantly increases the cost
   of formerly simple packet exchanges. Cryptographic security is no
   longer a choice an application can make, but quite possibly a
   requirement to protect against certain types of attacks. Thus, the
   cost of deploying effective defenses against a new class of
   denial-of-service attacks may be quite significant.

5.4. Transport Layer: What Locator Should Be Used?

   In the following, we focus on what Routing Stuff to use with TCP;
   UDP also depends on the Routing Stuff in a similar way. Indeed, we
   believe that TCP is the "easier" case to deal with, for two reasons.
   First, TCP is a stateful protocol in which both ends of the
   connection can negotiate with each other.
   UDP-based communications are stateless, and remember nothing from
   one packet to the next. Consequently, changing UDP to remember
   locator information in addition to the identifier of the peer may
   require the introduction of "session" features, perhaps as part of a
   common "library". Second, changes to UDP in practice mean changing
   individual applications themselves, raising deployability questions.

   There are three cases of interest from TCP's perspective:

   -  the sending side of an active open

   -  the sending side of a passive open (i.e., how to respond to an
      active open)

   -  changes to the Routing Stuff during an open connection.

5.4.1. RG Selection On An Active Open

   If the host is performing a TCP "active open", the application first
   queries the DNS to obtain the destination address, which contains
   the appropriate RG for the remote peer. That is, the initiator of
   communication is assumed to provide the correct Routing Stuff when
   initiating communication to a specific destination.

5.4.2. RG Selection On A Passive Open

   When a server passively accepts connections from arbitrary clients,
   it has no choice but to assume that the Routing Stuff in the source
   address of a received packet that initiated the communication is
   correct, because it has no way to authenticate its validity. Note
   that the Routing Stuff is "correct" only in the sense that it
   corresponds to the site originating the connection, to which the
   server will send the reply. Whether the Routing Stuff paired with
   the received ESD actually matches the Routing Stuff located at the
   site where the legitimate owner of the ESD currently resides is not
   known and cannot be determined.
   Because the ESD alone cannot be mapped into a locator (or some other
   quantity that can provide input to an authentication procedure),
   there is no way to determine whether the received Routing Stuff
   corresponds to that legitimately associated with the source
   identifier of the received packet. The issue of spoofing is
   discussed in more detail later.

5.4.3. Mid-Connection RG Changes

   While packets are flowing as part of an open connection, the RG
   appearing on subsequent packets is susceptible to change through
   renumbering events, or as a result of site-internal routing changes
   that cause the egress point for off-site traffic to change. It is
   even possible that traffic-balancing schemes could result in the use
   of two egress routers, with roughly every other packet exiting
   through a different egress router.

   Because TCP under GSE demultiplexes packets using only ESDs, newly
   arrived packets will be delivered to the correct end-point
   regardless of whether their source RG has changed. The GSE proposal
   calls for return traffic to continue to be sent via the "old" RG,
   even though it may have been deprecated or become less optimal
   because the peer's border router has changed. That is, the RG to use
   for reaching a peer is bound to a connection when the connection is
   established and does not change thereafter. However, the completion
   of renumbering events (so that an earlier RG is now invalid) and
   certain topology changes would require TCP to switch sending to a
   new RG mid-connection. To explore the scenario, we consider ways of
   allowing the RG change to be made to existing established
   connections.

   If TCP connection identifiers are based on ESDs rather than full
   addresses, traffic from the same ESD would be viewed as coming from
   the same peer, regardless of the source RG.
   Because this vulnerability is already present in today's Internet
   (forging the source address of a packet is trivial), the mere
   delivery of incoming datagrams with the same ESD but a different RG
   does not introduce a new vulnerability to TCP. In today's Internet,
   any node can already originate FINs/RSTs from an arbitrary source
   address and potentially or definitely disrupt the connection.
   Therefore, acceptance of traffic independent of its source RG does
   not appear to significantly worsen existing robustness. Note,
   however, that ingress filtering, as described in Section 5.3.1,
   cannot be performed on packets containing GSE addresses. This does
   make it more difficult to prevent certain types of attacks.

   We also considered allowing TCP to reply to each segment using the
   RG of the most recently received segment. Although this allows TCP
   connections to survive certain important events (e.g., renumbering),
   it also makes it trivial for anyone to hijack connections,
   unacceptably weakening robustness compared with today's Internet. A
   sender simply needs to guess the sequence numbers in use by a given
   TCP connection [Bellovin 89] and send traffic with a bogus RG to
   hijack a connection to an intruder at an arbitrary location.

   Providing protection from hijacking implies that the RG used to send
   packets must be bound to a connection end-point (e.g., it is part of
   the connection state). Although it may be reasonable to accept
   incoming traffic independent of the source RG, the choice of sending
   RG requires more careful consideration. Indeed, any subsequent
   change in the RG used for sending traffic must be properly
   authenticated (e.g., using cryptographic means). In the GSE
   proposal, there is no apparent way to authenticate such a change,
   since the remote peer doesn't even know its own RG.
   Consequently, the only reasonable approach in GSE is to keep sending
   to the peer via the first RG for the entire life of a connection.
   That is, always use the first RG seen, and accept the loss of
   connectivity whenever the RG changes.

5.4.4. The Impact of Corrupted Routing Goop

   Another interesting issue that arises is what impact a corrupted RG
   would have on robustness, given that there is no IPv6 header
   checksum that could help detect a corrupted source address field.
   Because the RG is not covered by the TCP checksum (the sender
   doesn't know what source RG will be inserted), no TCP mechanism can
   detect such corruption at the receiver. Moreover, once a specific RG
   is in use, it does not change for the duration of a connection. One
   interesting case occurs on the passive side of a TCP connection,
   where a server accepts incoming connections from remote clients. If
   the initial SYN from the client includes a corrupted RG, the server
   TCP will create a TCP connection (in the SYN-RECEIVED state) and
   cache the corrupted RG with the connection. The second packet of the
   3-way handshake, the SYN-ACK packet, would be sent to the wrong RG
   and consequently not reach the correct destination. Later, when the
   client retransmits the unacknowledged SYN, the server will continue
   to send the SYN-ACK using the bad RG. Eventually the client times
   out, and the attempt to open a TCP connection fails.

   We next consider relaxing the restriction on switching RGs in an
   attempt to avoid the previous failure scenario. The situation is
   complicated by the fact that the RG on received packets may change
   for legitimate reasons (e.g., a multi-homed site load-shares traffic
   across multiple border routers). The key question is how one can
   determine which RG is valid and which is not.
   That is, for each of the destination RGs a sender attempts to use,
   how can it determine which RG worked and which did not? Solving this
   problem is more difficult than it first appears, since one must
   cover the cases of delayed segments, lost segments, simultaneous
   opens, etc. If a SYN-ACK is retransmitted using different RGs, it is
   not possible to determine which of the two RGs worked correctly. We
   conclude that the only way TCP can determine that a particular RG is
   correct is by receiving an ACK for a specific sequence number in
   which all transmissions of that sequence number used the same RG.
   This would involve non-trivial changes to TCP implementations.

   At best, an RG selection algorithm for TCP would require new logic
   in implementations of TCP's opening handshake, a significant
   transition and deployment issue. We are not certain that a valid
   algorithm is attainable, however. RG changes would have to be
   handled in all cases handled by the opening handshake: delayed
   segments, lost segments, undetected bit errors in the RG,
   simultaneous opens, old segments, etc.

   In the end, we conclude that although the corrupted SYN case
   introduces potential problems, the changes that would need to be
   made to TCP to robustly deal with such corruption would be
   significant, if tractable at all. This would result in a transition
   to GSE also having a significant TCPng component, a significant
   drawback.

5.5. On The Uniqueness Of ESDs

   Although ESDs are expected to be globally unique, their uniqueness
   property may be violated either due to mistakes in allocation or by
   malicious attacks. The exact uniqueness requirements for ESDs depend
   on what purpose they serve and how they are used. If the correctness
   of some applications relies on the global uniqueness of ESDs, then
   active checking and enforcement will be necessary.
   On the other hand, if ESDs are used only to uniquely identify
   individual endpoints within a session, then global uniqueness may be
   considered unnecessary.

5.5.1. Impact of Duplicate ESDs

   Consider what happens when two nodes using the same ESD attempt to
   communicate with each other. In the GSE proposal, a node queries the
   DNS to obtain an IPv6 address. The returned address includes the
   Routing Stuff of an address (the RG+STP portions). At this point,
   the sender might notice the destination ESD is the same as its own
   ESD and indicate an error. If it doesn't check, however, it may well
   forward the packet to a router that delivers the packet to its
   correct destination (using the information in the Routing Stuff). On
   receipt of the packet, again, the destination node could examine the
   ESD portion of the source address and determine that it is the same
   as its own and indicate an error. Alternatively, it could just
   process the packet without detecting the duplication and
   communication would proceed as normal (unless there are port number
   conflicts due to the sender and receiver allocating port numbers
   from the same name space).

   A more problematic case occurs if two nodes having the same ESD
   communicate with a third party. To the third party, packets received
   from either machine might appear to be coming from the same machine
   since they all carry the same ESD. Consequently, at the transport
   level, if both machines choose the same source and destination port
   numbers (one of the ports, a server's well-known port number, will
   likely be the same), packets belonging to two distinct transport
   connections will be demultiplexed to a single transport end-point.
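
   The third-party collision just described can be sketched as follows.
   This is an illustration only, not code from any TCP implementation:
   the connection-key functions, addresses, and port numbers are
   hypothetical, and the sketch assumes the ESD occupies the lower 8
   bytes of a 16-byte source address.

```python
from typing import Callable, Dict, List, Tuple

Packet = Tuple[bytes, int, int]  # (source address, source port, dest port)


def esd_key(src: bytes, sport: int, dport: int) -> tuple:
    """Connection key that ignores the Routing Stuff (upper 8 bytes)."""
    return (src[8:], sport, dport)


def full_key(src: bytes, sport: int, dport: int) -> tuple:
    """Connection key that uses the full 16-byte source address."""
    return (src, sport, dport)


def demux(packets: List[Packet],
          key_fn: Callable[..., tuple]) -> Dict[tuple, List[bytes]]:
    """Group packets by connection key; values list the source addresses."""
    table: Dict[tuple, List[bytes]] = {}
    for src, sport, dport in packets:
        table.setdefault(key_fn(src, sport, dport), []).append(src)
    return table


# Two hypothetical hosts at different sites that share one ESD.
ESD = bytes.fromhex("021122fffe334455")
host_a = bytes.fromhex("20010db8aaaa0001") + ESD
host_b = bytes.fromhex("20010db8bbbb0001") + ESD

packets = [(host_a, 40000, 80), (host_b, 40000, 80)]
by_esd = demux(packets, esd_key)    # both hosts land on one end-point
by_full = demux(packets, full_key)  # the hosts remain distinct

print(len(by_esd), len(by_full))
```

   With ESD-only keys the two hosts are indistinguishable to the third
   party; with full-address keys they stay separate, but then the
   connection no longer survives a change of Routing Stuff.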

   When packets from different sources using the same source ESD are
   delivered to the same transport end-point, a number of possibilities
   come to mind:

   1) Following the GSE specification, the transport end-point would
      accept the packet, without regard to the Routing Stuff of the
      source address. This may lead to a number of robustness
      problems (and at best will confuse the application).

   2) The transport end-point could verify that the Routing Stuff of
      the source address matches one of a set of expected values
      before processing the packet further. If the Routing Stuff
      doesn't match any expected value, the packet could be dropped.
      This would result in a connection from one host operating
      correctly, while a connection from another host (using the same
      ESD) would fail.

   3) When a packet is received with an unexpected Routing Stuff, the
      receiver could invoke special-purpose code to deal with this
      case. Possible actions include attempting to verify whether the
      Routing Stuff is indeed correct (the saved values may have
      expired) or attempting to verify whether duplicate ESDs are in
      use (e.g., by inventing a protocol that sends packets using both
      Routing Stuff and verifies that they are delivered to the same
      end-point).

5.5.2. New Denial of Service Attacks

   It is clear that there are potential problems if identifiers are not
   globally unique. How often such problems would actually occur in
   practice depends on how many duplicates there actually are. Thus,
   one might be tempted to make the argument that a scheme for
   assigning identifiers could be made to be "unique enough" in
   practice. This would be a dangerous and naive assumption, because in
   the absence of any ESD enforcement (i.e.,
   ensuring each host uses only its assigned ESD), intruders will
   actively impersonate other sites for the sole purpose of
   invalidating the uniqueness assumption. For example, one could deny
   service to host foo.bar.com by querying the DNS for its
   corresponding ESD, and then impersonating that ESD.

   As a specific example, one GSE-specific denial-of-service attack
   would be for an intruder to masquerade as another host and "wedge"
   connections in a SYN-RECEIVED state by sending SYN segments
   containing an invalid RG in the source IP address for a specific
   ESD. Subsequent connection attempts to the wedged host from the
   legitimate owner of the ESD (if they used the same TCP port numbers)
   would then not complete, since return traffic would be sent to the
   wrong place. Note that this attack is worse than the common SYN
   flood attack because it not only ties up resources on the target
   machine, it also blocks legitimate access to the target machine by a
   specific third party.

   Another potential attack involves an intruder assuming the ESD of a
   target site (e.g., mit.edu), then opening TCP connections using
   mit.edu's ESD to a target server (e.g., big-server.com). Because the
   RG would point back to the attacker, the attacker could create a
   number of TCP connections in an OPEN state without needing to guess
   the sequence numbers needed to complete a 3-way handshake. Once
   those connections are open, it would be difficult to (automatically)
   distinguish connections that are part of a denial-of-service attack
   from those (idle) connections that are part of a legitimate
   activity.

   The previous discussion indicates that separating identifiers and
   locators opens up potential new denial-of-service attacks that would
   need to be carefully studied.
   One way of addressing them would be to have a way to authenticate
   the RG associated with an identifier, as the attacks take advantage
   of the distinction between identifiers and locators.

5.6. Summary of Identifier Authentication Issues

   In summary, changing the RG dynamically in a safe way for a
   connection requires that an originator of traffic be able to
   authenticate a proposed change in the RG before sending to a
   particular ESD via that RG. This is difficult for several reasons:

   1) It can't be done on an end-to-end basis in GSE (e.g., via IPsec)
      because the sender doesn't know what value the RG portion of the
      address will have when it reaches the receiver. This issue is
      specific to GSE; other approaches in which the end node knows
      its own RG would not automatically have this problem.

   2) It can't be easily done in GSE using just the ESD because there
      is no mechanism at or below the transport layer to map ESDs into
      a quantity that can be used as a key to jump start the
      authentication process (using the DNS would be problematic due
      to layering circularity considerations).

   3) It is conceivable that one could send a "who are you" type
      message to a peer asking it to return a more suitable identifier
      that can be used to jump start the authentication process. This
      additional information would include information needed to
      obtain keys, certificates, etc. from an appropriate source that
      can be used to verify proper use of an ESD by a particular node.
      Note, however, that the "who are you" message makes use of the
      full address, not just the ESD portion.

   4) Any scheme that uses the full IPv6 address to do the
      authentication can be used with today's standard provider-based
      addressing, raising the question of what benefit is retained
      from having separate identifiers and locators.
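
   The first difficulty in the list above can be made concrete with a
   toy integrity check. The sketch is not IPsec and does not model any
   real header format: the shared key, the helper names, and the
   simplification that the sender knows none of the upper 8 bytes of
   its own source address are all assumptions made for illustration.

```python
import hashlib
import hmac

KEY = b"shared-secret-for-illustration"  # hypothetical pre-shared key


def tag(data: bytes) -> bytes:
    """Keyed integrity tag over the given bytes (HMAC-SHA256)."""
    return hmac.new(KEY, data, hashlib.sha256).digest()


def rewrite_rg(addr: bytes, routing_stuff: bytes) -> bytes:
    """Model a border router filling in the upper 8 bytes in transit."""
    return routing_stuff + addr[8:]


ESD = bytes.fromhex("021122fffe334455")
as_sent = bytes(8) + ESD  # the sender does not know its own RG
as_received = rewrite_rg(as_sent, bytes.fromhex("20010db8aaaa0001"))

full_tag = tag(as_sent)      # covers all 16 bytes as the sender saw them
esd_tag = tag(as_sent[8:])   # covers only the ESD

# The receiver recomputes each tag over what actually arrived.
full_ok = hmac.compare_digest(full_tag, tag(as_received))
esd_ok = hmac.compare_digest(esd_tag, tag(as_received[8:]))

print(full_ok, esd_ok)
```

   A tag that covers the whole address fails once the RG is rewritten
   in transit, while a tag that covers only the ESD still verifies but
   says nothing about whether the RG is legitimate.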

   Our final conclusion is that with the GSE approach, transport
   protocol end-points must make an early, single choice of the RG to
   use when sending to a peer and stick with that choice for the
   duration of the connection. Specifically:

   1) The demultiplexing of arriving packets to their transport end
      points should use only the ESD, and not the Routing Stuff.

   2) If the application chooses an RG for the remote peer (i.e., an
      active open), use the provided RG for all traffic sent to that
      peer, even if alternative RGs are received on subsequent
      incoming datagrams from the same ESD. For all other cases, use
      the first RG received with a given ESD for all sending.

   3) Simultaneously, we understand that, with the above rules, there
      are still open issues with regard to invalid RGs, either through
      corruption or through active hostile attacks.

   One difficulty with the above recommendation is that there does not
   appear to be a straightforward way to use ESDs in conjunction with
   mobility or site renumbering (in which existing connections survive
   the renumbering). This presents a quandary. The main benefit of
   separating identifiers and locators is the ability to have
   communication (e.g., a TCP connection) continue transparently, even
   when the Routing Stuff associated with a particular ESD changes.
   However, switching to a new Routing Stuff without properly
   authenticating it makes it trivial to hijack connections.

   We cannot emphasize enough that the use of an ESD independent of an
   associated RG can be very dangerous. That is, communicating with a
   peer implies that one is always talking to the same peer for the
   duration of the communication. But as has been described in previous
   sections, such assurance can only come from properly authenticating
   the RG associated with an ESD.
   How to authenticate the RG associated with an ESD in GSE does not
   appear to have a trivial solution and remains an open problem.

5.7. The Need For Strong Authentication

   The problems described earlier stem from an inability to verify
   whether a particular RG is legitimately associated with an ESD. One
   approach that would address this problem is to use cryptographic
   techniques to verify the binding between an RG and an ESD. There are
   two cases to consider.

   First, for an existing connection, switching from one RG to another
   risks the possibility of an intruder hijacking a connection.
   Addressing this risk involves having one endpoint verify
   (cryptographically) with its peer that the proposed new RG is
   acceptable. This requires only an ability to communicate with the
   peer using the older (i.e., current) RG and using the older RG to
   verify the new RG. For example, a node could send its peer a message
   requesting cryptographic verification for a new RG prior to actually
   switching to it. Such verification would not require a public key
   infrastructure, as the purpose is not to verify that the legitimate
   owner of the ESD approves use of the RG, but that the peer with which
   one is currently communicating (and who is using a particular
   ESD -- possibly illegally) approves switching to a different RG.

   A more problematic case involves the wedging of connections as
   described in Section 5.5.2. Here, an intruder improperly uses an
   identifier legitimately belonging to someone else, denying the
   legitimate owner service. Addressing this problem is more difficult.
   One approach is to verify the RG associated with an identifier the
   first time it is used.
   This would appear to require a global PKI (not available today) in
   which every potential node is registered so that in the case of
   conflicts, it becomes possible to determine the legitimate owner of
   an identifier.

   Another interesting question concerns at what layer such
   cryptographic mechanisms would be needed. Ideally, the
   denial-of-service threats must be dealt with at the transport (or
   lower) layer because the threats are to the integrity of the
   transport layer itself. Attempting to solve them at higher layers
   (e.g., via IPsec and IKE) results in a potential layering
   circularity, where the security mechanisms rely on a correctly
   functioning transport, but the transport relies on those same
   security mechanisms to provide a service. Further work is needed to
   determine whether such a mechanism can be designed using IPsec.

6. Conclusion

   The GSE proposal provides a concrete example of a network protocol
   design that separates identifiers from locators in addresses. In
   this paper we compared GSE with IPv4's CIDR-style addressing to
   better understand the pros and cons of the respective design
   approaches.

   Functionally speaking, identifiers and locators each have a logically
   different role to play. Thus overloading both in one field causes
   problems whenever the location of a node changes but its identity
   does not. However, our analysis shows that overloading also presents
   three critically important benefits.

   First, for network entity A to send data to network entity B, A must
   not only know B's end identifier but also B's locator. No scalable
   way is known at this time to provide this mapping at the network
   layer, other than overloading the two quantities into an address as
   is done in IPv4.
   Fundamentally, a scalable mapping algorithm strongly suggests that
   the identifier space be structured hierarchically, yet identifiers
   in GSE are not sufficiently large to both contain sufficient
   hierarchy and support stateless address autoconfiguration. Instead,
   GSE forces applications to supply up-to-date locators. However,
   relying on the locator provided at the time communication is
   established, as GSE does, is inadequate when the remote locator can
   change dynamically, which is precisely the scenario that is supposed
   to benefit from the separation. That is, the benefits of separating
   the identifier from the locator are largely lost if the changes in
   the identifier-to-locator binding are not tracked quickly.

   Second, when communicating with a remote site, if the RG changes it
   becomes uncertain whether a reliable TCP handshake is possible
   (because a passively opened TCP connection must use the RGs it
   obtains from the packets). Because the reliability of TCP's byte
   stream is critically dependent on its three-way handshake, this is a
   significant issue.

   Finally, when communicating with a remote site, a receiver must be
   able to ensure (with reasonable certainty) that received data does
   indeed come from the expected remote entity. In IPv4, it is possible
   to receive packets from a forged source, but the potential for
   mischief between communicating peers is significantly limited because
   return traffic will not generally reach the source of the forged
   traffic. That is, communication involving packets sent in both
   directions will not succeed. In contrast, architectures like GSE
   that decouple the identifier and locator functions lose the built-in
   protection available in classical IP and thus face great difficulty
   assuring that traffic from a source identified only by an identifier
   actually comes from the correct source.
   Short of using cryptographic techniques (e.g., IPsec), there is no
   known mechanism that can use an identifier alone to perform this
   remote entity authentication. Using an identifier alone for
   authentication of received packets is dangerously unsafe.

   In summary, although overloading the address field with a combined
   identifier and locator leads to difficulties in retaining the
   identity of a node whenever its address changes, analysis in this
   paper suggests that the benefit of the overloading actually
   outweighs its cost. Completely separating an identifier from its
   locator renders the identifier untrustworthy, thus useless, in the
   absence of an accompanying authentication system.

7. Security Considerations

   The primary security consideration with GSE or, more generally, a
   network layer with addresses split into locator and identifier parts,
   is that of one node impersonating another by copying the
   identification without the location. Indeed, the main conclusion of
   this paper is that a GSE-like addressing structure introduces new
   security vulnerabilities that are not present in IP, and that those
   problems are serious enough to question the benefits of an
   architecture that separates locators and identifiers in addresses.

8. Acknowledgments

   Thanks go to Steve Deering and Bob Hinden (the Chairs of the IPng
   Working Group) as well as Sun Microsystems (the host for the interim
   meeting) for the planning and execution of the interim meeting.
   Thanks also go to Mike O'Dell for writing the 8+8 and GSE drafts; by
   publishing these documents and speaking on their behalf, Mike was the
   catalyst for some valuable discussions, both for IPv6 addressing and
   for addressing architectures in general. Special thanks to the
   attendees of the interim meeting whose high caliber discussions
   helped motivate and shape this document.

9. References

   [ANYCAST] "Host Anycasting Service", C. Partridge, T. Mendez, & W.
        Milliken, RFC 1546.

   [BATES] "Scalable support for multi-homed multi-provider
        connectivity", Tony Bates & Yakov Rekhter, RFC 2260,
        January, 1998.

   [Bellovin 89] "Security Problems in the TCP/IP Protocol Suite",
        Bellovin, Steve, Computer Communications Review, Vol. 19,
        No. 2, pp32-48, April 1989.

   [CIDR] "Classless Inter-Domain Routing (CIDR): an Address
        Assignment and Aggregation Strategy", V. Fuller, T. Li, J.
        Yu, & K. Varadhan, RFC 1519, September 1993.

   [DHCP-DDNS] "Interaction between DHCP and DNS", Internet Draft,
        Yakov Rekhter, (Work in Progress).

   [DDNS] "Dynamic Updates in the Domain Name System (DNS UPDATE)",
        Paul Vixie (Editor), RFC 2136, April, 1997.

   [EUI64] "64-Bit Global Identifier Format Tutorial",
        http://standards.ieee.org/db/oui/tutorials/EUI64.html.
        Note: "EUI-64" is claimed as a trademark by an organization
        which also forbids reference to itself in association with
        that term in a standards document which is not their own,
        unless they have approved that reference. However, since
        this document is not standards-track, it seems safe to name
        that organization: the IEEE.

   [GSE] "GSE - An Alternate Addressing Architecture for IPv6", Mike
        O'Dell, (Work in Progress).

   [IEEE802] IEEE Std 802-1990, "Local and Metropolitan Area Networks:
        IEEE Standard Overview and Architecture."

   [IEEE1212] IEEE Std 1212-1994, "Information technology--
        Microprocessor systems: Control and Status Registers (CSR)
        Architecture for microcomputer buses."

   [IPv6-ADDRESS] "An IPv6 Aggregatable Global Unicast Address
        Format", R. Hinden, M. O'Dell, S. Deering, RFC 2374, July,
        1998.

   [MOBILITY] "IP Mobility Support", C. Perkins, RFC 2002, October,
        1996.

   [NAT] "IP Network Address Translator (NAT) Terminology and
        Considerations", P.
        Srisuresh, M. Holdrege, RFC 2663, August, 1999.

   [RFC1752] "The Recommendation for the IP Next Generation Protocol",
        S. Bradner, A. Mankin, RFC 1752, January, 1995.

   [RFC1788] "ICMP Domain Name Messages", W. Simpson, RFC 1788, April,
        1995.

   [RFC1884] "IP Version 6 Addressing Architecture", R. Hinden & S.
        Deering, Editors, RFC 1884.

   [RFC1958] "Architectural Principles of the Internet", B. Carpenter,
        RFC 1958, June, 1996.

   [RFC1971] "IPv6 Stateless Address Autoconfiguration", S. Thomson,
        T. Narten, RFC 1971, August, 1996.

   [RFC2008] "Implications of Various Address Allocation Policies for
        Internet Routing", Y. Rekhter, T. Li, RFC 2008, October
        1996.

   [RFC2073] "An IPv6 Provider-Based Unicast Address Format", Y.
        Rekhter, P. Lothberg, R. Hinden, S. Deering, J. Postel, RFC
        2073, January, 1997.

   [RFC2267] "Network Ingress Filtering: Defeating Denial of Service
        Attacks which employ IP Source Address Spoofing", P.
        Ferguson, D. Senie, RFC 2267, January, 1998.

   [RFC2401] "Security Architecture for the Internet Protocol", S. Kent,
        R. Atkinson, RFC 2401, November 1998.

   [RFC2409] "The Internet Key Exchange (IKE)", D. Harkins, D. Carrel,
        RFC 2409, November 1998.

   [ROUTER-RENUM] "Router Renumbering for IPv6", M. Crawford,
        draft-ietf-ipngwg-router-renum-06.txt.

   [SITE-PREFIXES] "Site prefixes in Neighbor Discovery", E. Nordmark,
        draft-ietf-ipngwg-site-prefixes-03.txt.

10. Authors' Addresses

   Matt Crawford                    John Stewart
   Fermilab MS 368                  Juniper Networks, Inc.
   PO Box 500                       385 Ravendale Drive
   Batavia, IL 60510 USA            Mountain View, CA 94043
   Phone: 630-840-3461              Phone: +1 650 526 8000
   EMail: crawdad@fnal.gov          EMail: jstewart@juniper.net

   Allison Mankin                   Lixia Zhang
   USC/ISI                          UCLA Computer Science Department
   4350 North Fairfax Drive         4531G Boelter Hall
   Suite 620                        Los Angeles, CA 90095-1596 USA
   Arlington, VA 22203 USA          Phone: 310-825-2695
   EMail: mankin@isi.edu            EMail: lixia@cs.ucla.edu
   Phone: 703-812-3706

   Thomas Narten
   IBM Corporation
   3039 Cornwallis Ave.
   PO Box 12195 - F11/502
   Research Triangle Park, NC 27709-2195
   Phone: 919-254-7798
   EMail: narten@raleigh.ibm.com

Appendix A: Increased Reliance on Domain Name System (DNS)

   As we've discussed in previous sections, the motivation for
   separating identifiers from locators in IP addresses is to allow the
   locator portion to change more easily. However, because GSE does not
   provide a mapping from an ESD to its locator, whenever the locator
   changes, GSE falls back on DNS to provide such a mapping.

   Because any mapping scheme is complicated by renumbering, and because
   recent IPv4 experience has shown a requirement for renumbering at
   some frequency, it is worthwhile to explore the general renumbering
   issue.

A.1: Renumbering and DNS: How Frequently Can We Renumber?

   One premise of the GSE proposal [GSE] is that an ISP can renumber the
   Routing Goop portion of a site's addresses transparently to the site
   (i.e., without coordinating the change with the site). This would
   make it possible for backbone providers to aggressively renumber the
   Routing Goop part of addresses to achieve a high degree of route
   aggregation. On closer examination, frequent (e.g., daily)
   renumbering turns out to be difficult in practice because of a
   circular dependency between the DNS and routing.
   Specifically, if a site's Routing Stuff changes, nodes communicating
   with the site need to obtain the new Routing Stuff. In the GSE
   proposal, one queries the DNS to obtain this information. However,
   in order to reach a site's DNS servers, the pointers controlling the
   downward delegation of authoritative DNS servers (i.e., DNS "glue
   records") must use addresses with Routing Stuff that are reachable.
   That is, in order to find the address for the web server
   "www.foo.bar.com", DNS queries might need to be sent to a root DNS
   server, as well as DNS servers for "bar.com" and "foo.bar.com". Each
   of these servers must be reachable from the querying client.
   Consequently, there must be an adequate overlap period after the RG
   changes, during which both the old Routing Stuff and the new Routing
   Stuff can be used simultaneously. During the overlap period, DNS
   glue records will need to be updated to use the new addresses
   (including Routing Stuff) and DNS RRs need to be updated. Only after
   all relevant DNS servers have been updated and all previously cached
   RRs containing the old addresses have timed out can the old RG be
   deleted.

   An important observation is that the above issue is not specific to
   GSE; the same requirement exists with today's provider-based
   addressing architecture. When a site is renumbered (e.g., it
   switches ISPs and obtains a new set of addresses from its new
   provider), the DNS must be updated in a similar fashion.

A.2: Efficient DNS support for Site Renumbering

   In the current Internet, when a site is renumbered, the addresses of
   all the site's internal nodes change. This requires a potentially
   large update to the RR database for that site. Although Dynamic DNS
   [DDNS] could potentially be used, the cost is likely to be large due
   to the large number of individual records that would need to be
   updated.
   In addition, when DHCP and DDNS are used together [DHCP-DDNS], it
   may be the case that individual hosts "own" their own A or AAAA
   records, further complicating the question of who is able to update
   the contents of DNS RRs.

   With GSE, when a site renumbers to satisfy its ISP, only the site's
   routing prefix needs to change. That is, the prefix reflects where
   within the Internet the site resides. One DNS modification that
   could reduce the cost of updating the DNS when a site is renumbered
   is to store addresses in two distinct RRs: one for the Routing Goop
   that reflects where a node attaches to the Internet and the other for
   STP-plus-ESD that is the site-specific part of an address. During a
   renumbering, the Routing Goop would change, but the "site-internal
   part" would remain fixed. That way, renumbering a site would only
   require that the Routing Goop RR of a site be updated; the
   "site-internal part" of individual addresses would not change.

   To obtain the address of a node from the DNS, a DNS query for the
   name would return two quantities: the "site-internal part" and the
   DNS name of the Routing Stuff for the site. An additional DNS query
   would then obtain the specific RR of the site, and the complete
   address would be synthesized by concatenating the two pieces of
   information.

   Implementing these DNS changes increases the practicality of using
   Dynamic DNS to update a site's DNS records as it is renumbered. Only
   the site's Routing Goop RRs would need updating.

   Finally, it may be useful to divide a node's AAAA RR into the three
   logical parts of the GSE proposal, namely RG, STP and ESD. Whether
   or not it is useful to have separate RRs for the STP and ESD portions
   of an address or a single RR combining both is an issue that requires
   further study.
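
   The synthesis of a complete address from separately stored parts,
   as described above, can be sketched as follows. This is only an
   illustration under stated assumptions, not part of the GSE proposal:
   the field widths (a 6-byte RG and a 2-byte STP alongside the 8-byte
   ESD) and the name synthesize_address are chosen for the example, and
   the values come from the IPv6 documentation prefix.

```python
import ipaddress

# Assumed field widths; only the 8-byte ESD and the 16-byte total are
# stated in this document, the RG/STP split is an assumption.
RG_BITS, STP_BITS, ESD_BITS = 48, 16, 64


def synthesize_address(rg: int, stp: int, esd: int) -> ipaddress.IPv6Address:
    """Concatenate RG | STP | ESD into one 128-bit address."""
    if rg >> RG_BITS or stp >> STP_BITS or esd >> ESD_BITS:
        raise ValueError("field does not fit its assumed width")
    value = (rg << (STP_BITS + ESD_BITS)) | (stp << ESD_BITS) | esd
    return ipaddress.IPv6Address(value)


# The site-internal part (STP plus ESD) stays fixed across a renumbering;
# only the RG record changes.
site_internal = (0x0002, 0x0000000000000003)

before = synthesize_address(0x20010DB80001, *site_internal)
after = synthesize_address(0x20010DB800FF, *site_internal)
print(before)  # 2001:db8:1:2::3
print(after)   # 2001:db8:ff:2::3
```

   In this sketch a renumbering touches a single value (the RG), while
   every per-node record holding the site-internal part is left alone.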

   If AAAA records are composed of multiple distinct RRs, then one
   question is who should be responsible for synthesizing the AAAA from
   its components: the resolver running on the querying client's machine
   or the queried name server? To minimize the impact on client hosts
   and make it easier to deploy future changes, it is recommended that
   the synthesis of AAAA records from their constituent parts be done on
   name servers rather than in client resolvers.

A.2.1: Two-Faced DNS

   The GSE proposal attempts to hide the RG part of addresses from nodes
   within a site. If the nodes do not know their own RG, then they
   can't store or use it in ways that cause problems should the site
   be renumbered and its RG change (i.e., the cached RG becomes
   invalid). A site's DNS servers, however, will need to have more
   information about the RG the site uses. Moreover, the responses they
   return will depend on who queries the server. A query from a node
   within the site should return an address with a Site Local RG,
   whereas a query for the same name from a client located at a
   different site should return the global scope RG. This makes
   intra-site communication more resilient to failures outside of the
   site. Such context-dependent DNS servers are commonly referred to as
   "two-faced" DNS servers.

   Some issues that must be considered in this context:

   1) A DNS server may recursively attempt to resolve a query on
      behalf of a requesting client. Consequently, a DNS query might
      be received from a proxy rather than from the client that
      actually seeks the information. Because the proxy may not be
      located at the same site as the originating client, a DNS server
      cannot reliably determine whether a DNS request is coming from
      the same site or a remote site.
      One solution would be to disallow recursive queries for off-site
      requesters, though this raises additional questions.

   2) Since cached responses are, in general, context sensitive, a
      name server may be unable to correctly answer a query from its
      cache, since the information it has is incomplete. That is, it
      may have loaded the information via a query from a local client,
      and the information has a site-local prefix. If a subsequent
      request comes in from an off-site requester, the DNS server
      cannot return a correct response (i.e., one containing the
      correct RG).

A.2.2: Bootstrapping Issues

   If Routing Stuff information is distributed via the DNS, key DNS
   servers must always be reachable. In particular, the addresses
   (including Routing Stuff) of all root DNS servers are, for all
   practical purposes, well-known and assumed to never change. It is
   not uncommon for the addresses of root servers to be hard-coded into
   software distributions. Consequently, the Routing Stuff associated
   with such addresses must always be usable for reaching root servers.
   If it becomes necessary or desirable to change the Routing Stuff of
   an address at which a root DNS server resides, the routing subsystem
   will likely need to continue carrying "exceptions" for those
   addresses. Because the total number of root DNS servers is
   relatively small, the routing subsystem is expected to be able to
   handle this requirement.

   All other DNS server addresses can be changed, since their addresses
   are typically learned from an upper-level DNS server that has
   delegated a part of the name space to them. So long as the
   delegating server is configured with the new address, the addresses
   of other servers can change.
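
   The context-dependent behavior described in A.2.1, and the first
   issue listed there, can be illustrated with a small sketch. All
   names and values here are hypothetical: the sketch assumes that
   on-site queriers are recognizable by a site-local source prefix
   (fec0::/10 is used as a stand-in for the Site Local RG) and that the
   server holds one answer per context.

```python
import ipaddress

# Assumed marker for on-site sources; a stand-in for the Site Local RG.
SITE_LOCAL = ipaddress.ip_network("fec0::/10")

# Hypothetical answers for one name, one per context.
ANSWERS = {"site": "fec0::2:0:0:0:3", "global": "2001:db8:1:2::3"}


def answer_for(source: str) -> str:
    """Return the address variant matching where the query came from."""
    on_site = ipaddress.ip_address(source) in SITE_LOCAL
    return ANSWERS["site" if on_site else "global"]


# An on-site client asking directly gets the site-local answer.
print(answer_for("fec0::2:0:0:0:9"))
# The same client asking through an off-site recursive proxy is seen
# as the proxy, so it gets the global answer instead.
print(answer_for("2001:db8:77::53"))
```

   The second call shows why the server cannot reliably tell on-site
   from off-site requesters once a proxy is involved: the decision can
   only be based on the source of the query it actually receives.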

Appendix B: Additional Issues Related Specifically to GSE

   This paper focused primarily on the issues of separating identifiers
   and locators in unicast addresses. It is worth noting that a number
   of GSE-specific additional issues were identified during the IPng
   interim meeting. These stem from a GSE end node not knowing its own
   RG and the need for border routers to translate the RG of addresses.
   These issues would need to be considered before an architecture such
   as GSE could be deployed. Specifically:

   - It is not known how multicast would work under GSE. One
     identified issue is that a site with multiple egress routers
     would (by default) inject multicast traffic through each egress
     router; each would then replace the source Routing Goop with a
     differing value. This would lead to multiple copies of the same
     packet, each carrying a different IPv6 address and thus being
     considered as coming from different sources.

   - It would be more difficult to create tunnels. Any tunnel that
     crosses a site boundary (i.e., the entry and exit points are in
     differing sites) would in effect require that both tunnel
     endpoints be border routers to ensure that the addresses in the
     inner headers were rewritten correctly.

   - In order for the DNS to hide a site's Routing Goop from
     internal nodes yet make it visible to external nodes, a
     two-faced DNS is required. The current DNS model assumes a
     single global database in which all queries are answered the
     same way, regardless of who issued the query. It is unclear how
     to make the DNS answer queries in a context-sensitive manner
     without also negatively impacting (i.e., crippling) its caching
     model.

   - Applications that send addresses in payloads (e.g., FTP PORT
     command) may run into difficulties with GSE.
     Because the sender does not know its own RG, the addresses it
     sends in payloads will contain only the site-local prefix in the
     RG portion of the address. In order for the receiver to open a
     connection back to that address, it needs the proper RG. This
     problem is analogous to that of NATs, where addresses in payloads
     need to be rewritten (e.g., via an ALG) when crossing the boundary
     between different addressing realms [NAT].

   - Border routers need to rewrite the source address of outgoing
     packets. Additional parsing of packet headers is also required,
     to find and rewrite any other addresses containing the
     site-local prefix. For example, the source routing header may
     contain additional addresses.

Appendix C: Ideas Incorporated Into IPv6

   This section summarizes changes made to IPv6 specifications which
   originated in the GSE proposal or in the discussions arising from it.

   The unicast address format was changed to improve the aggregability
   of unicast addresses. Instead of a topologically insignificant
   Registry ID immediately following the Format Prefix [RFC2073], there
   is now a Top-Level Aggregation Identifier [IPv6-ADDRESS]. This field
   identifies a large routable aggregate to which an address belongs
   rather than an administrative unit that assigned the address. The
   TLA corresponds to the "Large Structure" of GSE. The IPv6 Next-Level
   Aggregation Identifier (NLA) is roughly the rest of the GSE "Routing
   Goop" and the Site-Level Aggregation Identifier (SLA) is a slightly
   expanded GSE Site Topology Partition.

   The decision to put fixed boundaries between parts of the unicast
   address (TLA, NLA, SLA, Interface Identifier) into IPv6 addresses
   [IPv6-ADDRESS] also came from GSE.
   The previous "provider-based" addressing architecture for IPv6
   [RFC2073] had fluid boundaries between Registry ID, Provider ID,
   Subscriber ID and the Intra-Subscriber part, as well as undefined
   divisions within the Provider-ID and Intra-Subscriber part. (On
   subnetworks with a MAC-layer address, the latter boundary was
   generally placed to accommodate use of that address as an Interface
   ID.) The new addressing architecture still expects divisions within
   the NLA portion of the address, placed to reflect topological
   aggregation points.

   Defining a fixed boundary between the routable portion of the address
   and the part indicating an interface on a specific link required
   specifying an Interface Identifier that would be suitable for all
   subnetwork technologies. The IEEE "EUI-64" identifier was selected,
   having the advantages of an easy mapping from 48-bit MAC addresses
   and a defined escape flag into locally-administered values.

   Another change was the redefinition of the interface identifier to be
   a 64-bit quantity. In the common case where a node has at least one
   IEEE interface, the interface identifier is constructed from an IEEE
   identifier (i.e., a MAC address) in such a way that there is a very
   high probability that the identifier will be globally unique. In the
   case where a globally unique identifier can't easily be constructed
   automatically, a bit in the identifier indicates that the address is
   not globally unique. At present, there are no plans for transport
   protocols such as TCP to exploit interface identifiers, but the door
   has been left open for a future protocol (e.g., TCPng) to take
   advantage of the ESD concept.

   Another change to come out of the GSE discussions relates to reducing
   the number of DNS record changes required in the event of site
   renumbering.
   This work is not finalized as of this writing, but the result may be
   that individual IPv6 addresses are stored (and signed, in the case of
   Secure DNS) as a partial address and an indirect pointer which leads
   to the high-order part of the address. There may be multiple levels
   of indirection, and a changed record at any one level would suffice
   to update the DNS's record of the IPv6 addresses of every node in a
   given branch of the addressing hierarchy.

   A change in the method of doing DNS address-to-name lookups is also
   in the works. This may be a change in the form and/or operation of
   the ip6.int domain or some new mechanism which involves participation
   by the routers or the end-nodes themselves.

   Another example of follow-on work is site prefixes [SITE-PREFIXES],
   whose aim is to have communicating parties prefer site-local
   addresses for internal communication. Applications using site-local
   addresses are generally immune to renumbering issues that affect only
   global-scope addresses.

   Two other changes arising from GSE will not affect the IPv6 base
   specifications themselves, but do direct additional work. Those are
   the injection of global prefix information into a site from a
   provider or exchange [ROUTER-RENUM], and some inter-provider
   cooperative method of providing multihoming to mutual customers with
   minimal impact on routing tables in distant parts of the network.

Appendix D: Reverse Mapping of Complete GSE Addresses

   The ability to map an IP address into its corresponding DNS name is
   used in several contexts:

   1) Network packet tracing utilities (e.g., tcpdump) display the
      contents of packets. Printing out the DNS names appearing in
      those packets (rather than dotted IP addresses) requires access
      to an address-to-name mapping mechanism.

   2) Some applications perform a "poor-man's" authentication by using
      the DNS to map the source address of a peer into a DNS name.
      The client then queries the DNS a second time, this time asking
      for the address(es) corresponding to the peer's DNS name. Only
      if one of the addresses returned by the DNS matches the peer
      address of the TCP connection is the source of the TCP
      connection accepted as being from the indicated DNS name.

      It is important to note that although two DNS queries are made
      during the above operation, it is the second one (mapping the
      peer's DNS name back into an IP address) that provides the
      authentication property. The first transaction simply obtains
      the peer's DNS name, but no assumption is made that the returned
      DNS name is correct. Thus, the first DNS query could be
      replaced by an alternate mechanism without weakening the already
      weak authentication check described above. One possible
      alternate mechanism, an ICMP "Who Are You" message, is described
      below.

   3) Applications that log all incoming network connections (e.g.,
      anonymous FTP servers) may prefer logging recognizable DNS names
      to addresses.

   4) Network administrators examining logs or other trace data
      containing addresses may wish to determine the DNS name of some
      addresses. Note that this may occur sometime after those
      addresses were actually used.

   The following subsections describe techniques for mapping a full IPv6
   address back into some quantity (e.g., a DNS name or locator). We
   include these descriptions for completeness even though they do not
   address the fundamental problem of how to perform the mapping on an
   identifier alone. It should also be noted that because both
   techniques operate on complete IPv6 addresses, they are both directly
   applicable to provider-based addressing schemes and are not specific
   to GSE.
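
   The two-step "poor-man's" check described in item 2 above can be
   sketched as follows. This is a minimal illustration only: the
   function name and the two lookup callables are hypothetical, the
   lookups are backed by toy tables rather than the real DNS, and
   addresses are assumed to be compared in a single normalized textual
   form.

```python
from typing import Callable, Iterable, Optional


def forward_confirmed_name(
    peer_addr: str,
    reverse_lookup: Callable[[str], Optional[str]],
    forward_lookup: Callable[[str], Iterable[str]],
) -> Optional[str]:
    """Return the peer's DNS name only if that name maps back to peer_addr."""
    name = reverse_lookup(peer_addr)  # step 1: an untrusted hint
    if name is None:
        return None
    if peer_addr in set(forward_lookup(name)):  # step 2: the actual check
        return name
    return None


# Toy tables standing in for the DNS (documentation names and addresses).
REVERSE = {
    "2001:db8:1:2::3": "host.example.com",
    "2001:db8:9::9": "host.example.com",  # an impostor claiming the name
}
FORWARD = {"host.example.com": ["2001:db8:1:2::3"]}


def lookup_rev(addr: str) -> Optional[str]:
    return REVERSE.get(addr)


def lookup_fwd(name: str) -> Iterable[str]:
    return FORWARD.get(name, [])


print(forward_confirmed_name("2001:db8:1:2::3", lookup_rev, lookup_fwd))
print(forward_confirmed_name("2001:db8:9::9", lookup_rev, lookup_fwd))
```

   Because only the second lookup carries the check, the reverse_lookup
   callable could be replaced by any other source of a candidate name
   without changing the strength of the result.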

D.1: DNS-Like Reverse Mapping of Full GSE Addresses

Although a global-scale reverse mapping of ESDs seems infeasible,
within a site it may be feasible to maintain a database keyed on
unstructured 8-byte ESDs. However, it is an open question whether
such a database can be kept up-to-date at reasonable cost, without
making unreasonable assumptions as to how large sites are going to
grow, and how frequently ESD registrations will be made or updated.
Note that the issue isn't just the physical database itself, but the
operational issues involved in keeping it up-to-date. For the rest
of this section, however, let us assume that such a database can be
built.

A mechanism supporting a lookup keyed on a flat-space ESD from an
arbitrary site requires having sufficient structure to identify the
site that needs to be queried. In practice, since the Routing Stuff
is organized hierarchically, if an ESD is always used in conjunction
with Routing Stuff (i.e., a full 16-byte address), it becomes
feasible to maintain a DNS-like tree that maps full GSE addresses
into DNS names, in a fashion analogous to what is done with IPv4 PTR
records today.

It should be noted that a GSE address lookup will work only if the
Routing Stuff portion of the address is correctly entered in the DNS
tree. Because the Routing Stuff portion of an address is expected to
change over time, this assumption will not hold indefinitely. As a
consequence, a packet trace recorded in the past might not contain
enough information to identify the off-Site sources of the packets in
the present. This problem can be addressed by requiring that the
database of RG delegations be maintained, together with accurate
timing information, for some period of time after the RG is no longer
usable for routing packets.
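
One way to picture a delegation database that keeps timing
information is a table of validity intervals that is consulted with
both an address and the time at which the address was observed. The
sketch below is an assumption-laden illustration, not a proposed
format: the Delegation record, its field names, and the "site" label
are all hypothetical, and an RG is modeled simply as an IPv6 prefix.

```python
from dataclasses import dataclass
from datetime import datetime
from ipaddress import IPv6Address, IPv6Network
from typing import List, Optional


@dataclass(frozen=True)
class Delegation:
    prefix: IPv6Network              # routing prefix (RG) handed to a site
    site: str                        # hypothetical label for that site
    valid_from: datetime             # when the delegation took effect
    valid_until: Optional[datetime]  # None while the RG is still in use


class DelegationHistory:
    """Keeps expired delegations so that old traces stay interpretable."""

    def __init__(self) -> None:
        self._records: List[Delegation] = []

    def add(self, record: Delegation) -> None:
        self._records.append(record)

    def site_at(self, address: str, when: datetime) -> Optional[str]:
        """Return the site that held this address at the given time."""
        addr = IPv6Address(address)
        matches = [
            r for r in self._records
            if addr in r.prefix
            and r.valid_from <= when
            and (r.valid_until is None or when < r.valid_until)
        ]
        if not matches:
            return None
        # Prefer the most specific delegation covering the address.
        return max(matches, key=lambda r: r.prefix.prefixlen).site
```

With such a table, an address taken from an old packet trace is
resolved against the delegation that was in force when the packet was
captured, rather than against whoever holds the prefix today.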

Finally, it should be noted that the problem where an address's RG
"expires", with the implication that the mapping of "expired"
addresses into DNS names may no longer hold, is not a problem specific
to the GSE proposal. With provider-based addressing, the same issue
arises when a site renumbers into a new provider prefix and releases
the allocation from a previous block. The authors are aware of one
such renumbering incident in IPv4 where a block of returned addresses
was reassigned and reused within 24 hours of the renumbering event.

D.2: The ICMP Who-Are-You Message

There is widespread agreement on the utility of being able to
determine the DNS name one is communicating with from the address
being used. In addition to the fact that DNS names are more
meaningful to human users and more stable than addresses, many users
use this reverse mapping as part of a poor-man's authentication for
the remote peer; if one can map the obtained DNS name back to the
same address, one has an increased confidence of the peer being a
legitimate one.

In practice, however, the IN-ADDR.ARPA domain is not fully populated
and is poorly maintained. Consequently, an old proposal to define an
ICMP Who-Are-You message was resurrected [RFC1788]. A client would
send such a message to a peer, and that peer would return an ICMP
message containing its DNS name. Asking a remote host to supply its
own name in no way implies that the returned information is accurate.
However, having a remote peer provide a piece of information that a
client can use as input to a separate authentication procedure
provides a starting point for performing strong authentication. The
actual strength of the authentication depends on the authentication
procedure invoked, rather than the untrustworthy piece of information
provided by a remote peer.

Reconsidering the "cheap" authentication procedure described earlier,
the ICMP Who-Are-You replaces the DNS PTR query used to obtain the
DNS name of a remote peer. The second DNS query, to map the DNS name
back into a set of addresses, would be performed as before. Because
the latter DNS query provides the strength of the authentication, the
use of an ICMP Who-Are-You message does not in any way weaken the
strength of the authentication method. Indeed, it can only make it
more useful in practice, because virtually all hosts can be expected
to implement the Who-Are-You message.

The Who-Are-You message has advantages outside the context of GSE as
well, including a more decentralized, and hence more scalable,
administration and easier upkeep than a DNS reverse-lookup zone. It
also has drawbacks: it requires the target node to be up and
reachable at the time of the query and to know its fully qualified
domain name. It is also not possible to resolve addresses once those
addresses become unroutable. In contrast, the DNS PTR mirrors, but
is independent of, the routing hierarchy. The DNS can maintain
mappings long after the routing subsystem stops delivering packets to
certain addresses.

The requirement that the target node be up and reachable at the time
of the query makes it very uncertain that one would be able to take
addresses from a packet log and translate them to correct domain
names at a later time. One can argue that this is a design flaw in
the logging system, as it violates the architectural principle,
"Avoid any design that requires addresses to be ... stored on
non-volatile storage" [RFC1958]. A better-designed system would look
up domain names promptly from logged addresses. Indeed, one of the
authors has been doing that for some years.
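
As a rough illustration of looking up names promptly, a logger can
resolve the peer's name at the moment the entry is written and store
the name next to the address. The sketch below assumes a hypothetical
format_log_entry helper and a caller-supplied resolve function (a DNS
lookup, a Who-Are-You exchange, or anything else). The JSON layout is
an arbitrary choice made for this example.

```python
import json
from typing import Callable, Optional


def format_log_entry(
    peer_address: str,
    resolve: Callable[[str], Optional[str]],
    timestamp: float,
) -> str:
    """Build one log line, resolving the name now rather than later."""
    # The name is looked up while the peer is (presumably) still
    # reachable and its reverse mapping is still current.
    name = resolve(peer_address)
    return json.dumps(
        {"time": timestamp, "address": peer_address, "name": name}
    )
```

The address is still recorded, but a later reader of the log no
longer depends on the address remaining resolvable.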