Internet Research Task Force                             Patrick Frejborg
Internet Draft                                             April 19, 2011
Intended status: Experimental
Expires: October 2011

                      Hierarchical IPv4 Framework
                      draft-frejborg-hipv4-14.txt

Abstract

This document describes a framework for dividing the current IPv4 address space into two new address categories: a core address space (Area Locators, ALOC) that is globally unique, and an edge address space (Endpoint Locators, ELOC) that is regionally unique. In the future, the ELOC space will be significant only within a private network or a service provider domain. The result is a 32x32 bit addressing scheme and a hierarchical routing architecture. The hierarchical IPv4 framework is backwards compatible with the current IPv4 Internet.
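As a rough illustration of the 32x32 bit scheme, the Python sketch below treats an ALOC and an ELOC as one 64-bit value. The class name `LocatorPair` and the choice of placing the ALOC in the upper 32 bits are assumptions made only for this example; see section 9.3 for the actual header extensions. The addresses are taken from the RFC 5737 documentation ranges.

```python
import ipaddress
from typing import NamedTuple

IPv4 = ipaddress.IPv4Address


class LocatorPair(NamedTuple):
    """Hypothetical (ALOC, ELOC) pair: two 32-bit locators, 64 bits in total."""

    aloc: IPv4  # core locator: globally unique, allocated from the GLB
    eloc: IPv4  # edge locator: unique only within its region or ALOC realm

    def to_int(self) -> int:
        # Assumed ordering for illustration: ALOC in the upper 32 bits.
        return (int(self.aloc) << 32) | int(self.eloc)

    @classmethod
    def from_int(cls, value: int) -> "LocatorPair":
        if not 0 <= value < 1 << 64:
            raise ValueError("value does not fit in 64 bits")
        return cls(IPv4(value >> 32), IPv4(value & 0xFFFFFFFF))


# One ELOC reused behind two different ALOCs still gives two distinct
# 64-bit locations, which is why ELOCs need not be globally unique.
eloc = IPv4("192.0.2.10")
site_a = LocatorPair(IPv4("198.51.100.1"), eloc)
site_b = LocatorPair(IPv4("203.0.113.1"), eloc)

print(f"{site_a.to_int():016x}")  # c6336401c000020a
assert site_a.to_int() != site_b.to_int()
assert LocatorPair.from_int(site_b.to_int()) == site_b
```

The sketch only shows why two 32-bit locators can locate an endpoint even when the ELOC alone is reused elsewhere; it says nothing about how the pair is encoded on the wire.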
This document also discusses a method for decoupling the location and identifier functions; future applications can make use of this separation. The framework requires extensions to the existing Domain Name System, to the existing IPv4 stack of the endpoints, to middleboxes, and to routers in the Internet. The framework can be implemented incrementally for endpoints, DNS, middleboxes, and routers.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

This Internet-Draft will expire on October 19, 2011.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Table of Contents

1. Requirements Notation
2. Introduction
3. Definitions of Terms
4. Hierarchical Addressing
5. Intermediate Routing Architecture
   5.1. Overview
   5.2. Life of a hIPv4 Session
6. Long-term Routing Architecture
   6.1. Overview
   6.2. Exit-, DFZ-, and Approach Routing
7. Decoupling Location and Identification
8. ALOC Use Cases
9. Mandatory Extensions
   9.1. Overview
   9.2. DNS Extensions
   9.3. Extensions to the IPv4 Header
10. Consequences
   10.1. Overlapping Local and Remote ELOC Prefixes/Ports
   10.2. Large Encapsulated Packets
   10.3. Affected Applications
   10.4. ICMP
   10.5. Multicast
11. Traffic Engineering Considerations
   11.1. Valiant Load-Balancing
12. Mobility Considerations
13. Transition Considerations
14. Security Considerations
15. IANA Considerations
16. Conclusions
17. References
   17.1. Normative References
   17.2. Informative References
18. Acknowledgments
Appendix A. Short Term and Future IPv4 Address Allocation Policy
Appendix B. Multi-homing becomes Multi-pathing
Appendix C. Incentives and Transition Arguments
Appendix D. Integration with CES Architectures

1. Requirements Notation

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in [RFC2119].

2. Introduction

A Locator/Identifier Separation Protocol [LISP] presentation from a breakout session at an expo held in January 2008 triggered a research study; the findings from the study are described in this document. Further studies revealed that the routing community at the IETF is concerned about the scalability of the routing and addressing system of the future Internet. The Internet Architecture Board (IAB) held a Routing and Addressing workshop on October 18-19, 2006, in Amsterdam; the outcome of the workshop is documented in [RFC4984]. In 2007, the IRTF established a Routing Research Group [RRG], which created a set of design guidelines; see [RRG_Design_Goals].

The author of this document found the LISP approach very interesting because it proposes separating the IP address space into two groups: Routing Locators (RLOC), which are present in the global routing table of the Internet, called the Default-Free Zone (DFZ), and Endpoint Identifiers (EID), which are present only in edge networks attached to the Internet.
The proposed LISP architecture reduces the routing information in the DFZ, but it also introduces a new mapping system that would require a caching solution at the border routers installed between the edge networks and the DFZ. EID prefixes are not needed in the DFZ, since a tunneling (overlay) scheme is applied between the border routers. To the author, this seems to be a complex architecture that could be improved by applying lessons learned from similar past architectures. In the 1990s, overlay architectures deployed on top of Frame Relay and ATM technologies were common, and cache-based routing architectures, for example Ipsilon's IP Switching, were also tried. These architectures have largely been replaced by MPLS [RFC3031] for several reasons, one being that overlay and caching solutions have historically suffered from scalability issues. Technology has certainly evolved since the 1990s, and the scalability issues of overlay and caching solutions may prove less relevant for modern hardware and new methods; see [Revisiting_Route_Caching].

Nevertheless, based upon lessons learned from past overlay and caching architectures, the author has some doubts about whether overlay and caching will scale well. The hierarchical IPv4 framework proposal arose from the question of whether the edge and core IP address groupings from LISP could be used without creating an overlay solution, by borrowing ideas from MPLS to develop a peer-to-peer architecture. That is, instead of tunneling, why not swap IP addresses (hereafter called locators) on a node in the DFZ? By adding a shim header to the IPv4 header and introducing Realm Border Router (RBR) functionality in the network, the edge locators are no longer needed in the routing table of the DFZ.
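To make the routing-table argument concrete, the following Python sketch models a DFZ RIB that, as proposed here, contains only core locators, and performs a longest-prefix-match lookup on the IP destination field. The RIB contents, the next-hop labels, and the `dfz_lookup` function are illustrative assumptions, not part of the framework; the addresses come from the RFC 5737 documentation ranges.

```python
import ipaddress
from typing import Dict, Optional

Prefix = ipaddress.IPv4Network

# Illustrative DFZ RIB holding only core locators (ALOC /32 prefixes).
# Addresses are RFC 5737 documentation values; next-hop labels are made up.
DFZ_RIB: Dict[Prefix, str] = {
    Prefix("198.51.100.1/32"): "ISP1 ALOC realm",
    Prefix("203.0.113.1/32"): "ISP2 ALOC realm",
}


def dfz_lookup(rib: Dict[Prefix, str], destination: str) -> Optional[str]:
    """Longest-prefix match on the IP destination field; None means no route."""
    addr = ipaddress.IPv4Address(destination)
    matches = [net for net in rib if addr in net]
    if not matches:
        return None
    return rib[max(matches, key=lambda net: net.prefixlen)]


# The IP destination field carries the remote ALOC, so the DFZ can forward it.
print(dfz_lookup(DFZ_RIB, "203.0.113.1"))  # ISP2 ALOC realm
# An edge locator (ELOC) is not installed in the DFZ, so it has no route here.
print(dfz_lookup(DFZ_RIB, "192.0.2.10"))   # None
```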
Two architectural options existed for assembling the packet so that RBR functionality can be applied in the DFZ: the packet could be assembled either by an ingress network node (similar to LISP or MPLS) or by the endpoint itself. The major drawback of assembling the packet with a shim header at the endpoint is that the endpoints' stack must be upgraded; a significant advantage, however, is that the Path MTU Discovery issue discussed in, e.g., LISP does not exist. In addition, the caching scalability issue is mitigated to the greatest extent possible by pushing caching to the endpoint.

This approach also opened up the possibility of extending the current IP address scheme with a new dimension. In an MPLS network, overlapping IP addresses are allowed, since the forwarding plane uses label information from the MPLS shim header. Similarly, by applying RBR functionality, extending the current IPv4 header with a shim header, and assembling the new header at the endpoints, an IP network can carry packets with overlapping edge locators, although the core locators must still be globally unique. The location of an endpoint is then no longer described by a single address space; it is described by a combination of an edge locator and a core locator, or a set of core locators.

Later on, it was determined that the current 32-bit address scheme can be extended to 64 bits: 32 bits reserved for globally unique core locators and 32 bits reserved for locally unique edge locators.

The new 64-bit addressing scheme is backwards compatible with the currently deployed Internet addressing scheme.

By making the architectural decisions described above, the foundation for the hierarchical IPv4 framework was laid out.

Note that the hierarchical IPv4 framework is abbreviated as hIPv4, which is close to the abbreviation of the Host Identity Protocol, HIP [RFC4423].
Thus, the reader needs to pay attention to the use of the two abbreviations, hIPv4 and HIP, which represent two different architectures.

Use of the hIPv4 abbreviation has caused much confusion, but it was chosen for two reasons:

o  Hierarchical - to emphasize that a hierarchical addressing scheme is developed, so that a formalized hierarchy is achieved in the routing architecture. Some literature describes today's Internet as already using hierarchical addressing. The author believes that this claim is not valid: today's Internet uses one flat address space.

   It is true that hierarchical routing is in place. A routing architecture can consist of at least three types of areas: stub area, backbone area, and autonomous system (AS). The current flat address space is summarized or aggregated at the border routers between the areas to reduce the size of the routing tables. To summarize or aggregate prefixes, the address space must be contiguous across the areas.

   Thus, the author concludes that the current method is best described as an aggregating addressing scheme, since there are address block dependencies between the areas. Dividing addresses into edge and core locator spaces (a formalized hierarchy) opens up a new dimension: the edge locator space can still be deployed as an aggregating address scheme in the three types of areas mentioned earlier, while in hIPv4 the core locators are combined with edge locators independently of each other. The allocation policies of the two locator spaces are separated, and no dependencies exist between the two addressing schemes in the long-term architecture.

   A new hierarchical addressing scheme is achieved: a two-level addressing scheme describing both how the endpoint is attached to the local network and how the endpoint is attached to the Internet. This change in the addressing scheme will enable a
This change in the addressing scheme will enable a 231 fourth level, called Area Locator (ALOC) realm, at the routing 232 architecture. 234 o IPv4 - to emphasize that the framework is still based upon the 235 IPv4 addressing scheme, and is only an evolution from the 236 currently deployed addressing scheme of the Internet 238 While performing this research study, the author reviewed a previous 239 hierarchical addressing and routing architecture that had been 240 proposed in the past, the Extended Internet Protocol, EIP [RFC1385]. 241 Should the hIPv4 framework ever be developed from a research study to 242 a standard RFC, it is recommended that the hierarchical IPv4 243 framework name be replaced with Extended Internet Protocol, EIP, 244 since both architectures share similarities, e.g. backwards 245 compatibility with existing deployed architecture, hierarchical 246 addressing etc. and the hIPv4 abbreviation can be mixed up with HIP. 248 This document is an individual contribution to the IRTF Routing 249 Research Group (RRG); discussions between those on the mailing list 250 of the group have influenced the framework. The views in this 251 document are considered controversial by the IRTF Routing Research 252 Group (RRG), but the group reached a consensus that the document 253 should still be published. Since consensus was not achieved at RGG 254 regarding which proposal should be preferred - as stated in 255 [RFC6115]: "The group explored a number of proposed solutions but did 256 not reach consensus on a single best approach" - thus all proposals 257 produced within RRG can be considered controversial. 259 3. Definitions of Terms 261 This document makes use of the following terms: 263 Regional Internet Registry (RIR): 265 This is an organization overseeing the allocation and registration 266 of Internet number resources within a particular region of the 267 world. Resources include IP addresses (both IPv4 and IPv6) and 268 autonomous system numbers. 
Locator:

   A locator is a name for a point of attachment within the topology at a given layer. Objects that change their point(s) of attachment will need to change their associated locator(s).

Global Locator Block (GLB):

   An IPv4 address block that is globally unique.

Area Locator (ALOC):

   An IPv4 address (/32) assigned to locate an ALOC realm in the Internet. The ALOC is assigned by an RIR to a service provider. The ALOC is globally unique because it is allocated from the GLB.

Endpoint Locator (ELOC):

   An IPv4 address assigned to locate an endpoint in a local network. The ELOC block is assigned by an RIR to a service provider or to an enterprise. In the intermediate routing architecture, the ELOC block is unique only within a geographical region; the final uniqueness policy shall be defined by the RIRs. In the long-term routing architecture, the ELOC block is no longer assigned by an RIR; it is unique only within the local ALOC realm.

ALOC realm:

   An area in the Internet with at least one attached Realm Border Router (RBR) and an assigned ALOC. The Routing Information Base (RIB) of an ALOC realm holds both local ELOC prefixes and global ALOC prefixes. An ALOC realm exchanges only ALOC prefixes with other ALOC realms.

Realm Border Router (RBR):

   A router or node that is able to identify and process the hIPv4 header. In the intermediate routing architecture, the RBR shall be able to provide a service, that is, to swap the prefixes in the IP header and the locator header, and then forward the packet according to the value in the destination address field of the IP header. In the long-term routing architecture, the RBR is not required to provide the swap service; it can instead make use of the Forwarding Indicator (FI) field in the locator header. Once the FI bits are
Once the FI-bits are 314 processed the RBR forwards the packet according to the value in 315 destination address of the IP header or according to the value in 316 the ELOC field of the locator header. The RBR must have the ALOC 317 assigned as its locator. 319 Locator Header: 321 A 4-byte or 12-byte field, inserted between the IP header and 322 transport protocol header. If an identifier/locator split scheme 323 is used, the size of the locator header is further expanded. 325 Identifier: 327 An identifier is the name of an object at a given layer. 328 Identifiers have no topological sensitivity, and do not have to 329 change, even if the object changes its point(s) of attachment 330 within the network topology. 332 Identifier/locator split scheme: 334 Separate identifiers used by applications from locators that are 335 used for routing. The exchange of identifiers can occur discreetly 336 between endpoints that have established a session, or the 337 identifier/locator split can be mapped at a public database. 339 Session: 341 A session is an interactive information exchange between endpoints 342 that is established at a certain time and torn down at a later 343 time. 345 Provider Independent Address Space (PI addresses/prefixes): 347 An IPv4 address block that is assigned by a Regional Internet 348 Registry directly to a user organization. 350 Provider Aggregatable Address Space (PA addresses/prefixes): 352 An IPv4 address block assigned by a Regional Internet Registry to 353 an Internet Service Provider which can be aggregated into a single 354 route advertisement. 356 Site mobility: 358 A site wishes to change its attachment point to the Internet 359 without changing its IP address block. 361 Endpoint mobility: 363 An endpoint moves relatively rapidly between different networks, 364 changing its IP layer network attachment point. 
Subflow:

   A flow of packets operating over an individual path, the flow forming part of a larger transport protocol connection.

4. Hierarchical Addressing

The current IP addressing (IPv4) and future addressing (IPv6) schemes of the Internet are unidimensional by nature. This limitation has created roadblocks to the further growth of the Internet, for example broken end-to-end connectivity due to NAT and limited deployment of SCTP [RFC4960].

If we compare the Internet's current addressing schemes to other global addressing or location schemes, we notice that the other schemes use several levels in their structures. For example, the postal system uses street address, city, and country to locate a destination. To locate a geographical site, the cartography system uses longitude and latitude. The other global network, the Public Switched Telephone Network (PSTN), has been built upon a three-level numbering scheme that has enabled a hierarchical signaling architecture. By expanding the current IPv4 addressing scheme from a single level to a two-level addressing structure, most of the issues discussed in [RFC4984] can be solved. Also, a hierarchical addressing scheme would better describe the Internet we have in place today.

Looking back, it seems that the architecture of the Internet changed quite radically from the intended architecture with the introduction of [RFC1918], which divides hosts into three categories and the address space into two categories: globally unique and private address spaces. This idea allowed for further growth of the Internet, extended the life of the IPv4 address space, and became much more successful than expected. However, RFC1918 did not solve the
RFC1918 didn't solve the 401 multi-homing requirements for endpoints providing services for 402 Internet users, that is, multi-homed sites with globally unique IP 403 addresses at endpoints to be accessed from the Internet. 405 Multi-homing has imposed some challenges for the routing architecture 406 that [RRG] is addressing in [RFC6115]. Almost all proposals in the 407 report suggest a core and edge locator separation or elimination to 408 create a scalable routing architecture. The core locator space can be 409 viewed to be similar to the globally unique address space, and the 410 edge locator space similar to the private address space in RFC1918. 412 RFC1918 has already demonstrated that Internet scales better with the 413 help of categorized address spaces, that is, globally unique and 414 private address spaces. The RRG proposals suggest that the Internet 415 will be able to scale even further by introducing core and edge 416 locators. Why not then change the addressing scheme (both IPv4 and 417 IPv6 addressing schemes, though this document is only focusing on 418 IPv4) to better reflect the current and forthcoming Internet routing 419 architecture? If we continue to use a flat addressing scheme, and 420 combine it with core (global) and edge (private) locator (address) 421 categories, the routing architecture will have to support additional 422 mechanisms, such as NAT, tunneling, or locator rewriting with the 423 help of an identifier to overcome the mismatch. The result will be 424 that information is lost or hidden for the endpoints. With a two- 425 level addressing scheme, these additional mechanisms can be removed 426 and core/edge locators can be used to create new routing and 427 forwarding directives. 429 A convenient way to understand the two-level addressing scheme of the 430 hIPv4 framework is to compare it to the PSTN numbering scheme 431 (E.164), which uses country codes, national destination codes, and 432 subscriber numbers. 
The Area Locator (ALOC) prefix in the hIPv4 433 addressing scheme can be considered similar to the country code in 434 PSTN, i.e., the ALOC prefix locates an area in the Internet called an 435 ALOC realm. The Endpoint Locator (ELOC) prefixes in hIPv4 can be 436 compared to the subscriber numbers in PSTN - the ELOC is regionally 437 unique (in the future, locally unique) at the attached ALOC realm. 438 The ELOC can also be attached simultaneously to several ALOC realms. 440 By inserting the ALOC and ELOC elements as a shim header (similar to 441 the MPLS and [RBridge] architectures) between the IPv4 header and the 442 transport protocol header, a hIPv4 header is created. From the 443 network point of view, the hIPv4 header "looks and feels like" an 444 IPv4 header, thus fulfilling some of the goals as outlined in EIP and 445 in the early definition of [Nimrod]. The outcome is that the current 446 forwarding plane does not need to be upgraded, though some minor 447 changes are needed in the control plane (e.g., ICMP extensions). 449 5. Intermediate Routing Architecture 451 The intermediate routing architecture is backwards compatible with 452 the current deployed Internet, that is, the forwarding plane remains 453 intact except that the control plane needs to be upgraded to support 454 ICMP extensions. The endpoints' stack needs to be upgraded, and 455 middleboxes need to be upgraded or replaced. In order to speed up the 456 transition phase, middleboxes might be installed in front of 457 endpoints so that their stack upgrade can be postponed; for further 458 details see Appendix D. 460 5.1. Overview 462 As mentioned in previous sections, the role of an Area Locator (ALOC) 463 prefix is similar to a country code in PSTN - the ALOC prefix 464 provides a location functionality of an area within an Autonomous 465 System (AS), or an area spanning over a group of ASes, in the 466 Internet. An area can have several ALOC prefixes assigned, e.g. 
for traffic engineering purposes such as load balancing among several ingress/egress points of the area. The ALOC prefix is used for routing and forwarding purposes on the Internet, so it must be globally unique; it is therefore allocated from a globally unique IPv4 address block called the Global Locator Block (GLB).

When an area within an AS (or a group of ASes) is assigned an ALOC prefix, the area has the potential to become an ALOC realm. To establish an ALOC realm, more elements than just the ALOC prefix are needed. One or more Realm Border Routers (RBRs) must be attached to the ALOC realm. An RBR is a node capable of swapping the prefixes of the IP header and of the new shim header, called the locator header. The swap service is described in detail in section 5.2, step 3.

Today's routers do not support this RBR functionality. Therefore, the new functionality will most likely be developed on an external device attached to a router belonging to the ALOC realm. The external RBR might be a server with two interfaces attached to a router, the first interface configured with the ALOC prefix and the second with any IPv4 prefix. The RBRs do not make use of dynamic routing protocols, so neither a Forwarding Information Base (FIB) nor a cache is needed: the RBR performs a service, swapping headers.

The swap service is applied on a per-packet basis, and the information needed to carry out the swap is included in the locator header of the hIPv4 packet. Thus, a standalone device with sufficient computing and I/O resources to handle the incoming traffic can take the role of an RBR. Later on, the RBR functionality might be integrated into the forwarding plane of a router.
It is expected that one RBR will not be able to handle all the incoming traffic destined for an ALOC realm, and a single RBR would also create a potential single point of failure in the network. Therefore, several RBRs might be installed in the ALOC realm. The RBRs shall use the ALOC prefix as their locator, and the routers announce the ALOC prefix as an anycast locator within the local ALOC realm. The ALOC prefix is advertised throughout the DFZ by BGP. The placement of the RBRs in the network will influence the ingress traffic to the ALOC realm.

Since the forwarding paradigm of multicast packets is quite different from that of unicast packets, multicast functionality will have an impact on the RBR. Because multicast RBR (mRBR) functionality is not available on today's routers, an external device is needed; later on, the functionality might be integrated into the routers. The mRBR shall take the role of an anycast Rendezvous Point with MSDP [RFC3618] and PIM [RFC4601] capabilities, but neither a FIB nor a cache is required to swap headers. As with the RBR, multicast hIPv4 packets carry in their headers all the information needed for the swap service; for details, see section 10.5.

At this point, the ALOC realm is not yet fully constructed. The ALOC realm can now be located on the Internet, but locating the endpoints attached to the ALOC realm requires another element, the Endpoint Locator (ELOC). As mentioned in the previous section, ELOC prefixes can be considered similar to the subscriber numbers in the PSTN. The ELOC is not a new element but a redefinition of the current IPv4 address configured at an endpoint. The term redefinition is used because, when the hIPv4 framework is fully implemented, the IPv4 addresses are no longer globally unique.
A more regional allocation policy for IPv4 addresses can then be deployed, as discussed in Appendix A. The ELOC prefix is used for routing and forwarding purposes only inside the local and remote ALOC realms; it is not used in the intermediate ALOC realms.

When an initiator establishes a session to a responder residing outside the local ALOC realm, the value in the destination address field of the IP header of an outgoing packet is no longer the remote destination address (ELOC prefix); instead, the remote ALOC prefix is placed in the destination address field of the IP header. Because the destination address field of the IP header carries an ALOC prefix, the intermediate ALOC realms do not need to install the ELOC prefixes of other ALOC realms in their routing tables. It is sufficient for the intermediate ALOC realms to carry only the ALOC prefixes.

The outcome is that the routing table of each ALOC realm will be reduced when the hIPv4 framework is fully implemented. The ALOC prefixes are still globally unique and must be installed in the DFZ. Thus, a service provider cannot control the growth of the ALOC prefixes, but can control the number of ELOC prefixes in its local ALOC realm.

When the hIPv4 packet arrives at the remote ALOC realm, it is forwarded to the nearest RBR, since the value in the destination address field of the IP header is the remote ALOC prefix. After the RBR has swapped the hIPv4 header, the value in the destination address field of the IP header is the remote ELOC, so the hIPv4 packet is forwarded to its final destination in the remote ALOC realm. An endpoint using an ELOC prefix can be attached simultaneously to two different ALOC realms without deploying a classical multi-homing solution; for details, see section 12 and Appendix B.
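The destination-side swap described above can be sketched in Python as follows. It models only what this section states: on arrival, the IP destination field holds the remote ALOC and the locator header holds the remote ELOC, and the RBR exchanges the two. The class `HIPv4Packet`, its field names, and `rbr_swap` are hypothetical; source-side handling and the real header layout are omitted, and the complete swap procedure is described in section 5.2, step 3.

```python
import ipaddress
from dataclasses import dataclass, replace

IPv4 = ipaddress.IPv4Address


@dataclass(frozen=True)
class HIPv4Packet:
    """Hypothetical, simplified hIPv4 packet; all other header fields omitted."""

    ip_dst: IPv4       # IP header: destination address field
    locator_dst: IPv4  # locator header: destination locator not currently in ip_dst


def rbr_swap(packet: HIPv4Packet, rbr_aloc: IPv4) -> HIPv4Packet:
    """Destination-side swap: uses only fields in the packet, no FIB or cache."""
    if packet.ip_dst != rbr_aloc:
        raise ValueError("packet is not addressed to this RBR's ALOC")
    # Exchange the two destination locators: the remote ELOC moves into the
    # IP header, and the ALOC is kept in the locator header.
    return replace(packet, ip_dst=packet.locator_dst, locator_dst=packet.ip_dst)


aloc2 = IPv4("203.0.113.1")  # remote ALOC (illustrative)
eloc4 = IPv4("192.0.2.44")   # remote ELOC (illustrative)
arriving = HIPv4Packet(ip_dst=aloc2, locator_dst=eloc4)

delivered = rbr_swap(arriving, rbr_aloc=aloc2)
print(delivered.ip_dst)  # 192.0.2.44
```

Because the swap needs nothing beyond the two fields already in the packet, the sketch keeps no state between calls, which mirrors the statement that an RBR needs neither a FIB nor a cache.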

   Because the addressing structure is no longer one-dimensional and a second level of hierarchy has been added, two problems must be solved: locating the remote ELOC (endpoint) and the remote ALOC realm on the Internet, and determining where the header of the hIPv4 packet is assembled. The hierarchical IPv4 framework relies upon the Domain Name System, which needs to support a new record type so that ALOC information can be distributed to the endpoints. The header of the hIPv4 packet can be constructed either by the endpoint or by an intermediate node (e.g., a proxy). A proxy solution is likely to be suboptimal because of the complexity induced by the proxy's need to listen to DNS messages, and a cache-based solution has scalability issues.

   A better solution is to extend the current IPv4 stack at the endpoints so that the ALOC and ELOC elements are incorporated into the endpoint's stack; however, backwards compatibility must be preserved. Most applications will not be aware of the extensions, but IP-aware applications such as Mobile IP, SIP, IPsec AH, and so on (see section 10.3) cannot be used outside their ALOC realm once the hIPv4 framework is fully implemented, unless they are upgraded. The reason is that IP-aware applications depend upon the underlying network addressing structure, e.g., to identify an endpoint.

   Note that applications used inside the local ALOC realm (e.g., an enterprise's private network) do not need to be upgraded in either the intermediate or the long-term routing architecture. The classical IPv4 framework is preserved: only IP-aware applications used between ALOC realms need to be upgraded to support the hIPv4 header.

   Figure 1 shows a conceptual overview of the intermediate routing architecture.
   When this architecture is in place, the ELOC space is no longer globally unique; instead, a regional allocation policy can be implemented. For further details, see Appendix A. The transition from the current routing architecture to the intermediate routing architecture is discussed in Appendix D.

      Legend: UER=Unique ELOC region
              *=attachment point in the ALOC realm
              EP=Endpoint

      |-------------------------------------------------------------|
      |           UER1           |       |           UER2           |
      |-------------------------------------------------------------|
      | Enterprise1 |    ISP1    |  ISP  |    ISP2    | Enterprise2 |
      | ALOC Realm  | ALOC Realm | Tier1 | ALOC Realm | ALOC Realm  |
      |             |            |       |            |             |
      | *EP         | *RBR       |       | *RBR       | *EP         |
      |  ELOC1      |  ALOC1     |       |  ALOC2     |  ELOC4      |
      |             |            |       |            |             |
      |             | *EP        |       | *EP        |             |
      |             |  ELOC2     |       |  ELOC3     |             |
      |             |            |       |            |             |
      |-------------|xxxxxxxxxxxxxx DFZ xxxxxxxxxxxxxx|-------------|
      |     RIB     |    RIB     |  RIB  |    RIB     |     RIB     |
      |             |            |       |            |             |
      |   ALOC1     |  ALOC1     | ALOC1 |  ALOC2     |   ALOC2     |
      |   ELOC1     |  ALOC2     | ALOC2 |  ALOC1     |   ELOC4     |
      |             |  ELOC2     |       |  ELOC3     |             |
      |             |  ELOC1     |       |  ELOC4     |             |
      |             |            |       |            |             |
      |-------------------------------------------------------------|

             Figure 1: Intermediate routing architecture of hIPv4

5.2. Life of a hIPv4 Session

   This section provides an example of a hIPv4 session between two hIPv4 endpoints: an initiator and a responder residing in different ALOC realms.

   When the hIPv4 stack assembles a packet for transport, it shall decide whether a classical IPv4 header or a hIPv4 header is used, based on the ALOC information received in a DNS reply. If the initiator's local ALOC prefix equals the responder's ALOC prefix, there is no need to use the hIPv4 header for routing purposes, because both the initiator and the responder reside in the local ALOC realm.
   The packet is routed according to the prefixes in the IP header, since it will not leave the local ALOC realm. When the local ALOC prefix does not match the remote ALOC prefix, a hIPv4 header must be assembled, because the packet needs to be routed to a remote ALOC realm.

   A session between two endpoints inside an ALOC realm might use the locator header, not for routing purposes, but to make use of Valiant Load-Balancing [VLB] for multipath-enabled transport protocols (see section 11.1) or of an identifier/locator split scheme (see section 7). When making use of VLB, the initiator adds the locator header to the packet and sets the VLB bits to 01 or 11, indicating to the responder and intermediate routers that VLB is requested for the subflow. Because this is an intra-ALOC realm session, there is no need to add the ALOC and ELOC fields to the locator header, so the locator header is 4 bytes long.

   If an identifier/locator split scheme is applied to the session (intra-ALOC or inter-ALOC), the initiator must set the I-bit to 1 and use the Locator Header Length field. Identifier/locator split scheme information is inserted into the locator header after the Locator Header Length field.

   A hIPv4 session is established as follows:

   1. The initiator queries the DNS server. The hIPv4 stack notices that the local and remote ALOCs do not match and therefore must use the hIPv4 header for the session. The hIPv4 stack of the initiator must assemble the packet as follows:

      a. Set the local IP address from the API in the source address field of the IP header.

      b. Set the remote IP address from the API in the ELOC field of the locator header.

      c. Set the local ALOC prefix in the ALOC field of the locator header.

      d. Set the remote ALOC prefix in the destination address field of the IP header.

      e. Set the transport protocol value in the protocol field of the locator header, and set the hIPv4 protocol value in the protocol field of the IP header.

      f. Set the desired parameters in the A-, I-, S-, VLB-, and L-fields of the locator header.

      g. Set the FI-bits of the locator header to 00.

      h. Calculate the IP, locator, and transport protocol header checksums. The transport protocol header checksum calculation does not include the locator header fields. When this is completed, the packet is transmitted.

   2. The hIPv4 packet is routed across the Internet based on the value in the destination address field of the IP header.

   3. The hIPv4 packet reaches the closest RBR of the remote ALOC realm. When the RBR notices that the value in the destination address field of the IP header matches the local ALOC prefix, the RBR must:

      a. Verify that the received packet uses the hIPv4 protocol value in the protocol field of the IP header.

      b. Verify the IP, locator, and transport protocol header checksums. The transport protocol header verification does not include the locator header fields.

      c. Replace the source address in the IP header with the ALOC prefix of the locator header.

      d. Replace the destination address in the IP header with the ELOC prefix of the locator header.

      e. Replace the ALOC prefix in the locator header with the original destination address of the IP header.

      f. Replace the ELOC prefix in the locator header with the original source address of the IP header.

      g. Set the S-field to 1.

      h. Decrease the TTL value by one.

      i. Calculate the IP, locator, and transport protocol header checksums. The transport protocol header calculation does not include the locator header fields.

      j. Forward the packet according to the value in the destination address field of the IP header.

   4. The swapped hIPv4 packet is now routed inside the remote ALOC realm to its final destination, based on the new value in the destination address field of the IP header.

   5. The responder receives the hIPv4 packet.

      a. The hIPv4 stack must verify that the received packet uses the hIPv4 protocol value in the protocol field of the IP header.

      b. Verify the IP, locator, and transport protocol header checksums. The transport protocol header verification does not include the locator header fields.

   6. The hIPv4 stack of the responder must present the following to the extended IPv4 socket API:

      a. The source address of the IP header as the remote ALOC prefix.

      b. The destination address of the IP header as the local IP address.

      c. Verify that the received ALOC prefix of the locator header equals the local ALOC prefix.

      d. The ELOC prefix of the locator header as the remote IP address.

   When the responder's application replies to the initiator, the returning packet follows almost the same steps (1 to 6) as the packet that started the session. In step 1, the responder does not need to perform a DNS lookup, since the received packet provides all the required information.

6. Long-term Routing Architecture

   The long-term routing architecture is established once the forwarding planes of private ALOC realms, or of service provider ALOC realms containing subscribers, are upgraded. The forwarding planes of transit DFZ routers do not need to be upgraded. Why, then, would private network or service provider administrators upgrade their infrastructure? There are two incentives:

   o  The overlay local ALOC exit routing topology (as discussed in section 11) can be replaced by a peer-to-peer local ALOC exit routing topology, which is simpler to operate and thus decreases operational expenditures.

   o  Locator freedom: once the local ALOC realm is upgraded, the enterprise or service provider can use the full 32-bit ELOC address space, removing address space constraints and allowing a well-aggregated routing topology with an overdimensioned ELOC allocation policy.

   When an enterprise or service provider upgrades the forwarding plane in its ALOC realm, its previous PI or PA address space allocation is released back to the RIR to be used for ALOC allocations in the GLB.

6.1. Overview

   The swap service at the RBR was added to the framework to provide a smooth transition from the current IPv4 framework to the hIPv4 framework: it avoids a major upgrade of the current forwarding plane. In the future, the swap service can be left "as is" in the ALOC realm, if preferred, or it can be pushed towards the edge of the ALOC realm as routers are upgraded in their natural lifecycle.

   Once a router needs to be upgraded, for example because of increased bandwidth demand, the new forwarding plane might support IPv4 and hIPv4 forwarding concurrently; the swap service can then be pushed towards the edge and eventually removed from the ALOC realm. This is accomplished by adding an extension to the current routing protocols, both IGP and BGP. When an RBR receives a hIPv4 packet whose destination address field in the IP header matches the local ALOC prefix, the RBR will, instead of performing the tasks defined in section 5.2, step 3, look up the ELOC prefix of the locator header in the FIB. If the next-hop entry is RBR-capable, the packet is forwarded according to the ELOC prefix.
   If the next hop is a classical IPv4 router, the RBR must apply the tasks defined in section 5.2, step 3, and then forward the packet according to the new value in the destination address field of the IP header.

   When all endpoints that need to establish sessions outside the local ALOC realm, and all infrastructure nodes in an ALOC realm, are hIPv4-capable, there is no need to apply the swap service for unicast sessions; forwarding decisions can be based on information in the IP and locator headers. In the local ALOC realm, packets are routed to their upstream anycast or unicast ALOC RBR according to the ALOC prefix in the locator header, i.e., local ALOC exit routing is applied against the local ALOC FIB. In the remote ALOC realm, remote ELOC approach routing is applied against the ELOC FIB.

   Note that the IP and transport protocol headers remain intact (except for the TTL value, since the RBR is a router); only the FI and Locator Header Checksum values in the locator header change between local ALOC exit routing mode and remote ELOC approach routing mode.

   Figure 2 shows a conceptual overview of the long-term hIPv4 routing architecture.

      Legend: UER=Unique ELOC region
              *=attachment point in the ALOC realm
              EP=Endpoint
              aRBR=anycast RBR
              uRBR=unicast RBR

      |-------------------------------------------------------------|
      |    UER1     |    UER2    |       |    UER3    |    UER4     |
      |-------------------------------------------------------------|
      | Enterprise1 |    ISP1    |  ISP  |    ISP2    | Enterprise2 |
      | ALOC Realm  | ALOC Realm | Tier1 | ALOC Realm | ALOC Realm  |
      |             |            |       |            |             |
      | *EP         | *aRBR      |       | *aRBR      | *EP         |
      |  ELOC1      |  ALOC1.1   |       |  ALOC2.1   |  ELOC4      |
      |             |            |       |            |             |
      |             *uRBR        |       |        uRBR*             |
      |             |ALOC1.2     |       |     ALOC2.2|             |
      |             |            |       |            |             |
      |             | *EP        |       | *EP        |             |
      |             |  ELOC2     |       |  ELOC3     |             |
      |             |            |       |            |             |
      |-------------|xxxxxxxxxxxxxx DFZ xxxxxxxxxxxxxx|-------------|
      |     RIB     |    RIB     |  RIB  |    RIB     |     RIB     |
      |             |            |       |            |             |
      |   ALOC1.2   |  ALOC1.1   | ALOC1 |  ALOC2.1   |   ALOC2.2   |
      |   ELOC1     |  ALOC1.2   | ALOC2 |  ALOC2.2   |   ELOC4     |
      |             |  ALOC2     |       |  ALOC1     |             |
      |             |  ELOC2     |       |  ELOC3     |             |
      |             |            |       |            |             |
      |-------------------------------------------------------------|

              Figure 2: Long-term routing architecture of hIPv4

   Also, the swap service for multicast can be removed once the forwarding planes are upgraded in all the ALOC realms involved. The RBR of the source's ALOC realm sets the FI-bits to 11, and a Reverse Path Forwarding (RPF) check is thereafter applied against the ALOC prefix in the locator header. In this case, the IP and transport protocol headers do not change.

   The long-term evolution provides a 32x32 bit locator space. ALOC prefixes are allocated only to service providers, and ELOC prefixes are significant only within a local ALOC realm. An enterprise can use a 32-bit locator space for its private network (the ALOC prefix is rented from the attached ISP), and an ISP can use a 32-bit ELOC space to provide Internet connectivity services to its directly attached customers (residential and enterprise).

6.2. Exit-, DFZ-, and Approach Routing

   This section provides an example of a hIPv4 session between two hIPv4 endpoints: an initiator in an ALOC realm whose forwarding plane has been upgraded to support the hIPv4 framework, and a responder residing in a remote ALOC realm with the classical IPv4 forwarding plane.

   When the forwarding plane of the local ALOC realm has been upgraded, the endpoints must be informed about it: either extensions to DHCP are needed, or the endpoints are manually configured to know that the local ALOC realm is fully hIPv4 compliant.

   A hIPv4 session is established as follows:

   1. The initiator queries the DNS server. The hIPv4 stack notices that the local and remote ALOCs do not match and therefore must use the hIPv4 header for the session. The hIPv4 stack of the initiator must assemble the packet as described in section 5.2, step 1, except for the following:

      g. Set the FI-bits of the locator header to 01.

   2. The hIPv4 packet is routed throughout the local ALOC realm according to the ALOC prefix of the locator header, i.e., local ALOC exit routing is applied.

   3. The hIPv4 packet reaches the closest RBR of the local ALOC realm. When the RBR notices that the ALOC prefix of the locator header matches the local ALOC prefix and the FI-bits are set to 01, the RBR must:

      a. Verify that the received packet uses the hIPv4 protocol value in the protocol field of the IP header.

      b. Verify the IP and locator header checksums.

      c. Set the FI-bits of the locator header to 00.

      d. Decrease the TTL value by one.

      e. Calculate the IP and locator header checksums.

      f. Forward the packet according to the value in the destination address field of the IP header.

   4. The hIPv4 packet is routed to the responder as described in section 5.2, steps 2 to 6, i.e., DFZ routing is applied.

   5. The responder's application responds to the initiator, and the returning packet takes almost the same steps as described in section 5.2, except for the following:

   6. The hIPv4 packet reaches the closest RBR of the initiator's ALOC realm. When the RBR notices that the value in the destination address field of the IP header matches the local ALOC prefix and the FI-bits are set to 00, the RBR must:

      a. Verify that the received packet uses the hIPv4 protocol value in the protocol field of the IP header.

      b. Verify the IP and locator header checksums.

      c. Set the FI-bits of the locator header to 10.

      d. Decrease the TTL value by one.

      e. Calculate the IP and locator header checksums.

      f. Forward the packet according to the ELOC prefix of the locator header.

   7. The hIPv4 packet is routed throughout the initiator's ALOC realm according to the ELOC prefix of the locator header, i.e., remote ELOC approach routing is applied.

   8. The hIPv4 stack of the initiator must present the following to the extended IPv4 socket API:

      a. The source address of the IP header as the remote IP address.

      b. The destination address of the IP header as the local ALOC prefix.

      c. The ALOC prefix of the locator header as the remote ALOC prefix.

      d. The ELOC prefix of the locator header as the local IP address.

7. Decoupling Location and Identification

   The design guidelines and rationale for decoupling location from identification are stated in [RRG_Design_Goals]. Another important influence is the report and presentations from the [Dagstuhl] workshop, which declared that "a future Internet architecture must hence decouple the functions of IP addresses as names, locators, and forwarding directives in order to facilitate the growth and new network-topological dynamisms of the Internet".

   Therefore, identifier elements need to be added to the hIPv4 framework so that future applications can remove their current dependency on the underlying network layer addressing scheme (the local and remote IP address tuple).

   However, there are various ways to implement an identifier/locator split, as discussed in the [ID/loc_Split] presentation at the MobiArch workshop at SIGCOMM 2008. Therefore, the hIPv4 framework does not propose or define a single identifier/locator split solution; a split can be achieved by, for example, a multipath transport protocol or an identifier/locator database scheme such as HIP. A placeholder has been added to the locator header so that identifier/locator split schemes can be integrated into the hIPv4 framework. However, identifier/locator split schemes may raise privacy concerns, as discussed in [Mobility_&_Privacy].

   Multipath transport protocols, such as SCTP and Multipath TCP (MPTCP) [RFC6182], which is currently under development, are the most interesting candidates for enabling an identifier/locator split in the hIPv4 framework. MPTCP is particularly interesting from hIPv4's point of view: one of MPTCP's main goals is backwards compatibility with current implementations, a goal that hIPv4 shares.

   MPTCP itself does not provide an identifier/locator database scheme as HIP does. Instead, MPTCP proposes a token, with local meaning, to manage and bundle subflows under one session between two endpoints. The token can be considered to have the characteristics of a session identifier, providing a generic cookie mechanism for the application layer and creating a session layer between the application and transport layers. Thus a session identifier provides a mechanism to improve mobility in both site and endpoint mobility scenarios.
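To make the session-identifier idea concrete, here is a hypothetical Python sketch in which a locally generated 32-bit token names a session and bundles its subflows. It does not follow MPTCP's actual token derivation or signaling; identifying each subflow end by an (ALOC, ELOC) pair is an assumption of the sketch, and all names are illustrative.

```python
from __future__ import annotations

import secrets
from ipaddress import IPv4Address

# Hypothetical locator of one subflow end: (ALOC prefix, ELOC prefix).
Locator = tuple[IPv4Address, IPv4Address]


class SessionTable:
    """Maps a locally meaningful token to the subflows it bundles."""

    def __init__(self) -> None:
        self._subflows: dict[int, set[tuple[Locator, Locator]]] = {}

    def open(self, local: Locator, remote: Locator) -> int:
        """Create a session with one subflow and return its token."""
        token = secrets.randbits(32)
        while token in self._subflows:  # only local uniqueness is needed
            token = secrets.randbits(32)
        self._subflows[token] = {(local, remote)}
        return token

    def add_subflow(self, token: int, local: Locator, remote: Locator) -> None:
        self._subflows[token].add((local, remote))

    def remove_subflow(self, token: int, local: Locator, remote: Locator) -> None:
        self._subflows[token].discard((local, remote))

    def subflows(self, token: int) -> frozenset[tuple[Locator, Locator]]:
        return frozenset(self._subflows[token])
```

An endpoint that moves to another ALOC realm would add a subflow with its new locators and remove the old one; the token, and hence the session seen by the application, stays the same.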

   If the session identifier improves site and endpoint mobility, and the hierarchical addressing scheme improves routing scalability, why add an identifier/locator database scheme to the hIPv4 framework? Introducing an identifier/locator database scheme, as described in HIP, the Identifier/Locator Network Protocol [ILNP], and Name-Based Sockets [NBS], might ease or remove the locator renumbering dependencies at firewalls that are used to scope security zones, but this approach would fundamentally change the currently deployed security architecture.

   However, combining an identifier/locator database scheme with DNSSEC [RFC4033] is interesting. Today, security zones are scoped by using locator prefixes in the security rule sets. If FQDNs were used in the rule sets instead, renumbering locator prefixes would no longer require changes to the security rule sets in firewalls. Another interesting aspect is that an FQDN is, and needs to be, globally unique, whereas among the locators only the ALOC prefix must be globally unique; ELOC prefixes are only regionally unique and, in the long term, only locally unique. Nevertheless, combining identifier/locator database schemes with security architectures and DNSSEC needs further study.

   An identifier/locator database scheme is needed to provide multi-homing and mobility capabilities for single-path transport protocols such as TCP and UDP. Such a scheme can also be used to build a bidirectional NAT traversal solution, with a locator translation map at the border router consisting of private locator prefixes and public identifiers.

   The hIPv4 routing architecture provides only location information for the endpoints: the ELOC describes how the endpoint is attached to the local network, and the ALOC prefixes describe how the endpoint is attached to the Internet.
   Identifier/locator split schemes are decoupled from the routing architecture; the application layer may or may not make use of an identifier/locator split scheme.

8. ALOC Use Cases

   Several ALOC use cases are explored in this section. As mentioned in section 5.1, an ALOC describes an area of the Internet that can span several Autonomous Systems (ASes); if the area is equal to an AS, the ALOC describes an AS. When the ALOC describes an area, it is hereafter called an anycast ALOC.

   The ALOC can also be used to describe a specific node between two ALOC realms, e.g., a node installed between a private and an ISP ALOC realm, or between two private ALOC realms. In this use case, the ALOC describes an attachment point, e.g., where a private network is attached to the Internet. This ALOC type is hereafter called a unicast ALOC.

   The main difference between the anycast and unicast ALOC types is:

   o  In an anycast ALOC scenario, ELOC routing information is shared between the attached ALOC realms.

   o  In a unicast ALOC scenario, no ELOC routing information is shared between the attached ALOC realms.

   Unicast ALOC functionality should not be deployed between private and ISP ALOC realms in the intermediate routing architecture, because it would require too many locators from the GLB space; instead, unicast ALOC functionality will be used to separate private ALOC realms.

   The ALOC space is divided into two types: a globally unique ALOC space (a.k.a. the GLB) that is installed in the DFZ, and a private ALOC space that is used inside private networks. Private ALOCs use the locator space defined in [RFC1918]; a private ALOC must be unique inside the private network and must not overlap private ELOC prefixes. Only ISPs should be allowed to apply for global ALOC prefixes. For further discussion, see Appendix A.
   ISPs should aggregate global ALOC prefixes as much as possible in order to reduce the size of the routing table in the DFZ.

   When a user logs on to the enterprise's network, the endpoint receives the following locator prefixes via provisioning (e.g., DHCP) or manual configuration:

   o  One ELOC prefix for each network interface.

   o  One private ALOC prefix, for example when:

      *  the enterprise has recently merged with another enterprise and overlapping ELOC spaces exist.

   o  Several private ALOC prefixes, for example when:

      *  the enterprise network spans high-speed long-distance connections. It is well known that a single TCP connection has difficulty sustaining high throughput over such paths for extended periods of time; higher throughput might be achieved by using multiple paths concurrently.

   o  One or several global ALOC prefixes. These ALOCs describe how the enterprise network is attached to the Internet.

   When the user establishes a session to a remote endpoint, DNS is usually used to resolve the remote locator prefixes; DNS returns the ELOC and ALOC prefixes of the remote endpoint. If no ALOC prefixes are returned, a classical IPv4 session is initiated to the remote endpoint. When ALOC prefixes are returned, the initiator compares them with its own local ALOC prefixes (provided via DHCP or manual configuration):

   o  If the remote ALOC prefix is from the private ALOC space, the initiator will use the given private ALOC prefix for the session.

   There are two use cases for designing a network with private ALOC functionality. In the first, the remote endpoint is far away, reachable over high-speed long-distance connections, and a multipath transport protocol should be used to improve the performance of the session.

   The other use case arises when the remote endpoint resides in a network that has recently been merged, and the private ELOC [RFC1918] spaces overlap unless renumbering is applied. One or several unicast ALOC solutions are then needed in the network between the initiator and the responder. For long-distance sessions with no overlapping ELOC prefixes, either anycast or unicast ALOC solutions can be deployed.

   A third use case follows; again, the initiator compares the ALOC prefixes returned by DNS with its own local ALOC prefixes:

   o  If the remote ALOC prefix is from the global ALOC space and does not match the global ALOC prefix given to the initiator, the initiator will use its given global ALOC prefix for the session.

   In this use case, the remote endpoint resides outside the enterprise's private network, and the remote global ALOC prefixes indicate how the remote network is attached to the Internet. When a multipath transport protocol is used, the subflows can be routed to the remote endpoint via separate border routers at both the local and the remote sites, if both are multi-homed. The initiator's egress packets in the local ALOC realm can be identified by the protocol value in the IP header and routed onto an explicit path (e.g., an MPLS LSP or an L2TPv3 tunnel) based on the ALOC prefix in the locator header. In this way, a local ALOC overlay exit routing scheme can be designed. In the long-term routing architecture, the overlay (the tunnel mechanism) can be removed; see section 6.2.

   Figure 3 shows a conceptual diagram of two endpoints having a multipath session over both a VPN connection and the Internet (in the intermediate routing architecture).

      Legend: UER=Unique ELOC region
              *=attachment point in the ALOC realm
              EP=Endpoint
              aRBR=anycast RBR
              uRBR=unicast RBR
              BR=Border Router

      |-------------------------------------------------------------|
      |    UER1     |                                 |    UER2     |
      |-----------------------------------------------|-------------|
      | Enterprise1 |                                 | Enterprise2 |
      | ALOC Realm  |                                 | ALOC Realm  |
      |             |---------------------------------|             |
      |             |               VPN               |             |
      |             |           ALOC Realm            |             |
      |             *uRBR3                       uRBR4*             |
      |             |ALOC3                       ALOC4|             |
      |             |xxxxxxxxxxxx VPN RIB xxxxxxxxxxxx|             |
      |             |                                 |             |
      |             |          ALOC3 & ALOC4          |             |
      |             |---------------------------------|             |
      | *EP1        |                                 | *EP2        |
      |  ELOC1      |---------------------------------|  ELOC2      |
      |             |    ISP1    |  ISP  |    ISP2    |             |
      |             | ALOC Realm | Tier1 | ALOC Realm |             |
      |             |            |       |            |             |
      |          BR1* *aRBR      |       |      *aRBR *BR2          |
      |             |  ALOC1     |       |       ALOC2|             |
      |             |            |       |            |             |
      |-------------|xxxxxxxxxxxxxx DFZ xxxxxxxxxxxxxx|-------------|
      |     RIB     |    RIB     |  RIB  |    RIB     |     RIB     |
      |             |            |       |            |             |
      |   ALOC1     |  ALOC1     | ALOC1 |  ALOC2     |   ALOC2     |
      |   ALOC3     |  ALOC2     | ALOC2 |  ALOC1     |   ALOC4     |
      |   ALOC4     |  ELOC1     |       |  ELOC2     |   ALOC3     |
      |   ELOC1     |            |       |            |   ELOC2     |
      |             |            |       |            |             |
      |-------------------------------------------------------------|

                Figure 3: Multi-pathing via VPN and the Internet

   The first subflow is established from the initiator (EP1) via uRBR3 and uRBR4 (both using a private unicast ALOC prefix) to the responder (EP2). Normal unicast forwarding is applied; the ALOC prefixes of uRBR3 and uRBR4 are installed in the routing tables of both the local and the remote ALOC realms. A second subflow is established via the Internet, that is, via BR1->BR2 to EP2. Default (0/0) exit routing is used to enter the Internet in both ALOC realms.

   Note that the ELOC prefixes can overlap, since the local and remote ALOC realms reside in different ELOC regions and are separated by private unicast ALOC prefixes.
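The comparison an initiator performs on the ALOC prefixes returned by DNS (the private and global cases described in this section, plus the same-realm case of section 5.2) can be summarized as a small decision function. This Python sketch is hypothetical: it assumes the returned ALOC prefixes are already ordered by the RR Preference value, it simply uses the first configured local prefix of the matching type, and it recognizes the private ALOC space by the RFC 1918 blocks, as stated earlier in this section.

```python
from __future__ import annotations

from ipaddress import IPv4Address, IPv4Network
from typing import Optional

# Private ALOCs use the RFC 1918 address space (section 8).
RFC1918 = (
    IPv4Network("10.0.0.0/8"),
    IPv4Network("172.16.0.0/12"),
    IPv4Network("192.168.0.0/16"),
)


def is_private_aloc(aloc: IPv4Address) -> bool:
    """True if the ALOC prefix lies in the private ALOC (RFC 1918) space."""
    return any(aloc in net for net in RFC1918)


def choose_local_aloc(
    remote_alocs: list[IPv4Address],
    local_private: list[IPv4Address],
    local_global: list[IPv4Address],
) -> Optional[IPv4Address]:
    """Return the local ALOC for the locator header, or None if a
    classical IPv4 header is sufficient."""
    if not remote_alocs:
        return None  # DNS returned no ALOC RR: classical IPv4 session
    remote = remote_alocs[0]  # assumption: list sorted by Preference
    if remote in local_private or remote in local_global:
        return None  # same ALOC realm, see section 5.2
    # A private remote ALOC selects a private local ALOC; global selects global.
    candidates = local_private if is_private_aloc(remote) else local_global
    if not candidates:
        raise LookupError("no local ALOC prefix of the required type")
    return candidates[0]  # assumption: first provisioned prefix is used
```

Returning None maps to the "classical IPv4 session" outcome; keeping the decision separate from header assembly lets the same function serve both intra- and inter-realm paths.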

   The fourth use case is to leverage the private and global ALOC functionality in alignment with the design and implementation of [Split-DNS] solutions.

   The fifth use case is for residential users. A residential user may use one or several ALOC prefixes, depending upon the service offering and network design of the ISP. If the ISP prefers to offer advanced support for multipath transport protocols and local ALOC exit routing, the residential user is provided with several ALOC prefixes. The ALOC provided to residential users is taken from the GLB space, and anycast ALOC functionality is applied.

9. Mandatory Extensions

9.1. Overview

   To implement the hierarchical IPv4 framework, some basic rules are needed:

   1. The DNS architecture must support a new extension so that ALOC prefixes can be associated with A type Resource Records.

   2. An endpoint upgraded to support hIPv4 shall have information about the local ALOC prefixes; these can be configured manually or provided via provisioning mechanisms such as DHCP.

   3. A globally unique IPv4 address block shall be reserved; this block is called the Global Locator Block (GLB). A service provider can have one or several ALOC prefixes allocated from the GLB.

   4. ALOC prefixes are announced to adjacent peers via the current BGP protocol and are installed in the RIB of the DFZ. When the hIPv4 framework is fully implemented, only ALOC prefixes are announced between the BGP peers in the DFZ.

   5. An ALOC realm must have one or several RBRs attached to it. The ALOC prefix is configured as an anycast IP address on the RBR. The anycast IP address is injected into the appropriate routing protocols so that it is distributed to the DFZ.

   6. The IPv4 socket API at endpoints must be extended to support local and remote ALOC prefixes. The modified IPv4 socket API must
The modified IPv4 socket API must 1254 be backwards compatible with the current IPv4 socket API. The 1255 outgoing hIPv4 packet must be assembled by the hIPv4 stack with 1256 the local IP address from the socket as the source address and 1257 the remote ALOC prefix as the destination address in the IP 1258 header. The local ALOC prefix is inserted in the ALOC field of 1259 the locator header. The remote IP address from the socket API is 1260 inserted in the ELOC field of the locator header. 1262 9.2. DNS Extensions 1264 Since the hierarchical IPv4 framework introduces an extended 1265 addressing scheme and because DNS serves as the "phone book" for the 1266 Internet, it is obvious that DNS needs a new Resource Record (RR) 1267 type to serve endpoints that are upgraded to support hIPv4. The new 1268 RR type must follow the guidelines described in [RFC3597] and 1269 [RFC5395] with the following characteristics: 1271 o Associated with the appropriate Fully Qualified Domain Name 1272 (FQDN), inserted in the NAME field. 1274 o Assigned a new integer (QTYPE) in the TYPE field, to be assigned 1275 by IANA. 1277 o The CLASS field is set to IN. 1279 o The RDATA field is of an unknown type as defined in [RFC3597] and 1280 shall have the following format: 1282 o Preference subfield: A 16-bit integer that specifies the 1283 preference given to this RR among others associated with a 1284 FQDN. Lower values are preferred over higher values. 1286 o ALOC subfield: A 32-bit integer that specifies the Area 1287 Locator of the associated FQDN. 1289 +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1290 | Preference | 1291 +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1292 | | 1293 | ALOC | 1294 +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+ 1296 Figure 4: RDATA format of the ALOC RR 1298 Only endpoints that have been upgraded to support hIPv4 shall make 1299 use of the new ALOC RR. 
   There is no need to define a new ELOC RR, because the A RR is used
   for that purpose when the ALOC RR is returned.

9.3. Extensions to the IPv4 Header

   Figure 5 shows how the locator header is added to the current IPv4
   header, creating an hIPv4 header.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |A|I|S| FI|VLB|L|   Protocol    |          LH Checksum          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Area Locator (optional)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Endpoint Locator (optional)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     LH Length (optional)      |      Padding (optional)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                        Figure 5: hIPv4 header

   Version: 4 bits

      The Version field is identical to that of RFC 791.

   IHL: 4 bits

      The Internet Header Length field is identical to that of RFC 791.

   Type of Service: 8 bits

      The Type of Service field is identical to that of RFC 791.

   Total Length: 16 bits

      The Total Length field is identical to that of RFC 791.
   Identification: 16 bits

      The Identification field is identical to that of RFC 791.

   Flags: 3 bits

      The Flags field is identical to that of RFC 791.

   Fragment Offset: 13 bits

      The Fragment Offset field is identical to that of RFC 791.

   Time to Live: 8 bits

      The Time to Live field is identical to that of RFC 791.

   Protocol: 8 bits

      A new protocol number must be assigned for hIPv4.

   Header Checksum: 16 bits

      The Header Checksum field is identical to that of RFC 791.

   Source Address: 32 bits

      The Source Address field is identical to that of RFC 791.

   Destination Address: 32 bits

      The Destination Address field is identical to that of RFC 791.

   Options and Padding: variable length

      The Options and Padding fields are identical to those of RFC 791.

   ALOC Realm Bit, A-bit: 1 bit

      When the initiator and responder reside in different ALOC realms,
      the A-bit is set to 1 and the Area and Endpoint Locator fields
      must be used in the locator header; the size of the locator
      header is then 12 bytes. When the A-bit is set to 0, the initiator
      and responder reside within the same ALOC realm, the Area and
      Endpoint Locator fields shall not be used, and the size of the
      locator header is 4 bytes.

   Identifier Bit, I-bit: 1 bit

      The identifier bit is set to 1 if the endpoint is using an
      identifier/locator split scheme within the locator header. The
      identifier/locator split scheme must indicate by how much the
      size of the locator header is increased. The Locator Header
      Length field is also added to the locator header.

   Swap Bit, S-bit: 1 bit

      The initiator sets the swap bit to 0 in the hIPv4 packet. An RBR
      sets this bit to 1 when it swaps the source and destination
      addresses of the IP header with the ALOC and ELOC prefixes of the
      locator header.
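   The A-bit and S-bit rules above, together with packet assembly rule
   6 of Section 9.1, can be illustrated with a small Python model. It is
   a sketch under stated assumptions, not an hIPv4 stack: only the
   address fields are modelled, the locator header sizes assume that
   the I-bit is 0, and the swap assumes the pairing "IP source <-> ALOC
   field, IP destination <-> ELOC field", which is one reading of the
   S-bit text. All names are hypothetical and the addresses are
   documentation examples.

```python
from dataclasses import dataclass, replace
from typing import Optional

IPV4_HEADER_BYTES = 20  # RFC 791 header without options


def locator_header_bytes(a_bit: int) -> int:
    """Locator header size with the I-bit set to 0 (Section 9.3)."""
    return 12 if a_bit else 4


def max_transport_bytes(mtu: int, a_bit: int) -> int:
    """Room left for the transport header and payload on a given MTU."""
    return mtu - IPV4_HEADER_BYTES - locator_header_bytes(a_bit)


@dataclass(frozen=True)
class AddressFields:
    ip_src: str
    ip_dst: str
    aloc: Optional[str]  # Area Locator field (present when A-bit = 1)
    eloc: Optional[str]  # Endpoint Locator field (present when A-bit = 1)
    s_bit: int = 0       # the initiator always sends S = 0


def assemble(local_ip: str, remote_ip: str,
             local_aloc: str, remote_aloc: str) -> AddressFields:
    """Rule 6: IP src = local IP, IP dst = remote ALOC; the locator
    header carries the local ALOC and the remote IP (as ELOC)."""
    return AddressFields(local_ip, remote_aloc, local_aloc, remote_ip)


def rbr_swap(p: AddressFields) -> AddressFields:
    """ASSUMED pairing: IP src <-> ALOC field, IP dst <-> ELOC field."""
    if p.aloc is None or p.eloc is None:
        raise ValueError("swap needs the ALOC and ELOC fields (A-bit = 1)")
    return replace(p, ip_src=p.aloc, aloc=p.ip_src,
                   ip_dst=p.eloc, eloc=p.ip_dst, s_bit=1)


pkt = assemble("192.0.2.10", "192.0.2.20", "198.51.100.1", "203.0.113.1")
swapped = rbr_swap(pkt)
print(swapped.ip_dst, swapped.s_bit)       # 192.0.2.20 1
print(max_transport_bytes(1500, a_bit=1))  # 1468
```

   The last line also shows why an hIPv4 endpoint, which assembles the
   locator header itself, has 12 fewer bytes of transport payload per
   1500-byte packet when the A-bit is set.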
   Forwarding Indicator, FI-bits: 2 bits

      The purpose of the Forwarding Indicator (FI) field is to provide a
      mechanism for a future forwarding plane to identify which
      Forwarding Information Base (FIB) should be used for inter-ALOC
      realm sessions. The new forwarding plane will remove the need to
      swap IP and locator header values for both unicast and multicast
      sessions. As a result, the IP and transport protocol headers
      remain intact, and only the FI and Locator Header Checksum values
      in the locator header change. The following values are defined:

      01: Local ALOC exit routing mode. The initiator shall set the
          FI-bits to 01, and the ALOC prefix in the locator header is
          used to forward the packets to the RBR that owns the local
          ALOC prefix. The RBR shall change the FI-bits to 00.

      00: DFZ routing mode. The local ALOC RBR shall forward the packets
          according to the value in the destination address field of
          the IP header. The DFZ routers shall forward the packets based
          on the value in the destination address field of the IP header
          unless the destination address matches the local ALOC prefix.
          When this situation occurs, the packet enters the remote ALOC
          realm, and the remote RBR shall change the FI-bits to 10.

      10: Remote ELOC approach routing mode. The remote ALOC RBR and
          subsequent routers shall forward the packets based on the ELOC
          prefix in the locator header.

      11: Inter-ALOC RPF check mode. The local ALOC RBR changes the
          FI-bits to 11, and subsequent inter-ALOC routers on the shared
          tree shall apply the RPF check against the ALOC prefix in the
          locator header.
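   The unicast FI transitions above can be modelled as a small state
   machine. The Python sketch below is illustrative only: the
   forwarding plane it stands for is still future work, the function
   names are hypothetical, and FI value 11 (used on multicast shared
   trees) appears only as a label.

```python
from enum import IntEnum


class FI(IntEnum):
    DFZ = 0b00          # forward on the IP destination address
    LOCAL_EXIT = 0b01   # forward on the ALOC field toward the local RBR
    REMOTE_ELOC = 0b10  # forward on the ELOC field inside the remote realm
    RPF_CHECK = 0b11    # inter-ALOC RPF check against the ALOC field


def lookup_address(fi: FI, ip_dst: str, aloc: str, eloc: str) -> str:
    """Return the address a router would use for its FIB lookup."""
    if fi == FI.LOCAL_EXIT:
        return aloc
    if fi == FI.DFZ:
        return ip_dst
    if fi == FI.REMOTE_ELOC:
        return eloc
    raise ValueError("FI 11 selects an RPF check, not a unicast lookup")


def rbr_update(fi: FI, own_aloc: str, ip_dst: str, aloc: str) -> FI:
    """FI rewrite performed by an RBR that owns the prefix `own_aloc`."""
    if fi == FI.LOCAL_EXIT and aloc == own_aloc:
        return FI.DFZ          # local RBR hands the packet to the DFZ
    if fi == FI.DFZ and ip_dst == own_aloc:
        return FI.REMOTE_ELOC  # remote RBR: packet enters the remote realm
    return fi


# Initiator in realm 198.51.100.1 reaches ELOC 192.0.2.20 in realm
# 203.0.113.1; the IP header is never swapped, only the FI-bits change.
fi = FI.LOCAL_EXIT
fi = rbr_update(fi, "198.51.100.1", "203.0.113.1", "198.51.100.1")  # -> DFZ
fi = rbr_update(fi, "203.0.113.1", "203.0.113.1", "198.51.100.1")   # -> 10
print(lookup_address(fi, "203.0.113.1", "198.51.100.1", "192.0.2.20"))
# 192.0.2.20
```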
   Valiant Load-Balancing, VLB-bits: 2 bits (optional, subject to
   further research)

      The purpose of the Valiant Load-Balancing field is to provide a
      mechanism for multipath-enabled transport protocols to request
      explicit paths in the network for subflows, which are component
      parts of a session between two endpoints. The subflow path
      request can be set as follows:

      00: Latency-sensitive application. Only a single subflow is used
          (multipath not applied), and the shortest path through the
          network is requested.

      01: First subflow. The shortest path or Valiant Load-Balancing
          might be applied.

      11: Next subflow(s). Valiant Load-Balancing should be applied.

   Load-Balanced, L-bit: 1 bit (optional, subject to further research)

      The initiator must set the L-bit to 0. A Valiant Load-Balancing-capable
      node can apply VLB switching for the session if the value is set
      to 0; if the value is set to 1, VLB switching is not allowed. When
      VLB switching is applied for the session, the node applying the
      VLB algorithm must set the value to 1.

   Protocol: 8 bits

      The Protocol field is identical to that of RFC 791.

   Locator Header Checksum: 16 bits

      A checksum is calculated over the locator header only. The
      checksum is computed at the initiator, recomputed at the RBR, and
      verified at the responder. The checksum algorithm is identical to
      that of RFC 791.

   Area Locator (optional): 32 bits

      An IPv4 address assigned to locate an ALOC realm in the Internet.
      The ALOC is assigned by an RIR to a service provider. The ALOC is
      globally unique because it is allocated from the GLB.

   Endpoint Locator (optional): 32 bits

      An IPv4 address assigned to locate an endpoint in a local network.
      The ELOC block is assigned by an RIR to a service provider or to
      an enterprise.
      In the intermediate routing architecture, the ELOC block is only
      unique in a geographical region. The final policy of uniqueness
      shall be defined by the RIRs. In the long-term routing
      architecture, the ELOC block is no longer assigned by an RIR; it
      is only unique in the local ALOC realm.

   Locator Header Length (optional): 16 bits

      Locator Header Length is the total length of the locator header.
      Locator Header Length is applied when the Identifier bit is set
      to 1. Identifier/locator split scheme parameters are inserted into
      the locator header after this field.

   Padding (optional): variable

      The locator header padding is used to ensure that the locator
      header ends on a 32-bit boundary. The padding is zero.

10. Consequences

10.1. Overlapping Local and Remote ELOC Prefixes/Ports

   Because an ELOC prefix is only significant within the local ALOC
   realm, there is a slight possibility that a session between two
   endpoints residing in separate ALOC realms might use the same local
   and remote ELOC prefixes. But the session is still unique, because
   the two processes communicating over the transport protocol form a
   logical session that is uniquely identifiable by the 5-tuple
   involved, i.e., by the combination of protocol, local address, local
   port, remote address, and remote port.

   The session might no longer be unique when two initiators with the
   same local ELOC prefix, residing in two separate ALOC realms, are
   accessing a responder located in a third ALOC realm. In this
   scenario, the initiators might use the same local port value. This
   causes an "identical session situation" for the application layer.

   To overcome this scenario, the hIPv4 stack must accept only one
   unique session with the help of the ALOC information.
   If there is an "identical session situation" (both initiators use
   the same values in the 5-tuple), the hIPv4 stack shall allow only
   the first established session to continue. Subsequent sessions must
   be prohibited, and the initiator is informed of the "identical
   session situation" by an ICMP notification.

   MPTCP introduces a token that is locally significant and currently
   defined as 32 bits long. The token will provide a sixth tuple for
   future applications to identify and verify the uniqueness of a
   session, which further reduces the probability of an "identical
   session situation". By adding an identifier/locator database scheme
   to the hIPv4 framework, the "identical session situation" is
   completely removed.

10.2. Large Encapsulated Packets

   Adding the locator header to an IPv4 packet in order to create an
   hIPv4 packet increases the packet size, but since the packet is
   assembled at the endpoint, this does not complicate the current Path
   MTU Discovery (PMTUD) mechanism in the network. The intermediate
   network between two endpoints will not see any difference in the
   size of packets; IPv4 and hIPv4 packet sizes are the same from the
   network's point of view.

10.3. Affected Applications

   Several applications insert IP address information into the payload
   of a packet. Some applications use the IP address information to
   create new sessions or for identification purposes. Some
   applications collect IP address information to be used as
   referrals. This section lists the applications that need to be
   enhanced; however, this is by no means a comprehensive list. The
   applications can be divided into four main categories:

   o  Applications based on raw sockets - a raw socket receives packets
      containing the complete header, in contrast to the other sockets
      that only receive the payload.
   o  Applications needed to enable the hIPv4 framework, such as DNS and
      DHCP databases, which must be extended to support ALOC prefixes.

   o  Applications that insert IP addresses into the payload or use the
      IP address for setting up new sessions, for some kind of
      identification, or as referrals. An application belonging to this
      category cannot set up sessions to other ALOC realms until
      extensions have been incorporated. Within the local ALOC realm
      there are no restrictions, since the current IPv4 scheme is still
      valid. The following applications have been identified:

      o  SIP: IP addresses are inserted in the SDP offers/answers, in
         the XML body, in the Contact, Via, Route, and Record-Route SIP
         headers, and in the maddr parameter.

      o  Mobile IP: the mobile node uses several IP addresses during
         the registration process.

      o  IPsec AH: designed to detect alterations to the IP packet
         header.

      o  RSVP: RSVP messages are sent hop-by-hop between RSVP-capable
         routers to construct an explicit path.

      o  ICMP: notifications need to be able to incorporate ALOC
         information and assemble the hIPv4 header in order to be
         routed back to the source.

      o  Source Specific Multicast: the receiver must specify the
         source address.

      o  IGMPv3: a source-list is included in the IGMP reports.

      o  Applications related to security, such as firewalls, must be
         enhanced to support ALOC prefixes.

   o  Applications that can function with FQDNs but are often used with
      IP addresses instead, such as ping, traceroute, telnet, and so
      on. The CLI syntax needs to be upgraded to support ALOC and ELOC
      information via the extended socket API.

   At first glance it seems that a lot of applications need to be
   re-engineered and ported, but the situation is not all that bad.
   The applications used inside the local ALOC realm (e.g., an
   enterprise's private network) do not need to be upgraded in either
   the intermediate or the long-term architecture. The classical IPv4
   framework is preserved. Only IP-aware applications used between ALOC
   realms need to be upgraded to support the hIPv4 header. IPv6 already
   has the definitions in place for the applications mentioned above,
   but the migration of applications from IPv4 to IPv6 can impose
   capital expenditures on enterprises, especially if the applications
   are customized or homegrown; see [Porting_IPv4].

   As stated earlier, hIPv4 does not require porting applications that
   are used inside a private network. The conclusion is that, whatever
   next-generation architecture is deployed, some applications will
   suffer, either during the transition period or when being
   re-engineered to be compatible with the new architecture.

10.4. ICMP

   As long as the ICMP request is executed inside the local ALOC realm,
   the normal IPv4 ICMP mechanism can be used. As soon as the ICMP
   request exits the local ALOC realm, the locator header shall be used
   in the notifications. Therefore, extensions to the ICMP protocol
   shall be implemented. These shall be compatible with [RFC4884] and
   support ALOC and ELOC information.

10.5. Multicast

   Since local ELOC prefixes are only installed in the routing table of
   the local ALOC realm, there is a constraint with Reverse Path
   Forwarding (RPF), which is used to ensure loop-free forwarding of
   multicast packets. The RPF check is performed against the source
   address of a multicast group (S,G), and the address of the source
   can no longer be used as an RPF checkpoint outside the local ALOC
   realm.
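   The constraint can be seen with a minimal RPF check. In the Python
   sketch below, a router outside the source's realm holds a
   hypothetical RIB that, like the DFZ, contains only ALOC prefixes. The
   check is a plain longest-prefix match followed by an interface
   comparison, which is the usual form of RPF rather than a description
   of any particular router implementation; interface names and
   prefixes are illustrative.

```python
import ipaddress


def rpf_passes(source: str, arrival_iface: str, rib: dict) -> bool:
    """RPF: the best route back to `source` must use the arrival interface."""
    addr = ipaddress.IPv4Address(source)
    routes = [(net, iface) for net, iface in rib.items() if addr in net]
    if not routes:
        return False  # no route back to the source: the check fails
    _, best_iface = max(routes, key=lambda route: route[0].prefixlen)
    return best_iface == arrival_iface


# A DFZ router: ALOC prefixes only, no ELOC prefixes (documentation ranges).
dfz_rib = {
    ipaddress.IPv4Network("198.51.100.0/24"): "to-source-realm",
    ipaddress.IPv4Network("203.0.113.0/24"): "to-other-realm",
}
print(rpf_passes("192.0.2.10", "to-source-realm", dfz_rib))    # False (ELOC)
print(rpf_passes("198.51.100.1", "to-source-realm", dfz_rib))  # True (ALOC)
```

   A source address taken from an ELOC block finds no route in such a
   RIB, whereas an address from a globally announced ALOC prefix passes.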
   To enable RPF globally for an (S,G), the multicast-enabled RBR (mRBR)
   at the source's ALOC realm must replace the value of the source
   address field in the IP header with the local ALOC prefix for
   inter-ALOC multicast streams. This can be achieved if the local RBR
   also acts as an anycast Rendezvous Point with MSDP and PIM
   capabilities; with these functionalities, the RBR becomes an mRBR.
   The source registers at the mRBR, and a source tree is established
   between the source and the mRBR. When an inter-ALOC realm receiver
   subscribes to the multicast group, the mRBR has to swap the hIPv4
   header in the following way:

   a. Verify that the received packet uses the hIPv4 protocol value in
      the protocol field of the IP header.

   b. Verify the IP, locator, and transport protocol header checksums.

   c. Replace the source address in the IP header with the local ALOC
      prefix.

   d. Set the S-bit to 1.

   e. Decrease the TTL value by one.

   f. Recalculate the IP, locator, and transport protocol header
      checksums. Transport protocol header calculations do not include
      the locator header fields.

   g. Forward the packet to the shared multicast tree.

   In order for the mRBR to function as described above, the source
   must assemble the multicast hIPv4 packet in the following way:

   a. Set the local IP address (S) from the API in the source address
      field of the IP header and in the ELOC field of the locator
      header.

   b. Set the multicast address (G) from the API in the destination
      address field of the IP header.

   c. Set the local ALOC prefix in the ALOC field of the locator
      header.

   d. Set the transport protocol value in the protocol field of the
      locator header and the hIPv4 protocol value in the protocol field
      of the IP header.

   e. Set the desired parameters in the A-, I-, S-, VLB-, and L-fields
      of the locator header.
   f. Set the FI-bits of the locator header to 00.

   g. Calculate the IP, locator, and transport protocol header
      checksums. Transport protocol header calculations do not include
      the locator header fields. When completed, the packet is
      transmitted.

   The downstream routers from the mRBR towards the receiver use the
   source address in the IP header (which, after the mRBR, is the
   source's ALOC prefix) for RPF verification. For the receiver to
   create Real-Time Transport Control Protocol (RTCP) receiver reports,
   all the necessary information is provided in the hIPv4 header of the
   packet.

   Because Source Specific Multicast (SSM) and IGMPv3 use IP addresses
   in the payload, both protocols need to be modified to support the
   hIPv4 framework.

11. Traffic Engineering Considerations

   When the intermediate phase of the hIPv4 framework is fully
   implemented, ingress load balancing to an ALOC realm can be
   influenced by the placement of RBRs at the realm; an RBR provides a
   shortest-path scheme. Also, if RIR policies allow, a service provider
   can have several ALOCs assigned. Hence, traffic engineering and
   filtering can be done with the help of ALOC prefixes. For example,
   sensitive traffic can be aggregated under one ALOC prefix that is not
   fully distributed into the DFZ.

   If needed, an ALOC traffic engineering solution between ALOC realms
   might be developed to create explicit paths that can be engineered
   via specific ALOC prefixes, for example, a mechanism similar to the
   one described in [Pathlet_Routing]. Further studies are needed;
   first, it should be evaluated whether there is demand for such a
   solution.

   Ingress load balancing to a private remote ALOC realm (remote site)
   is influenced by how many attachment points to the Internet the site
   uses and where the attachment points are placed at the site.
   In order to apply local ALOC exit routing from, e.g., a multi-homed
   site, some new network nodes are needed between the initiator and the
   border routers of the site.

   In the intermediate routing architecture, this is achieved by using
   overlay architectures such as MPLS LSPs, L2TPv3 tunnels, etc. The new
   network node(s) shall be able to identify hIPv4 packets, based on the
   protocol field in the IP header, and switch the packets to explicit
   paths based on the ALOC prefix in the locator header. In the
   long-term routing architecture, the overlay solution is replaced with
   a new forwarding plane; see Section 6.2.

   Together with a multipath transport protocol, the subflows can be
   routed via specific attachment points, that is, border routers
   sitting between the private local/remote ALOC realms (multi-homed
   sites) and the Internet. Multi-homing becomes multi-pathing. For
   details, see Appendix B.

11.1. Valiant Load-Balancing

   The use of multipath-enabled transport protocols opens up the
   possibility to develop a new design methodology for backbone
   networks, based on Valiant Load-Balancing [VLB]. Consider two sites
   that are each connected to the Internet with a single uplink, where
   the endpoints use multipath-enabled transport protocols but are
   attached to the network with only one interface/ELOC prefix. Both
   subflows will most likely take the shortest path through the
   Internet; that is, both subflows are established over the same
   links, and when a link is congested or fails, both subflows might
   drop packets simultaneously. Thus, the benefit of multi-pathing is
   lost.
   The "subflows-over-same-links" scenario can be avoided if the
   subflows are traffic engineered to traverse the Internet on different
   paths, but this is difficult to achieve with classical traffic
   engineering, such as IGP tuning or MPLS-based traffic engineering.
   By adding a mechanism to the locator header, the
   "subflows-over-same-links" scenario might be avoided.

   If the RBR functionality is deployed on a Valiant Load-Balancing-enabled
   backbone node (hereafter called a vRBR), and the backbone nodes
   are interconnected via a logical full mesh, Valiant Load-Balancing
   can be applied to the subflows. When a subflow has the appropriate
   bits set in the VLB-field of the locator header, the first ingress
   vRBR shall do VLB switching of the subflow. That is, the ingress vRBR
   is allowed to do VLB switching of the subflow's packets if the
   VLB-bits are set to 01 or 11, the L-bit is set to 0, and the local
   ALOC prefix of the vRBR matches the prefix in the ALOC field. If
   there are no ALOC and ELOC fields in the locator header, but the
   other fields' values are set as described above, the vRBR should
   also apply VLB switching for the subflow, because it is an inter-ALOC
   realm subflow belonging to a multipath-enabled session.

   With this combination of parameters in the locator header, the
   subflow is VLB switched only at the first ALOC realm, and the
   subflows might be routed through the Internet on different paths.
   Applying VLB switching at every ALOC realm would most likely add too
   much latency for the subflows. VLB switching at the first ALOC realm
   will not separate the subflows on the first- and last-mile links (a
   site with a single uplink). If the subflows on the first- and
   last-mile links need to be routed on separate links, the endpoints
   should be deployed in a multi-homed environment.
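   The vRBR admission rule described above, and the VLB step it
   enables, can be sketched in Python as follows. The admission check
   follows the text; choosing the intermediate backbone node from a
   stable hash of a subflow identifier (a per-flow variant of Valiant's
   random choice, so that one subflow stays on one path) is an
   assumption, as are the node names and function names.

```python
import zlib
from typing import List, Optional, Tuple


def vlb_allowed(vlb_bits: int, l_bit: int, own_aloc: str,
                aloc_field: Optional[str]) -> bool:
    """Ingress vRBR rule: VLB-bits 01 or 11, L-bit 0, and an ALOC field
    equal to the vRBR's local ALOC (or no ALOC/ELOC fields at all)."""
    if vlb_bits not in (0b01, 0b11) or l_bit != 0:
        return False
    return aloc_field is None or aloc_field == own_aloc


def vlb_forward(vlb_bits: int, l_bit: int, own_aloc: str,
                aloc_field: Optional[str], subflow_id: str,
                egress: str, mesh: List[str]) -> Tuple[List[str], int]:
    """Return (backbone path, L-bit to send) for a packet of one subflow."""
    if not vlb_allowed(vlb_bits, l_bit, own_aloc, aloc_field):
        return [egress], l_bit  # no VLB: direct (shortest) path
    # Assumed per-subflow variant of Valiant's random choice: a stable hash
    # keeps one subflow on one path while spreading subflows over the mesh.
    intermediate = mesh[zlib.crc32(subflow_id.encode()) % len(mesh)]
    path = [egress] if intermediate == egress else [intermediate, egress]
    return path, 1  # the node applying VLB sets the L-bit to 1


mesh = ["core-a", "core-b", "core-c"]  # hypothetical full-mesh nodes
for subflow in ("subflow-1", "subflow-2"):
    print(subflow, vlb_forward(0b11, 0, "198.51.100.1", "198.51.100.1",
                               subflow, "core-c", mesh))
```

   Because the L-bit leaves the first vRBR set to 1, any later
   VLB-capable node sees that the subflow has already been switched and
   forwards it on the shortest path, which matches the rule that VLB
   switching is applied only at the first ALOC realm.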
   Studies have been done on how Valiant Load-Balancing influences
   traffic patterns between interconnected VLB backbone networks
   [iVLB]. Nevertheless, more studies are needed regarding Valiant
   Load-Balancing scenarios.

12. Mobility Considerations

   This section considers two types of mobility solutions: site
   mobility and endpoint mobility.

   Site mobility:

   Today, classical multi-homing is the most common solution for
   enterprises that wish to achieve site mobility. Multi-homing is one
   of the key drivers of the growth of the DFZ RIB; see [RFC4984],
   Sections 2.1 and 3.1.2. The hIPv4 framework can provide site
   mobility for enterprises without requiring them to implement a
   classical multi-homed solution.

   One of the reasons to deploy multi-homing is to avoid renumbering
   the local infrastructure when an upstream ISP is replaced; this is
   why enterprises deploy PI address blocks today. In the intermediate
   routing architecture, an enterprise is allocated a regional PI ELOC
   block (for details, see Appendix A) that is used for internal
   routing. The upstream ISP provides an ALOC prefix that describes how
   the enterprise's network is connected to the Internet. If the
   enterprise wishes to switch to another ISP, it only changes the ALOC
   prefix at endpoints, from the previous ISP's ALOC prefix to the new
   ISP's ALOC prefix, without connectivity interruptions in the local
   network, since the ALOC prefix is only used for Internet
   connectivity. In the long-term routing architecture, when the
   forwarding plane is upgraded, the regional PI ELOC block is returned
   to the RIR, and the enterprise can use the full 32-bit ELOC space to
   design the internal routing topology.

   An enterprise can easily become multi-homed or switch ISPs.
   The local ELOC block is used for internal routing, and upstream ISPs
   provide their ALOC prefixes for Internet connectivity. Multi-homing
   is discussed in detail in Appendix B.

   Endpoint mobility:

   As said earlier, MPTCP is the most interesting identifier/locator
   split scheme for solving endpoint mobility scenarios. MPTCP
   introduces a token, which is locally significant and currently
   defined as 32 bits long. The token will provide a sixth tuple to
   identify and verify the uniqueness of a session. This sixth tuple,
   the token, does not depend upon the underlying layer, the IP layer.
   The session is identified with the help of the token, and thus the
   application is not aware when the locator parameters change, e.g.,
   during roaming, provided that the application does not make use of
   ELOC/ALOC information. In multi-homed scenarios, the application can
   make use of ELOC information, which will not change as long as the
   endpoint remains at a fixed location.

   Security issues arise: the token can be captured during the session
   by, for example, a man-in-the-middle attack. These attacks can be
   mitigated by applying [tcpcrypt], for example. If the application
   requires full protection against man-in-the-middle attacks, the user
   should apply the Transport Layer Security (TLS) protocol [RFC5246] to
   the session.

   The most common endpoint mobility use case today is that the
   responder resides in the fixed network and the initiator is mobile.
   Thus, MPTCP will provide roaming capabilities for the mobile endpoint
   if both endpoints make use of the MPTCP extension. However, in some
   use cases the fixed endpoint needs to initiate a session to a mobile
   responder. Therefore, Mobile IP (MIP) [RFC5944] should incorporate
   the hIPv4 extension, because MIP provides a rendezvous service for
   the mobile endpoints.
   Also, many applications provide rendezvous services for their users,
   e.g., SIP, peer-to-peer, and Instant Messaging services. A generic
   rendezvous service solution can be provided by an identifier/locator
   database scheme, e.g., HIP, ILNP, or NBS. If desired, the user
   (actually the application) can make use of one of these rendezvous
   service schemes, such as extended MIP, an application-specific
   rendezvous service, a generic rendezvous service, or some combination
   of them.

   The hIPv4 framework does not define which identifier/locator split
   solution should be used for endpoint mobility. The hIPv4 framework
   focuses on routing scalability and supports several
   identifier/locator split solutions that can be exploited to develop
   new services, with the focus on endpoint mobility.

13. Transition Considerations

   The hIPv4 framework does not introduce any new protocols that would
   be mandatory for the transition from IPv4 to hIPv4; instead,
   extensions are added to existing protocols. The hIPv4 framework
   requires extensions to the current IPv4 stack, to infrastructure
   systems, and to some applications that use IP address information,
   but the current forwarding plane in the Internet remains intact,
   except that a new forwarding element (the RBR) is required to create
   an ALOC realm.

   Extensions to the IPv4 stack, to infrastructure systems, and to
   applications that make use of IP address information can be deployed
   in parallel with the current IPv4 framework. Genuine hIPv4 sessions
   can be established between endpoints even though the current
   unidimensional addressing structure is still present.

   When, then, will the unidimensional addressing structure be replaced
   by a hierarchical addressing scheme and a fourth hierarchy be added
   to the routing architecture?
   The author sees two possible tipping points:

   o  When the RIB of the DFZ is getting close to the capabilities of
      current forwarding planes. Who will pay for the upgrade? Or will
      service providers accept only ALOC prefixes from other service
      providers and avoid capital expenditures?

   o  When the depletion of IPv4 addresses is causing enough problems
      for service providers and enterprises.

   The biggest risk to the success of the hIPv4 framework is the very
   short timeframe until the expected depletion of the IPv4 address
   space; in fact, the first RIR ran out of IPv4 addresses during the
   IESG review process of this document (April 2011). Also, will
   enterprises give up the global allocations of IPv4 address blocks
   they have gained? An IPv4 address block has become an asset with
   economic value.

   The transition requires an upgrade of the endpoints' stack, and this
   is a drawback compared to the [CES] architectures proposed in
   [RFC6115]. A transition to an architecture that requires an upgrade
   of the endpoints' stack is considerably slower than one that requires
   an upgrade of only some network nodes. But the transition might not
   be as slow or challenging as it first seems, since hIPv4 is an
   evolution of the currently deployed Internet.

   o  Not all endpoints need to be upgraded; the endpoints that do not
      establish sessions to other ALOC realms can continue to make use
      of the classical IPv4 framework. Also, legacy applications that
      are used only inside a local ALOC realm do not need to be ported
      to another framework. For further details, see Appendix C.

   o  Upgrading the endpoints' stack in, e.g., critical or complicated
      systems will definitely take time; thus, it would be more
      convenient to install a middlebox in front of such systems.
      The hIPv4 framework therefore needs a middlebox solution to speed
      up the transition; combining CES architectures with the hIPv4
      framework might produce such a middlebox. For further details, see
      Appendix D.

   o  The framework is incrementally deployable. Not all endpoints in
      the Internet need to be upgraded before the first IPv4 block can
      be released from a globally unique allocation status to a
      regionally unique allocation status, that is, to achieve ELOC
      status for the prefixes used in a local network in the
      intermediate routing architecture; see Appendix D. An ALOC realm
      that wishes to achieve locally unique status for its ELOC block in
      the long-term routing architecture does not need to wait for other
      ALOC realms to proceed to the same level simultaneously. It is
      sufficient that the other ALOC realms have reached the
      intermediate routing architecture status. For further details, see
      Section 6.

14. Security Considerations

   Because the hIPv4 framework introduces no new network mechanisms
   other than a new type of border router to the currently deployed
   routing architecture, the best current practices for securing ISP
   networks are still valid. Since the DFZ will no longer contain ELOC
   prefixes, there are some benefits and complications regarding
   security that need to be taken into account.

   Hijacking of a single ELOC prefix by longest match from another ALOC
   realm is no longer possible, because the prefixes are separated by a
   locator, the ALOC. To hijack a certain ELOC prefix, the whole ALOC
   realm must be routed via a bogus ALOC realm. Studies should be done
   with the Secure Inter-Domain Routing (SIDR) working group to
   determine whether the ALOC prefixes can be protected from hijacking.
Because a given ELOC prefix can no longer be hijacked, there are some implications for mitigating Distributed Denial-of-Service (DDoS) attacks. They arise especially in the long-term routing architecture when, for example, a multi-homed enterprise is connected to its ISPs with unicast ALOC RBRs.

One method used today to mitigate DDoS attacks is to inject a more specific prefix (typically a host prefix) into the routing table so that the victim of the attack is "relocated", i.e., a sinkhole is created in front of the victim. The sinkhole may separate bogus traffic from valid traffic or analyze the attack. The challenge in the long-term routing architecture is how to reroute traffic for a specific ELOC prefix of the multi-homed enterprise when that ELOC prefix cannot be installed in the ISP's routing table.

Creating a sinkhole for all traffic destined to an ALOC realm might be challenging and expensive, depending on the size of the multi-homed enterprise. Placing the sinkhole inside the enterprise's ALOC realm is not a real option either, because the attack traffic may still saturate the connections between the enterprise and its ISPs.

By borrowing ideas from a service-centric networking architecture [SCAFFOLD], a sinkhole service can be created. An example of how a distributed sinkhole service could be designed follows:

a. A firewall (or similar node) in the victim's ALOC realm discovers an attack. The enterprise's security staff realize that the incoming attack traffic will soon saturate the connections or other resources. The staff therefore inform the upstream ISPs of the attack and of the victim's ALOC prefix X and ELOC prefix Y.

b. The ISP reserves resources for the sinkhole service. These resources use ALOC prefix Z and are programmed with a service ID and the victim's X and Y prefixes. The ISP informs the victim's security staff of the service ID and applies a NAT rule on its RBRs and/or hIPv4-enabled routers. The NAT rule replaces the destination address in the IP header with Z when the destination address in the IP header matches X and the ELOC in the locator header matches Y. The service ID is also inserted into the locator header, where it acts as a referral for the sinkhole; a referral is needed because one sinkhole may serve several victims. PMTUD issues must be taken into account.

c. The victim's inbound traffic is now routed by the RBRs and/or hIPv4-enabled routers to the sinkhole(s), where it is identified by the service ID. Bogus traffic is discarded at the sinkhole; for valid traffic, the destination address Z in the IP header is replaced with X. The service ID remains in the analyzed packets, which tells the enterprise that packets carrying the service ID are valid traffic and may be forwarded to the victim. Since not all upstream ISPs might redirect traffic to the distributed sinkholes, traffic that does not carry the agreed service ID might be bogus. Inserting a service ID into the valid packets also avoids the need for overlay solutions between the routers, the sinkholes, and the victim. If a valid packet carrying the service ID traverses another RBR or hIPv4-enabled router with the same NAT rule, that packet is not rerouted to the sinkhole again. The enterprise shall ensure that the victim does not use the service ID in its replies; if the attacker learns the service ID, the sinkhole is disarmed.

Today, traffic is sent to sinkholes by injecting host routes into the routing table. This method can still be used inside an ALOC realm for intra-ALOC attacks.
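Steps b and c of the sinkhole service can be sketched in Python as two packet transformations. This is a minimal, hypothetical model: the packet fields, the `service_id` field name, the sinkhole ALOC 10.7.7.7, the service ID value 4711, and the `is_valid` classifier (standing in for whatever analysis the sinkhole performs) are all assumptions, and PMTUD handling is not modelled. X and Y reuse the example addresses of Appendix B.

```python
import ipaddress
from dataclasses import dataclass, replace
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class HIPv4Packet:
    ip_dst: ipaddress.IPv4Address     # destination ALOC (IP header)
    eloc_dst: ipaddress.IPv4Address   # destination ELOC (locator header)
    service_id: Optional[int] = None  # hypothetical locator-header field


@dataclass(frozen=True)
class SinkholeRule:
    victim_aloc: ipaddress.IPv4Address    # X
    victim_eloc: ipaddress.IPv4Network    # Y
    sinkhole_aloc: ipaddress.IPv4Address  # Z
    service_id: int                       # agreed between ISP and victim


def redirect(pkt: HIPv4Packet, rule: SinkholeRule) -> HIPv4Packet:
    """Step b: NAT rule applied on an RBR or hIPv4-enabled router."""
    if pkt.service_id == rule.service_id:
        # Already scrubbed by the sinkhole: do not send it back (step c).
        return pkt
    if pkt.ip_dst == rule.victim_aloc and pkt.eloc_dst in rule.victim_eloc:
        return replace(pkt, ip_dst=rule.sinkhole_aloc, service_id=rule.service_id)
    return pkt


def scrub(pkt: HIPv4Packet, rules: Dict[int, SinkholeRule],
          is_valid: Callable[[HIPv4Packet], bool]) -> Optional[HIPv4Packet]:
    """Step c: the service ID is the referral to the victim's rule; bogus
    traffic is dropped, valid traffic gets Z replaced by X again, and the
    service ID is kept so the enterprise recognises scrubbed packets."""
    rule = rules.get(pkt.service_id) if pkt.service_id is not None else None
    if rule is None or not is_valid(pkt):
        return None
    return replace(pkt, ip_dst=rule.victim_aloc)


addr, net = ipaddress.ip_address, ipaddress.ip_network
rule = SinkholeRule(victim_aloc=addr("10.1.1.1"), victim_eloc=net("192.168.1.0/24"),
                    sinkhole_aloc=addr("10.7.7.7"), service_id=4711)
rules = {rule.service_id: rule}

incoming = HIPv4Packet(ip_dst=addr("10.1.1.1"), eloc_dst=addr("192.168.1.10"))
at_sinkhole = redirect(incoming, rule)
clean = scrub(at_sinkhole, rules, is_valid=lambda p: True)
dropped = scrub(at_sinkhole, rules, is_valid=lambda p: False)
print(at_sinkhole.ip_dst)             # 10.7.7.7
print(clean.ip_dst, clean.service_id)  # 10.1.1.1 4711
print(dropped)                         # None
```

Note that the loop-prevention check in `redirect` trusts any packet carrying the right service ID, which is exactly why the text requires that the victim never expose the service ID in its replies.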
For attacks that span several ALOC realms, new methods are needed; one example is described above. It is desirable that RBRs and hIPv4-enabled routers be capable of applying NAT rules and inserting a service ID into selected packets in the forwarding plane.

15. IANA Considerations

There are no IANA considerations for this document.

16. Conclusions

This document offers a high-level overview of the hierarchical IPv4 framework, which could be built in parallel with the current Internet by implementing extensions to several existing architectures. Implementing the hIPv4 framework will not require a major service window break in the Internet or in the private networks of enterprises. Basically, the hIPv4 framework is an evolution of the current IPv4 framework.

The transition to hIPv4 might be attractive for enterprises because the hIPv4 framework does not create a catch-22 situation: for example, when should an application used only inside the private network be ported from IPv4 to IPv6, and what is the business justification for porting it? Moreover, an IPv4/IPv6 dual-stack solution could increase operational expenditures, especially for firewall rule sets, both in front of servers and at clients.

If an enterprise chooses to deploy hIPv4, however, its legacy applications do not need to be ported, because hIPv4 is backwards compatible with the classical IPv4 framework. This means lower costs for the enterprise, and an additional bonus is the new stack's better support for mobility use cases.

But the enterprise must decide soon and act promptly, because IPv4 address depletion will be a reality in the very near future. If the decision is delayed, IPv6 will arrive, and then, sooner or later, the legacy applications will need to be ported.
Although this document has focused only on IPv4, a similar scheme could be deployed for IPv6 in the future, creating a 64x64 bit locator space. However, at the time of writing, some benefits would be lost, such as:

o Backwards compatibility with the current Internet, and therefore a smooth migration plan.

o A smaller locator header: including the ALOC and ELOC prefixes, the IPv6 locator header would be 160 bits instead of 96 bits. Moreover, the identifier (EUI-64) would always be present, which can be seen as an advantage or a disadvantage, depending on one's view of the privacy issue, as discussed in [RFC4941] and in [Mobility_&_Privacy].

If an enterprise prefers hIPv4 (for example, to gain additional IPv4 addresses and a smooth migration path), there is an unintentional side effect (from the enterprise's point of view) on the routing architecture of the Internet: multi-homing becomes multi-pathing, and an opportunity opens up for service providers to create an Internet routing architecture that holds fewer prefixes and generates fewer BGP updates in the DFZ than the current Internet.

The hIPv4 framework provides a new hierarchy in the routing subsystem and is complementary to multipath-enabled transport protocols (such as MPTCP and SCTP) and service-centric networking architectures (such as SCAFFOLD). End users and enterprises are not interested in routing issues in the Internet; instead, a holistic view of the three disciplines should be taken, with a focus on new service opportunities, and communicated to end users and enterprises. Then perhaps the request to transition to a new routing architecture will be accepted and carried out. However, more work is needed to achieve a holistic framework covering the three disciplines.

17. References

17.1. Normative References

[RFC3031] Rosen, E., Viswanathan, A., Callon, R., "Multiprotocol Label Switching Architecture", RFC 3031, January 2001.

[RFC4884] Bonica, R., Gan, D., Tappan, D., Pignataro, C., "Extended ICMP to Support Multi-Part Messages", RFC 4884, April 2007.

[RFC1918] Rekhter, Y., Moskowitz, B., Karrenberg, D., de Groot, G.J., Lear, E., "Address Allocation for Private Internets", RFC 1918, February 1996.

[RFC5944] Perkins, C., "IP Mobility Support for IPv4, Revised", RFC 5944, November 2010.

[RFC1385] Wang, Z., "The Extended Internet Protocol", RFC 1385, November 1992.

[RFC5246] Dierks, T., Rescorla, E., "The Transport Layer Security (TLS) Protocol Version 1.2", RFC 5246, August 2008.

[RFC1812] Baker, F., "Requirements for IP Version 4 Routers", RFC 1812, June 1995.

[RFC4033] Arends, R., Austein, R., Larson, M., Massey, D., Rose, S., "DNS Security Introduction and Requirements", RFC 4033, March 2005.

[RFC4601] Fenner, B., Handley, M., Holbrook, H., Kouvelas, I., "Protocol Independent Multicast - Sparse Mode", RFC 4601, August 2006.

17.2. Informative References

[RFC4984] Meyer, D., Zhang, L., Fall, K., "Report from the IAB Workshop on Routing and Addressing", RFC 4984, September 2007.

[RFC4423] Moskowitz, R., Nikander, P., "Host Identity Protocol (HIP) Architecture", RFC 4423, May 2006.

[RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC 4960, September 2007.

[RBridge] Perlman, R., "RBridges, Transparent Routing", 2004, http://www.ieee-infocom.org/2004/Papers/26_1.PDF.

[Dagstuhl] Arkko, J., Braun, M.B., Brim, S., Eggert, L., Vogt, C., Zhang, L., "Perspectives Workshop: Naming and Addressing in a Future Internet", 2009, http://www.dagstuhl.de/de/programm/kalender/semhp/?semnr=09102.
[Nimrod] Chiappa, N., "A New IP Routing and Addressing Architecture", 1991, http://ana-3.lcs.mit.edu/~jnc/nimrod/overview.txt.

[RFC6182] Ford, A., Raiciu, C., Handley, M., Barre, S., Iyengar, J., "Architectural Guidelines for Multipath TCP Development", RFC 6182, March 2011.

[VLB] Zhang-Shen, R., McKeown, N., "Designing a Predictable Internet Backbone with Valiant Load-Balancing", 2004, http://conferences.sigcomm.org/hotnets/2004/HotNets-III%20Proceedings/zhang-shen.pdf.

[iVLB] Babaioff, M., Chuang, J., "On the Optimality and Interconnection of Valiant Load-Balancing Networks", 2007, http://people.ischool.berkeley.edu/~chuang/pubs/VLB-infocom07.pdf.

[RRG] RRG, "IRTF Routing Research Group Home Page", http://tools.ietf.org/group/irtf/trac/wiki/RoutingResearchGroup.

[RFC5880] Katz, D., Ward, D., "Bidirectional Forwarding Detection", RFC 5880, June 2010.

[CES] Jen, D., Meisel, M., Yan, H., Massey, D., Wang, L., Zhang, B., Zhang, L., "Towards A New Internet Routing Architecture: Arguments for Separating Edges from Transit Core", 2008, http://conferences.sigcomm.org/hotnets/2008/papers/18.pdf.

[ILNP] Atkinson, R., "ILNP Concept of Operations", draft-rja-ilnp-intro-10 (work in progress), February 2011.

[NBS] Ubillos, J., Xu, M., Ming, Z., Vogt, C., "Name-Based Sockets Architecture", draft-ubillos-name-based-sockets-03 (work in progress), September 2010.

[Pathlet_Routing] Godfrey, P.G., Shenker, S., Stoica, I., "Pathlet Routing", 2008, http://conferences.sigcomm.org/hotnets/2008/papers/17.pdf.

[tcpcrypt] Bittau, A., Hamburg, M., Handley, M., Mazieres, D., Boneh, D., "The case for ubiquitous transport-level encryption", 2010, http://tcpcrypt.org/tcpcrypt.pdf.
[LISP] Farinacci, D., Fuller, V., Meyer, D., Lewis, D., "Locator/ID Separation Protocol", draft-ietf-lisp-11 (work in progress), March 2011.

[RFC6115] Li, T., "Recommendation for a Routing Architecture", RFC 6115, February 2011.

[RRG_Design_Goals] Li, T., "Design Goals for Scalable Internet Routing", draft-irtf-rrg-design-goals-06 (work in progress), January 2011.

[RFC3618] Fenner, B., Meyer, D., "Multicast Source Discovery Protocol", RFC 3618, October 2003.

[Split-DNS] BIND 9 Administrator Reference Manual, http://www.bind9.net/manual/bind/9.3.1/Bv9ARM.ch04.html#AEN767.

[Porting_IPv4] DeLong, O., "Porting IPv4 applications to dual stack, with examples", 2010, http://www.apricot.net/apricot2010/program/tutorials/porting-ipv4-apps.html.

[RFC3597] Gustafsson, A., "Handling of Unknown DNS Resource Record (RR) Types", RFC 3597, September 2003.

[RFC5395] Eastlake 3rd, D., "Domain Name System (DNS) IANA Considerations", RFC 5395, November 2008.

[RFC4941] Narten, T., Draves, R., Krishnan, S., "Privacy Extensions for Stateless Address Autoconfiguration in IPv6", RFC 4941, September 2007.

[ID/loc_Split] Thaler, D., "Why do we really want an ID/locator split anyway?", 2008, http://conferences.sigcomm.org/sigcomm/2008/workshops/mobiarch/slides/thaler.pdf.

[Mobility_&_Privacy] Brim, S., Linsner, M., McLaughlin, B., Wierenga, K., "Mobility and Privacy", draft-brim-mobility-and-privacy-01 (work in progress), March 2011.

[Revisiting_Route_Caching] Kim, C., Caesar, M., Gerber, A., Rexford, J., "Revisiting Route Caching: The World Should Be Flat", 2009, http://www.springerlink.com/content/80w13260665v2013/.

[SCAFFOLD] Freedman, M.J., Arye, M., Gopalan, P., Ko, S.Y., Nordstrom, E., Rexford, J., Shue, D., "Service-Centric Networking with SCAFFOLD", September 2010, http://www.cs.princeton.edu/research/techreps/TR-885-10.

18. Acknowledgments

The active participants on the Routing Research Group [RRG] mailing list are acknowledged. They have provided ideas, proposals, and discussions that have influenced the architecture of the hIPv4 framework. The following persons, in alphabetical order, have provided valuable review input: Aki Anttila, Mohamed Boucadair, Antti Jarvenpaa, Dae Young KIM, Mark Lewis, Wes Toman, and Robin Whittle.

Also, during the IRSG and IESG review process, Rajeev Koodli, Wesley Eddy, Jari Arkko, and Adrian Farrel provided valuable review input.

Lastly, special thanks to Alfred Schwab from the Poughkeepsie ITSO for his editorial assistance.

This document was prepared using 2-Word-v2.0.template.dot.

Appendix A. Short Term and Future IPv4 Address Allocation Policy

This appendix studies how the hIPv4 framework could influence IPv4 address allocation policies so that the new framework enables some reuse of IPv4 address blocks. The final policies are to be defined by the Regional Internet Registries (RIRs).

When the intermediate routing architecture (see Figure 1) is fully implemented, every ALOC realm could allocate ELOC blocks from the full IPv4 address space, except the GLB. There are some implications, however. For an enterprise to achieve site mobility, that is, to change service provider without changing its ELOC scheme, the enterprise should implement an Autonomous System (AS) solution with an ALOC prefix at its attachment point to the service provider.

Larger enterprises have the resources to implement AS border routing, and most of them have already implemented multi-homing solutions.
Small and midsize enterprises (SMEs), however, may not have the resources to implement AS border routing, or the implementation introduces unnecessary costs for them. Also, if every enterprise needs its own ALOC prefix, the RIB in the DFZ will be populated with a huge number of ALOC prefixes.

A compromise is clearly needed. An SME site usually has a single uplink to the Internet and should be able to reserve a PI ELOC block from the RIR without being forced to create an ALOC realm, that is, to implement an RBR solution and AS border routing. Since the PI ELOC block is no longer globally unique, an SME can reserve it only for the region where it is active or has its attachment point to the Internet. The attachment point rarely moves to another country; therefore, it is sufficient that the PI ELOC block is regionally unique.

When the enterprise replaces its Internet service provider, it does not have to change its ELOC scheme; only the local ALOC prefix at the endpoints is changed. Internal traffic within an enterprise does not use the ALOC prefix. Internal routing uses only the ELOC prefixes, so the internal routing and addressing architectures are preserved.

Mergers and acquisitions of enterprises can cause ELOC conflicts, because PI ELOC blocks are henceforth only regionally unique. If an enterprise in region A acquires an enterprise in region B, there is a slight chance that the two enterprises have overlapping ELOC prefixes. If ELOC prefixes overlap, the private unicast ALOC solution can be implemented to separate them, provided that all affected endpoints support the hIPv4 framework.

Finally, residential users will receive only PA locators. When residential users change service providers, they have to replace their locators.
Since a PA ELOC block is no longer globally unique, every Internet service provider can use the PA ELOC blocks in its own ALOC realm; the PA locators effectively become private locators of each service provider.

If the forwarding planes and all hosts that establish inter-ALOC-realm sessions are upgraded to support the hIPv4 framework, that is, if the long-term routing architecture (see Figure 2) is implemented, several interesting possibilities arise:

o The regional allocation policy for PI ELOC spaces can be removed, and an enterprise can use the whole IPv4 address space that is globally unique today. The ELOC space is then significant only within the local ALOC realm.

o In case of mergers or acquisitions of enterprises, the private unicast ALOC solution can be used to separate overlapping ELOC spaces.

o The GLB space can be expanded to use all 32 bits (except for the blocks defined in [RFC1918]) for anycast and unicast ALOC allocations; only ISPs are allowed to apply for GLB prefixes.

o The anycast ALOC solution can be replaced with the global unicast ALOC solution, since the ISP and the enterprise no longer need to share ELOC routing information. Also, there is enough space in the GLB to reserve global unicast ALOC prefix(es) for every enterprise.

o Residential users will still use global anycast ALOC solutions, and if they change service providers, their locators need to be replaced.

The result is a 32x32 bit locator space. When an enterprise replaces one ISP with another, only the ALOC prefix(es) are replaced at endpoints and infrastructure nodes. Renumbering of ALOC prefixes can be automated by, for example, DHCP and extensions to IGPs.

Appendix B. Multi-homing becomes Multi-pathing

When the transition to the intermediate routing architecture (see Figure 1) is fully completed, the RIB of an ISP that has created an ALOC realm will contain the following entries:

o The PA ELOC blocks of directly attached customers (residential and enterprise)

o The PI ELOC blocks of directly attached customers (e.g., enterprises)

o The globally unique ALOC prefixes received from other service providers

The ISP will not carry any PA or PI ELOC blocks from other service providers in its routing table. To route and forward packets between ISPs, only the ALOC information of the other ISPs is needed.

The question, then, is how to keep the growth of ALOC prefixes reasonable. If an enterprise uses PI addresses, has an AS number, and runs BGP, why should it not apply for an ALOC prefix?

Classical multi-homing has the biggest impact on the growth of the RIB in the DFZ, so if every multi-homed enterprise obtained its own ALOC prefix, replacing a /20 IPv4 prefix with a /32 ALOC prefix would not reduce the size of the RIB in the DFZ.

Most likely, the only way to prevent this is to charge a yearly fee for the allocation of an ALOC prefix, unless the applicant is a service provider that provides access and/or transit for its customers. It is reasonable to charge non-service providers for an ALOC prefix, because an enterprise that uses an ALOC prefix reserves a FIB entry throughout the DFZ; that entry consumes power, space, hardware, and cooling on every router in the DFZ.

Implementing this kind of ALOC allocation policy will reduce the RIB size in the DFZ considerably, because multi-homing will no longer increase it.
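The RIB contents listed at the beginning of this appendix amount to a simple import policy. The Python sketch below expresses it under the assumption (not specified by this document) that each offered route is tagged as ALOC or ELOC and marked as learned from a directly attached customer or not; in practice this would be written in the ISP's own BGP policy language, and the prefixes are illustrative only.

```python
import ipaddress
from dataclasses import dataclass
from enum import Enum
from typing import List


class Kind(Enum):
    ALOC = "ALOC"
    ELOC = "ELOC"


@dataclass(frozen=True)
class Route:
    prefix: ipaddress.IPv4Network
    kind: Kind            # assumed tag: ALOC or ELOC
    from_customer: bool   # assumed tag: learned from a directly attached customer?


def accept(route: Route) -> bool:
    """Import policy of an ISP that has created an ALOC realm
    (intermediate routing architecture)."""
    if route.kind is Kind.ALOC:
        return True  # globally unique ALOC prefixes are carried
    # PA/PI ELOC blocks are carried only for directly attached customers.
    return route.from_customer


def build_rib(offered: List[Route]) -> List[Route]:
    return [r for r in offered if accept(r)]


net = ipaddress.ip_network
offered = [
    Route(net("192.168.1.0/24"), Kind.ELOC, from_customer=True),   # PI ELOC, own customer
    Route(net("172.20.0.0/16"), Kind.ELOC, from_customer=True),    # PA ELOC block
    Route(net("10.1.1.1/32"), Kind.ALOC, from_customer=False),     # ALOC of another SP
    Route(net("172.16.1.0/24"), Kind.ELOC, from_customer=False),   # ELOC behind another SP
]
rib = build_rib(offered)
print([str(r.prefix) for r in rib])
# ['192.168.1.0/24', '172.20.0.0/16', '10.1.1.1/32']
```

The ELOC block behind another service provider is never installed; reaching it relies on that provider's ALOC entry, which is what keeps multi-homed ELOC prefixes out of the DFZ.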
However, this ALOC allocation policy will have some impact on resilience, because compressing routing information reduces visibility in the network. In today's multi-homing solutions, the network always knows where the remote endpoint resides. In case of a link or network failure, an alternative path is calculated, and all routers in the DFZ become aware of the topology change. This functionality has offloaded work from the endpoints: they only need to find the closest ingress router, and the network will deliver the packets to the egress router regardless of almost any failure in the network. With the growth of multi-homed prefixes, however, the routers in the DFZ have been forced to carry greater workloads, perhaps close to their limits, and the workload between the network and the endpoints is out of balance. The conclusion is that the endpoints should take more responsibility for their sessions, thereby offloading the network. How? Let us walk through an example.

A remote enterprise has been assigned the ELOC block 192.168.1.0/24, which is announced to its upstream service providers via static routing or BGP. The upstream service providers supply the enterprise's ALOC information, 10.1.1.1 and 10.2.2.2. A remote endpoint has been installed and given ELOC 192.168.1.1; the ELOC is a locator defining where the remote endpoint is attached to the remote network. The remote endpoint has been assigned ALOCs 10.1.1.1 and 10.2.2.2; an ALOC is a locator defining the attachment point of the remote network to the Internet.

The initiator (local endpoint), which has ELOC 172.16.1.1 and ALOC prefixes 10.3.3.3 and 10.4.4.4, has established a session using ALOC 10.3.3.3 to the responder (remote endpoint) at ELOC 192.168.1.1 and ALOC 10.1.1.1. Both networks, 192.168.1.0/24 and 172.16.1.0/24, are thus multi-homed. ALOCs are not exposed through the current IP stack's API; only the two ELOCs are seen as the local and remote IP addresses, so the application communicates between IP addresses 172.16.1.1 and 192.168.1.1. If ALOC prefixes are included, the session is established between 10.3.3.3:172.16.1.1 and 10.1.1.1:192.168.1.1.

Next, a network failure occurs and the link between the responder's border router (BR-R1) and the service provider that owns ALOC 10.1.1.1 goes down. The initiator's border router (BR-I3) will not be aware of the situation, because only ALOC information is exchanged between service providers, and ELOC information is compressed to stay within ALOC realms. BR-R1, however, will notice the link failure; it could rewrite the ALOC field in the locator header for this session from 10.1.1.1 to 10.2.2.2 and send the packets to the second service provider via BR-R2. The session between the initiator 10.3.3.3:172.16.1.1 and the responder, now at 10.2.2.2:192.168.1.1, remains intact because the legacy 5-tuple at the IP stack API does not change. Only the responder's ALOC prefix has changed, and this information is not shown to the application. This assumes that the hIPv4 stack accepts changes of ALOC prefixes on the fly (discussed further below).

If the link between BR-I3 and the ISP providing ALOC 10.3.3.3 fails, BR-I3 could rewrite the ALOC prefix in the locator header and route the packets via BR-I4, and the session would stay up. If there is a failure elsewhere in the network, the border routers might receive an ICMP destination unreachable message (if it is not blocked by some security function) and then try to switch the session over to the other ISP by replacing the ALOC prefixes in the hIPv4 header.
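The rewrite performed by BR-R1 in this example can be sketched as follows. This is a simplified, hypothetical model: the `Packet` fields, the port numbers, and the `uplink_up` and `alternate` tables are assumptions, only the responder-to-initiator direction is shown, and the identifier check that the text goes on to call for is deliberately omitted. What it illustrates is that the application-visible 5-tuple, built from ELOCs only, is unchanged by the ALOC rewrite.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple

FiveTuple = Tuple[str, int, str, int, str]


@dataclass(frozen=True)
class Packet:
    src_aloc: str
    src_eloc: str
    dst_aloc: str
    dst_eloc: str
    src_port: int
    dst_port: int
    proto: str = "tcp"

    def five_tuple(self) -> FiveTuple:
        # Only the ELOCs are visible through the socket API; ALOCs are hidden.
        return (self.src_eloc, self.src_port, self.dst_eloc, self.dst_port, self.proto)


def border_router_egress(pkt: Packet, uplink_up: Dict[str, bool],
                         alternate: Dict[str, str]) -> Packet:
    """Keep the source ALOC if its uplink works; otherwise rewrite it to the
    site's alternate ALOC (BR-R1 handing the session over to BR-R2)."""
    if uplink_up.get(pkt.src_aloc, False):
        return pkt
    backup = alternate.get(pkt.src_aloc)
    if backup is None or not uplink_up.get(backup, False):
        raise RuntimeError(f"no working uplink for ALOC {pkt.src_aloc}")
    return replace(pkt, src_aloc=backup)


# Responder 10.1.1.1:192.168.1.1 replying to initiator 10.3.3.3:172.16.1.1
# (port numbers are hypothetical).
reply = Packet(src_aloc="10.1.1.1", src_eloc="192.168.1.1",
               dst_aloc="10.3.3.3", dst_eloc="172.16.1.1",
               src_port=443, dst_port=50000)

# The uplink towards the provider that owns 10.1.1.1 has failed.
uplink_up = {"10.1.1.1": False, "10.2.2.2": True}
alternate = {"10.1.1.1": "10.2.2.2", "10.2.2.2": "10.1.1.1"}

rerouted = border_router_egress(reply, uplink_up, alternate)
print(rerouted.src_aloc)                            # 10.2.2.2
print(rerouted.five_tuple() == reply.five_tuple())  # True
```

Because the rewrite happens without any negotiation, nothing in this sketch lets the initiator distinguish a legitimate ALOC change from a hijack, which is the security gap addressed by the session identifiers discussed next.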
Alternatively, the endpoints themselves might switch to the other ALOCs after a session time-out. In all of these cases, the legacy 5-tuple remains intact.

If border routers or one of the endpoints change the ALOC prefix without negotiating with the remote endpoint, security issues arise. Can an endpoint trust the remote endpoint when ALOC prefixes change on the fly: is it still the same remote endpoint, or has the session been hijacked by a bogus endpoint? Clearly, an identification mechanism is needed to ensure that the endpoints are still the same after a change in the path or in an endpoint's attachment point. An identifier needs to be exchanged during the transition of the session.

Identifier/locator split schemes, for example multipath-enabled transport protocols and identifier database schemes, have been discussed on the [RRG] mailing list. Both types of identifiers can be used to protect the session from being hijacked. A session identifier provides a low-level security mechanism that offers some protection against session hijacking and also provides mobility. SCTP [RFC4960] uses the verification tag to identify the association, and MPTCP [RFC6182] incorporates a token for the same purpose; both can be considered to fulfill the characteristics of a session identifier, and both transport protocols are also multipath-capable. [tcpcrypt] can be used to further mitigate session hijacking. If the application requires full protection against man-in-the-middle attacks, TLS [RFC5246] should be used for the session. Implementing multipath-capable transport protocols in a multi-homed environment will provide new capabilities, such as:

o Concurrent and separate exit/entry paths via different attachment points at multi-homed sites.
o True dynamic load-balancing, in which the endpoints neither participate in routing protocols nor update rendezvous solutions because of network link or node failures.

o Only a single NIC is required on the endpoints.

o In case of a border router or ISP failure, the multipath transport protocol provides resilience.

By adding more intelligence to the endpoints, such as multipath-enabled transport protocols, the network is offloaded and can take less responsibility for providing visibility of destination prefixes on the Internet. For example, prefix compression can be applied in the DFZ, so that only the attachment points of a local network need to be announced there. Moreover, the IP address space no longer needs to be globally unique; it is sufficient that only part of it is globally unique, with the rest being only regionally unique (locally unique in the long-term routing architecture), as discussed in Appendix A.

The outcome is that the current multi-homing solution can migrate towards a multi-pathing environment with the following characteristics:

o An AS number is not mandatory for enterprises.

o BGP is not mandatory on the enterprise's border routers; static routing with Bidirectional Forwarding Detection (BFD) [RFC5880] is an option.

o Allocation of global ALOC prefixes to enterprises should not be allowed; instead, upstream ISPs provide the global ALOC prefixes for the enterprise.

o MPTCP provides dynamic load-balancing without routing protocols; several paths can be used simultaneously, and thus resilience is achieved.

o The growth of RIB entries in the DFZ is kept low.
o When static routing is used between the enterprise and the ISP:

   o The RIB size at the enterprise's border routers depends neither on the size of the RIB in the DFZ nor on that of adjacent ISPs.

   o The enterprise's border router cannot cause BGP churn in the DFZ or in the adjacent ISPs' RIBs.

o When dynamic routing is used between the enterprise and the ISP:

   o The RIB size at the enterprise's border routers depends on the size of the RIB in the DFZ and in adjacent ISPs.

   o The enterprise's border router can cause BGP churn in the adjacent ISPs, but not in the DFZ.

o The cost of the border router should be lower than in today's multi-homing solutions.

Appendix C. Incentives and Transition Arguments

The media have announced the meltdown of the Internet and the depletion of IPv4 addresses several times, but the predicted chaos has repeatedly been postponed, and the general public has lost interest in these announcements. It could therefore be worthwhile to find other arguments that interest the general public, such as:

o Not all endpoints need to be upgraded, only those that are directly attached to the Internet, such as laptops, smartphones, proxies, and DMZ/frontend endpoints. The most critical endpoints, the backend endpoints where enterprises keep their most critical business applications, do not need to be upgraded. These endpoints should not be reachable from the Internet at all, only from the private network, and this can be achieved with the hIPv4 framework because it is backwards compatible with the current IPv4 stack. Therefore, investments in legacy applications used inside an ALOC realm are preserved.

o Mobility: the demand for applications that perform well over wireless access networks is expected to increase. The introduction of MPTCP and identifier/locator split schemes opens up new possibilities for solutions and applications optimized for mobility. Since the hIPv4 framework requires an upgrade of the endpoints' stack anyway, the hIPv4 stack should, if possible, also include MPTCP and identifier/locator split features. Applications designed for mobility could bring competitive benefits.

o The intermediate routers in the network do not need to be upgraded immediately; the current forwarding plane can still be used. The benefit is that current network equipment (except middleboxes) can be preserved at service providers, enterprises, and residences. This could mean a considerably lower carbon footprint than that of solutions requiring new equipment. Many enterprises have green programs, and many residential users are concerned about global warming.

o The migration from IPv4 to IPv6 (as currently defined) will increase the RIB and FIB throughout the DFZ. It is unclear whether this will require another upgrade of the forwarding plane, as discussed in [RFC4984], but an upgrade is likely. Deploying IPv4 and IPv6 concurrently means that routers need larger memories for the RIB and FIB, because every globally unique prefix is installed in every router participating in the DFZ. Since an enterprise reserves one or several RIB/FIB entries on every router in the DFZ, it increases the power consumption of the Internet and thus its carbon footprint, and many enterprises are committed to green programs. If hIPv4 is deployed, the power consumption of the Internet will not grow as much as in an IPv4-to-IPv6 transition scenario.
   o  Another issue: if the migration from IPv4 to IPv6 (with the currently defined architecture) occurs, the routers in the DFZ will most likely need to be replaced with more expensive routers, as discussed in [RFC4984]. In the wealthy parts of the world, where Internet penetration is already high, service providers can more easily pass the costs of the upgrade along to their subscribers; with a "wealthy/high penetration" ratio, the cost will not grow so much that subscribers would abandon the Internet. In the less wealthy parts of the world, where subscriber penetration is usually lower, the cost of the upgrade cannot be absorbed so easily: a "less wealthy/low penetration" ratio could impose a dramatic cost increase that needs to be passed along to the subscribers, and thus fewer subscribers could afford to connect to the Internet. For global enterprises and for enterprises in the less wealthy parts of the world, this scenario could mean fewer potential customers, and there could be situations where the enterprises' nomadic employees cannot connect to the Internet. This would also be unfair: every human being should have a fair chance to enjoy the Internet experience, and the wealthy part of the world should take this right into consideration. Many enterprises are committed to Corporate Social Responsibility programs.

   The arguments are not only technical and economic. There are also arguments that the general public is interested in and concerned about, for example, that the Internet becomes greener and more affordable for everyone, in contrast with current forecasts of its evolution.

Appendix D.
Integration with CES Architectures

   Because the hIPv4 framework requires changes to the endpoints' stack, it will take some time before the migration from the current IPv4 architecture to the intermediate hIPv4 routing architecture is fully completed. If a hIPv4 proxy solution could be used in front of classical IPv4 endpoints, the threshold for early adopters to start migrating towards the hIPv4 framework would be lower, and the migration phase would most likely be much shorter.

   Therefore, it should be investigated whether the hIPv4 framework can be integrated with Core-Edge Separation [CES] architectures. In CES architectures the endpoints do not need to be modified. The design goal of a CES solution is to minimize the PI-address entries in the DFZ and to preserve the current stack at the endpoints. However, a CES solution requires a new mapping system and also introduces a caching mechanism in the map-and-encapsulate network nodes. The scalability of the mapping system and of the caching mechanism has been debated extensively on the [RRG] mailing list. At present it is unclear how well either mechanism will scale; research on both topics is still in progress.

   Since the CES architectures divide the address space into two new categories, one that is installed in the RIB of the DFZ and one that is installed in the local networks, there are some similarities between CES architectures and the hIPv4 framework. In fact, the RBR functionality was inspired by [LISP].
   In order to describe how these two architectures might be integrated, some terminology needs to be defined:

   CES-node:

      A network node installed in front of a local network that must have the following characteristics:

      o  Map-and-encapsulate ingress functionality

      o  Map-and-encapsulate egress functionality

      o  The hIPv4 stack

      o  Routing functionality [RFC1812]

      o  The ability to apply policy-based routing on the ALOC field in the locator header

      The CES-node does not include the MPTCP extension, because signaling and maintaining MPTCP subflows for the cached hIPv4 entries would most likely put too much of a burden on the CES-node.

   Consumer site:

      A site that does not publish any services towards the Internet, that is, there are no DNS entries for this site. It is used by local endpoints to establish outbound connectivity: endpoints initiate sessions from the site towards content sites. Such sites are usually found at small enterprises and residences, and PA-addresses are usually assigned to them.

   Content site:

      A site that publishes services towards the Internet and usually has DNS entries. Such a site is used by local endpoints to establish both inbound and outbound connectivity. Large enterprises use PI-addresses, while midsize and small enterprises use either PI- or PA-address space.

   The CES architectures aim to reduce the number of PI-address entries in the DFZ. Therefore, map-and-encapsulate egress functionality will be installed in front of the content sites. Map-and-encapsulate ingress functionality is required at the Internet Service Providers, but it is not of interest for the hIPv4-CES integration study, because the RBR functionality and the provider's map-and-encapsulate ingress functionality might reside in the same node.
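   The CES-node definition requires policy-based routing on the ALOC field of the locator header. The following Python sketch illustrates one way such a decision could look; it is not part of this framework's specification. It assumes a simplified packet model in which the locator header is reduced to its ALOC and ELOC fields, and the class names, the policy-table layout, and the next-hop labels are hypothetical. Policies are matched by longest prefix on the ALOC value; packets without a locator header, or whose ALOC matches no policy, fall back to ordinary destination-based forwarding [RFC1812], reduced here to a single default next hop.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocatorHeader:
    # Hypothetical, reduced model: only the two locator fields are kept.
    aloc: ipaddress.IPv4Address
    eloc: ipaddress.IPv4Address


@dataclass
class Packet:
    dst: ipaddress.IPv4Address                # IP header destination address
    locator: Optional[LocatorHeader] = None  # None for classical IPv4 packets


class AlocPolicyRouter:
    """Hypothetical policy-based routing keyed on the locator header's ALOC field."""

    def __init__(self, default_next_hop: str):
        self.policies: list[tuple[ipaddress.IPv4Network, str]] = []
        self.default_next_hop = default_next_hop

    def add_policy(self, aloc_prefix: str, next_hop: str) -> None:
        self.policies.append((ipaddress.IPv4Network(aloc_prefix), next_hop))

    def next_hop(self, pkt: Packet) -> str:
        if pkt.locator is not None:
            # Longest-prefix match on the ALOC field, not on pkt.dst.
            matches = [(net, hop) for net, hop in self.policies
                       if pkt.locator.aloc in net]
            if matches:
                return max(matches, key=lambda m: m[0].prefixlen)[1]
        # Classical IPv4 packet, or no policy matched: ordinary
        # destination-based forwarding, reduced here to one default next hop.
        return self.default_next_hop
```

   In this sketch, classical IPv4 packets never consult the policy table, so adding ALOC policies leaves existing forwarding behavior unchanged.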
   The node containing map-and-encapsulate egress functionality will likely also contain map-and-encapsulate ingress functionality. Such a node is most likely a router, so it only needs to support the hIPv4 stack and be able to apply policy-based routing using the ALOC field of the locator header to become a CES-node. Large Content Providers (LCPs) may not be willing to install map-and-encapsulate functionality in front of their sites: if the caching mechanism is not fully reliable, or if the mapping lookup delay affects their clients' user experience, the LCPs will most likely not adopt the CES architecture.

   To convince an LCP to adopt it, the CES architecture should provide a mechanism to mitigate the risks of caching and mapping lookup delay. One method is to push the CES architecture to the edge; the closer to the edge new functionality is added, the better it scales. For example, if the endpoint stack is upgraded, the caching mechanism is maintained by the endpoint itself. The mapping mechanism can be removed if, when the CES solution is integrated at the endpoints, the CES architecture's addressing scheme is replaced with the addressing scheme of hIPv4. With this approach the LCPs might install a CES-node in front of their sites, and some endpoints at the content site might also be upgraded with the hIPv4 stack.

   If an LCP faces issues with the caching or mapping mechanisms, it can ask its clients to upgrade their endpoints' stack to ensure a proper service level. In doing so, the LCP promotes the migration from the current routing architecture to a new one, not for the sake of the routing architecture but to ensure a proper service level. In other words, a business model will drive the migration to a new routing architecture.
   The hIPv4 framework proposes that IPv4 addresses (ELOCs) should no longer be globally unique; once the transition is completed, a more regional allocation can be deployed. This is only possible, however, once all endpoints that establish sessions to other ALOC realms have migrated to the hIPv4 framework. Here the CES architecture can speed up the reuse of IPv4 addresses: once an IPv4 address block has become an ELOC block, it can be reused in other RIR regions without requiring that all endpoints in the Internet be upgraded first.

   As stated earlier, the CES architecture aims to remove PI-addresses from the DFZ, which makes content sites largely the primary target for the roll-out of a CES solution. A CES-node will most likely be installed at large content sites. Upgrading all endpoints that provide services towards the Internet at a large content site will take time, and those endpoints might only be upgraded within their normal life-cycle process. At a small content site, the administrator either installs a CES-node or upgrades the endpoints' stack, a decision influenced by availability, reliability, and economic feasibility.

   Once the content sites have been upgraded, the PI-address entries have been removed from the DFZ. Most likely some endpoints at the consumer sites will also have been upgraded to support the hIPv4 stack, especially if cache issues or mapping delays have affected the service levels at the LCPs. The remaining issue is how to keep track of whether the content sites have been migrated. If a content site or content endpoint has been migrated, its DNS records should include either a CES-node entry or an ALOC entry for each A-record.
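   The tracking criterion above can be evaluated mechanically from DNS data. The following Python sketch is illustrative only: it assumes that the DNS answers for each content site have already been collected as (record type, value) pairs per host name, and it interprets "a CES-node entry or ALOC entry for each A-record" per host name. The type labels "ALOC" and "CES-NODE" are placeholders rather than registered DNS record types, and all function names are hypothetical. The resulting ratio is one possible measure of the penetration of CES solutions at content sites; what counts as "high enough" is left open, as in the text.

```python
from typing import Iterable

# Placeholder type labels; these are not registered DNS RR types.
LOCATOR_TYPES = {"ALOC", "CES-NODE"}

RRset = list[tuple[str, str]]  # (record type, value) pairs for one name
Zone = dict[str, RRset]        # host name -> collected records


def name_is_migrated(rrset: Iterable[tuple[str, str]]) -> bool:
    """A name with an A-record also carries an ALOC or CES-node record."""
    types = {rtype for rtype, _ in rrset}
    return "A" in types and bool(types & LOCATOR_TYPES)


def site_is_migrated(zone: Zone) -> bool:
    """Every name that has an A-record also has a locator record."""
    a_names = [name for name, rrs in zone.items()
               if any(rtype == "A" for rtype, _ in rrs)]
    return bool(a_names) and all(name_is_migrated(zone[n]) for n in a_names)


def penetration(sites: dict[str, Zone]) -> float:
    """Fraction of content sites that look migrated in DNS."""
    if not sites:
        return 0.0
    migrated = sum(site_is_migrated(zone) for zone in sites.values())
    return migrated / len(sites)
```

   A site with even one A-record name lacking a locator record counts as not migrated here, matching the "for each A-record" wording.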
   When the penetration of CES solutions at content sites (as tracked by CES-node/ALOC records in DNS) is high enough, the ISP can start to promote the hIPv4 stack upgrade at the consumer sites.

   Once a PA-address block has been migrated, it can be released from global allocation to regional allocation. Why would an ISP then push its customers to deploy hIPv4 stacks? Because of the business model: it will be more expensive to stay in the current architecture. The depletion of IPv4 addresses will force the ISP either to deploy more NAT in its network, which increases operational expenditures because the network becomes more complex, or to force its customers to migrate to IPv6, in which case the ISP could lose customers to other ISPs that still offer IPv4 services.

   When its PA-addresses have been migrated to the hIPv4 framework, the ISP will have a more independent routing domain (ALOC realm), containing only ALOC prefixes from other ISPs and ELOC prefixes from directly attached customers. BGP churn from other ISPs is no longer received, the number of alternative paths is reduced, and the ISP can better control the growth of the RIB in its ALOC realm. Operational and capital expenditures should therefore be lower than in the current routing architecture.

   To summarize, content providers might find the CES+hIPv4 solution attractive. It removes the constraints of the forthcoming IPv4 address depletion without forcing consumers to switch to IPv6, so content providers can continue to grow (reach more consumers).

   ISPs might also find this solution attractive because it should reduce capital and operational expenditures in the long term. Both content providers and ISPs provide the foundation of the Internet; if both adopt this architecture, consumers will have to follow.
   Both content providers and ISPs might find business models to "guide" consumers towards the new routing architecture.

   How, then, will this affect consumer and content sites? Residential users will need to upgrade their endpoints, but it does not really matter to them which IP protocol version they use; the availability and affordability of the Internet matter most.

   Enterprises will be affected somewhat more. The edge nodes of the enterprises' local networks, such as AS border routers, middleboxes, DNS, DHCP, and public nodes, need to be upgraded. By installing a CES-node in front of them, however, the upgrade can be postponed and the legacy nodes can be upgraded during their normal life-cycle process. The internal infrastructure is preserved, internal applications can still use IPv4, and all investment in IPv4 skills is preserved.

   Walkthrough of use cases:

   1. A legacy endpoint at a content site establishes a session to a content site with a hIPv4-upgraded endpoint.

      When the legacy endpoint resolves the DNS entry of the remote (hIPv4-upgraded) endpoint, it receives an ALOC record in the DNS response. The legacy endpoint ignores the ALOC record and uses only the A-record to establish the session. Next, the legacy endpoint initiates the session, and a packet is sent towards the map-and-encapsulate ingress node, which needs to perform a lookup in the CES mapping system (assuming that no cache entry exists for the remote endpoint). The mapping system returns either a CES-node prefix or an ALOC prefix; since the requested remote endpoint has been upgraded, it returns an ALOC prefix.

      The CES-node will not use the CES encapsulation scheme for this session. Instead, the hIPv4 header scheme will be used, and a /32 entry will be created in the cache.
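      The lookup-and-cache decision in this use case can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not specified behavior: the mapping system is modeled as a hypothetical `resolve` callback returning a hypothetical `MappingReply`, and the `all_upgraded` flag stands in for the ALOC-record subfield proposed in the text. Caching the whole site prefix for a CES-node reply is also an assumption of the sketch. Lookups use longest-prefix match, so a /32 entry and a shorter entry can coexist in the cache.

```python
import ipaddress
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

IPv4 = ipaddress.IPv4Address
Net = ipaddress.IPv4Network


class LocatorKind(Enum):
    CES_NODE = "ces-node"  # remote site behind a CES-node: CES encapsulation
    ALOC = "aloc"          # remote endpoint upgraded: hIPv4 header scheme


@dataclass(frozen=True)
class MappingReply:
    """Hypothetical mapping-system reply (format not defined by the draft)."""
    kind: LocatorKind
    locator: IPv4               # CES-node prefix or ALOC prefix
    site_prefix: Net            # remote site's address block
    all_upgraded: bool = False  # stand-in for the proposed ALOC-record subfield


@dataclass(frozen=True)
class CacheEntry:
    kind: LocatorKind
    locator: IPv4


class IngressCache:
    """Map-cache of a hypothetical CES-node's ingress function."""

    def __init__(self, resolve: Callable[[IPv4], MappingReply]):
        self._resolve = resolve  # stands in for the CES mapping system
        self._entries: dict[Net, CacheEntry] = {}

    def _lookup(self, dst: IPv4) -> Optional[CacheEntry]:
        # Longest-prefix match over the cached networks.
        hits = [net for net in self._entries if dst in net]
        if not hits:
            return None
        return self._entries[max(hits, key=lambda net: net.prefixlen)]

    def locate(self, dst: IPv4) -> CacheEntry:
        entry = self._lookup(dst)
        if entry is not None:
            return entry
        reply = self._resolve(dst)  # cache miss: query the mapping system
        if reply.kind is LocatorKind.ALOC and not reply.all_upgraded:
            # Other endpoints at the remote site may still be legacy,
            # so only this destination (/32) is cached.
            key = Net(f"{dst}/32")
        else:
            # Whole site prefix (assumed for CES-node replies and for
            # sites flagged as fully upgraded).
            key = reply.site_prefix
        entry = CacheEntry(reply.kind, reply.locator)
        self._entries[key] = entry
        return entry
```

      Superseded /32 entries are not purged when a shorter entry is added; that housekeeping is left out for brevity.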
      A /32 entry must be created because not all endpoints at the remote site may have been upgraded to support the hIPv4 framework. The /32 cache entry can be replaced with a shorter prefix if all endpoints at the remote site have been upgraded. To indicate this situation, a subfield should be added to the ALOC record in the mapping system.

      The CES-node must execute the following steps for egress packets:

      a. Verify IP- and transport protocol header checksums.

      b. Create the locator header, and copy the value of the destination address field of the IP header to the ELOC field of the locator header.

      c. Replace the destination address in the IP header with the ALOC prefix given in the cache.

      d. Insert the local CES-node prefix into the ALOC field of the locator header.

      e. Copy the transport protocol value of the IP header to the protocol field of the locator header, and set the hIPv4 protocol value in the protocol field of the IP header.

      f. Set the desired parameters in the A-, I-, S-, VLB-, and L-fields of the locator header.

      g. Set the FI-bits of the locator header to 00.

      h. Decrease the TTL value by one.

      i. Calculate IP-, locator-, and transport protocol header checksums. The transport protocol header checksum calculation does not include the locator header fields. When this is completed, the packet is transmitted.

      j. The insertion of the locator header might cause the packet to exceed the MTU. If the MTU is exceeded, the CES-node should inform the source endpoint with an ICMP message and should fragment the hIPv4 packet.

   2. A hIPv4-upgraded endpoint at a consumer/content site establishes a session to a content site with a CES-node in front of a legacy endpoint.
      The hIPv4-upgraded endpoint receives either an ALOC record or a CES-node record for the resolved destination in the DNS response. From the requesting hIPv4 endpoint's point of view, it does not matter whether the new record's prefix locates an RBR-node or a CES-node in the Internet, because the CES-node acts as a hIPv4 proxy in front of the remote legacy endpoint. Thus the hIPv4 endpoint assembles a hIPv4 packet to initiate the session, and when the packet arrives at the CES-node, the CES-node must execute the following steps:

      a. Verify that the received packet uses the hIPv4 protocol value in the protocol field of the IP header.

      b. Verify IP-, locator-, and transport protocol header checksums. Transport protocol header verification does not include the locator header fields.

      c. Replace the protocol field value of the IP header with the protocol field value of the locator header.

      d. Replace the source address in the IP header with the ELOC prefix of the locator header.

      e. Remove the locator header.

      f. Create a cache entry for returning packets, unless one already exists. A /32 entry is required. To optimize the usage of cache entries, the CES-node might ask the CES mapping node whether all endpoints at the remote site have been upgraded; if so, a shorter prefix can be used in the cache.

      g. Decrease the TTL value by one.

      h. Calculate IP- and transport protocol header checksums.

      i. Forward the packet according to the destination address of the IP header.

   3. A hIPv4-enabled endpoint with a regionally unique ELOC at a consumer site establishes a session to a consumer site with a legacy endpoint.

      In this use case the sessions will fail unless some mechanism is invented and implemented at the ISPs' map-and-encapsulate nodes.
      The sessions will work inside an ALOC realm, since the classical IPv4 framework is still valid there, but sessions between ALOC realms will fail. Some applications establish sessions between consumer sites, most commonly gaming and peer-to-peer applications. These communities have historically been at the forefront of adopting new technologies, and they are expected either to develop workarounds for this issue or simply to ask their members to upgrade their stacks.

   4. A legacy endpoint at a consumer/content site establishes a session to a content site with a CES-node in front of a legacy endpoint.

      This use case is assumed to be described in the CES architecture documents.

   5. A hIPv4-enabled endpoint at a consumer/content site establishes a session to a content site with a hIPv4-enabled endpoint.

      See Section 5.2.

Authors' Addresses

   Patrick Frejborg
   Email: pfrejborg@gmail.com