Network Working Group                                         R. Whittle
Internet-Draft                                          First Principles
Intended status: Experimental                             March 07, 2010
Expires: September 8, 2010

Ivip (Internet Vastly Improved Plumbing) Architecture
draft-whittle-ivip-arch-04.txt

Abstract

Ivip (Internet Vastly Improved Plumbing) is a Core-Edge Separation
solution to the routing scaling problem, for both IPv4 and IPv6. It
provides portable "edge" address space which is suitable for
multihoming and inbound traffic engineering (TE) to end-user networks
of all types and sizes - in a manner which imposes far less load on
the DFZ control plane than the only current method of achieving these
benefits: separately advertised PI prefixes. Ivip includes two
extensions for ITR-to-ETR tunneling which avoid encapsulation, and
the Path MTU Discovery problems which result from it - one for IPv4
and the other for IPv6. Both involve modifying the IP header and
require most DFZ routers to be upgraded.
Ivip is a good basis for the TTR (Translating Tunnel Router) approach
to mobility, in which mobile hosts retain an SPI micronet of one or
more IPv4 addresses (or IPv6 /64s) no matter what addresses or access
network they are using, including behind NAT and on SPI addresses.
TTR mobility for both IPv4 and IPv6 involves generally optimal paths,
works with unmodified correspondent hosts and supports all
application protocols.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

This Internet-Draft will expire on September 8, 2010.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.
Code Components extracted from this document must include Simplified
BSD License text as described in Section 4.e of the Trust Legal
Provisions and are provided without warranty as described in the BSD
License.

Table of Contents

1. Introduction
2. Brief description of Ivip
3. The routing scaling problem and other goals for an
   architectural enhancement
4. Summary of Ivip's architectural choices
5. Goals
   5.1. IPv4 and IPv6
   5.2. Portability, multihoming and TE for billions of
        end-user networks
   5.3. Modular separation of the control of mapping from the
        CES architecture itself
   5.4. Simple ITRs and ETRs with little or no communication
        between them
   5.5. Maximise the flexibility with which ITRs and ETRs can
        be located
   5.6. Mobility
   5.7. Elimination of encapsulation and PMTUD problems
   5.8. No requirement for new host functionality
   5.9. Full benefits to all adopters irrespective of level of
        adoption
   5.10. Business incentives to deploy new infrastructure
   5.11. Maintenance of existing levels of security and
         robustness
   5.12. Avoiding the need for any one server to store or
         receive the complete mapping database
   5.13. Eliminating unfair burdens
6. Non-goals
   6.1. Isolation between core and edge networks is not
        required
   6.2. Full adoption not required
   6.3. Mapping changes need not be free of financial cost
   6.4. No attempt to cope with partially reachable ETRs
   6.5. No attempt to mix IPv4 and IPv6
   6.6. Not Locator - Identifier Separation
7. Architectural Choices
   7.1. Core-Edge Separation rather than Elimination
      7.1.1. Core-Edge Elimination (CEE) architectures
      7.1.2. Core-Edge Separation (CES) architectures
   7.2. Nearby authoritative query servers
   7.3. Real-time mapping distribution
   7.4. SPI address management
   7.5. IP in IP encapsulation
   7.6. MHF initially or in the long term to avoid
        encapsulation and PMTUD problems
   7.7. Outer header address is that of the sending host
   7.8. IPTM (ITR Probes Tunnel MTU) PMTUD management
8. Architectural Elements
   8.1. ITRs
      8.1.1. Types of ITR and their addresses
      8.1.2. DITRs - Default ITRs in the DFZ
      8.1.3. Modified Header Forwarding - MHF-only ITRs
      8.1.4. Encapsulation and PMTUD management
      8.1.5. Mapping lookup and caching
      8.1.6. ITFH - ITR Function in Host
      8.1.7. ITRs auto-discovering local query servers
   8.2. ETRs
      8.2.1. In servers or dedicated routers
      8.2.2. ETRs in ISP networks
      8.2.3. ETRs at the end-user network site
      8.2.4. MHF ETR functionality - EAF and PLF
      8.2.5. ETR functionality for encapsulation
   8.3. QSRs - Resolving Query Servers
   8.4. QSCs - caching query servers
   8.5. MHF - Modified Header Forwarding
      8.5.1. EAF - ETR Address Forwarding for IPv4
      8.5.2. PLF - Prefix Label Forwarding, for IPv6
   8.6. TTR Mobility
9. Security Considerations
10. IANA Considerations
11. Informative References
Appendix A. Acknowledgements
Author's Address

1. Introduction

Version 03 (2010-01-13) of this Ivip-arch ID was a freshly written
document which is shorter than the original from 2007. Some
terminology has been changed and the presentation is optimised for
people who are involved in the RRG. Please see
[I-D.whittle-ivip-glossary] for definitions of some terms and
acronyms. Please refer to the RRG mailing list and
http://www.firstpr.com.au/ip/ivip/ for the latest developments.

This Version 04 includes significant changes to Ivip's mapping
system. The DRTM (Distributed Real Time Mapping) system
[I-D.whittle-ivip-drtm] removes the need for "Replicators", or for
any server to carry the full Ivip mapping database.
While DRTM is discussed in this Ivip-arch ID, please see the
Ivip-drtm ID for a full description of this system, and how it
enables the introduction of scalable routing solutions and global
mobility with the initiative and investments being made by
organisations which need not be ISPs.

The Ivip (pr. "Eye-vip") project began in June 2007 and in early
2010 is one of the four Core-Edge Separation (CES) architectures
being considered by the RRG (IRTF Routing Research Group)
[I-D.irtf-rrg-recommendation] - the others being IRON-RANGER, LISP
[I-D.ietf-lisp] and TIDR [I-D.adan-idr-tidr].

For my overall assessment of the proposals submitted to the RRG, and
for my arguments for why Ivip is the most suitable for further IETF
development, please see
http://www.ietf.org/mail-archive/web/rrg/current/msg06162.html
("Recommendation suggestion from RW" 2010-03-04). My discussion of
the other proposals can be found in the RRG Archives of January and
February 2010.

I publicly disclose and discuss all Ivip developments as rapidly as
possible in order to gain support and constructive critiques - and in
the hope that any novel ideas will remain free from patent
encumbrances.

This ID is intended for readers who are broadly familiar with the
routing scaling problem and RRG discussions and who have, ideally,
familiarised themselves with LISP.

This ID provides not only a general description of Ivip, but the
rationale for architectural choices which distinguish Ivip from other
approaches. Some aspects of Ivip's architecture are discussed in
greater detail in separate documents:

The DRTM (Distributed Real Time Mapping) system
[I-D.whittle-ivip-drtm] describes the new approach to Ivip's
real-time mapping system, which uses multiple typically "nearby" full
database query servers provided directly or indirectly by MABOCs.

The TTR approach to mobility is described in [TTR Mobility].

The IPv4 approach to Modified Header Forwarding (MHF) is described in
detail in [I-D.whittle-ivip-etr-addr-forw]. The IPv6 approach is
described in [PLF for IPv6] and the best summary of its operation can
be found at the end of the ~10k word Ivip Conceptual Summary and
Analysis: [Ivip Summary and Analysis].

Ivip's approach to Path MTU Discovery, when ITRs tunnel using
encapsulation, is discussed in [PMTUD-Frag].

2. Brief description of Ivip

Ivip (Internet Vastly Improved Plumbing) is a Core-Edge Separation
solution to the routing scaling problem, for both IPv4 and IPv6. It
provides portable "edge" address space which is suitable for
multihoming and inbound traffic engineering (TE) to end-user networks
of all types and sizes - in a manner which imposes far less load on
the DFZ control plane than the only current method of achieving these
benefits: separately advertised PI prefixes.

The new "edge" subset of the global unicast address space which is
used in this fashion is called SPI (Scalable Provider Independent)
space. End-user networks divide their SPI space into "micronets",
each with a common mapping to a single ETR (Egress Tunnel Router)
address. Micronets have arbitrary starting points and integer
lengths - in units of IPv4 addresses or, for IPv6, /64 prefixes.

When an ITR (Ingress Tunnel Router) receives a packet which is
addressed to an SPI address, it looks up the mapping of the micronet
which covers the destination address and tunnels the packet to the
ETR specified in that mapping - and the ETR delivers the packet to
the end-user network.

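The lookup step in the paragraph above can be pictured with a minimal
sketch. It assumes IPv4, non-overlapping micronets held in a local
table, and illustrative names (Micronet, MappingTable, tunnel_target)
which are not part of Ivip. The addresses are documentation examples
only. A real ITR would obtain and cache mapping from a query server
rather than hold a static table.

```python
import bisect
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Micronet:
    """One micronet: an arbitrary start address and an integer length."""
    start: int    # first SPI address, as an integer
    length: int   # number of IPv4 addresses covered
    etr: str      # the single ETR address all covered addresses map to


class MappingTable:
    """Hypothetical table of non-overlapping micronets, sorted by start."""

    def __init__(self, micronets: Iterable[Micronet]) -> None:
        self._nets = sorted(micronets, key=lambda m: m.start)
        self._starts = [m.start for m in self._nets]

    def lookup(self, dest: str) -> Optional[Micronet]:
        addr = int(ipaddress.IPv4Address(dest))
        # Find the last micronet whose start is <= the destination address.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i < 0:
            return None
        net = self._nets[i]
        # The micronet covers the address only if it lies within its length.
        return net if addr < net.start + net.length else None


def tunnel_target(table: MappingTable, dest: str) -> Optional[str]:
    """Return the ETR to tunnel to, or None if dest is not in a micronet."""
    net = table.lookup(dest)
    return net.etr if net is not None else None


def ip(text: str) -> int:
    return int(ipaddress.IPv4Address(text))


# Two adjacent micronets of lengths 3 and 1, mapped to different ETRs.
table = MappingTable([
    Micronet(start=ip("203.0.113.10"), length=3, etr="192.0.2.1"),
    Micronet(start=ip("203.0.113.13"), length=1, etr="198.51.100.7"),
])
print(tunnel_target(table, "203.0.113.12"))
print(tunnel_target(table, "203.0.113.13"))
```

Because micronets are not constrained to power-of-two boundaries, the
sketch searches on start addresses and then checks the length, rather
than using a longest-prefix match.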
A Mapped Address Block (MAB) is a DFZ-advertised prefix of global
unicast address space which is typically divided up into many
separate micronets - such as hundreds to hundreds of thousands of
micronets, each of which can be used via any ISP. The total set of
all MABs constitutes the "edge" (SPI) subset of the global unicast
address range. The remainder is known as "core" space.

A MAB is managed by a MABOC (MAB Operating Company). MABOCs may be
end-user networks and the micronets their MABs contain may be used
solely for that end-user network - but each micronet can be mapped to
any ETR in the world. More typically, MABOCs will lease the SPI
space to large numbers of end-user networks on a commercial basis,
rather than use it themselves.

The mapping of each micronet is controlled directly by the end-user
network which owns or leases the portion of SPI space the micronet is
within - or by another organization appointed by this end-user
network. Multihoming end-user networks would typically contract a
separate company to change the mapping of their micronets, in
response to the reachability of their network through their two or
more ETRs and according to the network's inbound TE requirements.

DITRs (Default ITRs in the DFZ) are required for handling packets
sent to SPI addresses from hosts in networks without ITRs. The one
or more DITRs at a DITR site advertise in the DFZ the MABs the site
supports, which is typically a subset of all MABs in the Ivip system.

ITRs other than DITRs request mapping for SPI addresses from local
Resolving Query Servers (QSRs) in their own network or in their ISP's
network. They may do this directly or through one or more levels of
caching query servers - QSCs.

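The query path in the paragraph above, from an ITR through zero or
more QSCs to a QSR, can be pictured as a chain of caches. The
following is a minimal sketch only. The class name, the 60 second
cache lifetime and the per-address cache key are assumptions made for
illustration and are not taken from Ivip. It also omits how cached
entries are refreshed when mapping changes.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# An upstream is anything which answers "which ETR for this SPI address?".
Upstream = Callable[[str], Optional[str]]


class CachingQueryServer:
    """Hypothetical stand-in for a QSC or QSR: answer from cache, else ask upstream."""

    def __init__(self, upstream: Upstream, lifetime_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._upstream = upstream
        self._lifetime_s = lifetime_s   # assumed value, not specified here
        self._clock = clock
        self._cache: Dict[str, Tuple[Optional[str], float]] = {}

    def query(self, spi_addr: str) -> Optional[str]:
        now = self._clock()
        hit = self._cache.get(spi_addr)
        if hit is not None and hit[1] > now:
            return hit[0]               # unexpired cached answer
        etr = self._upstream(spi_addr)  # pass the query one level up
        self._cache[spi_addr] = (etr, now + self._lifetime_s)
        return etr


# Stand-in for the authoritative source a QSR would query.
authoritative = {"203.0.113.12": "192.0.2.1"}

qsr = CachingQueryServer(upstream=authoritative.get)
qsc = CachingQueryServer(upstream=qsr.query)   # one level of QSC below the QSR

# An ITR would call qsc.query(); the second call is answered from the QSC cache.
print(qsc.query("203.0.113.12"))
print(qsc.query("203.0.113.12"))
```

Each level has the same query interface, so an ITR does not need to
know whether it is talking to a QSC or directly to a QSR.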
QSRs are caching query servers which query multiple, distributed,
authoritative query servers (QSAs) which are typically "nearby", such
as within a few thousand km. QSAs are located at a number of widely
dispersed sites, such as 5 to 50, where DITRs are located and run by,
or for, these MABOCs. Each QSA is authoritative for only a subset of
all MABs - the set supported by that DITR site.

Each QSR uses a DNS-based mechanism and an additional protocol to
discover two or more typically "nearby" QSAs for each MAB. Since
each QSA handles mapping requests for multiple MABs, this means the
number of such QSAs each QSR needs to discover is much less than the
number of MABs. The number of MABs is much less than the number of
end-user networks using SPI space - and the number of micronets is
greater than this number, since each end-user network may have many
micronets.

End-user networks or their appointees generate real-time mapping
changes using facilities provided by the MABOC which manages the MAB
the micronet is located within. Most mapping changes will be to
change the ETR address of an existing micronet. Other mapping
changes will redefine how an end-user network's SPI space is divided
into separate micronets. MABOCs will typically charge their
customers for each mapping change.

These mapping changes are transmitted in real-time from the MABOC to
the organisation which runs the DITR-sites with DITRs which advertise
this MAB. The mapping changes are received and incorporated into a
real-time updated full mapping database for this MAB, in one or more
QSAs at each site. One or more of these QSAs handle mapping queries
from the DITRs at the site and one or more handle mapping queries
from QSRs in typically nearby ISP and end-user networks. Any QSR can
send queries to any QSA, but would normally choose nearby ones.
QSAs can give feedback in mapping replies concerning how busy they
are, with suggestions of other QSAs to use instead. So there is
natural load-sharing with multiple QSAs being spread around the world
and dynamic load-balancing between them according to actual loads.

Since no one QSA or DITR-site is required to handle the full set of
MABs, since each DITR-site organization controls its own real-time
push of mapping to its sites, and since there can be any number of
DITR-sites and any number of DITR-site operating companies, there are
no obvious scaling limits on the number of micronets the entire
system can handle, or the frequency of mapping updates to those
micronets. If a given global set of DITR-sites hits some kind of
scaling limit in these respects, then the total load can be handled
by more such systems of DITR-sites.

QSRs too can be installed in larger numbers in a busy ISP or end-user
network if the query demand exceeds the capacity of one such server.
Each QSR can automatically discover very large numbers (tens to
hundreds of thousands) of QSAs, and each QSA will typically handle
dozens to hundreds or perhaps thousands of MABs.

While there is no assurance of nearby QSAs, MABOCs will generally
want to have numerous widely dispersed DITR sites, each with QSAs,
for two reasons. Firstly, to ensure the DITRs tunnel packets without
being too far from the path between the sending host (in a network
without ITRs) and the ETR. Secondly, to encourage ISPs and larger
end-user networks to install ITRs and use the QSAs - since this will
result in shorter paths for packets and less load on the DITRs.

ISPs and end-user networks do not absolutely need to install ITRs.
However ISPs will be motivated to install them (and therefore install
several QSRs for them to send mapping queries to) for two reasons.
Firstly, to ensure the SPI-addressed packets sent by the ISP's
customers are tunneled reliably, rather than relying on DITRs.
Secondly, when their customers send SPI-addressed packets to
SPI-using end-user networks which are also customers of the ISP, if
the ISP has its own ITRs, then these packets do not leave the ISP's
network. Without ITRs, they would leave the network via an expensive
upstream link, be tunneled by a DITR and return via the same or a
different upstream link.

Since end-user networks can run their own ETRs on existing PA address
space they get from their ISP, the only thing an ISP needs to do in
order to allow such a network to use SPI space is to accept outgoing
packets for forwarding when they have SPI source addresses. All
other initiatives and investments - including the provision of
multiple widely dispersed DITRs, QSAs and the real-time push of
mapping changes to these - are undertaken by the MABOCs who profit by
renting SPI space to their end-user customers. A MABOC need not be
an ISP.

Ivip includes two extensions for ITR-to-ETR tunneling which avoid
encapsulation, and the Path MTU Discovery problems which result from
it - one for IPv4 and the other for IPv6. Both involve modifying the
IP header and require most DFZ routers to be upgraded.

Ivip is a good basis for the TTR (Translating Tunnel Router) approach
to mobility, in which mobile hosts retain an SPI micronet of one or
more IPv4 addresses (or IPv6 /64s) no matter what addresses or access
network they are using, including behind NAT and on SPI addresses.
TTR mobility for both IPv4 and IPv6 involves generally optimal paths,
works with unmodified correspondent hosts and supports all
application protocols. TTR Mobility is described in [TTR Mobility].

3. The routing scaling problem and other goals for an architectural
   enhancement

For a fuller account of my understanding of the routing scaling
problem, and other problems which should be considered when devising
an architectural enhancement to the Internet, please see
http://www.ietf.org/mail-archive/web/rrg/current/msg06099.html
("Scalable routing problem & architectural enhancements" 2010-02-23)
and http://www.ietf.org/mail-archive/web/rrg/current/msg06162.html
("Recommendation suggestion from RW" 2010-03-04).

The most visible aspect of the routing scaling problem can be
summarised as there being practical problems and unfair cost-burdens
due to the growth in the number of PI prefixes end-user networks
advertise in the DFZ. Advertising PI prefixes is currently the only
method of providing portability, multihoming and inbound traffic
engineering (TE) for end-user networks. The same problem exists in
principle for IPv4 and IPv6, but only IPv4 has a problem at present.

The less visible part of it is the large number of end-user networks
which are unable to gain these benefits due to the costs and other
barriers to obtaining their own address space and advertising it in
the DFZ. Part of the reason for these costs and barriers is the
push-back against this practice, due to concerns about the burden
each PI prefix places on the DFZ control plane. Another part is the
cost and other difficulties of obtaining the minimum amount of space
which can be advertised in the DFZ - currently 256 IPv4 addresses as
a /24 prefix.

The burden placed on the interdomain routing system (often referred
to loosely as the Default-Free Zone - DFZ) by the prefixes advertised
by ISPs is generally thought not to be a problem.
So the challenge is to find a way of providing address space and new
methods of routing so that the portability, multihoming and TE needs
of potentially millions or billions of end-user networks can be
served in a "scalable" manner: efficiently, robustly and without
unfair burdens falling on anyone, such as those who operate the DFZ
routers.

The unfair, unsustainable burden is caused by the number of
separately advertised PI prefixes of end-user networks today - and
the rate at which these prefixes have their point of advertisement
changed. (Also, if an end-user network changes the type of
advertisement frequently, such as with more or fewer ASNs, this too
is a burden.) Please see
http://www.ietf.org/mail-archive/web/rrg/current/msg06163.html
("Geoff Huston's BGP/DFZ research" 2010-03-05) for up-to-date
analysis of trends in the number of prefixes and in the problems
caused by changes to those prefixes.

The most important part of the burden is on the DFZ's "BGP control
plane". This is partly the inter-router BGP traffic and the overall
behaviour of routers - particularly any difficulty which the
excessive number of prefixes causes in the system converging to good
enough best-paths in the event of an outage. It is also the burden
of CPU effort and storage in the RIB of each router. This includes
the effort of writing changes to the FIB when RIB information
changes. Also, FIBs may have their ability to handle packets
temporarily disabled while new information is written.

The actual number of prefixes each DFZ router has to handle is a
major part of the problem, though the total RIB burden also depends
on how many neighbours each router has. The number of prefixes in
the FIB is a serious burden too, but it is widely believed that this
is not the most important problem.
Any solution which only helps reduce the number of prefixes the FIB
must handle is not really a solution to the problem.

The number of prefixes advertised in the DFZ is the most obvious and
directly costly part of the routing scaling problem - analogous to
the tip of an iceberg. The larger, harder-to-measure part of the
problem is the unknown number of end-user networks which want or need
portability, multihoming and/or inbound TE but which cannot obtain it
at present, due to the costs and other barriers to gaining address
space and advertising it as PI prefixes.

In order to provide portability etc. to millions or perhaps billions
of end-user networks in a scalable manner, it follows that the DFZ
routers must not have to consider the prefixes of each individual
network in their RIB or FIB. Consequently, the Core-Edge Separation
class of scalable routing architectures works by providing a special
subset of the global unicast address space, which is suitable and
attractive for providing end-user networks with portability,
multihoming and TE, but which places only a very slight burden on the
DFZ compared to the burden each PI prefix places today. (Core-Edge
Elimination architectures have a different approach, which is
discussed below in "Architectural Choices - Core-Edge Separation
rather than Elimination".)

Support for mobility has not generally been considered part of the
routing scaling problem. However, mobility is prominently mentioned
in the RRG Charter. With the proliferation of cellphones, VoIP and
other IP applications it is reasonable to assume that in the future -
such as by 2020 - most hosts will be mobile devices, generally
running on limited battery power and relying on wireless links which
are frequently slow, unreliable and/or expensive.

Mobility is arguably an extreme form of portability and/or
multihoming.
To embark on a major architectural enhancement for scalable routing,
in a manner which did not support billions of mobile devices, would
make little sense. While provision of mobility is frequently assumed
not to be related to interdomain routing, it is prominent in the
RRG's Charter. The TTR (Translating Tunnel Router) Mobility
architecture [TTR Mobility] is a new approach to global mobility, for
both IPv4 and IPv6 - and is an extension of a CES architecture such
as Ivip.

In the TTR Mobility architecture, each mobile device is generally
considered to be a separate end-user network. An entire
corporation's network, or that of a large university, is also an
"end-user network". So in the following discussion, this term could
mean a wide variety of things - far beyond the small subset of
end-user networks which are currently able to gain and advertise PI
space.

4. Summary of Ivip's architectural choices

Ivip is based on some unique architectural choices, including: ITRs
(Ingress Tunnel Routers) receiving mapping changes in real-time;
typically "nearby" QSA authoritative query servers which are
"full-database" for at least one MAB, but typically a significant
fraction of all MABs; migration to Modified Header Forwarding (MHF)
to avoid encapsulation and its PMTUD (Path MTU Discovery)
difficulties; and (when encapsulation is used) the use of the sending
host's address as the outer header's source address, so that ETRs can
easily enforce ISP BR (Border Router) source address filtering on
decapsulated packets.

The following description assumes that Ivip will be introduced with
encapsulation, with long-term migration to MHF. However, it is
possible that by the time the introduction date is set, most DFZ
routers will have firmware-based FIBs, and so could be easily
upgraded to support MHF.
In that case, ITRs and ETRs could be much simpler, since they would
not need to handle encapsulation or PMTUD management.

Below, Ivip is generally assumed to be introduced as a single system
for the purposes of solving the routing scaling problem. However,
multiple independent systems along the lines of Ivip (with
encapsulation) could also be introduced without need for
standardisation for the purpose of supporting commercial TTR Mobility
services.

The adoption of an architectural enhancement to improve routing
scalability is frequently assumed to depend largely or entirely on
ISPs making the initial investment. However, with DRTM, this need
not be the case.

DRTM enables SPI space to be leased to end-user networks - with full
support for portability, multihoming and inbound TE for all their
communications - with the investment and initiative being taken by
organisations which may not be ISPs. These are the MABOCs - MAB
Operating Companies - who lease the space in each MAB they control to
typically thousands to hundreds of thousands of separate end-user
networks. An SPI-adopting end-user network can run its own ETR on
the existing PA space it obtains from each of its one or more ISPs.
ISPs need make no investment to allow this to proceed - but they must
forward the outgoing packets from these SPI-adopting networks which
have SPI source addresses.

DRTM removes the need for the Replicators, full-database query
servers in ISP networks and "Missing Payload Servers" which are
described in the more recent ID: Ivip Fast Payload Replication
[I-D.whittle-ivip-fpr].
However, within DRTM, there remains an 524 option to have the caching QSR (Resolving Query Servers) be full 525 database for one, multiple or perhaps all MABs, and to use a small 526 (such as between several nearby ISPs) Replicator system as part of 527 fanning out mapping updates from DITR-sites to these "full-database" 528 Map Resolvers. DRTM does not currently specify how the organisations 529 which run DITR sites reliably and securely deliver the real-time 530 mapping to each such site. This is an internal matter for these 531 organisations and the potentially multiple MABOCs they receive this 532 mapping information from. It is possible that Replicators could be 533 part of these arrangements too. 535 With TTR mobility, the MN (Mobile Node) can be in any access network 536 at all, including behind one or more layers or NAT and including 537 being on SPI space in an end-user network which has adopted SPI 538 space. In all cases, the MN needs no support from the network it is 539 currently connected to, since the MN establishes a two-way tunnel to 540 the TTR and sends its SPI source address outgoing packets to the TTR 541 for forwarding. So TTR mobility is a scalable routing solution which 542 requires no investment or support from ISPs, and in which the 543 initiative and investment comes from TTR Mobility companies, which 544 need not be ISPs. 546 5. Goals 548 5.1. IPv4 and IPv6 550 Ivip is intended to solve the routing scaling problem (as described 551 in the introduction), for IPv4 and IPv6, for very large numbers of 552 end-user networks - where this includes a single MN (Mobile Node) 553 within the definition of "end-user network". 555 Much of Ivip is identical in principle for both Internets. However 556 the mapping information for IPv6 is lengthier and there are other 557 differences, such as in Path MTU Discovery (PMTUD) when encapsulation 558 is used, and in the IPv4 and IPv6 approaches to MHF which remove the 559 need for encapsulation. 561 5.2. 
Portability, multihoming and TE for billions of end-user networks 563 Ivip is intended to provide scalable address space for billions of 564 end-user networks - for both IPv4 and IPv6. The new kind of address 565 space - SPI (Scalable Provider Independent) space - is suitable for 566 end-user networks to use in a portable fashion, meaning they can keep 567 this space when choosing another ISP for Internet connectivity. 569 There is an assumed upper bound of order 10^7 on the number of non- 570 mobile end-user networks. This is on the basis of a population of 571 10^10 and there being typically no more than one organization per 572 10^3 people which needs portability, multihoming and/or inbound TE 573 enough to invest in a second ISP service and whatever else is 574 required to achieve these goals. Brian Carpenter suggested the same 575 thing (http://www.ietf.org/mail-archive/web/rrg/current/msg05801.html 576 2010-01-27). 578 Given the growing ubiquity of cell-phones and the desire to give them 579 IP connectivity with mobility, including session survival when 580 changing access networks, it is reasonable to assume an upper bound 581 of order 10^10 on the number of "mobile end-user networks". This 582 order 10^10 upper bound has been discussed in the RRG and no-one 583 has suggested a routing scaling solution with mobility should aim for 584 any greater number of end-user networks. 586 In Ivip's mapping system and in its ITRs, no distinction is made 587 between end-user networks which are mobile or non-mobile, so the 588 total number 10^10 is the upper bound on the number of micronets for the 589 Ivip system to handle. In IPv4, since the smallest micronet is a 590 single IPv4 address and there are only 3.7 billion global unicast 591 addresses in total, from which the "edge" SPI addresses can be drawn, 592 it follows that for Ivip in IPv4, there can be no more than probably 593 3 x 10^9 micronets.
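The orders of magnitude used in this section can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the constant and function names are hypothetical and not part of Ivip, and the figures are the ones assumed in the text above (a population of 10^10, at most one multihoming organization per 10^3 people, 3.7 billion IPv4 global unicast addresses and a ceiling of 3 x 10^9 IPv4 micronets).

```python
# Figures taken from the text above; all names are hypothetical.
POPULATION = 10**10                   # assumed upper bound on population
PEOPLE_PER_ORG = 10**3                # at most one multihoming organization per 1000 people
MOBILE_BOUND = 10**10                 # upper bound on mobile end-user networks
IPV4_GLOBAL_UNICAST = 3_700_000_000   # approximate IPv4 global unicast addresses
IPV4_MICRONET_BOUND = 3 * 10**9       # stated ceiling on IPv4 micronets


def non_mobile_bound(population: int = POPULATION,
                     people_per_org: int = PEOPLE_PER_ORG) -> int:
    """Upper bound on non-mobile end-user networks needing SPI space."""
    return population // people_per_org


def ipv4_edge_fraction() -> float:
    """Share of IPv4 global unicast space used if every one of the
    3 x 10^9 micronets were a single address."""
    return IPV4_MICRONET_BOUND / IPV4_GLOBAL_UNICAST


if __name__ == "__main__":
    print("non-mobile bound:", non_mobile_bound())
    print("mobile / non-mobile ratio:", MOBILE_BOUND // non_mobile_bound())
    print("IPv4 edge fraction: %.2f" % ipv4_edge_fraction())
```

The non-mobile bound is one thousandth of the mobile bound, which is why the combined total is still taken to be of order 10^10. For IPv4, 3 x 10^9 single-address micronets would correspond to roughly four fifths of the global unicast space.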
595 Portability of the end-user network address space which is used to 596 identify hosts, routers and networks is an absolute requirement of 597 scalable routing. Even if a network could reliably and inexpensively 598 renumber all its hosts and routers, and change all its configuration 599 files which contained such addresses, it would never be able to 600 reliably and securely alter all the other places where these 601 identifying addresses reside in other networks. These include the 602 use of these addresses in referrals, existing communication sessions, 603 config files of VPNs and hard-coded (however questionably) into 604 firmware and software. Another example of the need for portability 605 is end-user networks which host services for other organisations - 606 typically their customers - in a way that the IP addresses of the 607 network's hosts appear in the DNS zone files of these other 608 organizations. For such a network to have to renumber, such 609 as to use PA space from another ISP, would require costly, error- 610 prone and carefully timed updates to zone files of all these other 611 organizations. 613 Assuming the end-user network has two or more ISPs, SPI space will 614 also support multihoming and inbound traffic engineering. In the 615 following, "TE" refers to "inbound traffic engineering" - the ability 616 to steer incoming traffic streams between two or more ISPs. 617 (Outbound TE is simply a matter of sending outgoing packets out 618 whichever ISP link is desired.) Ivip's approach to TE differs from 619 that of other CES architectures. It is potentially finer-grained, 620 more flexible and more able to respond to rapid changes in traffic 621 patterns. 623 The goal of scalable routing is to scalably provide portability, 624 multihoming and TE to all networks which want or need it.
However, 625 it is reasonable to assume that most home and SOHO networks, and some 626 smaller factory and office networks, will remain happy with the 627 reliability of their single-provider service, and will not be concerned 628 about portability when choosing another ISP. 630 A small number of end-user networks will have multiple sites or some 631 other reason to split their SPI space into multiple micronets, but in 632 any realistic scenario involving billions of such networks, the great 633 majority of such networks will be a single site or device, with 634 little or no need for TE or greater address space than a single IPv4 635 address or an IPv6 /64. Therefore, it is reasonable to expect that 636 most of these billions of networks will require only a single 637 micronet of SPI addresses. So, for these scenarios of billions of 638 end-user networks, the total number of separately mapped micronets of 639 SPI address space will be only marginally greater than the number of 640 end-user networks. 642 5.3. Modular separation of the control of mapping from the CES 643 architecture itself 645 Ivip's real-time mapping system means that the tunneling behaviour of 646 all ITRs can be controlled directly. The mapping consists of a 647 single ETR address, so Ivip ITRs do not need to make any choices 648 between multiple ETRs for the purposes of multihoming service 649 restoration or TE. The non-Ivip CES architectures do not provide 650 real-time mapping to ITRs, and therefore need to have the ITRs 651 perform their own multihoming reachability testing and decision- 652 making, to choose which of several ETRs to tunnel packets to. 654 Control of the tunneling behaviour of Ivip ITRs rests entirely 655 outside the Ivip system. It is the responsibility of end-user 656 networks to control this mapping at all times - and many end-user 657 networks are likely to delegate this responsibility to a company they 658 hire for this purpose.
Exactly how end-user networks make their 659 decisions about mapping - and how, for instance, a Multihoming 660 Monitoring (MM) company might detect ETR failure, and alter mapping 661 accordingly - is entirely separate from Ivip's mapping system, ITRs 662 and ETRs. 664 This appointment of another organization to control the mapping of 665 one or more of an end-user network's micronets would involve a 666 private, flexible, arrangement between an end-user network and the MM 667 company it hires to continually probe the network's reachability via 668 its two or more ETRs. This means the frequency and type of probing, 669 and the decision-making algorithms, can be completely open-ended and 670 subject to development and customisation - without any constraints or 671 need for changes in the RFCs which define Ivip. With TTR Mobility, 672 the mapping of the micronet which the MN uses would be controlled by 673 the TTR Company, rather than the end-user or the MN itself. 675 This modular separation of the detection and decision-making 676 functions from the CES architecture is good engineering practice and 677 ensures that the Ivip subsystem can be used flexibly, including for 678 purposes not yet anticipated. 680 Other CES techniques monolithically integrate the following functions 681 into the core-edge separation architecture itself - primarily by 682 specifying exactly how all ITRs must behave regarding: reachability 683 testing to ETRs, or of networks through ETRs, or with ETRs reporting 684 reachability of end-user networks to ITRs by some means; multihoming 685 failure detection based on these; decisions about how to choose 686 between ETRs to restore service; and how to implement TE. This would 687 add greatly to the complexity of the system itself, make it harder to 688 introduce new methods of testing reachability etc. and restrict all 689 end-user networks to relying on the necessarily restricted set of 690 functions which can reasonably be built into all ITRs. 
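Because the detection and decision-making functions described in this section sit outside Ivip, their form is left open. The following Python sketch shows one way a Multihoming Monitoring (MM) company might structure a single monitoring pass. It is an assumption-laden illustration, not anything specified by Ivip: the function names, the preference-ordered ETR list, and the `is_reachable` and `send_update` callbacks are all hypothetical, and the addresses are documentation addresses. The only property taken from the text is that the mapping for a micronet is a single ETR address.

```python
from typing import Callable, Dict, Optional, Sequence


def choose_etr(preferred: Sequence[str],
               is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable ETR in preference order, or None."""
    for etr in preferred:
        if is_reachable(etr):
            return etr
    return None


def monitor_once(micronet: str,
                 mapping: Dict[str, str],
                 preferred: Sequence[str],
                 is_reachable: Callable[[str], bool],
                 send_update: Callable[[str, str], None]) -> bool:
    """Probe the ETRs once and issue a mapping update only if the
    chosen ETR differs from the currently mapped one.

    Returns True if an update was sent.
    """
    best = choose_etr(preferred, is_reachable)
    if best is None or best == mapping.get(micronet):
        return False              # nothing reachable, or no change needed
    send_update(micronet, best)   # hypothetical hand-off to the mapping system
    mapping[micronet] = best
    return True


if __name__ == "__main__":
    mapping = {"203.0.113.8": "192.0.2.1"}   # micronet -> single ETR address
    etrs = ["192.0.2.1", "198.51.100.1"]     # ETRs via two ISPs, in preference order
    failed = {"192.0.2.1"}                   # simulate failure of the first ETR
    monitor_once("203.0.113.8", mapping, etrs,
                 lambda etr: etr not in failed,
                 lambda net, etr: print("map", net, "->", etr))
```

Since each mapping change may carry a cost, the sketch sends an update only when the selected ETR actually changes. How often to probe, and what counts as reachable, remain private choices between the end-user network and the MM company.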
692 5.4. Simple ITRs and ETRs with little or no communication between them 694 With encapsulation, the only time ITRs engage in two-way 695 communication is when probing the Path MTU to the ETR, by using a 696 special pair of packets which carry a larger traffic packet than has 697 previously been successfully received by the ETR from this ITR. The 698 ETR then responds to the ITR and the ITR acknowledges this. 700 Apart from this, ITRs do not communicate with anything but their 701 local query servers - directly with their local QSRs (Resolving Query 702 Servers) or indirectly with these, via one or more levels of QSC 703 caching query servers. ETRs do not communicate with any part of the 704 Ivip system except for ITRs, and then only for this PMTUD management 705 function. 707 If MHF is used rather than encapsulation, there is no need for ITRs 708 to communicate with ETRs - so ITRs only communicate with QSCs and 709 QSRs - and ETRs do not communicate at all. 711 Consequently ETRs and ITRs can be simple functions in existing 712 routers or in standalone servers. The ITR function can also be 713 implemented in the sending host (ITFH), though this is not advisable 714 if the sending host is on a slow, unreliable, link such as a wireless 715 link. ETRs must be on conventional global unicast addresses ("core" 716 addresses) - not on SPI ("edge") addresses. ITRs can be on both 717 kinds of address. Ivip may in the future include an option for an 718 ITR or ITFH to set up a two-way persistent tunnel to its one or more 719 local query servers, which would allow an ITR function to be behind 720 one or more layers of NAT. This "tunnel" could be as simple as TCP 721 or SCTP from the ITR, or ITFH, to each query server, with keepalive 722 packets. 724 It is important to make ITRs as simple as possible, in order that 725 they may be inexpensive and therefore, if desired, more numerous - so 726 as to reduce the load on each one. 
ETRs are simpler than ITRs, since 727 they simply decapsulate packets with a comparison between outer and 728 inner source addresses and do not look up or cache mapping 729 information. 731 Ivip with encapsulation uses simple IP-in-IP encapsulation. There is 732 no special header and no other data piggybacked onto traffic packets. 733 This minimises encapsulation overhead and reduces the complexity of 734 both ITRs and ETRs. Other CES architectures use their own headers to 735 carry extra information with each traffic packet, with that header 736 behind a UDP header. These other architectures also require ITRs to 737 determine reachability to multiple ETRs. 739 5.5. Maximise the flexibility with which ITRs and ETRs can be located 741 Ivip ITRs can be located in the sending host, in the sending-host's 742 end-user network (which may be an ISP network or an end-user network 743 using either SPI or conventional PI space) or in the ISP network 744 which the host's end-user network connects to the Net through. If 745 there is no such ITR, the packet will enter the DFZ and be forwarded 746 to the nearest (in BGP terms) DITR (Default ITR in the DFZ, 747 previously known as OITRD for Open ITR in the DFZ). 749 ETRs can be located in ISP networks with a link to each end-user 750 network they serve. ETRs can also be located at the end-user network 751 end of a link from an ISP, and so be physically located at the end- 752 user site. In both cases, their address must be a conventional "core" 753 global unicast address (usually from one of the ISP's prefixes) - not 754 an SPI ("edge") address or behind NAT. 756 5.6. Mobility 758 One of Ivip's goals is to support mass adoption of IP mobility, since 759 this will surely be a major facet of the future of Internet 760 communications.
It would make no sense to introduce one set of 761 architectural changes to solve the routing scaling problem as it 762 appears today, and then have to devise and introduce a second set to 763 provide for billions of mobile devices. 765 Ivip is a good basis for the TTR approach to mobility, and would be 766 attractive to deploy for this reason alone. 768 It is frequently assumed that in order for a CES architecture to 769 support mobility, the Mobile Node (MN) must be its own ETR. LISP-MN 770 makes this assumption. So does draft-jen-mapping-00 771 [I-D.jen-mapping] - a critique of which is [Critique of 772 draft-jen-mapping-00]. 774 TTR mobility does not involve mapping changes every time the MN gains 775 a new physical address, since it continues to use the same one or 776 more TTRs as its one or more ETRs. Mapping changes are needed when 777 the MN uses a new TTR. This is desirable after the MN moves a large 778 distance, such as 1000km or more, but it is not absolutely needed. An 779 MN can still work with a TTR which is on the other side of the world 780 - albeit with longer latency and greater chance of packet loss. 782 Although the TTR approach to mobility could be used with other CES 783 architectures, Ivip is a better basis for TTR mobility than other CES 784 architectures such as LISP. None of these other proposals provide a 785 method of ITRs gaining updated mapping within a few seconds, as Ivip 786 does. With Ivip's real-time mapping system, the Mobile Node (MN) can 787 begin using a new, nearby, TTR within seconds and, most importantly, 788 within a few seconds no ITR will be tunneling packets to the 789 previous, and now more distant, TTR. Therefore the MN can promptly 790 end the tunnel to the previous TTR and use the new TTR exclusively. 791 Without this real-time mapping, the MN would need to retain tunnels 792 to one or more previous TTRs for as long as the mapping system takes 793 to ensure no ITRs are tunneling packets to them.
This might take 10 794 to 30 minutes or more for the non-Ivip CES architectures. 796 TTR Mobility is not required to solve today's routing scaling 797 problem. It may be regarded as separate from Ivip, because it could be 798 used with other CES architectures. However, it is best to consider 799 TTR Mobility as a natural extension of the basic Ivip architecture, 800 which does not place any constraints on the basic architecture other 801 than that its mapping system will need to scale to billions of 802 (mostly mobile, handheld device) end-user networks. 804 5.7. Elimination of encapsulation and PMTUD problems 806 When ITRs use encapsulation to tunnel traffic packets to ETRs, there 807 are serious problems with Path MTU Discovery (PMTUD) for the sending 808 host. If the packet with its encapsulation header is too long for 809 the next hop link of a router between the ITR and ETR, then there 810 needs to be a mechanism by which the sending host receives a valid 811 ICMP Packet Too Big message, with an MTU value which will result in 812 an encapsulated packet of the correct length. The PTB generated by 813 the router in the tunnel path will not be suitable for the sending 814 host. 816 It is challenging to solve this problem securely and without 817 unreasonable amounts of state in the ITR. Ivip's solution - ITR 818 Probes Path MTU [PMTUD-Frag] - involves extra complexity and state in 819 ITRs and to a lesser extent in ETRs. This, and the transmission 820 overhead of the encapsulation header (particularly heavy with IPv6 821 VoIP packets), make it desirable to either avoid encapsulation 822 entirely, or to introduce Ivip with encapsulation, but in the long- 823 term change to an alternative system which lacks these problems. 825 Ivip has two techniques, known collectively as Modified Header 826 Forwarding (MHF), which replace encapsulation as the ITR to ETR 827 tunneling technique. They are: 829 1. ETR Address Forwarding (EAF) - for IPv4.
830 [I-D.whittle-ivip-etr-addr-forw] 832 2. Prefix Label Forwarding (PLF) - for IPv6. [PLF for IPv6]. 834 If Ivip is introduced with encapsulation, all ITRs and ETRs should be 835 capable of supporting MHF. At some date in the future, the DFZ 836 routers will be upgraded to support this, probably without any 837 significant cost. 839 Ideally, it would be possible to establish Ivip from the outset 840 without encapsulation. This would save having to develop the more 841 complex ITR and ETR functions required by encapsulation - especially 842 the PMTUD functionality. It would also eliminate the need to design 843 a transition arrangement. 845 I have not been able to reliably determine what proportion of current 846 DFZ routers have firmware-based FIBs. Any such router could be 847 upgraded with a firmware update in order to support MHF. As the 848 years pass, there is an increasing probability that most or 849 essentially all DFZ routers could be upgraded in this way, for very 850 little cost. Initial deployment with MHF is a goal, with the 851 alternative goal being eventual transition to MHF. 853 For any near-term introduction of Ivip, such as to introduce TTR 854 mobility services or simply to provide SPI space to non-mobile end- 855 user networks, the organizations initiating these services will be 856 unable to have all or perhaps any DFZ routers upgraded in time to 857 start their services. Since these services, especially TTR mobility 858 services, appear to be commercially attractive in the near-term, the 859 most likely outcome is that Ivip will be introduced with 860 encapsulation. If so, it is vital that all ITR and ETR software be 861 updatable so that a future transition to MHF can be performed 862 reliably and completely. 864 MHF involves some restrictions on the location of ETRs. For IPv4, 865 only 30 bits are available for specifying the ETR address.
However, 866 an alternative which I have not yet fully explored is to define a new 867 protocol type with its own header to replace the IPv4 header. In the 868 new header at least 31 bits could be found - and probably 32. If 32 869 could be found, then the following paragraph would become 870 irrelevant. 872 This 30 bit MHF ETR address forwarding arrangement is incompatible 873 with the initially desirable arrangement where any "core" address can 874 be used for an ETR. There is further work to do on this problem - 875 but the solution is probably to avoid it with a new header format as 876 noted above. If a large number of end-user networks established 877 their ETRs on a variety of addresses, such as the IP addresses of 878 their existing single PA address services, then it may not be 879 possible to have them alter these addresses in time for the 880 transition to MHF. For instance, in an extreme case, four separate 881 end-user networks may run four separate ETRs on four contiguous 882 addresses 11.22.33.16, 11.22.33.17, 11.22.33.18 and 11.22.33.19. Yet 883 the current 30 bit IPv4 MHF technique can only tunnel packets to 884 addresses specified with 30 bit precision - which covers all four 885 addresses. A workaround would be for a router at this ISP to perform 886 a second lookup on the destination address of these tunnelled packets 887 and to forward them to the correct service directly. 889 5.8. No requirement for new host functionality 891 It is a primary goal not to require any new host functionality - in 892 stacks or applications. However, as an option, the ITR function can 893 be integrated into sending hosts when this is desired. 895 Mobile hosts using the TTR Mobility approach will have a little extra 896 functionality, which could be implemented in the stack or perhaps 897 outside it, as a separate piece of software.
The IP stack itself and 898 all applications remain unchanged and communicate with all other 899 hosts, mobile or not, using current IPv4 and IPv6 protocols and 900 addressing. 902 One reason for avoiding the need for new host functionality is to 903 enable the system to be widely enough adopted to solve the routing 904 scaling problem, given the constraints imposed by the need for 905 voluntary adoption. [Constraints-Voluntary] 907 Another more fundamental reason is to ensure there is no extra burden 908 on hosts, which would be particularly a problem for hosts which are 909 on slow, expensive and unreliable links. This includes hosts on 3G 910 wireless links - and in the foreseeable future it is reasonable to 911 expect this to be true of the majority of hosts. 913 While many people are attracted to the idea of hosts doing more, and 914 leaving the network to be simple, there are objections to this. I 915 intend to write these up as an ID, but for now they are on a web-page 916 and in RRG discussions. [Host-Responsibilities] See also 917 http://www.ietf.org/mail-archive/web/rrg/current/msg06162.html 918 ("Recommendation suggestion from RW" 2010-03-04). 920 In summary, it is highly undesirable for a new architecture to 921 require all hosts to do more routing and addressing management than 922 they currently do: just DNS lookups. The delays which are inherent 923 in any such arrangement are highly undesirable and the way these delays 924 are worsened by one or both hosts being on high latency, unreliable, 925 wireless links is particularly objectionable. Also, it is desirable 926 not to enforce extra complexity or communication requirements on all 927 hosts, since many of them will be constrained by battery power 928 limitations. 930 5.9. Full benefits to all adopters irrespective of level of adoption 932 Ivip provides the full benefits of portability, multihoming and 933 inbound TE to all end-user networks which adopt its SPI space.
935 In order to do this, packets from hosts in networks which lack ITRs 936 must be forwarded to an ITR and tunneled to the correct ETR. 938 This is achieved by placing a number of ITRs in the DFZ. These are 939 known as DITRs (Default ITRs in the DFZ) and were previously known as 940 OITRDs (Open ITRs in the DFZ). When Ivip was first announced 941 [Ivip-2007-06-15] these were named (erroneously): "Anycast ITRs in 942 the DFZ". By placing DITRs widely around the Net, path lengths from 943 any sending host to the ETR are minimised. 945 LISP Proxy Tunnel Routers (PTRs) perform the same function. 946 [I-D.lewis-lisp-interworking] 948 For a scalable routing solution to be widely enough adopted, it must 949 provide compelling benefits to all adopters, including the earliest. 950 Without DITRs, PTRs or their equivalent, only a small fraction of 951 packets being sent to an end-user network would use the new system - 952 those sent from networks with ITRs. Yet the goal is for all adopters 953 to use the new form of addressing entirely, and so not to have to use 954 the existing unscaleable "advertise PI prefixes in the DFZ" approach 955 to portability, multihoming and TE. 957 5.10. Business incentives to deploy new infrastructure 959 Some scalable routing proposals involve no additions to the network - 960 just the adoption of new functionality in the end-user networks which 961 use it. These are generally "Core-Edge Elimination" (CEE) 962 architectures. [C-E-Sep-Elim] 964 No such proposal meets the constraints imposed by the need for 965 widespread voluntary adoption. Firstly, most or all of them involve 966 changes to host stacks and applications, which is impractical in the 967 absence of compelling motivation for the authors of this software to 968 make such major changes. Secondly, all such proposals only provide 969 portability, multihoming and TE benefits for packets sent from other 970 networks which have adopted the scheme.
Therefore, only if all 971 networks adopted it would any one network be able to abandon its 972 current routing and addressing arrangements. The benefits of 973 scalable routing in a global sense, and for each adopter, the 974 abandonment of unscaleable alternative routing and addressing 975 arrangements, are only achieved after full (or almost full) adoption 976 by all networks. Yet there is insufficient direct incentive for 977 early adopters for even a fraction of networks to adopt it. 979 A CES architecture with DITRs, PTRs or some equivalent functionality 980 provides full benefits to all adopters, and so is capable of being 981 widely enough adopted to solve the routing scaling problem. Scalable 982 routing benefits accrue in direct proportion to the number of 983 adopting networks. The problem can be substantially solved by 984 widespread adoption. Complete adoption is desirable, but not at all 985 required. 987 CES architectures do not require any changes to hosts - to stacks or 988 applications. They do however involve the creation of at least two 989 items of infrastructure which are typically global in reach, before 990 any end-user network can use the system. Before DRTM, Ivip required 991 a single coordinated global mapping distribution system, though the 992 DITR systems could be operated by or for particular MABOCs (MAB 993 Operating Companies) and need not cope with all the MABs in the Ivip 994 system. Now, with DRTM, there is no need for a single global mapping 995 distribution system. There will be multiple such systems, each 996 handling a subset of the MABs. 998 Ivip's technical structure lends itself to business models in which 999 those who construct and run these two types of infrastructure can do 1000 so on a potentially profitable basis, by charging end-user networks 1001 according to the use they make of the mapping system and of the 1002 DITRs.
The DRTM arrangements [I-D.whittle-ivip-drtm] involve MABOCs 1003 (MAB Operating Companies) or TTR mobility companies establishing and 1004 running (or contracting other organizations to establish and run) 1005 multiple DITR-sites. At each DITR-site there are DITRs and QSAs 1006 supporting the subset of MABs which are run by the one or more MABOCs 1007 the DITR-site is run for. 1009 Please see the DRTM ID for more information on how this system can 1010 develop without direct investment by ISPs, with MABOCs taking the 1011 initiative and making the investment in reaching out with DITR-sites 1012 to sending hosts in networks without ITRs, and with QSAs at those 1013 sites to help nearby ISPs run their own ITRs and QSRs. 1015 5.11. Maintenance of existing levels of security and robustness 1017 All scalable routing schemes complexify the Internet - so it is 1018 unlikely that the goals of not degrading security and robustness to 1019 any degree can be fully realized. Only once Ivip is fully designed 1020 and carefully analysed can there be a realistic estimation of the 1021 security and robustness problems it will entail. 1023 It is a goal of Ivip to minimise and ideally to eliminate any such 1024 degradation. 1026 Ivip's approach to handling the PMTUD problems inherent in 1027 encapsulation is intended to be secure against attacks - such as from 1028 spoofed ICMP Packet Too Big messages. 1030 Ivip is the only CES architecture to provide an inexpensive method of 1031 ETRs enforcing the source address filtering ISPs may impose on 1032 packets arriving at their Border Routers (BRs). Such filtering is 1033 imposed to prevent outside attackers spoofing the address of any host 1034 inside the ISP's network - and includes dropping packets with private 1035 (RFC 1918) source addresses. 
1037 This is achieved by the simple arrangement of the ITR using the 1038 sending host's address as the outer header source address in all the 1039 encapsulated packets in the tunnel to the ETR. ETRs simply compare 1040 the inner source address with the outer, and drop any decapsulated 1041 packets where the two differ. (With encapsulation, when the ITR 1042 occasionally probes the PMTU to an ETR, it sends an additional packet 1043 with the source address being that of the ITR, but this does not 1044 alter the ETR's ability to enforce BR source address filtering.) 1046 This also works well with packets tunneled from ITRs inside the ISP 1047 network. Please see the section "ETR support for ISP border router 1048 source address filtering" in "Recommendation suggestion from RW" 1049 (http://www.ietf.org/mail-archive/web/rrg/current/msg06162.html or 1050 any later version of this) for a discussion of why it appears to be 1051 impossible for LISP ETRs to enforce this BR source address filtering. 1053 This approach - of the ETR dropping inner packets whose source 1054 address does not match the source address in the outer header - is 1055 only for encapsulation. When MHF is used, there is no need for ETRs 1056 to perform any such task, since the original packet is sent across 1057 the DFZ, with the sending host's source address in the IP header - so 1058 BR filtering occurs normally and the ETR never receives a packet 1059 which violates these filtering rules. 1061 5.12. Avoiding the need for any one server to store or receive the 1062 complete mapping database 1064 With DRTM, QSAs store the complete mapping database for one or 1065 typically many MABs, and so require real-time feeds of mapping 1066 updates for those MABs. At boot time, they need to be able to 1067 download snapshots of the databases and bring that information up-to- 1068 date with the updates sent since the snapshot was made, before the 1069 database can be used to answer mapping queries. 
The same procedure 1070 would be executed if the QSA ever lost sync with the feed of mapping 1071 updates. 1073 However, there is no requirement that any one QSA handle all the 1074 MABs. There is no prohibition of this - for instance if a DITR-site 1075 handles every MAB in the Ivip system, this will be perfectly 1076 allowable. It is just that the system is intended to work with 1077 multiple sets of DITR-sites, with the DITR-sites of each set handling 1078 a subset of the MABs. To whatever extent there are scaling limits to 1079 the number of micronets a DITR-site and its one or more DITRs and 1080 QSAs can handle, this does not pose a problem for the scaling of the 1081 entire Ivip system, since the total load can be handled by multiple 1082 such DITR-sites. QSRs can handle many sets of DITR sites - so there 1083 is no obvious limit to the scaling of the entire system. 1085 Before DRTM, each ISP with ITRs had to install two or more "QSDs" 1086 (full database query servers - the term is no longer part of Ivip). 1087 These were full-database for all MABs and so required real-time feeds 1088 of all mapping updates for all MABs. This presented a scaling 1089 problem and an unfair burden on the ISP if its customers rarely or 1090 never sent packets to micronets for which a large number of updates 1091 were sent, or never sent packets to whole MABs which the QSD still 1092 had to store and receive updates for. (These statements about ISPs 1093 also apply to any end-user network with ITRs which chooses to install 1094 its own QSR, or previously QSD, rather than use those of its one or 1095 more ISPs.) 1097 With DRTM, QSDs are replaced by QSRs - caching Resolving Query 1098 Servers. So there is no need for ISPs to maintain a server which is 1099 full-database for any MAB. This greatly reduces scaling problems.
It will remain an option for a QSR to be full-database for one or
more MABs - and in principle for it to be full-database for all MABs,
in which case it would function just like the now-obsolete QSD.
However, as far as I know, there will be no need to do this - since
caching-only QSRs should scale well and cope with the largest
imaginable numbers of micronets.

5.13. Eliminating unfair burdens

Prior to DRTM, Ivip had a "non-goal" of eliminating unfair burdens.
This was because with full-database QSDs (as discussed above) it
could not be ruled out that an ISP would face expenses running its
one or more QSDs which in part depended on there being some large
number of micronets, or large number of changes to micronets, which
the ISP never gained any benefit from - because these did not affect
packets sent by its customers.

This unhappy situation is no longer a part of Ivip.

Ivip's goal is to eliminate "unfair burdens", but no scalable routing
system is likely to achieve this entirely.

An example of an "unfair burden" which remains with DRTM is that each
QSR needs to automatically discover two or more typically "nearby"
QSAs for every MAB in the Ivip system.  Yet perhaps the QSR and the
ITRs which depend upon it will never send packets to some of these
MABs.  This is unfair, but it is much less of a problem than before
DRTM, where the QSD would need to store all the micronets of such
MABs and receive all the updates to them as well.

Ivip's goal is to minimise unfair burdens and to eliminate them where
possible.  It should be able to achieve a huge improvement over the
problem which lies at the heart of today's routing scaling problem -
the unfair burden imposed on all DFZ router operators by the addition
of each PI prefix by any end-user network in the world which is able
to obtain the space and advertise it in the DFZ.

6. Non-goals

6.1. Isolation between core and edge networks is not required

At least one CES architecture - APT (which is no longer being
developed) - appeared to have a goal of completely separating (really
"isolating") core networks from edge networks.  In this scenario,
only ISPs would have core addresses and all end-user networks (or
perhaps all end-user networks which needed portability, multihoming
and TE) would have edge addresses.  Then, in theory, it would be
possible to prevent any host in an edge network from sending packets
to the core - which was supposed to provide some security benefits.

Ivip has no such goal.  For a discussion of my attempt to understand
this aspect of APT, and how this may have affected the ways in which
some people use and think about the term "Core-Edge Separation",
please see: "CES & CEE: GLI-Split; GSE, Six/One Router; 2008
sep./elim. paper (v3)"
(http://www.ietf.org/mail-archive/web/rrg/current/msg06110.html
2010-02-24, or any later version).

6.2. Full adoption not required

Ivip does not rely for its benefits (improvements to routing
scalability, or the benefits for end-user networks) on complete
adoption of SPI (edge) space by all end-user networks, or by the
subset of them which want or need portability, multihoming and TE.

Ideally, for scalability, the only prefixes advertised in the DFZ
would be those of ISPs (including those used to serve many end-user
networks with PA space) and the relatively small number of prefixes
which encompass the SPI space.  "Relatively small" is in comparison
to the very large number of micronets these prefixes contain and to
the likewise very large numbers of end-user networks which are using
this SPI space.
The full benefits for end-user networks which adopt SPI space -
portability, multihoming and TE - do not depend at all on how many
other end-user networks adopt SPI space.

The benefit of routing scalability depends on how many end-user
networks which need or want portability, multihoming and TE actually
do adopt SPI space, rather than the two undesirable alternatives of
either not getting these benefits, or getting them by the unscalable
method of advertising conventional PI prefixes in the DFZ.

In order to maximise routing scalability, the more end-user networks
which adopt SPI space, the better.  But there is no need or intention
to have them all adopt it.

A satisfactory outcome for scalable routing would be for some or many
of the end-user networks which currently advertise PI prefixes in the
DFZ to continue doing so - and for the great majority of all other
end-user networks which want or need portability, multihoming and TE
to use SPI space instead.

6.3. Mapping changes need not be free of financial cost

It appears that the designers of other CES architectures have a goal
of mapping changes being free of financial cost.  This is not a goal
of Ivip.

Ivip is the only CES architecture to contemplate or assume that
mapping changes will be paid for - by the end-user network whose
micronet of SPI space the mapping applies to.  All other proposals
avoid financial costs such as this.

In the case of the global query server systems - LISP-CONS, LISP-ALT
and TRRP - there is no need for payment, since changing the mapping
has no direct impact beyond the authoritative query server(s) in
which the mapping is changed.  (Unless there are provisions for
sending mapping changes to particular ITRs which might need them,
which may be a part of LISP.)
Ivip's arrangement for charging end-users for each mapping change,
and for each change to the way their SPI space is divided into
micronets, is intended to achieve two outcomes.

Firstly, the payment - which goes to the MABOC - helps the MABOC
cover its costs of maintaining multiple DITR-sites, each with its
QSA authoritative query servers.  Each such change involves data
transmission to these sites and may involve QSAs sending Cache Update
commands to queriers (QSRs) to which mapping for the micronet has
"recently" been sent in a map reply message, or in a Cache Update
message.  This is fully described in the DRTM ID.

It is also conceivable that if there are really large numbers of
updates, ISPs and other networks with ITRs and QSRs may object to
handling all these Cache Updates without some payment by the MABOC
from whose QSAs they are sent.  So MABOCs may need to use some of
these fees to encourage ISPs to accept these Cache Updates.  Such
frequent updates are most likely to arise from end-user networks
doing short-timescale inbound TE changes - and they will do this as
long as the cost of the mapping changes is lower than the benefit
they derive from the inbound TE, which may be substantial.

Secondly, this fee per mapping change discourages end-user networks
from making many mapping changes unless they have a suitably strong
reason to do so.  This will lighten the load on the MABOC's systems,
including especially the DITRs and QSAs it either runs, or pays
another organization to run.

The cost of changes should be low enough to be a trivial issue in the
rare events of multihoming service restoration and portability to
another ISP.
The cost should also be low enough to make reasonably frequent
changes for TE attractive, when it allows significantly better
utilization of multiple links to ISPs.  It should also be low enough
to present no problems for TTR Mobility, in which the mapping changes
whenever the MN moves more than about 1000km.

6.4. No attempt to cope with partially reachable ETRs

Ivip's use of a single ETR address in the mapping is different from
the use of multiple ETR addresses in the mapping information of all
other CES architectures.  This gives rise to a potential benefit of
those other schemes which is not a goal of Ivip.

Ivip ITRs all over the Net tunnel packets which are addressed to any
particular micronet to a single ETR at any one time.  (This is
ignoring perhaps a second or less when the mapping is changed, and
some ITRs receive the Cache Update message from their QSC or QSR
query server earlier than others.)  It is up to the multihoming
end-user network to ensure that the mapping changes in a manner which
maximises the connectivity of its network during a multihoming
service restoration event.

For instance, an end-user network has two ISPs, ISP-A and ISP-B, and
can map its one or more micronets to either ETR-A or ETR-B.  Whether
the ETRs are in the ISP or at the end-user site is not important.
ETR-A's connection to the rest of the Net is via ISP-A and ETR-B's is
via ISP-B.  In this example, only one micronet is considered, but the
same principles apply with multiple micronets.

When both ISPs and ETRs are working well - that is to say when the
end-user network is reachable via both ETRs - the end-user network
may have the mapping set to ETR-A.
If an external monitoring company (contracted by the end-user
network) detects that the end-user network is no longer reachable via
ETR-A, then it will issue a mapping change so that the micronet is
mapped to ETR-B instead.  As long as ETR-B is connected to the
end-user network and is reachable from any router in the DFZ, then
this is a perfectly good outcome: full connectivity is restored
within a few seconds of the mapping change being issued.

However, if ETR-B is unreachable from some subset of the DFZ routers
(and therefore from a subset of sending hosts in end-user and ISP
networks) AND this subset of DFZ routers can reach the end-user
network via ETR-A, then Ivip cannot ensure complete connectivity,
since the end-user network is not reachable from all hosts in all
networks through just one ETR or the other.  (Actually, practical
connectivity only concerns the fraction of DFZ routers and other
networks with hosts which are currently sending packets to this
end-user network - but the ideal is that the end-user network is
always reachable from all other networks.)

Other CES architectures such as LISP have a potential advantage in
this scenario, since it is possible that all the ITRs which are
currently sending packets may be able to discern the reachability of
the two ETRs (or, if LISP is ever able to do this: determine the
reachability of the end-user network through the two ETRs) and adapt
their tunneling by choosing an ETR which enables the packets to get
to the end-user network.  In this circumstance, the non-Ivip CES
architectures would be able to restore full connectivity when Ivip
could not.
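The ETR-A / ETR-B scenario above can be sketched in code.  This is
only an illustrative sketch, not part of Ivip: the names ProbeResult
and choose_etr, and the rule of changing the mapping only when the
other ETR serves strictly more vantage points, are assumptions made
for the example.  It shows that with a single ETR in the mapping, a
complementary partial-reachability pattern cannot be fully served.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ProbeResult:
    vantage: str      # hypothetical name of a probing server
    via_etr_a: bool   # end-user network reachable through ETR-A?
    via_etr_b: bool   # end-user network reachable through ETR-B?


def choose_etr(current: str, probes: List[ProbeResult]) -> Tuple[str, bool]:
    """Return (ETR to map the micronet to, True if all vantage points are served)."""
    reach = {
        "ETR-A": sum(p.via_etr_a for p in probes),
        "ETR-B": sum(p.via_etr_b for p in probes),
    }
    other = "ETR-B" if current == "ETR-A" else "ETR-A"
    # Assumed policy: change the mapping only if the other ETR serves
    # strictly more vantage points, since each change incurs a fee.
    best = other if reach[other] > reach[current] else current
    return best, reach[best] == len(probes)


# ETR-A has failed completely: mapping moves to ETR-B, full connectivity.
outage = [ProbeResult("p1", False, True), ProbeResult("p2", False, True)]
print(choose_etr("ETR-A", outage))

# Complementary partial reachability: no single ETR serves everyone.
partial = [ProbeResult("p1", True, False),
           ProbeResult("p2", False, True),
           ProbeResult("p3", False, True)]
print(choose_etr("ETR-A", partial))
```

In the second case the function still picks one ETR, but reports that
some vantage points remain unserved, which is the limitation this
section accepts as a non-goal.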
However, this set of circumstances - both ETRs being partially
reachable and the patterns of reachability being complementary, so
that from anywhere in the Net at least one is reachable - is likely
to be a transient state, since the DFZ routers will rapidly adapt
their best-paths to restore full connectivity to both ISPs and their
ETRs.  Also, it cannot be assured or assumed that the non-Ivip ITRs
would choose the reachable ETR fast enough to take advantage of such
a situation.

Nonetheless, it is possible that a non-Ivip ITR may be able to detect
non-reachability of a particular ETR when the Ivip approach would
not.  This is because with Ivip, multihomed end-user networks will
typically contract another company to continually probe the
reachability of their network through their two or more ETRs - and
that company will do so from a finite number of servers in particular
parts of the Net.  There may be an outage affecting ITRs which are
handling packets addressed to this end-user network which does not
affect the set of servers the multihoming monitoring company is using
- so that company will not detect the problem affecting these
traffic-handling ITRs.  In that case, the non-Ivip approach would be
superior - if the non-Ivip ITR could detect the outage and correctly
choose another ETR through which the end-user network was reachable.

With Ivip, end-user networks will be able to choose between many
Multihoming Monitoring (MM) companies and each company would have a
range of options for how frequently the reachability probing occurs,
how many servers in the DFZ are used to probe the path via each ETR
and how decisions should be made if there appears to be a
reachability problem.
An MM company with probing servers scattered widely around the Net
should be able to detect most reachability problems experienced in
any part of the DFZ, but it can't necessarily detect every one.  How
the MM company decides which outages to respond to, with a mapping
change, is a matter for the company and the end-user network to
decide.

Ivip's external, user-supplied, detection of reachability problems
and creation of mapping changes can be the subject of ongoing
innovation and choice, with the intention that it be more effective
at restoring full connectivity than the individual, isolated, efforts
of non-Ivip ITRs - which have a difficult task reliably and
inexpensively testing reachability of the end-user network via
various ETRs.  This is particularly the case if tens or hundreds of
thousands of ITRs are tunneling to one ETR.  Such non-Ivip ITRs may
not actually probe reachability of ETRs with ping or the like, but
rely on ICMP messages due to traffic packets not reaching the ETR.  A
difficulty with this (again for non-Ivip ITRs) is that ICMP messages
may be lost or may not always be generated if there is an outage.
Furthermore, it would be costly for these ITRs to be able to securely
distinguish genuine ICMP messages from spoofed ICMP messages.

6.5. No attempt to mix IPv4 and IPv6

Ivip for IPv4 is intended to be a free-standing system completely
independent of Ivip for IPv6.  An IPv4 ITR could be implemented in
the same server or router as an IPv6 ITR - just as ITR, ETR and query
server functions could be performed in the one device.

Likewise, the DITR-site systems of DITRs, QSAs and the mapping
distribution systems inside each system of DITR-sites for IPv4 and
IPv6 are intended to be separate and independent - but there's
nothing to prevent one server being used for both the IPv4 and IPv6
systems.

6.6. Not Locator - Identifier Separation

There is considerable terminological inexactitude regarding the use
of the term "Loc/ID Separation".  True Locator - Identifier
separation involves hosts handling packets using two objects of
different types, usually called Locator and Identifier, which
therefore are in different namespaces.  The Locator is usually
regarded as an "address" but the Identifier is not.

If both types of object are numeric and a Locator and an Identifier
were numerically identical they would refer to different things
because this numeric value has different meanings in each namespace.

Further discussion of the meaning of "namespace" is at: [Namespace].

HIP and ILNP [I-D.rja-ilnp-intro] are examples of Locator /
Identifier Separation.  LISP (Locator/Identifier Separation
Protocol), Ivip, APT, TRRP and TIDR are not.

An architecture which uses FQDNs as Identifiers and IP addresses
(always PA, to ensure scalability) as Locators is also an example of
true Loc/ID separation - for instance Name-Based Sockets [Vogt-2009].

LISP, Ivip and other CES architectures do not present hosts with
separate Locator and Identifier addresses.  The host sees only IP
addresses, which perform both functions simultaneously - just as they
do without Core-Edge Separation.  ITRs are the only devices which
treat packets differently if their destination address is in the
"edge" subset of the global unicast address range.

The full arguments about why Core-Edge Separation cannot correctly be
construed as "Locator / Identifier Separation" are at:
[loc-id-sep-vs-ces].  For further discussion, including why LISP is
misnamed, please see the following RRG messages from early 2010:
msg05864, msg05865, msg06110 and msg06190.

7. Architectural Choices

7.1. Core-Edge Separation rather than Elimination

7.1.1. Core-Edge Elimination (CEE) architectures

Core-Edge Elimination (CEE) involves hosts dealing with two kinds of
entity, both for referring to other hosts and for writing into packet
headers so that packets get to their desired destination: Identifiers
and Locators.  The simplest adaptation of existing protocols is to
retain IP addresses as Locator addresses and develop a separate
namespace for the Identifier addresses.  Some CEE architectures only
modify the stack of each host, and use unmodified IPv6 applications.
Others require modified stacks and applications.

Each host retains its one or more Identifiers, no matter which one or
more Locator addresses it is using.  The Locator addresses are global
unicast addresses which are supplied by ISPs as PA space.  The
simplest form of multihomed end-user network would gain a PA prefix
from each of its ISPs and each of its hosts would use one address
from each prefix as a Locator address.  Each such prefix is part of a
larger (in terms of number of addresses - shorter in terms of prefix
length) prefix the ISP advertises in the DFZ.  The ISP can split one
such advertised prefix into many smaller (longer) prefixes for
multiple end-user networks.  This solves the routing scaling problem
because the total number of large (short) prefixes advertised by all
ISPs is scalable, whereas - if not for the CEE architecture - the
number of PI prefixes advertised in the DFZ by multihoming end-user
networks would be an unacceptable burden on all DFZ routers and on
the entire DFZ BGP control plane.

Applications connect to other hosts solely in terms of their
Identifier addresses.  It is the task of each host's stack (or
perhaps its applications) to adapt to changes in other hosts'
Locators, and to inform other hosts which need to know about this
host's changed Locators.
The Identifier may be numeric or have some other form, and there is
typically a DNS mapping from FQDNs to one or more Identifier
addresses, just as there is to IP addresses today.

Some key points about Core-Edge Elimination architectures include:

1.  Identifiers are from a completely different namespace than
    Locators.  If both are numeric, and a Locator is numerically
    equal to an Identifier, there can be no confusion about the
    separate entities each refers to, since the Identifier is
    interpreted in a different namespace from that used for
    Locators.  Therefore, if IP addresses are used as Locators, IP
    addresses cannot be used as Identifiers.

2.  Host stacks are responsible for choosing which of a
    correspondent host's Locators to send a packet to.  This work
    is not done by network elements, such as routers.  (However
    some CEE architectures may have routers alter part or all of
    the outgoing destination address, or perhaps source address,
    to exert network-centric control over traffic flows.)

3.  While there is typically a global, decentralised mapping
    system by which hosts can use another host's Identifier
    (perhaps in combination with one of its Locators) to look up
    that host's complete set of one or more Locators, the network
    itself remains simple and hosts take on more responsibilities
    than they have with existing IP protocols.  This is regarded
    as a virtue by many people, and represents an extension of
    TCP/IP's "dumb network, smart end-points" approach, especially
    when compared to the telephone network.

4.  Since applications need to work with a different kind of
    address element than an IP address for establishing and
    maintaining communications with other hosts, the host stack,
    its API and applications themselves need to be substantially
    rewritten in order to be able to work with a CEE architecture
    - unless the system supports unmodified applications in some
    way.

5.  While it may be possible to slowly introduce such an
    architecture, the benefits of portability, multihoming and TE
    only apply to packets sent between hosts using the new system
    - so substantial benefits to adopters only occur when all, or
    essentially all, hosts have been upgraded to the new system.

6.  CEE architectures are subject to the critique that the extra
    management packets which hosts must send and receive as part
    of the new system are likely to create extra costs, delays
    and/or unreliability compared to current IP techniques.

7.  This critique can be extended to argue that mobile hosts, due
    to their typically slow, not-necessarily reliable and
    potentially costly wireless links, are especially impacted by
    these new responsibilities.

8.  Core-Edge Elimination architectures typically do not apply to
    IPv4 and so are based on IPv6 or on entirely new arrangements.
    If CEE were used for IPv4, it would not be practical due to the
    inherent inefficiency of its use of global unicast address
    space.  In IPv4, any end-user network which needs a /24 of
    address space for its hosts would require a /24 from each of
    its upstream ISPs.  So all multihomed end-user networks would
    consume at least twice the space they need - which is not
    practical with IPv4's address shortage.
Points 4 and 5 constitute insurmountable barriers to the adoption of
CEE architectures, since adoption must be very widespread, within a
period of years, rather than decades, and since adoption must occur
on a voluntary basis.  [Constraints-Voluntary]

Point 6 is an argument that while CEE architectures are theoretically
elegant and simple, the facts of delay and loss of packets across
global query server systems such as DNS - or whatever mapping system
is used to securely determine the full set of Locators which can be
used for a host with a given Identifier - will contribute to delays
in sending application packets.  (All CEE architectures to date
involve global query server systems with just one or a few
authoritative query servers.  None involve "nearby" or "local"
authoritative query servers, which is the only way to avoid excessive
delays and risks of packet loss.)

Also, if the two hosts have to exchange management packets with each
other, for authentication purposes, before any application packets
can be sent, then this will slow down the establishment of
communications - especially if the hosts are far apart, on high
latency links or if packets are lost.

Point 7 implies that in order to create a network which performs
best, given the vagaries of slow and unreliable last-mile links,
hosts should not have to perform these additional Routing and
Addressing management functions - that such functions be handled by
better-connected devices, such as routers in ISPs' data-centers.
[Host-Responsibilities]

The only existing routing scaling problem is in the IPv4 Internet.
In early 2010 the IPv4 DFZ has about 300k prefixes with a doubling
time of about 4.5 years.  The IPv6 DFZ has about 855 prefixes -
1/350th the IPv4 number.
Even if IPv6 prefix numbers had a doubling time of 1.0 years, it
would be mid 2018 before the number reached current IPv4 levels -
which are not yet unworkable.  IPv6 adoption rates have consistently
disappointed IETF expectations.  Despite the run-out of unallocated
IPv4 space, there is no sign yet that large numbers of existing users
can have their Internet needs adequately served via IPv6 addresses
alone.

For the reasons described in points 4 to 8, Ivip instead adopts a
Core-Edge Separation approach.

7.1.2. Core-Edge Separation (CES) architectures

Ivip uses a Core-Edge Separation (CES) Architecture.  CES does not
involve the creation of new namespaces and does not require any
changes to host stacks or applications.

A subset of the global unicast address space is converted to a new
type of address which, in Ivip, is known as Scalable PI (SPI) space.
The addresses which remain once this new, scalable, "edge" subset of
the global unicast space is separated out are known as "core" address
space.

(In LISP, the "edge" subset is known as EID (Endpoint Identifier) and
the remainder is known as RLOC (Routing Locator).  However it is a
mistake to think of these as being "Identifiers" and "Locators" or to
think that LISP has anything to do with the Locator / Identifier
Separation naming model.)

This subset will consist of a growing number of prefixes, each of
which is known as a MAB (Mapped Address Block).  Each MAB is
advertised in the DFZ by as many DITRs as are at DITR-sites which
support this MAB.  (The QSAs at those sites are also authoritative
query servers for the MABs the site supports.)

Within each MAB, the SPI space can be divided up amongst many
(thousands to potentially millions of) separate end-user networks.
If a network gains more than one basic unit of address space - an
IPv4 address or an IPv6 /64 prefix - it can divide this space into
multiple separately mapped "micronets".

As more and more space is converted for use as SPI space, this "edge"
space will grow to become a significant fraction of the total global
unicast space.  There must always be some conventional, "core",
non-SPI, space, since ETRs must be located on such addresses.  There
are many uses of space within ISPs which do not need to be on SPI
space - including the large numbers of IPv4 addresses, or in the
future IPv6 /64s, which are used for individual home and SOHO
customers.  Each such customer gets what is effectively a small
(long) prefix of PA space, which is suitable for their purposes
because they do not want or need portability, multihoming or TE.

As noted in the non-goals section, Ivip does not require or aim for
complete conversion of all end-user networks to SPI space.  Many will
be happy with existing PA arrangements, and some larger existing
end-user networks with their own (unscalable) PI prefixes will
probably retain their current arrangements.  Nonetheless, SPI space
is intended to be attractive to all end-user networks, including the
largest corporations, universities and government departments.

CES involves the progressive repurposing of existing address space.
It does not involve the creation of any separate namespaces.
"Separation" in "Locator/Identifier Separation" means separate
namespaces.  Only CEE architectures implement "Locator / Identifier
Separation".

CES can be introduced gradually, and with DITRs (or their LISP
equivalent - PTRs) the benefits of portability, multihoming and TE
can be supported for all packets sent to the adopting end-user
network.
Therefore 100% of traffic receives these benefits, in contrast to CEE
architectures where only the subset of traffic originating from other
upgraded networks has these benefits.

Assuming a CES architecture does not significantly reduce
performance, robustness or security - and if it provides significant
and immediate benefits to all adopters - then it meets the
constraints due to the need for widespread voluntary adoption.
[Constraints-Voluntary]

No CES architecture I am aware of requires hosts to perform
additional work to manage routing and addressing.  So no CES
architecture is subject to the critique which applies to CEE
architectures, particularly with reference to mobile hosts:
[Host-Responsibilities].

The historical roots of Core-Edge Separation architectures can be
found in the early to mid-1990s - Steve Deering's "Map & Encap" for
IPv4 [Deering-1996], Robert Hinden's "New Scheme for Internet Routing
and Addressing (ENCAPS) for IPNG" (RFC 1955) and the 1992
crocker-ip-encaps-01.txt.

7.2. Nearby authoritative query servers

Probably the greatest challenge for a CES architecture is how to
ensure ITRs can securely, reliably and rapidly obtain the mapping
they need in order to be able to decide which ETR to tunnel a packet
to.  There are four basic approaches to this problem:

1.  The complete global set of mapping changes is sent to each
    ITR, which maintains an up-to-date copy of the full mapping
    database.

2.  Local full-database query servers are located in ISP networks
    and potentially in end-user networks in which ITRs are based.
    The complete global set of mapping changes is sent to each
    such query server, which maintains an up-to-date copy of the
    full mapping database.  ITRs query one or more of these and so
    obtain mapping quickly and reliably.

3.  ITRs in an ISP network (or in an end-user network) send
    queries to local caching query servers - directly to a QSR or
    indirectly via one or more levels of QSC.  Both these types of
    server are caching query servers and are "local" in that they
    are in the same ISP network, or if the ITR is in an end-user
    network are either in that end-user network or in the networks
    of its one or more ISPs.  QSRs are the interface between the
    ITRs and the authoritative query servers which are not local -
    but which are typically "nearby".  (See the following text for
    a definition of "nearby".)

4.  No site or device stores a complete copy of the global mapping
    database.  Instead, there is a global network by which ITRs
    can send a query to the authoritative query server for the
    particular micronet of addresses which matches the destination
    address of the packet the ITR needs to tunnel.

The only architecture to propose option 1 was LISP-NERD.  This is
widely regarded as scaling poorly with large numbers of end-user
networks.  LISP-NERD was to be retired, but a new version 07 ID
appeared in early January 2010.  [I-D.lear-lisp-nerd]

APT used option 2.  Ivip before DRTM (that is, before March 2010)
also used option 2 - the local full database query servers were
called QSDs.  In APT, they were called Default Mappers, and also
handled the encapsulation of some packets.

Ivip with DRTM uses option 3.  The definition of "nearby" follows
shortly.

All other CES architectures to date use option 4.  The most prominent
examples are LISP-CONS [I-D.meyer-lisp-cons], LISP-ALT and TRRP.

In option 3, "nearby" means something like within a few thousand km.
In fibre, 200km involves approximately 1ms delay.  So if the
authoritative query server is 2000km away, the round-trip propagation
delay in SiO2 sets a lower bound to the response time of 20ms.
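The arithmetic behind this lower bound can be written out as a short
sketch.  The function name is hypothetical; the only input taken from
the text is the figure of roughly 200km per millisecond in fibre, and
server processing time is deliberately ignored.

```python
FIBRE_KM_PER_MS = 200.0  # from the text: about 200km of fibre per 1ms


def min_response_ms(distance_km: float) -> float:
    """Lower bound on query response time: round-trip propagation only."""
    return 2 * distance_km / FIBRE_KM_PER_MS


print(min_response_ms(2000))  # 20.0
```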
It is assumed that if the ITR buffers any packets it has no mapping
for but gets the mapping within some time like 50ms or perhaps 100ms,
then this constitutes an insignificant delay in the establishment of
initial communications for all applications and human users.
Therefore, "nearby" means close enough not to involve significant
delay or risk of packet loss.  "Typically nearby" means that except
for unusual error conditions - assuming MABOCs are looking after the
interests of their SPI-leasing customers well, by placing multiple
DITR-sites with their QSAs in widely spread locations around the Net
- ITRs will usually be able to send packets within 50ms or so, which
is assumed to be an insignificant delay.

QSCs, QSRs and QSAs will all have response times, but it is
reasonable to assume these will normally be a few ms, considering the
enormous CPU power (four cores at 3GHz) which inexpensive COTS
servers now possess.

The global query server network approach has obvious advantages in
terms of there being no hardware-imposed limit to the number of query
servers or end-user networks which can be supported.  Furthermore,
changes to mapping impose no direct burden on any other devices -
whereas for option 1 or 2, information must be sent to potentially
hundreds of thousands of devices all around the world.

However, global query server systems pose apparently insoluble
problems of delay and potential unreliability - due to the delays and
risk of packet losses which are inherent in their global nature.
Furthermore it seems to be impossible to make these systems scale to
the very large numbers of EIDs required for ubiquitous mobile
adoption.
[LISP-ALT-Critique]

Typically "nearby" authoritative query servers are the clear choice
for Ivip because ITRs will normally not delay any packets to a
significant degree and because this system avoids the scaling
problems which arise from any server being required to store the full
mapping database of all MABs, and the need for a single, coordinated,
mapping distribution system to drive these servers.

7.3. Real-time mapping distribution

By getting mapping changes to all ITRs which need them (all ITRs
handling packets addressed to the micronet whose mapping just
changed) in real-time - within a few seconds at most - Ivip achieves
several major benefits.  Firstly, the mapping information can be more
compact, since only a single ETR address is needed.  Secondly, ITRs
can be much less complex, and do not need to do any reachability
testing.  Thirdly, the real time control of all ITRs which is given
to end-user networks modularly externalises the reachability,
multihoming service restoration and TE decision making systems from
the CES architecture itself.

7.4. SPI address management

Traditional IP techniques divide address space into binary boundary
prefixes.  Ivip uses traditional prefixes for the largest unit of SPI
space - the "Mapped Address Block" (MAB).  The smaller divisions of
this do not use prefixes or binary boundaries.  The units of dividing
SPI space are IPv4 addresses and IPv6 /64s.

A MAB is a prefix of address space which is devoted to use as SPI
space.  Each MAB is advertised in the DFZ, by all the DITRs at
DITR-sites which support this MAB.  These DITRs attract packets
addressed to any address in the MAB.  (It would also be possible to
load share the MAB between multiple DITRs, each advertising a segment
of it, but in general complete MABs will be advertised.)  For
instance, an IPv4 MAB may be 11.22.0.0/16.
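Because a MAB is an ordinary prefix, deciding whether a destination
address is in SPI ("edge") space is a plain prefix match.  The
following is a minimal sketch using Python's standard ipaddress
module; the MABS list and the find_mab name are assumptions for
illustration, and only the 11.22.0.0/16 prefix comes from the text.
How an ITR would actually learn the set of MABs is not shown.

```python
import ipaddress

# Hypothetical list of known MABs; 11.22.0.0/16 is the example above.
MABS = [ipaddress.ip_network("11.22.0.0/16")]


def find_mab(dst: str):
    """Return the MAB covering dst, or None if dst is in "core" space."""
    addr = ipaddress.ip_address(dst)
    for mab in MABS:
        # Membership is False when IP versions differ, so a mixed
        # IPv4/IPv6 list of MABs would be handled safely.
        if addr in mab:
            return mab
    return None


print(find_mab("11.22.33.84"))  # 11.22.0.0/16
print(find_mab("11.23.0.1"))    # None
```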
1747 A MAB might have previously been conventional PI space of an end-user 1748 network, and may now be used exclusively by this end-user network. 1749 In this case, it will presumably be used to serve the needs of many 1750 sites within this network, so achieving routing scaling by removing 1751 the need to advertise each such smaller (longer) prefix in the DFZ. 1752 In this case, the end-user network is the MABOC of this MAB, and it 1753 does not lease any of the space to any other organizations. 1755 Most MABs will be operated by MABOCs which are specialised companies 1756 - perhaps ISPs but not necessarily. The MABOC typically acquires 1757 rights to multiple prefixes of global unicast space, advertises each 1758 of them in a global system of DITRs and then leases out smaller 1759 portions of the MABs, on an annual basis, to a large number of end- 1760 user networks. 1762 Each end-user network leases a section of the MAB - a User Address 1763 Block (UAB). One end-user network might lease multiple non- 1764 contiguous UABs in the one MAB, and multiple UABs in multiple MABs. 1765 For simplicity, the following discussion assumes they rent a single 1766 UAB, such as: 11.22.33.84 to 11.22.33.101 inclusive. This is an 18 IP 1767 address UAB. UABs could be as small as a single IPv4 address or IPv6 1768 /64 or could be very large, including as large as the MAB itself. 1770 The end-user network which rents this UAB is responsible for 1771 generating mapping changes to suit its needs - and for multihoming 1772 would typically hire a Multihoming Monitoring (MM) company and give 1773 them the credentials required to control the mapping via whatever 1774 mechanism the MABOC provides. 1776 The end-user network can split their UAB up as they wish into 1777 typically smaller sections, known as "micronets". (Bill Herrin first 1778 used this term in TRRP.) A micronet is a contiguous set of any 1779 number of IPv4 addresses or IPv6 /64s which fit within the one UAB.
1780 This 18 IP address UAB could be used as a single 18 IP address 1781 micronet, or it could be split in any way - such as into as many as 1782 18 single IP address micronets. 1784 Each micronet is covered by a single Ivip mapping - it is mapped to a 1785 single ETR address. 1787 MABs and micronets are important to ITRs and most of the mapping 1788 system. UABs are not needed for these, but are an administrative 1789 construct: a range of SPI space for which an end-user network is 1790 authorised to change the mapping. 1792 The MABOC would provide a method by which the end-user network, or 1793 some other company it authorises, can change the mapping and the 1794 division of the UAB into micronets quickly and securely. This would 1795 involve the end-user network having complete control, but being able 1796 to give a username and password to another party such as the MM 1797 (Multihoming Monitoring) company, by which the MM company could control the 1798 mapping of some or all of the end-user's UAB space. 1800 The technical and administrative arrangements for this are not 1801 described at present, but as the Ivip system comes closer to being 1802 standardized, it would be desirable to provide a standard protocol or 1803 interface by which end-user networks or their appointees could issue 1804 mapping changes, rearrange the division of UABs into micronets etc. 1805 Also, it would be desirable to have a standardised way that an end- 1806 user network could allow its appointee to control the mapping for 1807 individual micronets within its UAB. If this was universally adopted 1808 by all MABOCs, then multihoming monitoring systems would only need to 1809 work with this one system for controlling the mapping of micronets. 1811 For each mapping change and each change to the division of the UAB 1812 into micronets, the end-user network would typically incur a fee from 1813 the MAB company.
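The integer-based division of a UAB into micronets described above can be sketched as follows. This is only an illustration, not a proposed interface: the function name split_uab and its parameters are hypothetical, and it assumes an IPv4 UAB of 18 addresses beginning at 11.22.33.84, laid out as contiguous micronets from the start of the UAB.

```python
import ipaddress


def split_uab(uab_start, uab_size, micronet_sizes):
    """Lay out contiguous micronets from the start of an IPv4 UAB.

    Returns a list of (first_address, last_address) strings, one per
    micronet. Sizes are plain integers - no binary boundaries are needed.
    """
    if any(size < 1 for size in micronet_sizes):
        raise ValueError("each micronet needs at least one address")
    if sum(micronet_sizes) > uab_size:
        raise ValueError("micronets do not fit within the UAB")
    first = int(ipaddress.IPv4Address(uab_start))
    micronets = []
    for size in micronet_sizes:
        last = first + size - 1
        micronets.append((str(ipaddress.IPv4Address(first)),
                          str(ipaddress.IPv4Address(last))))
        first = last + 1
    return micronets


# One 18-address UAB split into micronets of 1, 5 and 12 addresses.
for first, last in split_uab("11.22.33.84", 18, [1, 5, 12]):
    print(first, "to", last)
```

Each resulting micronet would then be given its own mapping to a single ETR address.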
1815 The MAB company would charge fees for leasing the UAB space, and for 1816 the load placed on the DITRs which cover this MAB. The MAB company 1817 may run its own DITRs - and their associated QSAs - or may contract 1818 this out to another company which specialises in this service. It 1819 will be an important part of the MAB company's service to locate 1820 DITRs in all corners of the Net, to ensure good load sharing between 1821 them and to minimise the total path length from the sending host to 1822 whichever ETR the end-user network chooses to map their micronet to. 1823 Likewise, load-sharing between the QSAs at these sites, and 1824 having QSAs "nearby" to the QSRs in ISP and other 1825 networks all over the world, will be important. 1827 This flexible integer-based approach to dividing SPI space is 1828 intended to maximise the efficiency with which it can be used. 1829 Since a single physical site, such as a branch office, may be able to 1830 operate perfectly well on one or a few IPv4 addresses, or on a single 1831 IPv6 /64, a seemingly small UAB of 18 IPv4 addresses could be used to 1832 serve the needs of as many branch offices. Each such site could be 1833 multihomed with two or more local ISPs. 1835 As fresh expanses of IPv4 space disappear, there will be continuing 1836 pressure to slice and dice the address space more finely so it can be 1837 used by more and more ISPs and end-user networks. However, the 1838 convention in the DFZ is not to propagate prefixes longer than /24. 1839 This 256 IP address granularity inherent in the current arrangement 1840 leads to considerable underutilization of space. With SPI address space 1841 able to be sliced and diced freely in the smallest possible 1842 increments, a much greater utilization can be expected, in a scalable 1843 fashion, than is possible with current techniques. 1845 7.5. IP in IP encapsulation 1847 When encapsulation is used, there is a simple IP-in-IP header.
There 1848 is no need for ITRs to communicate with ETRs, except for the purpose 1849 of PMTUD management. So, when the ITR tunnels traffic packets 1850 ordinarily (in all cases except for the special Path MTU measurement 1851 protocol, which is only used rarely) there is no need for a UDP 1852 header to enclose a special header with extra information. 1853 Architectures with slow mapping distribution and which therefore 1854 require ITRs to choose between multiple ETRs typically require the 1855 ITRs and ETRs to communicate - but this is not needed for Ivip. 1857 7.6. MHF initially or in the long term to avoid encapsulation and PMTUD 1858 problems 1860 Both the IPv4 and IPv6 headers have un-used bits which can be 1861 employed to direct the packet from ITR to ETR. This path is 1862 primarily across the DFZ but typically includes routers inside ISP 1863 and end-user networks. These routers need to be upgraded - and in 1864 the long-term this can be done without significant cost, simply by 1865 building the new capabilities into new routers and implementing it in 1866 firmware updates. 1868 7.7. Outer header address is that of the sending host 1870 When encapsulation is used, it seems natural to use the ITR's address 1871 as the outer header's source address. This is consistent with 1872 traditional tunneling, and ensures the ITR gets any ICMP messages, 1873 including especially Packet Too Big (PTB) messages. 1875 There are two problems with this conventional approach, which is used 1876 by LISP and other CES architectures. Firstly, it is very expensive 1877 for the ITR to securely respond to PTB messages. Secondly, this 1878 approach means that any ISP BR filtering (dropping) of incoming 1879 packets according to their source address will not affect the packets 1880 at the BRs and must be replicated in the ETR. For more than a few 1881 such blocked prefixes, this is extremely expensive too - and we want 1882 ETRs to be as simple as possible. 
1884 The answer is to have the ITR use the sending host's source address 1885 in the outer header of the encapsulated packet. All ITRs will 1886 therefore generate packets with identical inner and outer source 1887 addresses. ISP BR filtering will drop the packets with source 1888 addresses matching any prefix inside the ISP's network and the ETR 1889 will never need to handle such packets. 1891 The ETR needs to enforce this in the case where an attacker sends a 1892 packet to the ETR, with an inner packet having a banned source 1893 address and the outer header having a source address which is 1894 allowable. This enforcement is achieved by the ETR performing simple 1895 logic on each decapsulated packet: If its source address does not 1896 match the outer header's source address, the packet is dropped. 1898 This arrangement of the outer source address being that of the 1899 sending host requires a novel approach to Path MTU Discovery 1900 management. 1902 7.8. IPTM (ITR Probes Tunnel MTU) PMTUD management 1904 As long as encapsulation is used, there needs to be a method of 1905 informing sending hosts, via traditional RFC 1191 techniques, of what 1906 length packet to send, so that once encapsulated, these packets may 1907 reach, but not exceed, the MTU of the path between the ITR and ETR. 1908 This is true of any CES architecture which uses encapsulation. It is 1909 a complex topic and there is a solution, but it requires considerable 1910 thought and significant complexity in all ITRs and ETRs. 1912 PMTUD management occurs naturally via RFC 1191 mechanisms for DF=1 1913 traffic packets if the router with the too-small MTU is between the 1914 sending host and the ITR, or between the ETR and the destination 1915 host.
Without encapsulation - with MHF - packet lengths are not 1916 increased in the ITR to ETR "tunnel", and the modified routers in 1917 this path will convert a too-long packet back to its original IP 1918 header format, before passing it to the ICMP PTB algorithm. 1920 The difficult task is to make PMTUD work for the path between the ITR 1921 and ETR, where the original packet is encapsulated. I intend to 1922 write up IPTM in an ID. For now, the fullest description is on a web 1923 page. [PMTUD-Frag] Here is an overview of the process, which is much 1924 the same for IPv4 and IPv6. 1926 This system involves restrictions on the length of IPv4 DF=0 1927 (fragmentable) packets which are accepted by this system. It is 1928 reasonable to expect applications not to generate such packets, which 1929 place a serious burden on the network if they are too long. Google 1930 servers have been observed sending 1470 byte DF=0 packets. 1931 [DFZ-unfrag-1470] Such companies could presumably be persuaded to 1932 refrain from sending DF=0 packets altogether by the time a scalable 1933 routing solution is deployed. In the long-term, with EAF in place of 1934 encapsulation for IPv4, fragmentable packets addressed to SPI 1935 addresses will be dropped by all ITRs. 1937 A simple approach to PMTUD management would be to choose some packet 1938 length, marginally below 1500 bytes, and require all ITRs to accept 1939 only packets which are the encapsulation overhead number of bytes 1940 shorter than this. Longer packets would cause the ITR to generate a 1941 PTB and the sending host would send a suitably shortened packet 1942 instead. This would be simple and perform reasonably well in today's 1943 DFZ, where the Path MTU can reasonably be assumed to be 1460 bytes or 1944 more. 1946 However, such a scheme would fail to take advantage of jumboframe 1947 sized MTUs whenever they appear in the DFZ.
ITR to ETR MTUs of 1948 around 9k bytes are likely to become more and more prevalent as more 1949 routers adopt Gigabit Ethernet interfaces, which handle these large 1950 packets. 1952 The encapsulated packet has the sending host's source address. If 1953 such a packet reached a router with a next hop MTU which was smaller 1954 than the packet's length, the router would transmit a PTB to the sending host. 1955 However, the sending host should ignore it, since the destination 1956 address in the enclosed packet headers will be that of the ETR, not 1957 of the destination host - and the rest of the enclosed headers will 1958 not match the packet it sent. Also, the MTU figure in the PTB is 1959 higher than the figure the sending host needs to adhere to. 1961 So the challenge is for the ITR to generate RFC 1191 PTBs when 1962 necessary, in an inexpensive and secure manner, whilst adapting to 1963 potentially higher or lower MTUs to the ETR due to routing path 1964 changes - while making full use of jumboframe paths if and when they 1965 exist. Security in this case means being immune to spoofed PTBs - a 1966 single one of which could greatly reduce the MTU for all traffic from 1967 the ITR to a given ETR for at least ten minutes. 1969 A careful decision will be made to assign a value such as 1200 bytes 1970 to a globally agreed constant MPMTU (Minimum Path MTU). Once set, 1971 this value must remain agreed to indefinitely. A BCP would require 1972 all DFZ routers, and all routers between the DFZ and any ITR or ETR 1973 (and of course the links between these) to handle packets of this 1974 length. 1976 Any packets which, once encapsulated and so ENCAPS (Encapsulation 1977 overhead - 20 bytes for IPv4 and 40 for IPv6) bytes longer, have 1978 lengths less than or equal to MPMTU are encapsulated without any extra 1979 processing. No PMTUD problems exist for these packets.
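The MPMTU rule just described amounts to a single comparison. Here is a minimal Python sketch of it. The value 1200 is only the example figure mentioned above, not an agreed constant, and the function name needs_pmtud_processing is hypothetical.

```python
# Example figure from the text ("a value such as 1200 bytes"); the real
# MPMTU constant has not been decided.
MPMTU = 1200

# Encapsulation overhead in bytes for simple IP-in-IP, keyed by IP version.
ENCAPS = {4: 20, 6: 40}


def needs_pmtud_processing(packet_len, ip_version):
    """Return True if the packet, once encapsulated, would exceed MPMTU."""
    return packet_len + ENCAPS[ip_version] > MPMTU


print(needs_pmtud_processing(1180, 4))  # 1180 + 20 equals MPMTU
print(needs_pmtud_processing(1181, 4))  # one byte over once encapsulated
```

Packets for which this returns False can be tunneled immediately; only the longer ones involve the ITR in any MTU bookkeeping for the ETR concerned.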
1981 For any packet longer than this, assuming the ITR has not yet probed 1982 the PMTU to its ETR, the ITR performs some special processing. The 1983 packet itself is split into two sections and two packets are sent to 1984 the ETR as part of the ITR's attempt to probe the MTU to this ETR. 1985 One packet uses UDP encapsulation to convey a nonce, some flags and 1986 most of the traffic packet - with the ITR's address in the outer 1987 header's source address. This long packet is exactly the same length 1988 as the original packet would be once encapsulated. 1990 If this exceeds the PMTU to the ETR, then the ITR will be sent a PTB. 1991 Assuming this is received, the ITR will determine a new MTU to send 1992 in the PTB to the sending host. This process will repeat until the 1993 sending host's packets, once encapsulated, no longer exceed the MTU 1994 of the path to the ETR. 1996 IPTM does not rely on these PTBs. The ETR is instructed, in a 1997 shorter packet to report to the ITR whether the long packet arrives 1998 or not - and the ETR repeats this report for a while until it is 1999 acknowledged. The long packet is accompanied by one or more copies 2000 of this shorter packet, which contains a matching nonce, flags and 2001 the remainder of the traffic packet. The shorter packet has the 2002 sending host's address in the outer header, so ISP BR source address 2003 filtering is still enforced. 2005 The effect is that as the sending host (or multiple sending hosts 2006 whose packets must be tunneled to the one ETR) tries longer and 2007 longer packets, the ITR narrows its "zone of uncertainty" (cue 2008 Hammond organ, with reverb and ghostly sounds . . .) about the true 2009 MTU to this ETR. If the traffic packets necessitate it, the ITR will 2010 exactly determine the MTU, and so be able to stop probing it for a 2011 while and send PTBs to sending hosts which generate packets which, 2012 once encapsulated, would be longer than this reliably determined MTU. 
2013 Further elaborations are required for the ITR to adapt to changing 2014 conditions and discover longer or shorter MTUs. 2016 Without some kind of PMTUD system, CES architectures cannot use 2017 encapsulation. These techniques will require further design work and 2018 extensive testing, but are more secure and less expensive than the 2019 only other obvious alternative - using the ITR's address in outer 2020 headers and having the ITR maintain a large cache of details about 2021 recently sent "long" packets, in order that it can securely accept 2022 PTBs if they are too long. 2024 8. Architectural Elements 2026 8.1. ITRs 2028 8.1.1. Types of ITR and their addresses 2030 The ITR function can be implemented in a traditional hardware-based 2031 router, in a COTS (Commercial Off The Shelf) server, or as a piece of 2032 software in a sending host. The functions are much the same, but an 2033 ITR in a sending host does not advertise anything in a routing system 2034 - it simply handles outgoing packets which are addressed to any MAB. 2036 If an ITR is built with software and a COTS server, it doesn't need 2037 to be a "router" in most ordinary respects. For instance it doesn't 2038 need multiple interfaces. It may have a single Gigabit Ethernet link 2039 and advertise MABs in the local routing system, forwarding its 2040 encapsulated packets to a router to be forwarded like any other 2041 packet. 2043 An ITR could be built into a DSL, HFC cable, fibre or WiMax / 3G 2044 router. However, it is probably best to do this only when the ITR 2045 function is on a reliable, fast, inexpensive link. Most wireless 2046 links are not like this and it would be better to let SPI packets 2047 flow out of the link, and be handled by ITRs in the ISP network, 2048 which have fast reliable paths to local query servers. 
2050 An ordinary ITR (not in a sending host, and not a DITR in the DFZ) is 2051 a device within an ISP or end-user network which attracts packets 2052 addressed to SPI addresses. It may do this by advertising every MAB 2053 - so the only packets forwarded to it, other than those addressed to 2054 the ITR itself, are those addressed to SPI addresses. 2055 Alternatively, if the ITR is a true router (hardware or software) it 2056 may advertise the entire address space and so be forwarded all packets 2057 not addressed to prefixes advertised by local routers. Then, it 2058 would encapsulate packets which are addressed to SPI addresses and 2059 forward all other packets according to its ordinary router functions. 2061 The ITR's address - the address it uses for tunneling packets from, 2062 and which is used for communication with the ETR for PMTUD management 2063 - may be on conventional global unicast space or, if in an end-user 2064 network, on SPI space. This address is also used for communication 2065 with local query servers (QSCs or QSRs) and for receiving PTB 2066 messages. 2068 Here is a description of what happens when a sending host in an ISP 2069 network, such as a QSC or QSR, on the ISP's conventional address 2070 space, sends a packet to a host in an end-user network on an SPI 2071 address - in this case an ITR or ITFH. The packet will go to an ITR 2072 in the ISP network (if the QSC or QSR doesn't have an ITFH installed) 2073 and then will be tunneled to the ETR for this end-user network. This 2074 ETR sends the packet to the SPI-addressed host, in this case an ITR 2075 or ITFH. 2077 When MHF is used, there is no PMTUD management, no interaction with 2078 ETRs and no trace of the ITR's address in any outgoing packets. 2079 However, the ITR still needs an address for communicating with local 2080 query servers. As just noted, this can be on conventional "core" 2081 space or "edge" (SPI) space. 2083 8.1.2.
DITRs - Default ITRs in the DFZ 2085 DITRs are "Default ITRs in the DFZ". This first use of "Default" is 2086 different from the use of "Default" in "Default-Free Zone". (This 2087 term looks nonsensical when expanded fully: "Default ITRs in the 2088 Default-Free Zone".) 2090 The initial "Default" means that this ITR acts as one of (typically) 2091 many other such ITRs, all of them outside ISP and end-user networks. 2092 These DITRs advertise MABs from many places in the DFZ and so form 2093 multiple destinations which are the "default" - what happens to the 2094 packet if nothing else happens, meaning the packet does not go into 2095 any other ITR before reaching the DFZ. 2097 In principle, a DITR could advertise every MAB, or be an otherwise 2098 normal DFZ router and encapsulate every packet which is forwarded to 2099 it which is addressed to an SPI address. However, there is a burden 2100 of work looking up mapping, encapsulating packets and on occasions 2101 handling the PMTUD management functions to ETRs, which involves 2102 sending PTBs to sending hosts. It is unlikely that anyone running a 2103 DFZ router would want their device to do more work, unless they are 2104 paid for it by the beneficiaries. The ultimate beneficiaries of 2105 DITRs are the end-user networks which the packets are addressed to - 2106 and these are the customers of the MABOCs who lease the space to them 2107 (except where the one end-user network runs a whole MAB for itself, 2108 and so is its own MABOC). 2110 The most likely arrangement for DITRs is that the MABOCs who lease 2111 SPI space to end-user networks will also run DITRs themselves or 2112 contract specialised companies to run DITRs all over the Net for 2113 them. In this scenario, a DITR would advertise only those MABs of 2114 the MABOCs who are paying the operator for this service. 
MABOCs 2115 would charge their SPI-renting end-user network companies for the 2116 traffic handled for their networks by DITRs, so DITRs in general 2117 would need to sample traffic reliably and generate reports in a form 2118 which would enable the MAB companies to bill their customers fairly. 2119 Only DITRs need this traffic sampling capability. Other ITRs would 2120 have monitoring and management functions, but would not need to 2121 collect usage statistics for billing. 2123 Theoretically, DITRs could advertise all MABs and so handle packets 2124 addressed to every MAB. In practice, I expect DITRs will usually 2125 only handle packets addressed to specific MABs. Other ITRs, 2126 including those in sending hosts, will handle packets addressed to 2127 any MAB. Consequently, these non DITR ITRs all need a reliable 2128 method of downloading the latest set of MABs. They will do this as 2129 part of discovering and communicating with their one or more local 2130 query servers. The one or more QSRs they rely on will determine the 2131 current set of MABs by the DNS-based mechanism described in the Ivip- 2132 drtm ID. Changes to this set will also need to be propagated to all 2133 QSCs and ITRs in the local system, by a mechanism which is yet to be 2134 designed. 2136 DITRs may be implemented in hardware based routers, or in COTS 2137 servers. They are always located on conventional global unicast 2138 addresses - never on SPI addresses. DITRs are likely to be busy, so 2139 it makes sense to locate them in major datacenters or Internet 2140 exchanges, close to one or more full database query servers. DITRs 2141 advertise MABs to their neighbouring BGP routers, and have a default 2142 route to either one of these routers, or have the full set of DFZ 2143 routes with links to multiple neighbouring routers. 
So (unless they 2144 are implemented behind a suitable BGP router) DITRs are BGP routers 2145 and may or may not be "DFZ" routers, depending on how they forward 2146 their outgoing packets. 2148 8.1.3. Modified Header Forwarding - MHF-only ITRs 2150 Ivip for IPv4 and for IPv6 separately may or may not begin with 2151 encapsulation. If it does, then all ITRs and ETRs will also be 2152 capable of transitioning in the future to using MHF. 2154 The MHF techniques are discussed in a later section, but involve much 2155 less processing than encapsulation. With MHF, there is no need for 2156 PMTUD management. 2158 8.1.4. Encapsulation and PMTUD management 2160 When the ITR function is implemented in software - either inside a 2161 sending host, or in a COTS server - it will be relatively 2162 straightforward to write C code or the like to implement the 2163 functions of analysing the packet's destination address, deciding 2164 whether to encapsulate it or not, deciding which ETR address to 2165 encapsulate it to, and encapsulating it. Once encapsulated, the new 2166 packet is presented to the internal packet handling functions and 2167 forwarded normally. 2169 This packet-handling code also needs to consider the length of the 2170 packet, with reference to a small set of variables it maintains for 2171 the ETR the packet will be tunneled to. So the packet's destination 2172 address would firstly be used to find an ETR address. Generally, 2173 this would be found by reference to the ITR's cached mappings, but 2174 for initial packets in a new communication flow, the packet must be 2175 held for a few milliseconds or tens of milliseconds while the ITR 2176 retrieves the mapping information from its one or more local query 2177 servers. 2179 Once the destination ETR address is known, the length of the packet 2180 is considered. If it is less than some constant, it can be 2181 encapsulated and sent without any further processing.
If it is 2182 longer than this constant, then the ITR needs to perform PMTUD 2183 management functions. In this case, the ITR establishes, or has 2184 already established, some variables for this ETR. These include an 2185 upper and a lower estimate of the MTU to this ETR. If these are 2186 different, then there is a "zone of uncertainty" about the MTU. If 2187 they are equal, then the ITR has already reliably established the 2188 MTU. If the packet length, plus the encapsulation overhead, exceeds 2189 the range of possible MTU values the ITR has previously determined 2190 for the path to this ETR, then the ITR will send part of the packet 2191 back to the sending host in an ICMP PTB message. If the encapsulated 2192 length would be less than the lower limit in the "zone of 2193 uncertainty" then the packet can be encapsulated without further 2194 processing. 2196 If the encapsulated length falls within the "zone of uncertainty", 2197 then the ITR emits two packets - a long one and a short one - and 2198 communicates with the ETR in a way which will usually raise the lower 2199 limit of this zone, or lower the upper limit. In the former case, 2200 the ITR is able to determine that the encapsulated length did not 2201 exceed the MTU and that the ETR received it correctly. The traffic 2202 packet's contents are mainly contained in the long packet, which has 2203 the same length as the traffic packet would have had if encapsulated. 2204 The remainder of the traffic packet is conveyed in a short packet, of 2205 which perhaps a few will be sent. This is a non-trivial process, which 2206 involves the ETR in some work - but it only occurs for packets whose 2207 encapsulated length falls within the "zone of uncertainty". 2209 Except for rare error conditions, each such operation reduces the 2210 size of the "zone of uncertainty" - and typically the zone will be 2211 reduced to zero.
Once this occurs, at least for the next 10 minutes 2212 or so, the ITR need not perform any such probing of the MTU. Every 2213 encapsulated packet which is to be sent to this ETR will be either 2214 shorter than the MTU, in which case it is encapsulated without any 2215 further work - or is longer, in which case a PTB is sent back to the 2216 sending host, with an MTU value such that the host will generate 2217 packets to this destination host of a length which, when 2218 encapsulated, will equal this reliably determined MTU. 2220 Some kind of PMTUD management is required for 2221 any CES architecture which uses encapsulation. All other CES 2222 architectures use encapsulation exclusively. There is at least one 2223 other approach to PMTUD management which is probably more expensive 2224 to perform as securely as this one. The fact that this and other 2225 processes are explained in some detail in this Ivip ID and not in the 2226 IDs of other proposals does not mean that the other proposals, once 2227 developed to the point of proper operation, would be simpler than 2228 Ivip. 2230 The encapsulation itself is straightforward. The sending host's 2231 address is used for the outer source address and the ETR's is used 2232 for the outer destination address. For IPv4 packets, the Diffserv, 2233 TTL and other flags are copied to the outer header. For IPv6, 2234 Traffic Class and Hop Limit bits are also copied. 2236 8.1.5. Mapping lookup and caching 2238 Apart from PMTUD management, looking up the mapping for an incoming 2239 packet is the most complex task that ITRs need to perform. This task 2240 is the same for encapsulation in both IPv4 and IPv6 and for the IPv4 2241 approach to MHF: ETR Address Formatting (EAF). For the IPv6 MHF 2242 technique - Prefix Label Forwarding (PLF) - the mapping lookup is 2243 similar, but only part of the ETR's address is actually needed for 2244 writing 19 or 20 bits into the header.
2246 When encapsulation is used, for IPv4 or IPv6, the result of the 2247 mapping lookup is an IP address of the ETR, which will become the 2248 destination address of the outer header. The result of EAF is 2249 similar: an ETR address in which the two least significant bits are 2250 zero. This will be written into the modified IPv4 header. 2252 The result of the PLF mapping will be a 19 or 20 bit value which is written 2253 into the modified IPv6 header and which identifies one of 2^19 or one 2254 of 2^20 contiguous DFZ advertised prefixes, each of which is 2255 advertised by a different ISP site. These 19 or 20 bits do not uniquely 2256 identify an ETR if there is more than one at each ISP site, but they 2257 are sufficient for the packet to be forwarded across the DFZ to the 2258 nearest BR of that site, where a second mapping lookup may be 2259 performed on the destination address to determine which of multiple 2260 ETRs at that site the packet should be forwarded to. 2262 The following may appear somewhat complex, but it is a description 2263 of different approaches to handling ITR to ETR tunneling for both the 2264 IPv4 and IPv6 Internets. Ideally, encapsulation won't be necessary 2265 at all. At worst, it will be necessary until DFZ and other 2266 routers are upgraded to handle EAF or PLF modified header packets. 2268 The mapping lookup is driven entirely by the packet's destination 2269 address. Ivip does not attempt to send packets of differing types, 2270 service class or differing source address to different ETRs. (Nor do 2271 the other CES architectures.) 2273 After a packet arrives, and has been classified as being addressed to 2274 an SPI address (meaning it matches one of the MAB prefixes) the next 2275 step is to find out whether the ITR has any mapping cached for the 2276 packet's destination address. For IPv4 the full destination address 2277 is used. For IPv6, only the most significant 64 bits are used, since 2278 SPI space is divided on /64 boundaries.
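To make the lookup key concrete, here is a small Python sketch of one way an ITR's cache lookup could work. It is an assumption-laden illustration rather than a specified algorithm: the names lookup_key and MappingCache are hypothetical, each cached mapping is assumed to hold the first and last key of its micronet plus one ETR address, micronets are assumed not to overlap, and a separate cache instance is assumed for IPv4 and for IPv6.

```python
import bisect
import ipaddress


def lookup_key(dest):
    """Full 32-bit address for IPv4; most significant 64 bits for IPv6."""
    addr = ipaddress.ip_address(dest)
    if addr.version == 4:
        return int(addr)
    return int(addr) >> 64


class MappingCache:
    """Cached micronet mappings, kept sorted by micronet start key."""

    def __init__(self):
        self._starts = []    # sorted start keys
        self._entries = []   # parallel list of (start, end, etr)

    def add(self, start, end, etr):
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._entries.insert(i, (start, end, etr))

    def find(self, key):
        """Return the ETR address for key, or None if nothing is cached."""
        # Locate the last micronet starting at or before key.
        i = bisect.bisect_right(self._starts, key) - 1
        if i >= 0:
            start, end, etr = self._entries[i]
            if start <= key <= end:
                return etr
        return None


cache = MappingCache()
# 192.0.2.1 is a documentation address standing in for an ETR.
cache.add(lookup_key("11.22.33.85"), lookup_key("11.22.33.89"), "192.0.2.1")
print(cache.find(lookup_key("11.22.33.87")))  # within the cached micronet
print(cache.find(lookup_key("11.22.33.90")))  # no cached mapping
```

A binary search over sorted start keys keeps lookups logarithmic in the number of cached mappings. A production ITR would also need expiry handling, which this sketch leaves out.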
2280 Busy ITRs may have tens or perhaps hundreds of thousands of mappings 2281 already cached. An ITR function in a sending host may have only a 2282 handful or a few thousand for a busy web-server. A carefully 2283 designed algorithm will be needed to find any existing mapping, or to 2284 determine that the destination address does not match any cached 2285 mapping. 2287 In the former case, the mapping consists of a starting address and 2288 ending address for the micronet which the destination address falls 2289 within - and a single ETR address. This ETR address (or set of PLF 2290 bits) is then applied to the packet - by writing it to the outer 2291 header when encapsulating, or by writing into the modified header for 2292 EAF or PLF. (PLF only uses 19 bits of the ETR address - just enough 2293 to distinguish between the 2^19 contiguous prefixes which are 2294 reserved for this system.) The resulting packet is then ready to be 2295 forwarded like any other - according to its outer header, or 2296 according to the bits just written into its modified header. 2298 If no cached mapping is found, the ITR buffers the packet and sends a 2299 map query to a local query server - a QSC or a QSR. This includes a 2300 nonce which is used to secure the reply, and any later map update 2301 messages the query server sends if the mapping changes during the 2302 time the ITR caches it. 2304 The local query server sets the caching time on the mapping. This 2305 time may be locally configured and could be set differently for 2306 different replies by various algorithms in the query server to 2307 optimise its interactions with the ITRs, and to limit the number of 2308 mappings the ITR caches. (Further work: It may be desirable for each 2309 ITR to be able to communicate to its query server(s) the state of its 2310 cache and how close to any limits it is running, so the map replies 2311 can have their caching times adjusted downwards.)
In the future, caching times will be discussed fully in the Ivip-drtm
ID.

The ITR flushes from its cache any mappings whose cache times have
expired. The cache includes the starting and ending address of the
micronet, the ETR address and the nonce which was sent in the query
which returned this mapping. The ITR can also be told to flush the
cached mapping for a given micronet, by a Cache Update message from a
QSC or QSR which previously sent mapping for this micronet in a Map
Reply message. This is needed if the micronet has been deleted from
the system, such as due to the end-user network changing the way
their UAB is divided into micronets. If the ITR later receives a
packet with a destination address which matches this micronet, then
there will be no cached mapping and the ITR will request mapping -
and so gain the mapping for whatever micronet this address is now
within.

At any time when a mapping is cached, the ITR may receive a Cache
Update message from a QSR or QSC which previously sent it mapping for
this micronet. The Cache Update message, like the Map Request query
and the Map Reply message, will be a UDP packet. The Ivip-drtm ID
has more information on these messages and their acknowledgement.

The Cache Update will be secured by the nonce sent by this ITR in its
original Map Request query which resulted in the QSC or QSR sending a
Map Reply (also secured by that nonce) which specified the start and
end address of this micronet, and its mapping (the ETR address).

The most common update will be that this micronet is now mapped to a
different ETR address. Another type of update is that this micronet
is now mapped to no ETR (an ETR address of zero) - in which case the
ITR will drop subsequent matching packets.
As noted above, the final kind of Cache Update is a command to flush
the mapping cached for this micronet. This could be encoded via
special flags, but it may be simpler, for instance with IPv4, to
define a particular ETR address such as 0.0.0.1 as meaning the
mapping should be flushed.

None of these Cache Updates reset the caching time. So an ITR's
cached mappings will time out as usual, no matter how many Cache
Updates have arrived to alter the ETR address stored in this mapping.
If the Cache Update message from the query server reset the caching
timeout process, then continued Cache Updates would keep a mapping in
the ITR's cache for excessive periods - including if the ITR was not
handling any packets for this micronet.

In this way, ITRs receive all the updated mapping they need, within a
fraction of a second of the changed mapping being received by the
nearby QSA.

8.1.6. ITFH - ITR Function in Host

An ITR function in a sending host performs either encapsulation and
PMTUD management or MHF as described above. This function is only
for packets generated in the host. ITFH should only be used on hosts
which have fast, reliable connections to two or more local query
servers. If there are delays, or packet losses, then the extra
management traffic between the ITR function and the local query
servers may not work well enough to ensure there are no significant
delays to traffic packets.

In many settings, the software and hardware required to implement an
ITR in the sending host will have zero incremental cost. RAM and CPU
capacity are now extremely inexpensive. Hosts - such as desktop PCs
and servers used in hosting farms and cloud systems - come bristling
with multicore CPUs and gigabytes of RAM for the price of a good
shirt.
The host could be on a conventional global unicast address (PI or PA)
or on an SPI address. If it is thought desirable to enable ITFHs in
hosts behind NAT, then at least two additional measures would need to
be taken. Firstly, if encapsulation were used, the PMTUD exchange
with ETRs would need to work through the NAT - which it probably
would. Secondly, the ITFH would need to set up and maintain a
two-way tunnel to two or more local QSCs. I do not suggest that QSRs
should have to maintain sessions with such ITFHs. TCP with a
keepalive might do, but SCTP would probably be much better. Then,
instead of UDP mapping queries, replies and updates, the same
messages would be sent over SCTP. It is not out of the question to
link all ITRs, QSCs and QSRs with SCTP, rather than use UDP packets,
since SCTP will ensure reliable delivery of messages, and so reduce
the complexity of the code for sending, receiving and acknowledging
messages.

8.1.7. ITRs auto-discovering local query servers

There is further work to do to enable ITRs to automatically discover
the addresses of one or more local query servers - whichever two or
more QSCs or QSRs the ITR or ITFH is supposed to send its Map Request
queries to. This is not absolutely necessary, but would greatly ease
the deployment of ITRs in ISP and end-user networks. The more ITRs
there are, the less work each one has to do and so the greater the
chance that they can be implemented with little cost in a COTS
server, rather than an expensive hardware-based router. This
principle applies especially to ITFHs.

Likewise, it would be desirable for QSCs to be able to automatically
discover the upstream QSCs or QSRs they should send their Map Request
queries to.

8.2. ETRs

8.2.1. In servers or dedicated routers

The ETR function can be performed in a dedicated router or in a
server with appropriate software.

Whether the ETR function is performed in a server with one or more
Ethernet ports, or a router with multiple ports of various kinds,
depends on how the traffic packets are to be forwarded to the one or
more end-user networks being served by this ETR. The methods of
forwarding do not need to be part of the Ivip RFCs - just how ETRs
handle the incoming packets, and for encapsulation, how they
communicate with the ITR for PMTUD management purposes.

In the TTR mobility system, the TTRs perform ETR functions. The link
to each end-user network is a separate two-way tunnel, established by
the Mobile Node (MN) to the TTR.

8.2.2. ETRs in ISP networks

An ETR in an ISP network can, in principle, handle packets for many
end-user networks - all from a single global unicast address. This
has a scaling benefit for IPv4 by supporting a potentially large
number of end-user networks, with potentially large numbers of SPI
addresses, while requiring only one of the ISP's IP addresses.
(For IPv6, inefficiency of address use is not a concern.)

8.2.3. ETRs at the end-user network site

A multihomed end-user network with two links to ISPs might have two
ETRs - one for each link. Each ETR will have a stable conventional
(non-SPI) global unicast address to receive encapsulated packets on.
So each ISP needs to devote at least one of its addresses, or more
likely four, for each such ETR. This saves the ISP from having to
run an ETR for this customer - all the ISP provides is connectivity
and this small amount of stable address space.

There could be one physical ETR, with two links to the two ISPs,
receiving encapsulated packets as above on the two addresses provided
by the two links.
This device would be a router of some kind, even if implemented on a
server, since it would also be deciding which link to send outgoing
packets on.

8.2.4. MHF ETR functionality - EAF and PLF

If Ivip is introduced with encapsulation, its ITR and ETR functions
will contain Modified Header Forwarding functionality ready for a
future migration from encapsulation to MHF exclusively. The IPv4 MHF
technique - ETR Address Forwarding (EAF) - is very similar to the
encapsulation arrangement, so the same ETR could do both, from the
same address. However, with EAF, the ETR address is specified with
the most significant 30 bits, giving a granularity of 4 IP addresses.
(But see previous discussion about how this could probably be
redesigned to involve a new header type which would allow 31 or 32
bits to be used.) To avoid having to change ETR addresses when
encapsulation is turned off, only one ETR should be located in each
/30 prefix.

The IPv6 approach to MHF - Prefix Label Forwarding (PLF) - is
conceptually different from the encapsulation approach in which the
packet is tunneled to an ETR at a single IPv6 address. The ITR uses
the mapping to write 19 or 20 bits into the IPv6 header. Upgraded
routers in the DFZ forward the packet to ISP BRs (Border Routers,
facing other ISPs and transit networks) advertising one of 2^19 or
2^20 separate prefixes. While the mapping still specifies an exact
128 bit IP address for the ETR, before MHF can be turned on, all ETRs
must be given addresses within the special set of DFZ-advertised
prefixes which the MHF system can forward these packets to.

On arrival at the BR, the packet itself contains no information of
further use - it does not contain the ETR address, just 19 or 20 bits
of the address bits which differentiate this contiguous set of
prefixes.
If there is only one ETR for each such prefix, then the BRs (or
perhaps single BR) need only forward the packet to the ETR.
Alternatively, the ETR function could be performed within the one or
more BRs.

However, if this prefix has multiple ETRs, then the BR needs to
behave like an ITR and perform a second mapping lookup, using the
destination address of the packet, to decide how to forward (or
perhaps tunnel) the packet to the correct ETR. There are various
techniques for doing this, including the ISP using the PLF bits
again, interpreted according to its own arrangements by its internal
routers, to forward the packet to some internal prefix (perhaps in
ULA space) which leads to the correct ETR. I have not yet explored
the various ways an ISP could get PLF-tunneled packets to the correct
ETR, or how techniques and ETR placement arrangements for
encapsulation can be made compatible with the PLF arrangements.

With both EAF for IPv4 and PLF for IPv6, the work an ETR performs on
each tunneled packet is trivially simple: restore the altered bits
so the IP header has its standard form again, and forward the packet
to the destination network. The ETR does not communicate with the
ITR or with any other part of the Ivip system, since the ITRs and
ETRs have no Ivip-specific PMTUD problems to solve.

If the resulting packet is too long for the next hop, the existing
IP stack of the server or router in which the ETR function is
performed will implement conventional RFC 1191 PMTUD and generate a
PTB to the sending host.

8.2.5. ETR functionality for encapsulation

With encapsulation, ETRs receive IP-in-IP packets on a stable global
unicast address. The ETR recognises all such packets and
decapsulates them.
If the outer header source address matches that of the inner packet,
then the ETR forwards the packet to the end-user network. If the ETR
handles multiple end-user networks, then it will have appropriate
configuration or router functionality to forward the packet to the
correct end-user network.

For PMTUD management, some more complex functionality is required.
When the ITR uses special techniques to send a traffic packet, in two
parts, as a probe of PMTU to this ETR, it sends a long packet and one
short one (or multiple copies of the short one) to the ETR's address.
However, these are not IP-in-IP encapsulated. They are both UDP
packets - the long one with the ITR's address as the source address,
and the shorter one(s) with the sending host's address as the source
address.

If only the short packet arrives, then the long one was lost -
probably due to it being longer than the PMTU from the ITR to this
ETR. The ETR informs the ITR of this non-reception, and receives an
acknowledgement of this. If both the long and short packets arrive,
the ETR reconstructs the full traffic packet, forwards it to the
end-user network, and informs the ITR that it has been received
correctly. This involves significant complexity in the ETR, but does
not involve storing state for more than a few seconds.

Once the traffic packet has been decapsulated, if the forwarding step
leads to the packet being deemed too long for the next-hop MTU, then
the conventional IP stack will generate a PTB to the sending host and
RFC 1191 PMTUD will proceed just as it would if there had been no ITR
to ETR encapsulation.

8.3. QSRs - Resolving Query Servers

Please see the Ivip-drtm ID for a description of QSRs.

8.4. QSCs - caching query servers

A caching query server (QSC) is a relatively simple function,
typically implemented as software in a server.
The software for ITRs, QSCs, QSRs and QSAs would share some common
components. A QSC receives and responds to Map Request queries from
ITRs or other QSCs in the same manner as a QSR. The QSC sends Map
Request queries to a QSC or QSR in just the same way as was described
above for an ITR - and likewise receives Map Reply and Cache Update
messages from the upstream QSC or QSR as just described.

There could be zero, one, two or in principle any number of QSCs
between an ITR and the one or more QSDs. All these devices are
typically in the same ISP network - or in an end-user network whose
ITRs and QSCs use the QSCs and QSRs in the ISP network. So
communication between them is very fast, reliable and inexpensive.
Typically, there will be little or no packet loss, but the protocols
will need to cope with any losses in a robust manner. If a querier
sends out a Map Request query and does not get a reply within some
quite short time, such as 100ms, then it should try sending the query
(with a different nonce) to an alternative upstream query server.

Further work: ITRs auto-discovering query servers in general - and
QSCs auto-discovering other QSCs and QSRs. Manual configuration of
the tree-like structures of these devices should also be possible.

If the mapping needs of one ITR were completely uncorrelated with the
mapping needs of other ITRs served by the same QSR, then there would
be little or no benefit in deploying intermediate QSCs. However,
there is likely to be sufficient commonality between the mapping
needs of tens or hundreds of ITRs and ITFHs to make QSCs a good
investment in expanding the capacity of a single QSR to support more
ITRs.
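The retry behaviour described earlier in this section (no reply
within a short time, so try an alternative upstream query server with
a different nonce) could be sketched as follows. This is only an
illustration: the send_query callable is a hypothetical transport
hook standing in for the UDP Map Request and Map Reply exchange, and
the 64 bit nonce size is an assumption, since the message formats are
not yet defined.

```python
import secrets
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical transport hook: sends one Map Request to `server` and
# returns the Map Reply contents, or None if no reply arrived within
# `timeout_s` seconds.
SendQuery = Callable[[str, int, int, float], Optional[dict]]


def query_with_failover(
    servers: Sequence[str],
    key: int,
    send_query: SendQuery,
    timeout_s: float = 0.1,  # "some quite short time, such as 100ms"
) -> Optional[Tuple[str, int, dict]]:
    """Try each upstream query server in turn.

    Returns (server, nonce, reply) for the first server that answers,
    or None if none of them does. A fresh nonce is generated for
    every attempt (nonce width is an assumption of this sketch).
    """
    for server in servers:
        nonce = secrets.randbits(64)
        reply = send_query(server, key, nonce, timeout_s)
        if reply is not None:
            return server, nonce, reply
    return None
```

The querier would keep the returned nonce with the cached mapping,
since later Cache Update messages for that mapping are secured by it.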
If 20 ITRs send their queries to QSC1 and another 20 to QSC2, then
the queries, replies and map update exchanges which must be performed
by the one QSR which both QSC1 and QSC2 query will be significantly
reduced. This is because it will sometimes or frequently be the case
that QSC1 will already be caching the mapping which is needed to
answer a query from one of its 20 ITRs. Without the QSCs, every ITR
query would need to be handled by the QSR - and its querier cache
(where the QSR retains records of the mappings it sent in Map Replies
to various queriers, along with the caching time variables and the
nonce which it was sent in the initial Map Request) would be
correspondingly larger. Furthermore, if more than one of QSC1's ITRs
is caching mapping for a micronet for which the QSR receives a Cache
Update, then the QSR only needs to send a single mapping update to
QSC1, rather than sending one to each such ITR.

There is further work to do in planning these protocols. The caching
times do not affect Ivip's ability to get the mapping updates to all
ITRs in real-time. Longer caching times will reduce the need for the
querier, such as an ITR, to make another map request if it is still
sending packets to the micronet. Longer caching times also increase
the number of mapping updates which need to be sent - and the time
may be so long that the querier no longer needs the mapping.
Shorter caching times reduce the number of cached items, but increase
the load of mapping queries and responses.

There needs to be coordination between the caching times of Map
Replies sent out by a QSA and those sent out by dependent QSRs and
QSCs.

Also, each querier needs to periodically check that any upstream
query server it is caching any mapping from is still alive and has
not been rebooted.
If the upstream server has died or been rebooted, there is a danger
that the cached mapping in the querier should have been changed or
flushed due to a Cache Update message which the upstream server would
have sent if it had not died or been rebooted. This is for further
work.

While the exact details are TBD, it is clear that it will be possible
to define relatively straightforward protocols by which ITRs,
optional QSCs, QSRs and QSAs can be combined to efficiently support
the mapping needs of many ITRs per QSR.

8.5. MHF - Modified Header Forwarding

8.5.1. EAF - ETR Address Forwarding for IPv4

Please see [I-D.whittle-ivip-etr-addr-forw] and the discussion above
in the ITR section. To-do - rationalise the various mentions of MHF
and especially EAF in this ID.

EAF will not accept fragmented packets or fragmentable packets longer
than some globally agreed constant, somewhat below 1500 bytes. By
the time Ivip is introduced, it will have been over 20 years since
RFC 1191 PMTUD was introduced. There's no need for fragments or
fragmentable packets - and IPv6 does fine without them.

EAF requires upgraded routers between ITRs and ETRs. This does not
necessarily include every DFZ router, but it is reasonable to
approximate the requirement to this. For instance, if a DFZ router
never handles packets for networks which contain either ITRs or ETRs,
then it does not need to handle EAF formatted packets. EAF ETR
addresses contain only the 30 most significant bits. (But see
previous notes on how with a new protocol number a new header could
carry 31 or probably 32 bits of ETR address.) To avoid the need to
change ETRs' addresses when encapsulation is transitioned to EAF,
ETRs should not be placed closer than 4 IP addresses apart. Perhaps
they should be placed on the 01 address of these four.
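The 30 bit granularity described above can be illustrated with a
short sketch. The helper names are hypothetical and the addresses in
the usage note are documentation addresses chosen for this example;
the sketch only shows the address arithmetic, not the EAF header
format, which is defined in [I-D.whittle-ivip-etr-addr-forw].

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List


def eaf_bits(etr_addr: str) -> int:
    """Return the 30 most significant bits of an IPv4 ETR address."""
    return int(ipaddress.IPv4Address(etr_addr)) >> 2


def shared_slots(etr_addrs: Iterable[str]) -> Dict[str, List[str]]:
    """Find /30 prefixes that contain more than one ETR address.

    Such ETRs could not be told apart by 30 bit EAF forwarding, so
    one of them would need a new address before EAF is turned on.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for addr in etr_addrs:
        base = ipaddress.IPv4Address(eaf_bits(addr) << 2)
        groups[f"{base}/30"].append(addr)
    return {prefix: g for prefix, g in groups.items() if len(g) > 1}


def on_01_address(etr_addr: str) -> bool:
    """True if the two least significant address bits are 01."""
    return int(ipaddress.IPv4Address(etr_addr)) & 0b11 == 0b01
```

For instance, ETRs at 192.0.2.1 and 192.0.2.5 fall in different /30
prefixes and both sit on the 01 address, whereas 192.0.2.1 and
192.0.2.2 share 192.0.2.0/30 and would be reported by shared_slots().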
Since ITRs will commonly be placed deep within ISP and end-user
networks, and ETRs may be deep within ISP networks (such as at an
end-user site, at the end of the link from the ISP), any router
between the DFZ and these devices also needs to handle EAF packets.

It will be straightforward to build this capability into new routers,
and into firmware updates for many existing routers. The upgrade
only concerns the FIB. All that is altered is that the FIB forwards
the packet according to the 30 (32?) ETR address bits in the header,
rather than using the destination address. There is no change to BGP
functions, the RIB or how the RIB writes to the FIB.

If it takes a few years before Ivip or the like is introduced, it is
possible that by then, many or almost all of the installed DFZ
routers will be able to do this with a firmware update.

With a year or two's notice, upgrading all the DFZ routers, and
likewise many internal routers, would enable Ivip to be introduced in
its final mode of operation - without encapsulation overhead or its
PMTUD problems. This means all ITRs can be a lot simpler - and that
ETRs can be trivially simple. Reducing the complexity of ITRs is
perhaps the biggest challenge in designing a CES architecture, since
we want ITRs to be cheap and plentiful, including them being easy to
add to the stacks of sending hosts. Starting with EAF would also
avoid the need for devising a transition mechanism from
encapsulation.

8.5.2. PLF - Prefix Label Forwarding, for IPv6

The current state of PLF design is described in [PLF for IPv6].
Please see this for more details, including why it is totally
different from MPLS and how it could be extended to provide a similar
2^19 or 2^20 destination forwarding system within each ISP (or
end-user) network.
While EAF is pretty much a functional replacement of IPv4's
encapsulation system, PLF is rather different in that it only takes
the packet to a BR of one of 2^19 or 2^20 DFZ-advertised prefixes.
This would be a regular, contiguous set of prefixes used only by
ISPs - for this and for potentially other purposes.

If Ivip for IPv6 began with encapsulation, then it would make sense
for the ETRs to be already located in these special prefixes.

Otherwise, they would need to be moved there before PLF could be
turned on.

PLF may require a second lookup at the BR of the ISP's network - if
there is more than one ETR for that prefix. One way of forwarding
the packet from the BR to the correct ETR would be to use these PLF
bits for a similar system within the ISP's network, with 2^19 or 2^20
internal prefixes. How the ISP uses these bits is a private matter.
This could be a very powerful way of directing traffic inside a large
provider network. This would give rise to ePLF - the one system for
the DFZ - and iPLF, as used inside an individual ISP network.

Rapid adoption of IPv6 is still somewhere beyond the immediately
foreseeable future. So there's no hurry about deploying a scalable
routing solution for IPv6. I think the most likely scenario for
widespread adoption of IPv6 is one or more large 3G systems using it
to give each phone (or whatever) its own global unicast address.
This in itself will not cause a scaling problem, since these will be
large systems with few new prefixes to add to the DFZ. However,
there would then be a strong need for mobility - and the TTR approach
has advantages over traditional MIP techniques, as discussed below.

Perhaps by the time Ivip is deployed for IPv6, all the IPv6 DFZ
routers will be upgradable to PLF with firmware updates - so scalable
routing could be done without encapsulation.
PLF involves small changes to the FIB and to the RIB. It does not
involve any new BGP functionality.

8.6. TTR Mobility

TTR Mobility is fully described, with diagrams, in [TTR Mobility].
This architecture will work equally well for IPv4 and IPv6. The MN
can be on any kind of address, including behind multiple layers of
NAT, on DHCP addresses and on addresses provided by conventional
Mobile IP protocols. The MN can even be on an SPI address which is
within another MN's micronet. No stack or application changes are
required and the hosts communicate normally with all other hosts,
including of course others using TTR mobility. There is no
home-agent and paths to correspondent hosts are generally optimal.

Mapping changes are not required when the MN gains a new address.
They are not actually required at all, but are desirable if the MN
moves to a part of the network which is far from its current TTR.
This may be a distance of 1000km or more. Then, it should establish
a tunnel to a nearby TTR so the TTR company can change the mapping of
its micronet to this new TTR. With Ivip's real-time control of
mapping, this means the MN could close the tunnel to the old TTR
within five or so seconds of the mapping change being sent. Changing
the mapping does not cause any glitch in connectivity, since the MN
gets packets from both TTRs during the changeover.

The MN needs some additional tunneling software - which is controlled
by the TTR company. This could be added alongside existing stacks,
or integrated into the stack. Ideally the MN to TTR interface would
be standardised in RFCs, but this is not strictly necessary, since
the MN only needs to interoperate with TTRs of the TTR company chosen
by the MN's owner.
RFC-standardised MN and TTR functionality would be desirable, since
it would allow easy choice between TTR companies without the need to
install software. However, there is a lot of scope for innovation in
this area, and it might be difficult to adequately develop a full
range of desirable protocols soon enough for the expected rapid
uptake of mass-market Mobility.

I think this approach to mobility, for IPv4 and at some stage for
IPv6, is so attractive that there would be a business case for a
company setting up its own Ivip-like system just for this purpose -
irrespective of the need for a scalable routing solution. Such a
system would need to use encapsulation. Multiple such systems could
exist at the same time - and a MN in one system A would be able to
communicate directly with a MN in another system B via the following
paths (->) or tunnels (==>): MN-A ==> TTR-A -> (via DFZ) -> ITR-B ==>
TTR-B ==> MN-B.

Any such systems should be designed to be upgraded in the future to
comply with future RFCs for an Ivip-like system, including initial or
long-term adoption of Modified Header Forwarding rather than
encapsulation.

9. Security Considerations

Security analysis can only be done in the years to come, once the
protocols are designed in some detail.

Ivip ITRs and ETRs are much simpler than those of LISP.

Ivip ETRs easily enforce ISP BR source address filtering. For LISP
ETRs to enforce this would be at least administratively complex and
very expensive for large numbers of filtered prefixes - and it may be
impossible to do while allowing for ITRs in the local ISP network
tunneling to this ETR.

10. IANA Considerations

[To do.]

11. Informative References

[C-E-Sep-Elim]
   Jen, D., Zhang, L., Lan, L., and B.
   Zhang, "Towards a Future Internet Architecture: Arguments for
   Separating Edges from Transit Core", September 2008.

[Constraints-Voluntary]
   Whittle, R., "List of constraints on a successful scalable
   routing solution which result from the need for widespread
   voluntary adoption", April 2009.

[Critique of draft-jen-mapping-00]
   Whittle, R., "draft-jen-mapping does not apply to the TTR
   Mobility architecture", January 2010.

[DFZ-unfrag-1470]
   Whittle, R., "Google sends 1470 byte unfragmentable packets",
   August 2008.

[Deering-1996]
   Deering, S., "The Map & Encap Scheme for scalable IPv4 routing
   with portable site prefixes", March 1996.

[Host-Responsibilities]
   Whittle, R., "Objections to burdening hosts with more Routing
   and Addressing responsibilities", December 2009,
   <http://www.firstpr.com.au/ip/ivip/RRG-2009/host-responsibilities/>.

[I-D.adan-idr-tidr]
   Adan, J., "Tunneled Inter-domain Routing (TIDR)",
   draft-adan-idr-tidr-01 (work in progress), December 2006.

[I-D.ietf-lisp]
   Farinacci, D., Fuller, V., Meyer, D., and D. Lewis,
   "Locator/ID Separation Protocol (LISP)", draft-ietf-lisp-06
   (work in progress), January 2010.

[I-D.irtf-rrg-recommendation]
   Li, T., "Recommendation for a Routing Architecture",
   draft-irtf-rrg-recommendation-05 (work in progress),
   February 2010.

[I-D.jen-mapping]
   Jen, D. and L. Zhang, "Understand Mapping",
   draft-jen-mapping-00 (work in progress), October 2009.

[I-D.lear-lisp-nerd]
   Lear, E., "NERD: A Not-so-novel EID to RLOC Database",
   draft-lear-lisp-nerd-06 (work in progress), December 2009.

[I-D.lewis-lisp-interworking]
   Lewis, D., "Interworking LISP with IPv4 and IPv6",
   draft-lewis-lisp-interworking-00 (work in progress),
   December 2007.
[I-D.meyer-lisp-cons]
   Brim, S., "LISP-CONS: A Content distribution Overlay Network
   Service for LISP", draft-meyer-lisp-cons-04 (work in
   progress), April 2008.

[I-D.rja-ilnp-intro]
   Atkinson, R., "ILNP Concept of Operations",
   draft-rja-ilnp-intro-02 (work in progress), December 2008.

[I-D.whittle-ivip-drtm]
   Whittle, R., "DRTM - Distributed Real Time Mapping for Ivip
   and LISP", draft-whittle-ivip-drtm-01 (work in progress),
   March 2010.

[I-D.whittle-ivip-etr-addr-forw]
   Whittle, R., "Ivip4 ETR Address Forwarding",
   draft-whittle-ivip-etr-addr-forw-00 (work in progress),
   January 2010.

[I-D.whittle-ivip-fpr]
   Whittle, R., "Fast Payload Replication mapping distribution
   for Ivip", draft-whittle-ivip-fpr-01 (work in progress),
   March 2010.

[I-D.whittle-ivip-glossary]
   Whittle, R., "Glossary of some Ivip and scalable routing
   terms", draft-whittle-ivip-glossary-01 (work in progress),
   March 2010.

[Ivip Summary and Analysis]
   Whittle, R., "Ivip Conceptual Summary and Analysis",
   December 2008.

[Ivip-2007-06-15]
   Whittle, R., "ViP: Anycast ITRs in the DFZ & mobile tunnels",
   June 2007.

[LISP-ALT-Critique]
   Whittle, R., "How can the ALT structure scale to 10^8, 10^9 or
   10^10 EIDs with minimal delay times and robustness against
   single points of failure?", December 2009.

[Namespace]
   Whittle, R., "The meaning of the term *namespace* in
   addressing, computer networking etc.", April 2009.

[PLF for IPv6]
   Whittle, R., "Prefix Label Forwarding (PLF) - Modified Header
   Forwarding for IPv6", August 2008.

[PMTUD-Frag]
   Whittle, R., "IPTM - Ivip's approach to solving the problems
   with encapsulation overhead, MTU, fragmentation and Path MTU
   Discovery", April 2008.

[TTR Mobility]
   Whittle, R. and S.
   Russert, "TTR Mobility Extensions for Core-Edge Separation
   Solutions to the Internets Routing Scaling Problem",
   August 2008.

[Vogt-2009]
   Vogt, C., "Simplifying Internet Applications Development With
   A Name-Based Sockets Interface", December 2009.

[loc-id-sep-vs-ces]
   Whittle, R., "Loc/ID Separation is different from Core-Edge
   Separation", January 2010.

Appendix A. Acknowledgements

Thanks to the following people for their help and encouragement: Juan
Jo Aden, Noel Chiappa, Olivier Bonaventure, Brian Carpenter, Dino
Farinacci, Vince Fuller, Joel M. Halpern, Geoff Huston, Ved Kafle,
Eliot Lear, Simon Leinen, Tony Li, Jeroen Massar, Dave Meyer, Chris
Morrow, Dave Oran, Robert Raszuk, Jason Schiller, John Scudder, K.
Sriram, Markus Stenberg, Letong Sun, Christian Vogt, Kilian Weniger
and Xiaoming Xu.

This is not to imply that these people support Ivip.

I especially thank Steve Russert, formerly of Boeing, for
collaborating on the TTR Mobility paper for MobiArch '08. The
original draft wasn't accepted and by the time we revised it to the
point of being happy with it, the paper was 2.5 times as long as the
conference page limit.

Author's Address

Robin Whittle
First Principles

Email: rw@firstpr.com.au
URI: http://www.firstpr.com.au/ip/ivip/