[rrg] DRTM - Distributed Real Time Mapping for Ivip & LISP
Robin Whittle <rw@firstpr.com.au> Thu, 25 February 2010 13:45 UTC
Message-ID: <4B867F6D.60601@firstpr.com.au>
Date: Fri, 26 Feb 2010 00:47:25 +1100
From: Robin Whittle <rw@firstpr.com.au>
Organization: First Principles
To: RRG <rrg@irtf.org>
Here is a new mapping arrangement for Ivip which I think could also be
adopted by LISP.  Later, I intend to write this up as an Internet
Draft: draft-whittle-ivip-drtm.

DRTM does not involve any single device needing to contain all mapping
information, or a "globally synchronised mapping database".

This new arrangement also avoids concerns about lack of motivation for
early adoption by ISPs.  The initial deployment of Ivip (or LISP)
services to end-user networks will be driven by organizations which
need not be ISPs.  For a non-mobile end-user network (EUN) to use SPI
(Scalable PI = edge = EID) space, the only thing required of its ISP
is to allow the forwarding of packets sent from the EUN whose source
addresses are from the SPI space the EUN is using.

With DRTM, it will be easier for ISPs to install ITRs than with
previous approaches to Ivip's mapping system, since there is no need
for full-database query servers, streams of real-time mapping etc.

DRTM would work fine with TTR Mobility.  TTR Mobility doesn't require
any ISP involvement, since the MN sends outgoing packets to the TTR,
and does not rely on its access network to forward them.  A company
could provide TTR Mobility services with its own DITRs (PTRs) and not
need any ISP involvement at all - it wouldn't need ITRs in ISPs.

  - Robin


1 - Quick description
2 - Different approaches to Ivip's mapping system
3 - More Detailed Description
      Abbreviations
      Stage 1 - DITRs only
      Stage 2 - Add ITRs in ISPs and EUNs, with purely caching MRs
      Stage 3 (optional) - ISPs/EUNs have non-caching MRs
4 - With TTR Mobility
5 - DRTM for LISP


1 - Quick description
=====================

Distributed Real Time Mapping (DRTM) involves significant differences
from what I described (2010-01-06) in:

  Ivip's new distributed mapping distribution system
  http://www.ietf.org/mail-archive/web/rrg/current/msg05975.html

This (msg05975) removed the need for a single global inverted tree of
Replicators.
Instead, mapping change packets would be sent out from DITR-Sites near
the ISPs which want to receive them and perhaps Replicate them to
drive multiple full-database QSD query servers.

This new DRTM arrangement removes the need for Replicators, or for the
ISPs running full-database QSD query servers.

The new DRTM mapping distribution system is still real-time, with
updates going directly to all ITRs which need them.  I guess the total
delay time from mapping change to ITRs changing their tunneling
behaviour would be a second or two, but in principle it could be less
than half a second.

As always with Ivip, the end-user network - or whoever they appoint -
controls the mapping, and the mapping of each micronet is to a single
ETR address.  So the functions of reachability testing, multihoming
failure detection and service restoration decision making are
modularly separated from the CES architecture.  All other CES
architectures monolithically integrate these functions, and so require
their ITRs to be more complex than Ivip's, while also greatly limiting
the ability of end-user networks to have reachability testing and
multihoming service restoration done the way they prefer.

DRTM involves:

Provision of Ivip (or LISP) services will be possible without the ISPs
having to install ITRs or ETRs or be involved in Ivip or LISP at all -
other than allowing their Ivip SPI-using customers (LISP: EID-using
customers) to send out packets whose source addresses are from these
SPI (EID) prefixes.

SPI-using end-user networks will run their own ETRs on the PA
addresses they get from their current ISP service.  They can run ITRs
too if they like, though this is not required.

The MAB (Mapped Address Block) Operating Companies (MABOCs) - whose
business is to lease out MAB space to thousands of SPI-using end-user
networks - will do most of the work, setting up DITRs (Default ITRs in
the DFZ, or in LISP: Proxy Tunnel Routers) at widely dispersed sites
around the Net.
ISPs will only become involved in adding ITRs when they want to -
which will be when they have significant numbers of SPI-using
customers, with sending hosts in the ISP's network (and the ISP's
customer networks which lack ITRs) sending packets to hosts on SPI
addresses which are mapped to the ETRs of this ISP's SPI-using
customers.  As long as the ISP doesn't have its own ITRs, these
packets would go out to the DFZ, to a DITR, and then return to the
ISP's network.  So to reduce this waste of expensive upstream
bandwidth, the ISP will want to install its own ITRs.

No single server is required to hold the entire mapping database.
This can still be done as an option - the equivalent of full-database
QSD mapping query servers.  Maybe this will be desirable in some
settings, but it is not required.

If an ISP or end-user network runs ITRs, then instead of getting their
mapping from one or a few of their own full-database QSD servers (the
"synchronized databases" some people are concerned about), their ITRs
query one or more caching Map Resolvers (MRs).

An ISP or a large end-user network might run two or more MRs - just as
in previous versions of Ivip they ran two or more full-database QSD
query servers.  However, the MR is a caching query server, without
great storage or computational requirements.  An MR can optionally be
a full-database query server for one or more MABs, including all MABs,
if this makes sense, but this is not at all required.

Assuming the MR is purely caching, then when an MR gets a query from
an ITR which it can't answer from its cache, it queries one of
multiple typically "nearby" query servers which are full-database for
particular MABs.

Mapped Address Blocks ("coarse" prefixes in LISP) are short
DFZ-advertised prefixes which cover the micronets (EID prefixes in
LISP) of scalable "edge" SPI space (EID space in LISP) of many
end-user networks.
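The caching behaviour just described - an ITR's query reaching an MR,
which answers from its cache or else asks a nearby full-database query
server for the covering MAB - can be sketched as follows.  This is a
minimal illustration, not the real query protocol: the class name, the
`query_fn` callback and the (ETR, TTL) reply shape are all invented
for the sketch.

```python
import ipaddress
import time

class MapResolver:
    """Sketch of a purely caching Map Resolver (MR)."""

    def __init__(self, qsd_for_mab, query_fn):
        # qsd_for_mab: MAB prefix (str) -> list of "nearby" query
        #              server addresses, full-database for that MAB.
        # query_fn(qsd_addr, micronet) -> (etr_addr, ttl_seconds)
        #              or None; stands in for the real wire protocol.
        self.qsd_for_mab = {ipaddress.ip_network(m): qsds
                            for m, qsds in qsd_for_mab.items()}
        self.query_fn = query_fn
        self.cache = {}  # micronet (str) -> (etr_addr, expiry_time)

    def covering_mab(self, micronet):
        net = ipaddress.ip_network(micronet)
        for mab in self.qsd_for_mab:
            if net.subnet_of(mab):
                return mab
        return None

    def resolve(self, micronet, now=None):
        now = time.time() if now is None else now
        hit = self.cache.get(micronet)
        if hit and hit[1] > now:
            return hit[0]                 # answer from cache
        mab = self.covering_mab(micronet)
        if mab is None:
            return None                   # not SPI space this MR knows
        for qsd in self.qsd_for_mab[mab]:
            reply = self.query_fn(qsd, micronet)
            if reply is not None:
                etr, ttl = reply
                self.cache[micronet] = (etr, now + ttl)
                return etr
        return None
```

Here `query_fn` stands in for queries to the full-database query
servers at the DITR-Sites covering each MAB.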
These servers are at the DITR (PTR in LISP) sites, which are typically
located in widely dispersed places around the Net.  So these
full-database query servers are no longer "local" to the ITRs which
depend on their map replies - they are "nearby".  These servers only
need to handle mapping queries for the MABs their DITR-Site covers.

DITR-Sites will typically be "nearby" - such as within one or two
thousand km - and so will be able to answer queries reliably and with
insignificant delay, compared to the delays and higher risk of lost
packets which are inherent in any global mapping query server system
such as LISP-ALT, with only one or two authoritative query servers.

As an option, the ISP may choose to get full mapping feeds from these
DITR-Sites, and so run the Map Resolver (MR) as a full-database query
server for some or all MABs.  This may or may not involve Replicators
fanning out mapping information.  Other than this optional
arrangement, no single server has the full mapping database.

The operators of the DITR-Sites need a full mapping feed to those
sites for all the MABs they cover - and also to run the query servers
there which are used by nearby ISPs' MRs.  How these DITR-Site
operators transmit the mapping in real-time to all their DITR-Sites is
up to them.  They could use Replicators - but that is an internal
matter which doesn't necessarily have to comply with any particular
standards.

Each MABOC has a finite number of DITR-Sites covering its MABs - at
most a hundred or so.  So it is clearly practical to get mapping
updates in real-time to those sites, since they are all run either by
the MABOC, or by another company which contracts to this and probably
other MABOCs.  Private network links to these DITR-Sites might be a
good approach, to avoid the problems with congestion and DDoS attacks
which would arise if mapping was sent via the open Internet.

A DITR-Site's full-database query server is called a "DITR-Site-QSD".
It is full-database only for the MABs the site handles and is not
involved in mapping for any other MABs.

Mapping changes are initially generated by the end-user network whose
micronet's mapping is being changed - or by whoever the end-user
network appointed to control the mapping.  This is conveyed, with
appropriate authentication arrangements, to the MABOC or to some other
company the MABOC contracts to handle mapping for its MABs.

The mapping changes for each MAB are sent in real-time to the
DITR-Site company, who - via whatever internal arrangements they
choose - convey them to the DITR-Site-QSDs at all their DITR-Sites.

When a mapping change arrives at a DITR-Site-QSD, it checks whether
this change affects any micronets whose mapping was given out in map
replies within the last ten minutes, or whatever caching time this
DITR-Site-QSD specifies in its map replies.  The DITR-Site-QSD caches
the nonces which came with the map requests, and sends to each such
requester a mapping update command, secured by the nonce from the
original request.

This mapping update will go to the MR at an ISP network, which uses
the same algorithm to send mapping update commands to the one or more
ITRs to which it has sent mapping for this micronet within whatever
caching time it sets on its replies.  (This time should be the same
as, or less than, the caching time of the replies it got from the
DITR-Site query server.)

Therefore, all ITRs with currently cached mapping for the micronet
whose mapping has just been changed will receive the update within a
fraction of a second of it arriving at the DITR-Site-QSD.  So all ITRs
in the world which are currently tunneling packets whose destination
address matches this micronet will tunnel these packets to the ETR
specified in the updated mapping.
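A minimal sketch of the nonce-secured update push just described: the
query server remembers, for each micronet, which requesters were sent
mapping within the caching time, along with the nonce from each
request, and pushes an update to those requesters when the mapping
changes.  All names and the `send_update` callback are invented for
illustration; the real wire protocol and authentication details are
not specified here.

```python
import time

class DITRSiteQSD:
    """Sketch of a DITR-Site-QSD's nonce-secured update push."""

    def __init__(self, mapping, cache_time=600, send_update=print):
        self.mapping = mapping        # micronet -> ETR address
        self.cache_time = cache_time  # seconds, stated in map replies
        self.send_update = send_update
        # micronet -> list of (requester, nonce, cache_expiry)
        self.outstanding = {}

    def map_request(self, requester, micronet, nonce):
        # Answer the query and remember who asked, with their nonce,
        # for the duration of the caching time.
        self.outstanding.setdefault(micronet, []).append(
            (requester, nonce, time.time() + self.cache_time))
        return (self.mapping.get(micronet), self.cache_time)

    def mapping_change(self, micronet, new_etr):
        self.mapping[micronet] = new_etr
        now = time.time()
        live = [e for e in self.outstanding.get(micronet, [])
                if e[2] > now]
        for requester, nonce, _ in live:
            # The update command carries the nonce from the original
            # request, so the requester can authenticate it.
            self.send_update(requester, micronet, new_etr, nonce)
        self.outstanding[micronet] = live
```

An MR would apply the same algorithm downstream to the ITRs it has
recently answered.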
2 - Different approaches to Ivip's mapping system
=================================================

Before explaining this in greater detail, here is a run-down of the
changes to Ivip's mapping system:

Plan-A  2007-07-15  Original system with a tree-like structure of
                    Replicators - with the top level being "Launch
                    Servers" with a fancy protocol between them.

                      ivip-arch-00/01/02      }  All
                      ivip-db-fast-push-00/01 }  obsolete.

        2010-01-13  Same system, but all-new ivip-arch and revised
                    ivip-db-fast-push.

                      ivip-arch-03          Currently the latest
                                            version, but the fast push
                                            mapping section is no
                                            longer up-to-date.

                      ivip-db-fast-push-02  Better documentation of
                                            the original Launch Server
                                            system.

Plan-B  2010-01-18  "Launch Servers" replaced by Level 0 Replicators
                    which are fully meshed and have a flooding
                    arrangement which is simpler, faster and more
                    robust.

                      ivip-db-fast-push-03  Significant
                                            simplifications and new
                                            material to give an
                                            overview of Plan-B.

                      ivip-fpr-00           All-new ID with goals and
                                            non-goals, a better
                                            description of Replicators
                                            and the best Plan-B
                                            documentation.

Plan-C  2010-02-07  Ivip's new distributed mapping distribution system
                    http://www.ietf.org/mail-archive/web/rrg/current/msg05975.html

                    This keeps the Replicator concept, but has no
                    central tree structure of Replicators.  Instead,
                    one or more ISPs (or large end-user networks) make
                    their own small tree of Replicators, and get feeds
                    of mapping changes for the MABs of all MABOCs from
                    the one, or typically more than one, mapping
                    coordination companies or the MABOCs themselves -
                    whoever runs the nearest one or two DITR-Sites for
                    each MABOC.

                    So there is no central tree of Replicators - just
                    smaller trees, or even single QSDs, getting feeds
                    from MABOC-run DITR-Site sources of mapping
                    generally not too far away.

In Plan-A and Plan-B, the MABOCs were either RUAS (Root Update
Authorisation Server) companies, or contracted RUAS companies to
handle the mapping of the micronets in their MABs.
The RUAS companies collectively ran a decentralised but still unified
inverted tree-like structure of Replicators to fan out mapping changes
in real-time all over the world to ISPs' full-database QSDs.

In Plan-C, there is no global inverted tree of Replicators, and the
MABOCs invest more and reach out to ISPs from their widely distributed
DITR-Sites.  ISPs don't absolutely need ITRs and QSDs (and therefore
mapping feeds and probably Replicators), but they will probably want
them after a while (assuming some of their customers are using SPI
space), since having their own ITRs will reduce the traffic going out
to a DITR and returning to these customers' ETRs.

Missing Payload Servers are also needed so the ISP's QSDs can get
mapping which is somehow missing from the two or more upstream
Replicators - due to temporary outages affecting the two or more
feeds.

Plan-D  2010-02-24  This message.

ISPs (or end-user networks) which want to run their own ITRs can still
use the Plan-C approach of having their own full-database QSDs, with
full feeds, Replicators, Missing Payload servers etc.

However, ISPs (and end-user networks) which want ITRs have an
intermediate option, which is less expensive - no local full-database
query servers, Replicators or reliance on real-time feeds - but
instead the use of new query servers at the nearby MABOC-operated
(directly or indirectly) sites where the DITRs are.

These DITR-Site-QSD query servers are "full database" for the subset
of MABs each such DITR-Site handles.  The ISP's ITRs query these via a
Map Resolver (MR) - which is like a caching QSC query server, but
which knows, for each MAB, the addresses of two or more of these
typically "nearby" MABOC-run DITR-Site-QSDs which are authoritative,
full-database, query servers for that MAB.

Therefore, the full-database query servers which the ITRs in an ISP or
an EUN rely on are no longer strictly "local" - as they were in Plans
A, B and C.
They are normally "close", or "close enough", that delay times and
query/response packet losses are insignificant.

So this is fully distributed, but is not a "global" query server
system like LISP-ALT, with queries and responses frequently traversing
the Earth - with consequent delays, losses and scaling problems.


3 - More Detailed Description
=============================

As a preliminary, please read:

  http://www.ietf.org/mail-archive/web/rrg/current/msg05975.html

but bear in mind that the Replicators, the QSDs in ISP networks, and
the QSDs sometimes using Missing Payload Servers are now an optional
part of Plan-D - and that Plan-D has a new intermediate system which
should be sufficient for all scenarios, even with a fully deployed
system covering 10 million or so micronets for non-mobile multihomed
EUNs and up to 10 billion micronets for mobile devices.  In its basic
form, Plan-D has no single server with the full mapping database, no
real-time mapping feeds, no Replicators etc.

This description assumes the MABOCs' (MAB Operating Companies') aim is
to lease their space primarily to be used for portability, multihoming
and inbound TE.  However, if one or more companies, including MABOCs,
want to deploy TTR Mobility, that can be part of it as well.  I
suspect the demand for TTR Mobility will be more urgent, widespread
and profitable than that for non-mobile network portability,
multihoming and inbound TE.

This description uses Ivip terminology, but is in general applicable
to LISP as well.  This discussion is IPv4-specific, but the same
principles with different details should be fine for IPv6 too.

I need to invent some new terminology, since DRTM is based on various
types of business and network operations which do not yet exist.


Abbreviations
-------------

EUN    End-User Network - from a mobile device, to a local LAN behind
       NAT as used for most residential / SOHO DSL etc. services, to
       the networks of the largest corporations, universities etc.
PMHTE  Portability, Multihoming and/or Inbound Traffic Engineering.
       These are the benefits many EUNs seek and which they can
       currently only gain by advertising PI prefixes in the DFZ -
       which is the cause of the scaling problems.  For a fuller
       description, see the first point 4 in:

         Scalable routing problem & architectural enhancements
         http://www.ietf.org/mail-archive/web/rrg/current/msg06099.html

SPI    Scalable Provider Independent: "Edge" space handled by a CES
       (Core-Edge Separation) architecture's ITRs, ETRs etc., and so
       which is Provider Independent (portable) and suitable for EUNs
       to use in a scalable fashion for PMHTE.  (LISP: EID space.)

MAB    Mapped Address Block: A prefix advertised in the DFZ which
       covers a typically large amount of SPI space, typically used
       for many (tens to hundreds of thousands of) individual SPI
       micronets (Ivip) or EID prefixes (LISP).  Dino recently used
       the term "coarse prefix" to refer to the same thing in LISP.
       MABs are advertised by DITRs (Default ITRs in the DFZ), AKA
       "Proxy Tunnel Routers" in LISP, to collect all packets sent to
       SPI addresses from hosts in networks without ITRs, and then to
       tunnel them to the correct ETR.

MABOC  MAB Operating Company: A company which "owns", or runs for
       someone else, one or more MABs, which it leases out in
       typically small chunks to typically large numbers of EUNs.
       (No LISP equivalent - LISP has very little in the way of
       potential business arrangements, and I think the designers
       expect EUNs to get space from RIRs and somehow advertise their
       space in the DFZ from PTRs as a part of a presumably larger
       "coarse" prefix.)

       It is also possible for an EUN with PI space to convert some or
       all of it to a MAB, and so become a MABOC.  It need not lease
       out the space to anyone else, but may use all its space as
       micronets for its internal divisions.
So a corporation or university with PI space today might be able to
make do with less space, due to the finer slicing and dicing possible
with micronets (down to a single IPv4 address, or any integer number
of IPv4 addresses).  Then the EUN might be able to return half of its
PI prefix to the RIR and convert the rest to a MAB.  As a MABOC, it
will need to run DITRs for its MAB, or pay someone else to run them.

DITR   Default ITR in the DFZ: (LISP: Proxy Tunnel Router.)  Ordinary
       ITRs in ISP and EUN networks always cover all MABs.  DITRs
       could in principle cover all MABs, but in general a DITR will
       only cover the subset of MABs which the company which operates
       it is paid to handle.

       For instance, if a DITR is run by a MABOC directly, it will
       only advertise in the DFZ the MABs of that MABOC.  However,
       perhaps the MABOC has an arrangement with other MABOCs to
       handle their MABs as well - or perhaps the DITR is run by a
       company which supports the MABs of multiple MABOCs.  This could
       involve a DITR covering all MABs, but I expect that most DITRs
       and their DITR-Sites will cover a subset of all MABs.

DITR-Site
       Wherever a DITR is located.  This site may be in a data centre,
       peering point, Internet exchange or whatever.  The DITR is
       "full-database" for the MABs it covers - which means that the
       DITR's ITR function is actually a caching ITR like all others,
       but that it is closely coupled to a query server which has the
       full database of mapping for all the covered MABs.

       How MABOCs (or "DSOCs", see below) get the mapping data to
       these DITR-supporting full-database query servers is up to them
       - it is an internal affair.  Maybe they will use Replicators -
       or some other system.  They could get the data to these sites
       over the open Internet, with appropriate encryption etc. - or
       perhaps they will use private network links to each site, which
       would ensure the delivery of mapping updates could not be
       disrupted by DDoS flooding attacks from the Internet.

DSOC   DITR-Site Operating Company.
       A company which runs at least one - probably dozens or perhaps
       hundreds - of DITR-Sites.  This may be a MABOC, with the DITRs
       at each site covering only the one MABOC's MABs.  It may be a
       MABOC selling its DITR services to other MABOCs.  It may be a
       company which runs a chain of these sites, which is not a
       MABOC, but which sells its DITR services to MABOCs.  It could
       also be an ISP which runs a DITR-Site for one or more MABOCs.

DITR-Site-QSD
       From the above real-time feed of mapping for all the MABs
       covered by a DITR-Site, the DITR-Site-QSD is a full-database
       query server, which behaves like the QSDs already described for
       Ivip, except:

       1 - They are only "full database" for the MABs this DITR-Site
           supports.  They do not have any mapping for other MABs - so
           should not be queried for mapping of addresses outside
           these MABs.

       2 - They may be the same QSD as is used by the DITRs at this
           DITR-Site, or they may be separate servers, separate
           processes or whatever - but they still operate from the
           reliably supplied feed of mapping changes for all the MABs
           covered by this DITR-Site.

       3 - They accept mapping queries, as previously described for
           QSDs, from any querier whatsoever.  (Maybe there need to
           be ACLs, but for now I assume not.)  Previous versions of
           Ivip assumed the full-database QSDs were in ISP networks
           or large EUNs, so the query and response traffic stayed
           within these networks.  DITR-Site-QSDs accept queries from
           MRs in many ISPs and EUNs all over the world.  In
           practice, the queries will typically come from MRs of
           nearby ISPs and EUNs.  However, the DITR-Site-QSDs may
           also receive queries from MRs anywhere in the Net.

       4 - The addresses of these DITR-Site-QSDs can be found by a
           new DNS-based mechanism described below.  This is how MRs
           find them.  MRs find all the DITR-Site-QSDs which serve a
           particular MAB, and choose one or more "closest"
           (typically) ones to send their queries to.
       5 - Each DITR-Site-QSD will probably run an HTTPS server which
           provides a list of the MABs it is authoritative for.  This
           is so MRs can verify what they learnt from the DNS-based
           mechanism - which specifies this DITR-Site-QSD as being an
           authoritative server for typically many MABs.

MR     Map Resolver.  (Here I am borrowing a LISP term for a similar
       purpose in the Plan-D Ivip system.)  An MR is like a caching
       query server (QSC), but it dynamically configures itself to
       send queries not to a single upstream QSD, but to multiple
       DITR-Site-QSDs.  It chooses these to be the "closest" ones
       from the list it gets from the new DNS-based mechanism.

       MRs are inside ISP and large EUN networks and take the place
       of the Plan-A/B/C QSDs.  They typically get mapping replies
       quickly and reliably, since they are typically using nearby
       "full-database" DITR-Site-QSDs.

       Within the ISP or EUN, ITRs either query their one or more MRs
       directly, just as in Plan-A/B/C they queried one or more QSDs
       - or they query them indirectly through one or more caching
       QSCs, as previously described in Plan-A/B/C.

       Plan-C, AKA (below) Stage 3, only: An MR can optionally be
       sent full mapping update streams for one or more MABs from
       nearby DITR-Sites - either directly or via Replicators.  If it
       uses Replicators - rather than some system by which it can
       communicate two-way with the source of mapping to make sure it
       doesn't miss any updates - then the MR may also need to access
       some "Missing Payload" servers to get mapping it somehow
       missed from the Replicator system it relies upon.  Any such
       Replicator system will be a small, partly or fully meshed,
       system run by a handful of ISPs to more efficiently and
       robustly fan out a small number of mapping update feeds from
       nearby DITR-Sites to a larger number of their own MRs.

The description below involves various stages of deployment - since
the whole DRTM system is the set of possible arrangements encompassed
by these stages.  Most of these stages are part of Plan-C.
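The MR's choice of "closest" DITR-Site-QSDs, described in the
definitions above, might reduce to something like the following
sketch.  The candidate list would come from the DNS-based mechanism,
and `rtt_ms` stands in for whatever RTT probing the MR uses - both are
hypothetical details here, not part of any specified protocol.

```python
def choose_qsds(candidates, rtt_ms, n=2):
    """Pick the n lowest-RTT DITR-Site-QSDs for one MAB.

    candidates: QSD addresses learnt from the DNS-based mechanism.
    rtt_ms:     mapping from address to measured round-trip time in
                milliseconds (how the MR measures this is up to the
                implementation - a hypothetical probe here).
    """
    return sorted(candidates, key=lambda addr: rtt_ms[addr])[:n]
```

Choosing two or more servers per MAB gives the MR a fallback if the
closest DITR-Site-QSD stops answering.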
Plan-D is similar to Plan-C, but has a new intermediate stage where
ISPs and EUNs don't have QSDs, but run MRs which query multiple
(typically nearby) DITR-Site-QSDs instead.  I think this Plan-D
arrangement will be sufficient for all ISPs, but the Plan-C
arrangement (with full-database QSDs in ISPs getting real-time feeds
of mapping from nearby DITR-Sites) remains an option.

Using caching-only MRs (Plan-D) means quite a few query packets and
mapping replies going back and forth, but it saves on a number of
things which are tricky with running a full-database query server
(Plan-C) at the ISP or EUN:

1 - There is no constant incoming stream of mapping updates.
    Instead, the traffic requirements for the MR depend on the number
    of ITRs, their traffic patterns etc.  So a small ISP or EUN can
    quite happily run a caching-only MR without incurring continual
    incoming traffic costs.

2 - The MR has storage only for caching the map replies it gets from
    the upstream DITR-Site-QSDs, and for caching the nonces and
    recently sent mapping details for the queries it receives from
    downstream devices (ITRs or caching QSCs).  So the MR does not
    have the potentially large and critical storage requirements of a
    full-database QSD - now known as an MR which gets feeds for some
    or all MABs, and so is full-database for these MABs.

3 - There is no need for "Missing Payload" servers to fill in
    occasional gaps in the mapping update feeds from Replicators.

Why would we want an MR to be configured to be full-database for one
or more MABs?  In theory, the full mapping feed always contains at
least as much information as the map replies a caching MR would
receive.  I am not convinced there would be a need to do this, but
here are some possible reasons:

1 - Perhaps the "full-database" choice is made because the nearby
    DITR-Sites for this MAB prefer to send out mapping feeds rather
    than respond to queries.
    But I would expect the DSOCs to be keen to do whatever the ISPs
    or ITR-running EUNs preferred: accept queries or send a mapping
    feed to the ISP or EUN.  The more ISPs and EUNs run ITRs for the
    DSOC's MABs, the less work the DSOC's DITRs need to do.

2 - Perhaps the ISP or EUN wants to have super-fast mapping replies
    for its ITRs - and for the ITRs in all EUNs which are using the
    ISP's one or more MRs.

    Despite the greater trouble of taking real-time mapping feeds,
    occasionally relying on "Missing Payload" servers and having
    their MRs store full mapping databases for one, many or perhaps
    all MABs, perhaps some ISPs or EUNs will prefer to do this, just
    for local performance reasons.

    All this would do is shave a few milliseconds off the response
    time, assuming the MABs were covered by DITR-Sites which are not
    too far away.  But maybe the ISP or EUN is a long way from the
    nearest DITR-Site.  It would be a pretty slack MABOC which didn't
    run DITRs in major countries, but perhaps that is the case.
    Also, the whole ISP or EUN may be remote physically and
    temporally from all DITR-Sites: it is on the Moon . . . or is in
    an ocean liner, trans-ocean passenger jet or the Antarctic, and
    so relies on geostationary satellite links.


Stage 1 - DITRs only
--------------------

For non-mobile services, one or more MABOCs set up shop, and run
multiple DITRs at DITR-Sites around the world.

They can still make their MABs work with a single DITR, or with DITRs
only in a given region.  If, for instance, the MABOC's SPI-using EUNs
for some reason always use their SPI space via ISPs in Europe, then
for purely DITR purposes, it would be fine for the MABOC to run a
handful of DITRs just in European sites.
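What a Stage-1 DITR does can be sketched roughly as follows, under
simplifying assumptions: micronets are shown here as prefixes (Ivip
micronets can be any integer number of addresses, not only
power-of-two blocks), the encapsulation itself is omitted, and all
names are invented for illustration.  The point is that a DITR only
acts on destinations inside the MABs it covers, and tunnels matching
packets to the mapped ETR.

```python
import ipaddress

class DITR:
    """Sketch: a DITR advertises only the MABs it is paid to cover,
    and tunnels packets for those MABs' micronets to the mapped ETR."""

    def __init__(self, covered_mabs, micronet_map):
        self.covered = [ipaddress.ip_network(m) for m in covered_mabs]
        # micronet_map: micronet prefix -> ETR address; full database
        # for the covered MABs only.
        self.micronets = {ipaddress.ip_network(m): etr
                          for m, etr in micronet_map.items()}

    def handle(self, dst):
        """Return the ETR to tunnel to, or None to forward normally."""
        addr = ipaddress.ip_address(dst)
        if not any(addr in mab for mab in self.covered):
            return None  # not one of our MABs - don't tunnel
        # Longest-prefix match among the covered MABs' micronets.
        best = max((n for n in self.micronets if addr in n),
                   key=lambda n: n.prefixlen, default=None)
        return self.micronets[best] if best else None
```

Ordinary ITRs in ISP and EUN networks would do the same lookup, but
covering all MABs and obtaining mapping on demand via an MR.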
If a Sending Host (SH) was in Adelaide (Australia) and was sending
packets to a host in a micronet which is mapped to an ETR accessible
via an ISP in Dusseldorf, then it will be fine if these packets
traverse most of the DFZ in their raw SPI-addressed state, to be
tunneled to the ETR by a DITR in Amsterdam, London or probably Zurich.

However . . . with the new Plan-D arrangements, if the MABOC wanted
to encourage ISPs and EUNs all over the world to run their own ITRs,
then it really needs to do better than just have DITR-Sites in
Europe.

Also, it's a second-rate service to only have DITRs in a given region
- since at least some of its European customer companies might want
to use their space in branch offices in Asia, North and South America
etc.

If the sending host in Adelaide was in an EUN or ISP with ITRs, then
it would be best if the caching MR those ITRs depend on could send
queries to a DITR-Site-QSD a lot closer than Europe.

So this Plan-D arrangement of Ivip doesn't absolutely ensure that the
authoritative query server (at the closest DITR-Site which serves
this MAB) is "nearby".  If the MABOC is providing a good service, it
will ensure it has DITRs widely scattered around the Net, and the
nearest DITR-Site-QSD will be "nearby" - or "near enough to generally
provide a fast response with little chance of the query or response
packets being lost".

If an MR in Dallas-Fort Worth finds that the closest DITR-Site is in
London, then it's not disastrous.  There's a 104ms RTT to London via
Houston (I think "IAH"), LA and Washington DC, and as long as the
query or response packet isn't dropped, this shouldn't cause much
complaint about slow starts to communications.  But it is not ideal,
and it would be much better if the nearest DITR-Site-QSD was in San
Jose, which should be an RTT of 40ms or probably much less (though I
just got a traceroute from DFW to San Jose via Amsterdam and London
with an RTT of 172ms!).
But for Stage 1 - DITRs only - as long as the DITR-Sites are
reasonably close to wherever the SPI space is being used, and as long
as they can handle the traffic loads, then the only other things
which are needed are:

1 - The DITRs need to have a tunneling and PMTUD protocol which is
    compatible with the ETR functionality of whatever the SPI-using
    EUNs (SPI-leasing customers of this MABOC) are using on their PA
    addresses.

    For now, I am assuming that the ETR functionality can be provided
    by the MABOC for free - such as being downloaded from somewhere
    and run on a COTS server of the SPI-using EUN - or, ideally, be
    implemented on a router which the SPI-using EUN already owns.

2 - The ISPs these EUNs connect with must allow them to emit packets
    using these SPI addresses as source addresses.

In this simple arrangement, multiple EUNs - EUN-0000 to EUN-0999 -
are customers of MABOC-X and are using micronets in one or more of
MABOC-X's MABs.  Maybe another set of EUNs - EUN-1xxx - are leasing
space from another MABOC-Y.

Assuming any EUN-0xxx is only using SPI space from MABOC-X, then
their ETR functions only have to be compatible with the DITRs run by
MABOC-X.  So far, there's no absolute need for standardization.
Ideally there would be RFC standards for ITRs and ETRs, and all the
DITRs in the world would support this one standard.  Then EUNs could
lease SPI space from multiple MABOC companies and know that the one
ETR function could handle packets tunneled by the ITRs run by their
two or more different MABOCs.

The real need for standardization of ITR and ETR functions comes when
parties other than MABOCs are running ITRs, and/or when parties other
than MABOC customers are running ETRs.  In the latter case, perhaps
an ISP runs an ETR which connects to the networks of multiple
SPI-using EUNs.  They don't want to be mucking around with different
ETRs for customers using different MABOCs, and thereby relying on
different sets of technically different DITRs.
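The requirement above, that the ISP permit packets with SPI source
addresses from its SPI-using customers, amounts to a small relaxation
of the usual source-address filtering on the customer-facing
interface.  A hedged sketch, with invented names - the real filter
would of course live in router configuration, not application code:

```python
import ipaddress

def egress_permitted(src, pa_prefixes, customer_spi_prefixes):
    """Sketch of the relaxed source-address filter an ISP needs for
    Stage 1: permit a packet whose source is either in the customer's
    PA space or in the SPI space that customer is known to lease."""
    addr = ipaddress.ip_address(src)
    allowed = [ipaddress.ip_network(p)
               for p in list(pa_prefixes) + list(customer_spi_prefixes)]
    return any(addr in net for net in allowed)
```

This is the only cooperation the architecture asks of an ISP whose
customers use SPI space before the ISP deploys any ITRs of its own.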
Stage 2 - Add ITRs in ISPs and EUNs, with purely caching MRs
------------------------------------------------------------

The MABOCs charge their SPI-leasing customers for the use these customers make of their DITRs - or rather, for the use of these DITRs by packets sent to the SPI addresses of each of their SPI-leasing customers. The MABOCs will also charge their customers for each mapping change - probably a few cents or similar.

(There is an unresolved question about what happens if an EUN very frequently changes the mapping of its micronets, and this results in very frequent mapping updates being sent to MRs and ITRs in ISPs whose ITRs are tunneling packets to this micronet. The ISP may be unhappy about this high level of updates giving their MR and ITRs a workout. Should the MABOC, which is charging money for these updates, use some of that revenue to keep ISPs happy to receive and act on these frequent mapping updates?)

The MABOCs would be happier if some, most or ideally all of the tunneling of the traffic addressed to their SPI-leasing customer EUNs was done by someone else's ITRs: the ITRs of the EUNs or ISPs of the sending hosts (SHs). To this end, *perhaps* the MABOCs would want to pay ISPs and large EUNs to run ITRs covering their MABs.

As the use of SPI space becomes more widespread, the ISPs themselves would want to have their own ITRs. As previously noted, if an ISP has one or more customers with SPI space (either with their own ETRs, or using an ISP-supplied ETR) and there are other customers of this ISP sending packets to these SPI addresses, then the ISP would prefer to have its own ITR to tunnel these directly, rather than let the packets go out the upstream link, to a DITR, and return in encapsulated form via that link. If the ISP had its own ITR, at least to cover the MABs of these SPI-using customers, it could reduce the traffic on its expensive upstream links - and provide faster packet delivery times.
For this discussion, I will assume that an ISP installing an ITR will make that ITR advertise, in its internal routing system, all the MABs of the complete Ivip system. This is not necessarily the case, but it makes the discussion less complex to assume this. So the ISP wishes to run one or more ITRs which cover all MABs.

Also, individual EUNs using this ISP may wish to run their own ITRs so their outgoing packets addressed to SPI addresses will definitely take the shortest path to the ETR, rather than going by some potentially longer path to the "nearest" DITR. (After seeing a Dallas Fort-Worth to San Jose traceroute go via Amsterdam and London, I am keen to put the word _nearest_ in inverted commas if I mean "nearest in terms of the current state of the DFZ"!)

This is for EUNs on conventional PI space, PA space or SPI space. The question of an EUN having its own ITRs, or wanting its ISP to have ITRs, is independent of whether the EUN is using conventional PI or PA space, or is using SPI space via an ETR.

In Ivip, ITRs can be on SPI addresses. They can also be implemented in sending hosts on any global unicast address (PI, PA or SPI). At present, I don't have arrangements for ITRs to be behind NAT, but it could be done with a different protocol between the ITR and the upstream caching query server (QSC) or MR it queries.

In all cases, these ITRs need a "local" MR to send their queries to. I don't plan for ITRs to directly query the DITR-Site-QSDs. It would probably be technically possible, since the ITR -> QSC/MR query protocol would be no different from the QSC -> QSC, QSC -> MR or MR -> DITR-Site-QSD protocol.

So in all the above circumstances, to accommodate one to hundreds or thousands of ITRs in an ISP's network (including in the EUN customers of this ISP), the ISP should install two or more MRs. (I will refer to ISPs doing this, but the same principles apply to any EUN which wants to run a MR itself.)
These ITRs need to be configured with - or ideally automatically discover - the two or more MRs of this ISP. I haven't worked on how to do this, but I am sure it will be possible. Maybe the ITRs query the MRs directly. Maybe they query a QSC first, which handles a bunch of ITRs and which queries either the MR directly, or the MR via one or more other QSCs.

One way or another, each ITR needs at least two upstream QSCs or MRs to query. It would typically send a query to one, and if nothing came back within some time like 100ms, it would send a query for the same address (with a different nonce) to the other.

Then the task is to have these two MRs know at least two (ideally nearby) DITR-Site-QSDs to query for each of the MABs in the entire Ivip system. This can be done by a new DNS-based system I describe below. There could be other, better, ways - but this will do for now.

This Stage 2 is the main difference between Plan-D and Plan-C. In Plan-C, the ITRs always queried, directly or indirectly, one or ideally two or more "full database for all MABs" QSDs in the ISP. With Plan-D, they query a caching-only MR, which (if it has no mapping already cached) queries one of multiple "nearby" DITR-Site-QSDs, depending on which MAB the queried SPI address matches.

This is a highly scalable arrangement. The MABOCs directly or indirectly push their own mapping, for their own MABs, in real-time, highly reliably, to all their DITR-Sites. They need to do this to have full-database query servers in the same rack (or even the same server) as their DITRs. Theoretically, DITRs could rely on a distant full database query server, but this would be pretty sloppy - and the DITR is surely going to be getting a lot of packets, so it makes sense for the MABOC to push its full mapping for each MAB to a query server at each DITR-Site which is full-database for all the MABs covered by that DITR-Site.
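The two-upstream query logic just described can be sketched as follows. This is only a minimal illustration, in which each "upstream" is a hypothetical callable standing in for the real (not yet specified) query protocol to a QSC or MR:

```python
import os

QUERY_TIMEOUT = 0.1   # 100ms, as suggested above

def query_mapping(spi_addr, upstreams):
    """Send a map query to the first upstream QSC/MR; if nothing
    comes back within the timeout, retry the same address against
    the next upstream, using a fresh nonce for each attempt."""
    for upstream in upstreams:
        nonce = os.urandom(8)   # different nonce per attempt
        reply = upstream(spi_addr, nonce, timeout=QUERY_TIMEOUT)
        if reply is not None:   # map reply arrived in time
            return reply
    return None                 # all upstreams timed out
```

A real ITR would of course be sending these queries over the network and waiting on a timer; the point of the sketch is only the fall-back order and the per-attempt nonce.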
So it is not much extra work to use this fresh, reliable, feed of real-time mapping to drive a publicly accessible DITR-Site-QSD. For the MABOC or whoever runs the DITR-Site, it will be a lot less effort to answer queries and so allow other ITRs to tunnel a bunch of packets, than for this DITR-Site's DITRs to tunnel the same packets.

The ITRs, QSCs and MRs in the ISP and its EUNs do not store the entire mapping database, or the full mapping database of any of the MABs. The MRs can boot up very quickly, as described below - they only need to discover the current set of MABs and the DITR-Site-QSDs they will query for each MAB. (If the MR was full-database for one or more MABs, it would need to download snapshots - which could be quite bulky if there were billions of micronets, as there will be with widely deployed TTR Mobility.)

The DITR-Site can scale well by spreading the load of traffic for multiple MABs (including potentially every MAB in the whole Ivip system, if for some reason one DITR-Site was working for all the MABOCs) over multiple separate servers, each of which advertises to the DFZ a subset of these MABs. A single MAB could, in principle, be split between two or more DITRs if necessary, each advertising half, or a quarter, of it.

The DITRs would presumably be either acting as DFZ routers, advertising MABs and emitting packets back over the same link, or perhaps another link, to be forwarded by other DFZ routers - or they could be behind a single DFZ router. In the latter case, if four DITRs advertised a quarter of a MAB each, then their common router should aggregate these into the single shorter prefix of the original MAB for its DFZ neighbours.

The overall load of traffic can be shared by creating more DITR-Sites in the areas where they are most needed.
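The split-and-reaggregate arrangement can be illustrated with Python's ipaddress module. The MAB prefix here is purely hypothetical:

```python
import ipaddress

# Hypothetical MAB, split into quarters so that four DITRs behind
# one common DFZ router can each handle a quarter of the traffic.
mab = ipaddress.ip_network("203.0.112.0/22")
quarters = list(mab.subnets(prefixlen_diff=2))    # four /24s

# The common router should aggregate these back into the single
# shorter prefix of the original MAB for its DFZ neighbours:
aggregated = list(ipaddress.collapse_addresses(quarters))
assert aggregated == [mab]    # the /22 is what the DFZ sees
```

This also shows why the split is only safe behind a common router: if the four /24s leaked into the DFZ unaggregated, they would be four prefixes in the control plane instead of one.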
Larger DITR-Sites - such as those which are not operated by a single MABOC, but which serve the MABs of multiple MABOCs, including perhaps every MAB in the Ivip system - would also offer scaling benefits by sharing the various peaks in traffic for particular MABs in the system.

It is possible that the ITRs of a given ISP and its dependent EUNs might not advertise, into their local routing systems, all the MABs - but in this discussion, I assume that they do. Then, the MRs they use need to know how to send queries to (ideally) nearby DITR-Site-QSDs for every MAB in the system. More on this below.

Stage 3 (optional) - ISPs/EUNs have non-caching MRs
---------------------------------------------------

This stage is what I wrote about in Plan-C a few weeks ago. It is not required in Plan-D, because Stage 2 above - with ISPs having caching-only MRs - looks like it would scale well enough for the entire Ivip system to work well, even if there was no optional Stage 3.

Even with Stage 2 looking so powerful and scalable, with no need for MRs to get a feed of mapping updates for any of the MABs, there may still be some situations where it is desirable to have the MR get a full feed for some MABs. Maybe there would be a scenario where it would make sense for all MABs - I can't rule it out, but I can't think of one either, other than as noted above with networks being on passenger jets, ocean liners, Antarctica or the Moon.

If a MR receives full real-time mapping updates for all MABs, then the MRs which do this are indeed "full database QSDs with a fully real-time synchronised copy of the entire Ivip mapping database" - the QSDs which I assumed were needed in all previous plans I have made for Ivip. I am not saying it is impossible, or undesirable, to have MRs with some or all MABs handled from their own real-time updated database.
What is different about this "Plan-D" DRTM is that this is optional - it is entirely optional for a MR to receive mapping feeds and to be full database for any MAB. The Stage 2 arrangements just described - caching-only MRs - look like they will scale very well in every way I can think of.

In Stage 3, each MR could use three methods to get its mapping. Of the three methods mentioned below, it could use X alone, X and Y, Y alone, or Z alone. (I am using X, Y and Z to avoid confusion with Plans A to D or Stages 1 to 3. Sorry this is complex - but I am trying to anticipate a variety of usage situations.) X and Y are as described in draft-whittle-ivip-fpr-00 (Plan-B, 2010-01-18) and msg05975 (Plan-C, 2010-02-06). Z is newly described here.

X - The MR (there described as a QSD) could get direct feeds of "Replicator format" mapping update packets directly from servers which generate these at various (ideally) nearby DITR-Sites. It will need to get two feeds for each MAB, so this will involve a bunch of feeds for a set of MABs from one DITR-Site, and a similar bunch of feeds for the same MABs from another DITR-Site. This is assuming that the local DITR-Sites are in similar sets, each set serving the MABs of one or more MABOCs. Then there will need to be pairs of feeds from other DITR-Sites, a pair for each other set of MABs.

Y - As per Plan-B and Plan-C, the MR could get feeds via a system of Replicators which form a fully or partially meshed flooding system, accepting feeds from multiple DITR-Sites, and fanning out the sum of the mapping updates they contain. This should be the full set of updates for all MABs, unless all packets with a particular payload somehow don't make it to any of the Replicators.

In either X or Y, the MRs will occasionally need to query one or more "Lost Payload" servers to get mapping update payloads they somehow missed.
With larger sets of missing payloads, the MR would need to re-sync its database for one or more MABs, which involves downloading a snapshot file and bringing it up to date, as described in draft-whittle-ivip-fpr-00.

Z - The MR sets up some kind of secure, two-way, link to one - or probably better two - (ideally) nearby DITR-Site-QSDs, for a given MAB or set of MABs. Assuming that the entire set of MABs is covered by five DITR-Sites, this means the MR will set up ten such sessions - using the DITR-Site-QSDs at two sites, for redundant supply of mapping changes for a given subset of the MABs. Each such link should enable the DITR-Site-QSD to quickly and reliably push all mapping changes to the MR, and for the MR to be sent any changes again, if it somehow didn't get some of them. I guess TLS-protected SCTP might be a good protocol for this - RFC 3436. TCP would be OK, but it would be blocked for a moment by the loss of a single packet. Theoretically, the MR needs only a single two-way link from each DITR-Site-QSD which handles a set of MABs, but it should have two such links, with two such DITR-Site-QSDs - for Justin (Just In Case).

Also, it would be possible for a MR to use the Z arrangement to pass on the mapping information to other MRs.

In this way, an MR could get real-time mapping feeds for one or more MABs and so be "full database" for these, while still sending queries to DITR-Site-QSDs regarding other MABs. This enables the operators to have some flexibility.

Say the MR was in New Zealand, and there were some MABs G, H and I which for some reason it wanted to be full-database for. Maybe these MABs are run by MABOCs who lease the space to SPI-using EUNs to which this ISP wants to respond very quickly for each new communication session. There may be some other MABs J, K and L for which the ISP doesn't mind so much if the session establishment takes a few tens of milliseconds longer, by relying on some reasonably "nearby" DITR-Site-QSDs.
Maybe some MABs are used primarily or solely for SPI-using EUNs who are almost always in a distant country, and for which this ISP's customers hardly ever send packets. There's no problem having these MABs handled as usual, on a query basis, like J, K and L - but the ISP can still, for whatever reason, choose to have the MR running full mapping databases for selected MABs G, H and I.

Stage 2 needs a DNS-based system so MRs can find DITR-Site-QSDs
---------------------------------------------------------------

In Stage 1, there are no MRs, because no ISPs or EUNs run ITRs. The only ITRs are the DITRs which are run directly or indirectly by MABOCs.

In Stage 3 - the Plan-C approach which is entirely optional for the new Plan-D DRTM arrangement - the real-time mapping feed arrangements need to be manually configured, since they involve TLS SCTP sessions with multiple DITR-Site-QSDs. However, if Stage 3 involves only some MABs being handled with mapping feeds, then the MR will still need to query some DITR-Site-QSDs regarding the other MABs.

I think the way the MR chooses these DITR-Site-QSDs needs to be automated. This will be done by the same method described here for Stage 2 - where the MR is purely caching, and therefore needs to know:

1 - What all the MABs are - and this will change over time as more space is converted into "edge" SPI space to be managed by the Ivip system.

2 - For each MAB, two or so of the closest, currently reachable DITR-Site-QSDs which will reliably respond to map requests.

I think that in normal operation, a MR should send a single query to one of the close DITR-Site-QSDs - because normally (99% of the time or more), it will get a response back from this DITR-Site-QSD in 50ms to 70ms or so. I assume that such times are not a significant problem for delaying some initial packets at the ITR while mapping is retrieved. If no response comes back in 100ms or so, the MR should send another query, perhaps to the other close DITR-Site-QSD for this MAB.
The purpose of this DNS-based lookup system is to enable each MR to automatically and securely determine all the parts of the global unicast address range which are covered by MABs, and then to find, for each MAB, the complete list of DITR-Site-QSDs which will answer mapping queries for this MAB.

There may be a better way of doing this than the DNS-based system I suggest here. For brevity, I outline its capabilities, and assume that there is a reasonably efficient way of implementing something like this. If you think it is impractical, or have a better suggestion, please let me know. This part of the system can surely be done one way or another.

Here's the rough plan, for IPv4. There is a special DNS domain such as drtm4.arpa. It has three levels of subdomains, representing 224 * 256 * 256 ~= 14.7 million /24s - probably not counting 127.0.0.0/8 and 10.0.0.0/8.

The authoritative servers for this system could initially be one suitably programmed pair of servers, but later, as the system became widely used, this could be split up so there are more than two sets of authoritative servers for each domain and sub-domain. Generally, there won't need to be huge numbers of servers, and I assume a suitably programmed server could implement the whole thing, or part of it, without needing 14.7 million separate zone files.

For typographic niceness, I will use 3 digit integers for numbers in these domain names. The domains are:

   000.000.000.drtm4.arpa.  }  2^16 /24s of 0.0.0.0/8.
   001.000.000.drtm4.arpa.  }
   002.000.000.drtm4.arpa.  }
    ...                     }
   255.255.000.drtm4.arpa.  }

   000.000.001.drtm4.arpa.  }  2^16 /24s of 1.0.0.0/8.
   001.000.001.drtm4.arpa.  }
    ...                     }
   255.255.001.drtm4.arpa.  }

   etc. etc.

   000.000.203.drtm4.arpa.  }  2^16 /24s of 203.0.0.0/8.
   001.000.203.drtm4.arpa.  }
    ...                     }
   255.255.203.drtm4.arpa.  }

   etc.

   000.000.223.drtm4.arpa.  }  2^16 /24s of 223.0.0.0/8.
    ...                     }
   255.255.223.drtm4.arpa.  }

A query for any of these DNS domains will return some information.
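From the listing above, the labels appear to be the /24's octets in reverse order (as with in-addr.arpa), zero-padded to three digits. A one-line helper, assuming that naming scheme:

```python
def drtm4_name(a, b, c):
    """DNS name for the /24 a.b.c.0/24 under the (proposed)
    drtm4.arpa domain, with reversed 3-digit octet labels."""
    return "%03d.%03d.%03d.drtm4.arpa." % (c, b, a)

# e.g. the /24s of 203.0.0.0/8 run from drtm4_name(203, 0, 0),
# which is "000.000.203.drtm4.arpa.", up to drtm4_name(203, 255, 255).
```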
Exactly how to do this with DNS, I have not yet considered. Each of these /24s is either in a MAB - is "edge" SPI space under Ivip - or is not, and is therefore regarded as "core" space (not counting 127.0.0.0/8 and RFC 1918 addresses).

If the /24 is in a MAB, the reply conveys to the MR:

1 - Something to signify this /24 is in a MAB.

2 - The details of the MAB:

    a - Its base, such as "203.34.0.0".

    b - The length of the prefix, between 8 and 24 - such as "16".

3 - Information which allows the MR to find either all the DITR-Site-QSDs for this MAB, or at least a reasonable number of them in the MR's vicinity. This could be via one of three or so methods:

    a - A list of IP addresses.

    b - A list of FQDNs, each of which the MR can look up and each of which may return multiple IP addresses.

    c - Maybe some special arrangement of b, by which the MR can find DITR-Site-QSDs in one or more areas the MR chooses to consider.

4 - A caching time.

5 - Maybe something to identify the MABOC which runs this MAB.

If the /24 is not in a MAB (it is in a MAB-free zone), then the reply contains:

1 - Something to signify this /24 is not in a MAB.

2 - The lowest /24 in this MAB-free zone, such as "214.0.0".

3 - The highest /24 in this MAB-free zone, such as "216.157.128".

4 - A caching time.

MABs and non-MAB zones can be scattered through the space in any way, with the following rules. Every /24 is either in a MAB or a non-MAB zone. Multiple MABs or multiple non-MAB zones can appear in any sequence.

MABs are on binary boundary prefixes, since they must be easily advertised in the DFZ. MABs never cross /8 boundaries, and are always binary aligned: 99.0.0.0/15 is OK; 99.1.0.0/15 is not allowed. MABs are not hierarchical or overlapping.

Non-MAB zones can have arbitrary lengths and can start and end on any /24 - they are not defined in terms of being prefixes.
What this part of the DNS returns needs to be very tightly controlled - because if an attacker could change some of it, they could divert traffic to their own ITRs and wreak havoc. MRs must use DNSSEC when querying this part of the DNS.

When a MR boots, it can now use this DNS system to initialise itself. It can start by looking up 000.000.000.drtm4.arpa. If this /24 is in a MAB-free zone, then the returned information will tell the MR the /24 at the end of that zone. So the MR looks up the next /24. This may be in another MAB-free zone, or may be in a MAB. This process can continue and the MR will walk through the IPv4 global unicast address space, jumping from the start to the end of each MAB or MAB-free zone. If this takes too long, then the MR can start in several places at once and run the stepping processes in parallel until the whole 0.0.0.0 to 223.255.255.0 range has been covered.

Soon after introduction, Ivip-managed SPI space might involve a handful of MABs scattered in various parts of the IPv4 global unicast address space. Later, with wide adoption, perhaps as much as half of the space may be covered by MABs. (There will always be plenty of IPv4 space devoted to single IPv4 addresses for millions of residential and SOHO DSL, fibre etc. customers - which will be PA and doesn't need to be on SPI space.)

MABs should not be too small, since we want each MAB, on average, to cover many micronets. Still, even if a MAB is a /24, and provides only 10 micronets, it is still doing a good job of scalable routing, since without Ivip, the EUNs which use these 10 micronets would have needed 10 /24 prefixes of PI space instead.

MABs probably shouldn't be too big. I don't have a figure in mind, but if a big MAB, such as a /16 or /12, involves very large amounts of traffic, then it could make DITR workloads harder to handle. In general, we want a DITR to either advertise a MAB or not advertise it.
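The boot-time walk just described can be sketched as follows, with the /24s numbered 0 to 224 * 256 * 256 - 1 and a hypothetical lookup() function standing in for the DNSSEC-validated query and its two reply formats:

```python
LAST_24 = 224 * 256 * 256    # /24s from 0.0.0.0 to 223.255.255.0

def walk_mabs(lookup):
    """Walk the IPv4 global unicast space, jumping from the start
    to the end of each MAB or MAB-free zone.  lookup(n) returns
    ("mab", base_24, prefix_len) or ("free", first_24, last_24)
    for the n-th /24, mirroring the replies described above."""
    mabs = []
    n = 0
    while n < LAST_24:
        kind, x, y = lookup(n)
        if kind == "mab":
            mabs.append((x, y))      # (base /24, prefix length)
            n = x + 2 ** (24 - y)    # first /24 past this MAB
        else:
            n = y + 1                # first /24 past the free zone
    return mabs
```

Since each step jumps over a whole MAB or MAB-free zone, the number of lookups is proportional to the number of zones, not to the 14.7 million /24s - which is why the walk (possibly parallelised from several starting points) should be quick.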
Splitting MABs to load-share the traffic between multiple DITRs creates a risk that the multiple longer prefixes will not be aggregated and will be a further burden on the DFZ control plane.

Also, for full-database QSDs (those serving DITRs and any MRs which get the full mapping feed for one or more MABs), a large MAB with more than a few tens of thousands of micronets will require a large snapshot file to be downloaded at boot time, or if the mapping database for this MAB somehow gets out of synch due to loss of updates which can't be recovered from using a "Lost Payload" server. But this paragraph only applies to the option of MRs having mapping feeds - which is not required any more with DRTM.

The MR now has a list of all the MABs. The caching time of these DNS replies will cause the MR to periodically check with the DNS to ensure it takes notice of any changes to the "edge" space - the sum total of all the MABs.

For each MAB, the MR will obtain a list of IP addresses - either of all the DITR-Site-QSDs which are authoritative for this MAB, or perhaps a subset of these, chosen by region, so the MR still has quite a few to choose from. Initially, the MR can use any of these addresses, but as it has time after boot-up, it should determine which are the two or three "closest" ones to use for each MAB.

It could do that with ping, but perhaps a slightly fancier protocol could be used: the MR could send a specially formatted map request to the DITR-Site-QSD and firstly verify that the reply is to the effect that this DITR-Site-QSD will accept queries from this MR. Secondly, it would measure the delay time, which is a good enough measure of closeness for it to choose two or more "nearby" DITR-Site-QSDs to use in the future for looking up the mapping of SPI addresses matching this MAB.

As mentioned above, the DITR-Site-QSD could also have an HTTPS file which lists the MABs it accepts queries for.
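The closeness test might look like this. It is only a sketch: probe() is a hypothetical stand-in for sending the specially formatted map request, checking that the QSD agrees to serve this MR, and timing the reply:

```python
def pick_nearest(candidates, probe, keep=2):
    """Probe each candidate DITR-Site-QSD for one MAB.  probe(qsd)
    returns the measured RTT in seconds, or None if the QSD refused
    to serve this MR or the probe timed out.  Keep the `keep`
    fastest ones for future map queries for this MAB."""
    timed = []
    for qsd in candidates:
        rtt = probe(qsd)
        if rtt is not None:
            timed.append((rtt, qsd))
    timed.sort()                      # fastest (closest) first
    return [qsd for _, qsd in timed[:keep]]
```

Keeping two or three rather than just the fastest one matches the earlier point: the MR normally queries only the closest, but needs an immediate fall-back if no reply arrives within 100ms or so.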
Another elaboration is that when the DITR-Site-QSD sends a map reply, it can contain a number indicating how busy this DITR-Site-QSD is. In this way, MRs can choose alternative DITR-Site-QSDs which are not too busy, if the ones they are using at present return a high value for how busy they are.

Perhaps DNS is not the best way to do this - but I am sure a secure, automatic self-discovery and self-configuration system can be created so a MR can learn all the MABs and determine where to send queries concerning each MAB. If the MR found one such DITR-Site-QSD was unreliable, it could easily repeat the lookup, test the available DITR-Site-QSDs and choose one or more alternatives.

An elaboration to the MR -> DITR-Site-QSD protocol is that the MR needs to periodically check that each DITR-Site-QSD it has recently (inside the caching time) received map replies from is still alive. This is to handle a situation where a DITR-Site-QSD sends a map reply, and then, say three minutes later, dies. Then, at four minutes after the reply, a mapping change is sent for the micronet in the reply. Since this DITR-Site-QSD is not working, there is no mechanism by which the MR would find out about this mapping change.

So the MR should periodically, say once a minute, check that these recently used DITR-Site-QSDs are still alive (and have not been rebooted since they sent the reply). If the MR finds a DITR-Site-QSD is dead, or has been rebooted, it needs to resend the original queries to another DITR-Site-QSD, and if the replies are any different (indicating there was a change in the mapping which the MR would have been notified about by the original DITR-Site-QSD, had it been working) then the MR should send appropriate mapping updates to its queriers, so ITRs get the new mapping ASAP.
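That once-a-minute liveness check might be sketched like this. The boot-id carried in each reply and the alive() helper are my assumptions for illustration, not part of any defined protocol - they are just one way to detect both "dead" and "rebooted since it sent the reply":

```python
def check_recent_qsds(recent, alive):
    """recent maps each recently used DITR-Site-QSD to the boot-id
    seen in its last map reply; alive(qsd) returns the QSD's current
    boot-id, or None if it is unreachable.  Returns the QSDs whose
    cached replies must be re-verified via another DITR-Site-QSD."""
    suspect = []
    for qsd, boot_id in recent.items():
        current = alive(qsd)
        if current is None or current != boot_id:
            suspect.append(qsd)    # dead, or rebooted since the reply
    return suspect
```

For each QSD this returns, the MR would resend the original queries elsewhere and, if any reply differs, push the new mapping to its queriers.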
4 - With TTR Mobility
=====================

Please refer to this document for a discussion of TTR Mobility:

   Translating Tunnel Router Mobility:
   TTR Mobility Extensions for Core-Edge Separation Solutions
   to the Internet's Routing Scaling Problem

   Robin Whittle, Steven Russert
   2008-08-25, minor revisions 2010-01-12

   http://www.firstpr.com.au/ip/ivip/TTR-Mobility.pdf

In the above discussion I assumed that the MABOC companies are leasing space in their MABs to SPI-using EUNs which are non-mobile. However, exactly the same principles apply for micronets used for TTR Mobility. Any micronet can be used for TTR Mobility.

Generally, I guess, a mobile IPv4 device will only need a micronet of a single IPv4 address. The MN's (Mobile Node's) tunneling software will present an interface to its ordinary IPv4 stack with this micronet address. The applications in the MN will use this address, no matter what actual address the MN is connected to in its one or more access networks - such as by wired or WiFi Ethernet, 3G wireless or other link technologies.

The MN's applications can use this stable, globally portable, micronet address wherever it is, no matter what sort of access network it is using, and no matter how many levels of NAT these connections are behind. The MN can still use its micronet address even when its connection is on an SPI address.

Unlike other attempts to use a CES architecture for Mobility, TTR Mobility does not involve the MN being its own ETR or ITR. The MN tunnels to one or more (typically) nearby TTRs (Translating Tunnel Routers) with an encrypted 2-way tunnel. A single TTR at any one time is the ETR to which the MN's micronet is mapped. This TTR - and potentially other TTRs the MN also has tunnels to but which are not currently playing the ETR role for this MN - also accepts the MN's outgoing packets from its micronet address. The TTR forwards these to the rest of the Net.
If the outgoing packet is addressed to an SPI address, the TTR will either perform the ITR function on the packet, or forward it to a co-located ITR which will do so.

There would be multiple TTR companies, each selling TTR Mobility services to end-users. An IP cellphone or laptop etc. using TTR Mobility is also an EUN, even if it only gets a single IPv4 address. For IPv6, Ivip's mapping system deals in units of /64, so an IPv6 TTR Mobility MN will get a /64.

Each TTR company might have its own network of TTRs - or perhaps there could be TTR-operating companies running one or more sets of actual TTRs, but hiring out their capacity to TTR companies who actually sell services to users.

Here I assume each MN only gets a single micronet, but the TTR Mobility architecture supports each MN getting one or more micronets. If an end-user already had a micronet to use with their MN, then they would give their TTR company the credentials to control the mapping of that micronet.

So a company XYZ might lease one or more UABs (User Address Blocks) from one or more MABOCs, and split these into micronets of various sizes, mapping them however it likes - and giving various other organisations the credentials they need to change the mapping of any one or more of the micronets XYZ creates in their space.

For instance, those micronets used for non-mobile network multihoming might be controlled by a Multihoming Monitoring Company which XYZ hires to probe reachability of its networks via various ETRs, and to change the mapping to another ETR in another ISP's network if the currently used one appears to be incapable of taking packets to the destination network.

XYZ could use one of its micronets for a given MN. It may then contract a TTR company TAA to provide it with access to TAA's TTRs all over the world.
When it does this, XYZ configures its arrangement with its MABOC - through which it can send mapping changes for its micronets - so that XYZ still has ultimate administrative control over this micronet, but can give TAA a username and password, or whatever is required, so TAA's system can control the mapping of this micronet. If XYZ chooses another TTR company instead, it would cancel those credentials via its MABOC, and get another set to give to the new TTR company TBB.

There may be an IETF-standardized way in which all TTR MNs tunnel to their TTRs, and in which the TTRs can instruct the MNs to tunnel to new TTRs or participate in activities by which the TTR system can determine topologically where the MN is - for instance, to determine if it is too far from the current TTR and should try to contact another. Alternatively, there may be no IETF-standardized system for this, and each TTR company would have suitable software for XYZ to download into their cell-phone, laptop or whatever to perform these functions.

Generally, I anticipate a single MN will only operate with a single TTR company. However, it would be technically possible for the one MN to be set up for TTR services from multiple TTR companies. Each such service would involve a separate micronet. The owner of the MN - XYZ in this example - could supply their own micronet for the TTR company to use, or it could use a micronet provided by the TTR company. In the second case, the TTR company is also acting as the MABOC for this user's micronet.

TTRs will generally be located in, or topologically near (within hundreds or a thousand km of), the access networks wherever MNs might connect to the Net. Since an MN could connect literally anywhere, it follows that TTRs are ideally numerous and located, topologically, all around the Net. The TTRs of a given TTR company are controlled by a fancy management system which orchestrates the MN connecting to new TTRs when needed.
When the MN has successfully tunneled to a new TTR, the TTR company's control system uses its credentials to change the mapping of the micronet from the currently mapped TTR to the new one.

Generally, people think of CES and Mobility and imagine vast numbers of mapping changes - for instance whenever the MN gets a new access network, or a new address in the same access network. Even with a stationary 3G MN, it may get completely different IP addresses in topologically separate access networks, simply due to RF and traffic changes causing the MN to connect to a different base-station which uses a different IP gateway.

But with TTR mobility, a new address just means the MN needs to make a new tunnel to its currently used TTR. No mapping change is required. Many or most MNs may go from one year to the next without a mapping change for their micronet. As long as they are within some distance, such as 1000km or so, of the currently used TTR, there's probably no need to select a closer TTR and change the mapping to that.

Even when the MN suddenly finds itself on a new address, on the other side of the world from the current TTR, it still tunnels to that TTR and continues communications. This has higher latency and more risk of packet loss than using a closer TTR, but it will work. The management software in the TTR company will need to detect the topological location of the MN, and select one or more TTRs which are probably closer to it, for the MN to tunnel to. Only when the MN has successfully tunneled to a closer TTR will the TTR company's management system change the mapping. Communication sessions using the micronet address will continue without any disruption, since at the time the mapping is changed, the MN has tunnels to both the old and new TTRs.

It is notable that the criteria for placing TTRs are much the same as for placing DITRs.
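The make-before-break rule above can be condensed into a small decision sketch. The 1000km threshold is the rough figure mentioned above, and the function and parameter names are illustrative only:

```python
def choose_mapped_ttr(current_ttr, nearest_ttr, distance_km, tunnels,
                      threshold_km=1000):
    """Decide which TTR the micronet should be mapped to.  Within
    the distance threshold, keep the current TTR (a new access
    address only means a new tunnel, not a mapping change); beyond
    it, switch only once the MN's tunnel to the closer TTR is up."""
    if distance_km <= threshold_km:
        return current_ttr            # no mapping change needed
    if nearest_ttr in tunnels:        # make-before-break: tunnel ready
        return nearest_ttr            # now the mapping can be changed
    return current_ttr                # keep the distant TTR until then
```

Because the mapping only moves once both tunnels exist, sessions using the micronet address continue without disruption across the change.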
So a TTR company can also be a MABOC and run a bunch of TTRs, DITRs and so DITR-Site-QSDs at dozens or hundreds of separate sites all over the world. It needs some fancy software to manage the TTR system, and if it does not need ISPs to run ITRs for its MABs (which is not really necessary, since a good global system of DITRs will probably be fine), then it is ready for business.

A company could be a MABOC for non-mobile networks getting portability, multihoming and inbound TE - without any cooperation with any other organisation, other than its customers using ISPs which permit the forwarding of packets whose source addresses are SPI addresses.

If the MABOC company (not a TTR company which was also a MABOC) wanted ISPs to run ITRs to cover its MABs, then it would need to operate its DITR-Site-QSDs according to industry (presumably IETF) standards. If it wanted its customers to share ETRs provided by ISPs - which must operate with ITRs of other MABOCs - then its DITRs would need to comply with the industry standard.

However, these constraints do not necessarily apply to a TTR company. If the TTR company, with its own one or more MABs, has a bunch of TTR sites around the world - or wherever it wants to operate - then it doesn't necessarily need any ISPs to run ITRs covering its MABs. The TTR sites themselves are probably perfectly good places to put DITRs serving these MABs. In this case, the TTR company doesn't need to make its ITRs work with anyone else's ETRs, since the only ETRs it uses are implemented in its own TTRs. Also, the TTR company's mobile customers don't need any special arrangement with their access networks to send out packets with SPI source addresses - because these packets go through the encrypted tunnel to the TTR.
So a company with the required resources - a bunch of servers at suitable sites around the Net, each with direct access to the DFZ routing system as routers which can advertise the MABs and forward packets to other DFZ routers - could go into business on its own: without any need to accept mapping queries from ISP MRs, without any need to send mapping feeds to ISP MRs or Replicators, and without any need for its ITRs to tunnel packets to ETRs of other companies.

It needs suitable software for the various kinds of MN it plans to support, and it needs some fancy management software in its global system to run the whole system, orchestrate the software in each MN, do billing etc. But a TTR company such as this, with its own prefixes to use as MABs for its customers, doesn't need to interact with any other organisation - and could go into business initially without waiting for any new IETF standards. Multiple such independent TTR companies could operate, with differing technical standards. Each would advertise just one or a few MABs in the DFZ, but would be providing IPv4 mobility for typically up to 256 end-user MNs per /24 of space in these MABs.

This is another example of a CES architecture being deployed without relying on ISPs to take the lead. Neither the TTR companies, the MABOCs, nor their customers might care at all about the number of prefixes in the DFZ - but they will be building infrastructure and selling services which contribute to the solution of the routing scaling problem, while also providing a new form of global mobility - one which provides generally good path lengths and works with all protocols, without the need for upgrades to either IPv4 or IPv6 correspondent hosts.

5 - DRTM for LISP
=================

I don't assume the LISP team are interested in DRTM or anything like it, but I believe it would be applicable to LISP and would be superior to ALT or any other LISP mapping system I know of.
There's nothing in DRTM which relies upon aspects of Ivip which differ from LISP, such as Ivip's use of a single ETR address as the mapping, or Ivip's approaches to tunneling.

DRTM concerns Stages 1 and 2 above, with something like the DNS-based system I describe, in order that ITRs or Map Resolvers can discover the current MABs and find two or so (typically) nearby DITR-Site-QSDs for each MAB. Stage 3 could be done as well, but this goes beyond DRTM and involves pushing full feeds of real-time mapping information beyond the MABOC, with the potential for MRs to store mapping for some or all MABs. Except for NERD, in which the ITRs store all mapping information, the LISP team has very much avoided such concentrations of mapping, so I won't explore Stage 3 for LISP.

TTR Mobility would be possible with LISP-ALT or with LISP with DRTM. I will not pursue the LISP-MN approach to mobility, since it involves MNs being their own ETRs, which can't work behind NAT, and since it requires extremely rapid changes in ITR tunneling behaviour every time the MN gets a new access network address. TTR Mobility doesn't require frequent or rapid mapping changes, but the sooner the mapping can be changed, once the MN has tunneled to a new TTR, the sooner the MN can drop the tunnel to its previous TTR. TTR Mobility also requires that the TTR company be in charge of mapping - rather than the MN itself, as is assumed by LISP-MN.

Except for LISP-MN, I think LISP tends to assume that ETRs are the authoritative source of mapping information. This assumption doesn't fit DRTM exactly, since the MABOC actually runs the servers which authoritatively reply to mapping queries. Assuming ETRs were to continue as the authoritative source of mapping, LISP would need to be adapted so the ETRs have a method of securely communicating mapping information to the MABOC which runs the MAB covering the one or more EID prefixes the ETR is handling.
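The Stage 1 and 2 discovery step can be illustrated with a small sketch: an MR holds the current MAB list, and for each MAB learns (typically) two nearby DITR-Site-QSDs to query. This is my own rough rendering - the DNS naming convention shown (`qsd.<mab-label>.example`) is entirely invented, and real discovery would be more involved.

```python
# Hypothetical sketch of Stage 1/2 discovery: map a destination address
# to its covering MAB, then find nearby DITR-Site-QSDs for that MAB
# via a DNS-style lookup.
import ipaddress

class MapResolver:
    def __init__(self, mabs, lookup):
        # mabs: current list of Mapped Address Blocks;
        # lookup: DNS-style resolver function (name -> list of addresses)
        self.mabs = [ipaddress.ip_network(m) for m in mabs]
        self.lookup = lookup
        self.qsds = {}             # MAB -> nearby QSD addresses (cached)

    def qsds_for(self, addr):
        """Find the MAB covering addr and the QSDs to query for it."""
        ip = ipaddress.ip_address(addr)
        for mab in self.mabs:
            if ip in mab:
                if mab not in self.qsds:
                    # invented naming convention, for illustration only
                    label = str(mab).replace("/", "-").replace(".", "-")
                    self.qsds[mab] = self.lookup(f"qsd.{label}.example")
                return self.qsds[mab]
        return None                # not SPI space: forward normally

fake_dns = lambda name: ["198.51.100.1", "198.51.100.2"]
mr = MapResolver(["192.0.2.0/24"], fake_dns)
mr.qsds_for("192.0.2.55")          # two nearby QSDs for this MAB
```

The important property is the None branch: addresses outside all MABs need no mapping at all, so the MR never queries anyone about them.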
The MABOC would then be responsible for:

1 - Deciding which ETRs were in fact authoritative for a given EID prefix at any time. This may be tricky, since an end-user network could choose a different ISP at any time, and so need to use an ETR on a different address from what it previously used. Maybe it uses its own ETRs, in which case the ETR may have stored credentials the MABOC's mapping system will recognise whatever IP address it communicates from. If not, then the ETR is run by the new ISP and somehow needs to be given credentials which the MABOC's mapping system will recognise. This would involve security problems in trusting ISPs' ETRs to use those credentials only as the EUN wished - including ETRs of ISPs which the EUN previously used but no longer uses.

2 - Deciding what the final mapping would be for each EID prefix its MABs cover, considering that perhaps it hasn't heard from any ETRs for a while, or perhaps the commands it receives from two or more ETRs regarding a single EID prefix are contradictory.

3 - Reliably, securely and rapidly transmitting the new mapping to all its DITR-Site-QSDs - for use by its DITRs and by MRs in ISP and other EUN networks.

Before anyone could really think of LISP using something like DRTM, they would have to clarify the arrangements by which EID-using EUNs get their EID address space. As far as I know, there isn't a clear statement of how this would work for LISP. If, for instance, the plan was for small EUNs to obtain a /24 from an RIR and somehow run this as EID space on their own, then each such EUN will either need to become a MABOC itself, or contract a MABOC to do the mapping and DITR work for this prefix. DRTM could in principle work with very large numbers of MABOCs, each with their own DITR-Sites - but this is unlikely to be economic or practical.
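Responsibilities 1 and 2 above amount to a small arbitration policy. As a sketch only: the policy below ("newest report from a credentialed ETR wins; declare the mapping stale if nothing recent has been heard") is my assumption, not anything from a LISP draft, and the interfaces are invented.

```python
# Hypothetical sketch of MABOC responsibilities 1 and 2: accept mapping
# claims only from credentialed ETRs, resolve contradictory claims by
# timestamp, and treat long-silent prefixes as having no valid mapping.
import time

class MabocMappingAuthority:
    def __init__(self, credentials, max_age=600):
        self.credentials = credentials   # prefix -> set of trusted ETR ids
        self.max_age = max_age           # seconds before a report goes stale
        self.reports = {}                # prefix -> (timestamp, etr_addr)

    def report(self, prefix, etr_id, etr_addr, ts=None):
        """Accept a mapping claim only from a credentialed ETR."""
        if etr_id not in self.credentials.get(prefix, set()):
            return False                 # responsibility 1: not authoritative
        ts = time.time() if ts is None else ts
        cur = self.reports.get(prefix)
        if cur is None or ts > cur[0]:   # responsibility 2: newest wins
            self.reports[prefix] = (ts, etr_addr)
        return True

    def final_mapping(self, prefix):
        """Mapping to push to all DITR-Site-QSDs, or None if stale/absent."""
        cur = self.reports.get(prefix)
        if cur and time.time() - cur[0] < self.max_age:
            return cur[1]
        return None
```

Responsibility 3 - pushing `final_mapping()` results to every DITR-Site-QSD in real-time - is the transport problem DRTM itself addresses.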
I think it is unlikely that more than a dozen or so fully-fledged, independent systems of DITR-Sites would be built to give good coverage of the whole Net - say with 20 or more DITR-Sites each. Economies of scale and basic capital costs would tend to favour a smaller number of such systems, operated either by one MABOC serving the MABs of other MABOCs, and/or run by companies which are not MABOCs, but which contract to multiple MABOCs and so use their system of DITR-Sites to support hundreds or thousands of individual MABs.

Ivip has always assumed real-time mapping distribution to all ITRs which need it - so the mapping of a micronet is a single ETR address, and the source of mapping changes is external to the Ivip system. LISP has always assumed that real-time distribution of mapping to all ITRs which need it is impossible. No-one has ever indicated it was undesirable - just that they had practical concerns about scaling, "real-time synchronization" of a single mapping database etc. which they considered insoluble. I believe DRTM overcomes such objections.

DRTM is not just a mapping distribution system which delivers mapping to ITRs to be cached. It is also a scalable and secure approach to getting the mapping changes to all DITR-Site-QSDs in real-time - and, using the nonces in the query packet, securely sending mapping updates to the querier. Ivip has always had this ability to securely propagate mapping changes to ITRs in real-time - but this would be the first such system for LISP.

LISP designs - and the other CES architectures: APT, TRRP, TIDR and IRON-RANGER - are predicated on the assumption that ITRs can't be told about mapping changes in real-time. Therefore, the mapping is always a set of instructions by which ITRs figure out which ETR to tunnel packets for a given EID prefix to - rather than the ITRs being told in real-time exactly which ETR to tunnel the packets to.
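The nonce mechanism mentioned above can be sketched briefly. The message formats here are invented for illustration; the substance is only this: the QSD remembers which querier used which nonce, and echoes that nonce in any later update, so the querier can distinguish genuine updates from spoofed ones.

```python
# Sketch of nonce-authenticated mapping updates: an MR includes a nonce
# in its query; the DITR-Site-QSD caches (querier, nonce) and echoes
# the nonce in later updates for the same micronet.
import secrets

class QsdServer:
    def __init__(self, mapping):
        self.mapping = mapping     # micronet -> ETR address
        self.cachers = {}          # micronet -> {querier: nonce}

    def query(self, querier, micronet, nonce):
        self.cachers.setdefault(micronet, {})[querier] = nonce
        return {"micronet": micronet, "etr": self.mapping[micronet],
                "nonce": nonce}    # echoed nonce authenticates the reply

    def change_mapping(self, micronet, new_etr):
        """Real-time push to every MR that cached this micronet."""
        self.mapping[micronet] = new_etr
        return [{"to": q, "micronet": micronet, "etr": new_etr,
                 "nonce": n}       # MR accepts only if the nonce matches
                for q, n in self.cachers.get(micronet, {}).items()]

qsd = QsdServer({"192.0.2.0/28": "203.0.113.1"})
n = secrets.token_hex(8)
qsd.query("mr-1", "192.0.2.0/28", n)
updates = qsd.change_mapping("192.0.2.0/28", "203.0.113.9")
# mr-1 trusts updates[0] because it carries the nonce mr-1 chose
```

An off-path attacker who never saw the query packet cannot produce the nonce, so forged "updates" are discarded - without any heavier cryptographic machinery at the MR.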
DRTM makes this more complex mapping information unnecessary. In most respects, some part of the system other than the ITRs themselves is bound to be a better place to determine the tunneling behaviour of ITRs. I am not convinced it is the ETRs - which is why in Ivip the mapping is supplied by unspecified mechanisms chosen and implemented by the SPI-using EUNs. Ivip EUNs are free to program their ETRs, or any other part of their network, to control the mapping. It's just that I think for multihoming service restoration, or for TTR mobility, some other organisation (a multihoming monitoring company, or the TTR company) is a better place to detect reachability and to make mapping decisions.

For any multihomed EID prefix, LISP's more complex mapping involves the addresses of the two or more ETRs and then two sets of additional information. One controls what priority to give to each ETR if all ETRs seem to be working, or if one or more fails. The other is formally separate and concerns load sharing. However, these items of information perform similar functions, in that they tell ITRs how to choose between two or more ETRs which the ITR considers can handle traffic at present.

Ivip ITRs have no such information. The mapping information is a single ETR address. This means that Ivip has two potential disadvantages compared to LISP:

1 - Ivip ITRs on their own can't load-share traffic sent to a single micronet between two or more ETRs. Ivip's method of coping with this is for the incoming traffic to be made to go to at least two IP addresses in two separate micronets. Then, each micronet can be mapped to one of the ETRs to achieve load sharing and/or to steer different types of traffic, such as VoIP vs. SMTP, over different ISPs and links.
Although Ivip is assumed to involve EUNs being charged for each mapping update - probably a few cents - this arrangement may be superior to the LISP approach, since it can be dynamically adjusted from one minute to the next, to balance out varying traffic volumes over particular links. This may provide significantly higher utilization of available bandwidth, for a given risk of congestion, than the LISP approach - making the fee per mapping change a good deal.

2 - The LISP approach of individual ITRs making their own choices about tunneling for multihoming service restoration could sometimes produce superior connectivity to Ivip's approach, where all the ITRs are tunneling to the same ETR. For instance, if there was an outage which caused ITR-1 to be able to reach ETR-A but not ETR-B, and ITR-2 to be able to reach ETR-B but not ETR-A, then in principle, if the LISP ITRs could correctly detect this and make appropriate decisions, both ITRs could successfully deliver packets - whereas with Ivip, the micronet would be mapped to either ETR-A or ETR-B, so only one of the two ITRs could successfully deliver packets.

I tend to think such states in the routing systems between ITRs and ETRs would be transitory, and that it would be a rare occasion when the LISP approach produced better results than the Ivip approach. But nonetheless, the LISP system of more complex mapping and ITRs could in principle produce superior results.

My design choice with Ivip is to keep the mapping and the ITRs simple, use the alternative arrangement for inbound TE, and assume that little of importance is lost with Ivip's "one ETR or the other" approach to multihoming service restoration. Maybe, if LISP adopted something like DRTM, the designers would make the same decision - to use a single ETR address and so enable their ITRs to be simpler. Then, LISP would need either some new probing and reachability decision-making mechanisms - or to follow Ivip's lead and leave it to the EUNs to do this and make their own decisions about mapping changes.
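The load-sharing arrangement in point 1 above can be sketched as a simple rebalancing step the EUN (or an agent acting for it) might run periodically. This is my own illustration, not part of Ivip: the greedy assignment and all names are invented, and each change of a micronet's ETR would be a paid mapping update sent via the MABOC.

```python
# Hypothetical sketch of Ivip-style inbound load sharing: traffic is
# drawn to addresses in two or more micronets, and the EUN steers load
# by remapping each micronet to one of its ETRs from minute to minute.
def rebalance(load, etrs):
    """Assign each micronet to one ETR so measured load spreads out.

    load: micronet -> recent traffic (e.g. Mbps); etrs: available ETRs.
    Greedy heuristic: busiest micronets first, each to the ETR with the
    least load assigned so far.  Returns micronet -> ETR.
    """
    per_etr = {e: 0.0 for e in etrs}
    new_mapping = {}
    for mn in sorted(load, key=load.get, reverse=True):
        target = min(per_etr, key=per_etr.get)
        per_etr[target] += load[mn]
        new_mapping[mn] = target
    return new_mapping

new_map = rebalance({"mnet-a": 40.0, "mnet-b": 35.0}, ["etr-1", "etr-2"])
# the two micronets land on different ETRs, roughly balancing the links
```

Because each micronet's mapping is a single ETR address, the ITRs stay simple; all the intelligence lives in whatever runs this rebalancing, which can be changed without touching any ITR.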
Alternatively, perhaps the LISP designers would retain the current complex mapping, and perhaps add more information - such as telling ITRs exactly how to probe reachability of the destination network through the various ETRs, and how to make decisions based on the results - and work out how this can be done scalably and securely, especially if tens of thousands of ITRs are probing reachability to one or more networks behind the one ETR.

Most of DRTM concerns how the MABOCs make the effort to run DITR-Sites all around the Net, pushing mapping in real-time to these sites - and then making this available via DITR-Site-QSDs to ISPs in the area, to encourage the ISPs to run their own ITRs. This does not absolutely require the ISPs' MRs or their ITRs to receive mapping updates from the DITR-Site-QSDs. So LISP could continue to operate on the principle of ITRs and their MRs being given cacheable mapping information, but not being notified if the mapping changed during the caching time. This would make LISP MRs and ITRs somewhat simpler in this respect than Ivip MRs and ITRs - since they don't need to accept mapping updates.

However, I think this would forgo the opportunity of much better real-time control of ITR behaviour, and of reducing the mapping to a single ETR address - simplifying ITRs and moving the probing and decision-making functions somewhere else. Being able to perform the probing and decision-making functions more flexibly and centrally would give EUNs better control over their incoming traffic, and is the only obvious way of solving the scaling problems inherent in lots of ITRs doing reachability testing. With LISP's current approach, there is a scaling problem with a single ITR having to determine reachability of many ETRs, and a second scaling problem with the one ETR having to participate in some way in reachability testing by large numbers of ITRs.
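The contrast between the two caching styles above can be made concrete. As a sketch only (interfaces invented): a pure-TTL cache, as in the LISP style just described, serves possibly stale mapping until expiry, while an update-capable cache in the Ivip style is corrected the moment an update arrives.

```python
# Sketch of cacheable-until-expiry mapping vs update-capable caching.
import time

class TtlCache:
    """Caches mapping for a fixed lifetime; ignores any later changes."""
    def __init__(self, ttl):
        self.ttl, self.entries = ttl, {}   # micronet -> (expiry, etr)

    def put(self, micronet, etr):
        self.entries[micronet] = (time.time() + self.ttl, etr)

    def get(self, micronet):
        entry = self.entries.get(micronet)
        if entry and entry[0] > time.time():
            return entry[1]                # may be stale within the TTL
        return None                        # expired: re-query a QSD

class UpdatableCache(TtlCache):
    """Also accepts pushed updates, so entries never go stale."""
    def update(self, micronet, new_etr):
        if micronet in self.entries:       # refresh the entry in place
            self.put(micronet, new_etr)
```

The extra complexity of `update()` is the price of real-time control; the simpler `TtlCache` is what LISP MRs and ITRs could keep using under DRTM, at the cost of serving outdated mapping until the TTL runs out.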
Assuming LISP made full use of the real-time nature of DRTM mapping, including having mapping updates sent when required to MRs and then propagated to ITRs, LISP could use external, or new internal, mechanisms to test reachability of EUNs via multiple ETRs, and to make decisions based on more parameters, as set by the EUNs, than would be practical to implement in all ITRs. So LISP would come to resemble Ivip more - and perhaps to modularly separate the control of mapping from the CES architecture itself, as is the case with Ivip.

With TTR mobility, it is clear that the TTR company needs to control the mapping, since the MN itself is responding to TTR company instructions about which TTRs to tunnel to - and the MN is not as reliably connected to the Net as the TTR company's management system.