[rrg] DRTM - Distributed Real Time Mapping for Ivip & LISP
Robin Whittle <rw@firstpr.com.au> Thu, 25 February 2010 13:45 UTC
Message-ID: <4B867F6D.60601@firstpr.com.au>
Date: Fri, 26 Feb 2010 00:47:25 +1100
From: Robin Whittle <rw@firstpr.com.au>
Organization: First Principles
To: RRG <rrg@irtf.org>
Here is a new mapping arrangement for Ivip which I think could also be
adopted by LISP.  Later, I intend to write this up as an Internet
Draft: draft-whittle-ivip-drtm.

DRTM does not involve any single device needing to contain all mapping
information, or a "globally synchronised mapping database".

This new arrangement also avoids concerns about lack of motivation for
early adoption by ISPs.  The initial deployment of Ivip (or LISP)
services to end-user networks will be driven by organizations which
need not be ISPs.  For a non-mobile end-user network (EUN) to use SPI
(Scalable PI = edge = EID) space, the only thing required of its ISP
is to allow the forwarding of packets sent from the EUN whose source
addresses are from the SPI space the EUN is using.

With DRTM, it will be easier for ISPs to install ITRs than with
previous approaches to Ivip's mapping system, since there is no need
for full-database query servers, streams of real-time mapping etc.

DRTM would work fine with TTR Mobility.  TTR Mobility doesn't require
any ISP involvement, since the MN sends outgoing packets to the TTR,
and does not rely on its access network to forward them.  A company
could provide TTR Mobility services with its own DITRs (PTRs) and not
need any ISP involvement at all - it wouldn't need ITRs in ISPs.

  - Robin


1 - Quick description
2 - Different approaches to Ivip's mapping system
3 - More Detailed Description
      Abbreviations
      Stage 1 - DITRs only
      Stage 2 - Add ITRs in ISPs and EUNs, with purely caching MRs
      Stage 3 (optional) - ISPs/EUNs have non-caching MRs
4 - With TTR Mobility
5 - DRTM for LISP


1 - Quick description
=====================

Distributed Real Time Mapping (DRTM) involves significant differences
from what I described (2010-01-06) in:

  Ivip's new distributed mapping distribution system
  http://www.ietf.org/mail-archive/web/rrg/current/msg05975.html

This (msg05975) removed the need for a single global inverted tree of
Replicators.
Instead, mapping change packets would be sent out from DITR-Sites near
the ISPs which want to receive them and perhaps Replicate them to
drive multiple full-database QSD query servers.

This new DRTM arrangement removes the need for Replicators, or for the
ISPs running full-database QSD query servers.

The new DRTM mapping distribution system is still real-time, with
updates going directly to all ITRs which need them.  I guess the total
delay time from mapping change to ITRs changing their tunneling
behaviour would be a second or two, but in principle it could be less
than half a second.

As always with Ivip, the end-user network - or whoever they appoint -
controls the mapping, and the mapping of each micronet is to a single
ETR address.  So the functions of reachability testing, multihoming
failure detection and service restoration decision making are
modularly separated from the CES architecture.  All other CES
architectures monolithically integrate these functions, and so require
their ITRs to be more complex than Ivip's, while also greatly limiting
the ability of end-user networks to have reachability testing and
multihoming service restoration done the way they prefer.

DRTM involves:

Provision of Ivip (or LISP) services will be possible without the ISPs
having to install ITRs or ETRs or be involved in Ivip or LISP at all -
other than allowing their Ivip SPI-using customers (LISP: EID-using
customers) to send out packets whose source addresses are from these
SPI (EID) prefixes.

SPI-using end-user networks will run their own ETRs on the PA
addresses they get from their current ISP service.  They can run ITRs
too if they like, though this is not required.

The MAB (Mapped Address Block) Operating Companies (MABOCs) - whose
business is to lease out MAB space to thousands of SPI-using end-user
networks - will do most of the work, setting up DITRs (Default ITRs in
the DFZ, or in LISP: Proxy Tunnel Routers) at widely dispersed sites
around the Net.
ISPs will only become involved in adding ITRs when they want to -
which will be when they have significant numbers of SPI-using
customers, with sending hosts in the ISP's network (and the ISP's
customer networks which lack ITRs) sending packets to hosts on SPI
addresses which are mapped to the ETRs of this ISP's SPI-using
customers.  As long as the ISP doesn't have its own ITRs, these
packets would go out to the DFZ, to a DITR, and then return to the
ISP's network.  So to reduce this waste of expensive upstream
bandwidth, the ISP will want to install its own ITRs.

No single server is required to hold the entire mapping database.
This can still be done as an option - the equivalent of full-database
QSD mapping query servers.  Maybe this will be desirable in some
settings, but it is not required.

If an ISP or end-user network runs ITRs, then instead of getting their
mapping from one or a few of their own full-database QSD servers (the
"synchronized databases" some people are concerned about), their ITRs
query one or more caching Map Resolvers (MRs).

An ISP or a large end-user network might run two or more MRs - just as
in previous versions of Ivip they ran two or more full-database QSD
query servers.  However, the MR is a caching query server, without
great storage or computational requirements.  An MR can optionally be
a full-database query server for one or more MABs, including all MABs,
if this makes sense, but this is not at all required.

Assuming the MR is purely caching, then when an MR gets a query from
an ITR which it can't answer from its cache, it queries one of
multiple typically "nearby" query servers which are full-database for
particular MABs.

Mapped Address Blocks ("coarse" prefixes in LISP) are short
DFZ-advertised prefixes which cover the micronets (EID prefixes in
LISP) of scalable "edge" SPI space (EID space in LISP) of many
end-user networks.
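The caching behaviour just described - an ITR's query reaching an MR,
which answers from its cache or else asks a nearby full-database query
server for the covering MAB - can be sketched as follows.  This is a
minimal illustration, not the real query protocol: the class name, the
`query_fn` callback and the (ETR, TTL) reply shape are all invented
for the sketch.

```python
import ipaddress
import time

class MapResolver:
    """Sketch of a purely caching Map Resolver (MR)."""

    def __init__(self, qsd_for_mab, query_fn):
        # qsd_for_mab: MAB prefix (str) -> list of "nearby" query
        #              server addresses, full-database for that MAB.
        # query_fn(qsd_addr, micronet) -> (etr_addr, ttl_seconds)
        #              or None; stands in for the real wire protocol.
        self.qsd_for_mab = {ipaddress.ip_network(m): qsds
                            for m, qsds in qsd_for_mab.items()}
        self.query_fn = query_fn
        self.cache = {}  # micronet (str) -> (etr_addr, expiry_time)

    def covering_mab(self, micronet):
        net = ipaddress.ip_network(micronet)
        for mab in self.qsd_for_mab:
            if net.subnet_of(mab):
                return mab
        return None

    def resolve(self, micronet, now=None):
        now = time.time() if now is None else now
        hit = self.cache.get(micronet)
        if hit and hit[1] > now:
            return hit[0]                 # answer from cache
        mab = self.covering_mab(micronet)
        if mab is None:
            return None                   # not SPI space this MR knows
        for qsd in self.qsd_for_mab[mab]:
            reply = self.query_fn(qsd, micronet)
            if reply is not None:
                etr, ttl = reply
                self.cache[micronet] = (etr, now + ttl)
                return etr
        return None
```

Here `query_fn` stands in for queries to the full-database query
servers at the DITR-Sites covering each MAB.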
These servers are at the DITR (PTR in LISP) sites, which are typically
located in widely dispersed places around the Net.  So these
full-database query servers are no longer "local" to the ITRs which
depend on their map replies - they are "nearby".  These servers only
need to handle mapping queries for the MABs their DITR-Site covers.

DITR-Sites will typically be "nearby" - such as within one or two
thousand km - and so will be able to answer queries reliably and with
insignificant delay, compared to the delays and higher risk of lost
packets which are inherent in any global mapping query server system
such as LISP-ALT, with only one or two authoritative query servers.

As an option, the ISP may choose to get full mapping feeds from these
DITR-Sites, and so run the Map Resolver (MR) as a full-database query
server for some or all MABs.  This may or may not involve Replicators
fanning out mapping information.  Other than this optional
arrangement, no single server has the full mapping database.

The operators of the DITR-Sites need a full mapping feed to those
sites for all the MABs they cover - and also to run the query servers
there which are used by nearby ISPs' MRs.  How these DITR-Site
operators transmit the mapping in real-time to all their DITR-Sites is
up to them.  They could use Replicators - but that is an internal
matter which doesn't necessarily have to comply with any particular
standards.

Each MABOC has a finite number of DITR-Sites covering its MABs - at
most a hundred or so.  So it is clearly practical to get mapping
updates in real-time to those sites, since they are all run either by
the MABOC, or by another company which contracts to this and probably
other MABOCs.  Private network links to these DITR-Sites might be a
good approach, to avoid the problems with congestion and DDoS attacks
which would arise if mapping was sent via the open Internet.

A DITR-Site's full-database query server is called a "DITR-Site-QSD".
It is full-database only for the MABs the site handles and is not
involved in mapping for any other MABs.

Mapping changes are initially generated by the end-user network whose
micronet's mapping is being changed - or by whoever the end-user
network appointed to control the mapping.  This is conveyed, with
appropriate authentication arrangements, to the MABOC or to some other
company the MABOC contracts to handle mapping for its MABs.

The mapping changes for each MAB are sent in real-time to the
DITR-Site company, who - via whatever internal arrangements they
choose - convey them to the DITR-Site-QSDs at all their DITR-Sites.

When a mapping change arrives at a DITR-Site-QSD, it checks whether
this change affects any micronets whose mapping was given out in map
replies within the last ten minutes, or whatever caching time this
DITR-Site-QSD specifies in its map replies.  The DITR-Site-QSD caches
the nonces which came with the map requests, and sends to each such
requester a mapping update command, secured by the nonce from the
original request.

This mapping update will go to the MR at an ISP network, which uses
the same algorithm to send mapping update commands to the one or more
ITRs to which it has sent mapping for this micronet within whatever
caching time it sets on its replies.  (This time should be the same
as, or less than, the caching time of the replies it got from the
DITR-Site query server.)

Therefore, all ITRs with currently cached mapping for the micronet
whose mapping has just been changed will receive the update within a
fraction of a second of it arriving at the DITR-Site-QSD.  So all ITRs
in the world which are currently tunneling packets whose destination
address matches this micronet will tunnel these packets to the ETR
specified in the updated mapping.
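A minimal sketch of the nonce-secured update push just described: the
query server remembers, for each micronet, which requesters were sent
mapping within the caching time, along with the nonce from each
request, and pushes an update to those requesters when the mapping
changes.  All names and the `send_update` callback are invented for
illustration; the real wire protocol and authentication details are
not specified here.

```python
import time

class DITRSiteQSD:
    """Sketch of a DITR-Site-QSD's nonce-secured update push."""

    def __init__(self, mapping, cache_time=600, send_update=print):
        self.mapping = mapping        # micronet -> ETR address
        self.cache_time = cache_time  # seconds, stated in map replies
        self.send_update = send_update
        # micronet -> list of (requester, nonce, cache_expiry)
        self.outstanding = {}

    def map_request(self, requester, micronet, nonce):
        # Answer the query and remember who asked, with their nonce,
        # for the duration of the caching time.
        self.outstanding.setdefault(micronet, []).append(
            (requester, nonce, time.time() + self.cache_time))
        return (self.mapping.get(micronet), self.cache_time)

    def mapping_change(self, micronet, new_etr):
        self.mapping[micronet] = new_etr
        now = time.time()
        live = [e for e in self.outstanding.get(micronet, [])
                if e[2] > now]
        for requester, nonce, _ in live:
            # The update command carries the nonce from the original
            # request, so the requester can authenticate it.
            self.send_update(requester, micronet, new_etr, nonce)
        self.outstanding[micronet] = live
```

An MR would apply the same algorithm downstream to the ITRs it has
recently answered.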
2 - Different approaches to Ivip's mapping system
=================================================

Before explaining this in greater detail, here is a run-down of the
changes to Ivip's mapping system:

Plan-A  2007-07-15  Original system with a tree-like structure of
                    Replicators - with the top level being "Launch
                    Servers" with a fancy protocol between them.

                      ivip-arch-00/01/02      }  All
                      ivip-db-fast-push-00/01 }  obsolete.

        2010-01-13  Same system, but all-new ivip-arch and revised
                    ivip-db-fast-push.

                      ivip-arch-03          Currently the latest
                                            version, but the fast push
                                            mapping section is no
                                            longer up-to-date.

                      ivip-db-fast-push-02  Better documentation of
                                            the original Launch Server
                                            system.

Plan-B  2010-01-18  "Launch Servers" replaced by Level 0 Replicators
                    which are fully meshed and have a flooding
                    arrangement which is simpler, faster and more
                    robust.

                      ivip-db-fast-push-03  Significant
                                            simplifications and new
                                            material to give an
                                            overview of Plan-B.

                      ivip-fpr-00           All-new ID with goals and
                                            non-goals, a better
                                            description of Replicators
                                            and the best Plan-B
                                            documentation.

Plan-C  2010-02-07  Ivip's new distributed mapping distribution system
                    http://www.ietf.org/mail-archive/web/rrg/current/msg05975.html

                    This keeps the Replicator concept, but has no
                    central tree structure of Replicators.  Instead,
                    one or more ISPs (or large end-user networks) make
                    their own small tree of Replicators, and get feeds
                    of mapping changes for the MABs of all MABOCs from
                    the one, or typically more than one, mapping
                    coordination companies or the MABOCs themselves -
                    whoever runs the nearest one or two DITR-Sites for
                    each MABOC.

                    So there is no central tree of Replicators - just
                    smaller trees, or even single QSDs, getting feeds
                    from MABOC-run DITR-Site sources of mapping
                    generally not too far away.

In Plan-A and Plan-B, the MABOCs were either RUAS (Root Update
Authorisation Server) companies, or contracted RUAS companies to
handle the mapping of the micronets in their MABs.
The RUAS companies collectively ran a decentralised but still unified
inverted tree-like structure of Replicators to fan out mapping changes
in real-time all over the world to ISPs' full-database QSDs.

In Plan-C, there is no global inverted tree of Replicators, and the
MABOCs invest more and reach out to ISPs from their widely distributed
DITR-Sites.  ISPs don't absolutely need ITRs and QSDs (and therefore
mapping feeds and probably Replicators), but they will probably want
them after a while (assuming some of their customers are using SPI
space), since having their own ITRs will reduce the traffic going out
to a DITR and returning to these customers' ETRs.

Missing Payload Servers are also needed so the ISP's QSDs can get
mapping which is somehow missing from the two or more upstream
Replicators - due to temporary outages affecting the two or more
feeds.

Plan-D  2010-02-24  This message.

ISPs (or end-user networks) which want to run their own ITRs can still
use the Plan-C approach of having their own full-database QSDs, with
full feeds, Replicators, Missing Payload servers etc.

However, ISPs (and end-user networks) which want ITRs have an
intermediate option, which is less expensive - no local full-database
query servers, Replicators or reliance on real-time feeds - but
instead the use of new query servers at the nearby MABOC-operated
(directly or indirectly) sites where the DITRs are.

These DITR-Site-QSD query servers are "full database" for the subset
of MABs each such DITR-Site handles.  The ISP's ITRs query these via a
Map Resolver (MR) - which is like a caching QSC query server, but
which knows, for each MAB, the addresses of two or more of these
typically "nearby" MABOC-run DITR-Site-QSDs which are authoritative,
full-database, query servers for that MAB.

Therefore, the full-database query servers which the ITRs in an ISP or
an EUN rely on are no longer strictly "local" - as they were in Plans
A, B and C.
They are normally "close", or "close enough", that delay times and
query/response packet losses are insignificant.

So this is fully distributed, but is not a "global" query server
system like LISP-ALT, with queries and responses frequently traversing
the Earth - with consequent delays, losses and scaling problems.


3 - More Detailed Description
=============================

As a preliminary, please read:

  http://www.ietf.org/mail-archive/web/rrg/current/msg05975.html

but bear in mind that the Replicators, the QSDs in ISP networks, and
the QSDs sometimes using Missing Payload Servers are now an optional
part of Plan-D - and that Plan-D has a new intermediate system which
should be sufficient for all scenarios, even with a fully deployed
system covering 10 million or so micronets for non-mobile multihomed
EUNs and up to 10 billion micronets for mobile devices.  In its basic
form, Plan-D has no single server with the full mapping database, no
real-time mapping feeds, no Replicators etc.

This description assumes the MABOCs' (MAB Operating Companies') aim is
to lease their space primarily to be used for portability, multihoming
and inbound TE.  However, if one or more companies, including MABOCs,
want to deploy TTR Mobility, that can be part of it as well.  I
suspect the demand for TTR Mobility will be more urgent, widespread
and profitable than that for non-mobile network portability,
multihoming and inbound TE.

This description uses Ivip terminology, but is in general applicable
to LISP as well.  This discussion is IPv4-specific, but the same
principles with different details should be fine for IPv6 too.

I need to invent some new terminology, since DRTM is based on various
types of business and network operations which do not yet exist.


Abbreviations
-------------

EUN    End-User Network - from a mobile device, to a local LAN behind
       NAT as used for most residential / SOHO DSL etc. services, to
       the networks of the largest corporations, universities etc.
PMHTE  Portability, Multihoming and/or Inbound Traffic Engineering.
       These are the benefits many EUNs seek and which they can
       currently only gain by advertising PI prefixes in the DFZ -
       which is the cause of the scaling problems.  For a fuller
       description, see the first point 4 in:

         Scalable routing problem & architectural enhancements
         http://www.ietf.org/mail-archive/web/rrg/current/msg06099.html

SPI    Scalable Provider Independent: "Edge" space handled by a CES
       (Core-Edge Separation) architecture's ITRs, ETRs etc., and so
       which is Provider Independent (portable) and suitable for EUNs
       to use in a scalable fashion for PMHTE.  (LISP: EID space.)

MAB    Mapped Address Block: A prefix advertised in the DFZ which
       covers a typically large amount of SPI space, typically used
       for many (tens to hundreds of thousands of) individual SPI
       micronets (Ivip) or EID prefixes (LISP).  Dino recently used
       the term "coarse prefix" to refer to the same thing in LISP.
       MABs are advertised by DITRs (Default ITRs in the DFZ), AKA
       "Proxy Tunnel Routers" in LISP, to collect all packets sent to
       SPI addresses from hosts in networks without ITRs, and then to
       tunnel them to the correct ETR.

MABOC  MAB Operating Company: A company which "owns", or runs for
       someone else, one or more MABs, which it leases out in
       typically small chunks to typically large numbers of EUNs.
       (No LISP equivalent - LISP has very little in the way of
       potential business arrangements, and I think the designers
       expect EUNs to get space from RIRs and somehow advertise their
       space in the DFZ from PTRs as a part of a presumably larger
       "coarse" prefix.)

       It is also possible for an EUN with PI space to convert some or
       all of it to a MAB, and so become a MABOC.  It need not lease
       out the space to anyone else, but may use all its space as
       micronets for its internal divisions.
So a corporation or university with PI space today might be able to
make do with less space, due to the finer slicing and dicing possible
with micronets (down to a single IPv4 address, or any integer number
of IPv4 addresses).  Then the EUN might be able to return half of its
PI prefix to the RIR and convert the rest to a MAB.  As a MABOC, it
will need to run DITRs for its MAB, or pay someone else to run them.

DITR   Default ITR in the DFZ: (LISP: Proxy Tunnel Router.)  Ordinary
       ITRs in ISP and EUN networks always cover all MABs.  DITRs
       could in principle cover all MABs, but in general a DITR will
       only cover the subset of MABs which the company which operates
       it is paid to handle.

       For instance, if a DITR is run by a MABOC directly, it will
       only advertise in the DFZ the MABs of that MABOC.  However,
       perhaps the MABOC has an arrangement with other MABOCs to
       handle their MABs as well - or perhaps the DITR is run by a
       company which supports the MABs of multiple MABOCs.  This could
       involve a DITR covering all MABs, but I expect that most DITRs
       and their DITR-Sites will cover a subset of all MABs.

DITR-Site
       Wherever a DITR is located.  This site may be in a data centre,
       peering point, Internet exchange or whatever.  The DITR is
       "full-database" for the MABs it covers - which means that the
       DITR's ITR function is actually a caching ITR like all others,
       but that it is closely coupled to a query server which has the
       full database of mapping for all the covered MABs.

       How MABOCs (or "DSOCs", see below) get the mapping data to
       these DITR-supporting full-database query servers is up to them
       - it is an internal affair.  Maybe they will use Replicators -
       or some other system.  They could get the data to these sites
       over the open Internet, with appropriate encryption etc. - or
       perhaps they will use private network links to each site, which
       would ensure the delivery of mapping updates could not be
       disrupted by DDoS flooding attacks from the Internet.

DSOC   DITR-Site Operating Company.
       A company which runs at least one - probably dozens or perhaps
       hundreds - of DITR-Sites.  This may be a MABOC, with the DITRs
       at each site covering only the one MABOC's MABs.  It may be a
       MABOC selling its DITR services to other MABOCs.  It may be a
       company which runs a chain of these sites, which is not a
       MABOC, but which sells its DITR services to MABOCs.  It could
       also be an ISP which runs a DITR-Site for one or more MABOCs.

DITR-Site-QSD
       From the above real-time feed of mapping for all the MABs
       covered by a DITR-Site, the DITR-Site-QSD is a full-database
       query server, which behaves like the QSDs already described for
       Ivip, except:

       1 - They are only "full database" for the MABs this DITR-Site
           supports.  They do not have any mapping for other MABs - so
           should not be queried for mapping of addresses outside
           these MABs.

       2 - They may be the same QSD as is used by the DITRs at this
           DITR-Site, or they may be separate servers, separate
           processes or whatever - but they still operate from the
           reliably supplied feed of mapping changes for all the MABs
           covered by this DITR-Site.

       3 - They accept mapping queries, as previously described for
           QSDs, from any querier whatsoever.  (Maybe there need to
           be ACLs, but for now I assume not.)  Previous versions of
           Ivip assumed the full-database QSDs were in ISP networks
           or large EUNs, so the query and response traffic stayed
           within these networks.  DITR-Site-QSDs accept queries from
           MRs in many ISPs and EUNs all over the world.  In
           practice, the queries will typically come from MRs of
           nearby ISPs and EUNs.  However, the DITR-Site-QSDs may
           also receive queries from MRs anywhere in the Net.

       4 - The addresses of these DITR-Site-QSDs can be found by a
           new DNS-based mechanism described below.  This is how MRs
           find them.  MRs find all the DITR-Site-QSDs which serve a
           particular MAB, and choose one or more "closest"
           (typically) ones to send their queries to.
       5 - Each DITR-Site-QSD will probably run an HTTPS server which
           provides a list of the MABs it is authoritative for.  This
           is so MRs can verify what they learnt from the DNS-based
           mechanism - which specifies this DITR-Site-QSD as being an
           authoritative server for typically many MABs.

MR     Map Resolver.  (Here I am borrowing a LISP term for a similar
       purpose in the Plan-D Ivip system.)  An MR is like a caching
       query server (QSC), but it dynamically configures itself to
       send queries not to a single upstream QSD, but to multiple
       DITR-Site-QSDs.  It chooses these to be the "closest" ones
       from the list it gets from the new DNS-based mechanism.

       MRs are inside ISP and large EUN networks and take the place
       of the Plan-A/B/C QSDs.  They typically get mapping replies
       quickly and reliably, since they are typically using nearby
       "full-database" DITR-Site-QSDs.

       Within the ISP or EUN, ITRs either query their one or more MRs
       directly, just as in Plan-A/B/C they queried one or more QSDs
       - or they query them indirectly through one or more caching
       QSCs, as previously described in Plan-A/B/C.

       Plan-C, AKA (below) Stage 3, only: An MR can optionally be
       sent full mapping update streams for one or more MABs from
       nearby DITR-Sites - either directly or via Replicators.  If it
       uses Replicators - rather than some system by which it can
       communicate two-way with the source of mapping to make sure it
       doesn't miss any updates - then the MR may also need to access
       some "Missing Payload" servers to get mapping it somehow
       missed from the Replicator system it relies upon.  Any such
       Replicator system will be a small, partly or fully meshed,
       system run by a handful of ISPs to more efficiently and
       robustly fan out a small number of mapping update feeds from
       nearby DITR-Sites to a larger number of their own MRs.

The description below involves various stages of deployment - since
the whole DRTM system is the set of possible arrangements encompassed
by these stages.  Most of these stages are part of Plan-C.
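The MR's choice of "closest" DITR-Site-QSDs, described in the
definitions above, might reduce to something like the following
sketch.  The candidate list would come from the DNS-based mechanism,
and `rtt_ms` stands in for whatever RTT probing the MR uses - both are
hypothetical details here, not part of any specified protocol.

```python
def choose_qsds(candidates, rtt_ms, n=2):
    """Pick the n lowest-RTT DITR-Site-QSDs for one MAB.

    candidates: QSD addresses learnt from the DNS-based mechanism.
    rtt_ms:     mapping from address to measured round-trip time in
                milliseconds (how the MR measures this is up to the
                implementation - a hypothetical probe here).
    """
    return sorted(candidates, key=lambda addr: rtt_ms[addr])[:n]
```

Choosing two or more servers per MAB gives the MR a fallback if the
closest DITR-Site-QSD stops answering.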
Plan-D is similar to Plan-C, but has a new intermediate stage where
ISPs and EUNs don't have QSDs, but run MRs which query multiple
(typically nearby) DITR-Site-QSDs instead.  I think this Plan-D
arrangement will be sufficient for all ISPs, but the Plan-C
arrangement (with full-database QSDs in ISPs getting real-time feeds
of mapping from nearby DITR-Sites) remains an option.

Using caching-only MRs (Plan-D) means quite a few query packets and
mapping replies going back and forth, but it saves on a number of
things which are tricky with running a full-database query server
(Plan-C) at the ISP or EUN:

1 - There is no constant incoming stream of mapping updates.
    Instead, the traffic requirements for the MR depend on the number
    of ITRs, their traffic patterns etc.  So a small ISP or EUN can
    quite happily run a caching-only MR without incurring continual
    incoming traffic costs.

2 - The MR has storage only for caching the map replies it gets from
    the upstream DITR-Site-QSDs, and for caching the nonces and
    recently sent mapping details for the queries it receives from
    downstream devices (ITRs or caching QSCs).  So the MR does not
    have the potentially large and critical storage requirements of a
    full-database QSD - now known as an MR which gets feeds for some
    or all MABs, and so is full-database for these MABs.

3 - There is no need for "Missing Payload" servers to fill in
    occasional gaps in the mapping update feeds from Replicators.

Why would we want an MR to be configured to be full-database for one
or more MABs?  In theory, the full mapping feed always contains at
least as much information as the map replies a caching MR would
receive.  I am not convinced there would be a need to do this, but
here are some possible reasons:

1 - Perhaps the "full-database" choice is made because the nearby
    DITR-Sites for this MAB prefer to send out mapping feeds rather
    than respond to queries.
    But I would expect the DSOCs to be keen to do whatever the ISPs
    or ITR-running EUNs preferred: accept queries or send a mapping
    feed to the ISP or EUN.  The more ISPs and EUNs run ITRs for the
    DSOC's MABs, the less work the DSOC's DITRs need to do.

2 - Perhaps the ISP or EUN wants to have super-fast mapping replies
    for its ITRs - and for the ITRs in all EUNs which are using the
    ISP's one or more MRs.

    Despite the greater trouble of taking real-time mapping feeds,
    occasionally relying on "Missing Payload" servers and having
    their MRs store full mapping databases for one, many or perhaps
    all MABs, perhaps some ISPs or EUNs will prefer to do this, just
    for local performance reasons.

    All this would do is shave a few milliseconds off the response
    time, assuming the MABs were covered by DITR-Sites which are not
    too far away.  But maybe the ISP or EUN is a long way from the
    nearest DITR-Site.  It would be a pretty slack MABOC which didn't
    run DITRs in major countries, but perhaps that is the case.
    Also, the whole ISP or EUN may be remote physically and
    temporally from all DITR-Sites: it is on the Moon . . . or is in
    an ocean liner, trans-ocean passenger jet or the Antarctic, and
    so relies on geostationary satellite links.


Stage 1 - DITRs only
--------------------

For non-mobile services, one or more MABOCs set up shop, and run
multiple DITRs at DITR-Sites around the world.

They can still make their MABs work with a single DITR, or with DITRs
only in a given region.  If, for instance, the MABOC's SPI-using EUNs
for some reason always use their SPI space via ISPs in Europe, then
for purely DITR purposes, it would be fine for the MABOC to run a
handful of DITRs just in European sites.
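What a Stage-1 DITR does can be sketched roughly as follows, under
simplifying assumptions: micronets are shown here as prefixes (Ivip
micronets can be any integer number of addresses, not only
power-of-two blocks), the encapsulation itself is omitted, and all
names are invented for illustration.  The point is that a DITR only
acts on destinations inside the MABs it covers, and tunnels matching
packets to the mapped ETR.

```python
import ipaddress

class DITR:
    """Sketch: a DITR advertises only the MABs it is paid to cover,
    and tunnels packets for those MABs' micronets to the mapped ETR."""

    def __init__(self, covered_mabs, micronet_map):
        self.covered = [ipaddress.ip_network(m) for m in covered_mabs]
        # micronet_map: micronet prefix -> ETR address; full database
        # for the covered MABs only.
        self.micronets = {ipaddress.ip_network(m): etr
                          for m, etr in micronet_map.items()}

    def handle(self, dst):
        """Return the ETR to tunnel to, or None to forward normally."""
        addr = ipaddress.ip_address(dst)
        if not any(addr in mab for mab in self.covered):
            return None  # not one of our MABs - don't tunnel
        # Longest-prefix match among the covered MABs' micronets.
        best = max((n for n in self.micronets if addr in n),
                   key=lambda n: n.prefixlen, default=None)
        return self.micronets[best] if best else None
```

Ordinary ITRs in ISP and EUN networks would do the same lookup, but
covering all MABs and obtaining mapping on demand via an MR.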
If a Sending Host (SH) was in Adelaide (Australia) and was sending
packets to a host in a micronet which is mapped to an ETR accessible
via an ISP in Dusseldorf, then it will be fine if these packets
traverse most of the DFZ in their raw SPI-addressed state, to be
tunneled to the ETR by a DITR in Amsterdam, London or probably Zurich.

However . . . with the new Plan-D arrangements, if the MABOC wanted
to encourage ISPs and EUNs all over the world to run their own ITRs,
then it really needs to do better than just have DITR-Sites in
Europe.

Also, it's a second-rate service to only have DITRs in a given region
- since at least some of its European customer companies might want
to use their space in branch offices in Asia, North and South America
etc.

If the sending host in Adelaide was in an EUN or ISP with ITRs, then
it would be best if the caching MR those ITRs depend on could send
queries to a DITR-Site-QSD a lot closer than Europe.

So this Plan-D arrangement of Ivip doesn't absolutely ensure that the
authoritative query server (at the closest DITR-Site which serves
this MAB) is "nearby".  If the MABOC is providing a good service, it
will ensure it has DITRs widely scattered around the Net, and the
nearest DITR-Site-QSD will be "nearby" - or "near enough to generally
provide a fast response with little chance of the query or response
packets being lost".

If an MR in Dallas-Fort Worth finds that the closest DITR-Site is in
London, then it's not disastrous.  There's a 104ms RTT to London via
Houston (I think "IAH"), LA and Washington DC, and as long as the
query or response packet isn't dropped, this shouldn't cause much
complaint about slow starts to communications.  But it is not ideal,
and it would be much better if the nearest DITR-Site-QSD was in San
Jose, which should be an RTT of 40ms or probably much less (though I
just got a traceroute from DFW to San Jose via Amsterdam and London
with an RTT of 172ms!).
But for Stage 1 - DITRs only - as long as the DITR-Sites are
reasonably close to wherever the SPI space is being used, and as long
as they can handle the traffic loads, then the only other things
which are needed are:

1 - The DITRs need to have a tunneling and PMTUD protocol which is
    compatible with the ETR functionality of whatever the SPI-using
    EUNs (SPI-leasing customers of this MABOC) are using on their PA
    addresses.

    For now, I am assuming that the ETR functionality can be provided
    by the MABOC for free - such as being downloaded from somewhere
    and run on a COTS server of the SPI-using EUN - or, ideally, be
    implemented on a router which the SPI-using EUN already owns.

2 - The ISPs these EUNs connect with must allow them to emit packets
    using these SPI addresses as source addresses.

In this simple arrangement, multiple EUNs - EUN-0000 to EUN-0999 -
are customers of MABOC-X and are using micronets in one or more of
MABOC-X's MABs.  Maybe another set of EUNs - EUN-1xxx - are leasing
space from another MABOC-Y.

Assuming any EUN-0xxx is only using SPI space from MABOC-X, then
their ETR functions only have to be compatible with the DITRs run by
MABOC-X.  So far, there's no absolute need for standardization.
Ideally there would be RFC standards for ITRs and ETRs, and all the
DITRs in the world would support this one standard.  Then EUNs could
lease SPI space from multiple MABOC companies and know that the one
ETR function could handle packets tunneled by the ITRs run by their
two or more different MABOCs.

The real need for standardization of ITR and ETR functions comes when
parties other than MABOCs are running ITRs, and/or when parties other
than MABOC customers are running ETRs.  In the latter case, perhaps
an ISP runs an ETR which connects to the networks of multiple
SPI-using EUNs.  They don't want to be mucking around with different
ETRs for customers using different MABOCs, and thereby relying on
different sets of technically different DITRs.
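The requirement above, that the ISP permit packets with SPI source
addresses from its SPI-using customers, amounts to a small relaxation
of the usual source-address filtering on the customer-facing
interface.  A hedged sketch, with invented names - the real filter
would of course live in router configuration, not application code:

```python
import ipaddress

def egress_permitted(src, pa_prefixes, customer_spi_prefixes):
    """Sketch of the relaxed source-address filter an ISP needs for
    Stage 1: permit a packet whose source is either in the customer's
    PA space or in the SPI space that customer is known to lease."""
    addr = ipaddress.ip_address(src)
    allowed = [ipaddress.ip_network(p)
               for p in list(pa_prefixes) + list(customer_spi_prefixes)]
    return any(addr in net for net in allowed)
```

This is the only cooperation the architecture asks of an ISP whose
customers use SPI space before the ISP deploys any ITRs of its own.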
Stage 2 - Add ITRs in ISPs and EUNs, with purely caching MRs
------------------------------------------------------------

The MABOCs charge their SPI-leasing customers for the use these customers make of their DITRs - or rather, for the use of these DITRs by packets sent to the SPI addresses of each of their SPI-leasing customers. The MABOCs will also charge their customers for each mapping change - probably a few cents or similar.

(There is an unresolved question about what happens if an EUN very frequently changes the mapping of its micronets, and this results in very frequent mapping updates being sent to MRs and ITRs in ISPs whose ITRs are tunneling packets to this micronet. The ISP may be unhappy about this high level of updates giving their MR and ITRs a workout. Should the MABOC, which is charging money for these updates, use some of that revenue to keep ISPs happy to receive and act on these frequent mapping updates?)

The MABOCs would be happier if some, most or ideally all of the tunneling of the traffic addressed to their SPI-leasing customer EUNs was done by someone else's ITRs: the ITRs of the EUNs or ISPs of the sending hosts (SHs). To this end, *perhaps* the MABOCs would want to pay ISPs and large EUNs to run ITRs covering their MABs.

As the use of SPI space becomes more widespread, the ISPs themselves would want to have their own ITRs. As previously noted, if an ISP has one or more customers with SPI space (either with their own ETRs, or using an ISP-supplied ETR) and there are other customers of this ISP sending packets to these SPI addresses, then the ISP would prefer to have its own ITR to tunnel these directly, rather than let the packets go out the upstream link, to a DITR, and return in encapsulated form via that link. If the ISP had its own ITR, at least to cover the MABs of these SPI-using customers, it could reduce the traffic on its expensive upstream links - and provide faster packet delivery times.
For this discussion, I will assume that an ISP installing an ITR will make that ITR advertise, in its internal routing system, all the MABs of the complete Ivip system. This is not necessarily the case, but it makes the discussion less complex to assume this. So the ISP wishes to run one or more ITRs which cover all MABs.

Also, individual EUNs using this ISP may wish to run their own ITRs so their outgoing packets addressed to SPI addresses will definitely take the shortest path to the ETR, rather than going by some potentially longer path to the "nearest" DITR. (After seeing a Dallas Fort-Worth to San Jose traceroute go via Amsterdam and London, I am keen to put the word _nearest_ in inverted commas if I mean "nearest in terms of the current state of the DFZ"!)

This is for EUNs on conventional PI space, PA space or SPI space. The question of an EUN having its own ITRs, or wanting its ISP to have ITRs, is independent of whether the EUN is using conventional PI or PA space, or is using SPI space via an ETR.

In Ivip, ITRs can be on SPI addresses. They can also be implemented in sending hosts on any global unicast address (PI, PA or SPI). At present, I don't have arrangements for ITRs to be behind NAT, but it could be done with a different protocol between the ITR and the upstream caching query server (QSC) or MR it queries.

In all cases, these ITRs need a "local" MR to send their queries to. I don't plan for ITRs to directly query the DITR-Site-QSDs. It would probably be technically possible, since the ITR -> QSC/MR query protocol would be no different from the QSC -> QSC, QSC -> MR or MR -> DITR-Site-QSD protocol.

So in all the above circumstances, to accommodate one to hundreds or thousands of ITRs in an ISP's network (including in the EUN customers of this ISP), the ISP should install two or more MRs. (I will refer to ISPs doing this, but the same principles apply to any EUN which wants to run a MR itself.)
These ITRs need to be configured with - or ideally automatically discover - the two or more MRs of this ISP. I haven't worked on how to do this, but I am sure it will be possible. Maybe the ITRs query the MRs directly. Maybe they query a QSC first, which handles a bunch of ITRs and which queries either the MR directly, or the MR via one or more other QSCs.

One way or another, each ITR needs at least two upstream QSCs or MRs to query. It would typically send a query to one, and if nothing came back within some time like 100ms, it would send a query for the same address (with a different nonce) to the other.

Then the task is to have these two MRs know at least two (ideally nearby) DITR-Site-QSDs to query for each of the MABs in the entire Ivip system. This can be done by a new DNS-based system I describe below. There could be other, better, ways - but this will do for now.

This Stage 2 is the main difference between Plan-D and Plan-C. In Plan-C, the ITRs always queried, directly or indirectly, one or ideally two or more "full database for all MABs" QSDs in the ISP. With Plan-D, they query a caching-only MR, which (if it has no mapping already cached) queries one of multiple "nearby" DITR-Site-QSDs, depending on which MAB the queried SPI address matches.

This is a highly scalable arrangement. The MABOCs directly or indirectly push their own mapping, for their own MABs, in real-time, highly reliably, to all their DITR-Sites. They need to do this to have full-database query servers in the same rack (or even the same server) as their DITRs. Theoretically, DITRs could rely on a distant full database query server, but this would be pretty sloppy - and the DITR is surely going to be getting a lot of packets, so it makes sense for the MABOC to push its full mapping for each MAB to a query server at each DITR-Site which is full-database for all the MABs covered by that DITR-Site.
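The two-upstream query logic just described can be sketched as follows. This is only a minimal illustration, in which each "upstream" is a hypothetical callable standing in for the real (not yet specified) query protocol to a QSC or MR:

```python
import os

QUERY_TIMEOUT = 0.1   # 100ms, as suggested above

def query_mapping(spi_addr, upstreams):
    """Send a map query to the first upstream QSC/MR; if nothing
    comes back within the timeout, retry the same address against
    the next upstream, using a fresh nonce for each attempt."""
    for upstream in upstreams:
        nonce = os.urandom(8)   # different nonce per attempt
        reply = upstream(spi_addr, nonce, timeout=QUERY_TIMEOUT)
        if reply is not None:   # map reply arrived in time
            return reply
    return None                 # all upstreams timed out
```

A real ITR would of course be sending these queries over the network and waiting on a timer; the point of the sketch is only the fall-back order and the per-attempt nonce.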
So it is not much extra work to use this fresh, reliable, feed of real-time mapping to drive a publicly accessible DITR-Site-QSD. For the MABOC or whoever runs the DITR-Site, it will be a lot less effort to answer queries and so allow other ITRs to tunnel a bunch of packets, than for this DITR-Site's DITRs to tunnel the same packets.

The ITRs, QSCs and MRs in the ISP and its EUNs do not store the entire mapping database, or the full mapping database of any of the MABs. The MRs can boot up very quickly, as described below - they only need to discover the current set of MABs and the DITR-Site-QSDs they will query for each MAB. (If the MR was full-database for one or more MABs, it would need to download snapshots - which could be quite bulky if there were billions of micronets, as there will be with widely deployed TTR Mobility.)

The DITR-Site can scale well by spreading the load of traffic for multiple MABs (including potentially every MAB in the whole Ivip system, if for some reason one DITR-Site was working for all the MABOCs) over multiple separate servers, each of which advertises to the DFZ a subset of these MABs. A single MAB could, in principle, be split between two or more DITRs if necessary, each advertising half, or a quarter, of it.

The DITRs would presumably be either acting as DFZ routers, advertising MABs and emitting packets back over the same link, or perhaps another link, to be forwarded by other DFZ routers - or they could be behind a single DFZ router. In the latter case, if four DITRs advertised a quarter of a MAB each, then their common router should aggregate these into the single shorter prefix of the original MAB for its DFZ neighbours.

The overall load of traffic can be shared by creating more DITR-Sites in the areas where they are most needed.
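The split-and-reaggregate arrangement can be illustrated with Python's ipaddress module. The MAB prefix here is purely hypothetical:

```python
import ipaddress

# Hypothetical MAB, split into quarters so that four DITRs behind
# one common DFZ router can each handle a quarter of the traffic.
mab = ipaddress.ip_network("203.0.112.0/22")
quarters = list(mab.subnets(prefixlen_diff=2))    # four /24s

# The common router should aggregate these back into the single
# shorter prefix of the original MAB for its DFZ neighbours:
aggregated = list(ipaddress.collapse_addresses(quarters))
assert aggregated == [mab]    # the /22 is what the DFZ sees
```

This also shows why the split is only safe behind a common router: if the four /24s leaked into the DFZ unaggregated, they would be four prefixes in the control plane instead of one.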
Larger DITR-Sites - such as those which are not operated by a single MABOC, but which serve the MABs of multiple MABOCs, including perhaps every MAB in the Ivip system - would also offer scaling benefits by sharing the various peaks in traffic for particular MABs in the system.

It is possible that the ITRs of a given ISP and its dependent EUNs might not advertise, into their local routing systems, all the MABs - but in this discussion, I assume that they do. Then, the MRs they use need to know how to send queries to (ideally) nearby DITR-Site-QSDs for every MAB in the system. More on this below.

Stage 3 (optional) - ISPs/EUNs have non-caching MRs
---------------------------------------------------

This stage is what I wrote about in Plan-C a few weeks ago. It is not required in Plan-D, because Stage 2 above - with ISPs having caching-only MRs - looks like it would scale well enough for the entire Ivip system to work well, even if there was no optional Stage 3.

Even with Stage 2 looking so powerful and scalable, with no need for MRs to get a feed of mapping updates for any of the MABs, there may still be some situations where it is desirable to have the MR get a full feed for some MABs. Maybe there would be a scenario where it would make sense for all MABs - I can't rule it out, but I can't think of one either, other than as noted above with networks being on passenger jets, ocean liners, Antarctica or the Moon.

If a MR receives full real-time mapping updates for all MABs, then the MRs which do this are indeed "full database QSDs with a fully real-time synchronised copy of the entire Ivip mapping database" - the QSDs which I assumed were needed in all previous plans I have made for Ivip. I am not saying it is impossible, or undesirable, to have MRs with some or all MABs handled from their own real-time updated database.
What is different about this "Plan-D" DRTM is that this is optional - it is entirely optional for a MR to receive mapping feeds and to be full database for any MAB. The Stage 2 arrangements just described - caching-only MRs - look like they will scale very well in every way I can think of.

In Stage 3, each MR could use three methods to get its mapping. Of the three methods mentioned below, it could use X alone, X and Y, Y alone, or Z alone. (I am using X, Y and Z to avoid confusion with Plans A to D or Stages 1 to 3. Sorry this is complex - but I am trying to anticipate a variety of usage situations.) X and Y are as described in draft-whittle-ivip-fpr-00 (Plan-B, 2010-01-18) and msg05975 (Plan-C, 2010-02-06). Z is newly described here.

X - The MR (there described as a QSD) could get direct feeds of "Replicator format" mapping update packets directly from servers which generate these at various (ideally) nearby DITR-Sites. It will need to get two feeds for each MAB, so this will involve a bunch of feeds for a set of MABs from one DITR-Site, and a similar bunch of feeds for the same MABs from another DITR-Site. This is assuming that the local DITR-Sites are in similar sets, each set serving the MABs of one or more MABOCs. Then there will need to be pairs of feeds from other DITR-Sites, a pair for each other set of MABs.

Y - As per Plan-B and Plan-C, the MR could get feeds via a system of Replicators which form a fully or partially meshed flooding system, accepting feeds from multiple DITR-Sites, and fanning out the sum of the mapping updates they contain. This should be the full set of updates for all MABs, unless all packets with a particular payload somehow don't make it to any of the Replicators.

In either X or Y, the MRs will occasionally need to query one or more "Lost Payload" servers to get mapping update payloads they somehow missed.
With larger sets of missing payloads, the MR would need to re-sync its database for one or more MABs, which involves downloading a snapshot file and bringing it up to date, as described in draft-whittle-ivip-fpr-00.

Z - The MR sets up some kind of secure, two-way, link to one - or probably better two - (ideally) nearby DITR-Site-QSDs, for a given MAB or set of MABs. Assuming that the entire set of MABs is covered by five DITR-Sites, this means the MR will set up ten such sessions - using the DITR-Site-QSDs at two sites, for redundant supply of mapping changes for a given subset of the MABs. Each such link should enable the DITR-Site-QSD to quickly and reliably push all mapping changes to the MR, and for the MR to be sent any changes again, if it somehow didn't get some of them. I guess TLS-protected SCTP might be a good protocol for this - RFC 3436. TCP would be OK, but it would be blocked for a moment by the loss of a single packet. Theoretically, the MR needs only a single two-way link from each DITR-Site-QSD which handles a set of MABs, but it should have two such links, with two such DITR-Site-QSDs - for Justin (Just In Case).

Also, it would be possible for a MR to use the Z arrangement to pass on the mapping information to other MRs.

In this way, an MR could get real-time mapping feeds for one or more MABs and so be "full database" for these, while still sending queries to DITR-Site-QSDs regarding other MABs. This enables the operators to have some flexibility.

Say the MR was in New Zealand, and there were some MABs G, H and I which for some reason it wanted to be full-database for. Maybe these MABs are run by MABOCs who lease the space to SPI-using EUNs to which this ISP wants to respond very quickly for each new communication session. There may be some other MABs J, K and L for which the ISP doesn't mind so much if the session establishment takes a few tens of milliseconds longer, by relying on some reasonably "nearby" DITR-Site-QSDs.
Maybe some MABs are used primarily or solely for SPI-using EUNs who are almost always in a distant country, and for which this ISP's customers hardly ever send packets. There's no problem having these MABs handled as usual, on a query basis, like J, K and L - but the ISP can still, for whatever reason, choose to have the MR running full mapping databases for selected MABs G, H and I.

Stage 2 needs a DNS-based system so MRs can find DITR-Site-QSDs
---------------------------------------------------------------

In Stage 1, there are no MRs, because no ISPs or EUNs run ITRs. The only ITRs are the DITRs which are run directly or indirectly by MABOCs.

In Stage 3 - the Plan-C approach which is entirely optional for the new Plan-D DRTM arrangement - the real-time mapping feed arrangements need to be manually configured, since they involve TLS SCTP sessions with multiple DITR-Site-QSDs. However, if Stage 3 involves only some MABs being handled with mapping feeds, then the MR will still need to query some DITR-Site-QSDs regarding the other MABs.

I think the way the MR chooses these DITR-Site-QSDs needs to be automated. This will be done by the same method described here for Stage 2 - where the MR is purely caching, and therefore needs to know:

1 - What all the MABs are - and this will change over time as more space is converted into "edge" SPI space to be managed by the Ivip system.

2 - For each MAB, two or so of the closest, currently reachable DITR-Site-QSDs which will reliably respond to map requests.

I think that in normal operation, a MR should send a single query to one of the close DITR-Site-QSDs - because normally (99% of the time or more), it will get a response back from this DITR-Site-QSD in 50ms to 70ms or so. I assume that such times are not a significant problem for delaying some initial packets at the ITR while mapping is retrieved. If no response comes back in 100ms or so, the MR should send another query, perhaps to the other close DITR-Site-QSD for this MAB.
The purpose of this DNS-based lookup system is to enable each MR to automatically and securely determine all the parts of the global unicast address range which are covered by MABs, and then to find, for each MAB, the complete list of DITR-Site-QSDs which will answer mapping queries for this MAB.

There may be a better way of doing this than the DNS-based system I suggest here. For brevity, I outline its capabilities, and assume that there is a reasonably efficient way of implementing something like this. If you think it is impractical, or have a better suggestion, please let me know. This part of the system can surely be done one way or another.

Here's the rough plan, for IPv4. There is a special DNS domain such as drtm4.arpa. It has three levels of subdomains, representing 224 * 256 * 256 ~= 14.7 million /24s - probably not counting 127.0.0.0/8 and 10.0.0.0/8.

The authoritative servers for this system could initially be one suitably programmed pair of servers, but later, as the system became widely used, this could be split up so there are more than two sets of authoritative servers for each domain and sub-domain. Generally, there won't need to be huge numbers of servers, and I assume a suitably programmed server could implement the whole thing, or part of it, without needing 14.7 million separate zone files.

For typographic niceness, I will use 3 digit integers for numbers in these domain names. The domains are:

   000.000.000.drtm4.arpa.  }  2^16 /24s of 0.0.0.0/8.
   001.000.000.drtm4.arpa.  }
   002.000.000.drtm4.arpa.  }
    ...                     }
   255.255.000.drtm4.arpa.  }

   000.000.001.drtm4.arpa.  }  2^16 /24s of 1.0.0.0/8.
   001.000.001.drtm4.arpa.  }
    ...                     }
   255.255.001.drtm4.arpa.  }

   etc. etc.

   000.000.203.drtm4.arpa.  }  2^16 /24s of 203.0.0.0/8.
   001.000.203.drtm4.arpa.  }
    ...                     }
   255.255.203.drtm4.arpa.  }

   etc.

   000.000.223.drtm4.arpa.  }  2^16 /24s of 223.0.0.0/8.
    ...                     }
   255.255.223.drtm4.arpa.  }

A query for any of these DNS domains will return some information.
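From the listing above, the labels appear to be the /24's octets in reverse order (as with in-addr.arpa), zero-padded to three digits. A one-line helper, assuming that naming scheme:

```python
def drtm4_name(a, b, c):
    """DNS name for the /24 a.b.c.0/24 under the (proposed)
    drtm4.arpa domain, with reversed 3-digit octet labels."""
    return "%03d.%03d.%03d.drtm4.arpa." % (c, b, a)

# e.g. the /24s of 203.0.0.0/8 run from drtm4_name(203, 0, 0),
# which is "000.000.203.drtm4.arpa.", up to drtm4_name(203, 255, 255).
```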
Exactly how to do this with DNS, I have not yet considered. Each of these /24s is either in a MAB - is "edge" SPI space under Ivip - or is not, and is therefore regarded as "core" space (not counting 127.0.0.0/8 and RFC 1918 addresses).

If the /24 is in a MAB, the reply conveys to the MR:

1 - Something to signify this /24 is in a MAB.

2 - The details of the MAB:

    a - Its base, such as "203.34.0.0".

    b - The length of the prefix, between 8 and 24 - such as "16".

3 - Information which allows the MR to find either all the DITR-Site-QSDs for this MAB, or at least a reasonable number of them in the MR's vicinity. This could be via one of three or so methods:

    a - A list of IP addresses.

    b - A list of FQDNs, each of which the MR can look up and each of which may return multiple IP addresses.

    c - Maybe some special arrangement of b, by which the MR can find DITR-Site-QSDs in one or more areas the MR chooses to consider.

4 - A caching time.

5 - Maybe something to identify the MABOC which runs this MAB.

If the /24 is not in a MAB (it is in a MAB-free zone), then the reply contains:

1 - Something to signify this /24 is not in a MAB.

2 - The lowest /24 in this MAB-free zone, such as "214.0.0".

3 - The highest /24 in this MAB-free zone, such as "216.157.128".

4 - A caching time.

MABs and non-MAB zones can be scattered through the space in any way, with the following rules. Every /24 is either in a MAB or a non-MAB zone. Multiple MABs or multiple non-MAB zones can appear in any sequence.

MABs are on binary boundary prefixes, since they must be easily advertised in the DFZ. MABs never cross /8 boundaries, and are always binary aligned: 99.0.0.0/15 is OK; 99.1.0.0/15 is not allowed. MABs are not hierarchical or overlapping.

Non-MAB zones can have arbitrary lengths and can start and end on any /24 - they are not defined in terms of being prefixes.
What this part of the DNS returns needs to be very tightly controlled - because if an attacker could change some of it, they could divert traffic to their own ITRs and wreak havoc. MRs must use DNSSEC when querying this part of the DNS.

When a MR boots, it can now use this DNS system to initialise itself. It can start by looking up 000.000.000.drtm4.arpa. If this /24 is in a MAB-free zone, then the returned information will tell the MR the /24 at the end of that zone. So the MR looks up the next /24. This may be in another MAB-free zone, or may be in a MAB. This process can continue and the MR will walk through the IPv4 global unicast address space, jumping from the start to the end of each MAB or MAB-free zone. If this takes too long, then the MR can start in several places at once and run the stepping processes in parallel until the whole 0.0.0.0 to 223.255.255.0 range has been covered.

Soon after introduction, Ivip-managed SPI space might involve a handful of MABs scattered in various parts of the IPv4 global unicast address space. Later, with wide adoption, perhaps as much as half of the space may be covered by MABs. (There will always be plenty of IPv4 space devoted to single IPv4 addresses for millions of residential and SOHO DSL, fibre etc. customers - which will be PA and doesn't need to be on SPI space.)

MABs should not be too small, since we want each MAB, on average, to cover many micronets. Still, even if a MAB is a /24, and provides only 10 micronets, it is still doing a good job of scalable routing, since without Ivip, the EUNs which use these 10 micronets would have needed 10 /24 prefixes of PI space instead.

MABs probably shouldn't be too big. I don't have a figure in mind, but if a big MAB, such as a /16 or /12, involves very large amounts of traffic, then it could make DITR workloads harder to handle. In general, we want a DITR to either advertise a MAB or not advertise it.
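The boot-time walk just described can be sketched as follows, with the /24s numbered 0 to 224 * 256 * 256 - 1 and a hypothetical lookup() function standing in for the DNSSEC-validated query and its two reply formats:

```python
LAST_24 = 224 * 256 * 256    # /24s from 0.0.0.0 to 223.255.255.0

def walk_mabs(lookup):
    """Walk the IPv4 global unicast space, jumping from the start
    to the end of each MAB or MAB-free zone.  lookup(n) returns
    ("mab", base_24, prefix_len) or ("free", first_24, last_24)
    for the n-th /24, mirroring the replies described above."""
    mabs = []
    n = 0
    while n < LAST_24:
        kind, x, y = lookup(n)
        if kind == "mab":
            mabs.append((x, y))      # (base /24, prefix length)
            n = x + 2 ** (24 - y)    # first /24 past this MAB
        else:
            n = y + 1                # first /24 past the free zone
    return mabs
```

Since each step jumps over a whole MAB or MAB-free zone, the number of lookups is proportional to the number of zones, not to the 14.7 million /24s - which is why the walk (possibly parallelised from several starting points) should be quick.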
Splitting MABs to load-share the traffic between multiple DITRs creates a risk that the multiple longer prefixes will not be aggregated and will be a further burden on the DFZ control plane.

Also, for full-database QSDs (those serving DITRs and any MRs which get the full mapping feed for one or more MABs), a large MAB with more than a few tens of thousands of micronets will require a large snapshot file to be downloaded at boot time, or if the mapping database for this MAB somehow gets out of synch due to loss of updates which can't be recovered from using a "Lost Payload" server. But this paragraph only applies to the option of MRs having mapping feeds - which is not required any more with DRTM.

The MR now has a list of all the MABs. The caching time of these DNS replies will cause the MR to periodically check with the DNS to ensure it takes notice of any changes to the "edge" space - the sum total of all the MABs.

For each MAB, the MR will obtain a list of IP addresses - either of all the DITR-Site-QSDs which are authoritative for this MAB, or perhaps a subset of these, chosen by region, so the MR still has quite a few to choose from. Initially, the MR can use any of these addresses, but as it has time after boot-up, it should determine which are the two or three "closest" ones to use for each MAB.

It could do that with ping, but perhaps a slightly fancier protocol could be used: the MR could send a specially formatted map request to the DITR-Site-QSD and firstly verify that the reply is to the effect that this DITR-Site-QSD will accept queries from this MR. Secondly, it would measure the delay time, which is a good enough measure of closeness for it to choose two or more "nearby" DITR-Site-QSDs to use in the future for looking up the mapping of SPI addresses matching this MAB.

As mentioned above, the DITR-Site-QSD could also have an HTTPS file which lists the MABs it accepts queries for.
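The closeness test might look like this. It is only a sketch: probe() is a hypothetical stand-in for sending the specially formatted map request, checking that the QSD agrees to serve this MR, and timing the reply:

```python
def pick_nearest(candidates, probe, keep=2):
    """Probe each candidate DITR-Site-QSD for one MAB.  probe(qsd)
    returns the measured RTT in seconds, or None if the QSD refused
    to serve this MR or the probe timed out.  Keep the `keep`
    fastest ones for future map queries for this MAB."""
    timed = []
    for qsd in candidates:
        rtt = probe(qsd)
        if rtt is not None:
            timed.append((rtt, qsd))
    timed.sort()                      # fastest (closest) first
    return [qsd for _, qsd in timed[:keep]]
```

Keeping two or three rather than just the fastest one matches the earlier point: the MR normally queries only the closest, but needs an immediate fall-back if no reply arrives within 100ms or so.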
Another elaboration is that when the DITR-Site-QSD sends a map reply, it can contain a number indicating how busy this DITR-Site-QSD is. In this way, MRs can choose alternative DITR-Site-QSDs which are not too busy, if the ones they are using at present return a high value for how busy they are.

Perhaps DNS is not the best way to do this - but I am sure a secure, automatic self-discovery and self-configuration system can be created so a MR can learn all the MABs and determine where to send queries concerning each MAB. If the MR found one such DITR-Site-QSD was unreliable, it could easily repeat the lookup, test the available DITR-Site-QSDs and choose one or more alternatives.

An elaboration to the MR -> DITR-Site-QSD protocol is that the MR needs to periodically check that each DITR-Site-QSD it has recently (inside the caching time) received map replies from is still alive. This is to handle a situation where a DITR-Site-QSD sends a map reply, and then, say three minutes later, dies. Then, at four minutes after the reply, a mapping change is sent for the micronet in the reply. Since this DITR-Site-QSD is not working, there is no mechanism by which the MR would find out about this mapping change.

So the MR should periodically, say once a minute, check that these recently used DITR-Site-QSDs are still alive (and have not been rebooted since they sent the reply). If the MR finds a DITR-Site-QSD is dead, or has been rebooted, it needs to resend the original queries to another DITR-Site-QSD, and if the replies are any different (indicating there was a change in the mapping which the MR would have been notified about by the original DITR-Site-QSD, had it been working) then the MR should send appropriate mapping updates to its queriers, so ITRs get the new mapping ASAP.
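That once-a-minute liveness check might be sketched like this. The boot-id carried in each reply and the alive() helper are my assumptions for illustration, not part of any defined protocol - they are just one way to detect both "dead" and "rebooted since it sent the reply":

```python
def check_recent_qsds(recent, alive):
    """recent maps each recently used DITR-Site-QSD to the boot-id
    seen in its last map reply; alive(qsd) returns the QSD's current
    boot-id, or None if it is unreachable.  Returns the QSDs whose
    cached replies must be re-verified via another DITR-Site-QSD."""
    suspect = []
    for qsd, boot_id in recent.items():
        current = alive(qsd)
        if current is None or current != boot_id:
            suspect.append(qsd)    # dead, or rebooted since the reply
    return suspect
```

For each QSD this returns, the MR would resend the original queries elsewhere and, if any reply differs, push the new mapping to its queriers.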
4 - With TTR Mobility
=====================

Please refer to this document for a discussion of TTR Mobility:

   Translating Tunnel Router Mobility:
   TTR Mobility Extensions for Core-Edge Separation Solutions
   to the Internet's Routing Scaling Problem

   Robin Whittle, Steven Russert
   2008-08-25, minor revisions 2010-01-12

   http://www.firstpr.com.au/ip/ivip/TTR-Mobility.pdf

In the above discussion I assumed that the MABOC companies are leasing space in their MABs to SPI-using EUNs which are non-mobile. However, exactly the same principles apply for micronets used for TTR Mobility. Any micronet can be used for TTR Mobility.

Generally, I guess, a mobile IPv4 device will only need a micronet of a single IPv4 address. The MN's (Mobile Node's) tunneling software will present an interface to its ordinary IPv4 stack with this micronet address. The applications in the MN will use this address, no matter what actual address the MN is connected to in its one or more access networks - such as by wired or WiFi Ethernet, 3G wireless or other link technologies.

The MN's applications can use this stable, globally portable, micronet address wherever it is, no matter what sort of access network it is using, and no matter how many levels of NAT these connections are behind. The MN can still use its micronet address even when its connection is on an SPI address.

Unlike other attempts to use a CES architecture for Mobility, TTR Mobility does not involve the MN being its own ETR or ITR. The MN tunnels to one or more (typically) nearby TTRs (Translating Tunnel Routers) with an encrypted 2-way tunnel. A single TTR at any one time is the ETR to which the MN's micronet is mapped. This TTR - and potentially other TTRs the MN also has tunnels to but which are not currently playing the ETR role for this MN - also accepts the MN's outgoing packets from its micronet address. The TTR forwards these to the rest of the Net.
If the outgoing packet is addressed to an SPI address, the TTR will either perform the ITR function on the packet, or forward it to a co-located ITR which will do so.

There would be multiple TTR companies, each selling TTR Mobility services to end-users. An IP cellphone or laptop etc. using TTR Mobility is also an EUN, even if it only gets a single IPv4 address. For IPv6, Ivip's mapping system deals in units of /64, so an IPv6 TTR Mobility MN will get a /64.

Each TTR company might have its own network of TTRs - or perhaps there could be TTR-operating companies running one or more sets of actual TTRs, but hiring out their capacity to TTR companies who actually sell services to users.

Here I assume each MN only gets a single micronet, but the TTR Mobility architecture supports each MN getting one or more micronets. If an end-user already had a micronet to use with their MN, then they would give their TTR company the credentials to control the mapping of that micronet.

So a company XYZ might lease one or more UABs (User Address Blocks) from one or more MABOCs, and split these into micronets of various sizes, mapping them however it likes - and giving various other organisations the credentials they need to change the mapping of any one or more of the micronets XYZ creates in their space.

For instance, those micronets used for non-mobile network multihoming might be controlled by a Multihoming Monitoring Company which XYZ hires to probe reachability of its networks via various ETRs, and to change the mapping to another ETR in another ISP's network if the currently used one appears to be incapable of taking packets to the destination network.

XYZ could use one of its micronets for a given MN. It may then contract a TTR company TAA to provide it with access to TAA's TTRs all over the world.
When it does this, XYZ configures its arrangement with its MABOC - through which it can send mapping changes for its micronets - so that XYZ still has ultimate administrative control over this micronet, but can give TAA a username and password, or whatever is required, so TAA's system can control the mapping of this micronet. If XYZ chooses another TTR company instead, it would cancel those credentials via its MABOC, and get another set to give to the new TTR company TBB.

There may be an IETF-standardized way in which all TTR MNs tunnel to their TTRs, and in which the TTRs can instruct the MNs to tunnel to new TTRs or participate in activities by which the TTR system can determine topologically where the MN is - for instance, to determine if it is too far from the current TTR and should try to contact another. Alternatively, there may be no IETF-standardized system for this, and each TTR company would have suitable software for XYZ to download into their cell-phone, laptop or whatever to perform these functions.

Generally, I anticipate a single MN will only operate with a single TTR company. However, it would be technically possible for the one MN to be set up for TTR services from multiple TTR companies. Each such service would involve a separate micronet. The owner of the MN - XYZ in this example - could supply their own micronet for the TTR company to use, or it could use a micronet provided by the TTR company. In the second case, the TTR company is also acting as the MABOC for this user's micronet.

TTRs will generally be located in, or topologically near (within hundreds or a thousand km of), the access networks wherever MNs might connect to the Net. Since an MN could connect literally anywhere, it follows that TTRs are ideally numerous and located, topologically, all around the Net. The TTRs of a given TTR company are controlled by a fancy management system which orchestrates the MN connecting to new TTRs when needed.
When the MN has successfully tunneled to a new TTR, the TTR company's control system uses its credentials to change the mapping of the micronet from the currently mapped TTR to the new one.

Generally, people think of CES and Mobility and imagine vast numbers of mapping changes - for instance whenever the MN gets a new access network, or a new address in the same access network. Even with a stationary 3G MN, it may get completely different IP addresses in topologically separate access networks, simply due to RF and traffic changes causing the MN to connect to a different base-station which uses a different IP gateway.

But with TTR mobility, a new address just means the MN needs to make a new tunnel to its currently used TTR. No mapping change is required. Many or most MNs may go from one year to the next without a mapping change for their micronet. As long as they are within some distance, such as 1000km or so, of the currently used TTR, there's probably no need to select a closer TTR and change the mapping to that.

Even when the MN suddenly finds itself on a new address, on the other side of the world from the current TTR, it still tunnels to that TTR and continues communications. This has higher latency and more risk of packet loss than using a closer TTR, but it will work. The management software in the TTR company will need to detect the topological location of the MN, and select one or more TTRs which are probably closer to it, for the MN to tunnel to. Only when the MN has successfully tunneled to a closer TTR will the TTR company's management system change the mapping. Communication sessions using the micronet address will continue without any disruption, since at the time the mapping is changed, the MN has tunnels to both the old and new TTRs.

It is notable that the criteria for placing TTRs are much the same as for placing DITRs.
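The make-before-break rule above can be condensed into a small decision sketch. The 1000km threshold is the rough figure mentioned above, and the function and parameter names are illustrative only:

```python
def choose_mapped_ttr(current_ttr, nearest_ttr, distance_km, tunnels,
                      threshold_km=1000):
    """Decide which TTR the micronet should be mapped to.  Within
    the distance threshold, keep the current TTR (a new access
    address only means a new tunnel, not a mapping change); beyond
    it, switch only once the MN's tunnel to the closer TTR is up."""
    if distance_km <= threshold_km:
        return current_ttr            # no mapping change needed
    if nearest_ttr in tunnels:        # make-before-break: tunnel ready
        return nearest_ttr            # now the mapping can be changed
    return current_ttr                # keep the distant TTR until then
```

Because the mapping only moves once both tunnels exist, sessions using the micronet address continue without disruption across the change.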
So a TTR company can also be a MABOC and run a bunch of TTRs, DITRs and so DITR-Site-QSDs at dozens or hundreds of separate sites all over the world. It needs some fancy software to manage the TTR system, and if it does not need ISPs to run ITRs for its MABs (which is not really necessary, since a good global system of DITRs will probably be fine), then it is ready for business.

A company could be a MABOC for non-mobile networks getting portability, multihoming and inbound TE - without any cooperation with any other organisation, other than its customers using ISPs which permit the forwarding of packets whose source addresses are SPI addresses.

If the MABOC company (not a TTR company which was also a MABOC) wanted ISPs to run ITRs to cover its MABs, then it would need to operate its DITR-Site-QSDs according to industry (presumably IETF) standards. If it wanted its customers to share ETRs provided by ISPs - which must operate with ITRs of other MABOCs - then its DITRs would need to comply with the industry standard.

However, these constraints do not necessarily apply to a TTR company. If the TTR company, with its own one or more MABs, has a bunch of TTR sites around the world - or wherever it wants to operate - then it doesn't necessarily need any ISPs to run ITRs covering its MABs. The TTR sites themselves are probably perfectly good places to put DITRs serving these MABs. In this case, the TTR company doesn't need to make its ITRs work with anyone else's ETRs, since the only ETRs it uses are implemented in its own TTRs. Also, the TTR company's mobile customers don't need any special arrangement with their access networks to send out packets with SPI source addresses - because these packets go through the encrypted tunnel to the TTR.
So a company with the required resources - a bunch of servers at suitable sites around the Net, each with direct access to the DFZ routing system as routers which can advertise the MABs and forward packets to other DFZ routers - could go into business on its own: without any need to accept mapping queries from ISP MRs, without any need to send mapping feeds to ISP MRs or Replicators, and without any need for its ITRs to tunnel packets to ETRs of other companies.

It needs suitable software for the various kinds of MN it plans to support, and it needs some fancy management software in its global system to run the whole system, orchestrate the software in each MN, do billing etc. But a TTR company such as this, with its own prefixes to use as MABs for its customers, doesn't need to interact with any other organisation - and could go into business initially without waiting for any new IETF standards. Multiple such independent TTR companies could operate, with differing technical standards. Each would advertise just one or a few MABs in the DFZ, but would be providing IPv4 mobility for typically up to 256 end-user MNs per /24 of space in these MABs.

This is another example of a CES architecture being deployed without relying on ISPs to take the lead. Neither the TTR companies, the MABOCs, nor their customers might care at all about the number of prefixes in the DFZ - but they will be building infrastructure and selling services which contribute to the solution of the routing scaling problem, while also providing a new form of global mobility - one which provides generally good path lengths and works with all protocols, without the need for upgrades to either IPv4 or IPv6 correspondent hosts.

5 - DRTM for LISP
=================

I don't assume the LISP team are interested in DRTM or anything like it, but I believe it would be applicable to LISP and would be superior to ALT or any other LISP mapping system I know of.
There's nothing in DRTM which relies upon aspects of Ivip which differ from LISP, such as Ivip's use of a single ETR address as the mapping, or Ivip's approaches to tunneling.

DRTM concerns Stages 1 and 2 above, with something like the DNS-based system I describe, in order that ITRs or Map Resolvers can discover the current MABs and find two or so (typically) nearby DITR-Site-QSDs for each MAB. Stage 3 could be done as well, but this goes beyond DRTM and involves pushing full feeds of real-time mapping information beyond the MABOC, with the potential for MRs to store mapping for some or all MABs. Except for NERD, in which the ITRs store all mapping information, the LISP team has very much avoided such concentrations of mapping, so I won't explore Stage 3 for LISP.

TTR Mobility would be possible with LISP-ALT or with LISP with DRTM. I will not pursue the LISP-MN approach to mobility, since it involves MNs being their own ETRs, which can't work behind NAT, and since it requires extremely rapid changes in ITR tunneling behaviour every time the MN gets a new access network address. TTR Mobility doesn't require frequent or rapid mapping changes, but the sooner the mapping can be changed, once the MN has tunneled to a new TTR, the sooner the MN can drop the tunnel to its previous TTR. TTR Mobility also requires that the TTR company be in charge of mapping - rather than the MN itself, as is assumed by LISP-MN.

Except for LISP-MN, I think LISP tends to assume that ETRs are the authoritative source of mapping information. This assumption doesn't fit DRTM exactly, since the MABOC actually runs the servers which authoritatively reply to mapping queries. Assuming ETRs were to continue as the authoritative source of mapping, LISP would need to be adapted so the ETRs have a method of securely communicating mapping information to the MABOC which runs the MAB covering the one or more EID prefixes the ETR is handling.
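The Stage 1 and 2 discovery step can be illustrated with a small sketch: an MR holds the current MAB list, and for each MAB learns (typically) two nearby DITR-Site-QSDs to query. This is my own rough rendering - the DNS naming convention shown (`qsd.<mab-label>.example`) is entirely invented, and real discovery would be more involved.

```python
# Hypothetical sketch of Stage 1/2 discovery: map a destination address
# to its covering MAB, then find nearby DITR-Site-QSDs for that MAB
# via a DNS-style lookup.
import ipaddress

class MapResolver:
    def __init__(self, mabs, lookup):
        # mabs: current list of Mapped Address Blocks;
        # lookup: DNS-style resolver function (name -> list of addresses)
        self.mabs = [ipaddress.ip_network(m) for m in mabs]
        self.lookup = lookup
        self.qsds = {}             # MAB -> nearby QSD addresses (cached)

    def qsds_for(self, addr):
        """Find the MAB covering addr and the QSDs to query for it."""
        ip = ipaddress.ip_address(addr)
        for mab in self.mabs:
            if ip in mab:
                if mab not in self.qsds:
                    # invented naming convention, for illustration only
                    label = str(mab).replace("/", "-").replace(".", "-")
                    self.qsds[mab] = self.lookup(f"qsd.{label}.example")
                return self.qsds[mab]
        return None                # not SPI space: forward normally

fake_dns = lambda name: ["198.51.100.1", "198.51.100.2"]
mr = MapResolver(["192.0.2.0/24"], fake_dns)
mr.qsds_for("192.0.2.55")          # two nearby QSDs for this MAB
```

The important property is the None branch: addresses outside all MABs need no mapping at all, so the MR never queries anyone about them.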
The MABOC would then be responsible for:

1 - Deciding which ETRs were in fact authoritative for a given EID prefix at any time. This may be tricky, since an end-user network could choose a different ISP at any time, and so need to use an ETR on a different address from what it previously used. Maybe it uses its own ETRs, in which case the ETR may have stored credentials the MABOC's mapping system will recognise whatever IP address it communicates from. If not, then the ETR is run by the new ISP and somehow needs to be given credentials which the MABOC's mapping system will recognise. This would involve security problems in trusting ISPs' ETRs to use those credentials only as the EUN wished - including ETRs of ISPs which the EUN previously used but no longer uses.

2 - Deciding what the final mapping would be for each EID prefix its MABs cover, considering that perhaps it hasn't heard from any ETRs for a while, or perhaps the commands it receives from two or more ETRs regarding a single EID prefix are contradictory.

3 - Reliably, securely and rapidly transmitting the new mapping to all its DITR-Site-QSDs - for use by its DITRs and by MRs in ISP and other EUN networks.

Before anyone could really think of LISP using something like DRTM, they would have to clarify the arrangements by which EID-using EUNs get their EID address space. As far as I know, there isn't a clear statement of how this would work for LISP. If, for instance, the plan was for small EUNs to obtain a /24 from an RIR and somehow run this as EID space on their own, then each such EUN will either need to become a MABOC itself, or contract a MABOC to do the mapping and DITR work for this prefix. DRTM could in principle work with very large numbers of MABOCs, each with their own DITR-Sites - but this is unlikely to be economic or practical.
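Responsibilities 1 and 2 above amount to a small arbitration policy. As a sketch only: the policy below ("newest report from a credentialed ETR wins; declare the mapping stale if nothing recent has been heard") is my assumption, not anything from a LISP draft, and the interfaces are invented.

```python
# Hypothetical sketch of MABOC responsibilities 1 and 2: accept mapping
# claims only from credentialed ETRs, resolve contradictory claims by
# timestamp, and treat long-silent prefixes as having no valid mapping.
import time

class MabocMappingAuthority:
    def __init__(self, credentials, max_age=600):
        self.credentials = credentials   # prefix -> set of trusted ETR ids
        self.max_age = max_age           # seconds before a report goes stale
        self.reports = {}                # prefix -> (timestamp, etr_addr)

    def report(self, prefix, etr_id, etr_addr, ts=None):
        """Accept a mapping claim only from a credentialed ETR."""
        if etr_id not in self.credentials.get(prefix, set()):
            return False                 # responsibility 1: not authoritative
        ts = time.time() if ts is None else ts
        cur = self.reports.get(prefix)
        if cur is None or ts > cur[0]:   # responsibility 2: newest wins
            self.reports[prefix] = (ts, etr_addr)
        return True

    def final_mapping(self, prefix):
        """Mapping to push to all DITR-Site-QSDs, or None if stale/absent."""
        cur = self.reports.get(prefix)
        if cur and time.time() - cur[0] < self.max_age:
            return cur[1]
        return None
```

Responsibility 3 - pushing `final_mapping()` results to every DITR-Site-QSD in real-time - is the transport problem DRTM itself addresses.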
I think it is unlikely that more than a dozen or so fully-fledged, independent systems of DITR-Sites would be built to give good coverage of the whole Net - say with 20 or more DITR-Sites each. Economies of scale and basic capital costs would tend to favour a smaller number of such systems, operated either by one MABOC serving the MABs of other MABOCs, and/or run by companies which are not MABOCs, but which contract to multiple MABOCs and so use their system of DITR-Sites to support hundreds or thousands of individual MABs.

Ivip has always assumed real-time mapping distribution to all ITRs which need it - so the mapping of a micronet is a single ETR address, and the source of mapping changes is external to the Ivip system. LISP has always assumed that real-time distribution of mapping to all ITRs which need it is impossible. No-one has ever indicated it was undesirable - just that they had practical concerns about scaling, "real-time synchronization" of a single mapping database etc. which they considered insoluble. I believe DRTM overcomes such objections.

DRTM is not just a mapping distribution system which delivers mapping to ITRs to be cached. It is also a scalable and secure approach to getting the mapping changes to all DITR-Site-QSDs in real-time - and, using the nonces in the query packet, securely sending mapping updates to the querier. Ivip has always had this ability to securely propagate mapping changes to ITRs in real-time - but this would be the first such system for LISP.

LISP designs - and the other CES architectures: APT, TRRP, TIDR and IRON-RANGER - are predicated on the assumption that ITRs can't be told about mapping changes in real-time. Therefore, the mapping is always a set of instructions by which ITRs figure out which ETR to tunnel packets for a given EID prefix to - rather than the ITRs being told in real-time exactly which ETR to tunnel the packets to.
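The nonce mechanism mentioned above can be sketched briefly. The message formats here are invented for illustration; the substance is only this: the QSD remembers which querier used which nonce, and echoes that nonce in any later update, so the querier can distinguish genuine updates from spoofed ones.

```python
# Sketch of nonce-authenticated mapping updates: an MR includes a nonce
# in its query; the DITR-Site-QSD caches (querier, nonce) and echoes
# the nonce in later updates for the same micronet.
import secrets

class QsdServer:
    def __init__(self, mapping):
        self.mapping = mapping     # micronet -> ETR address
        self.cachers = {}          # micronet -> {querier: nonce}

    def query(self, querier, micronet, nonce):
        self.cachers.setdefault(micronet, {})[querier] = nonce
        return {"micronet": micronet, "etr": self.mapping[micronet],
                "nonce": nonce}    # echoed nonce authenticates the reply

    def change_mapping(self, micronet, new_etr):
        """Real-time push to every MR that cached this micronet."""
        self.mapping[micronet] = new_etr
        return [{"to": q, "micronet": micronet, "etr": new_etr,
                 "nonce": n}       # MR accepts only if the nonce matches
                for q, n in self.cachers.get(micronet, {}).items()]

qsd = QsdServer({"192.0.2.0/28": "203.0.113.1"})
n = secrets.token_hex(8)
qsd.query("mr-1", "192.0.2.0/28", n)
updates = qsd.change_mapping("192.0.2.0/28", "203.0.113.9")
# mr-1 trusts updates[0] because it carries the nonce mr-1 chose
```

An off-path attacker who never saw the query packet cannot produce the nonce, so forged "updates" are discarded - without any heavier cryptographic machinery at the MR.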
DRTM makes this more complex mapping information unnecessary. In most respects, some part of the system other than the ITRs themselves is bound to be a better place to determine the tunneling behaviour of ITRs. I am not convinced it is the ETRs - which is why in Ivip the mapping is supplied by unspecified mechanisms chosen and implemented by the SPI-using EUNs. Ivip EUNs are free to program their ETRs, or any other part of their network, to control the mapping. It's just that I think for multihoming service restoration, or for TTR mobility, some other organisation (a multihoming monitoring company, or the TTR company) is a better place to detect reachability and to make mapping decisions.

For any multihomed EID prefix, LISP's more complex mapping involves the addresses of the two or more ETRs and then two sets of additional information. One controls what priority to give to each ETR if all ETRs seem to be working, or if one or more fails. The other is formally separate and concerns load sharing. However, these items of information perform similar functions, in that they tell ITRs how to choose between two or more ETRs which the ITR considers can handle traffic at present.

Ivip ITRs have no such information. The mapping information is a single ETR address. This means that Ivip has two potential disadvantages compared to LISP:

1 - Ivip ITRs on their own can't load-share traffic sent to a single micronet between two or more ETRs. Ivip's method of coping with this is for the incoming traffic to be made to go to at least two IP addresses in two separate micronets. Then, each micronet can be mapped to one of the ETRs to achieve load sharing and/or to steer different types of traffic, such as VoIP vs. SMTP, over different ISPs and links.
Although Ivip is assumed to involve EUNs being charged for each mapping update - probably a few cents - this arrangement may be superior to the LISP approach, since it can be dynamically adjusted from one minute to the next, to balance out varying traffic volumes over particular links. This may provide significantly higher utilization of available bandwidth, for a given risk of congestion, than the LISP approach - making the fee per mapping change a good deal.

2 - The LISP approach of individual ITRs making their own choices about tunneling for multihoming service restoration could sometimes produce superior connectivity to Ivip's approach, where all the ITRs are tunneling to the same ETR. For instance, if there was an outage which caused ITR-1 to be able to reach ETR-A but not ETR-B, and ITR-2 to be able to reach ETR-B but not ETR-A, then in principle, if the LISP ITRs could correctly detect this and make appropriate decisions, both ITRs could successfully deliver packets - whereas with Ivip, the micronet would be mapped to either ETR-A or ETR-B, so only one of the two ITRs could successfully deliver packets.

I tend to think such states in the routing systems between ITRs and ETRs would be transitory, and that it would be a rare occasion when the LISP approach produced better results than the Ivip approach. But nonetheless, the LISP system of more complex mapping and ITRs could in principle produce superior results.

My design choice with Ivip is to keep the mapping and the ITRs simple, use the alternative arrangement for inbound TE, and assume that little of importance is lost with Ivip's "one ETR or the other" approach to multihoming service restoration. Maybe, if LISP adopted something like DRTM, the designers would make the same decision - to use a single ETR address and so enable their ITRs to be simpler. Then, LISP would need either some new probing and reachability decision-making mechanisms - or to follow Ivip's lead and leave it to the EUNs to do this and make their own decisions about mapping changes.
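The load-sharing arrangement in point 1 above can be sketched as a simple rebalancing step the EUN (or an agent acting for it) might run periodically. This is my own illustration, not part of Ivip: the greedy assignment and all names are invented, and each change of a micronet's ETR would be a paid mapping update sent via the MABOC.

```python
# Hypothetical sketch of Ivip-style inbound load sharing: traffic is
# drawn to addresses in two or more micronets, and the EUN steers load
# by remapping each micronet to one of its ETRs from minute to minute.
def rebalance(load, etrs):
    """Assign each micronet to one ETR so measured load spreads out.

    load: micronet -> recent traffic (e.g. Mbps); etrs: available ETRs.
    Greedy heuristic: busiest micronets first, each to the ETR with the
    least load assigned so far.  Returns micronet -> ETR.
    """
    per_etr = {e: 0.0 for e in etrs}
    new_mapping = {}
    for mn in sorted(load, key=load.get, reverse=True):
        target = min(per_etr, key=per_etr.get)
        per_etr[target] += load[mn]
        new_mapping[mn] = target
    return new_mapping

new_map = rebalance({"mnet-a": 40.0, "mnet-b": 35.0}, ["etr-1", "etr-2"])
# the two micronets land on different ETRs, roughly balancing the links
```

Because each micronet's mapping is a single ETR address, the ITRs stay simple; all the intelligence lives in whatever runs this rebalancing, which can be changed without touching any ITR.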
Alternatively, perhaps the LISP designers would retain the current complex mapping, and perhaps add more information - such as telling ITRs exactly how to probe reachability of the destination network through the various ETRs, and how to make decisions based on the results - and work out how this can be done scalably and securely, especially if tens of thousands of ITRs are probing reachability to one or more networks behind the one ETR.

Most of DRTM concerns how the MABOCs make the effort to run DITR-Sites all around the Net, pushing mapping in real-time to these sites - and then making this available via DITR-Site-QSDs to ISPs in the area, to encourage the ISPs to run their own ITRs. This does not absolutely require the ISPs' MRs or their ITRs to receive mapping updates from the DITR-Site-QSDs. So LISP could continue to operate on the principle of ITRs and their MRs being given cacheable mapping information, but not being notified if the mapping changed during the caching time. This would make LISP MRs and ITRs somewhat simpler in this respect than Ivip MRs and ITRs - since they don't need to accept mapping updates.

However, I think this would forgo the opportunity of much better real-time control of ITR behaviour, and of reducing the mapping to a single ETR address - simplifying ITRs and moving the probing and decision-making functions somewhere else. Being able to perform the probing and decision-making functions more flexibly and centrally would give EUNs better control over their incoming traffic, and is the only obvious way of solving the scaling problems inherent in lots of ITRs doing reachability testing. With LISP's current approach, there is a scaling problem with a single ITR having to determine reachability of many ETRs, and a second scaling problem with the one ETR having to participate in some way in reachability testing by large numbers of ITRs.
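The contrast between the two caching styles above can be made concrete. As a sketch only (interfaces invented): a pure-TTL cache, as in the LISP style just described, serves possibly stale mapping until expiry, while an update-capable cache in the Ivip style is corrected the moment an update arrives.

```python
# Sketch of cacheable-until-expiry mapping vs update-capable caching.
import time

class TtlCache:
    """Caches mapping for a fixed lifetime; ignores any later changes."""
    def __init__(self, ttl):
        self.ttl, self.entries = ttl, {}   # micronet -> (expiry, etr)

    def put(self, micronet, etr):
        self.entries[micronet] = (time.time() + self.ttl, etr)

    def get(self, micronet):
        entry = self.entries.get(micronet)
        if entry and entry[0] > time.time():
            return entry[1]                # may be stale within the TTL
        return None                        # expired: re-query a QSD

class UpdatableCache(TtlCache):
    """Also accepts pushed updates, so entries never go stale."""
    def update(self, micronet, new_etr):
        if micronet in self.entries:       # refresh the entry in place
            self.put(micronet, new_etr)
```

The extra complexity of `update()` is the price of real-time control; the simpler `TtlCache` is what LISP MRs and ITRs could keep using under DRTM, at the cost of serving outdated mapping until the TTL runs out.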
Assuming LISP made full use of the real-time nature of DRTM mapping, including having mapping updates sent when required to MRs and then propagated to ITRs, LISP could use external, or new internal, mechanisms to test reachability of EUNs via multiple ETRs, and to make decisions based on more parameters, as set by the EUNs, than would be practical to implement in all ITRs. So LISP would come to resemble Ivip more - and perhaps to modularly separate the control of mapping from the CES architecture itself, as is the case with Ivip.

With TTR mobility, it is clear that the TTR company needs to control the mapping, since the MN itself is responding to TTR company instructions about which TTRs to tunnel to - and the MN is not as reliably connected to the Net as the TTR company's management system.