[rrg] IRON-RANGER, an interesting Core-Edge Separation (CES) architecture

Robin Whittle <rw@firstpr.com.au> Wed, 31 March 2010 05:26 UTC

To: RRG <rrg@irtf.org>, "Templin, Fred L" <Fred.L.Templin@boeing.com>

*** 104 kbyte message ahead. ***

Short version:    Fred's IRON-RANGER CES architecture has been
                  significantly redesigned in the last few weeks.

                  Here I present an explanation of a subset of this
                  architecture, as I understand it, since I think
                  this subset is novel and interesting.

                  I call this subset IRON-RANGER-lite (I-R-lite) -
                  and I suggest some improvements to this which
                  may be different from what Fred has in mind.

                  I compare the scaling properties of this modified
                  subset of I-R to Ivip, which is a more complex
                  architecture.  I think Ivip scales better - because
                  I-R's overly simple architecture does not allow
                  any "cached concentration", or "aggregation" of
                  the querying demands of ITR-like functions - so
                  all these queries need to be handled directly by
                  the authoritative query servers: IR(VP) routers.


                  These improvements simplify the I-R system - by
                  having End User Networks (EUNs) tell VP companies
                  directly which IR(EID) routers their EID should
                  be mapped to.  This removes the need for IR(EID)
                  routers to register the EIDs with IR(VP) routers.

                  With a relatively minor addition to this - Map
                  Update  messages, which are similar or identical
                  to Map Reply messages - I-R would have a real-time
                  mapping system which would be as fast as Ivip.

                  Real-time mapping distribution to the ITR-like
                  devices - IR(LF/GW) routers - enables the mapping
                  to be a single IR(EID) (AKA "ETR") address.  If the
                  system is then designed to allow only a single such
                  address, then IR(LF/GW) devices can be much
                  simpler, since they don't need to test reachability
                  or choose between multiple IR(EID) routers.  EUNs
                  would then be responsible (as they are in Ivip) for
                  changing the mapping in real-time to achieve their
                  portability, multihoming, inbound TE or other
                  goals.  (This would enable I-R to support TTR
                  Mobility as efficiently as Ivip - and better than
                  LISP - since the MN could drop the tunnel to the
                  old TTR earlier than if non-real-time mapping was
                  used.  Note: TTR Mobility only requires mapping
                  changes when the MN moves more than about 1000km.)

                  This "I-R-lite" subset of the current IRON-RANGER
                  design, and its modified version with real-time
                  mapping could be used for dynamic TE on a
                  minute-by-minute basis - irrespective of whether
                  the mapping is simplified to always be a single
                  IR(EID) address.

                  Still, this souped up real-time version of I-R
                  won't scale as well as Ivip.  The reason is I-R's
                  simplicity: the ITR-like devices - IR(LF/GW) - work
                  directly with the authoritative query servers -
                  IR(VP).  In Ivip, the ITRs do not work directly
                  with the authoritative QSAs - their query burden
                  is "aggregated" by the use of QSR intermediaries.
                  Optional QSCs between the ITRs and QSRs further
                  reduce the burden on QSRs.

                  Another scaling problem of I-R is the "scattergun"
                  approach of tunneling initial traffic packets to
                  all (I assume 3 or so) IR(VP) routers.  This is
                  somewhat inefficient, but more importantly sets
                  strict limits on the number of IR(VP) routers
                  there can be for each VP.  Ivip has no such
                  inefficiencies or restrictions in the number of
                  authoritative QSA query servers.


I guess few people have been closely following the discussions
between Fred and me on his proposal, originally "RANGER", but now
known as "IRON-RANGER" (I-R).  I wouldn't want to be examined on all
we have discussed.

Fred responded to my critiques with clarifications and design changes
and the result is a Core-Edge Separation architecture which I think is
interesting.

I still think Ivip is the best CES architecture (msg06219), but I
think the current I-R design, or at least my understanding of the
following subset of it, is superior to LISP-ALT or to any other LISP
approach I know of.

Here is my description of my understanding of what to me are the most
interesting aspects of Fred's design.  I will refer to it as "I-R-lite".

I haven't checked this with Fred.  Please await Fred's response
before drawing any inferences about what he is planning.

  - Robin          http://www.firstpr.com.au/ip/ivip/



I-R terminology
===============

Here is a version of the roles which an IRON router can perform,
based on the discussions which led to (msg06351).  There is a minor
update to the IR(VP) description.


  IR(LF) - Forwarding traffic packets which arrive from its local
           routing system.  Advertises all I-R "edge" space in
           its local routing system.

           (All IRON routers are capable of doing this, but it is
           still a distinct role.)


  IR(GW) - Like an IR(LF) role except that the IRON router
           advertises the I-R "edge" space in the DFZ (that is: in
           the DFZ, to DFZ routers of other ASes but its own).
           (This is similar to LISP's Proxy Tunnel Routers and
           Ivip's DITRs, though a DITR normally only advertises a
           subset of the "edge" space.)


  IR(VP) - One of typically 3 or so IRON routers which handle
           a given Virtual Prefix of I-R "edge" space.  This means
           it:

             1 - Accepts registrations of EID prefixes which fall
                 within this VP, from the IRON routers which
                 perform the IR(EID) role for these EID prefixes.

             2 - Accepts tunneled traffic packets from IRON routers
                 performing the IR(LF) and IR(GW) functions and
                 then tunnels the packet to the correct IR(EID)
                 role router for the EID prefix which covers the
                 packet's destination address.  (This may be the
                 same router, so no tunneling is required - but
                 usually it will be another IRON router.)

             3 - After 2, sends mapping for the matching EID to the
                 IR(LF) or IR(GW) role router.

            If it has no registered EID prefix for a traffic packet
            it receives, it drops the packet.

   IR(EID) - Accepts tunneled packets from IR(VP), IR(LF) and IR(GW)
            role routers and then delivers the packet to the
            destination network.  A multihomed end-user network
            will have two or more ISPs and there will be an IR(EID)
            role router for each such ISP.  (Whether this router is
            in the ISP network or in the end-user network is not
            fixed, but it is always on a conventional "RLOC" - AKA
            "core" address.)



How I-R-lite differs from the full I-R proposal
===============================================

"I-R-lite" is described here as a CES system for either IPv4 or IPv6,
where each of these protocols is handled independently by a separate
I-R-lite system.  While I-R as Fred plans it is intended to be a
single system linking IPv4 and IPv6, for simplicity and clarity,
I-R-lite involves one complete independent system for IPv4 and
another for IPv6.

(Ivip is the same - two independent systems.  However, I don't rule
out there being some kind of linkage between the two if the extra
complexity is justified by some important inter-working or transition
benefits.)

In this description I won't go into details of encapsulation or of
how to solve the resulting Path MTU problems.  I figure they can be
solved one way or another.  I am assuming something like Fred's SEAL
tunneling would be used, where the outer header's source address is
that of the Ingress Tunnel Endpoint.  This is different from what I
plan for Ivip.

The I-R-lite subset I describe below does not support (at least in
any simple, inexpensive, manner) any ISP BR filtering of packets by
their source address.

Fred's proposal also apparently involves "recursion".  I don't really
understand this and I can't see what purpose it might serve in
scalable routing, so it is not part of I-R-lite.

Fred intends I-R to support Mobility in some way - but as far as I
know, this is not along the lines of TTR Mobility (Translating Tunnel
Router), which is the only way I know of doing global mobility, for
IPv4 and IPv6.  Ivip uses TTR Mobility for this, and so could LISP.
Maybe I-R-lite could use TTR Mobility, but below I ignore Mobility
and focus only on providing scalable support for non-mobile end-user
networks (EUNs) gaining portability, multihoming and inbound TE.

Fred has discussed his intention that each ETE (Egress Tunnel
Endpoint) router, such as the second in these pairs:

   IR(LF/GW) -> IR(VP)
   IR(VP)    -> IR(EID)
   IR(LF/GW) -> IR(EID)

will be able to establish whether the ITE (Ingress Tunnel Endpoint),
the first of these pairs, is authorised to handle packets from a
given prefix.  I understand this as meaning:

  ITE-A tunnels a traffic packet which has a source address of XXX.

  ETE-B receives this, and assumes it came from ITE-A.  (However, it
  may not have, since perhaps it has never received a packet from
  ITE-A, and so this is the first SEAL encapsulation with a new
  SEAL-ID which can only later be used to verify packets as having
  come from the same ITE).

  ETE-B is somehow able to ascertain which prefixes of address space
  ITE-A is authorised to handle.

  ETE-B drops the traffic packet if its source address does not
  match one of these prefixes.

I can't imagine how ETE-B could do this - much less how it could do
so in a tiny fraction of a second, because any longer would
unreasonably delay the traffic packet.  Also, if the ITE is an IR(GW)
router, it could be handling packets sent from any network in the
world - so how could the ETE decide which source addresses were and
were not acceptable for traffic packets tunneled by an IRON router
performing the IR(GW) role?

I-R currently has no business plan for how the operators of IR(GW)
routers pay for their operation.  If an IR(GW) router advertised all
VPs into the DFZ, then it is doing work for all the companies which
control VPs.  An IR(GW) could also advertise just some VPs in the
DFZ.  In both cases, there are no current plans for the operators to
charge the VP companies.  I think this won't work.  In Ivip,
companies which run DITRs will charge the MABOC (Mapped Address Block
Operating Companies) whose MABs these are, and furthermore will
provide the MABOCs with itemised traffic figures for individual
micronets, so the MABOCs can charge their end-user networks according
to the DITR traffic which was addressed to the micronets of each
end-user network.

I am not promoting my "I-R-lite" idea as a solution to the routing
scaling problem.  I am suggesting that this subset of Fred's proposal
is interesting and worthy of the attention of RRG folks.

I also discuss some things which I would do to improve I-R-lite which
I think differ from Fred's plans.

Then I discuss how well I think this modified version would scale -
comparing it to Ivip.


I-R-lite
========

I describe I-R-lite using the current terminology Fred and I
developed recently for the various roles an IRON router ("IR") could
perform - though perhaps Fred doesn't see the need for every such
term.  Any statement below about "I-R" (Fred's IRON-RANGER
architecture) also applies to my "I-R-lite" unless there is a note to
the contrary.


Traffic packets are tunneled from one IRON router to another.  I-R
uses Fred's SEAL approach to tunneling and Path MTU discovery, but in
this description I will simply assume encapsulated tunneling can be
done, without going into details.

The goals of I-R-lite include:

   Scalable support for end-user networks using an "edge" subset
   of the global unicast address space to achieve portability,
   multihoming and inbound traffic engineering (TE).

   Works the same in principle for IPv4 and IPv6.  However,
   IPv6 address allocation would probably be simpler, since
   all the "edge" space could come from fresh prefixes, while
   for IPv4, it must come from potentially very numerous
   prefixes scattered throughout the global unicast range.

   Complete decentralisation of all crucial aspects of the system.
   (I-R, as currently designed, does not achieve this because it
   relies on a single file to be read by all IRs, and for a
   centralised source of changes to this.  So I-R-lite doesn't
   achieve this either - but I suggest improvements which would
   do so.)

The non-goals include:

   Mobility - just for this discussion.  (In fact, IRON-RANGER
   and I-R-lite would support TTR Mobility.  My modified
   version of I-R-lite with real-time mapping would support it
   marginally better, as does Ivip.)

   Support for ISP BR source address filtering.

   Real-time control of tunneling.  Therefore, I-R-lite, like
   LISP (and unlike Ivip) involves mapping for a multihomed
   EUN consisting of multiple IR(EID) router addresses, with the
   ITE device having to choose which of these to use.   This
   raises many problems, and I don't suggest it is the best way
   to run a CES system, but the LISP team and Fred seem to be
   happy with it.  (However my modified version of I-R-lite
   supports real-time control of mapping, like Ivip.)



Both LISP and Ivip use the terms "ITR" and "ETR" in the same way:

  ITR - Ingress Tunnel Router.  Accepts traffic packets and tunnels
        them to an ETR.

  ETR - Egress Tunnel Router.  Receives these tunneled packets and
        delivers them to the destination network.

I-R does not use these terms, but the terms ITE and ETE (Ingress /
Egress Tunnel Endpoint) mean much the same thing.

LISP, Ivip and I-R are all capable of handling packets sent by hosts
in networks which lack ITRs (ITEs).  In order to do this, each
architecture has a special subset of these devices which, instead of
advertising "edge" space in an ISP or End User Network (EUN),
advertise it to neighbouring ASes - they advertise "edge" space in
the DFZ.

  Ivip:    DITR (Default ITR in the DFZ).  Previously known as
           OITRDs and before that, erroneously, as "Anycast ITRs
           in the Core/DFZ".   DITRs generally only advertise a
           subset of the "edge" space.

           Ivip "edge" space is advertised in the DFZ by these
           DITRs as separate "Mapped Address Blocks" (MABs),
           each of which typically covers the space used by many
           EUNs.  An IPv4 /16 MAB would typically provide "edge"
           space for thousands or tens of thousands of EUNs.

  LISP:    PTR - Proxy Tunnel Router.  As far as I know, each PTR
           advertises all the "edge" space.  There is no specific
           LISP term equivalent to Ivip's "MAB", but such a term
           is needed.  Dino recently used "coarse prefix" to refer
           to the same thing.

  I-R:     IRON routers performing the IR(GW) role.  As with LISP
           I understand each such router advertises all the
           "edge" space.  The closest thing to a MAB in I-R is
           a "Virtual Prefix" - "VP".

In LISP and I-R, for IPv6, if the "edge" space is always in a
currently unused part of the address space, then each PTR or IR(GW)
router could probably advertise the whole "edge" space to its
neighbouring DFZ routers with a single prefix, or just a few prefixes.

In the same circumstance, Ivip's DITRs would not do this, since each
one typically only advertises a subset of the total set of MABs.
This is because each DITR site is run by a company which typically
works for a subset of the MABOCs.

This means that for IPv6, in the above scenario, both LISP and I-R
could have very large numbers of "coarse" prefixes or VPs and still
only add one or a very small number of prefixes to "the DFZ routing
table" (shorthand for the set of globally advertised prefixes which
every DFZ router needs to handle).

Both Ivip and LISP-ALT involve a single tunneling arrangement: from
the ITR to the ETR.  LISP-ALT optionally allows the ITR to send (in a
different form of tunneling) the initial(*) traffic packets to the
ETR, via the ALT network.  But otherwise, LISP-ALT involves ITRs
tunneling traffic packets directly to ETRs, just like Ivip.

  * "Initial traffic packets" is shorthand for those packets
    an ITR or ITE device receives before it has mapping for how
    to properly tunnel such packets to the final destination device -
    which for Ivip is a single ETR, for LISP is one of several ETRs
    and for I-R is one of several IRON routers performing the IR(EID)
    role for the matching prefix.

I-R is fundamentally different from Ivip and LISP in this regard.
The initial traffic packets are tunneled by an IR(LF/GW) router to an
IR(VP) router which tunnels them to the correct (of potentially
several) IR(EID) routers.

This is the general process, but it is possible that the IRON router
playing the IR(VP) role (and there are typically 2, 3 or 4 such
routers for any one VP) is also the correct router of the potentially
multiple IR(EID) routers for the EID prefix which the packet is
addressed to.  In this case, there is no need for a second stage of
tunneling.  Also, if the IR(VP) role router is also acting as an
IR(LF) or IR(GW) router, it may accept the traffic packet from its
local network or the DFZ with this role, and so have no need to
tunnel it to a separate IR(VP) role router, since this router also
happens to be one of the IR(VP) role routers for the VP which covers
the traffic packet's destination address.

As soon as the IR(VP) router receives such a traffic packet, it
regards it as an implicit Map Request, and sends back some "mapping"
information to whichever IRON router (performing an IR(LF) or IR(GW)
role) tunneled the packet to the IR(VP) router.   This was initially
described as a "route redirection" message, but I will refer to it as
simply a Map Reply message.

In the following example, I refer to an IR(LF) router - which
advertises all the "edge" space in the routing system of whichever
ISP or end-user AS network it resides in.  The same example would
apply to an IR(GW) router which does the same thing - except that it
advertises all "edge" space in the DFZ.

In this example, an IR(LF) router advertises all the "edge" space in
its local routing system and so receives a traffic packet addressed
to 44.44.01.02.  The IR(LF) router recognises this as being within
the "edge" subset of the global unicast address range, but does not
have any mapping for an EID prefix which covers this address.   It
does this by knowing already every VP in the system, and recognising
that this address falls within one of those VPs.

In this example, there is a VP 44.44.0.0 /16 - though perhaps this is
rather large for a VP - maybe Fred intends they be somewhat smaller,
such as a /18 or /20, to better share the load on a greater number of
IR(VP) routers.  The load depends very much on the traffic volumes,
not so much on how much address space is within each VP.

The EID prefix which matches this destination address is 44.44.01.00
/28 - but the IR(LF) router doesn't have any knowledge of this.  If
it did, it would have mapping for this EID prefix, and wouldn't need to
tunnel the packet to an IR(VP) router.

The IR(LF) router tunnels the packet to *all* of the IRON routers
which are playing the IR(VP) role for this VP: 44.44.0.0 /16.  (How
they know the addresses of these IR(VP) routers is described below.)
The reason for tunneling to all these IR(VP) routers is to avoid
the need for buffering the packet, while ensuring that at least one
copy of the packet will arrive at a working IR(VP) router.  (See the
most recent discussions with Fred about this.  I referred to this as
the "scattergun" approach.  It raises a few inefficiency - and
therefore scaling difficulties - but I think it is a good way of
solving the problem of IR(VP) routers being suddenly unreachable or
dead.)

Each IR(VP) has (ideally) complete knowledge of the mapping for all
the EIDs in its VP.  (How it gains this is discussed below.)  For
each traffic packet it receives from an IR(LF) or IR(GW) router, it
sends back a Map Reply packet with mapping information for the
matching EID.

(Whether it should do this for every such packet, or just for the
first few, could be debated.  Since there could be a flurry of such
traffic packets, it is probably good enough to send mapping for just
the first 2 or 3.)

Below, I assume there are always 3 IR(VP) routers for each VP.

Assuming, for simplicity, that there is a single IR(EID) router, the
paths taken by an initial traffic packet are:

                /--->--IR(VP)1-->--\
               /                    \
  SH-->--IR(LF)---->--IR(VP)2-->----IR(EID)--->--Destination-EUN.
               \                    /
                \->--IR(VP)3-->----/

Fig 1.
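
Purely as an illustration of Fig 1, here is a minimal Python sketch of
the IR(LF) decision for an initial traffic packet, assuming exactly 3
IR(VP) routers per VP.  The VP table contents, the seal_encapsulate()
placeholder and the nonce field are assumptions of mine, not part of
Fred's SEAL specification.

  import ipaddress, os

  # Assumed VP table: VP prefix -> the 3 IRON routers playing the
  # IR(VP) role for it (learned from the VP file / delta checks).
  VP_TABLE = {
      ipaddress.ip_network("44.44.0.0/16"):
          ["192.0.2.1", "198.51.100.1", "203.0.113.1"],
  }

  def seal_encapsulate(packet, outer_dst, nonce):
      # Stand-in for SEAL encapsulation; details not modelled here.
      return {"outer_dst": outer_dst, "nonce": nonce, "inner": packet}

  def handle_initial_packet(packet, dst_addr):
      """IR(LF) role: no cached mapping covers dst_addr, so tunnel a
      copy to every IR(VP) router for the matching VP (scattergun)."""
      dst = ipaddress.ip_address(dst_addr)
      for vp, ir_vp_routers in VP_TABLE.items():
          if dst in vp:
              nonce = os.urandom(8)   # lets the IR(EID) drop duplicates
              return [seal_encapsulate(packet, r, nonce)
                      for r in ir_vp_routers]
      return None                     # not I-R "edge" space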

Presumably, there will be a method by which the IR(EID) router can
ignore the second and third copies of these packets, and only send
the first one to the destination EUN.  I don't know how Fred plans to
do this, but I will assume something can be added to the SEAL
tunneling protocol to make it easy for the IR(EID) to recognise these
three packets as all resulting from the one initial traffic packet.

This can't be done by simply looking at the traffic packet, since
perhaps the sending host sent several identical packets.

Probably the way to do this is something like this:

  The IR(LF) creates a nonce for this particular traffic packet
  and includes it in the SEAL headers when it tunnels it to each
  IR(VP) router.

  This can't be the SEAL ID, since the IR(LF) maintains separate (random
  start, monotonically incrementing) SEAL ID sequences for each
  IR(VP) router.

  Each IR(VP) router includes this nonce in the packet it sends to
  the IR(EID) router.

  The IR(EID) router uses the nonce to identify unique traffic
  packets and so to discard the second and subsequent copies
  it receives, sending only the first to the destination EUN.

This looks OK, but it adds to the length of the SEAL header.  See
below for a gotcha when there are two or more IR(EIDs).  Such nonces
are only needed for these initial packets which get tunneled to all 3
or so IR(VP) routers - the nonce is not needed for traffic packets
tunneled directly from an IR(LF) router (which in this section of the
discussion includes all IR(GW) routers) to an IR(EID) router.
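
A minimal Python sketch of how an IR(EID) router might use such a
nonce to suppress the second and subsequent copies.  The 10 second
hold time is my own guess, not something Fred has specified.

  import time

  class NonceFilter:
      """Drop duplicate copies of one initial traffic packet relayed
      via several IR(VP) routers, keyed on the nonce the IR(LF) put
      in the SEAL header."""
      def __init__(self, hold_seconds=10.0):
          self.hold = hold_seconds
          self.seen = {}              # nonce -> time first seen

      def first_copy(self, nonce):
          now = time.monotonic()
          # forget nonces older than the hold time
          self.seen = {n: t for n, t in self.seen.items()
                       if now - t < self.hold}
          if nonce in self.seen:
              return False            # duplicate - discard
          self.seen[nonce] = now
          return True                 # first copy - deliver to the EUN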


Before, while or after each IR(VP) router tunnels the traffic packet
(perhaps with the nonce I suggested above) to the IR(EID) router, it
also sends a Map Reply packet to the IR(LF) router:


                /---<--IR(VP)1
               /
         IR(LF)----<--IR(VP)2
               \
                \-<--IR(VP)3

Fig 2.

The map reply packets are secured with the SEAL-ID the IR(LF) router
used when tunneling to each IR(VP) router.

    (I recall that SEAL-IDs are 32-bit integers, starting at some
    random value by any particular ITE router at the time it first
    tunnels a packet to any particular ETE router.  Then the number
    is incremented for every packet tunnelled from this ITE to ETE.
    Off-path attackers are assumed not to be able to guess the
    starting or the current value.  There may be some gotchas in how
    the routers at each end easily recognise valid and invalid
    SEAL-IDs without having to cache each one.  Fred and I discussed
    this in the past month or two, and I recall there was probably a
    way of doing it.  However I think it would be tricky or perhaps
    impossible to implement by using simple counters to frame a range
    of acceptable values, rather than by caching the use of each
    value along with a caching time, or time of use - to give the
    effect of maintaining a timer for each individual SEAL-ID.)
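
Here is a sketch of the per-value caching that paragraph ends with, as
I imagine an IR(LF) router might apply it to incoming Map Replies: each
SEAL-ID it used towards an IR(VP) router is remembered along with a
time, and a Map Reply is accepted only if it quotes one of those recent
values.  The 30 second lifetime and the single-use rule are my own
assumptions.

  import time

  class SealIdChecker:
      def __init__(self, lifetime=30.0):
          self.lifetime = lifetime
          self.outstanding = {}       # seal_id -> time it was used

      def record_sent(self, seal_id):
          # called when tunneling a packet to an IR(VP) router
          self.outstanding[seal_id] = time.monotonic()

      def accept_reply(self, seal_id):
          t = self.outstanding.get(seal_id)
          if t is None or time.monotonic() - t > self.lifetime:
              return False            # unknown or stale SEAL-ID
          del self.outstanding[seal_id]   # single use
          return True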

All traffic packets this IR(LF) receives before it gets a "map reply"
are "initial" packets.  As soon as the IR(LF) has mapping in its
cache which covers the EID prefix - in this case 44.44.01.0 /28 -
then all further traffic packets it receives which match this EID
prefix are no longer "initial" and so are not tunneled to all the
IR(VP) routers for the matching VP.  All these subsequent traffic
packets are tunneled to the IR(EID) router:


  SH-->--IR(LF)---->------------->--IR(EID)--->--Destination-EUN.

Fig 3.
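
A sketch of the cache behaviour behind Fig 3: once a Map Reply has
been accepted, traffic matching that EID prefix is tunneled straight
to the IR(EID) router until the caching time runs out.  The field
names and the single-address mapping are simplifications of mine.

  import ipaddress, time

  class MappingCache:
      def __init__(self):
          self.entries = {}    # EID prefix -> (ir_eid_addr, expiry)

      def store(self, eid_prefix, ir_eid_addr, cache_seconds):
          self.entries[ipaddress.ip_network(eid_prefix)] = (
              ir_eid_addr, time.monotonic() + cache_seconds)

      def lookup(self, dst_addr):
          dst = ipaddress.ip_address(dst_addr)
          for prefix, (ir_eid, expiry) in self.entries.items():
              if dst in prefix and time.monotonic() < expiry:
                  return ir_eid   # tunnel directly, as in Fig 3
          return None             # still "initial" - scattergun, Fig 1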

For a multihomed EUN, there will be two or more IR(EID) routers, so
the process could be more complex.  Depending on how the priorities
and weightings or whatever are defined, and the individual
perspective of each of the three IR(VP) routers, it is possible that
one or more IR(VP) routers will choose to tunnel the packet to a
different router IR(EID)1 than one or more of the other IR(VP)
routers, which tunnel their copies of the traffic packet to IR(EID)2.

Then instead of the initial packet flow of Fig 1, we could get:


                /--->--IR(VP)1-->--\
               /                    \
  SH-->--IR(LF)---->--IR(VP)2-->----IR(EID)1--->--Destination-EUN.
               \                                 /
                \->--IR(VP)3-->----IR(EID)2-->--/

Fig 4.

This would result in a duplicate copy of the traffic packet arriving
at the destination EUN.  I assume this cannot be tolerated, so I
guess Fred will devise a way of preventing this.

   Note: below I suggest I-R adopting Ivip's approach of the mapping
         containing just a single address, that of one IR(EID)
         router.  If this is adopted, then this duplicate packet
         problem can't occur.

If both IR(EID)1 and IR(EID)2 were at the destination EUN, then
there would be a simple solution, since they could compare notes
about nonces and so determine which was the first arrived copy of the
traffic packets and which were duplicates which could be dropped.
For instance, the two functions of IR(EID)1 and IR(EID)2 could be
performed by the one router.  Each would have a different "core"
address of course, IR(EID)1 from PA space from one ISP and IR(EID)2
from PA space from another ISP.

However, if IR(EID)1 and IR(EID)2 are not in the same place or device
- such as if they were in separate ISP networks - then I think the
problem of duplicate packet delivery to the destination EUN is a
serious "gotcha".  I think it could only be solved by having a
special router function at the EUN to distinguish between duplicate
packets, and by the IR(EID) routers tunneling packets to it, with the
nonce I suggest above which would be generated for each initial
traffic packet by the IR(LF) router.


The IR(VP) routers somewhat resemble a "Default Mapper" (DM) in APT.
 However, in APT, the DM was within the same (ISP, typically) network
as the ITR, and I recall the ITR simply forwarded the traffic packet
to one of the potentially multiple DMs in its network.  I also recall
that the DM interpreted the mapping information, chose a particular
ETR to tunnel the packet to, and then in the Map Reply message to the
ITR, told it simply to tunnel packets to this ETR if they matched
the EID prefix which was also in the Map Reply message.


I-R's forwarding of initial packets also resembles the use of
LISP-ALT where traffic packets are sent on the ALT network.  I think
the ALT network is a bad idea, since the paths can easily criss-cross
the globe and because there are fundamental difficulties making it
scale well, while maintaining generally short paths while also
avoiding single points of failure.  In LISP-ALT, there isn't an exact
equivalent to an IR(VP) router or an APT DM - since the ALT network
would deliver the packet to one of the correct ETRs.

I-R involves tunneling initial packets directly from the IR(LF)
router to the IR(VP) router (actually to all the 3 or so IR(VP)
routers) and then directly from the IR(VP) router(s) to the IR(EID)
router(s).  This tunneling is via the Internet - there are two such
tunnels, or really three parallel paths, each with two
tunnels in series, as shown in Fig 1 and Fig 4.

LISP-ALT's delivery of initial packets is by forwarding over an
overlay network composed of Internet tunnels between ALT routers.
The traffic packet has to be forwarded typically by multiple ALT
routers, and this involves multiple Internet tunnels, since there is
a tunnel between each router.  There could be half a dozen or a dozen
such tunnels - maybe more, it depends on how the ALT system is
structured, which has never been described in detail.


There needs to be a careful choice about how many IRON routers
perform the IR(VP) role for each VP.

  1:  Too few.  Single point of failure.

  2:  Better, but still pretty highly strung with dependence on
      only two VP routers being alive and reachable - and lots
      of EIDs prefixes depend on this.

  3:  Better still - maybe a good choice.

  4:  Maybe a good choice.

  5:  Probably an excessive number.


I suggest 3 or 4 as good choices.  More than this and there are
efficiency concerns, since each initial packet needs to go to every
IR(VP) router.  Also, for a choice of 4, assuming all the IR(VP)
routers are up and reachable, this quadruples the workload of the
IR(EID) routers in handling initial packets and the nonces I suggest
above.   Furthermore, since there are likely to be two or perhaps
more IR(EID) routers, and since different IR(VP) routers might choose
a different IR(EID) router, there would be more trouble with
duplicate initial packets going over the final links to the
destination network.

Choosing a higher number of IR(VP) routers has one advantage in
addition to increasing robustness.  It will also tend to mean that
the shortest path from the IR(LF) router to the IR(EID) router(s) via
any of the IR(VP) routers will tend to be shorter - since the more
IR(VP) routers there are (presumably scattered widely around the
Net), the greater the chance that one will result in a short total
path from the IR(LF) to it, and then to the IR(EID).

Below, I will assume there are exactly 3 IR(VP) routers for each VP.
 I also assume that these are geographically and topologically
located in a diverse manner, so an outage which makes one unreachable
is unlikely to affect the other two.

I will also assume that all the EUNs which use I-R "edge" space are
multihomed - typically with two ISPs and so with two IR(EID) routers,
but occasionally with 3 or 4.


This "I-R-lite" subset of IRON-RANGER is a novel and interesting
arrangement.

Every IRON router can perform the IR(LF) role, and some of them are
configured to do the same to the DFZ, and so perform the IR(GW)
role.   Neither of these roles involves the router initially knowing
any mapping at all.  All they need to know is:

   1 - A complete list of VPs.

   2 - For each VP, 3 (in my example) IRON routers which are
       performing the IR(VP) role for this VP.

Then, after they tunnel traffic packets to all the IR(VP) routers for
a given VP, they get back a Map Reply which covers a specific EID
prefix - and don't need to bug the IR(VP) routers again about traffic
packets whose destination addresses match this EID, for a period
specified by the caching time in the Map Reply message.

Fred suggested that all IRON routers discover the above by
downloading a single file which contains all this information - when
they boot - and then doing regular checks, which I call "delta
checks" with some network of centrally coordinated servers to find
out how the file changes over time.

For a full-scale deployment with tens of thousands of VPs, perhaps
100k of them - the size of this file is going to be a few megabytes,
and the rate of change is going to be quite slow.  The file would
contain either IP addresses or FQDNs of the IRON routers playing the
IR(VP) role.

I think this is practical, but there are objections to the centralised
nature of this one file, which I discuss below.  For now, I assume a
single file and that each IRON router would do a delta check, on its
own schedule unsynchronised with other IRON routers, every 10 minutes.

This is my choice for an example.  Exactly what the time would be, or
should be, hasn't yet been discussed.
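
Purely to make the boot-time download and the 10-minute delta checks
concrete, here is a Python sketch.  The URL, the X-Serial header and
the one-line-per-VP response format are hypothetical inventions of
mine - no such interface has been specified.

  import time, urllib.request

  VP_FILE_URL = "http://vp-file.example.net/vps"   # hypothetical

  def fetch_vps(since_serial=0):
      """Full VP file (serial 0) or only the changes after a serial."""
      url = "%s?since=%d" % (VP_FILE_URL, since_serial)
      with urllib.request.urlopen(url) as resp:
          serial = int(resp.headers.get("X-Serial", "0"))
          lines = resp.read().decode().splitlines()
      table = {}
      for line in lines:   # "44.44.0.0/16 192.0.2.1 198.51.100.1 ..."
          vp, *ir_vp_routers = line.split()
          table[vp] = ir_vp_routers
      return serial, table

  # boot: full download, then unsynchronised 10 minute delta checks
  serial, vp_table = fetch_vps()
  while True:
      time.sleep(600)
      serial, changes = fetch_vps(since_serial=serial)
      vp_table.update(changes)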


Let's say that at 3:00 UTC a particular IRON router 22.22.88.88 is
configured to be one of the 3 IR(VP) role routers for the VP previously
discussed: 44.44.0.0 /16.

By 3:10 UTC, in theory, we can assume that every IRON router knows
this.

This information is used for two purposes:

  1 - As described above, for an IR(LF) or IR(GW) router to
      decide which 3 (in my example) IRON routers to tunnel
      an initial packet to, based on the traffic packet's
      destination address matching a given VP.

  2 - As not so far mentioned, how the 2 (or perhaps 3 or 4)
      IR(EID) routers for a given EID in this VP register themselves
      with each of the three IR(VP) routers for this VP.

Registration will take some time.  Fred roughly described a way of
doing it, involving a message with some crypto stuff (such as using
signatures and PKI, I think) so the IR(VP) router could tell from the
message sent by the IR(EID) router that it really was authorised as
performing this role for a given EID.  Below I suggest some
improvements to this, but for now I am discussing the I-R-lite subset
of what I understand to be Fred's I-R design.

Let's say it takes each IR(EID) router up to two minutes to register
itself with the newly established IR(VP) router.  Now, for Justin
(Just In Case) we add a 3 minute fudge factor and decree that within
15 minutes of an IR(VP) role for a router being made available on the
master list (and via delta checks) that any self-respecting IR(EID)
router for an EID within this VP can be assumed to have completed its
registration with the new IR(VP) role router.

This enables a simple solution to the problem of IR(LF) and IR(GW)
routers tunneling packets to newborn IR(VP) routers before they have
had time to get registrations for all their EIDs:

   IR(LF) and IR(GW) routers will wait 15 minutes after the
   appearance of a new IR(VP) router before tunneling any
   traffic packets to it.

However, since it could take 10 minutes for any one IR(LF) or IR(GW)
router to discover that the new IR(VP) router has been established,
then this blows out the maximum time before the IR(VP) router is
doing its full workload to 25 minutes.  This can be fixed by all IRON
routers knowing the current UTC and by the master file and its delta
updates including the UTC time each IR(VP) role for a given VP was
established.  Then, we would have a 15 minute time before each IR(VP)
role was fully operational.
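
A sketch of that fix: if the master file (and the delta updates)
carries the UTC time at which each IR(VP) role was established, an
IR(LF/GW) router simply refrains from tunneling to that router for the
first 15 minutes, regardless of when it happened to learn of it.  The
record layout here is my own.

  import time

  REGISTRATION_GRACE = 15 * 60   # 2 min registration + fudge factor

  def usable_ir_vp_routers(vp_entry, now=None):
      """vp_entry: list of (ir_vp_address, established_utc) pairs
      from the master file.  Only routers past the grace period
      receive initial traffic packets."""
      now = time.time() if now is None else now
      return [addr for addr, established in vp_entry
              if now - established >= REGISTRATION_GRACE]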

  (Fred did not want an IR(VP) role router to tunnel a traffic packet
   to any other IR(VP) router just because it lacked a registration
   for an EID which matches its destination address.  I think this
   dropping of such packets is a good idea.)

One thing I haven't explored with Fred is the creation of the
"mapping" information, and how the 3 or so IR(VP) routers might work
together on this.

I understand that each IR(VP) role router would receive an EID
registration from each of the typically 2 or more IR(EID) routers for
a given EID.  Somehow, the IR(VP) router (perhaps after comparing
notes with the other 2 or so IR(VP) role routers) must develop
"mapping" which can be sent to the IR(LF) and IR(GW) routers.

I understand from previous discussions that the mapping will resemble
that of LISP:

  The EID: start address and the length of the prefix in bits.

  The addresses of the typically 2 or perhaps more IR(EID) role
  routers for this EID.

  Information on preferences and weightings to control load sharing
  between them, including no load sharing: sending all packets to
  IR(EID)1 and not IR(EID)2, unless IR(EID)1 is dead, unreachable
  or unable to get packets to the destination network.

     (IRON-RANGER seems to have the same problems as LISP in the
     ITR / IR(LF) / IR(GW) routers having no direct way of testing
     the reachability of the destination network through each
     ETR / IR(EID) router.  Ivip has no such problem, since there
     is a single ETR address in the mapping, and the end-user network
     is responsible for changing the mapping in real-time.  The end-
     user network, or some company they appoint to do so, can easily
     test reachability of the network through each ETR, since it
     knows one or more hosts or routers on the network it can try to
     get a response from via each ETR.  LISP ITRs, APT ITRs/DMs and
     I-R IR(LF/GW) routers are given no information about a host or
     router in the destination network by which they could test
     actual reachability of the network through each ETR / IR(EID)
     router.)

It is not clear how the IR(EID) role routers give this preference or
weightings or whatever to each of the IR(VP) routers they register
with - or how the IR(VP) routers individually or collectively decide
on the mapping for this EID.  Somehow, each IR(VP) router must have
already computed mapping for each registered EID in its VP.  It needs
this when it tunnels a traffic packet addressed to this EID - and it
needs it to create the Map Reply it sends to IR(LF/GW) routers.
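
As I understand it, the mapping each IR(VP) router holds, and sends in
a Map Reply, would look roughly like the following.  The field names
are mine, and the priority/weight semantics are assumed to be
LISP-like rather than anything Fred has pinned down.

  from dataclasses import dataclass, field
  from typing import List

  @dataclass
  class IrEidLocator:
      address: str       # "core" (RLOC) address of one IR(EID) router
      priority: int = 1  # lower is preferred (LISP-style assumption)
      weight: int = 100  # load sharing among equal priorities

  @dataclass
  class MappingRecord:
      eid_prefix: str                   # e.g. "44.44.1.0/28"
      locators: List[IrEidLocator] = field(default_factory=list)
      cache_seconds: int = 900          # caching time for IR(LF/GW)s

  # In the modified version with real-time, single-address mapping
  # which I suggest, locators would always hold exactly one entry and
  # the priority and weight fields would disappear.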

In recent discussions, there was a difficult problem when the IR(LF)
router tunneled a packet to just one IR(VP) role router.  What if the
IR(VP) role router was dead, being rebooted, or unreachable?  The
traffic packet would be lost, and there is no reliable way of finding
out in a fraction of a second that this had occurred.

Fred's "scattergun" approach solves this nicely - the IR(LF) router
tunnels the traffic packet to all IR(VP) role routers (according to
the 15 minute start-up time arrangements just described).  So as long
as at least one of these IR(VP) routers is reachable and alive, the
traffic packet won't be lost.  The first Map Reply from any of these
tells the IR(LF) router not to tunnel packets to the IR(VP) routers
any more - but to use the mapping to tunnel them directly to the
correct IR(EID) router out of 2 (or perhaps 3 or 4) specified in the
mapping.

  (In the above discussion, for brevity, I used "IR(LF)", but
  all the above also applies to routers playing the IR(GW) role
  too.)


Fred also suggested that the central file should list every IRON
router.  I think this is unnecessary since in the functionality I
described above for I-R, all of which is part of my I-R-lite subset,
there is no need for IRON routers to know about all other IRON
routers.  They only need to know about the subset which is performing
the IR(VP) role for one or more VPs.

He mentioned there would only be 100k or so IRON routers, and that
these would be larger routers in ISPs, or similar.

Fred's ~100k estimate is subject to the critique that it forces all
this ITE work to be done by a relatively small number of routers,
when it would be desirable to have the option of spreading it out
over more numerous, lower capacity, less expensive devices which are
generally closer to sending hosts.

Below, in the improvements section I envisage many more IRON routers
- primarily to perform the IR(LF) role.


This single file arrangement, with a single source of information for
the delta checks (even though the delivery of the file and the delta
checks themselves might be farmed out in a distributed manner) is
subject to the critique of being overly centralised, and so having a
single point of failure.  Below, in the improvements section I
suggest some alternative arrangements which would not be subject to
these critiques.

I can't remember what Fred's design involves in terms of "Map
Updates".   An IR(VP) router may at some stage become aware that a
given IR(EID) registration has lapsed or been cancelled.  (Fred
hasn't described a cancellation arrangement, but I assume this could
be added securely as an extension of the registration arrangement, so
an IR(EID) role router could de-register itself.)

Does the IR(VP) role router maintain in its cache a record of every
map reply it sent, in a form by which it can find those sent in the
current caching time which concerned an EID prefix whose mapping has
just changed?   If so, can it then send a Map Update message, with
the new mapping for this EID, in a similar fashion to the Map Reply
message it sent to each such IR(LF/GW) router which tunnels an
initial packet to the IR(VP) router?
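
If Map Updates were added, the IR(VP) router would need to remember
whom it told what, roughly as in this sketch.  All the names here are
mine, and send_map_update() is just a placeholder for however the
update would actually be transmitted and secured.

  import time

  class ReplyLog:
      """Track which IR(LF/GW) routers were sent mapping for which EID
      prefix, so a mapping change can be pushed to them while their
      cached copies are still live."""
      def __init__(self):
          self.sent = {}     # eid_prefix -> {querier: cache_expiry}

      def note_reply(self, eid_prefix, querier, cache_seconds):
          expiry = time.monotonic() + cache_seconds
          self.sent.setdefault(eid_prefix, {})[querier] = expiry

      def push_update(self, eid_prefix, new_mapping, send_map_update):
          now = time.monotonic()
          for querier, expiry in self.sent.get(eid_prefix, {}).items():
              if expiry > now:
                  send_map_update(querier, eid_prefix, new_mapping)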


With these caveats, the subset of I-R I have presented here - my
"I-R-lite" subset - has some interesting properties:

  1 - All IRON routers can perform - and generally do perform -
      the IR(LF) or perhaps IR(GW) roles.  To do this, they need
      only to cache the mapping information sent by IR(VP)
      routers.

  2 - There is no central repository of mapping.  (Only a central
      repository of which IRON routers are performing IR(VP) roles
      for which VPs.)

  3 - Initial packets get delivered reliably and pretty quickly.
      With three or so IR(VP) routers, this means in the worst case the
      packet will traverse the Earth and come back again.  For
      instance, if the SH and IR(LF/GW) router is in South Africa
      and the closest operational IR(VP) router is in Vancouver,
      and the destination network and its IR(EID) router is in
      Italy, then the initial packets need to go from South Africa,
      to North America, and then back to Italy.

      The typical initial packet path for I-R will probably be
      longer than the typical outcome for Ivip, but I think it
      will typically be better than using LISP-ALT to deliver
      initial packets.  It will certainly be superior to the current
      LISP-ALT arrangement of ITRs dropping all initial packets until
      a new packet arrives after the map reply arrives.





Improving I-R-lite
==================

Splitting up the VP file
------------------------

The first obvious improvement is to decentralise the master VP file.
Having a single file, or a single organisation controlling files in
multiple locations, represents a single point of failure for the
entire system.  It would be better if a failure by one organisation
only affected a subset of the I-R "edge" space - the subset which that
organization is paid to be responsible for.

I-R "edge" space is all within VPs.  Presumably each VP has a single
organization which is responsible for it.  That VP organization is
the only one which would want to run this VP's 3 or so IR(VP) routers
- and the only organization which should be allowed to control them.

It seems likely that one organization will be responsible for more
than one VP, at least in IPv4 where various small (long prefix) VPs
might be carved out of the address range wherever possible.  In IPv6,
there's almost endless space, so all the VPs would be on fresh unused
space, and have vast address space within them - so each organization
could have a single VP with huge capacity.

However, if there is a lot of traffic to the VP, as there would be,
this would overload a single set of IR(VP) servers - so even with
IPv6, VPs will need to be kept small enough that the traffic for each
VP is within the scaling limits of the set of 3 or so IR(VP) servers.

One way or another, the VP organizations could decide which of some
smaller number of "VP file" companies they wanted their VP to be
handled by.  Then, for load sharing and splitting up the whole system
into smaller chunks, each IRON router would download a file from each
such "VP file" company, and likewise do the 10 minute or whatever
interval delta checks with each such company.

However, if there were more than a few dozen such "VP file"
companies, this means each IRON router has to do quite a lot more
delta checks to the servers of each such company.

The servers for the VP file companies could be discovered by DNS in
some way.

There's no free lunch here - but the current design of a single file
for all VPs is not the only way the same basic architecture could be
implemented.  Having some kind of flexibility and multiple smaller
files seems like a good idea.


Alternatives to the VP file
---------------------------

It would also be possible for IRON routers to discover all the VPs
and which 3 or so IRON routers perform the IR(VP) role for each VP by
a method similar to what I propose in Distributed Real Time Mapping.

Later, this will be fully described in:

  http://tools.ietf.org/html/draft-whittle-ivip-drtm

At present (version 01) this ID doesn't yet include this, so please
refer to the section "Stage 2 needs a DNS-based system so TRs (QSRs)
can find DITR-Site-QSDs (QSAs)" of:

  http://www.ietf.org/mail-archive/web/rrg/current/msg06128.html

This would involve each IRON router walking a special part of the DNS
which describes VPs and non-VP areas, where the DNS replies for the
VPs also contain by some means the IP addresses of the IR(VP)
routers.  Instead of "delta checks", there would be DNS cache
time-outs for the information which was discovered in this way, so
there would be a continual series of DNS requests for these items
from all IRON routers.   This has some scaling problems, because
there are larger numbers of IRON routers than there would be QSRs in
Ivip, but it is probably a valid way of replacing the file-download
with delta check approach with something more decentralised.  If
multiple IRON routers shared a single caching DNS resolver, then the
DNS resolver would cache a lot of this, so the authoritative
nameservers for these items wouldn't have to handle all the queries.
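
A very rough sketch of the DNS alternative, assuming - purely for
illustration - that each VP has a name such as 44.44.vp.example.net
whose address records are the IR(VP) routers for that VP.  No such
zone or naming scheme exists, and the walking of the VP / non-VP tree
is not shown; ordinary caching by a shared resolver stands in for the
delta checks.

  import socket

  def ir_vp_routers_for(vp_label, zone="vp.example.net"):
      """Resolve a hypothetical per-VP name to its IR(VP) addresses."""
      name = "%s.%s" % (vp_label, zone)  # e.g. "44.44.vp.example.net"
      try:
          infos = socket.getaddrinfo(name, None,
                                     proto=socket.IPPROTO_TCP)
      except socket.gaierror:
          return []                      # no such VP, or lookup failed
      return sorted({info[4][0] for info in infos})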

The delta-check to one or a few dozen servers (or more local servers
with the same information) would probably be more efficient than the
DNS approach.  It depends a lot on how many VPs there are, and how
rapidly all the IRON routers need to know about IRON routers taking
on new IR(VP) roles.


"Aggregating" the VP discovery process
--------------------------------------

I-R has no intermediate buffer layer between the IRON routers and the
source of this VP information.

If I-R had as many IR(LF) routers as Ivip has ITRs, this would
place a much greater load on I-R's VP discovery system (whether with
a file and delta checks or via DNS as suggested above).  This doesn't
happen in Ivip, because the ITRs don't need to know very much at all.
All they need to know is how to query one or more local (such as in
the same ISP network) QSR Resolving query servers, perhaps via local
caching QSC query servers.   The QSRs need to know about all the
MABs, which is the rough Ivip equivalent of I-R's VP.

I-R could be enhanced with some local or nearby server which enables
multiple IRON routers to more efficiently discover the VP file(s) and
changes to it.  This is a form of aggregation, to make it unnecessary
for a bunch of IRs in one area to all, individually, get files from
and send delta checks to some far distant server.   With the DNS
model, this can be easily achieved by multiple IRON routers sharing a
nearby caching resolving DNS server.

Another approach might be some IRON routers passing on the changes
they discovered to neighbours, but this needs to be done securely.

Both these approaches would reduce long-distance communications, but
would do little or nothing to reduce the workload of each IRON router
in keeping itself up-to-date with the VPs.

Probably the most efficient arrangement would be for multiple IRON
routers to connect to a nearby "resolving and notifying" server of
some kind, which took responsibility for directly sending all changes
to the VP information to each IRON router.  Then, each IRON router
doesn't need to ask, or poll (delta check) for changes.  It just sits
there and is securely told whenever there is a change.  The resolving
server would take responsibility for discovering all the VPs - by the
multiple files approach, or the DNS approach - and would be devoted
to either polling these (delta check) or in some other way getting
all the changes on a timely basis.  Then it would send out the
changes to all its client IRON routers.   Each IRON router would
probably use two or perhaps three of these for robustness, which
would multiply the total workload and the workload of each IRON router.

Note that this arrangement would be somewhat analogous to how
multiple (dozens to hundreds) of Ivip ITRs rely on two or three local
QSR servers.  However, with I-R, each IRON router is required to know
all the VPs and for each VP the 3 or so IR(VP) routers - which is a
more onerous task than that of the Ivip ITRs.

Ivip ITRs only need to know the full subset of the global unicast
address space which is "edge" space.  They don't need to know
anything about individual MABs.  In IPv6, a whole short prefix, such
as a /8 or /16, could contain all the edge space, so this becomes
trivially simple.  For IPv4, there will be MABs of various sizes
scattered all about.  Even then, the ITRs don't need to know specific
MABs, so if four are adjacent and can be covered by a single shorter
prefix, then the shorter prefix of "edge" space is all each ITR needs
to know.

Ivip ITRs don't need to know about where one MAB ends and another
starts, since they always send their mapping queries to an upstream
QSC or QSR query server which is in the same network, or in an
upstream ISP's network.  The QSRs need to know about every MAB, and
the addresses of 2 or so typically nearby authoritative QSA servers
for each MAB.

I think this last approach - multiple IRON routers sharing a single
"resolving and notifying" server of some kind - is the one which
holds most promise for both reducing the workload of each IRON router
and for reducing the burden a larger number of IRON routers would
place on whatever centralised or somewhat centralised systems specify
the VPs and which routers are performing the IR(VP) role for each VP.



More, cheaper, devices for the IR(LF) role?
-------------------------------------------

Fred envisages about 100k total IRON routers, most of which would be
performing the IR(LF) role and many of which would also be performing
the IR(EID) roles and I guess quite a few of which will be performing
IR(VP) roles for one or more VPs.

In order to reduce costs and spread the load better, it would be
desirable to have many more devices playing the IR(LF) role.  For
instance, if it could be done in software by cheap servers, as well
as in large, centrally located, hardware-based routers, then this
could be a lot more cost-effective and easier to introduce.  However,
to allow for ten times more IR(LF) role devices, there needs to be a
pretty lightweight method by which they discover all the VPs and
their IR(VP) routers.

As noted above, some kind of nearby "resolving and notifying" server
looks most promising.

If the global impact of each IRON router performing only the IR(LF)
role could be really minimised, and if the workload for each such
device could also be minimised, then the IR(LF) role could be
performed in a sending host which is not in itself on an I-R "edge"
address.

IPv6 allows the provision of vast amounts of address space in each
VP, so few are needed to get sufficient address space.  Still, it
seems that in order to split up the IR(VP) workload effectively,
there needs to be a large number of VPs for both IPv4 and IPv6.

Ivip ITRs only need to know what parts of the global unicast address
range are "edge".  This is done by them learning from their local QSR
the details of every MAB (Mapped Address Block), each of which is a
DFZ-advertised prefix.  They don't need to know anything more about
the MAB.  Any traffic packet addressed to "edge" space will be
handled by the ITR function.

I-R IR(LF) functions need not only to know the VPs (the closest I-R
equivalent to Ivip's MABs) but to know 3 or so IR(VP) routers for
each VP.  Unlike Ivip ITRs, they don't need to buffer traffic packets
when awaiting mapping - they simply tunnel the traffic packet to the
3 or so IR(VP) routers, which is a non-trivial amplification of the
basic workload of tunneling each traffic packet once.  IR(LF) routers
don't need to re-request mapping, as an Ivip ITR might if it gets no
response after a while, because each of the traffic packets tunneled
to IR(VP) routers is a map request.

I think the biggest scaling bottleneck in I-R is the reliance on a
small number of IR(VP) routers to handle initial packets.  The
"scattergun" approach means that it is a bad idea to have more than
about 3 or 4 such IR(VP) routers per VP.  Due to each such IR(VP)
router for a given VP getting every initial packet handled by any
IR(LF/GW) router, having more IR(VP) routers per VP doesn't achieve
load sharing - it just creates more work for the IR(LF/GW) routers,
and for the IR(EID) routers which have to handle 3 or more traffic
packets, one from each IR(VP) router.

This means that VPs need to be made small enough not to have so much
traffic as to overload the IR(VP) routers.

A MAB is advertised by a set of DITRs around the world, and it is up
to the MABOC (MAB Operating Company) how many DITRs it runs, and
where, to handle that part of the traffic sent to the MAB's micronet
addresses which comes from hosts in networks without ITRs.  The more
DITR sites which handle this MAB, the less work the DITR at each site
needs to do.

I think this means that I-R's VPs need to be more numerous than
Ivip's MABs.

The next bottleneck is that each IR(LF/GW) router needs to know about
all this larger number of VPs, and for each VP, needs to know 3 or so
IR(VP) routers.

Ivip ITRs need to know only what is "edge" or not ("core"). They
don't need to know exactly where MABs begin and end.  When expressing
this "edge" space as prefixes, the number of prefixes each Ivip ITR
needs to know could be reduced in some circumstances when
neighbouring MABs could be covered by a single shorter prefix.  For
instance, if there are two MABs:

   5.5.0.0   /17
   5.5.128.0 /17

the ITR only needs to know that 5.5.0.0 /16 is "edge" space.  ITRs
will discover this from their upstream QSCs or QSRs by a simple
mechanism I haven't yet designed.

   (For IPv4, it might be easier for each ITR to learn about what
   is "edge" space from its local QSR by downloading a single
   bit-map, with 2^24 bits (actually, a little less: 224 x 256 x 256
   = 14.6Mbits ~= 1.83 Mbytes), one for each /24, to flag whether
   each /24 in the global unicast address range is "edge" or not.)
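
A minimal sketch of such a bitmap, mainly to show the scale involved -
the helper names are mine and the download of the bitmap from the
local QSR is assumed:

   # One bit per /24 in 0.0.0.0 - 223.255.255.255: ~1.8 Mbytes.
   EDGE = bytearray(224 * 256 * 256 // 8)

   def idx(addr):                  # index of the /24 holding addr
       a, b, c, _ = (int(x) for x in addr.split("."))
       return (a << 16) | (b << 8) | c

   def is_edge(addr):              # should the ITR tunnel this?
       i = idx(addr)
       return bool(EDGE[i >> 3] & (1 << (i & 7)))

   # e.g. mark 5.5.0.0/16 - 256 consecutive /24s - as "edge":
   for i in range(idx("5.5.0.0"), idx("5.5.0.0") + 256):
       EDGE[i >> 3] |= 1 << (i & 7)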

A further bottleneck with I-R is that its ITR-like routers have a
greater need for updated information about VPs than Ivip ITRs have
about which parts of the address range are "edge".  This is because
with I-R, even if VPs remain reasonably stable, as Ivip MABs would,
there would be a degree of instability in which 3 or so IRON routers
are performing the IR(VP) role for each VP.

An Ivip DITR doesn't need to know about all the MABs in the Ivip
system or what all the "edge" space is.  It only needs to know which
MABs it is handling.

Compared to Ivip's MABs (or a simpler expression of what is "edge"
space or not), I-R is likely to involve a greater number of VPs which
every IR(LF/GW) router needs to know about.  The IR(LF/GW) router
needs to know more about each VP, and it needs to get more updates
about this information.

This makes it more difficult to introduce larger numbers of IR(LF/GW)
routers into I-R than it is to introduce larger numbers of ITRs into
Ivip.   This has the effect, in I-R, of concentrating work into a
smaller number of IR(LF/GW) routers than the number of ITRs (and
DITRs) which would handle the work in Ivip.  To avoid congestion,
each of these fewer IR(LF/GW) routers needs to be more powerful,
with greater bandwidth.

Also, the "scattergun" approach of I-R increases to some extent the
upstream bandwidth required compared to an Ivip ITR handling the same
traffic.

Due to Ivip's real-time mapping system, Ivip ITRs are simple compared
to the ITRs of LISP or the IR(LF/GW) routers of I-R.  The lack of
real-time mapping in LISP and I-R requires more complex ITRs, to
cache more complex mapping, and to do some kind of reachability
testing in order to choose for themselves which ETR (LISP) or IR(EID)
router to tunnel the packets to.

Also, Ivip ITRs can be on "edge" space, whereas LISP ITRs and IR(LF)
routers need to be on "core" addresses.

All these factors make it a lot easier with Ivip to have large
numbers of ITRs, closer to sending hosts than with LISP or I-R.  Ivip
ITRs can be in sending hosts (not behind NAT), which cannot be
attempted with LISP and I-R as currently described.

The use of large numbers of server-based ITRs in Ivip spreads the
load and enables this work to be done with less expense than by
concentrating it into a smaller number of expensive, dedicated,
routers - as is the current plan with I-R and LISP.


But even if this goal is achieved - more IR(LF) routers closer to
hosts, each handling less traffic and so being potentially cheaper
and/or less congested - this means that there will be more work for
the IR(VP) routers than there would be with fewer IR(LF) routers.

For instance, if an ISP had only 3 IR(LF) routers, on average more
traffic packets would be covered by currently cached mapping than if
the same traffic were spread over a larger number of IR(LF) routers.
If there were 20 such routers, each would see a smaller share of the
traffic, so it is less likely that any one of them would already have
the relevant mapping cached.

Due to the "unaggregated" exposure of the IR(VP) routers to every
single IR(LF/GW) router, I-R involves more map query and reply
activity, all other things being equal, than Ivip or perhaps LISP.
(Also, the Map Request in I-R consists of tunneling a potentially
bulky traffic packet to each of the 3 or so IR(VP) routers, and
continuing to do so until a map reply packet comes back from one of
them.)

LISP-ALT in its original form doesn't provide any such "aggregation",
since each ITR gets its mapping from the authoritative query server
(usually an ETR) via the ALT network.  However, LISP-ALT with Map
Resolvers would have such aggregation IF the Map Resolvers cached map
reply information they got from the authoritative query servers.  I
am not sure whether LISP-ALT's Map Resolvers are caching or not.

With Ivip (Distributed Real Time Mapping) there is a great deal of
this "aggregation", since dozens or hundreds of ITRs may use one or a
few QSRs, which are caching query servers.  (QSRs in turn will handle
fewer queries if there are caching QSCs between the ITRs and these
QSRs, but this doesn't alter the degree to which the QSRs reduce the
mapping query load on the authoritative QSA query servers.)


Map Updates, resulting in real-time mapping distribution
--------------------------------------------------------

IR(VP) routers - individually or perhaps working with the other
IR(VP) routers for the one VP - as part of making up their mapping
information to be ready to send to IR(LF/GW) routers, will at various
times change the mapping for a given EID prefix.  This would happen
when, for instance:

   1 - The EID prefix is defined by one or more IR(EID) routers
       registering it.

   2 - The EID prefix becomes undefined if no IR(EID) routers
       register it - including by not re-registering it at the
       rate required by the IR(VP) router.

   3 - One or more new IR(EID) routers register an existing EID
       prefix - so adding their addresses to the mapping for
       that EID prefix.

   4 - The registration of one or more IR(EID) routers expires
       due to them not re-registering in time.

   5 - The registration is cancelled.  (I am not sure if Fred
       intends an IR(EID) router to be able to cancel it, but
       I guess it could be part of the registration protocol.)

   6 - Some other reasons, such as changes to the nature of
       EID registrations which affect the currently undescribed
       methods by which the "preference and weighting" part of
       the mapping is generated.


When the mapping changes, it really needs to change in an identical
fashion for all the 3 or so IR(VP) routers - by some inter-router
protocol which is yet to be described and would be closely related to
the yet-to-be described EID registration protocol.

When the mapping changes, it would be possible for each IR(VP) router
to send out a "Map Update" message, similar in form to the Map Reply
message, and likewise secured by the SEAL ID of the tunneled traffic
packet which gave rise to the original Map Reply.  I don't recall if
this is part of Fred's design, but I think it would be possible and
desirable.

Then, if the IR(VP) router kept a record of which IR(LF/GW) routers
it had sent Map Replies to, concerning the prefix whose mapping has
just changed, then it could send out Map Update messages to those
routers.
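
A minimal sketch of the extra state this would need in each IR(VP)
router - the structure and names are my assumptions, not anything
from Fred's drafts:

   import time

   CACHE_TIME = 1800     # seconds; illustrative only
   NOTIFY = {}           # EID -> {IR(LF/GW): (SEAL-ID, expiry)}

   def record_map_reply(eid, lfgw, seal_id):
       # Remember who got a Map Reply, and with which SEAL-ID, so
       # they can later be sent Map Updates secured the same way.
       expiry = time.time() + CACHE_TIME
       NOTIFY.setdefault(eid, {})[lfgw] = (seal_id, expiry)

   def mapping_changed(eid, new_mapping, send_map_update):
       # Push a Map Update to every IR(LF/GW) router still caching
       # mapping for this EID; drop entries whose cache time is over.
       now = time.time()
       entries = list(NOTIFY.get(eid, {}).items())
       for lfgw, (seal_id, expiry) in entries:
           if expiry > now:
               send_map_update(lfgw, seal_id, new_mapping)
           else:
               del NOTIFY[eid][lfgw]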

There is some inefficiency in this, due to the 3 or so IR(VP) routers
each sending their own Map Update - identical in content, but with a
different SEAL ID - to each IR(LF/GW) router.  But perhaps that means
that this
"reverse scattergun" effect can be used to advantage - to assume that
there is no need for the IR(VP) router to expect a handshake,
acknowledgement or whatever from the IR(LF/GW) router, because
generally at least one of the Map Updates will arrive.

   (There's no need for IR(LF/GW) routers to acknowledge the
   receipt of a Map Reply message.  If all 3 or so Map Reply
   messages are lost, then the IR(LF/GW) will continue to tunnel
   traffic packets to the 3 IR(VP) routers, and presumably these
   will be sending back more Map Replies for these, so the process
   will be self-limiting without need for ACKs.)

These proposed "Map Update" messages will cause IR(LF/GW) tunneling
to be changed more rapidly to a better arrangement than the only
alternative: not sending the Map Update and waiting for the cached
mapping in the IR(LF/GW) routers to expire - at which time the
routers will request fresh mapping, by sending traffic packets to all
3 or so IR(VP) routers again until the first Map Reply arrives.

These proposed Map Update messages involve extra state in, and effort
by, IR(VP) routers.  However, it would enable several benefits,
including:

  1 - IR(LF/GW) routers could cache their mapping for a much longer
      time, since the cache time no longer controls the ability of
      the system to get fresh mapping to the IR(LF/GW) routers.

  2 - This extended cache time reduces the frequency with which
      IR(LF/GW) routers have their cached mapping time out, and so
      reduces their effort in tunneling traffic packets to all 3
      or so IR(VP) routers.

  3 - This in turn will significantly reduce the workload of the
      IR(VP) routers.

  4 - Point 3 means that VPs can be bigger and less numerous.

  5 - Point 4 means that there will be fewer VPs in the total
      list, enabling less work to be done by each IR(LF/GW)
      router, and therefore making these cheaper and more able
      to be installed in greater numbers, closer to hosts.

IF I-R was upgraded to do Map Updates from IR(VP) routers to
IR(LF/GW) routers, then this would make its mapping distribution a
genuinely real-time arrangement, like Ivip's.

If so, then I-R could adopt Ivip's very simple mapping information -
a single ETR address for Ivip, or a single IR(EID) address for I-R.
This would simplify IR(LF/GW) routers considerably.  This would also
externalise the detection of reachability and the decisions about how
to restore connectivity after a multihoming service failure, making
it the responsibility of end-user networks or of whoever they appoint
to do this.  I regard this as one of the major benefits of Ivip over
LISP and the current I-R design.


A different approach to "registering EID prefixes"
--------------------------------------------------

The current I-R design appears to be based on the same assumptions as
LISP - that the ETR (LISP) or IR(EID) (I-R) function is responsible
for the mapping of an EID it handles.

Therefore, in I-R, IR(EID) routers are assumed to be configured
securely, and given some authentication items - and then it is the
job of the IR(EID) router to register itself with the 3 or so IR(VP)
routers for the VP its EID prefix is within.

I have never thought of Ivip this way - and I think there's no reason
why I-R needs to work this way either.

The 3 or so IR(VP) routers which handle a given VP will definitely be
run by, or run for and controlled by, whatever single organization is
responsible for this VP - the "VP Company".  The rough equivalent in
Ivip is the MABOC (MAB Operating Company).

These IR(VP) routers are not going to accept registrations without
each registration passing stringent security arrangements - hence the
need to give the 2 or so IR(EID) routers the correct authentication
items which the IR(VP) routers will accept.  This could be a signed
message to the effect that this particular IRON router (specified by
its IP address) is authorised to perform the IR(EID) role for a
specified EID prefix.  Then the IR(VP) routers will need to use some
arrangements (PKI?) to verify these signatures, and to cache this
verification in some way so they don't need to do this for every
re-registration - and to occasionally check that the public key
used for the signature has not been revoked by the relevant PKI CA
(Certification Authority).

So the current design is something like this, for each EID prefix,
assuming there are two ISPs and so two IR(EID) routers:

  1 - The multihomed EUN (end-user network) chooses which 2 IRON
      routers will perform the IR(EID) role for its EID prefix.
      This is basically a matter of receiving the IP address of an
      IR(EID) role router from each of its ISPs.

  2 - The EUN creates two messages, one for each of the IP
      addresses, attesting that its EID prefix is to be registered
      by an IRON router with the given IP address.

      One approach to this would be for the EUN to sign these messages
      with its own key pair.  To do this, it would have to use a
      key pair which is covered by a PKI system which is recognised
      by the IR(VP) routers for the VP its EID prefix is within.
      This clearly involves the company which controls the VP, which
      is probably the company it leases this EID space from.

      Another approach is to obtain these signed messages by having
      them signed by the VP company's key-pair.  But to do so, the
      EUN will need to authenticate itself with that VP company.

      So either way, the EUN needs to authenticate itself to the
      VP company in some way, and provide the IP addresses of the
      IR(EID) role routers.

  3 - These signed messages are now in the possession of the EUN,
      who passes one to one ISP and the other to the other ISP.
      This may involve the EUN authenticating itself to the ISP, but
      it already has a business relationship with the ISP - so this
      is probably trivial.

  4 - Each ISP loads its message into the IRON router which is
      performing an IR(EID) role for this VP.  Now this IRON router
      will be able to register itself with the 3 or so IR(VP)
      routers, and keep doing so for as long as is required.

This raises a tricky question if the EUN stops using one or both
ISPs.  How can it prevent these messages from still being sent by
IRON routers?   It should be able to prevail on the ISPs to take
these signed messages out of their routers, but let's say the ISP has
gone feral and won't respond to reasonable requests.   This would be
an intolerable situation, black-holing traffic packets.  So there
needs to be some additional mechanism by which the EUN can contact
the VP company and have the 3 IR(VP) routers ignore the registration
message from the errant router.

I think this is all a lot of work for no good purpose.

A much better idea is for the EUN to get IP addresses of the routers
from its ISPs, and securely communicate these to the VP company.
Then the VP company programs these into the 3 IR(VP) routers it
controls.  That's it - there's no need for the two IR(EID) routers to
register anything with the 3 IR(VP) routers.
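
A minimal sketch of this simpler flow - the function names, the push
operation and the authentication step are placeholders of mine, not a
defined protocol:

   IR_VP_ROUTERS = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]

   def authenticate(credentials, eid_prefix):
       return True    # placeholder: client cert, shared secret, etc.

   def eun_sets_mapping(credentials, eid_prefix, ir_eids, push):
       # The VP company checks that the EUN is entitled to this EID
       # prefix, then programs the new IR(EID) address(es) straight
       # into its ~3 IR(VP) routers - no registration messages from
       # the IR(EID) routers are needed at all.
       if not authenticate(credentials, eid_prefix):
           raise PermissionError("not authorised for this EID prefix")
       for ir_vp in IR_VP_ROUTERS:
           push(ir_vp, eid_prefix, ir_eids)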


But now let us consider two scenarios - the original, non-real-time
mapping arrangement and my proposed real-time approach.

In the original non-real-time approach, the absence of a registration
from a previously registered IR(EID) router would signify to each
IR(VP) router that that IRON router should no longer be included in
the mapping for the given prefix.  Depending on the caching time
specified in the Map Reply messages sent from IR(VP) routers to
IR(LF/GW) routers, it would take some time for all IR(LF/GW) routers
to gain updated mapping which did not mention this IRON router as one
of the IR(EID) routers for this EID prefix.

But if, as just suggested, the IR(VP) routers don't figure out the
mapping of an EID prefix based on repeated registrations from IR(EID)
routers, because they are simply configured with the IP addresses of
supposed IR(EID) routers by the VP company, then how do the IR(VP)
routers figure out the mapping and how would they change this mapping
to remove a particular IR(EID) router if it became "unreachable"?  I
think the IR(VP) router has no business deciding what the mapping
should be once the above suggestion is implemented.  In the
non-real-time arrangement (the original I-R design) the ITR-like
IR(LF/GW) routers are supposed to figure out which IR(EID) routers
are reachable or not, and which ones can be used to reach the
destination network, as part of their built-in, independent,
per-router approach to multihoming service restoration.

So with the original non-real-time mapping, and the new, direct (no
registration) arrangements where the VP company simply configures its
3 IR(VP) routers with the IP addresses of the IR(EID) routers, then I
think there is nothing more to do.  Just let the IR(LF/GW) routers do
their reachability testing however Fred intends this to be done.

With the real-time-mapping arrangements (Map Update), if the EUN
chooses a new ISP, or the ISP tells them they have a different
IR(EID) router, then the EUN must simply communicate the new IP
address securely to the VP company, who will pop it straight into
their 3 IR(VP) routers.  This removes the need for the just suggested
extra mechanism by which the VP company could prevent its 3 IR(VP)
routers from accepting a registration from the errant IRON router in
the feral ISP.

In the real-time approach (Map Updates), combined with the
just-suggested (no registration) direct "VP company controls the
mapping directly via its IR(VP) routers" approach, then the moment
the VP company changes the mapping in its 3 IR(VP) routers, each one
will send a Map Update message to whichever IR(LF/GW) routers it has,
within the caching time, sent the mapping for this EID.

So this gives the VP company direct real-time control of the mapping
in all the ITRs which need it.  Since the EUN directly tells the VP
Company which IP addresses to use, and since these real-time updates
directly control the cached mapping in all ITRs currently tunneling
packets to this EID prefix, this means the EUN has direct, real-time,
control of the mapping of its EID prefix and so direct real-time
control of which IR(EID) role router the IR(LF/GW) routers will
tunnel packets to.  This is just like Ivip.  There's no longer a need
for ITRs to do any reachability testing, since the EUN can easily
hire a company to do better reachability testing of its actual
network via the two or more IR(EID) routers - and give that company
the authentication details it needs to be able to tell the VP Company
to change the mapping.

So, with these two elaborations - which look feasible to me -
IRON-RANGER could have real-time mapping just like Ivip.   This is at
odds with the LISP tradition and with the expectation of many RRG
folks that such things are impossible.

With this proposed pair of modifications:

  1 - "EUN -> VP Company -> IR(VP) router" instead of "IR(EID)
      routers registering their EID prefixes with the IR(VP)
      routers".

  2 - Map Updates for real-time mapping.

I-R would still not be as scalable as Ivip if there were a very large
number of ITRs tunneling packets to this EID prefix at the time the
mapping is changed.  With this modified version of I-R, each such
change would require each of the 3 or so IR(VP) routers to send a Map
Update to each of these IR(LF/GW) routers which is (or might be)
tunneling packets - all those IRON routers which, within the caching
time, were recently sent Map Replies for this EID prefix.

This is a triplication of total workload for the IR(VP) routers,
compared to a single device sending the Map Update to the IR(LF/GW)
router.  (However, as noted previously, this "reverse scattergun"
approach might be robust enough to send the Map Update without any
requirement for acknowledgement, retries etc.)

It is also an escalation of workload for each affected IR(LF/GW)
router, because it will receive not just one Map Update, but 3.

The real problem is for the IR(VP) routers which must each send a Map
Update packet to every IR(LF/GW) router which, within the caching
time, was previously sent a Map Reply for this EID.   This will not
scale well.  If there are 100 such IR(LF/GW) routers, then it will be
bad.  If there are 10,000, it will be really bad.

Ivip avoids these scaling problems, because the mapping change goes
out to a dozen or more DITR sites, and is received at each site by
the authoritative QSA query server there.  That sends out any needed
Map Updates to its queriers, who likewise were sent a Map Reply for
this EID prefix within the current caching time.

But these will be far fewer in number per QSA than the total number
of ITRs which need to get such a Map Update.  Firstly, the work is
spread over a (typically) dozen or more QSAs at this many widely
distributed DITR sites.  Secondly, each querier - each QSR - which
needs to get a Map Update will sometimes, or frequently, pass the
information on to multiple ITRs.

So I think the Ivip approach will scale better, and not result in an
overly large workload for any server (QSA, QSR or QSC).  Also, in
Ivip, the ITR gets a single Map Update which it will acknowledge.
This is probably less expensive than the I-R approach where the
IR(LF/GW) router gets 3 Map Updates, one from each IR(VP) router.


Scaling vs. simplicity - and more on real-time mapping distribution
===================================================================

Here I compare Ivip (with Distributed Real Time Mapping) with a
version of the I-R-lite subset which has two significant
modifications, as described above:

  1 - No registrations of EIDs from IR(EID) routers.  Instead, EUNs
      tell the VP company the IP addresses of the IR(EID) routers
      which will be handling each of their one or more EID prefixes.

      This is inherently a real-time process, since the VP company
      receives the new information and can easily configure the
      3 or so VP routers to have new mapping, in less than a second.

  2 - By adding Map Updates (if they were not already in Fred's
      design), the mapping changes (which always result from the
      above real-time process) can be sent to each IR(LF/GW)
      router which is caching the mapping for this EID prefix.

These two changes would make the I-R mapping distribution system just
as real-time as Ivip.  Then, the mapping could still consist of
multiple IR(EID) routers with priorities, weightings etc. to tell
IR(LF/GW) routers how to do load sharing and choose between IR(EID)
routers them more than one of them appears to be working - as with
LISP and the current I-R design.

However, it would also be possible to simplify the system so that,
like Ivip, the mapping can only consist of a single address of a
single IR(EID) router.  This would mean that IR(LF/GW) routers would
never be required to choose between IR(EID) routers - and all EUNs
would be required to achieve their multihoming service restoration
and/or inbound TE goals by changing the mapping in real-time.

Even if this decision is not taken, the above two points make the
system real-time - so any EUN which wants to control the mapping of
its EIDs in real-time can do so.  The only question is whether the
system is simplified, as in Ivip - which requires all EUNs to do this.

Either way, the real-time control of mapping and tunneling is a major
advance on LISP and the original I-R design.

Some EUNs might want to use the system for dynamic (responsive to
traffic flows, minute to minute, or even faster) inbound TE.  Some
such EUNs will find it highly advantageous to steer inbound traffic
over their two or more ISP links, being able to maximise the
utilization of each link within some acceptable level which avoids
congestion.  I guess content distribution networks could also steer
traffic sent to various EID prefixes to their IR(EID) routers at
various separate sites all around the Net, to dynamically load
balance the workload of these sites.

For instance, a single-site EUN has 5 hosts, or groups of hosts, each
with a differing pattern of incoming traffic - where the levels of
traffic for each change hour-to-hour or minute-to-minute.  By
splitting each group into two smaller groups, and defining an EID
prefix for each small group, the EUN will be able to dynamically
steer these 10 streams of traffic between two or more ISP links,
perhaps of different capacities, and so maximise the utilization of
these expensive links while avoiding congestion which would occur
without this dynamic inbound TE capability.
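
A toy illustration of the arithmetic - greedily assigning ten such
EID prefixes to the less loaded of two links by changing each
prefix's mapping; the traffic figures are invented:

   loads = {"prefix-%d" % i: mbps for i, mbps in
            enumerate([40, 35, 30, 25, 20, 15, 12, 8, 5, 3])}
   links = {"ISP-A": 0, "ISP-B": 0}      # committed Mbit/s per link

   for prefix, mbps in sorted(loads.items(), key=lambda kv: -kv[1]):
       link = min(links, key=links.get)  # pick the emptier link
       links[link] += mbps
       print("map", prefix, "-> IR(EID) router on", link)

   print(links)      # roughly balanced: {'ISP-A': 96, 'ISP-B': 97}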

The next thing which will happen is the VP Companies will want to
charge for mapping changes, or at least for frequent mapping changes.
That's fine - they should do so.  The EUNs which have a high enough
need for inbound TE to pay the fee per mapping change will continue
to make those changes.  The fee might be between a few cents and a
few tens of cents - and still be highly worth the expense for some
EUNs, since it is cheaper to do this at peak times than to pay for
higher capacity links to their ISPs.


This need to charge for frequent mapping changes results directly
from the ability of the system to let EUNs (or companies they
appoint) directly control the tunneling behavior of IR(LF/GW)
routers all over the world.  This applies whether the system uses the
current complex mapping, with complex IR(LF/GW) functions - or adopts
the simpler Ivip-style single address mapping, and so enables all
IR(LF/GW) routers to be significantly simpler.

Still, this modified I-R architecture seems to have more scaling
problems than Ivip.  It is simpler than Ivip - and I argue below that
the extra complexity in Ivip is justified by the way this complexity
enables Ivip to scale better.

Anything which increases the number of ITR / IR(LF/GW) routers is
assumed to be good, since the more there are of these, the less work
each one has to do, and the cheaper each one can be.  This makes it
more attractive to implement the ITR or IR(LF/GW) functions in cheap
COTS (Commercial Off The Shelf) servers - and this enables the system
to be introduced rapidly with fewer costs and risks than relying on
major upgrades to the functionality of routers from the major router
manufacturers.


The I-R architecture (or at least the above modified real-time
version of I-R-lite) is simpler than the architecture of Ivip with
DRTM.

The change "l" listed above - EUNs telling VP companies directly what
the mapping should be - is a considerable simplification of the
original I-R design.  As far as I know, it is superior in every
respect to the the plan for each IR(EID) router to have a signed
message which it uses to repeatedly register an EID with each of 3 or
so IR(VP) routers.

The change "2" above adds some complexity to the I-R design, but the
Map Update message is similar or identical to the Map Reply, so there
are no more complex protocols required.

The additional complexity is partly in the IR(VP) router needing to
retain state about which IR(LF/GW) routers it has, within the current
caching time, sent mappings to for each EID in the VP.

The other extra complexity is in the IR(LF/GW) functionality - these
need to be able to recognise Map Update messages and act on them
(which is very similar to how they respond to a Map Reply message).
They may need a fancier SEAL-ID recognition system, since the SEAL-ID
the IR(LF/GW) sent with the original traffic packet to the IR(VP)
router is used to secure both Map Reply and Map Update messages from
the IR(VP) router.  These Map Update messages may arrive quite a time
(whatever the caching time is) after the initial traffic packet was
tunneled, so the "window of acceptable SEAL-IDs" needs to be
correspondingly wider, in time.

Depending on the caching time, this may involve recognising
significantly older SEAL-IDs than in the current design.  I think
Fred's current plan is to use some kind of sliding window arrangement
on SEAL-IDs to recognise them as valid, rather than to cache each one
and maintain some kind of timer for each one.  Still, I think adding
Map Updates will not involve excessive complexity.
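
A minimal sketch of such a wider sliding-window check - the window
size and the 32-bit ID space are my assumptions, not anything from
the SEAL drafts:

   ID_SPACE = 1 << 32    # assume 32-bit SEAL-IDs for this sketch
   WINDOW = 1 << 20      # how far back to accept, sized to suit the
                         # caching time and the rate of ID use

   def seal_id_acceptable(latest_sent, received):
       # Age of the received ID relative to the newest one sent,
       # allowing for wrap-around of the ID space.
       age = (latest_sent - received) % ID_SPACE
       return age < WINDOW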

I will dub this real-time, souped-up version of my I-R-lite subset
"I-R-RT".

I-R-RT has virtues of simplicity.  It has only three kinds of network
element which handle traffic packets, if we consider the IR(GW) role
of IRON routers - like LISP PTRs or Ivip DITRs, receiving traffic
packets from the DFZ - to be not significantly different from the
IR(LF) role, which does the same thing, but only advertises the
"edge" space in a local routing system, and so doesn't attract
traffic packets from the DFZ (from other ASes).

All network elements except the newly proposed VP-list servers are
roles for IRON routers.  There are three types of role, each with
different responsibilities.  All IRON routers are assumed to be
capable of performing the IR(LF/GW) role, and a subset of them will
also perform one or both of the other two roles.

  IR(LF/GW)   Must know all the VPs and for each VP must know
              the IP addresses (or FQDNs?) of the 3 or so IRON
              routers which are performing the IR(VP) role for
              that VP.  Do this by either:

                 1 - Download file(s) and then do delta checks,
                     using the VP-list servers mentioned below.

                 2 - A DNS-based approach as suggested above.

              Initially, in the absence of its cache containing
              mapping for a matching EID, the IR(LF/GW) router
              tunnels any traffic packet which is addressed to an
              "edge" address to all 3 IR(VP) routers for the
              matching VP.

              Accepts Map Reply messages from these IR(VP) routers
              with a caching time.  (This time can now be quite long,
              since we are no longer relying on cache time-outs for
              IR(LF/GW) routers to discover mapping changes.)

              Tunnels each subsequent traffic packet matching the
              EID specified in the Map Reply message to one IR(EID)
              router.

              Also accepts (within the caching time) Map Update
              messages, which are similar or identical to Map Reply
              messages, and alters its tunneling instantly according
              to the new mapping.

              Work with IR(VP) and IR(EID) role routers to solve
              PMTUD problems.


  IR(VP)      Typically 3 IRON routers will perform the IR(VP) role
              for a given VP.  Some IRON routers will do this for
              many VPs.

              Receives mapping for each EID directly from the VP
              company which is responsible for the VP.  Presumably
              this role router is run by - or at least controlled and
              paid for by - the VP Company.

              When it receives a traffic packet in a tunnel from
              an IR(LF/GW) router, or from its own internal IR(LF/GW)
              function, does four things:

                1 - Tunnels the packet to a single IR(EID) router
                    according to the mapping of the matching EID.

                2 - Sends back (perhaps within itself) to the
                    IR(LF/GW) function the mapping for this EID
                    prefix, with a caching time.   (This is
                    secured by the SEAL-ID in the SEAL tunnel
                    headers of the traffic packet just received.)

                3 - Stores the EID, the SEAL-ID and maintains some
                    kind of timer, so it can perform the following
                    if necessary:

                4 - If the mapping for an EID prefix is changed,
                    send the new mapping as a Map Update message to
                    all the IR(LF/GW) routers which, in the recent
                    N minutes (whatever the caching time) were sent
                    a Map Reply for this EID.

              Work with IR(LF/GW) and IR(EID) role routers to solve
              PMTUD problems.


   IR(EID)    These accept tunneled packets sent by IR(LF/GW) routers
              - all but the "initial" packets - and from IR(VP)
              routers - the "initial" traffic packets the IR(LF/GW)
              router received before it got mapping for a matching
              EID.

              Forwards the decapsulated traffic packets to the
              destination EUN (End User Network).

              Assuming there are 3 IR(VP) routers and that all are
              working, ignore the 2nd and 3rd packets from these -
              only forward the first to the EUN (see the sketch
              below).

              Work with the IR(LF/GW) and IR(VP) routers to solve
              PMTUD problems.

              (If existing complex mapping is used, then the IR(EID)
              role routers would need to work together in some way
              to prevent the destination EUN receiving duplicate
              initial traffic packets if some IR(VP) role routers
              tunneled packets to IR(EID)1 and others to IR(EID)2.
              However, if the mapping is only allowed to be a single
              IR(EID) address, like Ivip, then there will be no such
              problem with duplicated traffic packets.)
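
              A minimal sketch of the duplicate-suppression step
              above; keying on the inner packet within a short time
              window is my assumption, not something from the I-R
              drafts:

                 import time

                 SEEN = {}       # packet key -> time first seen
                 WINDOW = 2.0    # seconds; illustrative only

                 def forward_once(key, deliver):
                     now = time.time()
                     for k, t in list(SEEN.items()):
                         if now - t > WINDOW:
                             del SEEN[k]      # purge old keys
                     if key in SEEN:
                         return False         # 2nd/3rd copy: drop
                     SEEN[key] = now
                     deliver()                # 1st copy: to the EUN
                     return True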


    VP-list servers

              These currently have no formal name or function - and
              they are probably not routers.  There needs to be
              some kind of global, redundant, load-shared,
              system by which all IRON routers get to know what the
              current VPs are, and which 3 or so IRON routers are
              playing the IR(VP) roles for each VP.

              This involves providing files for download and
              responses to delta checks.

              This is only if a DNS-approach is not used.

              Ideally, I think, there should be no single file, or
              single server - but some more distributed and fault-
              tolerant system than the one originally proposed by
              Fred, which involves a single file, from potentially
              multiple servers.


As with Ivip, EUNs must make their own arrangements to control the
mapping themselves, or appoint some other organization to do so.

If single address mapping is used, the above is a reasonably complete
description of the system.  If the existing LISP-like multi-address
(two or more IR(EID) addresses) mapping is retained, then the
IR(LF/GW) role and the IR(VP) role must also retain whatever
mechanisms Fred proposes for testing reachability and deciding which
of the mapping's multiple IR(EID) role routers to tunnel the traffic
packet to.

Ivip's network elements are as follows.  Please see these for more
information:

  http://tools.ietf.org/html/draft-whittle-ivip-arch
  http://www.firstpr.com.au/ip/ivip/drtm/

These could be roles, so that a single server or router performed
multiple such roles - but generally I consider these to be separate
classes
of device.  All these could be implemented as software on a server.


   ITR     Learn about all the "edge" space, such as by getting
           a list of MABs, from their local QSR (perhaps via
           one or more intermediate QSCs) - or a simpler list
           which doesn't mention individual MABs when two or
           more are adjacent.

           Advertise this "edge" space in the local routing system
           and so tunnel each received traffic packet which is
           addressed to an "edge" address to an ETR.

           If the ITR has no cached mapping matching the destination
           address, buffer the packet and send a Map Request (which
           includes a nonce and the packet's destination address) to
           a local QSR or QSC.  (Each ITR will auto-discover, or be
           configured with, the address of 3 or so QSRs or QSCs
           which it uses for all its Map Queries.  The ITR needs
           to resend the Map Request if no Map Reply arrives within,
           say, 80ms.)

           The Map Reply specifies a micronet of SPI (Scalable PI =
           "edge" space) and a single ETR address, with a caching
           time.  When this mapping arrives, tunnel the buffered
           packet to the single ETR specified in the mapping.

           If the mapping is already cached, when the traffic
           packet arrives, tunnel the packet to the single ETR
           specified in the mapping.

           Cache this mapping for the caching time, together with
           the nonce of the original request, and accept Map Update
           messages from the QSR or QSC, which will be secured by
           the same nonce.  These updates will either change the
           ETR address or tell the ITR to flush this micronet from
           the cache.  (The latter would be for when the existing
           micronet is split or joined to some other micronet - so
           if the ITR is still handling packets addressed to the
           old micronet, it will buffer them and make a new Map
           Request, to receive mapping for a different micronet which
           covers the destination address.)

           ITRs can be in ISP and EUNs, including EUNs using
           conventional edge space and those using SPI "edge"
           space.  So ITRs can be on a micronet address - SPI "edge"
           space.  ITRs cannot be behind NAT in the current design.

           ITRs work with ETRs to handle PMTUD problems caused by
           encapsulation.
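
           A minimal sketch of the query-and-retry step above; the
           port number, retry count and message layout here are my
           illustrative assumptions, not anything from the Ivip
           drafts:

              import os, socket

              def map_request(dst, qsrs, timeout=0.08, tries=3):
                  # Send nonce + destination address; wait ~80 ms
                  # for a Map Reply secured by the same nonce.
                  nonce = os.urandom(8)
                  s = socket.socket(socket.AF_INET,
                                    socket.SOCK_DGRAM)
                  s.settimeout(timeout)
                  for i in range(tries):
                      qsr = qsrs[i % len(qsrs)]
                      s.sendto(nonce + socket.inet_aton(dst),
                               (qsr, 5000))
                      try:
                          reply, _ = s.recvfrom(512)
                          if reply[:8] == nonce:
                              return reply[8:]  # micronet, ETR, time
                      except socket.timeout:
                          continue              # resend the request
                  return None   # still no mapping; keep buffering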


   DITR    Like an ITR, but:

             1 - Advertises only a subset of the "edge" space -
                 specific MABs (Mapped Address Blocks) into the
                 DFZ, and so receives packets addressed to these
                 MABs.  An ordinary ITR receives packets addressed
                 to all MABs.

                 A DITR doesn't need to know all the MABs or all
                 "edge" space - it just gets the list of MABs it
                 needs to advertise from its QSA.

             2 - Looks up the mapping, if it is not already cached.
                 Sends a Map Request to a QSA which is in the same
                  rack at the same DITR site, so there is a fast,
                  100% reliable connection, and only a few ms
                  delay in being able to tunnel the packet.

             3 - Is located at a DITR site, where the site and its
                 one or more DITRs and QSAs typically only handle
                 a subset of the MABs.  (According to which MABOCs
                 this DITR site operator is working for.)

             4 - Analyses traffic so the company which operates the
                 DITR site can bill the MABOCs (MAB Operating
                 Companies) for the traffic handled for each MAB.
                 This analysis will include time and micronet
                 details so the MABOC can bill its SPI-leasing
                 EUN customers for each such customer's DITR traffic.

              5 - May connect to the QSA via a QSC, but most likely
                  just sends Map Requests directly to, and receives
                  Map Replies and Map Updates directly from, the QSA
                  which is at the same site, and presumably in the
                  same rack.

                 Conceptually, there is a single QSA, but in
                 fact there may be two or more for redundancy,
                 and perhaps the DITR will be configured to use
                 a QSA at another DITR site run by the same
                 company as a backup if its own site's QSAs
                 fail.  (This last option is not mentioned in
                 the DRTM ID or in Ivip-arch.)

             6 - Stops advertising its MABs in the DFZ if its
                 QSA is dead, or can't get up-to-date mapping.


   ITFH    Like an ITR, but is built into the sending host.

           Handles traffic packets sent to all MABs, but does
           not "advertise" routes to these - it simply intercepts
           outgoing packets generated by the host's otherwise
           conventional stack.

           The sending host can be on conventional space or on
           "edge" space (SPI, micronet space).  In the current
           Ivip design, it can't be behind NAT.


   QSA     Authoritative Query Server.  These are only located at
           DITR sites.  In theory a QSA could be authoritative for
           the mapping of all MABs, but in practice, each DITR site
           will only support a subset of MABs.

           Gets, by some means, a real-time feed of mapping changes
           for all its MABs and so maintains a complete mapping
           database for each MAB.  (How this is done is not
           currently specified, but since this is only for the QSAs
           in a single DITR network, and since each such network only
           handles a subset of MABs and will probably have no more
           than a few dozen DITR sites, this is assumed to be
           possible in secure, scalable, fashion.  Private network
           links could be used between these sites.)

           Responds to Map Requests from DITRs at this site -
           sending them Map Reply messages within a few milliseconds
           and sending them Map Update messages if and when
           required.

           Conceptually, there is a single QSA at each DITR site.
           In reality, there may be one for the use of the DITRs
            there and one or more for accepting Map Requests from
           typically nearby QSRs.

           QSAs are in DITR sites.  A single DITR network might have
           one or two dozen such sites, each handling the same
           subset of MABs.  These would be scattered around the Net
           to share the load and generally minimise total path
           lengths.


    QSR     Caching Resolving Query Server.  ISPs run 1, or more
            likely 2 or 3, of these for their own ITRs and for the
           ITRs in their customer networks.

           Auto-discovers, via a DNS mechanism, all the MABs and
           provides a form of this information - the complete set of
           "edge" space - to all the ITRs it serves.  Also, for each
           MAB, discovers the address of 2 or 3 typically nearby QSAs
           which handle that MAB.

           Accepts Map Requests from queriers (ITRs and QSCs) and
           sends them Map Replies and Map Updates.

           Answers the queries from its own cached mapping or by
           sending a query to one of the nearby QSAs, depending
           on which MAB the queried address lies within.

            Sends its own Map Requests to QSAs, depending on which
            MAB the queried address fits within.  Accepts Map Replies
            and later potentially Map Updates from these QSAs.


   QSC     This is an optional device - a Caching Query Server.

           It accepts Map Requests from ITRs and/or other QSCs -
           and sends them Map Replies and Map Updates.

           It sends its own Map Requests (when it receives a
           Map Request it can't answer from its cache) to
           one of the handful of QSRs and/or QSCs which are
           "upstream".   QSCs, when they serve multiple ITRs,
           can frequently answer Map Requests from their cache
           - since a previous request by another ITR filled the
           cache.  So QSCs can reduce the workload of QSRs.

           (The code for the ITR/DITR, QSA, QSR and QSC functions
           will have many common elements.)
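
           A minimal sketch of such a caching query server - the
           class and method names are mine, chosen so that stacked
           QSCs (and a QSR above them) could share one interface;
           the Map Update push path is omitted:

              import time

              class CachingQueryServer:
                  def __init__(self, upstream):
                      self.upstream = upstream   # a QSR or QSC
                      self.cache = {}   # micronet -> (map, expiry)

                  def map_request(self, micronet):
                      # Answer from the cache if possible; otherwise
                      # ask upstream and cache the reply.
                      now = time.time()
                      hit = self.cache.get(micronet)
                      if hit and hit[1] > now:
                          return hit[0], hit[1] - now
                      mapping, secs = self.upstream.map_request(
                          micronet)
                      self.cache[micronet] = (mapping, now + secs)
                      return mapping, secs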


   ETR     Egress Tunnel Router.  Accepts the tunneled traffic
           packets from ITRs and forwards them to the destination
           network.  May be in the ISP network and so shared by
           multiple destination networks, or may be located in
           the destination network, such as on a PA address from
           the ISP.

           Works with ITRs to handle PMTUD problems caused by
           encapsulation.

Please see the text and diagrams at:

  http://www.firstpr.com.au/ip/ivip/drtm/

for how all these fit together, and for how initial services and
substantial scalable routing benefits will result without any ISP
investment, just by using DITRs, QSAs and ETRs.

The network elements are, in their logical groups:

   ITR        QSC          QSR        QSA      ETR
   DITR       (Optional)
   ITFH

All but the ETR would share some common code elements.


Ivip scales better than the above modified version of I-R for a
number of reasons:

  1 - Ivip ITRs are simpler, since they don't need to know all
      the MABs - just the subset of global unicast space which is
      "edge" (SPI) space.

      They don't need to know each MAB, or know anything about each
      MAB - while the I-R equivalent - IR(LF/GW) - needs to know all
      VPs and know 3 or so IR(VP) routers for each VP.

      If I-R retains its current multi-address mapping arrangements,
      Ivip ITRs are much simpler because they use single ETR address
      mapping and so do not do any reachability testing or make any
      choices between multiple ETR addresses.

      This means ITRs can be more numerous, closer to hosts and even
      in sending hosts.  This reduces the load per ITR, enabling them
      to be cheaper, including being implemented with software on an
      inexpensive server.

  2 - An Ivip ITR only tunnels all traffic packets to a single ETR,
      while an I-R IR(LF/GW) router tunnels initial packets to all 3
      or so IR(VP) routers.

  3 - An Ivip ETR only receives a single initial packet, while an I-R
      IR(EID) router typically receives 3 or so, and must use only
      the first.  The harder problem - duplicate initial packets
      reaching the destination EUN from two or more IR(EID) routers -
      only arises if the current multi-address complex mapping is
      retained.  If I-R adopts the single address (single IR(EID)
      router) mapping like Ivip, that problem won't occur.

  4 - In I-R, the authoritative query servers are the IR(VP) routers.
      Due to the "scattergun" approach to handling their potential
      unreachability - tunneling initial traffic packets to all
      IR(VP) routers - the number of IR(VP) routers needs to be
      strictly limited.  I assume a figure of 3 or so.

      This means that the VPs must be made small enough that each
      IR(VP) router can handle the load of all initial traffic
      packets handled by all the IR(LF/GW) routers in the world.

      Consequently, there will be more VPs in I-R than MABs in Ivip.

  5 - The current I-R design involves each IR(GW) advertising all the
      VPs in the DFZ.  (With IPv6, this might be achieved with a
      single short prefix, if all the "edge" space is within that
      prefix.)

      This means there must be a lot of these IR(GW) routers, since
      at any one location in the DFZ, it must be assured that no such
      IR(GW) router becomes congested.

      While an Ivip DITR could advertise all MABs, in practice each
      DITR (each DITR site) will only handle a subset of MABs.  To
      the extent that a single DITR can't handle all the MABs of a
      site, there can be multiple DITRs there.  (A MAB can even be
      split and handled by 2 or more DITRs at a given site, but
      still advertised by a router there as the single MAB prefix.)

  6 - Ivip includes plans (charged-for mapping changes and DITR
      traffic) which allow attractive business cases to be made for
      DITRs and the DITR sites, which will be run by, or for,
      MABOCs.

  7 - In I-R, each IR(LF/GW) directly queries the authoritative
      query server(s).  It actually queries all 3 or so IR(VP)
      routers.  Furthermore, the query is not a short packet, but
      a potentially long traffic packet.

      These implicit queries continue, in their triplicated form,
      until the IR(LF/GW) router receives at least one Map Reply from
      one of the IR(VP) routers.  Also, the authoritative query
      servers - IR(VP) routers - need to tunnel these traffic packets
      to IR(EID) routers.

      In short, there is no "aggregation", "cached concentration" or
      whatever it might be called, between the querying ITR-like
      devices and the authoritative query servers.

      Likewise, in this modified version of I-R with real-time
      mapping distribution due to Map Update messages, the
      authoritative query servers need to do all the work of
      sending Map Updates directly to each ITR-like IR(LF)
      router which is caching a Map Reply mapping for the EID
      prefix whose mapping just changed.  (Also, all 3 or so
      IR(VP) routers have to do this, so each IR(LF) router gets
      typically 3 Map Updates.  This is simple and more robust than
      getting one, but it is less efficient.)

      In Ivip, the authoritative QSAs are not queried directly by the
      ITRs, except by the DITRs at that site.

      The ITRs may query via one or more levels of caching query
      server (QSC), each level of which tends to reduce the workload
      of the upstream query server, which may be another QSC or one
      of the QSRs.

      Even without any QSCs, ITRs always query local QSRs,
      and when each QSR serves a large number of ITRs, each QSR will
      frequently be able to answer from its cached mapping, thereby
      reducing the number of Map Requests the authoritative QSAs must
      handle.

  8 - This reduces the workload of QSAs.  Firstly, they receive fewer
      Map Queries and send fewer Map Replies.  Secondly, when a
      micronet's mapping changes, they send typically far fewer Map
      Updates than there are ITRs which need Map Updates, because
      they send the Map Update to a QSR, which will typically send an
      equivalent Map Update to multiple ITRs, either directly or via
      one or more levels of QSCs.

  9 - In Ivip, there is no very low limit - such as 3 or so - on the
      number of authoritative query servers.  As noted above, for
      each I-R VP, there can probably be no more than 3 or 4 IR(VP)
      routers.

      With Ivip, there can be as many QSAs as there are DITR sites -
      and even within a DITR site, there can be multiple QSAs to
      spread the load of the queries concerning multiple MABs.

There may be other scaling benefits to Ivip as well.

To summarize:

    I-R is an interesting and in principle relatively simple CES
    architecture.  However, this simplicity involves a direct
    communication path between the ITR-like devices and the
    authoritative query servers - with a further "scattergun"
    inefficiency in these interactions.

    Ivip has more types of network elements and is more complex,
    but this enables the workload to be split up in a manner which
    reduces total effort - and in particular reduces the total
    effort by the authoritative QSA query servers or any other
    single network element.

    With no low (3 or so) limit on the number of authoritative
    QSA query servers, Ivip can have dozens, or in principle
    hundreds of QSAs, to share the total load for the MABs they
    handle - though I expect most DITR networks to work fine with
    10 to 20 sites.