[rrg] Ivip map-encap & MHF in draft-irtf-rrg-recommendation-00

Robin Whittle <rw@firstpr.com.au> Tue, 24 February 2009 10:45 UTC


Here are some suggestions for improving:

  http://tools.ietf.org/html/draft-irtf-rrg-recommendation-00

to properly describe the architectural elements which are part of the
Ivip proposal.  This is an update of:

  Re: [rrg] Summary of architectural solution space - Ivip still
  isn't properly covered  (2008-12-21)
  http://www.irtf.org/pipermail/rrg/2008-December/000526.html


I think there are multiple problems with section 3.3.  As it stands
at present, it is based on Bill Herrin's page (sixth draft):

   http://bill.herrin.us/network/rrgarchitectures.html

You can see from my attempt to match actual proposed architectures to
this taxonomy:

   http://www.firstpr.com.au/ip/ivip/rrgarch/

that I think this taxonomy needs quite a lot of work to ensure it
properly covers the most prominent proposals.


3.3.1.1. Variants

  I think A1c or perhaps A1b best describes Ivip.  I am not sure if
  A1a or perhaps A1b applies to any architecture which has been
  proposed.



3.3.1.2. Mapping approaches

  As I wrote here:

    http://www.irtf.org/pipermail/rrg/2008-December/000526.html

  I think none of A2a, A2b or A2c adequately describe Ivip.

  A2b comes closest, but it is not true to say of Ivip that:

     "The registry pushes all incremental changes in near-real time
      to all encoders which add RLOCs to the packets."

  That implies all ITRs get all mapping changes, which is not
  the case with Ivip - or with APT.  I think A2b does not match any
  proposed architecture.

  I suggest a new item be added (or maybe to replace A2b):

  A2d.  GUIDs dynamically mapped to each RLOC are pushed towards a
        central or distributed registry as they change. The registry
        pushes all incremental changes in near-real time to all
        full database query servers in ISP and/or end-user networks.
        The encoders request mapping from these local query servers.
        The response has a caching time and the local query server
        will push any changed mapping to an encoder if it receives
        such a change for mapping which matches a recent query which
        is still within its caching time.  There may be one or more
        full database query servers in each ISP and there may be one
        or more layers of caching query servers between these and the
        encoders.
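As a rough illustration of the A2d bookkeeping, here is a minimal Python sketch of a local full-database query server. It answers queries and remembers which encoder asked about which EID range until the caching time runs out. When the registry pushes a change, it returns the list of encoders which still need it. The class and method names, the use of plain strings for EID ranges and the explicit `now` parameter are illustrative assumptions, not part of Ivip or any other proposal.

```python
import time


class QueryServer:
    """Hypothetical full-database query server for the A2d strategy."""

    def __init__(self, mapping=None):
        self.mapping = dict(mapping or {})  # EID range -> RLOC (the full database)
        self.recent = {}                    # EID range -> {encoder: caching expiry}

    def query(self, encoder, eid, caching_time, now=None):
        """Answer an encoder's query and remember it until the caching time ends."""
        now = time.monotonic() if now is None else now
        self.recent.setdefault(eid, {})[encoder] = now + caching_time
        return self.mapping.get(eid), caching_time

    def receive_update(self, eid, rloc, now=None):
        """Apply a change pushed from the registry; return encoders to push it to."""
        now = time.monotonic() if now is None else now
        self.mapping[eid] = rloc
        live = {e: exp for e, exp in self.recent.get(eid, {}).items() if exp > now}
        self.recent[eid] = live  # forget encoders whose cached answer has expired
        return sorted(live)


qs = QueryServer({"192.0.2.0/24": "198.51.100.1"})
qs.query("itr-a", "192.0.2.0/24", caching_time=600, now=0)
qs.query("itr-b", "192.0.2.0/24", caching_time=60, now=0)
print(qs.receive_update("192.0.2.0/24", "203.0.113.9", now=120))  # ['itr-a']
```

Keeping only unexpired entries is what bounds the push traffic: an encoder whose cached answer has already expired will simply query again.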

  For brevity, I am leaving out other important things such as it
  being possible for an end-user network to have its own query
  servers, including full-database and/or caching, for its own ITRs
  - and the option of having the ITR function in sending hosts (not
  behind NAT).

  I think APT should have its own separate description.


3.3.1.3. Failure handling approaches

  I suggest A3b be replaced by:

    A3b. End-user networks are responsible for controlling the
         mapping of their EID address space.  They may do this
         directly or they may contract a third party to do this
         for them.  Each EID prefix (or non-prefix, contiguous,
         range of address space: one or more contiguous IPv4
         addresses or IPv6 /64s) is mapped to a single RLOC
         address.   With strategy A2d, for reasons such as a
         multihoming link failure, to implement inbound traffic
         engineering, or to implement address portability when
         moving to another ISP, the end-user network causes the
         mapping to change to the new RLOC address, and this is
         conveyed to all full database query servers in near real
         time.  These push the changed mapping to any encoders which
         may need it, based on previous queries and the caching
         times of the responses.
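A3b allows a mapping entry to be a contiguous range of addresses which is not a prefix. Here is a sketch of how a query server or encoder might look up such ranges. It assumes non-overlapping IPv4 ranges, kept sorted by start address and searched with Python's `bisect` module. `RangeMap` is a hypothetical name. The failover at the end simply rewrites the RLOC for the same range, which is the kind of mapping change A3b describes.

```python
import bisect
import ipaddress


class RangeMap:
    """Hypothetical map of contiguous, non-overlapping IPv4 ranges to RLOCs."""

    def __init__(self):
        self.starts = []   # sorted range start addresses, as integers
        self.entries = []  # parallel list of (start, end, rloc)

    def set_range(self, first, last, rloc):
        """Add a range, or change the RLOC of an existing range with the same start.

        Assumes callers never create overlapping ranges.
        """
        start = int(ipaddress.IPv4Address(first))
        end = int(ipaddress.IPv4Address(last))
        i = bisect.bisect_left(self.starts, start)
        if i < len(self.starts) and self.starts[i] == start:
            self.entries[i] = (start, end, rloc)
        else:
            self.starts.insert(i, start)
            self.entries.insert(i, (start, end, rloc))

    def lookup(self, addr):
        """Return the RLOC for addr, or None if no range covers it."""
        a = int(ipaddress.IPv4Address(addr))
        i = bisect.bisect_right(self.starts, a) - 1
        if i >= 0:
            start, end, rloc = self.entries[i]
            if start <= a <= end:
                return rloc
        return None


m = RangeMap()
m.set_range("192.0.2.5", "192.0.2.7", "198.51.100.1")  # three addresses, not a prefix
print(m.lookup("192.0.2.6"), m.lookup("192.0.2.8"))    # 198.51.100.1 None
m.set_range("192.0.2.5", "192.0.2.7", "203.0.113.9")   # e.g. after a link failure
print(m.lookup("192.0.2.6"))                           # 203.0.113.9
```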



3.3.1.4. Compatibility approaches

  The current A4d is a rough description of LISP's approach to
  PMTUD.  It does not adequately describe Ivip's approach.

  I suggest this new version of A4d to make it clearer that this is
  one of at least two ways of dealing with the PMTUD problems caused
  by encapsulation:

  A4d  Standard IPv4 and IPv6 packets are tunnelled while they
       transit the Internet core, by encapsulating them with
       a header in which the source address is that of the encoder.
       Path-MTU problems in routers between the encoder and decoder
       are handled by sending a Packet Too Big (PTB) message to the
       encoder, which must construct a valid PTB message and send it
       to the originating host.  As noted below (pointing to the new
       text I suggest below for 3.3.1.6), this requires the encoder to
       retain significant state, the encapsulation header to
       include a nonce and the original PTB to contain enough of the
       offending packet to include the nonce.


            (I guess APT would use much the same approach as
             LISP's, although the work would be split in some way
             between the ITR and Default Mapper, whereas in LISP it
             is all done by the ITR - "encoder" in this document.)

  I think this is a good, terse, description of LISP's stateful
  approach to PMTUD.  However, to keep it short, I haven't mentioned
  one important thing - that this approach is unlikely to support
  hosts sending fragmentable packets longer than some length a little
  less than 1500 bytes.  We don't want jumboframe-inclined hosts
  sending 9000 byte fragmentable packets in a world where DFZ links
  often or even sometimes have 1500 byte MTUs, the ITR dutifully
  encapsulating them and then having them fragmented into 7 or more
  pieces there or en-route to the ETR.
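A quick check of the fragmentation arithmetic, as a Python sketch. It assumes a 20-byte IPv4 outer header and, for the UDP-based case, an 8-byte UDP header plus an 8-byte architecture-specific shim header (an illustrative figure). Every IPv4 fragment except the last must carry a multiple of 8 payload bytes, so a 1500-byte link carries at most 1480 payload bytes per fragment.

```python
import math


def ipv4_fragment_count(ip_payload_len, mtu=1500, ip_header_len=20):
    """Fragments needed to carry ip_payload_len bytes of IPv4 payload over a link."""
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - ip_header_len) // 8 * 8
    return math.ceil(ip_payload_len / per_fragment)


inner = 9000                       # jumbo-frame-sized packet from the sending host
ipip_payload = inner               # IP-in-IP: outer payload is the whole inner packet
udp_shim_payload = inner + 8 + 8   # assumed UDP header + 8-byte shim header
for name, payload in [("IP-in-IP", ipip_payload), ("UDP + shim", udp_shim_payload)]:
    print(name, ipv4_fragment_count(payload))  # 7 for both at a 1500-byte MTU
```

With a 1500-byte link MTU, a 9000-byte packet becomes seven fragments either way. A path with a 1280-byte MTU would need eight.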


  I suggest this new item be added.  I will call it A4h, but
  these should probably be reordered in the final draft so the
  following comes after A4d:

  A4h  Standard IPv4 and IPv6 packets are tunnelled while they
       transit the Internet core, by encapsulating them with
       IP-in-IP, with no other header information and with the source
       address of the outer header being that of the originating
       host.  The encoder maintains an upper and lower estimate of
       the PMTU to each decoder it is currently tunneling packets to.
       When a traffic packet arrives at the encoder which, with
       encapsulation overhead, would have a length exceeding the
       upper estimate of the PMTU to the relevant decoder, the
       encoder sends a PTB to the originating host.  When the
       encapsulated length is less than the lower estimate, the
       encoder encapsulates and tunnels the packet normally.

       When the encapsulated length falls between the two estimates,
       the encoder splits the packet and sends it as two.  One short
       packet is encapsulated with the outer source address being
       that of the original sending host.  The other packet has the
       same length as the traffic packet would have if it were
       normally encapsulated.  Both packets contain a nonce and
       potentially other information to enable the decoder to
       positively acknowledge the correct receipt of both packets.
       The longer packet's outer source address is that of the
       encoder, so if it exceeds the next hop MTU of an intermediate
       router, that router will send a PTB to the encoder.  That PTB
       should contain enough of the encapsulation header to include
       the nonce.  This, the positive acknowledgement or lack thereof
       from the decoder, and subsequent packets from the
       originating host will enable the encoder to progressively
       adjust its lower and upper estimates of PMTU for this decoder
       until they match.  The encoder is able to send a valid PTB
       to the sending host if the pair of packets did not
       successfully reach the decoder.
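The A4h decision logic can be sketched in Python as follows. This is a minimal sketch, not Ivip's specification. The following are assumptions for illustration: the starting estimates of 1280 and 1500 bytes, the 20-byte IPv4-in-IPv4 overhead, and the rule used when a probe fails without any PTB (set the upper estimate just below the failed length). The dual-packet probe itself (the short packet, the nonce and the acknowledgement) is not modelled; only its outcome is.

```python
from dataclasses import dataclass
from typing import Optional

IPIP_OVERHEAD = 20  # assumed: IPv4-in-IPv4 outer header without options


@dataclass
class PmtuEstimate:
    """Hypothetical per-decoder PMTU estimates held by an A4h encoder."""
    lower: int = 1280   # assumed starting values, as outer packet lengths in bytes
    upper: int = 1500

    def classify(self, inner_len: int) -> str:
        outer = inner_len + IPIP_OVERHEAD
        if outer > self.upper:
            return "send_ptb"      # too long: send a PTB to the sending host
        if outer <= self.lower:
            return "encapsulate"   # known to fit: plain IP-in-IP
        return "probe"             # zone of uncertainty: dual-packet treatment

    def probe_succeeded(self, outer_len: int) -> None:
        """Decoder acknowledged both packets: raise the lower estimate."""
        self.lower = max(self.lower, outer_len)

    def probe_failed(self, outer_len: int, reported_mtu: Optional[int] = None) -> None:
        """PTB (with its MTU) or no acknowledgement: drop the upper estimate."""
        new_upper = reported_mtu if reported_mtu is not None else outer_len - 1
        self.upper = max(self.lower, min(self.upper, new_upper))

    def ptb_mtu_for_host(self) -> int:
        """MTU to report to the sending host, allowing for the outer header."""
        return self.upper - IPIP_OVERHEAD


est = PmtuEstimate()
print([est.classify(n) for n in (1200, 1400, 1490)])  # ['encapsulate', 'probe', 'send_ptb']
est.probe_failed(1420, reported_mtu=1400)   # a router on the path reported MTU 1400
print(est.upper, est.ptb_mtu_for_host())    # 1400 1380
```

Each probe outcome moves one estimate towards the other, which is why the special treatment is self-limiting.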


  I know this is longish, but map-encap's PMTUD problems are
  difficult to solve.  This is a really short description of
  the best approach I can think of - actually the only one I
  think which could work.  This is more fully developed than
  LISP's stateful approach.

  It avoids sending nonces or any other header guff beyond basic
  IP-in-IP encapsulation for the great majority of packets.  Only
  those few packets whose length falls within the current zone of
  uncertainty (between the lower and upper estimates of PMTU) are
  given this special dual-packet treatment.  Every such treatment
  will enable the ITR to narrow the zone of uncertainty by either
  raising its lower estimate of PMTU or dropping its upper estimate.

  This is a good enough overall description for the taxonomy, but it
  doesn't describe everything, for instance how the encoder (ITR)
  needs to periodically test that the current estimates of PMTU
  are not too high or too low.


  Here are improved versions of the descriptions of Ivip's
  Modified Header Forwarding techniques.  I am trying to keep
  them brief.


   A4f  For IPv6, rather than encapsulate the packet to tunnel it to
        the decoder, use a modified header format in which
        19 or 20 bits of the IPv6 header's flow label are used
        to contain enough information about the RLOC (such as
        which of a range of DFZ prefixes the RLOC is within) for the
        packet to be forwarded by suitably modified routers to the
        correct ISP network for that RLOC.  The encoder sets these
        bits and  the decoder restores the original state of the
        packet.
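Here is a sketch of the A4f idea in Python. It operates on the first 32-bit word of the IPv6 header (version, traffic class and the 20-bit flow label). Several details are assumptions made only for this illustration: the table of DFZ prefixes, and carrying the prefix index plus one so that a zero label still means "not modified". It also assumes the sending host left the flow label at zero, which is what lets the decoder restore the original header simply by clearing the label. The function names are hypothetical.

```python
import ipaddress

FLOW_LABEL_MASK = 0xFFFFF  # low 20 bits of the first 32-bit word of an IPv6 header

# Hypothetical table of DFZ prefixes known to every modified router
# (up to 2**20 - 1 entries fit in the label with this encoding).
DFZ_PREFIXES = [ipaddress.IPv6Network(p) for p in
                ("2001:db8:1000::/36", "2001:db8:2000::/36", "2001:db8:3000::/36")]


def mhf6_encode(first_word: int, rloc: str) -> int:
    """Encoder: put (index of the RLOC's DFZ prefix + 1) into the flow label."""
    if first_word & FLOW_LABEL_MASK:
        raise ValueError("sketch assumes the sending host left the flow label at zero")
    addr = ipaddress.IPv6Address(rloc)
    for index, net in enumerate(DFZ_PREFIXES):
        if addr in net:
            return (first_word & ~FLOW_LABEL_MASK) | (index + 1)
    raise ValueError("RLOC is not within any listed DFZ prefix")


def mhf6_router_lookup(first_word: int):
    """Modified router: which DFZ prefix to forward towards (None if unmodified)."""
    label = first_word & FLOW_LABEL_MASK
    return DFZ_PREFIXES[label - 1] if label else None


def mhf6_decode(first_word: int) -> int:
    """Decoder: restore the original, zero, flow label."""
    return first_word & ~FLOW_LABEL_MASK


word = 6 << 28                                  # version 6, traffic class 0, label 0
tagged = mhf6_encode(word, "2001:db8:2000::1")
print(hex(tagged), mhf6_router_lookup(tagged))  # 0x60000002 2001:db8:2000::/36
print(mhf6_decode(tagged) == word)              # True
```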


   A4g  For IPv4, rather than encapsulate the packet to tunnel it
        to the decoder, use a modified header format containing
        the 30 bit RLOC address of the decoder so suitably modified
        routers will forward the packet to the decoder.  These bits
        are currently used for the More Fragments flag, the Fragment
        Offset and the Header Checksum.  The encoder sets these
        bits and the decoder restores the original state of the
        packet.  The encoder does not accept fragments or
        fragmentable packets longer than some constant, likely to
        be marginally below 1500 bytes.
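The A4g field layout can be checked with a small Python sketch. The MF flag (1 bit), Fragment Offset (13 bits) and Header Checksum (16 bits) give exactly 30 bits. The sketch assumes those 30 bits are the RLOC shifted right by two, so the RLOC's two low-order bits must be zero; that convention, and the function names, are assumptions for illustration. Because the encoder refuses fragments, the decoder knows the original MF flag and offset were zero and can recompute the checksum, which is what allows it to restore the packet exactly.

```python
import ipaddress
import struct


def ipv4_checksum(header: bytes) -> int:
    """Internet checksum of a header (0 when a valid header is checked)."""
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def make_header(src, dst, df=True):
    """Unfragmented 20-byte IPv4 header with a correct checksum."""
    h = bytearray(struct.pack("!BBHHHBBH4s4s", 0x45, 0, 60, 0x1234,
                              0x4000 if df else 0, 64, 6, 0,
                              ipaddress.IPv4Address(src).packed,
                              ipaddress.IPv4Address(dst).packed))
    struct.pack_into("!H", h, 10, ipv4_checksum(bytes(h)))
    return bytes(h)


def mhf4_encode(header: bytes, rloc: str) -> bytes:
    """Encoder: overwrite MF, Fragment Offset and Header Checksum with 30 RLOC bits."""
    h = bytearray(header)
    (flags_frag,) = struct.unpack_from("!H", h, 6)
    if flags_frag & 0x3FFF:                 # MF set or non-zero fragment offset
        raise ValueError("encoder does not accept fragments")
    r = int(ipaddress.IPv4Address(rloc))
    if r & 0x3:
        raise ValueError("sketch assumes the RLOC's two low-order bits are zero")
    bits30 = r >> 2
    struct.pack_into("!H", h, 6, (flags_frag & 0xC000) | (bits30 >> 16))  # keep reserved and DF
    struct.pack_into("!H", h, 10, bits30 & 0xFFFF)                        # checksum field
    return bytes(h)


def mhf4_rloc(header: bytes) -> str:
    """Modified router: recover the decoder's RLOC from the 30 bits."""
    (flags_frag,) = struct.unpack_from("!H", header, 6)
    (low16,) = struct.unpack_from("!H", header, 10)
    return str(ipaddress.IPv4Address((((flags_frag & 0x3FFF) << 16) | low16) << 2))


def mhf4_decode(header: bytes) -> bytes:
    """Decoder: MF and offset were zero, so clear them and recompute the checksum."""
    h = bytearray(header)
    (flags_frag,) = struct.unpack_from("!H", h, 6)
    struct.pack_into("!H", h, 6, flags_frag & 0xC000)
    struct.pack_into("!H", h, 10, 0)
    struct.pack_into("!H", h, 10, ipv4_checksum(bytes(h)))
    return bytes(h)


original = make_header("192.0.2.10", "198.51.100.20")
modified = mhf4_encode(original, "203.0.113.8")
print(mhf4_rloc(modified))                # 203.0.113.8
print(mhf4_decode(modified) == original)  # True
```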



The descriptions of these approaches in 3.3.1.x are exceedingly
minimal.  I don't know why everything has to be so terse.  Internet
drafts are printed on 100% recycled electrons, and I think it is
important to help people - including those outside the RRG - identify
and understand the various possible architectures as clearly as
possible.

The taxonomy should clearly cover every seriously proposed
architecture - and should probably not cover anything else without
noting that such an architecture may be possible, but has not yet
been proposed.


I think that if the RRG's recommendation (even if it is an interim
summary of progress to date) is to have any discussion of the
strengths and weaknesses of various approaches, then that discussion
should be full and detailed, not just some terse, arbitrarily
truncated subset of what a fully detailed list of strengths and
weaknesses would be.  I think the best way to do this is for the
individual strategies to have a fuller description than at present,
including probably their most prominent strengths and weaknesses.
Otherwise, many important aspects of the strategies would need to be
mentioned in the later evaluation of strengths and weaknesses.



3.3.1.6. Major criticisms

  This statement is inadequate:

    Handling path-MTU is a usually problem since the packets
    in the core are different than the origin host would recognize.

  This does not apply to "Map & Forward with Modified Header
  Forwarding" because the router which creates the PTB message can
  reconstruct the original state of the packet, and so can generate a
  perfectly valid PTB which will be recognised by the sending host.
  Also, since there is no  packet overhead, the MTU value returned in
  the PTB message gives the sending host the correct guidance on how
  long a packet it can send to avoid MTU problems at that router in
  the future.

  I am wary of the term "genuinely clean", but will retain it in
  what follows.

  I suggest this new text for 3.3.1.6:

    There don't appear to be any genuinely clean ways of implementing
    strategy A.

    Encapsulation-based approaches raise Path MTU Discovery problems
    for two reasons.

    Firstly, the encapsulated packet is longer than the packet sent
    by the sending host, meaning that any MTU value returned by a
    router in a PTB message does not give the sending host the
    information it needs to avoid the MTU limitation.  Secondly, the
    PTB as generated by the router will not be recognised by the
    sending host, either due to the encapsulated packet being
    different to that sent (if the outer source address is that of
    the sending host) and/or (if the outer source address is that of
    the encapsulating router) due to the PTB being sent to the
    encapsulating router, rather than the sending host.  In the
    latter case, a successful solution to the PMTUD problem relies on
    the encapsulating router (encoder) being able to securely
    construct a valid PTB to be sent to the sending host.  This in
    turn involves significant processing at that encoder and the
    retention of a problematic amount of state: for each packet sent
    which might cause MTU problems.  It also relies on the
    encapsulated packet having a nonce in its headers and the
    original PTB carrying enough of that packet to include the
    nonce - which is more than is required by RFC 1191.

    Forwarding with modified headers does not involve the PMTUD
    and overhead problems which are inherent in encapsulation.  The
    router at which the MTU limit is reached can reconstruct the
    original packet as sent by the originating host, and so can create
    a valid PTB.

    However, forwarding with modified headers involves upgrades to
    all core routers and some or many internal routers.  This is
    feasible in the long-term, so an encapsulation system could
    and arguably should migrate to a modified header forwarding
    system over time.  Depending on the flexibility of the installed
    base of routers and on the urgency of introduction, it may be
    possible to introduce a scalable routing architecture without
    using encapsulation by using just modified header forwarding from
    the outset.

    For IPv4, with either modified header forwarding or encapsulation
    and the PMTUD management required for encapsulation, it seems
    unlikely that it will be possible to support fragments, or
    fragmentable packets longer than a constant which is likely to be
    somewhat below 1500 bytes.
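To show the kind of state the stateful (A4d) approach needs, here is a minimal Python sketch of an encoder. For each nonce it remembers which host sent the encapsulated packet and the first 28 bytes of that packet (an IPv4 header without options plus the 8 bytes an ICMP message quotes). It then turns a PTB carrying the nonce into the information needed for a PTB to the sending host. The 36-byte overhead, the table size and the eviction of the oldest entries are assumptions. A real encoder would also need to build and validate the actual ICMP messages.

```python
import os
from collections import OrderedDict

ENCAP_OVERHEAD = 36  # assumed: outer IPv4 (20) + UDP (8) + 8-byte shim header
QUOTE_LEN = 28       # IPv4 header without options + 8 bytes, as ICMP quotes


class StatefulEncoder:
    """Hypothetical per-packet state an A4d encoder keeps for PMTUD."""

    def __init__(self, max_entries=1024):
        self.sent = OrderedDict()  # nonce -> (sending host, start of inner packet)
        self.max_entries = max_entries

    def encapsulate(self, host, inner):
        """Record state for one packet; return its nonce and encapsulated length."""
        nonce = os.urandom(8)
        self.sent[nonce] = (host, inner[:QUOTE_LEN])
        if len(self.sent) > self.max_entries:
            self.sent.popitem(last=False)  # evict oldest: memory must be bounded
        return nonce, len(inner) + ENCAP_OVERHEAD

    def handle_ptb(self, nonce, reported_mtu):
        """Turn a PTB quoting our nonce into what the sending host must be told."""
        entry = self.sent.pop(nonce, None)
        if entry is None:
            return None  # unknown, evicted or spoofed nonce
        host, quoted = entry
        return {"to": host, "mtu": reported_mtu - ENCAP_OVERHEAD, "quoted": quoted}


enc = StatefulEncoder()
nonce, outer_len = enc.encapsulate("192.0.2.10", bytes(1400))
print(outer_len, enc.handle_ptb(nonce, 1400)["mtu"])  # 1436 1364
```

Even this toy version needs one entry per packet which might draw a PTB, which is the state problem the text describes.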


I can't see a terser way of summing up the PMTUD problems of Map &
Encap and how MHF differs.

Unless there is a robust, efficient method of solving the PMTUD
problems, these problems are a show-stopper for Map & Encap.  If the
RRG's report is going to discuss the strengths and weaknesses of the
various approaches, then I think the discussion must be much more
complete than the current, overly terse and rather piecemeal, text.


I have avoided mentioning how Ivip's approach involves only a handful
of packets getting the special handling required for PMTUD, and how
those include a nonce, but that all other traffic packets include no
such nonce, thereby reducing the overhead.  Nor have I mentioned the
reason why Ivip sends encapsulated packets with the outer source
address being that of the sending host (to automatically extend
border router ingress filtering of source addresses to the
decapsulated packet, by way of the ETR dropping packets where the
inner source address does not match the outer source address).



The next sentence is inadequate too:

  Extra bandwidth is consumed by the ingress tunnel router figuring
  out whether the egress tunnel router is still available and
  functioning.

I suggest this be replaced with something like:

  Extra bandwidth, in addition to that required to distribute
  mapping, may be consumed by some Strategy A approaches.

  For instance, some architectures use A3a - multiple RLOCs per
  EID - with the encoders (ingress tunnel routers) being responsible
  for individually determining which RLOC address to use when
  encapsulating packets.  Multiple RLOC addresses are required
  because these architectures use A2c or some other non-real-time
  approach to mapping distribution.  The encoders are therefore
  required to individually determine reachability to each decoder
  they are currently tunneling packets to.  This involves extra
  state, computational load and bandwidth consumption by each encoder
  and probably each decoder.   While this approach may be capable of
  coping better with some localised outages than A3b, it involves
  considerable scaling problems for busy encoders and decoders.

  Extra bandwidth is consumed by all encapsulation approaches.  A4h
  encapsulation consists only of an IP-in-IP header (except when
  the encoder is discovering the PMTU to a decoder).  Encapsulation
  architectures other than A4h add additional headers, such as a UDP
  header followed by an architecture-specific header to each
  traffic packet.  When probing PMTU, A4h involves extra complexity
  for the encoder and decoder, with an additional short packet and a
  differently encapsulated full-length packet, with further packets
  needed for the decoder to acknowledge receipt of one or both
  packets.  However, this probing activity is only prompted by
  longer traffic packets which are used as the probes.  It is
  self-limiting since each such probe reduces the uncertainty the
  encoder has about the PMTU to a given decoder.
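The per-packet cost of the encapsulation header can be put in numbers with a short Python sketch. The header sizes are assumptions for illustration: 20 bytes for A4h's IP-in-IP with an IPv4 outer header, and 36 bytes for an IPv4 outer header followed by a UDP header and an 8-byte architecture-specific header.

```python
# Assumed per-packet header overheads in bytes (IPv4 outer header in both cases).
OVERHEADS = {
    "IP-in-IP (A4h)": 20,                    # outer IPv4 header only
    "IPv4 + UDP + 8-byte shim": 20 + 8 + 8,
}


def overhead_percent(inner_len, overhead):
    """Extra bytes on the wire as a percentage of the original packet length."""
    return 100.0 * overhead / inner_len


for inner_len in (40, 576, 1500):
    row = ", ".join(f"{name}: {overhead_percent(inner_len, o):.1f}%"
                    for name, o in OVERHEADS.items())
    print(inner_len, row)
```

The relative cost is highest for short packets, such as 40-byte TCP acknowledgements.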


This sentence needs to be changed too:

  Border filtering of source addresses becomes problematic.

This is true of LISP, APT and TRRP - but Ivip shouldn't be tarred
with the same brush.

Here is a suggested replacement.

  The encapsulation architectures which use the encoder's address
  in the outer header involve significant problems with border
  filtering of incoming packets based on source address - for
  instance to stop spoofing of source addresses inside an ISP's
  network.   For such architectures (A4d), the only solution seems
  to be to replicate this expensive filtering on the decapsulated
  packets, at every decoder (egress tunnel router).

  With strategy A4h, border filtering of source addresses can
  be extended to the decapsulated packets much more easily - by
  the ETR dropping any inner packets whose source address does not
  match the outer source address.

  Strategies A4f and A4g have no problems with this filtering,
  since the modified border routers will continue to filter on the
  source address, which is unchanged in the header.
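The A4h decoder check described above can be sketched in Python for IPv4-in-IPv4 (IP protocol number 4). The helper which builds headers leaves the checksum at zero for brevity, and the function names are hypothetical. The point is only the comparison of the inner and outer source addresses.

```python
import ipaddress
import struct


def ipv4_header(src, dst, proto, payload_len):
    """Minimal 20-byte IPv4 header; checksum left at zero in this sketch."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + payload_len, 0, 0, 64,
                       proto, 0, ipaddress.IPv4Address(src).packed,
                       ipaddress.IPv4Address(dst).packed)


def etr_accept(packet: bytes) -> bool:
    """Accept an IP-in-IP packet only if inner and outer source addresses match."""
    if len(packet) < 20 or packet[9] != 4:  # too short, or not IPv4-in-IPv4
        return False
    ihl = (packet[0] & 0x0F) * 4            # outer header length in bytes
    inner = packet[ihl:]
    if len(inner) < 20:
        return False
    return packet[12:16] == inner[12:16]    # source address field, bytes 12-15


inner = ipv4_header("192.0.2.10", "198.51.100.20", 6, 0)
good = ipv4_header("192.0.2.10", "203.0.113.8", 4, len(inner)) + inner
spoofed = ipv4_header("192.0.2.99", "203.0.113.8", 4, len(inner)) + inner
print(etr_accept(good), etr_accept(spoofed))  # True False
```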


This sentence is another instance of a statement which applies to
LISP, as currently conceived, but not to Ivip:

   Deployment may require heavy weight "for the public good" relays
   in the non-upgraded part of the Internet to facilitate migration.

Depending on your view of APT's handling of packets from non-upgraded
networks (which I think is more favourable if there is a single APT
island, rather than multiple such islands) the above statement is not
an adequate assessment of APT either.

Here is a suggestion which retains the critique of LISP - which
currently has no business plan for Proxy Tunnel Routers (or a
technical arrangement supporting ownership and running which could
lead to a favourable business plan) but which recognises that Ivip
has more promising arrangements.  This would need to be modified or
extended to adequately describe APT and TRRP.  (TRRP's approach
involves a bunch of sticks and carrots which would be challenging to
describe in a concise manner!)

  In order to be attractive for early adopters, Strategy A
  architectures need a means of handling packets sent from networks
  which lack encoders.  This raises technical and business problems,
  considering the strong desirability of these packets reaching a
  suitable "open", "proxy" or "relay" encoder which is not too far
  out of the path the packet needs to take to the correct decoder.

  Since these "open" (etc.) encoders need to be in the DFZ, and so
  need to advertise prefixes in order to attract these raw packets
  from non-upgraded networks, it is vital for routing scaling reasons
  that each such advertisement covers the EID space of a very large
  number of end-user networks.

  These constraints lead to questions of who is to operate this
  presumably widely spread set of "open" encoders, who is to manage
  the large blocks of address space from which EIDs are subdivided,
  and how the operators of these encoders will be paid according to
  the widely varying traffic they handle for packets addressed to
  different end-user networks.

  These problems are not necessarily incapable of solution.  The
  best outcome will involve technical arrangements specifically
  intended to support address management and other business
  arrangements which make it attractive for organisations to
  operate these one or more globally distributed networks
  of "open", "proxy" or "relay" encoders.

For a fuller discussion of what I am trying to summarize here, see:

  Business incentives for LISP PTRs and Ivip OITRDs
  http://psg.com/lists/rrg/2008/msg02021.html


These sentences do apply properly to Ivip:

   During the transition period, it appears difficult to remove
   legacy prefixes from the global routing table.  The best that can
   be done is to advertise aggregates of legacy prefixes from the
   relays.  This may have an impact on stretch.

I am not sure how well they apply to LISP or APT, since there is
little or nothing in those proposals about how end-user networks will
gain new EID space, or convert their current PI space to EID space.

The concept of a "transition period" is not necessarily valid.  There
may never be a complete switch-over of all end-user networks to the
new form of addressing.  Ivip is not predicated on there being a
complete switch-over.  The idea is to make the new kind of space so
attractive to end-user networks of all types and of all sizes (at
least those which require multihoming, portability and/or mobility)
that over time, the great majority of end-user networks will adopt
the new space.  In this model, the routing scaling problem can still
be solved without a complete transition.

The first sentence above seems to assume that EID space will often or
always be created by converting existing PI prefixes.  That will be
part of the story, I am sure - for instance for large corporate and
university networks with PI space today, who decide to adopt the new
form of scalable addressing.  However, since the various scalable
routing architectures are intended to scale to a vastly higher number
of end-user networks (billions) than have PI space today (a hundred
thousand or so?), clearly, in the future, most such end-user
networks will be getting fresh address space which is already part of
the new addressing scheme.  The first sentence's concern does not
apply to the EID space of all those "new" end-user networks.

"Stretch" (longer total paths than would otherwise be the case) would
be a problem if the "relays" (meaning LISP Proxy Tunnel Routers) are
not widely distributed.  (Actually, it is OK to have PTRs or their
Ivip equivalents - OITRDs - only in one area, as long as all the ETRs
for EID prefixes advertised by those PTRs or OITRDs are in the same
area.  I am assuming a more general case which must provide generally
optimal paths when both the sending hosts and the ETRs are freely
distributed all over the world.)

If my above suggested text is included, I don't think we need these
last 3 sentences ("During ..", "The best ..." and "This may ...").
There are challenges.  I think Ivip has a promising set of technical
and business measures to solve them - and that the other proposals so
far do not.

We do need to discuss the problems of support of packets from
non-upgraded networks, since lack of this would be a show-stopper for
any map-encap or map-forward-with-modified-headers architecture.

Without going into much longer discussions, I think my text above
discusses the challenges and indicates that solutions seem to be
possible, but are not yet developed for all architectures.


The current discussion says nothing of the ability of various
Strategy A architectures to support mobility.  Current mobile IP
approaches are limited in scope, and only work for IPv6. They are
essential for mobility within a particular network but not suited to
the real problem of providing seamless global mobility across every
kind of access network.

So just adding map-encap or map-forward without special consideration
of mobility is not going to enable a better total mobility outcome
than what is currently possible with MIPv6.

Mobility support for map-encap or map-forward does not involve
changing the mapping every time the mobile node (MN) gets a new
care-of address (CoA).  The only
feasible solution seems to be to use two-way tunnels from each CoA to
an ETR-like "Translating Tunnel Router" (TTR), as described here:

  TTR Mobility Extensions for Core-Edge Separation Solutions to the
  Internet's Routing Scaling Problem
  http://www.firstpr.com.au/ip/ivip/TTR-Mobility.pdf

This approach does not add any complexity to the core-edge separation
scheme itself, since the TTRs behave like any other ETR or ITR.

The near-real-time mapping approach of Ivip, or any other
architecture which adopts the A2d approach, as described above, is
clearly in a better position to adapt quickly to end-user needs to
change the mapping to a new TTR than the non-real-time approaches of
A2a (LISP-NERD) or A2c (LISP-ALT, APT and TRRP).


I suggest some new text such as:


   The optimal new routing and addressing architecture should
   arguably provide maximal support for mobility on a global scale.
   Existing mobile IP approaches are for IPv6 only and tend to rely
   on additional capabilities in the access network and/or on
   a "home agent" router, which may not be close to the
   mobile node's current location - raising performance and
   efficiency problems.

   While strategy A architectures may not pose inherent difficulties
   for existing mobile IP techniques, it would be an advantage if
   an architecture supported potentially new mobility techniques
   which provided global mobility, with any access network, with
   generally optimal path lengths and without reliance on any
   fixed "home agent" or on particular capabilities of access
   networks.


  - Robin