[rrg] LISP PMTU - 2 methods in draft-farinacci-lisp-11

Short version:  The Stateless approach will be a really bad idea once
                ITR to ETR paths with ~9000 byte MTUs become common.

                The Stateful approach somewhat resembles Ivip's
                approach.  Ivip's is documented in greater detail and
                limits the number of traffic packets for which state
                must be stored in case a PTB is received.

Here are my thoughts on the two techniques (Stateless and Stateful) for
LISP Path MTU Discovery management in the latest (19 December) draft.

My understanding of the whole problem (for Ivip's map-encap modes or
for any other map-encap scheme, such as LISP or APT), together with
my solution for Ivip, is here:

  http://www.firstpr.com.au/ip/ivip/pmtud-frag/

Here is my critique of the Stateless approach, assuming L = 1500, as
is recommended at the end of:

  http://tools.ietf.org/html/draft-farinacci-lisp-11#section-5.4.1

The stateless approach was rejected by the OpenLISP team:

http://tools.ietf.org/html/draft-iannone-openlisp-implementation-01#section-6.8.1

Stateless IPv4 DF=0
-------------------

  L is chosen for the entire global LISP system to be a minimum
  value of MTU which can be expected of any ITR->ETR tunnel.  Here
  I assume 1500 bytes, as is recommended in the last line of section
  5.4.1.  Perhaps it would need to be less, such as 1470 or less.
  Google servers regularly send 1470 byte DF=0 packets:

http://www.firstpr.com.au/ip/ivip/ipv4-bits/actual-packets.html#google-no-pmtud

  IPv4 DF=0 packets longer than some size S will be fragmented
  into two fragments, each of which will be encapsulated.  The
  ETR decapsulates them separately and sends the two fragments
  to the destination network.

  S is set globally to L minus the encapsulation overhead, which
  is 36 bytes (IPv4, UDP and LISP headers, in section 5.1).  So
  S = 1464 bytes.
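
  To make the arithmetic concrete, here is a minimal Python sketch.
  The function name is mine, and the 20 + 8 + 8 split of the 36 byte
  overhead is my assumption (IPv4 header without options, UDP header,
  LISP header); only the 36 byte total comes from the draft.

```python
# Assumed breakdown of the 36 byte encapsulation overhead H.
IPV4_HEADER = 20  # IPv4 header without options
UDP_HEADER = 8
LISP_HEADER = 8   # assumption: whatever remains of the 36 byte total
H = IPV4_HEADER + UDP_HEADER + LISP_HEADER


def inner_limit(tunnel_mtu: int, overhead: int = H) -> int:
    """Return S, the longest packet that fits in L once encapsulated."""
    return tunnel_mtu - overhead


print(inner_limit(1500))  # S for the recommended L = 1500
print(inner_limit(1470))  # S if L had to be lowered to 1470
```

  With L = 1500 this gives the S = 1464 used in the rest of this
  message.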

  This will not work when the packet length is more than about
  twice S (since fragmentation produces two packets whose
  combined length is marginally longer than the original packet).

  The statement:

    "This will ensure that the new, encapsulated packets are of
     size (S/2 + H), which is always below the effective tunnel MTU."

  will not apply when the incoming packets are more than about
  twice S in length.
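
  The limit can be checked with a rough Python sketch.  It assumes
  IPv4 headers without options and a split as close to the middle as
  the 8 byte fragment offset granularity allows; the function names
  are hypothetical and the draft does not specify the split in this
  detail.

```python
IP_HEADER = 20  # assumes an IPv4 header without options
H = 36          # encapsulation overhead from the draft
L = 1500        # global tunnel MTU assumed in this message
S = L - H       # 1464


def encapsulated_sizes(packet_len: int) -> list:
    """Sizes of the packets the ITR emits for one DF=0 packet."""
    if packet_len <= S:
        return [packet_len + H]
    payload = packet_len - IP_HEADER
    # The first fragment's payload must be a multiple of 8 bytes.
    first = ((payload + 1) // 2 + 7) // 8 * 8
    second = payload - first
    # Each fragment carries its own IP header, then gets encapsulated.
    return [IP_HEADER + first + H, IP_HEADER + second + H]


def fits(packet_len: int) -> bool:
    """True if every encapsulated packet is within L."""
    return all(size <= L for size in encapsulated_sizes(packet_len))


for length in (1470, 2900, 2901, 9000):
    print(length, encapsulated_sizes(length), fits(length))
```

  Under these assumptions the two-fragment scheme stops working just
  below 2 x S: the repeated IP header and the 8 byte rounding eat into
  the margin.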

  Still, with a bit of tightening of the spec, I see no major
  problems with this approach to IPv4 fragmentable packets compared
  to what I propose for Ivip.  I don't think a core-edge separation
  scheme should be required to support DF=0 packets longer than
  something like 1470 bytes into the indefinite future.

  RFC 1191 PMTUD has been around since the early 1990s and I think
  it is time that hosts stopped expecting the network to fragment
  packets.  The IPv6 designers evidently thought the same in 1996.

Stateless IPv4 DF=1 and IPv6
----------------------------

  This approach makes no sense for the long-term future since it
  forces all traffic to be in packets no longer than 1464 bytes
  (IPv4):

      the ITR will drop the packet when the size is greater than L,
      and sends an ICMP Too Big message to the source with a value of
      S, where S is (L - H).

  Replacing the variables with constants for IPv4, this means:

      the ITR will drop the packet when the size is greater than 1500
      bytes, and sends an ICMP Too Big message to the source with a
      value of 1464 bytes.

  For IPv6, the limiting size is set by the 56 byte overhead, to 1444
  bytes.
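
  A small Python sketch of this rule as I read it.  It assumes the
  size compared against L is the encapsulated size, and the names
  are hypothetical.  The actual path MTU is passed in only to show
  that the stateless rule never looks at it.

```python
from typing import NamedTuple, Optional

L = 1500                    # global value assumed in this message
OVERHEAD = {4: 36, 6: 56}   # encapsulation overhead per IP version


class Verdict(NamedTuple):
    forward: bool
    ptb_mtu: Optional[int]  # MTU reported in the PTB, if one is sent


def stateless_itr(packet_len: int, ip_version: int,
                  actual_path_mtu: int) -> Verdict:
    """Stateless handling of an IPv4 DF=1 or IPv6 packet at the ITR."""
    h = OVERHEAD[ip_version]
    # actual_path_mtu is deliberately ignored: L is a global constant.
    if packet_len + h > L:
        return Verdict(forward=False, ptb_mtu=L - h)
    return Verdict(forward=True, ptb_mtu=None)


# Even on a 9000 byte path, the sending host is told 1464 or 1444.
print(stateless_itr(8000, 4, actual_path_mtu=9000))
print(stateless_itr(8000, 6, actual_path_mtu=9000))
```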

  A proper solution should keep working well when the DFZ includes
  MTUs of 9000 bytes between ITR and ETR - and should allow the
  sending host to generate packets of nearly this size, so that once
  encapsulated they still fit within whatever the MTU is for the
  path to this ETR.

  Fred Templin raised similar concerns:

  http://www.irtf.org/pipermail/rrg/2009-January/000884.html

  (My guess is that the non-DF=0 part of this approach was written
  with an assumption that L would be individually determined by the
  ITR for each ETR - but that is not the way the I-D is written: L is
  fixed at 1500 bytes, and any such method would be stateful.)

Stateful approach for all packets
---------------------------------

I am reading the LISP version - I have not looked in detail at the
OpenLISP source of this approach.

The draft's description of this approach makes no reference to IPv4
DF=0 packets.  A host sending DF=0 packets is not doing PMTUD, so a
PTB sent by the ITR when such a packet exceeds some length is not
going to result in any action on the part of the sending host.  Such
a DF=0 packet will be dropped by the ITR.  That may be OK - Ivip will
do much the same - but it needs to be specified clearly.

This approach of the ITR determining the MTU to each ETR by receiving
ICMP messages from an intermediate router needs to be done securely.

It requires the ITR to cache significant amounts of information for
every packet it sends which might trigger such a PTB.

The intermediate router would need to send back enough of the
original packet to the ITR to include the LISP nonce.  Otherwise, PTBs
spoofed by off-path attackers would be accepted and the whole system
could easily be DoSed.

The ITR needs to store an initial fragment of each incoming traffic
packet for some time, so it can generate a PTB message for the
sending host.  It can't rely on enough of the original packet coming
back in the PTB from the intermediate router.  The ITR needs to cache
this for a second or two at least - while it waits for a possible
PTB.  This is an onerous requirement in a high-volume ITR.
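
To illustrate the state involved, here is a minimal Python sketch of
such a cache.  Everything in it is an assumption for illustration
(the class, the 2 second hold time, keying on the nonce, storing 28
bytes, which is an IPv4 header without options plus the 8 bytes an
ICMP error must quote); it is not taken from the draft or from
OpenLISP.

```python
import time
from typing import Callable, Dict, Optional, Tuple

H = 36                # encapsulation overhead from the draft
HOLD_SECONDS = 2.0    # assumed wait for a possible PTB
SNIPPET_BYTES = 28    # assumed: inner IPv4 header + 8 bytes


class PtbCache:
    """Per-packet state an ITR keeps while a PTB might still arrive."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[int, Tuple[float, bytes]] = {}

    def remember(self, nonce: int, packet: bytes) -> None:
        """Store the start of a traffic packet under its nonce."""
        deadline = self._clock() + HOLD_SECONDS
        self._entries[nonce] = (deadline, packet[:SNIPPET_BYTES])

    def expire(self) -> None:
        """Drop entries whose hold time has passed."""
        now = self._clock()
        self._entries = {n: e for n, e in self._entries.items()
                         if e[0] > now}

    def on_ptb(self, nonce: Optional[int],
               reported_mtu: int) -> Optional[Tuple[int, bytes]]:
        """Handle a PTB from an intermediate router.

        nonce is None when the router did not return enough of the
        packet to include it.  Returns (MTU to report to the sending
        host, stored snippet), or None if the PTB cannot be trusted.
        """
        self.expire()
        if nonce is None or nonce not in self._entries:
            return None  # possibly spoofed, or too late
        _, snippet = self._entries.pop(nonce)
        return reported_mtu - H, snippet
```

In this sketch the cache grows by one entry per traffic packet per
hold period, which is the cost I am concerned about for a high-volume
ITR.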

Ivip map-encap ITRs need to perform a stateful PMTU determination
process which is somewhat similar to this.  However, the Ivip
approach quickly narrows the zone of uncertainty so that the number
of packets involved in testing PMTU is very limited.

The Ivip approach is specified in greater detail, including
occasionally sending longer packets (if and when the current or some
other sending host generates them) to see if the MTU has grown
since the last PTB set a limit on it, as far as the ITR was concerned.

 - Robin
