Re: [rrg] SEAL critique, PMTUD, RFC4821 = vapourware

"Templin, Fred L" <Fred.L.Templin@boeing.com> Wed, 03 February 2010 16:09 UTC

From: "Templin, Fred L" <Fred.L.Templin@boeing.com>
To: Robin Whittle <rw@firstpr.com.au>
Date: Wed, 03 Feb 2010 08:09:53 -0800
Cc: RRG <rrg@irtf.org>
Subject: Re: [rrg] SEAL critique, PMTUD, RFC4821 = vapourware

Hi Robin,

Thanks for your message; a few points for clarification below:

> -----Original Message-----
> From: Robin Whittle [mailto:rw@firstpr.com.au]
> Sent: Wednesday, February 03, 2010 6:17 AM
> To: Templin, Fred L
> Cc: RRG
> Subject: Re: [rrg] SEAL critique, PMTUD, RFC4821 = vapourware
>
> Short version:    Refining my understanding of Fred's SEAL
>                   protocol for tunneling with PMTUD management.
>
> Hi Fred,
>
> I will include only parts of my previous message to which you are
> replying.
>
> >>>> IPTM doesn't rely on the PTB. (See below for how it will be able to
> >>>> work with minimal length IPv4 PTBs.)  As long as a PTB does get to
> >>>> the ITR - which it would in most cases - then the ITR knows about the
> >>>> MTU problem without having to wait for the ETR to time out and send a
> >>>> message to the ITR saying the big packet did not arrive.  Also, the
> >>>> ITR gets an exact MTU value from this PTB, rather than having to do
> >>>> what SEAL does - hunt back and forth to find a packet size which is
> >>>> reliably delivered without MTU problems.
>
> >>> SEAL doesn't hunt back and forth.
>
> >> 4.3.9.1.2 mentions an "iterative searching strategy" - which sounds
> >> like a fancy term for "hunt"!  This occurs only in IPv4 when the ETE
> >> gets a first fragment shorter than 576 bytes, then this is
> >> interpreted as a "runt fragment" and so is not regarded as a true
> >> measurement of the limiting MTU.
> >
> > Yes, this follows directly from the RFC1191 "plateau
> > table" approach to guessing MTUs when old routers fail
> > to fill in the MTU field in their PTB messages.
>
> OK - an iterative search strategy which only works downwards is not
> "hunting"!
>
> Are there really routers in use which don't include the 16 bit value?
>
> Even if, in theory, there are none, SEAL still has to cope with the
> possibility of zero or unreasonable values in this MTU field.

SEAL explicitly turns off PMTUD and uses its own tunnel
endpoint-to-endpoint MTU determination, so in the normal
case it does not expect to receive any ICMP PTBs from
routers within the tunnel. SEAL *can* enable PMTUD for
certain "expendable" packets, and can benefit from any
ICMP PTBs coming from within the tunnel that contain
sufficient information. But, that would simply be an
optimization.

> > What I
> > meant was that yes, SEAL will hunt "back" when there is
> > a router in the path that does weird fragmentation. But,
> > it does not hunt "forth".
>
> OK - I understand that SEAL follows "an iterative searching strategy
> that parallels (Section 5 of RFC1191)":
>
>   http://tools.ietf.org/html/rfc1191#section-5
>
> presumably the final part of this, which leads to:
>
>   http://tools.ietf.org/html/rfc1191#section-7
>
> I understand this means stepping downwards through the values in
> Table 7-1.  However, I guess you would choose some new values to add
> to this list.
>
> >> In this application of SEAL, I understand there is no need for any
> >> mid-layer protocol between the IPv4 or IPv6 header and the SEAL
> >> header, or between the SEAL header and the traffic packet.  This is
> >> not clearly specified anywhere, since the SEAL and RANGER documents
> >> are general purpose, and their use for a scalable routing solution as
> >> a Core-Edge Separation architecture is only one thing they could be
> >> used for.
> >
> > In some environments, it may be necessary to insert a
> > mid-layer UDP header in order to give ECMP/LAG routers
> > a handle to support multipath traffic flow separation.
>
>    http://en.wikipedia.org/wiki/Equal-cost_multi-path_routing
>
> http://www.force10networks.com/CSPortal20/TechTips/0065_HowDoIConfigureLoadBalancing.aspx
>
> As far as I know, these techniques are not something to consider with
> the RANGER CES, or with LISP or Ivip.  If the routers can handle
> ordinary traffic packets they can handle encapsulated packets too.  I
> haven't read about these techniques in detail.  I guess that within
> RANGER, beyond its use as a CES scalable routing solution, you may
> want to support ECMP and LAG.

There has been a great deal of talk about taking care
of ECMP/LAG routers within the network that recognize
only common-case protocols (i.e., TCP and UDP), which
is why LISP has locked into using UDP encapsulation.
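The ECMP/LAG concern can be illustrated with a small Python sketch of flow hashing. It assumes a router that hashes addresses, protocol and ports for TCP and UDP, but only addresses and protocol for anything else; real routers use vendor-specific hash functions, and CRC32 is just a stand-in. SEAL_PROTO (253, an experimental protocol number from RFC 3692), TUNNEL_PORT and the source-port derivation are hypothetical. Without a UDP header, every inner flow between one ITE and ETE hashes to the same path; with a per-flow UDP source port, the flows can spread.

```python
import zlib

TCP, UDP = 6, 17
SEAL_PROTO = 253      # hypothetical: experimental IP protocol number (RFC 3692)
TUNNEL_PORT = 50000   # hypothetical UDP destination port for the tunnel


def ecmp_path(src, dst, proto, sport=0, dport=0, n_paths=4):
    """Pick one of n_paths the way a port-aware ECMP/LAG router might."""
    if proto in (TCP, UDP):
        key = f"{src}|{dst}|{proto}|{sport}|{dport}"
    else:
        # Unrecognized protocol: only the outer addresses and protocol are hashed.
        key = f"{src}|{dst}|{proto}"
    return zlib.crc32(key.encode()) % n_paths


def udp_source_port(inner_flow):
    """Hypothetical: derive a stable ephemeral source port from the inner flow."""
    return 49152 + zlib.crc32(repr(inner_flow).encode()) % 16384


ITE, ETE = "192.0.2.1", "198.51.100.1"
inner_flows = [("10.0.0.1", "10.1.0.1", TCP, 40000 + i, 80) for i in range(64)]

# Bare encapsulation: the router cannot see the inner flows at all.
raw_paths = {ecmp_path(ITE, ETE, SEAL_PROTO) for _ in inner_flows}
# UDP encapsulation: each inner flow gets its own outer source port.
udp_paths = {ecmp_path(ITE, ETE, UDP, udp_source_port(f), TUNNEL_PORT)
             for f in inner_flows}

print(len(raw_paths), len(udp_paths))
```

Deriving the port from the inner flow keeps each flow on one path (avoiding reordering) while still spreading different flows.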

> >> Firstly I describe my understanding of what your ID specifies for IPv4.
> >>
> >> Secondly I describe two other ways you might do PMTUD with IPv4,
> >> without using DF=0 packets.  These would avoid whatever risk there
> >> might be of setting the ITE's PMTU estimate too low due to a limiting
> >> router sending out fragments which are shorter than the limiting next
> >> hop MTU.
> >>
> >> Finally, I describe my understanding of what your ID specifies for IPv6.
> >>
> >> This is partly for my own reference, since it took me many hours to
> >> discern this by reading the SEAL ID and corresponding with you.
> >>
> >>
> >> IPv4:
> >>
> >>   The ITE sends a DF=0 packet into the tunnel.  This starts with
> >>   an IPv4 header, then has a SEAL header (there's no mid-level
> >>   protocol in this Core-Edge Separation usage of RANGER and SEAL)
> >>   and then the inner packet, the original IPv4 traffic packet.
> >>
> >>   The source address in the outer header is that of the ITE and
> >>   the destination address is that of the ETE.  The 32 bit
> >>   SEAL ID is split in two.  16 bits go into the IPv4 header's
> >>   ID field and 16 into the SEAL header's ID Extension field.
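A minimal Python sketch of the 32-bit SEAL ID split described above. Which half goes into the IPv4 ID field and which into the SEAL ID Extension field is an assumption here (high half in the IPv4 header); the function names are hypothetical.

```python
from typing import Tuple


def split_seal_id(seal_id: int) -> Tuple[int, int]:
    """Split a 32-bit SEAL ID into (IPv4 ID, SEAL ID Extension).

    Assumption: the high 16 bits go into the IPv4 header's ID field and
    the low 16 bits into the SEAL header's ID Extension field.
    """
    if not 0 <= seal_id < 2**32:
        raise ValueError("SEAL ID must fit in 32 bits")
    return seal_id >> 16, seal_id & 0xFFFF


def join_seal_id(ipv4_id: int, id_extension: int) -> int:
    """Rebuild the 32-bit SEAL ID from the two 16-bit fields."""
    return ((ipv4_id & 0xFFFF) << 16) | (id_extension & 0xFFFF)
```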
> >>
> >>   The limiting router in the tunnel (the one where the next-hop
>>   MTU is less than the length of this whole packet) fragments
> >>   it into at least two fragments.
> >>
> >>   Now the second para in 4.4.2 comes into play:
> >>
> >>         When the ETE processes the IP first-fragment (i.e.,
> >>         one with MF=1 and Offset=0 in the IP header) of a
> >>         fragmented SEAL packet, ...
> >>
> >>   The first para was for reassembling packets which had been
> >>   fragmented by the SEAL protocol.  But the second para is for
> >>   SEAL packets, as was just sent, being fragmented by a
> >>   router between the ITE and the ETE.  This only occurs for
> >>   IPv4 and I think it would be helpful to mention IPv4 in this
> >>   paragraph.  Maybe it needs its own section.
> >>
> >>        ...  it sends a "Reassembly Report - Fragmentation
> >>        Experienced" message back to the ITE with the S_MSS field
> >>        set to the length of the first-fragment and with the
> >>        S_MRU field set to no more than the size of the reassembly
> >>        buffer (see Section 4.4.5).
> >>
> >>   I think this last part about the value of S_MRU is not clear
> >>   enough.  What value should it be set to?
> >>
> >>   I will assume it is set to some non-zero value.
> >>
> >>   Assuming the limiting router sent out the first fragment with
> >>   a length equal to the limiting next-hop MTU, then this MTU
> >>   value is now in the S_MSS field of the message sent to the
> >>   ITE.
> >>
> >>
> >>   This message arrives at the ITE.  This message, according to
> >>   Figure 4, contains:
> >>
> >>        As much of invoking packet as possible without the
> >>        message exceeding 576 bytes.
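The ETE behaviour quoted above can be sketched in Python as follows: detect an IPv4 first-fragment (MF=1, Offset=0), report its length as S_MSS, set S_MRU to no more than the reassembly buffer size, and quote as much of the invoking packet as fits within 576 bytes. The dataclass, its field layout and REPORT_OVERHEAD are hypothetical; the real message format is the one shown in Figure 4 of the SEAL draft.

```python
from dataclasses import dataclass
from typing import Optional

FRAGMENTATION_EXPERIENCED = 1  # Reassembly Report code (Section 4.3.9.1.2)
MAX_REPORT_LEN = 576           # the whole report must not exceed 576 bytes
REPORT_OVERHEAD = 48           # hypothetical: report header bytes before the quote


@dataclass
class ReassemblyReport:
    code: int
    s_mss: int        # length of the first-fragment as received
    s_mru: int        # no more than the ETE's reassembly buffer size
    invoking: bytes   # leading part of the first-fragment (outer IPv4 + SEAL headers)


def on_ipv4_fragment(mf: int, offset: int, first_frag: bytes,
                     reassembly_buffer: int) -> Optional[ReassemblyReport]:
    """Build a report for an IP first-fragment; ignore all other fragments."""
    if mf != 1 or offset != 0:
        return None
    room = max(0, MAX_REPORT_LEN - REPORT_OVERHEAD)
    return ReassemblyReport(
        code=FRAGMENTATION_EXPERIENCED,
        s_mss=len(first_frag),      # includes the outer IPv4 header
        s_mru=reassembly_buffer,
        invoking=first_frag[:room],
    )
```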
> >>
> >>   Maybe your ID specifies this, but I am having trouble
> >>   following it - there has to be a way the ITE securely
> >>   accepts this "Fragmentation Experienced" message.
> >>
> >>   As far as I know, the ITE looks into the message, finding
> >>   the initial part of the packet which the ETE received as
> >>   a first fragment.  That will contain the outer IPv4 header
>>   and the SEAL header, and from these the 32 bit SEAL ID
> >>   in the SEAL encapsulated packet can be found.
> >>
> >>   I think you either cache the recently sent 32 bit SEAL IDs
> >>   or maintain a sliding window function over their range so
> >>   you can easily identify a value which was used in the last
> >>   second or two.  In a given ITE, each ETE has its own SEAL
>>   ID counter.  Its value is initialized randomly when the
>>   state for this ETE is created.  After that, its value
>>   increments with each packet sent to the ETE.
> >>
> >>   (I may adopt this incrementing value per ETR arrangement,
> >>   with its sliding window, rather than using a nonce.)
> >>
> >>   The wider the window in time, the longer you can accept these
> >>   messages.  Since the ETE and the ITE could be on opposite
> >>   sides of the Net, I guess you need to have a window which
> >>   accepts SEAL IDs sent at least a second ago.
> >>
> >>   The longer the window in time, and the more packets the
>>   ITE sends to this ETE, the wider the window is numerically
>>   and the easier it is for an attacker to guess a valid value
>>   and have the ITE accept a PTB with a low enough value
> >>   to cause lost efficiency - for the next 10 minutes or so.
> >
> > The above is all correct wrt the window management. The
> > ITE can ensure that the window size remains bounded by
> > sending periodic explicit probes (e.g., one explicit
> > probe per N data packets).
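The trade-off described above (a wider window admits more valid-looking guesses) can be put into numbers. Assuming an off-path attacker picks SEAL IDs uniformly at random, each forged report is accepted with probability W / 2^32, where W is the number of SEAL IDs currently in the window. The packet rate, round-trip time and N below are illustrative assumptions, and bounding W by roughly N plus the packets sent during one round trip is this sketch's reading of the periodic-probe scheme, not a figure from the draft.

```python
def acceptance_probability(window: int, attempts: int = 1) -> float:
    """Chance that at least one of `attempts` random 32-bit guesses falls in
    a window of `window` valid SEAL IDs (guesses treated as independent)."""
    p = window / 2**32
    return 1.0 - (1.0 - p) ** attempts


# Time-based window: 10,000 packets/s, IDs accepted for about 1 s.
w_time = 10_000 * 1
# Probe-bounded window (assumed): explicit probe every N = 500 packets, plus
# roughly 10,000 packets/s * 0.2 s sent while the acknowledgement is in flight.
w_probe = 500 + int(10_000 * 0.2)

print(f"{acceptance_probability(w_time):.2e}")
print(f"{acceptance_probability(w_probe):.2e}")
```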
>
> I don't have a really clear idea of how SEAL sets the numeric window,
> or what you mean by your second sentence above.  If you can give a
> more complete description with an example, I would really appreciate
> it - in part because I might want to use or adapt your technique for
> Ivip, which currently uses nonces.

What I am asking for in SEAL is that the ITE sets the
SEAL_ID in each packet in monotonically-incrementing
fashion (modulo 2^32). Then, on every Nth packet (e.g.,
500th, 1000th, etc.) the ITE sets the "Acknowledgement
Requested" bit and the ETE (upon seeing the bit set)
sends back an explicit acknowledgement. The ITE can
then keep a window of outstanding SEAL_IDs by keeping
track of the most recently sent and most recently
acknowledged SEAL_IDs.
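The window scheme just described can be sketched as per-ETE state at the ITE. This is a minimal Python illustration under stated assumptions: the class and method names are hypothetical, N defaults to 500 as in the example, the starting SEAL_ID is random, and a report is accepted only if its SEAL_ID is still outstanding (sent but not yet covered by an acknowledgement). All comparisons use modulo-2^32 arithmetic so the window survives counter wraparound.

```python
import secrets
from typing import Optional, Tuple

MOD = 2**32  # SEAL_ID is a 32-bit counter; all arithmetic wraps modulo 2**32


class SealIdWindow:
    """Hypothetical per-ETE SEAL_ID state kept by an ITE."""

    def __init__(self, ack_every: int = 500, initial_id: Optional[int] = None):
        self.ack_every = ack_every
        start = secrets.randbelow(MOD) if initial_id is None else initial_id % MOD
        self.last_sent = (start - 1) % MOD   # nothing sent yet
        self.last_acked = self.last_sent     # so the window starts empty
        self.sent_count = 0

    def send(self) -> Tuple[int, bool]:
        """Return (seal_id, ack_requested) for the next outgoing packet."""
        self.last_sent = (self.last_sent + 1) % MOD
        self.sent_count += 1
        return self.last_sent, self.sent_count % self.ack_every == 0

    def in_window(self, seal_id: int) -> bool:
        """True if seal_id lies in (last_acked, last_sent], with wraparound."""
        span = (self.last_sent - self.last_acked) % MOD
        offset = (seal_id - self.last_acked) % MOD
        return 0 < offset <= span

    def on_ack(self, seal_id: int) -> None:
        """Slide the window forward on a valid explicit acknowledgement."""
        if self.in_window(seal_id):
            self.last_acked = seal_id

    def accept_report(self, seal_id: int) -> bool:
        """Accept a Reassembly Report only for an outstanding SEAL_ID."""
        return self.in_window(seal_id)
```

Rejecting reports for already-acknowledged IDs is a design choice of this sketch; an implementation might keep some history behind the last acknowledgement so that a report racing an acknowledgement is not discarded.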

> >>   Now to 4.3.9.1.2:
> >>
> >>        4.3.9.1.2. Fragmentation Experienced (Code=1)
> >>
> >>        If the value in the S_MRU field is non-zero, the
> >>        ITE records the value in its soft state for this ETE.
> >>
> >>   This means this value is stored in the S_MRU variable for
> >>   this ETE, as defined in 4.3.3.  As noted above, I am not
> >>   clear on what value was written into this field of the
> >>   report by the ETE.
> >>
> >>        The ITE then adjusts the S_MSS value in its soft state.
> >>
> >>   This means this value is stored in the S_MSS variable for
> >>   this ETE, as defined in 4.3.3, subject to the instructions
> >>   in the next few sentences.
> >>
> >>   I am a bit confused about the differing roles of these two
> >>   variables.
> >>
> >>        If the S_MSS value in the Reassembly Report is greater
> >>        than 576 (i.e., the nominal minimum MTU for IPv4 links),
> >>        the ITE records this new value in its soft state.
> >>
> >>   OK - this is based on the assumption that the length of the
> >>   first fragment received by the ETE reflects the limiting
> >>   MTU of the ITE to ETE path.
> >>
> >>        If the S_MSS value in the report is less than the current
> >>        soft state value and also less than 576,
> >>
> >>   How could the ITE's S_MSS value for this ETE be less than
> >>   576?  I can't see how.  If it can't be, then the first part
> >>   of the above sentence may be redundant.
> >
> > No, the sentence is correct. It is possible for the ITE
> > to need to reduce its cached S_MSS value to a size less
> > than 576 if there is truly a link with a small MTU (e.g.,
> > 256) on the path. Although 576 is often considered to
> > be the "nominal" minimum MTU for IPv4 links, the actual
> > minimum MTU is only 68 bytes per RFC791.
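The ITE-side processing of a Fragmentation Experienced report, as quoted from 4.3.9.1.2, might look like the following Python sketch. The quoted sentence stops at "less than 576,", so the rule used for that case (step one RFC 1191 plateau below the current estimate, never below the 68-byte RFC 791 minimum) is an assumption based on the earlier remark that SEAL follows an iterative search paralleling RFC 1191 Section 5. A report of exactly 576 leaves the estimate unchanged, since the quoted rules do not cover it. Names are hypothetical.

```python
from dataclasses import dataclass

# RFC 1191 Table 7-1 plateaus, largest first
PLATEAUS = (65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68)
NOMINAL_MIN_MTU = 576   # nominal minimum MTU for IPv4 links
ABSOLUTE_MIN_MTU = 68   # RFC 791 minimum


@dataclass
class EteSoftState:
    s_mss: int
    s_mru: int = 0


def on_fragmentation_experienced(state: EteSoftState, rep_s_mss: int,
                                 rep_s_mru: int) -> None:
    if rep_s_mru != 0:
        state.s_mru = rep_s_mru          # record the ETE's reassembly limit
    if rep_s_mss > NOMINAL_MIN_MTU:
        state.s_mss = rep_s_mss          # trust the first-fragment length
    elif rep_s_mss < NOMINAL_MIN_MTU and rep_s_mss < state.s_mss:
        # Possible runt fragment: not trusted as a measurement.
        # Assumed rule: step one plateau below the current estimate.
        lower = next((p for p in PLATEAUS if p < state.s_mss), ABSOLUTE_MIN_MTU)
        state.s_mss = max(ABSOLUTE_MIN_MTU, lower)
```

With a genuine 256-byte link on the path, as in the example above, this rule would step through 508 and 296 and end at 68, which illustrates why extra plateau values might be wanted.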
>
> OK - I guess I wouldn't go this far for Ivip.  I will probably have
> some guidance that if anyone has routers, tunnels or whatever with
> PMTUs less than 1280 or some figure in that range, for either IPv4 or
> IPv6, then they should make sure that ITRs or ETRs are not located so
> that such low PMTU parts of the network are between the DFZ and these
> ITRs or ETRs.

I want to be able to use SEAL over truly constrained
links, such as certain wireless communications systems.
On such links, there may be a reason to set an
unusually small MTU.

> >> OK - but at some point we need to stop adopting band-aid measures
> >> like artificially limiting MSS or MTU values.  That just lets the PTB
> >> filtering and lousy tunnels be less noticed.   We should not be
> >> trying to upgrade the stacks of all hosts in the world because a few
> >> end-user networks filter PTBs or ISPs and perhaps end-user networks
> >> run tunnels which don't support the otherwise perfectly good RFC 1191
> >> / 1981 PMTUD techniques.
> >>
> >> We would just be heaping limitations and complications on ourselves
> >> in an overly-defensive, expensive and inefficient attempt to cope
> >> with failure of a few ISPs and end-user networks to run the Internet
> >> as it needs to be run.  We are paying the ISPs.  The end-user
> >> networks which are filtering PTBs are disrupting a subset of their
> >> own communications.
> >
> > I still think there are problems out there. I will post
> > another message on this soon.
>
> Indeed you did - I had no idea things were this bad.
>
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05907.html
>   http://www.ietf.org/mail-archive/web/rrg/current/msg05910.html
>
> >> I just think it wrong in principle to develop messy new protocols
> >> such as RFC 4821 to cope with these failings.
> >
> > In my opinion, packetization layers are operating "at risk"
> > if they use packet sizes larger than 1500 but are not in
> > some way checking with the final destination to ensure that
> > the big packets are actually getting through. RFC4821 is
> > a method for the source to do just that without requiring
> > any changes on the destination. But to be sure, SEAL does
> > not *depend* on RFC4821 but rather *sets the stage* for
> > RFC4821 and/or any functional equivalents.
>
> OK.  I need to revisit all my thinking on PMTUD given the results of
> your recent research.  But I haven't yet changed my thinking that RFC
> 4821 is messy and expensive, and that it would be better to
> straighten out whatever it is within the network which is stopping
> PMTUD from working correctly.

The good thing about RFC4821 is that only the source
host needs to implement it, so there is a natural path
for incremental deployment. I guess there are other
possibilities as well (e.g., similar schemes that do
PMTUD black hole detection) so as long as the method
chosen causes no harm the source host is welcome to
implement whatever it sees fit.
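The source-only idea behind RFC 4821 (confirm large sizes end to end by sending probes and relying on the packetization layer's own acknowledgements) can be shown with a minimal Python sketch. The probe callback, the variable names and the binary-search strategy are assumptions for illustration; binary search is used here as one simple strategy, not as the RFC's prescribed method.

```python
from typing import Callable


def probe_search(probe_ok: Callable[[int], bool],
                 search_low: int = 1280,
                 search_high: int = 9000,
                 min_step: int = 8) -> int:
    """Binary-search the largest packet size confirmed to arrive.

    Invariants: search_low is known to get through; search_high is an upper
    bound that is not assumed to fit. Stops once the gap is at most
    min_step bytes and returns the largest confirmed size.
    """
    while search_high - search_low > min_step:
        size = (search_low + search_high) // 2
        if probe_ok(size):      # e.g. the probe was acknowledged
            search_low = size
        else:                   # e.g. the probe was declared lost
            search_high = size
    return search_low


# A path that silently drops anything above 1500 bytes (a PMTUD black hole).
print(probe_search(lambda size: size <= 1500))
```

In this run the search needs ten probes to get within 8 bytes of the 1500-byte limit, without any PTB and without changes at the destination.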

Fred
fred.l.templin@boeing.com

>  - Robin