Re: [rrg] SEAL critique, PMTUD, RFC4821 = vapourware

"Templin, Fred L" <Fred.L.Templin@boeing.com> Fri, 29 January 2010 20:54 UTC

From: "Templin, Fred L" <Fred.L.Templin@boeing.com>
To: Robin Whittle <rw@firstpr.com.au>, RRG <rrg@irtf.org>
Date: Fri, 29 Jan 2010 12:54:50 -0800
Message-ID: <E1829B60731D1740BB7A0626B4FAF0A64950FEC1D3@XCH-NW-01V.nw.nos.boeing.com>
In-Reply-To: <4B6103C8.6090307@firstpr.com.au>
List-Id: IRTF Routing Research Group <rrg.irtf.org>

Robin,

> -----Original Message-----
> From: Robin Whittle [mailto:rw@firstpr.com.au]
> Sent: Wednesday, January 27, 2010 7:26 PM
> To: RRG
> Cc: Templin, Fred L
> Subject: SEAL critique, PMTUD, RFC4821 = vapourware
>
> Short version:   Can any router folks confirm whether or not IPv6 DFZ
>                  routers handle packets with the Fragmentation Header
>                  just as quickly and efficiently as ordinary packets?
>
>                  I argue against Fred Templin's position that
>                  ordinary RFC1191 DF=1 Path MTU Discovery (and
>                  therefore its RFC1981 IPv6 equivalent) is "busted".
>
>                  Where is the evidence that networks filtering out
>                  PTB (Packet Too Big) messages is a significant
>                  problem?
>
>                  To the extent that any such problem exists, why
>                  should this be accepted and further protocols
>                  created to work around it?  I think the networks
>                  which do this are non-compliant with the inter-
>                  working requirements of the Internet.  These
>                  networks should change their ways.
>
>                  I argue that RFC4821 Packetization Layer PMTUD is
>                  unnecessary, over-complex and therefore undesirable.
>                  Furthermore I assert it is *vapourware* - since I
>                  can't see any evidence of applications or stacks
>                  which actually use it on user traffic packets.
>                  (There is a minimal implementation in the Linux
>                   stack, which is turned off by default.)
>
>                  My IPTM PMTUD system could work fine with minimal
>                  length IPv4 RFC1191 PTB messages.
>
>                  Fred says that his SEAL system couldn't work with
>                  minimal IPv4 PTBs, but I suggest that if the
>                  SEAL header was extended from its current 32 bits
>                  to 64, with a 32 bit nonce, it would work fine.
>                  This would avoid the need for his current approach
>                  of sending DF=0 packets, and having them fragmented
>                  as a method of detecting an MTU problem.
>
> Hi Fred,
>
> Thanks for your reply in the "Re: [rrg] RANGER and SEAL critique"
> thread, in which you wrote:
>
> >> I read the current ID:
> >>
> >>   http://tools.ietf.org/html/draft-templin-intarea-seal-08
> >>
> >> I used the term "jumboframe" to denote IPv4 or IPv6 packets which are
> >>  up to about 9kbytes long, as opposed to the conventional length of
> >> ~1500 bytes.
> >>
> >> Your current ID and your previous mailing list message used the term
> >> "jumbogram", which is IPv6 only and refers to packets of 65,575 bytes
> >> or more (up to 4 gigabytes) - and which have a "Jumbo Payload" option
> >> header.
> >
> > The terminology has become a bit loose here since
> > the GigE community began using the term "jumbo" to
> > refer to their 9KB target. For SEAL, what I am
> > meaning to say is that SEAL-SR can handle all sizes
> > from 1500 Ethernet size (or smaller) on the low end
> > up to all sizes of IPv6 jumbograms on the high end.
> > I'll see if I can disambiguate in the text.
>
> I had never made the distinction between "jumboframe" and "jumbogram"
> - but I only used the former.
>
> When reading your SEAL ID, I looked them up:
>
>   http://en.wikipedia.org/wiki/Jumbogram
>
> This points to RFC 2675 - "IPv6 Jumbograms":
>
>   http://tools.ietf.org/html/rfc2675
>
> which explicitly refers to packets with a special "Jumbo Payload
> option" - with lengths up to 4 gigabytes.
>
>   http://en.wikipedia.org/wiki/Jumbo_frame
>
> Jumboframes are ordinary format packets up to 64kbytes - and this is
> what SEAL works with.
>
>
>
> >> As far as I know, no applications use RFC 4821.  Can you cite any
> >> which do?  RFC 4821 involves different applications sharing
> >> information via the operating system about their experience with
> >> packet lengths sent to a particular destination host.  This sounds
> >> complex to implement and test - and I guess there is little
> >> motivation for early adopters amongst application writers if few
> >> other apps do it.
> >
> > Applications do not know whether they are running on
> > a stack that uses RFC4821 or just classical PMTUD;
> > the use or non-use of RFC4821 is a stack consideration
> > and not an application consideration.
>
> That's not my recollection . . .
>
> There are potentially multiple packetization layer systems in a
> single host.  One is the TCP layer in the stack.  Others could be in
> applications.  Perhaps there would be an SCTP layer in the stack too.
>
> All these are supposed to share information - but there's no
> standardized way of doing this.
>
>    Most of the difficulty in implementing PLPMTUD arises because it
>    needs to be implemented in several different places within a
>    single node.  In general, each Packetization Protocol needs to
>    have its own implementation of PLPMTUD.  Furthermore, the natural
>    mechanism to share Path MTU information between concurrent or
>    subsequent connections is a path information cache in the IP
>    layer.  The various Packetization Protocols need to have the means
>    to access and update the shared cache in the IP layer.  This memo
>    describes PLPMTUD in terms of its primary subsystems without fully
>    describing how they are assembled into a complete implementation.
>
>
> So theoretically a TCP layer of a stack could do PMTUD not only with
> the conventional RFC1191 (IPv4) and RFC1981 (IPv6) approach, but
> could also attempt to discern the PMTU to a destination host based on
> the assumption that ICMP PTBs were not being received, and that the
> need to repeatedly retransmit packets above a certain size was good
> enough evidence of a PMTU limit to decide the PMTU was a lower figure
> than it would otherwise have defaulted to.
>
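The search idea described above can be sketched roughly as follows (this is an illustration of the RFC4821 concept, not code from any stack; `path_delivers` stands in for sending a probe of a given size and seeing whether it is acknowledged):

```python
def plpmtud_search(path_delivers, search_low=1280, search_high=9000, resolution=16):
    """Binary-search the largest packet size the path delivers.

    A probe that is lost is treated as "too big", following the
    RFC4821 idea that repeated loss above a certain size is good
    enough evidence of an MTU limit - no ICMP PTB is consulted.
    """
    eff_pmtu = search_low
    while search_high - search_low > resolution:
        probe = (search_low + search_high) // 2
        if path_delivers(probe):
            search_low = eff_pmtu = probe   # raise the floor
        else:
            search_high = probe - 1         # lower the ceiling
    return eff_pmtu
```

Note the cost relative to classical PMTUD: instead of learning the exact limit from one PTB, the sender spends a round trip (or a retransmission timeout) per probe.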
> The only deployed implementation of RFC4821 seems to be a minimal
> one, turned off by default, in the Linux kernel.  I guess this is
> just for TCP.  Does it have any way of sharing information with other
> packetization layers, in the IP stack or in applications?
>
> If an application had its own packetization code, for instance to
> decide how big to make UDP packets, then it would need to be able to
> get from the stack any information which might have been placed there
> (a Path MTU figure to this destination host) by the TCP RFC4821
> layer, when previously sending packets to this destination host - or
> by other applications' packetization layers.  Also, there would be a
> need to update this information - and at the same time inform all
> other RFC4821-aware packetization layers of the changed MTU value:
>
>    If PLPMTUD updates the MTU for a particular path, all
>    Packetization Layer sessions that share the path representation
>    (as described in Section 5.2) SHOULD be notified to make use of
>    the new MTU and make the required congestion control adjustments.
>
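As a toy illustration of that shared-cache requirement (the class shape and names here are invented, not RFC4821's API), a per-destination cache that notifies subscribed packetization layers when the estimate changes might look like:

```python
class PathMTUCache:
    """Per-destination path MTU cache shared by packetization layers
    (the "path information cache in the IP layer" of RFC4821)."""

    def __init__(self):
        self._pmtu = {}        # destination -> current MTU estimate
        self._listeners = {}   # destination -> callbacks to notify

    def subscribe(self, dest, callback):
        self._listeners.setdefault(dest, []).append(callback)

    def update(self, dest, new_mtu):
        # All sessions sharing the path SHOULD be notified of the
        # new MTU (and would then make congestion control adjustments).
        if self._pmtu.get(dest) != new_mtu:
            self._pmtu[dest] = new_mtu
            for cb in self._listeners.get(dest, []):
                cb(new_mtu)

    def get(self, dest, default=1500):
        return self._pmtu.get(dest, default)
```

Even this toy shows the awkward part: every packetization layer - in the stack or in an application - has to find, trust and update the same cache.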
> Applications with their own packetization layers are definitely involved:
>
> 9. Application Probing
>
>    All implementations MUST include a mechanism where applications
>    using connectionless protocols can send their own probes.
>
>    . . .
>
>    Implementing PLPMTUD in an application also requires a mechanism
>    where the application can inform the operating system about the
>    outcome of the probe as described in Section 7.6, or directly
>    update search_low, search_high, and eff_pmtu, described in Section
>    7.1.
>
> So I disagree with your statement except to the extent of
> applications which do no packetization and just rely on the stack's
> TCP (or SCTP or whatever) packetization layers which may or may not
> implement RFC4821.

Yes; thanks for the correction. RFC4821 concerns packetization
layer path MTU discovery, where the application itself is a
packetization layer when, e.g., UDP is used as the transport.

> Matt Mathis, who co-wrote RFC4821 has a page devoted to MTU matters:
>
>   https://www.psc.edu/~mathis/MTU/
>
> There's no mention of SEAL or my IPTM approach.  There's a mailing
> list which I just joined.  There have been 16 messages since 2005.
> The archives are not public.  I will post some links to this SEAL
> discussion.
>
> There were only about a hundred messages in the IETF PMTUD mailing
> list over the 8 months of its activity which resulted in the
> standards track RFC4821:
>
>   http://www.ietf.org/mail-archive/web/pmtud/current/maillist.html
>
> My impression is that this is pretty minimal for a standards track
> RFC concerning a problem affecting potentially all applications.
>
> The state of RFC4821 adoption is:
>
>   https://www.psc.edu/~mathis/MTU/#pmtud
>
>      Implementations:
>
>        * Linux 2.6.17: ICMP black hole recovery only, off by default
>
>      As we gain field experience with wide deployment of RFC 4821 in
>      the above two configurations, we will document any additional
>      recommendations for implementors. Watch this page for future
>      information.
>
> I interpret this to mean that the single potentially used
> implementation is minimal and is turned off by default.  I guess it
> is just for TCP.
>
> I always suspected that RFC4821 was never going to be widely adopted.
>
> Firstly I can't find evidence of widespread problems with RFC1191 /
> RFC1981 PMTUD.  I guess there are glitches, but if any network is
> dropping ICMP PTB packets, then IMHO the administrators need to
> change their ways.  This is a simple question of network
> administration being at odds with the needs of a perfectly good
> protocol, which normally works fine all over the Net.
>
> The solution to bad network administration is not a new protocol.
>
> Where is the evidence of widespread PMTUD problems?
>
> Secondly, RFC4821 is extremely hard to implement.  It requires
> communication back and forth between packetization layers in multiple
> applications plus TCP and maybe SCTP (others in the future?) in the
> stack.  Why should these trust each other?  How can this sort of
> thing be debugged in actual operation in sending hosts around the
> world?  It sounds like a development nightmare.
>
> It seems it has not been implemented beyond a subset being put in the
> Linux kernel and turned off by default.
>
>
> Here are my current beliefs:
>
>   1 - There is no widespread problem with RFC1191 PMTUD (or its IPv6
>       equivalent RFC1981).
>
>   2 - RFC1191/1981 is both necessary and sufficient to solve PMTUD
>       problems.  It will work fine as long as the limiting router
>       sends a PTB message and as long as nothing drops these
>       messages by filtering.  (However, perhaps the fragment of
>       packet sent back in the IPv4 PTB message should be made
>       somewhat longer so that more information - such as a nonce in
> >       the payload behind encapsulation headers - can be found.  It's
>       my impression that quite a few routers already do this.
>       The ICMPv6 PTB specifies "As much of invoking packet as will
>       fit without the ICMPv6 packet exceeding 576 octets.")
>
>   3 - Therefore, any problems with RFC1191/1981 PMTUD are caused by
>       network maladministration, and should be solved there.
>
>   4 - So there's no genuine need for RFC4821.
>
>   5 - Any document which points to RFC4821 as if it is, or will
>       be, adopted to any significant degree is misleading.
>
> I think your SEAL documents do this - point to RFC4821 as if it is a
> legitimate, desirable or even superior method of doing PMTUD.  I
> think RFC4821 is unnecessary and undesirable - and that it is vapourware.
>
> SEAL is designed not to depend on RFC1191 PTB packets - but you do
> rely on RFC1981 packets.  More on this below.  You go to a lot of
> trouble to achieve end-to-end confirmation of packets arriving intact
> or being fragmented en-route (at least for IPv4).  But if the Net is
> going to be upgraded with a bold initiative based on RANGER, surely
> part of this upgrade can be to insist that no-one filter out PTB
> packets.  Also, if you can implement something as far-reaching as
> RANGER, then you can also devise updated versions of RFC1191 to
> return longer segments of packets, to include wherever a nonce is
> likely to be, even with packets with considerable levels of extra
> encapsulation.  For IPv6, the router sends enough of the packet to
> achieve this already.
>
>
> >> Also, the "ITE which hosts applications" is confusing, since in
> >> RANGER, the ITE is a router.  Maybe you mean apps inside the router
> >> which use TCP.
> >
> > Yes; app's inside the router which use TCP. I guess
> > we can use MSS clamping for apps inside the router
> > if we are concerned about them getting too large of
> > an initial PMTU estimate.
>
> OK.
>
>
> . . .
>
> >> In IPTM (for both IPv4 and IPv6) I send a pair of packets, one short
> >> and one long, (perhaps a few copies of the short one, to improve
> >> robustness) for any traffic packet which is longer than the ITR has
> >> previously successfully sent to the ETR.  The ETR responds to the
> >> shorter one, and tells the ITR if the long one didn't arrive.  (Also,
> >> the ETR will reply to the ITR if only the long one arrives.)  These
> >> are DF=1 packets for IPv4 and ordinary (implicitly DF=1) packets for
> >> IPv6.
> >
> > I can see how the ETR can send a response for packets that
> > arrive, but then how long must the ITR wait to receive the
> > response? And, what if the response is lost?
>
> I haven't completed the design - I expect the ITR should retry after
> a second or so.  If it can't get any response after a few retries,
> then it should conclude, for the next few minutes or so, that this
> ETR is not reachable.  Normally the ITR does no reachability testing
> for ETRs.  But in this case, it discovers non-reachability as a
> by-product of PMTUD management.  It should probably not send any long
> packets to this ETR address for a while - and then only with the
> short ones as part of another attempt to probe the PMTU to it.
>
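The retry policy described above might look like this in outline (the timings are Robin's guesses and the function names are invented; nothing here is a finished design):

```python
import time

def probe_etr(send_probe_pair, response_received, retries=3, wait=1.0):
    # Resend the probe pair roughly once a second; after a few
    # silent retries, conclude the ETR is unreachable for a while.
    # Non-reachability falls out as a by-product of PMTUD management,
    # not as a separate reachability test.
    for _ in range(retries):
        send_probe_pair()
        deadline = time.monotonic() + wait
        while time.monotonic() < deadline:
            if response_received():
                return "reachable"
            time.sleep(0.01)
    return "unreachable"
```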
>
> > I can't see how the ETR can send a response for packets
> > that *don't* arrive. For example, what if the large
> > packet was dropped due to congestion and not due to
> > an MTU limitation? And, for how long must the ETR wait
> > before sending the NACK?
>
> Again, I need to work on the details.  I think that if the ETR gets a
> short packet and no long packet arrives within 0.2 or 0.4 seconds or
> so, then the long packet is never going to arrive.  So it should send
> its response then.  ETRs are supposed to be connected via fibre, DSL
> or some other wired arrangement. It would be wrong to put an ETR on
> some long latency link.  Likewise, it would be a bad idea to put an
> ITR on a long latency link.  (TTR Mobility complies with these
> constraints.)

It seems then that the ITE and ETE would need to engage in
a guessing game based on packets *not* arriving within a
certain time? And, both the ITE and ETE have to maintain
a separate timer for every tunnel they belong to? I have
gone down that path many times in the past, and each
attempt invariably hit a dead end. You can check the I-D
archives for some of my expired works if you are interested.

> > IMHO, DF=1 for PMTUD is busted, and these are a few of
> > the reasons why.
>
> That's a very big statement which goes beyond your critique of how I
> might implement IPTM.
>
> Why do you believe DF=1 for PMTUD is busted?
>
> Every application today uses it.  None of them use RFC4821.
>
> Are all applications today breaking due to RFC1191 PMTUD being busted?
>
> I see no evidence of this whatsoever.
>
>
> >> I think it is bad to send DF=0 packets into the DFZ and expect the
> >> routers to dutifully fragment them if they are too long - though I
> >> know this should not happen frequently with SEAL, since it is just
> >> part of detecting the PMTU.
> >
> > In-the-network fragmentation is a noise indication, and
> > SEAL moves quickly to tune out the noise.
>
> OK - I just think that DF=0 packets should be avoided in general.
>
> DF=0 is an indulgence still granted to lazy bourgeois hosts in their
> dotage, enabling them to continue their regressive habits of unfairly
> burdening the proletariat of Worker routers with the responsibility
> for slicing up and individually carrying and reassembling their
> overweight, carelessly emitted, packets.  Come the Revolution . . . .
>
> When IPTM sends a big packet (containing most of a traffic packet) as
> part of its PMTU probing, if this hits an MTU limit, then it is
> dropped, with a short PTB going back to the ITR.  With SEAL, in IPv4
> mode, the limiting router has to split the packet up and forward it,
> so other routers and the ETR have to carry all the fragments.  So my
> IPTM approach is arguably lighter weight than your SEAL approach.

With SEAL-FS, the ETE does not carry all the fragments.
Instead, it uses first-fragments for reporting purposes
only and otherwise discards all fragments.

> IPTM doesn't rely on the PTB. (See below for how it will be able to
> work with minimal length IPv4 PTBs.)  As long as a PTB does get to
> the ITR - which it would in most cases - then the ITR knows about the
> MTU problem without having to wait for the ETR to time out and send a
> message to the ITR saying the big packet did not arrive.  Also, the
> ITR gets an exact MTU value from this PTB, rather than having to do
> what SEAL does - hunt back and forth to find a packet size which is
> reliably delivered without MTU problems.

SEAL doesn't hunt back and forth. In SEAL, every data
packet is an implicit probe, and the ETE uses IPv4
fragmentation as an indication that it needs to tell
the ITE to reduce the size of the packets it is sending.

> >> Do DFZ routers handle IPv6 packets with the Fragmentation Header just
> >> as fast as those without?
> >
> > I don't honestly know, but I sure hope that they do.
> > Anyone from the major router vendor community care
> > to comment on this?
>
> I am not suggesting they don't - but it would be good to find out.
>
>
> >> With IPv6, does SEAL detect its encapsulated packets being too long
> >> for the ITE to ETE PMTU by relying on ICMP PTB messages from the
> >> limiting router in the ITE to ETE path?  (AFAIK, IPv6 packets are
> >> never fragmented by ordinary routers.)  If so, then why not do the
> >> same with IPv4 packets?
> >
> > ICMPv6's are guaranteed to include a sufficient portion
> > of the original packet whereas ICMPv4's are not. Hence,
> > the theory is that the ITE can "match up" the ICMPv6s
> > with actual packets that cause the PTB. Also the ICMPv6's
> > can readily be translated into PTBs to send back to the
> > original source while ICMPv4's may not contain enough
> > data for translation.
>
> OK - thanks for explaining the reasons for these different approaches.
>
> I think these deficiencies with IPv4 PTBs relate to any routers which
> only return the minimum number of bytes specified in RFC1191, which
> is the IPv4 header and the next 8 bytes.
>
> I recall people writing in the past that most routers send back more
> than this.
>
> I understand that all TCP layers in all IPv4 stacks rely on RFC1191
> PMTUD for their proper operation.  As far as I know, there are no
> problems with this, other than whatever level of difficulty is caused
> by the occasional network dropping PTB packets.  However, this is for
> packets which are being forwarded in their original form.
>
> I guess applications - and encryption functions within them -
> achieve IPv4 PMTUD via the same RFC1191 mechanisms.
>
> It is trickier with an ITR using encapsulation to tunnel a traffic
> packet to an ETR.
>
> When an ITR encapsulates the packet, and the limiting router only
> sends back 8 bytes after the outer IPv4 header, then there is not
> enough of the inner packet to send back a valid PTB to the sending
> host.  That's OK if the ITR caches enough of the original packet for
> this purpose.  The real question is whether the returned IPv4 header
> and the next 8 bytes enables the ITR to:
>
>   1 - Match the PTB against a packet it tunneled to this ETR, so
>       that the ITR knows which sending host the original packet
>       came from - and to identify the correct initial part of that
>       packet in its cache (which exists for this purpose alone).
>
>   2 - Whether the ITR can recognise this securely enough to avoid
>       spoofing attacks.
>
> In Ivip, most traffic packets are encapsulated by the ITR with the
> sending host's address as the outer header's source address.  Any PTB
> which results from those goes to the sending host, which will not
> recognise it.

If the source address of the original packet also
goes as the source address of the outer packet, then
wouldn't that constitute mixing both EID and RLOC
addressing within the same routing region? I thought
the whole purpose of the CES approach was to keep the
EID and RLOC routing and addressing spaces separate.

> Ivip's PMTUD management is done via the ITR using the IPTM (also my
> invention - yet to be fully designed) protocol to carry traffic
> packets in a pair of probe packets, whenever the traffic packet, if
> normally encapsulated, would exceed the lower limit of the ITR's Zone
> of Uncertainty about the real MTU to this particular ETR.
>
> Both these packets are UDP packets.  The long one (B) has the ITR's
> address in the outer header.  The smaller one (A) - of which multiple
> may be sent for inexpensive robustness improvement - uses the sending
> host's address in the outer header.
>
> Here I consider how IPTM:
>
>    http://www.firstpr.com.au/ip/ivip/pmtud-frag/
>
> would work if the router only sent back 8 bytes following the IP
> header of the long probe packet.  (This is only for these long probe
> packets, which are accompanied by a shorter packet, or multiple
> copies of the shorter packet - but the shorter packets will never
> result in valid PTBs.)
>
> If IPTM accepted a PTB from a long (B) packet, purely on the basis of
> the IP header and the next 8 bytes being returned, and assuming this
> set of probe packets resulted from a traffic packet specifically
> generated by the attacker, then what would the attacker need to guess
> in order to successfully have the ITR accept the spoofed PTB packet?
>
> The aim of the attacker is DoS by having the ITR accept a lower than
> genuine PMTU to this ETR - so it is not a disaster, just a cause of
> less efficiency for the next ten minutes or so before the one or more
> affected sending hosts tries its luck with a larger packet, as is
> allowed and expected by RFC1191.
>
> In this scenario, the ITR gets back just the IPv4 header and the UDP
> header.  The attacker has to guess the 16 bit ID field in the IPv4
> header, which is tricky - but it could eventually succeed in doing
> so.  Here are the components of the UDP header:
>
>   Source port     The ITR could use a randomized source port.  This,
>                   combined with the 16 bit ID field, could extend
>                   the number of bits to be guessed to 32 - which
>                   I think is sufficiently secure, considering a
>                   successful attack only degrades efficiency, rather
>                   than causes actual loss of connectivity.
>
>   Destination port   Currently, I assume there is a single UDP port
>                   on all ETRs to send the long (B) packet to.  But
>                   I could easily randomize this too - such as making
>                   the most significant 8 bits fixed, and leaving the
>                   others up to the ITR to choose.
>
>                   This would be 40 random bits - perfectly secure
>                   considering the moderate level of DoS the attack
>                   could result in.
>
>   Length          If the attacker created the traffic packet, they
>                   would know the length of what follows the UDP
>                   header.
>
>   Checksum        Ahhh - this is not a header checksum.  This covers
>                   the data behind the UDP header.  This data is
>                   mainly from the traffic packet, but it contains
>                   a nonce.  So the 16 bit checksum is affected by
>                   the nonce.
>
> I hadn't realised this before - the UDP checksum contains another
> 16 bits the attacker has to guess.  Combined with the IPv4 header's
> 16 bit ID field, I think this makes it highly secure.  If this is
> not enough, the 16 bit random ITR source UDP port should be sufficient.
>
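The checksum point can be demonstrated directly: the UDP checksum is the Internet one's-complement sum over the pseudo-header, the UDP header and the payload, so a nonce anywhere in the payload perturbs the 16-bit value an attacker must guess. A sketch (the addresses and ports are made up):

```python
import struct

def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload):
    # IPv4 pseudo-header: source, destination, zero, protocol 17
    # (UDP), UDP length - followed by the UDP header with the
    # checksum field zeroed, followed by the payload.
    length = 8 + len(payload)
    data = (src_ip + dst_ip + struct.pack("!BBH", 0, 17, length)
            + struct.pack("!HHHH", src_port, dst_port, length, 0)
            + payload)
    if len(data) % 2:
        data += b"\x00"          # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:           # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return (~total) & 0xFFFF
```

Two probe payloads differing only in their nonce yield different checksums, so the checksum echoed in even a minimal PTB carries nonce entropy.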
> So the ITR doesn't need any more bits than are necessarily supplied
> by a minimally compliant RFC1191 implementation in the router which
> sends the PTB.
>
> How would this work for SEAL?

Using the UDP/TCP checksum as a nonce requires that the
ITE cache copies of its recently-sent packets. But then,
it would need to do this for every tunnel it belongs to
and it has no way of knowing for how long it will have
to retain the cached copies. With SEAL, the ITE never
has to cache packets in order to match them up with
any PTB feedback.

> At present, for IPv4 and IPv6, your ITE (ITR) functions emit packets
> with an outer header of IPv4 or IPv6, followed by a 32 bit SEAL header.
>
> Immediately following the SEAL header you may have some "mid-layer
> headers" which I don't properly understand.  Then you have the IPv4
> or IPv6 traffic packet, or perhaps a segment of it.
>
> You could make the SEAL ITE work fine with minimal length IPv4 PTBs
> if the SEAL header was extended to 64 bits, with the additional 32
> bits being a nonce.  That would always be returned in any PTB.

SEAL uses a 32-bit ID (formed from the 16-bit IPv4 ID
concatenated with SEAL's 16-bit ID extension) as a nonce.
There is no need that I can see for including an
additional nonce.
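For illustration (the function and field names here are invented, not from the SEAL draft), the matching this describes could look like:

```python
import os

def make_seal_id():
    # A random 32-bit per-packet ID, split across the 16-bit IPv4
    # Identification field and SEAL's 16-bit ID extension.
    full_id = int.from_bytes(os.urandom(4), "big")
    return full_id >> 16, full_id & 0xFFFF   # (IPv4 ID, SEAL extension)

def ptb_matches(sent, echoed):
    # Even a minimal RFC1191 PTB echoes the IPv4 header plus the
    # next 8 octets, which covers the 32-bit SEAL header - so both
    # halves of the ID can be recovered and compared.
    return sent == echoed
```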

> So I think your objection to using RFC1191 PTBs should only be based
> on your concern about the PTBs being systematically dropped due to
> filtering.
>
> I assert that such filtering is a symptom of a badly administered
> network - and that it should be fixed in the network, not worked
> around with a protocol such as SEAL or IPTM.

In my understanding, in the interdomain routing region of
the Internet there is no close coordination regarding the
way "the network" is administered. There is also a wide
variety of network vendor equipment deployed in the
Internet which may have widely varying default behaviors.
So, in general it seems overly optimistic to assume that
all of the diverse policies, implementations and operational
practices out there could be brought into strict uniformity.

> > Other than that, both ICMPv6's and ICMPv4's can be dropped
> > by filtering gateways so both IPv6 and IPv4 are susceptible
> > to black-holing. But, for the non-fragmenting IPv6 network
> > using DF=1 this is the best we can do; for IPv4 we can do
> > better by using DF=0.
>
> I just showed you can do both IPv4 and IPv6 the same way with a nonce
> in the SEAL header.
>
> What evidence can you show me for filtering of PTBs being a
> significant problem in today's Internet?
>
> To the extent that this is true, surely if you can upgrade the Net
> along the lines of RANGER, then you can have a BCP that any network
> which filters PTBs is not compliant with the basic requirements of
> the Net.
>
>
> >> Assuming the above-mentioned approaches enable each SEAL ITE to
> >> discover the actual PMTU to a given ETE (and to probe occasionally to
> >> see if the value has increased), and assuming it uses this to send
> >> PTB packets to the SH in order that subsequently emitted packets will
> >> be of the right size, once encapsulated, to not exceed this PMTU,
> >> then what is the purpose of the SEAL segmentation system?
> >>
> >>    Actually, it is trickier.  The one SH may be sending to multiple
> >>    destination hosts (DHs) all of which are for one or more EID
> >>    prefixes for which the ITE is currently tunneling to a given ETE.
> >>    Then the ITE needs to send the SH a separate PTB for each too-long
> >>    packet it sends to each such DH.  Also, there may be multiple such
> >>    SHs using this ITE whose packets will be tunneled to this ETE.
> >>
> >> If the SH sends a really long IPv4 DF=0 packet, the ITE will use
> >> ordinary IPv4 fragmentation to turn it into multiple shorter fragment
> >> packets, before SEAL processing.  (I think, come the Revolution, that
> >> DF=0 packets longer than some value like 1450 bytes or similar should
> >> be *dropped* - applications are expecting too much of the Network to
> >> fragment and carry the fragments of their excessively long packets.
> >> Applications have had since 1991 - RFC 1191 - to change their
> >> free-loading ways.)
> >>
> >>
> >> Under what circumstances would SEAL segmentation be used?
> >
> > SEAL is intended not only for the core routers we usually
> > talk about in RRG, but also for any ITR/ETR routers
> > deployed in any operational scenario. For example, we
> > would never ask core routers to do segmentation and
> > reassembly (hence the recommendation for SEAL-FS in
> > those environments).
>
> OK.
>
> > But, we might want to ask routers
> > in edge networks (e.g., MANET routers, CPE routers in
> > ISP networks, etc.) to do segmentation and reassembly
> > in order to avoid having to constantly tell hosts to
> > reduce their PMTU estimate to a degenerate size (i.e.,
> > anything less than 1500). So, in those environments
> > we would recommend SEAL-SR.
>
> > Thanks - Fred
>
> OK - this makes sense.  Thanks for discussing this in detail.
>
>  - Robin


Fred
fred.l.templin@boeing.com