2.8.14 Path MTU Discovery (pmtud)

NOTE: This charter is a snapshot of the 61st IETF Meeting in Washington, DC USA. It may now be out-of-date.

Last Modified: 2004-10-15


Chair(s):

Matt Mathis <mathis@psc.edu>
Matthew Zekauskas <matt@internet2.edu>

Transport Area Director(s):

Allison Mankin <mankin@psg.com>
Jon Peterson <jon.peterson@neustar.biz>

Transport Area Advisor:

Allison Mankin <mankin@psg.com>

Mailing Lists:

General Discussion: pmtud@ietf.org
To Subscribe: pmtud-request@ietf.org
In Body: subscribe email_address
Archive: http://www.ietf.org/mail-archive/web/pmtud/index.html

Description of Working Group:

The goal of the PMTUD working group is to specify a robust method for
determining the IP Maximum Transmission Unit supported over an
end-to-end path. This new method is expected to update most uses of
RFC1191 and RFC1981, the current standards track protocols for this
purpose. Various weaknesses in the current methods are documented in
RFC2923, and have proven to be a chronic impediment to the deployment
of new technologies that alter the path MTU, such as tunnels and new
types of link layers.

The proposed new method does not rely on ICMP or other messages from
the network. It finds the proper MTU by starting a connection using
relatively small packets (e.g. TCP segments) and searching upwards by
probing with progressively larger test packets (containing application
data). If a probe packet is successfully delivered, then the path MTU
is raised. The isolated loss of a probe packet (with or without an
ICMP can't fragment message) is treated as an indication of an MTU
limit, and not a congestion indicator.
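
The upward search described above can be sketched as follows. This is a
minimal illustration, not the algorithm in the draft: the function names,
the starting size, and the list of candidate sizes are all assumptions
made for the example, and a real implementation would also need the
retry and congestion-interaction rules that the working group is
chartered to specify.

```python
from typing import Callable, List


def search_path_mtu(probe_ok: Callable[[int], bool],
                    start_mtu: int,
                    candidates: List[int]) -> int:
    """Raise the MTU estimate while progressively larger probes get through.

    probe_ok(size) is a hypothetical callback that sends one probe packet
    of `size` bytes and reports whether it was delivered.
    """
    mtu = start_mtu
    for size in sorted(candidates):
        if size <= mtu:
            continue  # nothing to learn from a probe that is not larger
        if probe_ok(size):
            mtu = size  # probe delivered: raise the path MTU
        else:
            break  # isolated probe loss: treat as an MTU limit, not congestion
    return mtu


def simulated_path(size: int) -> bool:
    """Hypothetical path that silently drops anything above 1400 bytes."""
    return size <= 1400


# Candidate sizes are illustrative only.
found = search_path_mtu(simulated_path, 512,
                        [1024, 1280, 1400, 1492, 1500, 4352, 9000])
print(found)
```

In the simulated run the search stops at the first lost probe, so no ICMP
message is needed to find the limit.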

The working group will specify the method for use in TCP, SCTP, and
will outline what is necessary to support the method in transports
such as DCCP. It will particularly describe the precise conditions
under which lost packets are not treated as congestion indications.
The work will pay particular attention to details that affect
robustness and security.

Path MTU discovery has the potential to interact with many other parts
of the Internet, including all link, transport, encapsulation and
tunnel protocols. Therefore this working group will particularly
encourage input from a wide cross section of the IETF to help to
maximize the robustness of path MTU discovery in the presence of
pathological behaviors from other components.

Input draft:
                Packetization Layer Path MTU Discovery

Goals and Milestones:

Done  Reorganized Internet-Draft. Solicit implementation and field experience.
Done  Update Internet-Draft incorporating implementers' experience.
Feb 04  Submit completed Internet-draft and a PMTUD MIB draft for Proposed Standard.


  • draft-ietf-pmtud-method-03.txt

    No Request For Comments

    Current Meeting Report

    Path MTU Discovery WG (pmtud)
    Tuesday, November 9, 2004 at 13:00 to 14:00
    The meeting was chaired by Matt Zekauskas and Matt Mathis. These minutes are edited from notes taken by Jeff Dunn and John Heffner.


    1. Agenda Bashing, Milestones Status
    2. PMTUD draft update
    3. Fragmentation considered very harmful

    PMTUD Draft Update
    -- Matt Mathis

    Matt noted that a lot had changed, based on a huge amount of input from the working group. He started with a quick algorithm overview (see slides) and then noted the changes based on the feedback so far. Major feedback included making the method (transport) protocol independent, and the results of the questionnaire on different protocols sent to the mailing list. There is a large amount of commonality for all transport protocols, and the document has been restructured.

    From the beginning, this effort has focused on increasing PMTUD robustness, and part of that was to not require any dependence on ICMP. Based on feedback, the document now describes how to fully interoperate with the existing path MTU discovery.

    The document also incorporates the observation that the algorithm is parallel to congestion control; this is a fundamental observation that had not previously been recognized. Incorporating that idea clarified what needed to be specified rigorously earlier. Both use loss as the primary signal (with well understood limitations); this is more consistent with the end to end principle... and shows why packet-level PMTUD is more robust than ICMP-based PMTUD.

    ICMP limitations could be argued to be an artifact of a layering violation.

    Matt noted the issues surrounding robustness in more detail. Worries about respecting the "don't fragment" bit are addressed in the draft; Matt felt this was the most significant lingering robustness issue. The method can be used in exceptional cases; there is no increased risk from striping across multiple paths with dissimilar MTUs. Magnus Westerlund noted that header compression over a link could lead to the same problems as striping.

    Matt noted that another issue has also arisen: with streaming media protocols, compression depends on codec parameters; you can't independently set packet sizes. Magnus said that was a different issue; how to generate data. With variable rate codecs you can have a number of segments with very small headers; then you might need a full-context header which could exceed the MTU.

    Finally, Matt felt that even though the 03 version missed the last 40 or so hours of editing due to a machine crash, it was far better than the previous version. Matt is reluctant to push further without other implementation experience.

    However, we do need to do a MIB. All of the important state is in the IP layer; should we augment the IP MIB?

    Larry Dunn felt the document is pretty much wrapped up; maybe have one more meeting. He wanted to know Matt's opinion. Matt wants to get it done, but feels we do need to get some implementations. John Heffner promised an implementation for next meeting, but he is an author, so he too would like to get another.

    2. Fragmentation considered very harmful
    -- Matt Mathis

    Since the IP id field is 16 bits, modern systems can wrap the field really quickly. The Internet checksums aren't strong enough to protect a packet if you splice fragments together. For example, it takes 8 seconds to wrap IPid at 100Mbps. If you lose a low fragment, and the IPid wraps, you might associate the low fragment from the new packet with the old high fragment. This is sustainable.
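
The wrap time quoted above can be checked with a short calculation. This
is a sketch under stated assumptions: every packet is 1500 bytes, each
packet consumes one IP ID value, and the sender keeps the link fully
busy. The function name and the other rates shown are illustrative only.

```python
IPID_BITS = 16  # width of the IPv4 Identification field


def ipid_wrap_seconds(link_bps: float, packet_bytes: int = 1500) -> float:
    """Time to cycle through all IP ID values at a given sending rate."""
    packets_per_second = link_bps / (packet_bytes * 8)
    return (2 ** IPID_BITS) / packets_per_second


for rate in (10e6, 100e6, 1e9):
    print(f"{rate / 1e6:>6.0f} Mbps: {ipid_wrap_seconds(rate):.2f} s")
```

Under these assumptions the 100 Mbps case comes out at a little under 8
seconds, consistent with the figure given in the talk, and the wrap time
shrinks in proportion as the rate rises.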

    If you lose the high fragment, things aren't quite as bad. The standards don't specify what to do. Generally, the new low fragment replaces the old low fragment.

    Did some experiments (see slides for details). With 250 runs of random data, got 41,000 UDP checksum errors and one corrupted file. With pathological data, the checksum passes even though the data is incorrect.
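
One way a checksum can pass on misassembled data is sketched below. This
is not the experiment from the slides; it only illustrates a well-known
property of the 16-bit ones-complement Internet checksum, namely that it
is insensitive to the order of 16-bit words. The payload bytes are made
up for the example, and the UDP pseudo-header is omitted for brevity.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# The checksum travels with the low fragment of the new datagram.
low_new = b"HEAD"
high_new = b"\x00\x02\x00\x01"  # high fragment that belongs with low_new
high_old = b"\x00\x01\x00\x02"  # stale high fragment with the same IP ID

intended = low_new + high_new
spliced = low_new + high_old  # what a confused reassembler would deliver

print(intended != spliced)
print(internet_checksum(intended) == internet_checksum(spliced))
```

Both lines print True: the delivered data differs from what was sent, yet
the checksum carried in the low fragment still verifies.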

    However, this phenomenon can occur with low-rate servers or applications, if the server is busy enough to wrap its own IPid space. Say the server wraps in a few seconds, and an individual connection is slow so the IPids are not well correlated. Every time there's a loss, there is a 1 in 2^32 chance of the next packet being misassociated. So, a low-rate VPN on a busy server is exposed to the problem.

    Things like IPsec help by strengthening the checksum. With ssh, you would get an encryption error and the session would be dropped.

    An audience member questioned the 2^16 in the slide. The two instances of 1^16 on slide 7 should be replaced by 2^16.

    What about changing TCP to crc-32 like SCTP? You could, but then once you open it up to make this change, there are a whole host of other things that people would try to change. There could be an alternate checksum option.

    Matt said the point is that IPv4 fragmentation basically doesn't work. Anyone who depends on it is inviting trouble.

    Michael Richardson noted that in IKEv2, there was a long conversation about fragment attacks, and state attack of someone sending thousands of non-initial fragments for port 500. It doesn't want to send any payloads that fragment. PKI for IPsec is looking at this again. It doesn't want to transport complete CERT chains because they become too big.

    Matt Mathis asked whether those protocols have enough state to do path MTU discovery.

    Michael felt that IKE has enough state, or you can have the fragment header in IKE. You could reassemble larger things in IKE once you figure out what it was.

    Someone noted that most assume a 1280 minimum MTU. The biggest problem is a PKIX cert at 500 bytes, so 3 layers of CERT e.g. verisign-corpCA-deptCA : too big. verisign-corpCA would be ok.

    It was also noted that some applications use fragmented UDP because they observed higher data rates in some situations.

    Gorry Fairhurst asked about "full stop timeouts". Should the method go back to 512 bytes for both v4 and v6, or just for one, and why?

    Matt replied that his fear was that there would be an implementation that would be a tunnel, with overhead that created an effective MTU below 1280. How should we deal with that?

    One can argue it should be forbidden; but is it better to sort-of work? We could go back to 1024.

    Another problem relates to making congestion control work with small windows; what if the loss rate is so high that the appropriate average window size is below 6Kbytes? In that case, you might be better off using 512 byte packets than 1500, because fast retransmit doesn't work well with less than four packets in flight.
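
The arithmetic behind the small-window argument can be sketched as
follows. The 5000-byte window is an assumed example of a window "below
6Kbytes", and the threshold of four packets is taken directly from the
remark above; header overhead is ignored.

```python
MIN_PACKETS_FOR_FAST_RETRANSMIT = 4  # threshold quoted in the discussion


def packets_in_flight(window_bytes: int, packet_bytes: int) -> int:
    """Whole packets that fit in the window (headers ignored)."""
    return window_bytes // packet_bytes


WINDOW = 5000  # hypothetical average window, just under 6 Kbytes

for size in (1500, 512):
    n = packets_in_flight(WINDOW, size)
    enough = n >= MIN_PACKETS_FOR_FAST_RETRANSMIT
    print(f"{size:>4}-byte packets: {n} in flight, fast retransmit viable: {enough}")
```

With this window, 1500-byte packets leave only three in flight, while
512-byte packets leave nine, which is why the smaller size can recover
from loss more gracefully.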

    Gorry noted that going back to small segments for other transports might be strange.

    This raises the point that whenever the MTU changes for whatever reason, there ought to be an opportunity for the transport protocol to speak up and say "I can't deal with it".


    Agenda and Milestone Update
    PMTUD Method update
    Fragmentation Very Harmful