idnits 2.17.1 

draft-ietf-pmtud-method-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3667, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 1559.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1543.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1549.

  ** Found boilerplate matching RFC 3978, Section 5.4, paragraph 1 (on line
     1565), which is fine, but *also* found old RFC 2026, Section 10.4C,
     paragraph 1 text on line 37.

  ** The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure
     Acknowledgement -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.

  ** The document seems to lack an RFC 3979 Section 5, para. 1 IPR Disclosure
     Acknowledgement -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document uses RFC 3667 boilerplate or RFC 3978-like boilerplate
     instead of verbatim RFC 3978 boilerplate.  After 6 May 2005, submission
     of drafts without verbatim RFC 3978 boilerplate is not accepted.

     The following non-3978 patterns matched text found in the document. 
     That text should be removed or replaced:

        By submitting this Internet-Draft, I certify that any applicable patent
        or other IPR claims of which I am aware have been disclosed, or
        will be disclosed, and any of which I become aware will be
        disclosed, in accordance with RFC 3668.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 805 has weird spacing: '...retried  after...'

  == Line 1026 has weird spacing: '...irement  has p...'

  == Line 1170 has weird spacing: '...portant  that ...'

  == Line 1253 has weird spacing: '...ntation  would...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 2004) is 7255 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: 'ISOTP' on line 1268

  == Unused Reference: '10' is defined on line 1444, but no explicit
     reference was found in the text

  == Unused Reference: '11' is defined on line 1447, but no explicit
     reference was found in the text

  == Unused Reference: '13' is defined on line 1453, but no explicit
     reference was found in the text

  == Unused Reference: '15' is defined on line 1459, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 1981 (ref. '3') (Obsoleted by RFC 8201)

  ** Obsolete normative reference: RFC 2401 (ref. '5') (Obsoleted by RFC 4301)

  ** Obsolete normative reference: RFC 2414 (ref. '6') (Obsoleted by RFC 3390)

  ** Obsolete normative reference: RFC 2460 (ref. '7') (Obsoleted by RFC 8200)

  ** Obsolete normative reference: RFC 2960 (ref. '9') (Obsoleted by RFC 4960)

  -- Obsolete informational reference (is this intentional?): RFC 1063 (ref.
     '10') (Obsoleted by RFC 1191)

  -- Obsolete informational reference (is this intentional?): RFC 1626 (ref.
     '12') (Obsoleted by RFC 2225)

  == Outdated reference: A later version (-16) exists of
     draft-ietf-tsvwg-sctpimpguide-10


     Summary: 13 errors (**), 0 flaws (~~), 11 warnings (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Network Working Group                                          M. Mathis
2	Internet-Draft                                                J. Heffner
3	Expires: November 30, 2004                                           PSC
4	                                                                K. Lahey
5	                                                               Freelance
6	                                                               June 2004

8	                           Path MTU Discovery
9	                       draft-ietf-pmtud-method-02

11	Status of this Memo

13	   By submitting this Internet-Draft, I certify that any applicable
14	   patent or other IPR claims of which I am aware have been disclosed,
15	   and any of which I become aware will be disclosed, in accordance with
16	   RFC 3668.

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups. Note that other
20	   groups may also distribute working documents as Internet-Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time. It is inappropriate to use Internet-Drafts as reference
25	   material or to cite them other than as "work in progress."

27	   The list of current Internet-Drafts can be accessed at http://
28	   www.ietf.org/ietf/1id-abstracts.txt.

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	   This Internet-Draft will expire on November 30, 2004.

35	Copyright Notice

37	   Copyright (C) The Internet Society (2004). All Rights Reserved.

39	Abstract

41	   This document describes a robust new method for Path MTU Discovery
42	   that relies on TCP or another Packetization Layer to probe an Internet
43	   path with progressively larger packets. This method is described as
44	   an extension to RFC 1191 and RFC 1981, which specify ICMP based Path
45	   MTU Discovery for IP versions 4 and 6, respectively. This document
46	   does not define a protocol, but rather a method to use features of
47	   existing protocols to discover the path MTU.

49	   The general strategy of the new algorithm is to start with a small
50	   MTU and probe upward, testing successively larger MTUs by probing
51	   with single packets.  If the probe is successfully delivered, then
52	   the MTU is raised.  If the probe is lost, it is treated as an MTU
53	   limitation and not as a congestion signal.

55	Table of Contents

57	   1.  Introduction
58	   2.  Terminology
59	   3.  Overview
60	   4.  Requirements
61	   5.  Implementation Issues
62	     5.1   Layering
63	       5.1.1   Accounting for Header Sizes
64	       5.1.2   Storing PMTU information
65	     5.2   Lower Layers
66	       5.2.1   Generating Probes
67	       5.2.2   Selecting the initial MTU
68	       5.2.3   Normal sequence of events to raise the MTU
69	       5.2.4   Processing MTU Indications
70	       5.2.5   Probing Intervals
71	       5.2.6   Host fragmentation
72	       5.2.7   Multicast
73	     5.3   Search Strategy
74	       5.3.1   Search
75	       5.3.2   Monitor
76	       5.3.3   Suspend
77	     5.4   Specific Packetization Layers
78	       5.4.1   Probing method using TCP
79	       5.4.2   Probing method using SCTP
80	       5.4.3   Probing Method for IP Fragmentation
81	       5.4.4   Issues for other transport protocols
82	     5.5   Operational Integration
83	       5.5.1   Interoperation with prior algorithms
84	       5.5.2   Interoperation over subnets with dissimilar MTUs
85	       5.5.3   Interoperation with tunnels
86	       5.5.4   Diagnostic tools
87	       5.5.5   Management interface
88	   6.  References
89	     6.1   Normative References
90	     6.2   Informative References
91	       Authors' Addresses
92	   A.  Security Considerations
93	   B.  IANA considerations
94	   C.  Acknowledgements
95	       Intellectual Property and Copyright Statements

97	1.  Introduction

99	   This document describes a method for Packetization Layer Path MTU
100	   Discovery (PLPMTUD) which is an extension to existing Path MTU
101	   discovery methods as described in RFC 1191 [2] and RFC 1981 [3].  The
102	   proper MTU is determined by starting with small packets and probing
103	   with successively larger packets.  The bulk of the algorithm is
104	   implemented above IP, in the transport layer (e.g. TCP) or other
105	   "Packetization Protocol" that is responsible for determining packet
106	   boundaries.

108	   This document draws heavily on RFC 1191 [2] and RFC 1981 [3] for
109	   terminology, ideas and some of the text.

111	   The methods described in this document apply to both IPv4 and IPv6,
112	   and to many transport protocols.  This document does not define a
113	   protocol, but rather a method to use features of existing protocols
114	   to discover the path MTU.  It does not require cooperation from the
115	   lower layers (except that they are consistent about what packet
116	   sizes are acceptable) or the far node.  Variants in implementations
117	   will not cause interoperability problems.

119	   The methods described in this document are carefully designed to
120	   maximize robustness in the presence of less than ideal
121	   implementations of other protocols or Internet components.

123	   For clarity, we uniformly prefer TCP and IPv6 terminology.  In
124	   the terminology section we also present the analogous IPv4 terms and
125	   concepts for the IPv6 terminology.  In a few situations we describe
126	   specific details that are different between IPv4 and IPv6.

128	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
129	   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
130	   document are to be interpreted as described in RFC 2119 [4].

132	   This draft is a product of the Path MTU Discovery (pmtud) working
133	   group of the IETF.  Please send comments and suggestions to
134	   pmtud@ietf.org.   Interim drafts and other useful information will be
135	   posted at http://www.psc.edu/~mathis/MTU/pmtud/index.html .

137	2.  Terminology
138	   IP Either IPv4 [1] or IPv6 [7].

140	   node A device that implements IP.

142	   router A node that forwards IP packets not explicitly addressed to
143	      itself.

145	   host Any node that is not a router.

147	   upper layer A protocol layer immediately above IP. Examples are
148	      transport protocols such as TCP and UDP, control protocols such as
149	      ICMP, routing protocols such as OSPF, and Internet or lower-layer
150	      protocols being "tunneled" over (i.e., encapsulated in) IP such as
151	      IPX, AppleTalk, IP itself.

153	   link A communication facility or medium over which nodes can
154	      communicate at the link layer, i.e., the layer immediately below
155	      IP. Examples are Ethernets (simple or bridged); PPP links; X.25,
156	      Frame Relay, or ATM networks; and Internet (or higher) layer
157	      "tunnels", such as tunnels over IPv4 or IPv6. Occasionally we use
158	      the slightly more general term "lower layer" for this concept.

160	   interface A node's attachment to a link.

162	   address An IP-layer identifier for an interface or a set of
163	      interfaces.

165	   packet An IP header plus payload.

167	   MTU Maximum Transmission Unit, the size in bytes of the largest IP
168	      packet, including the IP header and payload, that can be
169	      transmitted on a link or path. Note that this could more properly
170	      be called the IP MTU, to be consistent with how other standards
171	      organizations use the acronym MTU.

173	   link MTU The Maximum Transmission Unit, i.e., maximum IP packet size
174	      in bytes, that can be conveyed in one piece over a link. Beware
175	      that this definition differs from the definition used by other
176	      standards organizations.

178	      For IETF documents, link MTU is uniformly defined as the IP MTU
179	      over the link. This includes the IP header, but excludes link
180	      layer headers and other framing which is not part of IP or the IP
181	      payload.

183	      Beware that other standards organizations generally define link
184	      MTU to include the link layer headers.

186	   path The set of links traversed by a packet between a source node and
187	      a destination node.

189	   PMTU, path MTU The minimum link MTU of all the links in a path
190	      between a source node and a destination node.

192	   classical PMTU discovery The process described in RFC 1191 and RFC
193	      1981, in which nodes rely on ICMP "Packet Too Big" messages to
194	      learn the MTU of a path.

196	   PL, packetization layer The layer of the network stack which segments
197	      data into packets.

199	   PLPMTUD Packetization Layer Path MTU Discovery, the method described
200	      in this document, which is an extension to classical PMTU
201	      discovery.

203	   Packet Too Big message An ICMP message reporting that an IP packet is
204	      too large to forward. This is the IPv6 term that corresponds to
205	      the IPv4 "ICMP Can't fragment" message.

207	   flow A context in which MTU discovery is applied. This is naturally
208	      an instance of the packetization protocol, e.g. one side of a TCP
209	      connection.

211	   MPS The maximum IP payload size available over a specific path. This
212	      is typically the path MTU minus the IP header. As an example, this
213	      is the maximum TCP packet size, including TCP payload and headers
214	      but not including IP headers. This has also been called the "L3
215	      MTU".

217	   MSS The TCP Maximum Segment Size, the maximum payload size available
218	      to the TCP layer. This is typically the path MPS minus the size of
219	      the TCP header.

221	   probe packet A packet which is being used to test a path for a larger
222	      MTU.

224	   probe size The size of a packet being used to probe for a larger MTU.

226	   successful probe The probe packet was delivered through the network
227	      and acknowledged by the Packetization Layer on the far node.

229	   inconclusive probe The probe packet was not delivered, but there were
230	      other lost packets close enough to the probe that it cannot be
231	      presumed that the probe was lost because it was larger than the
232	      path MTU. By implication the probe might have been lost due to
233	      something other than MTU (such as congestion), so the results are
234	      inconclusive.  Inconclusive probes are generally repeated at the
235	      same probe size, after a suitable delay.

237	   failed probe The probe packet was not delivered and there were no
238	      other lost packets close to the probe. This is taken as an
239	      indication that the probe was larger than the path MTU, and future
240	      probes should generally be at smaller sizes.

242	   errored probe There were losses or timeouts during the verification
243	      phase which suggest a potentially disruptive failure or network
244	      condition. These are generally retried only after substantially
245	      longer intervals.

247	   probe gap The payload data that will be lost and need to be
248	      retransmitted if the probe is not delivered.

250	   probe phase The interval (time or protocol events) between when a
251	      probe is sent and when it is determined that the probe
252	      succeeded, failed or was inconclusive.

254	   verification phase An additional interval during which the new path
255	      MTU is considered provisional. Packet losses or timeouts are
256	      treated as an indication that there may be a problem with the
257	      provisional MTU.

259	   Transition phase The interval between the probe phase and the
260	      verification phase, during which packets using the new MTU
261	      propagate to the far node and the acknowledgment propagates back.

263	   full stop timeout A timeout where none of the packets transmitted
264	      after some event are acknowledged by the receiver, including any
265	      retransmissions. This is taken as an indication of some failure
266	      condition in the network, such as a routing change onto a link
267	      with a smaller MTU.   For the sake of PLPMTUD we suggest the
268	      following definition of a full stop timeout:  the loss of one full
269	      window of data and at least one retransmission or at least 6
270	      consecutive packets including at least 2 retransmissions (along
271	      with two retransmission timer expirations).   [@@@ This probably
272	      needs some experimentation.]

274	   search strategy The heuristics used to choose successive probe sizes
275	      to converge to the proper path MTU, as described in Section 5.3.
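
   The suggested full stop timeout definition above can be read as a
   predicate over loss counters.  The following is only a sketch of one
   possible reading: the LossRun record and its field names are
   hypothetical, and the grouping of the conditions (a full window plus
   one retransmission, OR six consecutive packets including two
   retransmissions and two timer expirations) is an interpretation of
   the sentence, which the draft itself marks as needing
   experimentation.

```python
from dataclasses import dataclass


@dataclass
class LossRun:
    """Counters for one run of consecutive unacknowledged packets.

    All field names are hypothetical; they are not defined by the draft.
    """
    lost_packets: int          # consecutive packets lost, retransmissions included
    lost_retransmissions: int  # how many of those were retransmissions
    rto_expirations: int       # retransmission timer expirations during the run
    window_packets: int        # packets in one full window when the run began


def is_full_stop_timeout(run: LossRun) -> bool:
    # Case 1: one full window of data plus at least one retransmission lost.
    original_data_lost = run.lost_packets - run.lost_retransmissions
    full_window_lost = (
        original_data_lost >= run.window_packets
        and run.lost_retransmissions >= 1
    )
    # Case 2: at least 6 consecutive packets, at least 2 of them
    # retransmissions, along with two retransmission timer expirations.
    long_run_lost = (
        run.lost_packets >= 6
        and run.lost_retransmissions >= 2
        and run.rto_expirations >= 2
    )
    return full_window_lost or long_run_lost
```

   A small window therefore trips the first case quickly, while a large
   window is more likely to be caught by the second case before a whole
   window has been lost.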

277	3.  Overview

279	   This document describes a method for TCP or other packetization
280	   protocols to dynamically discover the MTU of a path without relying
281	   on explicit signals from the network. These procedures are applicable
282	   to TCP and other transport- or application-level packetization
283	   protocols in which the receiver always reports to the sender complete
284	   information about which packets were lost in the network.

286	   The general strategy of the new procedure is for the packetization
287	   layer to find the proper MTU by probing with progressively larger
288	   packets, without disrupting its normal protocol operation. If a probe
289	   packet is successfully delivered, then the path MTU is provisionally
290	   raised. If there are no additional losses during the subsequent
291	   verification phase, then the path MTU is confirmed (verified) to be
292	   at least as large as the provisional MTU. PLPMTUD can then probe
293	   again with an even larger MTU, according to the MTU search strategy
294	   described in Section 5.3.

296	   The verification phase is used to detect some situations where
297	   raising the MTU raises the packet loss rate.  For example if a link
298	   is striped across multiple physical channels with inconsistent MTUs,
299	   it is possible that a probe will be delivered even if it is too large
300	   for some of the physical channels. In such cases raising the path MTU
301	   to the probe size will cause severe periodic loss and abysmal
302	   performance.  The verification phase is designed to prevent the path
303	   MTU from being raised if doing so causes excessive packet losses.

305	   A conservative implementation of PLPMTUD would use a full round trip
306	   time for the verification phase.  In this case each time PLPMTUD
307	   raises the MTU it takes three full round trip times to do so. It
308	   takes one round trip for the probe phase, during which the probe
309	   propagates to the far node and an acknowledgment is returned.   The
310	   second round trip is the transition phase, during which data
311	   packets using the provisional MTU propagate to the far node and are
312	   acknowledged. During the third and final round trip time, it is
313	   verified that raising the MTU does not cause excessive loss.
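
   The three round trips of a conservative implementation can be
   pictured as a small state machine.  The sketch below is illustrative
   only: the class and method names are hypothetical, each call to
   on_round_trip() stands for one elapsed round trip, and any loss or
   timeout simply abandons the attempt.  How losses are classified and
   how congestion control reacts is not modelled here.

```python
from enum import Enum


class Phase(Enum):
    IDLE = "idle"
    PROBE = "probe"
    TRANSITION = "transition"
    VERIFICATION = "verification"


class MtuRaise:
    """Tracks one attempt to raise the path MTU (hypothetical helper)."""

    def __init__(self, current_mtu: int):
        self.mtu = current_mtu       # verified path MTU
        self.provisional_mtu = None  # set once a probe has been delivered
        self.probe_size = None
        self.phase = Phase.IDLE

    def send_probe(self, probe_size: int) -> None:
        self.probe_size = probe_size
        self.phase = Phase.PROBE

    def on_round_trip(self, loss_seen: bool) -> None:
        """Advance by one round trip; loss_seen covers losses and timeouts."""
        if self.phase is Phase.IDLE:
            return
        if loss_seen:
            # Simplification: abandon the attempt, keep the verified MTU.
            self.provisional_mtu = None
            self.phase = Phase.IDLE
        elif self.phase is Phase.PROBE:
            # Probe acknowledged: the MTU is provisionally raised.
            self.provisional_mtu = self.probe_size
            self.phase = Phase.TRANSITION
        elif self.phase is Phase.TRANSITION:
            # Packets at the provisional MTU have been acknowledged.
            self.phase = Phase.VERIFICATION
        else:
            # Verification round trip passed without loss: confirm.
            self.mtu = self.provisional_mtu
            self.provisional_mtu = None
            self.phase = Phase.IDLE
```

   Starting from a verified MTU of 1280, a probe of 1500 followed by
   three loss-free round trips leaves the verified MTU at 1500.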

315	   The isolated loss of a probe packet (with or without a Packet Too Big
316	   message) is treated as an indication of an MTU limit, and not as a
317	   congestion indicator. In this case alone, the packetization protocol
318	   is permitted to retransmit the probe gap without adjusting the
319	   congestion window.

321	   If there is a timeout or any additional lost packets during any of
322	   the three phases, the loss is treated as a congestion indication as
323	   well as an indication of some sort of failure of the PLPMTUD process.
324	   The congestion indication is treated like any other congestion
325	   indication: window or rate adjustments are mandatory per the relevant
326	   congestion control standards [8].   Probing can resume with some new
327	   probe size after a delay which is determined by the nature of the
328	   indicated failure.
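
   The distinction between an isolated probe loss and a probe lost
   together with other packets can be sketched as follows.  This is a
   minimal illustration, not a specification: the function name and its
   two inputs are hypothetical, and a real Packetization Layer would
   derive them from its own loss detection (e.g. SACK blocks and
   retransmission timers).

```python
from enum import Enum
from typing import Tuple


class LostProbe(Enum):
    FAILED = "failed"              # isolated loss: taken as an MTU limit
    INCONCLUSIVE = "inconclusive"  # loss may have been caused by congestion


def classify_lost_probe(other_lost_packets: int,
                        timed_out: bool) -> Tuple[LostProbe, bool]:
    """Return (classification, congestion_response_required)."""
    if other_lost_packets == 0 and not timed_out:
        # Only the probe was lost: retransmit the probe gap without
        # adjusting the congestion window.
        return LostProbe.FAILED, False
    # Any other loss or a timeout is a congestion indication, and the
    # normal window or rate adjustment is mandatory.
    return LostProbe.INCONCLUSIVE, True
```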

330	   The most likely (and least serious) PLPMTUD failure is the link
331	   experiencing legitimate congestion related losses at about the same
332	   time as the probe.   In this case, it is appropriate to retry the
333	   probe (with the same probe size) as soon as the packetization layer
334	   has fully adapted to the congestion and recovered from the losses.

336	   In other cases, additional losses or timeouts indicate problems with
337	   the link or packetization layer, and that probes may be disruptive.
338	   In these situations it is desirable to use progressively longer
339	   delays depending on the severity of the failure and if it persists.

341	   PLPMTUD can optionally process Packet Too Big messages to select the
342	   provisional MTU for faster convergence in exchange for a slight
343	   decrease in robustness.  Processing malicious or erroneous Packet Too
344	   Big messages can cause PLPMTUD to arrive at the incorrect MTU for a
345	   path, which is likely to reduce protocol performance. There are
346	   several different options for processing Packet Too Big messages: at
347	   one extreme they could be completely ignored; at the other extreme,
348	   all accepted (fully implementing classic PMTUD within PLPMTUD).
349	   We advocate a compromise, where Packet Too Big messages are only
350	   processed in conjunction with probes (described in Section 5.2.4.1),
351	   and Packetization Layer timeouts (described in Section 5.2.4.3).
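
   The three options for handling Packet Too Big messages can be
   expressed as a policy check.  The sketch below is an assumption-laden
   illustration: the policy names and the two boolean inputs are
   hypothetical simplifications, and the precise conditions under which
   a message is accepted are those of Sections 5.2.4.1 and 5.2.4.3, not
   this code.

```python
from enum import Enum


class PtbPolicy(Enum):
    IGNORE_ALL = "ignore"               # never act on Packet Too Big messages
    ACCEPT_ALL = "accept"               # classical PMTUD behaviour
    PROBES_AND_TIMEOUTS = "compromise"  # the compromise advocated in the text


def should_process_ptb(policy: PtbPolicy,
                       probe_outstanding: bool,
                       recent_pl_timeout: bool) -> bool:
    """Decide whether a received Packet Too Big message is acted upon."""
    if policy is PtbPolicy.IGNORE_ALL:
        return False
    if policy is PtbPolicy.ACCEPT_ALL:
        return True
    # Compromise: only in conjunction with a probe or a Packetization
    # Layer timeout.
    return probe_outstanding or recent_pl_timeout
```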

353	   Relatively few details of this procedure affect interoperability with
354	   other standards or Internet protocols.  These details are specified
355	   in RFC2119 standards language in Section 4.

357	   Most of the difficulty in implementing PLPMTUD arises because it
358	   needs to be implemented in several different places within a single
359	   node.  In general each packetization protocol needs to have its own
360	   implementation of PLPMTUD. Furthermore, the natural mechanism to
361	   share path MTU information between concurrent or subsequent
362	   connections over the same path is a path information cache in the IP
363	   layer.  The various packetization protocols need to have the means to
364	   access and update the shared cache in the IP layer. This memo
365	   describes PLPMTUD in terms of its primary subsystems without fully
366	   describing how they are assembled into a complete implementation.
367	   Section 5 describes: the separation into layers; the mechanics of
368	   probing from the point of view of the lower layers; Maximum Payload
369	   Size search heuristics; implementation in specific Packetization
370	   Layers; and operational integration issues.

372	   The vast majority of the implementation details are recommendations
373	   based on experiences with earlier versions of path MTU discovery.
374	   These are motivated by a desire to maximize robustness of PLPMTUD in
375	   the presence of less than ideal implementations as they exist in the
376	   field.

378	4.  Requirements

380	   All Internet nodes SHOULD implement PLPMTUD in order to discover and
381	   take advantage of the largest MTU supported along the Internet path.

383	   Links MUST NOT deliver packets that are larger than their MTU. Links
384	   that have parametric limitations (e.g. MTU bounds due to limited
385	   clock stability) MUST include explicit mechanisms to consistently
386	   reject packets that might otherwise be nondeterministically
387	   delivered.

389	   All hosts SHOULD use IPv4 fragmentation in a mode that mimics IPv6
390	   functionality.  All fragmentation SHOULD be done on the host, and all
391	   IPv4 packets, including fragments, SHOULD have the DF bit set such
392	   that they will not be fragmented (again) in the network.  See Section
393	   5.2.6.

395	   The requirements below only apply to those implementations that
396	   include PLPMTUD.

398	   If the Packetization Layer uses application data to implement PLPMTUD
399	   it MUST use a loss reporting mechanism (e.g. TCP SACK)
400	   which avoids spurious retransmission of other data when a probe
401	   packet is lost.

403	   A Packetization Layer using application data for probes MUST NOT send
404	   a probe unless it has sufficient following data available to send
405	   such that a lost probe will trigger Fast Retransmit or a similar data
406	   recovery algorithm.

408	   A Packetization Layer using application data for probes SHOULD NOT
409	   send a probe packet unless the flow is expected to have at least the
410	   3 round trips worth of data needed to successfully complete the
411	   probe, transition and verification phases.

413	   Normal congestion control algorithms MUST remain in effect under all
414	   conditions except when only an isolated probe packet is detected to
415	   be lost. In this case alone the normal congestion (window or data
416	   rate) reduction can be suppressed.  If any other lost data is
417	   detected, all normal congestion control MUST take place.

419	   When a probe is lost and normal congestion control is suppressed as
420	   permitted above, then the Packetization Layer MUST NOT probe again
421	   until at least an interval equal to the normal congestion control
422	   cycle.  For TCP and TCP friendly protocols this generally means one
423	   round trip of elapsed time for each packet permitted under the
424	   current congestion window.
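
   For a TCP-like protocol the minimum interval before the next probe
   follows directly from the rule above: one round trip for each packet
   permitted under the current congestion window.  The sketch below
   assumes a byte-counted window and full-sized segments; the function
   and parameter names are hypothetical.

```python
def min_reprobe_interval(cwnd_bytes: int, mss_bytes: int,
                         rtt_seconds: float) -> float:
    """Minimum wait, in seconds, after a lost probe whose congestion
    response was suppressed."""
    # Packets permitted under the current window (at least one).
    packets_in_window = max(1, cwnd_bytes // mss_bytes)
    # One round trip of elapsed time per permitted packet.
    return packets_in_window * rtt_seconds
```

   With a window of 14600 bytes, an MSS of 1460 bytes and a round trip
   time of 125 ms, the window permits 10 packets, so the next probe is
   held off for at least 1.25 seconds.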

426	   If PLPMTUD updates the MTU for a particular path, all Packetization
427	   Layer sessions that share the flow (path) must be notified.

429	   Whenever the MTU is raised, the congestion state variables must be
430	   rescaled so as not to raise the window size in bytes (or data rate
431	   in bytes per second).

433	   Whenever the MTU is reduced (e.g. when unconditionally processing
434	   ICMP Packet Too Big messages) the congestion state variables must be
435	   rescaled so as not to raise the window size in packets.
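
   The two rescaling rules depend on the unit in which an implementation
   keeps its congestion window.  The sketch below shows one possible
   arithmetic for each case; the function names are hypothetical and the
   floor of one packet is an assumption of this example, not a
   requirement of this document.

```python
def rescale_cwnd_bytes(cwnd_bytes: int, old_mss: int, new_mss: int) -> int:
    """Window kept in bytes."""
    if new_mss >= old_mss:
        # MTU raised: the byte count is left alone, so it cannot grow.
        return cwnd_bytes
    # MTU reduced: keep the same number of (now smaller) packets so that
    # the window in packets does not grow.
    packets = max(1, cwnd_bytes // old_mss)
    return min(cwnd_bytes, packets * new_mss)


def rescale_cwnd_packets(cwnd_packets: int, old_mss: int, new_mss: int) -> int:
    """Window kept in packets."""
    if new_mss <= old_mss:
        # MTU reduced: the packet count is left alone, so it cannot grow.
        return cwnd_packets
    # MTU raised: shrink the packet count so the window in bytes does
    # not grow.  The floor of one packet is an assumption made to avoid
    # a zero window; for very small windows it can exceed the old byte
    # count.
    return max(1, (cwnd_packets * old_mss) // new_mss)
```

   For example, a 14600 byte window of ten 1460 byte segments becomes
   12200 bytes when the MSS drops to 1220, which is still ten packets.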

437	   All implementations MUST include a mechanism to implement diagnostic
438	   tools that do not rely on the operating system's implementation of
439	   path MTU discovery.   This specifically requires the ability to send
440	   packets that are larger than the known MTU for the path, and to
441	   collect any resultant ICMP error message. See Section 5.5.4.

443	5.  Implementation Issues

445	   This section discusses a number of issues related to the
446	   implementation of Path MTU Discovery.  This is not a specification,
447	   but rather a set of notes provided as an aid for implementers.

449	   The issues include:
450	   o  The separation into layers.
451	   o  The Mechanics of Probing, as seen by IP and below.
452	   o  Search Strategy.
453	   o  How to implement PLPMTUD in specific Packetization Layers.
454	   o  How to improve Operational Integration and deployment.

456	5.1  Layering

458	5.1.1  Accounting for Header Sizes

460	   Packetization Layer Path MTU Discovery is most easily implemented by
461	   splitting its functions between layers.  The IP layer is in the best
462	   place to keep shared state, collect the ICMP messages, track IP
463	   header sizes and manage MTU information from the link layer
464	   interfaces.  However the procedures that PLPMTUD uses for probing,
465	   verification and scanning for the path MTU are very tightly coupled
466	   to the data recovery and congestion control state machines in the
467	   Packetization Layer.   The most difficult part of implementing
468	   PLPMTUD is properly splitting the implementation between the layers.

470	   Note that this layering is consistent with the advice in the current
471	   PMTUD specifications [2][3]. Today, many implementations of classical
472	   PMTU Discovery are already split along these same layers.

474	   Early implementation of PLPMTUD revealed that it is critically
475	   important to have a good clean mechanism for accounting for header
476	   sizes at all layers.  This is because each Packetization Layer does
477	   its calculations in its own natural data units, which are almost
478	   always a reflection of the service that the Packetization Layer
479	   provides to the application or other upper layers.  For example, TCP
480	   naturally performs all of its calculations in terms of sequence
481	   numbers and segment sizes.   The size of the probe gap is the size of
482	   the data segment that was carried by the probe packet. However, the
483	   MTU size being probed, ICMP MTU, etc. are measures of full packets,
484	   which not only include the TCP data (measured in sequence space) but
485	   also include fixed TCP and IP headers, and may include IPv6 extension
486	   headers or IPv4 options, TCP options and even IPsec AH or ESP headers
487	   as well.

489	   PLPMTUD requires frequent translation between these two domains: the
490	   Packetization Layer's natural data unit and full IP packet sizes.
491	   While there are a number of possible ways to accurately implement
492	   dual size measures, our experience has been that it is best if the
493	   IP layer and the Packetization Layer communicate across their
494	   boundary in terms of the IP Maximum Payload Size (MPS).  The MPS is
495	   the only size measure that is common to both the IP and Packetization
496	   Layers, because it exactly matches the boundary between the layers.
497	   The IP Layer is responsible for adding or deducting its own headers
498	   when translating between MTU and MPS.  Likewise the Packetization
499	   Layer is responsible for adding or deducting its own headers when
500	   performing calculations in its natural data units.
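
	   The translation between the two domains can be sketched as follows.
	   This is a minimal illustration, not part of the specification: the
	   function names and the fixed 20 byte header sizes are assumptions
	   chosen for the example, and real header sizes vary with options.

```python
# Hypothetical fixed header sizes (no IPv4 options, no TCP options).
IPV4_HEADER_LEN = 20
TCP_HEADER_LEN = 20


def mtu_to_mps(mtu, ip_header_len):
    """IP layer side: deduct the IP headers to obtain the MPS."""
    return mtu - ip_header_len


def mps_to_mtu(mps, ip_header_len):
    """IP layer side: add the IP headers back to obtain the MTU."""
    return mps + ip_header_len


def mps_to_data_size(mps, pl_header_len):
    """Packetization Layer side: deduct its own headers from the MPS."""
    return mps - pl_header_len


mps = mtu_to_mps(1500, IPV4_HEADER_LEN)
segment_size = mps_to_data_size(mps, TCP_HEADER_LEN)
```

	   Only the MPS crosses the layer boundary in this sketch; neither
	   layer needs to know the other's header sizes.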

502	   This document does not take a stance on the placement of IPsec, which
503	   logically sits between IP and the Packetization Layer. As far as
504	   PLPMTUD is concerned, IPsec can be treated either as part of IP or as
505	   part of the Packetization Layer, as long as the accounting is
506	   consistent within any given implementation.  If IPsec is treated as
507	   part of the IP layer, then each security association to a remote node
508	   may need to be treated as a separate flow for PLPMTUD, if they have
509	   different length security headers. If IPsec is treated as part of the
510	   packetization layer, the IPsec header size has to be included in the
511	   Packetization Layer's header size calculations.

513	5.1.2  Storing PMTU information

515	   This memo uses the concept of a "flow" to define the scope in which
516	   path MTU information is used.  Each flow locally stores its maximum
517	   payload size (MPS), which is used for packetizing data.
518	   Packetization Layers may communicate with the IP layer to store or
519	   access cached MPS values, providing a means by which similar flows
520	   may share information. The IP layer also stores PMTU and derived MPS
521	   information when it receives Packet Too Big messages.

523	   Ideally, a PMTU value should be associated with a specific path
524	   traversed by packets exchanged between the source and destination
525	   nodes.  However, in most cases a node will not have enough
526	   information to completely and accurately identify such a path.
527	   Rather, a node must associate a PMTU value with some local
528	   representation of a path.  It is left to the implementation to select
529	   the local representation of a path.

531	   An implementation could use the destination address as the local
532	   representation of a path.  The PMTU value associated with a
533	   destination would be the minimum PMTU learned across the set of all
534	   paths in use to that destination.  The set of paths in use to a
535	   particular destination is expected to be small, in many cases
536	   consisting of a single path.  This approach will result in the use of
537	   optimally sized packets on a per-destination basis.  This approach
538	   integrates nicely with the conceptual model of a host as described in
539	   [ND@@@@]: a PMTU value could be stored with the corresponding entry
540	   in the destination cache.   However, NAT and other forms of middle
541	   boxes may exhibit differing MTUs at a single IP address.
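
	   A per-destination cache of this kind might look like the sketch
	   below.  The class and method names are hypothetical, and the sketch
	   only records the minimum PMTU learned; aging entries or raising a
	   stored value is left out.

```python
class DestinationPmtuCache:
    """Hypothetical cache keyed by destination address."""

    def __init__(self):
        self._pmtu = {}

    def update(self, destination, learned_pmtu):
        """Keep the minimum PMTU learned for this destination."""
        current = self._pmtu.get(destination)
        if current is None or learned_pmtu < current:
            self._pmtu[destination] = learned_pmtu
        return self._pmtu[destination]

    def lookup(self, destination, default):
        """Return the stored PMTU, or a default if none was learned."""
        return self._pmtu.get(destination, default)


cache = DestinationPmtuCache()
cache.update("192.0.2.1", 1500)
cache.update("192.0.2.1", 1400)  # a second path with a smaller PMTU
```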

543	   If IPv6 flows are in use, an implementation could use the IPv6 flow
544	   id [7][14] as the local representation of a path.  Packets sent to a
545	   particular destination but belonging to different flows may use
546	   different paths, with the choice of path depending on the flow id.
547	   This approach will result in the use of optimally sized packets on a
548	   per-flow basis, providing finer granularity than PMTU values
549	   maintained on a per-destination basis.

551	   For source routed packets (i.e. packets containing an IPv6 Routing
552	   header, or IPv4 LSRR or SSRR options), the source route may further
553	   qualify the local representation of a path.    An implementation
554	   could use source route information in the local representation of a
555	   path.

557	   If IPsec is in use, the security association can also be used to
558	   represent a path.

560	5.2  Lower Layers

562	5.2.1  Generating Probes

564	   A new candidate MTU is tested by sending one "probe packet", which is
565	   larger than the current MTU.  In this section we present a couple of
566	   possible ways to alter packetization layers to generate probe
567	   packets.   The different techniques incur different overheads in
568	   three areas: difficulty in generating the probe packet (in terms of
569	   packetization layer implementation complexity and computational
570	   overhead), possible additional network capacity consumed by the
571	   probes, and the overhead of recovering from failed probes (both
572	   network and protocol overhead).

574	   For example, some protocols might be extended to allow padding with
575	   dummy data within their packets.  This would greatly simplify the
576	   implementation because the probing can be performed without
577	   participation from the application and if the probe fails, the
578	   missing data (the "probe gap") is assured to fit within the current
579	   MTU when it is retransmitted. However, the padding does consume
580	   network capacity without carrying any useful payload.

582	   This technique does not work for TCP, because there is not a separate
583	   length field or other mechanism to differentiate between padding and
584	   real payload data. With TCP the natural approach is to send
585	   additional payload data in an over-sized segment.   There are several
586	   variants which have different tradeoffs.

588	   In one method, after a TCP probe segment has been sent the subsequent
589	   segment(s) may be sent as though the probe segment was not
590	   over-sized.  Thus if the probe segment is lost, it will leave a gap
591	   in the sequence space that is exactly the correct size to be filled
592	   by one segment at the current MTU.   Since this method generates
593	   overlapping data, it will cause duplicate acknowledgments if the
594	   probe is successfully delivered.  The sender must be capable of
595	   ignoring these expected duplicate acknowledgments in a manner which
596	   will not cause unnecessary retransmission or congestion window
597	   reduction.

599	   In the second method, after a TCP probe segment has been sent,
600	   subsequent TCP segments are sent in a non-overlapping manner.  If the
601	   probe segment is lost, it will leave a gap which will require
602	   retransmission of multiple segments to fill. This method has lower
603	   overhead for successful probes, but it requires more complexity in
604	   the retransmit logic to correctly retransmit the missing data (the
605	   "probe gap") with multiple segments that fit into the old MTU, while
606	   properly suppressing the congestion adjustments for this one
607	   situation and no others.
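
	   The difference between the two TCP methods can be illustrated by
	   the retransmissions needed to fill the probe gap.  This is a sketch
	   under stated assumptions: the function name is hypothetical,
	   sequence numbers do not wrap, and the probe payload is larger than
	   the current segment size.

```python
def probe_gap_segments(probe_seq, probe_len, current_mss, overlapping):
    """Return (sequence number, length) pairs that fill the probe gap.

    overlapping=True models the first method: later segments were sent as
    though the probe had the current segment size, so the gap is exactly
    one segment.  overlapping=False models the second method: the whole
    probe payload has to be resent in pieces that fit the old MTU.
    """
    gap_len = current_mss if overlapping else probe_len
    segments = []
    offset = 0
    while offset < gap_len:
        length = min(current_mss, gap_len - offset)
        segments.append((probe_seq + offset, length))
        offset += length
    return segments


first_method = probe_gap_segments(1000, 1460, 1024, overlapping=True)
second_method = probe_gap_segments(1000, 1460, 1024, overlapping=False)
```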

609	   Several Packetization protocols may be best served by using an
610	   adjunct protocol for MTU probing: a separate protocol (or protocol
611	   feature) that does not carry any real application data.  This greatly
612	   simplifies implementation because nothing needs to be retransmitted
613	   when the probe is lost, but it does consume network capacity without
614	   delivering any useful payload.

616	   Two important examples of this come to mind: SCTP [9], which might use
617	   its existing HEARTBEAT facility padded with dummy data to fill out
618	   the probe packet; and IP fragmentation, which is sometimes used as a
619	   Packetization layer for carrying oversized datagrams as described in
620	   Section 5.2.6. In the case of IP fragmentation an entirely separate
621	   protocol is needed, which has to use the diagnostic interface
622	   described in Section 5.5.4.

624	   It should be clear that nearly all packetization layers can be
625	   adapted to support PLPMTUD, possibly in more than one way.

627	5.2.2  Selecting the initial MTU

629	   When the PLPMTUD process is started the initial MTU should normally
630	   be set such that the Packetization Layer can carry 1 kByte data
631	   segments.    This initial MTU should be 1 kByte plus space for IP and
632	   Packetization layer headers (see Section 5.1 on accounting for
633	   headers).   With this MTU, RFC2414 [6] allows TCP and other
634	   transport protocols to start with an initial window of 4 packets.
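
	   The arithmetic behind this choice can be sketched as follows.  The
	   initial window bound used here is the one given in RFC 2414,
	   min(4*MSS, max(2*MSS, 4380 bytes)); the function names and the
	   20 byte header sizes in the example are assumptions.

```python
RFC2414_BYTE_LIMIT = 4380  # constant from the RFC 2414 initial window bound


def initial_mtu(ip_header_len, pl_header_len, data_size=1024):
    """1 kByte of data plus space for IP and Packetization Layer headers."""
    return data_size + ip_header_len + pl_header_len


def initial_window_segments(mss):
    """Whole segments permitted by the RFC 2414 initial window bound."""
    window_bytes = min(4 * mss, max(2 * mss, RFC2414_BYTE_LIMIT))
    return window_bytes // mss


mtu = initial_mtu(ip_header_len=20, pl_header_len=20)
packets_at_1k = initial_window_segments(1024)
packets_at_1460 = initial_window_segments(1460)
```

	   With 1024 byte segments the bound allows four packets, while
	   1460 byte segments (a 1500 byte MTU) are limited to three.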

636	   We suspect, but have not confirmed, that TCP actually starts faster
637	   (and completes sooner for small packets) with 1kB packets rather than
638	   1500 byte packets because the 2nd data ACK occurs one round trip
639	   earlier.

641	   This initial MTU should also be configurable.    One of the
642	   configuration options should be to set it to default to the
643	   interface's MTU, to mimic classical PMTUD behavior (see Section 5.5.1).

645	5.2.3  Normal sequence of events to raise the MTU

647	   If the probe size is smaller than the actual path MTU and there are
648	   no other losses, the normal sequence of events to probe and raise the
649	   MTU will be:
650	   1.  The probe is sent, followed by more packets at the current MTU.
651	       By definition PLPMTUD enters the probe phase.   The probe
652	       propagates through the network and the far node acknowledges it
653	       (or possibly later data, if acknowledgements are cumulative and
654	       delayed acknowledgement is in effect).

656	   2.  The acknowledgement for the probe reaches the data sender.   By
657	       definition, this ends the probe phase.

659	   3.  The packetization layer provisionally raises the MTU to the probe
660	       size. PLPMTUD enters the transitional phase when it starts
661	       sending data using the provisional MTU.

663	       Note that implementations that use packet counts for congestion
664	       accounting (e.g. keep cwnd in units of packets) must re-scale
665	       their congestion accounting such that raising the MTU does not
666	       raise the data rate (bytes/second) or the total congestion window
667	       in bytes.

669	       If the implementation packetizes the data at the application
670	       programming interface, it may transmit already queued data at the
671	       current MTU before raising the MTU. In this case this data is not
672	       part of either the probing or transition phases, because all of
673	       the packets in flight fit within the current MTU.

675	   4.  Once the first packet of the transitional phase is acknowledged,
676	       PLPMTUD enters the verification phase.   In principle the
677	       verification phase can be of arbitrary duration; however, at this
678	       time we are recommending one full window of data (i.e. one full
679	       round trip time) for most Packetization Layers.

681	   5.  Once sufficient data has been delivered and acknowledged, the
682	       provisional MTU is considered verified and the path MTU is
683	       updated.   PLPMTUD can then probe for an even larger MTU, as
684	       described in the searching strategy in Section 5.3.

686	   Other events described in the next section are treated as exceptions
687	   and alter or cancel some of the steps above.
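
	   The loss-free sequence above can be sketched as a small state
	   machine.  All names here are hypothetical, and the sketch makes one
	   simplification: it enters the transitional phase as soon as the
	   probe is acknowledged, rather than when the first packet at the
	   provisional MTU is actually sent.  Exceptions are not modelled.

```python
from enum import Enum


class Phase(Enum):
    IDLE = "idle"
    PROBE = "probe"
    TRANSITION = "transition"
    VERIFICATION = "verification"


class RaiseMtu:
    """Hypothetical tracker for the loss-free sequence of events."""

    def __init__(self, current_mtu):
        self.current_mtu = current_mtu
        self.probe_size = None
        self.provisional_mtu = None
        self.phase = Phase.IDLE

    def send_probe(self, probe_size):
        # Step 1: the probe is sent and the probe phase begins.
        self.probe_size = probe_size
        self.phase = Phase.PROBE

    def probe_acked(self):
        # Steps 2 and 3: the probe phase ends and the MTU is
        # provisionally raised to the probe size.
        self.provisional_mtu = self.probe_size
        self.phase = Phase.TRANSITION

    def first_provisional_packet_acked(self):
        # Step 4: the verification phase begins.
        self.phase = Phase.VERIFICATION

    def verification_complete(self):
        # Step 5: the provisional MTU becomes the path MTU.
        self.current_mtu = self.provisional_mtu
        self.provisional_mtu = None
        self.phase = Phase.IDLE


flow = RaiseMtu(current_mtu=1064)
flow.send_probe(1500)
flow.probe_acked()
flow.first_provisional_packet_acked()
flow.verification_complete()
```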

689	5.2.4  Processing MTU Indications

691	   The descriptions below assume that the Packetization Layer protocol
692	   has a TCP fast retransmit style mechanism to synchronously
693	   detect the loss of a probe packet and trigger retransmission, without
694	   loss of the protocol's self clock.  If this fails, then some sort of
695	   retransmission timeout will serve to catch the loss.    They also
696	   assume that there is some mechanism to detect full-stop timeouts.

698	   If any of these events (or the receipt of an ICMP Packet Too Big
699	   message) occurs during the above process to raise the MTU, then
700	   it is processed as indicated in the following sections.

702	5.2.4.1  Processing Packet Too Big Messages

704	   Classical PMTU discovery specifies the generation of Packet Too Big
705	   Messages if an over-sized packet (e.g. a probe) encounters a link
706	   that has a smaller MTU. Since these messages can not be authenticated
707	   they introduce a number of well documented attacks against classical
708	   PMTUD [5].

710	   With PLPMTUD these messages are not required for correct operation,
711	   and in principle can be summarily ignored at the expense of slower
712	   convergence to the proper MTU.   However, we believe that a slightly
713	   better compromise is to process Packet Too Big messages in two
714	   specific contexts: in conjunction with a PLPMTUD probe or a full-stop
715	   timeout.

717	   Every Packet Too Big Message should be subjected to the following
718	   checks:
719	   o  If globally forbidden then discard the message.

721	   o  If forbidden by the application then discard the message.

723	   o  If this path has been tagged "bogus ICMP messages" then discard
724	      the message.

726	   o  If the reported MTU fails consistency checks then set the "bogus
727	      ICMP messages" flag for this path and discard the message.
728	      These consistency checks include:
729	      *  unrecognized or unparseable enclosed header,
730	      *  reported MTU is larger than the size indicated by the enclosed
731	         header,
732	      *  reported MTU is larger than the current MTU, provisional MTU
733	         or probe size as appropriate, or
734	      *  failure of an ICMP consistency check specific to the
735	         Packetization Layer (e.g. the SCTP Verification-Tag mechanism
736	         [9][16]).
737	      To ease migration, it is suggested that implementations may
738	      include global controls to suppress some or all of the consistency
739	      checks.
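
	   The checks above could be applied in order as in the following
	   sketch.  The function signature, the dictionary used for per-path
	   state and the boolean inputs standing in for header parsing and
	   protocol-specific checks are all assumptions made for illustration.

```python
def accept_packet_too_big(reported_mtu, enclosed_size, limit_mtu, path_state,
                          globally_forbidden=False, app_forbidden=False,
                          header_ok=True, pl_check_ok=True):
    """Return True if a Packet Too Big message passes every check.

    limit_mtu is the current MTU, provisional MTU or probe size, as
    appropriate for the phase the flow is in.
    """
    if globally_forbidden or app_forbidden:
        return False
    if path_state.get("bogus_icmp"):
        return False
    consistent = (header_ok
                  and reported_mtu <= enclosed_size
                  and reported_mtu <= limit_mtu
                  and pl_check_ok)
    if not consistent:
        # Tag the path so that later messages are discarded outright.
        path_state["bogus_icmp"] = True
        return False
    return True


state = {}
first = accept_packet_too_big(1400, 1500, 1500, state)
second = accept_packet_too_big(2000, 1500, 1500, state)  # inconsistent
third = accept_packet_too_big(1400, 1500, 1500, state)   # path now tagged
```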

741	   If the Packet Too Big Message is acceptable under all of these checks,
742	   do one of two things depending on a global configuration switch:
743	   emulate classical path MTU discovery by processing the message
744	   immediately (i.e. set the path MTU to the size indicated in the
745	   message), or save the "ICMP MTU", pending another PLPMTUD event.   In
746	   the latter case the saved ICMP MTU will only be acted upon under
747	   appropriate conditions: if there are lost probes, lost verification
748	   packets or a full stop timeout.   This greatly reduces the impact of
749	   fraudulent ICMP Packet Too Big messages.

751	   In either case if the Packetization Layer calls for specific actions
752	   in response to a Packet Too Big message, that action should be
753	   invoked only at the point when the path MTU is updated from the ICMP
754	   MTU.

756	5.2.4.2  Packetization Layer Detects Lost Packets

758	   Each packetization protocol has its own mechanism to detect lost
759	   packets and request the retransmission of missing data. The primary
760	   signals used by the packetization layer are these protocol specific
761	   loss indications. The packetization layer is responsible for
762	   retransmitting the lost data and notifying PLPMTUD that there was a
763	   loss.
764	   o  If the probe itself was lost, and there were no other losses
765	      during the probe phase (the RTT between when the probe was sent
766	      and the loss detected), then it is taken as an indication that the
767	      path MTU is smaller than the probe size. In this situation alone
768	      the Packetization Layer is permitted to retransmit the missing
769	      data (the "probe gap") without adjusting its congestion window or
770	      data transmission rate.

772	      If an accepted Packet Too Big Message was received after the probe
773	      was sent, and it passes the additional checks that the ICMP MTU is
774	      greater than the current MTU and less than the probe size, then
775	      set the probe size to the ICMP MTU, and restart the probe process
776	      from step 1 in Section 5.2.3.

778	      If there was not an accepted Packet Too Big Message, then the
779	      indicated event is a "probe failure", which can be retried with a
780	      smaller probe size after a suitable delay for a probe_fail_event.
781	      See Section 5.2.5 for more complete descriptions of failure
782	      events.

784	   o  If there are losses during the probe phase and the probe was not
785	      lost, then the probe was successful.  However, since additional
786	      losses have the potential to spoil the verification phase, it is
787	      important that PLPMTUD not progress into the transition phase
788	      (step 3 above) until after the Packetization Layer has fully
789	      recovered from the losses and completed the congestion window (or
790	      rate) adjustment.

792	   o  If there are losses during the probe phase and the probe was also
793	      lost, the outcome depends on the presence of an ICMP MTU set by an
794	      acceptable Packet Too Big Message.

796	      If there was an accepted Packet Too Big Message received since the
797	      probe was sent, and it passes the additional checks that the ICMP
798	      MTU is greater than the current MTU and less than the probe size,
799	      then set the probe size to the ICMP MTU, and once the
800	      Packetization Layer completes the recovery from the losses then
801	      restart the probe process from step 1 in Section 5.2.3.

803	      If there was not an accepted Packet Too Big Message, then the
804	      probe is inconclusive because the lost probe might have been
805	      caused by congestion.   The probe can be retried after a suitable
806	      delay for a probe_inconclusive_event.

808	   o  It is unlikely that losses during the transition phase are caused
809	      by PLPMTUD; however, they do potentially complicate the
810	      verification phase.  Note that we are referring to losses that are
811	      followed by acknowledgement of packets that were sent at the old
812	      MTU, while the transition to the provisional MTU is still
813	      propagating through the network.   The first acknowledgement of
814	      the provisional MTU (and the transition to the verification phase)
815	      is most likely going to occur during the recovery of the losses in
816	      the transition phase.   It is important that the Packetization
817	      Layer retransmission machinery distinguish between losses at the
818	      old MTU (transition phase) and the provisional MTU (the
819	      verification phase, discussed next).

821	   o  Losses during the verification phase are taken as an indication
822	      that the path may have a non-uniform MTU or some other problems
823	      such that raising the MTU substantially raises the loss rate.  If
824	      so, this is potentially a very serious problem, so the provisional
825	      MTU is considered to have errored and the path MTU is set back to
826	      the previously verified MTU (the previously current MTU).

828	      Packet loss during the verification phase might also be due to
829	      coincidental congestion on the path, unrelated to the probe, so it
830	      would seem to be desirable to re-probe the path. The risk is that
831	      this effectively raises the tolerated loss threshold because even
832	      though raising the MTU seemed to cause additional loss, there is a
833	      statistical chance that repeated attempts to verify a new MTU may
834	      yield a false pass.    The compromise is to re-probe once with
835	      the same probe size (after delay probe_inconclusive_event), and if
836	      this also fails, then the probe may not be retried until after a
837	      suitable delay for a verification_error_event, which exponentially
838	      increases on each successive failure.
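
	   The probe phase cases above can be summarized in a small decision
	   function.  This is a sketch only: the function name and the returned
	   labels other than probe_fail_event and probe_inconclusive_event are
	   hypothetical, and an ICMP MTU that fails the additional range check
	   is treated here as if no Packet Too Big message had been accepted,
	   which the text does not state explicitly.

```python
def classify_probe_phase(probe_lost, other_losses, icmp_mtu,
                         current_mtu, probe_size):
    """Map the outcome of the probe phase to the next action.

    icmp_mtu is the saved ICMP MTU from an accepted Packet Too Big
    message received since the probe was sent, or None.
    """
    icmp_usable = (icmp_mtu is not None
                   and current_mtu < icmp_mtu < probe_size)
    if not probe_lost:
        # The probe succeeded; with other losses, wait for full recovery
        # before entering the transition phase.
        if other_losses:
            return "probe_success_after_recovery"
        return "probe_success"
    if icmp_usable:
        if other_losses:
            return "reprobe_at_icmp_mtu_after_recovery"
        return "reprobe_at_icmp_mtu"
    if other_losses:
        return "probe_inconclusive_event"
    return "probe_fail_event"


outcome = classify_probe_phase(probe_lost=True, other_losses=False,
                               icmp_mtu=None, current_mtu=1064,
                               probe_size=1500)
```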

840	5.2.4.3  Packetization Layer Retransmission Timeout

842	   Note that we do not make distinctions between the various methods
843	   that different Packetization Layers might use for detecting and
844	   retransmitting lost packets.   It is preferable that the
845	   Packetization Layer uses a recovery mechanism similar to TCP SACK or
846	   fast retransmit (or other "synchronous" loss recovery mechanism) to
847	   detect losses and recover as quickly as possible.

849	   Under some conditions the Packetization Layer may have to rely on
850	   retransmission timeouts or other fairly disruptive techniques to
851	   recover from losses.   Since these greatly increase the cost of
852	   failed probes, it is recommended that PLPMTUD use even longer delays
853	   before re-probing. In these situations replace probe_fail_event with
854	   probe_timeout_event.

856	5.2.4.4  Packetization Layer Full Stop Timeout

858	   Under all conditions (not just during MTU probing) a full stop
859	   timeout should be taken as an indication of some significantly
860	   disruptive event in the network, such as a router failure or a
861	   routing change to a path with a smaller MTU.

863	   If the ICMP MTU is set, and it is less than the current MTU (or
864	   provisional MTU during the transitional phase), then the path MTU can
865	   be reduced to the ICMP MTU.   This is the only situation (a full stop
866	   timeout) outside of a probe in which we recommend that the path MTU
867	   be set from the ICMP MTU. (In Section 5.5.1 we relax this
868	   recommendation to facilitate migration to PLPMTUD in exchange for
869	   slightly less protection from corrupt Packet Too Big messages.)

871	   Note that whenever a problem with the path causes a full-stop
872	   timeout (also known as a "persistent timeout" in other documents),
873	   several different path restart/recovery algorithms may be invoked at
874	   different layers in the stack.  Some device drivers may be restarted
875	   [@@], router discovery [@@], ES-IS [@@] and so forth.  We recommend
876	   that in most situations the first action should be to set the path
877	   MTU down.   Note that this recommendation is really beyond the scope
878	   of this document, and may require substantial additional research.

880	   Therefore, if there is a full stop timeout and there was not an ICMP
881	   message indicating a reason (Packet Too Big, Net unreachable, etc., or
882	   the ICMP message was ignored for some reason), we suggest that the
883	   first recovery action should be to set the path MTU down to a safe
884	   minimum "restart MTU" value and to reset the PLPMTUD search state, so
885	   PLPMTUD will start over again searching for the proper MTU. The
886	   default restart_MTU should be the minimum MTU as specified by IPv4
887	   (576)[1] or IPv6 (1280) [7] as appropriate, unless overridden by some
888	   global control (See Section 5.5.5).
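
	   The full stop timeout handling described so far might be sketched
	   as follows.  The function name and return convention are
	   hypothetical.  A saved ICMP MTU that is not smaller than the current
	   MTU is treated as if no indication had been received, and the
	   restart value is never allowed to raise the MTU; both are
	   assumptions of this sketch.

```python
IPV4_MIN_MTU = 576    # minimum MTU cited for IPv4 [1]
IPV6_MIN_MTU = 1280   # minimum MTU cited for IPv6 [7]


def mtu_after_full_stop(current_mtu, icmp_mtu, is_ipv6, restart_mtu=None):
    """Return (new path MTU, whether to reset the PLPMTUD search state)."""
    if icmp_mtu is not None and icmp_mtu < current_mtu:
        # A saved ICMP MTU explains the timeout: reduce to it.
        return icmp_mtu, False
    if restart_mtu is None:
        restart_mtu = IPV6_MIN_MTU if is_ipv6 else IPV4_MIN_MTU
    # No usable indication: fall back to the restart MTU and search again.
    return min(current_mtu, restart_mtu), True


with_icmp = mtu_after_full_stop(1500, 1400, is_ipv6=False)
without_icmp_v4 = mtu_after_full_stop(1500, None, is_ipv6=False)
without_icmp_v6 = mtu_after_full_stop(1500, None, is_ipv6=True)
```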

890	   If and only if the full stop timeout happens during the probe or
891	   transition phases (e.g. after sending data using the provisional
892	   MTU but before any of it is acknowledged) is it considered likely
893	   that raising the MTU caused the full stop timeout.  If so, this
894	   situation is likely to be cyclic, because resetting the PLPMTUD
895	   search state is likely to eventually cause re-probing the same
896	   problematic MTU.

898	   It is tempting to define additional states to detect recurrent full
899	   stop timeouts. However, in today's hostile network environment, there
900	   is little tolerance for nodes that are so fragile that they can be
901	   disrupted by something as simple as oversized packets.  Therefore we
902	   do not feel that it is worth the overhead of specifying a state
903	   machine that is capable of automatically detecting these situations
904	   and disabling PLPMTUD.   However, it is important that there be a
905	   manual way to disable or limit probing on specific paths.  See
906	   Section 5.5.5.

908	5.2.5  Probing Intervals

910	   Section 5.2.4.2 describes a number of probe failure events.   In all
911	   cases the basic response is the same: to wait some time interval
912	   (dependent on the specific event and possibly the history) and then
913	   to probe again.  For events that are "inconclusive", it is generally
914	   appropriate to re-probe with the same probe size.   For events that
915	   are identified as "failed probes" it is generally appropriate to
916	   re-probe with a smaller probe size.   The search strategy described
917	   in Section 5.3 is used to select probe sizes.

919	   Many of the intervals below are specified in terms of elapsed round
920	   trips relative to the current congestion window.   This is because
921	   TCP and other Packetization Layer protocols tend to exhibit periodic
922	   losses which cause periodic variations of the congestion window and
923	   possibly the data rate.  It is preferable that the PLPMTUD probes are
924	   scheduled near the low point of these cycles to minimize ambiguities
925	   caused by congestion losses.

927	   In order from least to most serious:
928	   probe_inconclusive_event Other lost packets near the lost probe made
929	      the probe result ambiguous.   Since the loss of non-probe packets
930	      requires a window (or data rate) reduction, it is desirable to
931	      schedule the re-probe (at the same probe size) at one round trip
932	      time after the end of the loss recovery.   This will be almost the
933	      minimum congestion window size, with a small cushion to minimize
934	      the chances that correlated losses caused by some other bursty
935	      connection spoil another probe.

937	   probe_fail_event A probe fail event is the one situation under which
938	      the Packetization layer is permitted not to treat loss as a
939	      congestion signal.  Because there is some small risk that
940	      suppressing congestion control might have unanticipated
941	      consequences (even for one isolated loss), we require that probe
942	      fail events be less frequent than the normal period for losses
943	      under standard congestion control.  Specifically after a probe
944	      fail event and suppressed congestion control, PLPMTUD may not
945	      probe again until an interval which is comparable to the expected
946	      interval between congestion control events. See Section 4.

948	      The simplest estimate of the interval to the next congestion event
949	      is the same number of round trips as the current window in
950	      packets.

952	   probe_timeout_event Since this event was detected by a timeout, it is
953	      relatively disruptive to protocol operation.   Furthermore, since
954	      the event indirectly includes a window adjustment that may have
955	      been caused by the MTU probe, it is important that the probe not
956	      be repeated until congestion has more than recovered from the
957	      loss.   Therefore we recommend five times the probe_fail_event
958	      interval.   I.e. five times as many round trips as the current
959	      congestion window in packets.

961	   verification_error_event A verification fail event indicates that a
962	      probe was delivered and the verification phase failed twice
963	      separated by a congestion adjustment (so the second verification
964	      phase was at a low point in the congestion control cycle). This is
965	      an indication that one of the following three things might have
966	      happened: repeated losses unrelated to PLPMTUD; the path is
967	      striped across links with dissimilar MTUs; or the link layer has
968	      some parametric limitation such that raising the MTU greatly
969	      increases the random error rate.

971	      The optimal method of responding to this situation is an open
972	      research question. We believe that the correct response is some
973	      combination of exponentially lengthening backoffs (e.g. starting
974	      at 1 minute and quadrupling on each repeat) and implicitly
975	      treating the situation as a probe fail (and choosing a smaller
976	      probe size) after some threshold number of repeated
977	      verification_error_events.
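
	   The intervals above can be written down as a pair of helper
	   functions.  This is a sketch: the function names are hypothetical,
	   the round-trip counts follow the estimates given in the text, and
	   the 1 minute starting value and factor of four are only the example
	   figures mentioned for verification_error_event.

```python
def reprobe_delay_rtts(event, cwnd_packets):
    """Round trips to wait before probing again, by event type."""
    if event == "probe_inconclusive_event":
        # One round trip, counted from the end of loss recovery.
        return 1
    if event == "probe_fail_event":
        # As many round trips as the current window in packets.
        return cwnd_packets
    if event == "probe_timeout_event":
        # Five times the probe_fail_event interval.
        return 5 * cwnd_packets
    raise ValueError("interval for %r is not expressed in round trips" % event)


def verification_error_backoff_seconds(repeat_count, initial=60, factor=4):
    """Backoff in seconds after the given number of earlier repeats."""
    return initial * factor ** repeat_count


fail_delay = reprobe_delay_rtts("probe_fail_event", 20)
timeout_delay = reprobe_delay_rtts("probe_timeout_event", 20)
```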

979	5.2.6  Host fragmentation

981	   Packetization layers are encouraged to avoid sending messages that
982	   will require fragmentation (for the case against fragmentation, see
983	   [17][18]).  However, this is not always possible. Some packetization
984	   layers, such as a UDP application outside the kernel, may be unable
985	   to change the size of messages they send.  This may result in packet
986	   sizes that exceed the Path MTU.

988	   IPv4 permitted such applications to send packets without DF set.
989	   Oversized packets without DF would be fragmented in the network or
990	   sending host when they encountered a link with a small MTU.   In some
991	   cases, packets could be fragmented more than once if there were
992	   cascaded links with progressively smaller MTUs.

994	   This approach is no longer recommended.  We now recommend that IPv4
995	   implementations use a strategy that mimics IPv6 functionality.  When
996	   an application sends datagrams that are larger than the known path
997	   MTU they should be fragmented to the path MTU in the host IP layer
998	   even if they are smaller than the link MTU of the first hop networks
999	   directly attached to the host.  The DF bit should be set on the
1000	   fragments, so they will not be fragmented again in the network.
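
	   Host fragmentation to the path MTU can be sketched as a size
	   calculation.  The function name is hypothetical and a 20 byte IPv4
	   header without options is assumed.  Because IPv4 fragment offsets
	   are expressed in units of 8 bytes, every fragment except the last
	   carries a payload that is a multiple of 8.

```python
def fragment_payload_sizes(datagram_payload_len, path_mtu, ip_header_len=20):
    """Payload sizes of the fragments a host would emit with DF set."""
    max_payload = ((path_mtu - ip_header_len) // 8) * 8
    if max_payload <= 0:
        raise ValueError("path MTU too small to carry any payload")
    sizes = []
    remaining = datagram_payload_len
    while remaining > 0:
        size = min(max_payload, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


sizes_1500 = fragment_payload_sizes(3000, 1500)
sizes_1006 = fragment_payload_sizes(3000, 1006)
```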

1002	   This technique will minimize future surprises as the Internet
1003	   migrates to IPv6. Otherwise there is the potential for widely
1004	   deployed applications or services relying on IPv4 fragmentation, in a
1005	   way that can not be implemented in IPv6. At least one major operating
1006	   system already uses this strategy.

1008	   Note that in principle the IP fragmentation layer is an example of a
1009	   Packetization Layer; it could implement full PLPMTUD in the
1010	   fragmentation process.

1012	5.2.7  Multicast

1014	   In the case of a multicast destination address, copies of a packet
1015	   may traverse many different paths to reach many different nodes.  The
1016	   local representation of the "path" to a multicast destination must in
1017	   fact represent a potentially large set of paths.

1019	   Minimally, an implementation could maintain a single MPS value to be
1020	   used for all packets originated from the node.  This MPS value would
1021	   be the minimum MPS learned across the set of all paths in use by the
1022	   node.  This approach is likely to result in the use of smaller
1023	   packets than is necessary for many paths.

1025	   Alternatively, if the application using multicast gets complete
1026	   delivery reports (unlikely because this requirement has poor scaling
1027	   properties), PLPMTUD could be implemented in multicast protocols.

1029	5.3  Search Strategy

1031	   The search strategy described here is only a guide for implementors.
1032	   A standard algorithm is not specified because the strategy can
1033	   include many heuristics to optimize MPS selection for a given path.
1034	   Particularly, it may be appropriate for different protocols to follow
1035	   different strategies.  There is opportunity for future improvements
1036	   to this algorithm.

1038	   The search strategy uses three variables:
1039	      SEARCH_MAX is the largest MPS that a flow might be able to use.
1040	      It is determined by such considerations as interface MTU, widths
1041	      of protocol length fields, and possibly other protocol-dependent
1042	      values, such as the TCP MSS option. In many cases it would be
1043	      the same as the classical MTU discovery initial MSS, minus the IP
1044	      layer headers.
1045	      SEARCH_LOW is the largest validated MPS, and should be used as the
1046	      effective MPS by the packetization layer.   It is the same as the
1047	      current validated MTU minus the IP layer headers.  The initial
1048	      value for SEARCH_LOW should be a parameter, but a value of 1024
1049	      may be a reasonable default.
1050	      SEARCH_HIGH is the least invalidated MPS.   In most cases it will
1051	      be the most recent failed probe size minus the IP layer headers.
1052	      When PLPMTUD is initialized SEARCH_HIGH should be set to
1053	      SEARCH_MAX.

1055	   There are three major states: Search, Monitor and Suspend. In the
1056	   Search state, the algorithm incrementally searches for the largest
1057	   MPS that the path can support, narrowing the difference between
1058	   SEARCH_LOW and SEARCH_HIGH. Once this gap is sufficiently narrow, the
1059	   probing algorithm enters the Monitor state where it probes
1060	   infrequently to detect if the path MPS has become larger.

1062	   If the MPS probing is determined to be harmful, perhaps by persistent
1063	   probe failures, the flow may enter the Suspend state, completely
1064	   disabling MPS probing.

1066	5.3.1  Search

1068	   In the Search state, the strategy follows a multi-phase scan.  If
1069	   SEARCH_HIGH >= SEARCH_MAX, a coarse scan is used.  In this mode, each
1070	   probe's payload size should be MIN(2 * SEARCH_LOW, SEARCH_MAX).  If
1071	   SEARCH_HIGH < SEARCH_MAX, the fine scan mode should be used.

1073	   The fine scan algorithm may pursue a number of different methods for
1074	   choosing probe sizes.  It may be useful to choose probe sizes so that
1075	   the final IP packet will fit common link MTUs, for example 1500,
1076	   4352, 9000, 17914.  Optionally, probes smaller than these values by
1077	   common tunnel header sizes may be used.

1079	   When using some protocols, the cost for a failed probe may be
1080	   significantly higher than the cost of a successful probe due to
1081	   retransmission and consequent delay jitter as seen by the
1082	   application.  For this reason, one possible approach to the fine scan
1083	   could be to use probes of size SEARCH_LOW + d, for some increment d.
1084	   It should enter the Monitor state when SEARCH_LOW + d >= SEARCH_HIGH.
1085	   This will result in at most one additional probe failure.

1087	   Another approach may be to use a simple binary search where each
1088	   probe size is (SEARCH_LOW + SEARCH_HIGH) / 2, entering the Monitor
1089	   state when SEARCH_LOW + s >= SEARCH_HIGH for some threshold s.  This
1090	   will converge quickly, but may have a higher number of probe
1091	   failures.  It is more appropriate for a protocol whose probes consist
1092	   entirely of padding.
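
   The probe size choices above can be sketched as follows.  This is an
   illustration only, not part of the specification: the function
   names, the "fine" selector, and the defaults for d and s are
   assumptions, and a hypothetical oracle (path_mps) stands in for real
   probe outcomes.  A successful probe raises SEARCH_LOW and a failed
   probe lowers SEARCH_HIGH, following the definitions of those
   variables.

```python
def next_probe_size(low, high, search_max, fine="linear", d=64, s=8):
    """Return the next probe payload size, or None to leave Search.

    low, high and search_max play the roles of SEARCH_LOW, SEARCH_HIGH
    and SEARCH_MAX.  d and s (both assumed >= 1) are the increment and
    the threshold named in the text; their defaults are placeholders.
    """
    if low >= search_max:
        # Assumption: nothing larger is left to probe.
        return None
    if high >= search_max:
        # Coarse scan: double the validated size, capped at SEARCH_MAX.
        return min(2 * low, search_max)
    if fine == "linear":
        # Fine scan by fixed increment d.
        return None if low + d >= high else low + d
    # Fine scan by binary search with threshold s.
    return None if low + s >= high else (low + high) // 2


def run_search(path_mps, low=1024, search_max=8960, **kw):
    """Drive the Search state against a hypothetical path limit.

    Returns (SEARCH_LOW, SEARCH_HIGH, number of failed probes).
    """
    high = search_max
    failures = 0
    while True:
        probe = next_probe_size(low, high, search_max, **kw)
        if probe is None:
            return low, high, failures
        if probe <= path_mps:
            low = probe          # probe validated
        else:
            high = probe         # probe invalidated
            failures += 1
```

   With a path limit of 1460 bytes, run_search(1460, fine="linear")
   ends after two failed probes (one coarse, one fine), while
   run_search(1460, fine="binary") gets closer to the limit at the cost
   of four failed probes, which matches the trade-off described above.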

1094	5.3.2  Monitor

1096	   In the Monitor state, a probe of size SEARCH_HIGH should be sent at
1097	   most once every MONITOR_INTERVAL seconds.  If the probe succeeds,
1098	   then SEARCH_HIGH should be set to SEARCH_MAX, and the state should be
1099	   set to Search.

1101	   If there is evidence that no flow traffic is reaching its
1102	   destination, such as repeated timeouts with no acknowledgements in
1103	   TCP, it may be that the connection was re-routed to a path with a
1104	   smaller MTU, and the Packet Too Big messages are ignored or filtered.
1105	   In this case, SEARCH_LOW and SEARCH_HIGH should be set to initial
1106	   values, and the Search state should be entered.
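
   A minimal sketch of the Monitor state transitions described above.
   The Flow record, the function names, and the use of a floating
   point clock are assumptions made for illustration.  The sketch only
   changes the variables that the text names; anything else (such as
   the verification phase) is left to the rest of the procedure.

```python
from dataclasses import dataclass

SEARCH, MONITOR = "Search", "Monitor"


@dataclass
class Flow:
    search_max: int      # SEARCH_MAX
    initial_low: int     # configured initial value for SEARCH_LOW
    search_low: int      # SEARCH_LOW
    search_high: int     # SEARCH_HIGH
    state: str = MONITOR
    last_probe: float = float("-inf")   # time of the last Monitor probe


def probe_due(flow, now, monitor_interval):
    """True if a probe of size SEARCH_HIGH may be sent now."""
    return flow.state == MONITOR and now - flow.last_probe >= monitor_interval


def on_monitor_probe_result(flow, now, succeeded):
    """Record a Monitor probe; on success, reopen the search."""
    flow.last_probe = now
    if succeeded:
        flow.search_high = flow.search_max
        flow.state = SEARCH


def on_black_hole_suspected(flow):
    """Repeated timeouts: reset to initial values and search again."""
    flow.search_low = flow.initial_low
    flow.search_high = flow.search_max
    flow.state = SEARCH
```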

1108	5.3.3  Suspend

1110	   In the Suspend state, probing is entirely disabled, and the MPS
1111	   should be set to 512 bytes.  The Suspend state should only be used if
1112	   it is heuristically determined that probing is causing harmful
1113	   failures.

1115	5.4  Specific Packetization Layers

1117	   In this section we discuss specific implementation issues for
1118	   different Packetization Layer protocols.

1120	5.4.1  Probing method using TCP

1122	   TCP has no mechanism that could be used to distinguish between real
1123	   application data and some other form of padding that might be used to
1124	   fill out probe packets.  Therefore, TCP must generate probes by
1125	   sending oversized segments that are carrying real data from upper
1126	   layers.  As previously mentioned there are two approaches that TCP
1127	   might use to minimize the overheads associated with the probing
1128	   process.

1130	   A TCP implementation of PLPMTUD can elect to send subsequent segments
1131	   overlapping the probe as though the probe segment was not oversized.
1132	   This has the advantage that TCP only needs to retransmit one segment
1133	   at the current MTU to recover from failed probes. However the
1134	   duplicate data in the probe does consume network resources and will
1135	   cause duplicate acknowledgments.   It is important that these extra
1136	   duplicate acknowledgments not trigger Fast Retransmit.  This can be
1137	   guaranteed by limiting the largest probe segment size to twice the
1138	   current segment size (causing at most 1 duplicate acknowledgment) or
1139	   three times the current segment size (causing at most 2 duplicate
1140	   acknowledgments).
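
   The size limit for overlapping probes can be sketched as below.
   The function names are hypothetical.  The duplicate acknowledgment
   count is a simplified model that assumes every following segment is
   exactly one current segment long and that each segment lying wholly
   inside the probe produces one duplicate acknowledgment.  Fast
   Retransmit is commonly triggered by the third duplicate
   acknowledgment, which is why the text stops at two.

```python
def cap_overlapping_probe(candidate, mss, max_dupacks=2):
    """Limit a probe so overlap causes at most max_dupacks duplicate ACKs.

    candidate and mss are segment payload sizes in bytes.
    max_dupacks=1 gives the 2x limit, max_dupacks=2 the 3x limit.
    """
    return min(candidate, (max_dupacks + 1) * mss)


def dupacks_from_overlap(probe_len, mss):
    """Count following segments that lie entirely inside the probe.

    Simplified model: the first mss bytes of the probe are new data,
    and each further whole mss-sized piece is resent as a duplicate.
    """
    return max(probe_len // mss - 1, 0)
```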

1142	   The other approach is to send non-overlapping segments following the
1143	   probe. Although this is cleaner from a protocol architecture
1144	   standpoint it clashes with many of the optimizations used to improve
1145	   the efficiency of data motion within many operating systems.  In
1146	   particular many implementations divide the data into segments and
1147	   pre-compute checksums as the data is copied out of user space.  In
1148	   these implementations it can be very expensive to adjust segment
1149	   boundaries after the data is already queued.

1151	   If TCP is using SACK or any other variable length headers, the
1152	   headers on the probe and verification packets should be padded to the
1153	   maximum possible length. Otherwise, future options may cause delivery
1154	   problems if they cause IP packets that are larger than the MTU.

1156	   Note that the header size and overhead calculations described in
1157	   Section 5.1 apply here.  TCP's natural data accounting units are
1158	   sequence space and Maximum Segment Size.  However the PLPMTUD
1159	   process is described in terms of total packet size, which is larger
1160	   than the MSS by all fixed and optional headers.

1162	   At the point when TCP is ready to start the verification phase, it is
1163	   permitted to transmit already queued data at the old MTU rather than
1164	   re-packetize it.  This postpones the verification process by the time
1165	   required to send the queued data.

1167	   If the verification phase experiences any segment losses, TCP is
1168	   required to pull back to the prior MSS.   Since failing the
1169	   verification phase should be an infrequent error condition it is less
1170	   important that this be as efficient as probing.

1172	5.4.1.1  Window management

1174	   Some TCP implementations keep the congestion window in units of
1175	   segments. When segment size is increased during a connection, a
1176	   conservative implementation should scale cwnd so that, in units of
1177	   bytes, it will remain unchanged.

1179	   It is recommended that TCP should not probe a new MPS if that MPS
1180	   will likely result in a cwnd of less than 5 segments.

1182	   If the network becomes too congested, it is recommended that the MPS
1183	   be reduced to a smaller size as determined by a heuristic.  The
1184	   recommended heuristic is to reduce the MPS by half if ssthresh is
1185	   reduced to 5 segments or smaller, with a minimum MPS of 512 bytes.
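
   The three window management recommendations can be illustrated with
   the following sketch.  The function names are hypothetical, and
   rounding cwnd down when rescaling is an assumption chosen so that
   the window in bytes never grows.

```python
MIN_CWND_SEGMENTS = 5   # threshold named in the text
MIN_MPS = 512           # minimum MPS named in the text, in bytes


def rescale_cwnd(cwnd_segments, old_mps, new_mps):
    """Keep cwnd (roughly) constant in bytes when the segment size changes."""
    return cwnd_segments * old_mps // new_mps


def may_probe(cwnd_segments, old_mps, new_mps):
    """Skip probing if the rescaled cwnd would fall below 5 segments."""
    return rescale_cwnd(cwnd_segments, old_mps, new_mps) >= MIN_CWND_SEGMENTS


def reduce_mps_on_congestion(mps, ssthresh_segments):
    """Halve the MPS when ssthresh is 5 segments or fewer, floor 512 bytes."""
    if ssthresh_segments > MIN_CWND_SEGMENTS:
        return mps
    # Never raise an MPS that is already below the 512-byte floor.
    return max(mps // 2, min(mps, MIN_MPS))
```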

1187	5.4.2  Probing method using SCTP

1189	   In the SCTP protocol packetization is the responsibility of the
1190	   application or protocol above SCTP.  The application writes a set
1191	   message to SCTP and SCTP will "chunkify" it into appropriately sized
1192	   pieces. Some implementations MAY bundle multiple data chunks
1193	   together, but this is NOT required implementation behavior. By
1194	   implication not all SCTP implementations can easily generate probes
1195	   by sending additional application data. In particular any
1196	   implementation that does not implement data chunk bundling would not
1197	   be able to implement a probe.

1199	   For SCTP the recommended method for generating probes is to pad SCTP
1200	   HeartBeat messages to the desired probed size. A successful probe
1201	   will be acknowledged without delay by the peer SCTP implementation
1202	   returning the same Heartbeat as a HEARTBEAT-ACK. This assures that
1203	   both directions will support the probed MTU size. [@@@@@ note that
1204	   both sides of the path are tested]

1206	   The verification phase is entered after a successful probe. For
1207	   implementations that can bundle multiple DATA chunks the verification
1208	   phase completes when a window's worth of bundled DATA chunks are
1209	   exchanged at the new MTU value. An SCTP implementation SHOULD arrange
1210	   its fragmentation point to be a suitable multiple of the new MTU size
1211	   (e.g. if the MTU size is 1500 bytes in IPv4 then a fragmentation
1212	   point of 718 bytes might be selected during the verification phase.
1213	   This would allow the two bundled DATA chunks to be put together to
1214	   exactly equal the proposed new PMTU. After verification is complete
1215	   the fragmentation point can then be set to the actual PMTU assuming
1216	   that this new value is the smallest MTU of all of the SCTP paths).
1217	   An SCTP implementation is allowed to transmit already fragmented DATA
1218	   chunks that cannot be bundled together at the new MTU value that were
1219	   previously queued. For implementations that do not allow DATA chunk
1220	   bundling three subsequent HEARTBEAT messages should be sent over the
1221	   next XX@@ RTT's padded to the new proposed MTU value. If all of the
1222	   HBs are successful then the new PMTU should be adopted for the path.
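
   The 718 byte figure in the example above can be reproduced with the
   arithmetic below.  This is a sketch under stated assumptions: a 20
   byte IPv4 header without options, the 12 byte SCTP common header and
   the 16 byte DATA chunk header of RFC 2960 [9], and two bundled DATA
   chunks.  The function name and the align parameter are hypothetical.
   SCTP pads each chunk to a 4 byte boundary; the default (align=1)
   ignores that padding, as the example in the text appears to do.

```python
SCTP_COMMON_HEADER = 12   # bytes, per RFC 2960
DATA_CHUNK_HEADER = 16    # bytes, per RFC 2960


def fragmentation_point(mtu, ip_header=20, chunks=2, align=1):
    """User data per DATA chunk so that `chunks` bundled chunks fill `mtu`.

    align=4 rounds the result down so that each chunk needs no padding.
    """
    room = mtu - ip_header - SCTP_COMMON_HEADER - chunks * DATA_CHUNK_HEADER
    point = room // chunks
    return point - point % align
```

   Here fragmentation_point(1500) gives 718, since 2 * (718 + 16) + 12
   + 20 = 1500.  With align=4 the result is 716, which leaves the
   bundled packet slightly under the probed MTU.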

1224	   [@@@@NOTE: it might be simpler to always use multiple HB's to prove
1225	   in a PMTU during verification, I leave this up to you. One thing to
1226	   keep in mind is that SCTP normally fragments its messages to the
1227	   SMALLEST PMTU of all paths... since SCTP is multi-homed this makes it
1228	   so any data chunk can fit on ANY path. Most implementations DO bundle
1229	   data chunks for this very reason... its easy to do and it allows
1230	   larger PMTU's on different paths to be utilized. So using the HB may
1231	   be more efficient... its definitely simpler... I leave it to you to
1232	   choose. We may also want to mention the ICMP issue with SCTP since a
1233	   validated ICMP message with SCTP can always be trusted].

1235	   The SCTP Verification Tag is designed to increase SCTP's robustness
1236	   in the presence of several attacks, including forged ICMP messages.
1237	   It relies on a 32-bit Verification Tag which is initialized to a
1238	   random value during connection establishment and placed in the first
1239	   64 bits of all SCTP messages. All subsequent messages (including ICMP
1240	   messages, which copy at least the first 64 bits of the message) must
1241	   match the original Verification Tag, or they are rejected as being
1242	   likely attacks against the connection [9][16].
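
   A sketch of the Verification Tag check applied to an ICMP message is
   shown below.  It assumes IPv4 and that the caller has already
   extracted the quoted original datagram from the ICMP payload; the
   function name and arguments are hypothetical.  The first 64 bits of
   an SCTP packet hold the source port, the destination port, and the
   Verification Tag.  A fuller check would also compare the addresses
   and ports against the association.

```python
import struct

IPPROTO_SCTP = 132


def icmp_quote_matches_tag(quoted, expected_tag):
    """Check the Verification Tag in a datagram quoted by an ICMP message.

    quoted: bytes of the original IPv4 datagram as echoed in the ICMP
            message (IP header plus at least 8 bytes of payload).
    expected_tag: the tag this endpoint places in packets it sends on
            the association.
    """
    if len(quoted) < 20:
        return False
    header_len = (quoted[0] & 0x0F) * 4      # IPv4 IHL, in bytes
    if header_len < 20 or len(quoted) < header_len + 8:
        return False
    if quoted[9] != IPPROTO_SCTP:            # IPv4 protocol field
        return False
    _src_port, _dst_port, tag = struct.unpack_from("!HHI", quoted, header_len)
    return tag == expected_tag
```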

1244	   It is believed that the Verification Tag mechanism is strong enough
1245	   that SCTP could unconditionally process Packet Too Large messages
1246	   that would reduce the path MTU at arbitrary times.   As written, this
1247	   document does not encourage this method.  The PLPMTUD ICMP validity
1248	   checks are cascaded with the SCTP checks, such that the messages are
1249	   processed only if they meet all consistency checks.  In particular,
1250	   PLPMTUD only uses the ICMP MTU value following a probe, during MTU
1251	   verification, or following a hard stop timeout.

1253	   To change this, an implementation would have to suppress some of the
1254	   checks in Section 5.2.4.1 for SCTP.

1256	5.4.3  Probing Method for IP Fragmentation

1258	   As mentioned in Section 5.2.6, datagram protocols (such as UDP) can
1259	   rely on IP fragmentation as a packetization layer.   Since the IP
1260	   layer does not have any way to determine if the fragments were
1261	   delivered, it can not do the probing directly.    The probing has to
1262	   be done with an adjunct protocol that uses the diagnostic API
1263	   (Section 5.5.4) to send oversized probes, and some other API to
1264	   update the MPS stored in the IP layer.

1266	5.4.4  Issues for other transport protocols

1268	   Some transport protocols (such as ISO TP4 [ISOTP]) are not allowed to
1269	   repacketize when doing a retransmission.  That is, once an attempt is
1270	   made to transmit a segment of a certain size, the transport cannot
1271	   split the contents of the segment into smaller segments for
1272	   retransmission.  In such a case, the original segment can be
1273	   fragmented by the IP layer during retransmission.  Subsequent
1274	   segments, when transmitted for the first time, should be no larger
1275	   than allowed by the Path MTU.

1277	5.5  Operational Integration

1279	5.5.1  Interoperation with prior algorithms

1281	   Properly functioning Path MTU discovery is critical to the robust and
1282	   efficient operation of the Internet.   Any major change (as described
1283	   in this document) has the potential to be very disruptive if it
1284	   contains any errors or oversights.   Therefore, we offer a deployment
1285	   strategy in which classical PMTUD operation as described in RFC 1191
1286	   and RFC 1981 is unmodified and PLPMTUD is only invoked following a
1287	   full stop timeout, presumably due to an "ICMP black hole". To do
1288	   this:
1289	   o  Relax the ICMP checks in Section 5.2.4.1 specifically to allow an
1290	      ICMP Packet Too Large message to reduce the MTU at arbitrary
1291	      times.
1292	   o  When there is no cached MTU, use the Interface MTU as specified by
1293	      classical PMTU discovery, rather than the initial MTU as specified
1294	      in Section 5.2.2.
1295	   o  MTU searching as described in Section 5.3 is disabled entirely or
1296	      starts in the Monitor state.
1297	   o  A full stop timeout is processed as described in Section 5.2.4.4.
1298	      This becomes the only mechanism to invoke the rest of PLPMTUD.

1300	   When configured in this manner, PLPMTUD will increase the robustness
1301	   of classical PMTU discovery in the presence of ICMP black holes and
1302	   other ICMP problems, with minimal exposure to unanticipated problems
1303	   during deployment.  Since this configuration does not help robustness
1304	   in the presence of malicious or erroneous ICMP messages, it is not
1305	   recommended for the long term.

1307	5.5.2  Interoperation over subnets with dissimilar MTUs

1309	   With classical PMTUD, the ingress router to a subnet is responsible
1310	   for knowing what size packets can be delivered to every node attached
1311	   to that subnet.   For most subnet types, this requires that the
1312	   entire subnet has a single MTU which is common to every attached
1313	   node.   (For a few subnet types (e.g. ATM[12]) the nodes on a subnet
1314	   can negotiate the MTU on a pairwise basis, and the ingress router
1315	   is responsible for knowing the MTU to each of its peers.)

1317	   This requirement has proven to be a major impediment to deploying
1318	   larger MTUs in the operational Internet.  Often one single node which
1319	   does not support a larger MTU effectively vetoes raising the MTU on a
1320	   subnet, because the ingress router does not have a mechanism to
1321	   generate the proper Packet Too Big Message for the one attached node
1322	   with a smaller MTU.

1324	   With PLPMTUD, this requirement is completely relaxed.  As long as
1325	   oversized packets addressed to the nodes with the smaller MTU are
1326	   reliably discarded, PLPMTUD will find the proper MTU for these nodes.

1328	5.5.3  Interoperation with tunnels

1330	   PLPMTUD is specifically designed to solve many of the problems that
1331	   people are experiencing today due to poor interactions between
1332	   classical MTU discovery, IPsec, and various sorts of tunnels [5].
1333	   As long as the tunnel reliably discards packets that are too large,
1334	   PLPMTUD will discover an appropriate MTU for the path.

1336	   Unfortunately due to the pervasive problems with classical PMTU
1337	   discovery, many manufacturers of various types of VPN/tunneling
1338	   equipment have resorted to ignoring the DF bit.  This not only
1339	   violates the IP standard and many recommendations to the contrary
1340	   [17][18], it also violates the only requirement that PLPMTUD places
1341	   on the link layer: that oversized packets are reliably discarded.
1342	   It is imperative that people understand the impact of ignoring the DF
1343	   bit both to applications and to PLPMTUD.

1345	   We do understand the reality of the situation.  It is important that
1346	   vendors who are building devices that violate the DF specification
1347	   understand that PLPMTUD requires that probe packets be discarded, and
1348	   that sending ICMP Packet Too Big messages alone is insufficient to
1349	   prevent wholesale fragmentation if the probe packets are delivered.

1351	   Therefore, it is imperative that devices that do not honor DF include
1352	   packet size history caches and other heuristics to robustly detect
1353	   and discard probe packets, if delivering them would require
1354	   fragmentation.

1356	5.5.4  Diagnostic tools

1358	   All implementations MUST include facilities for MTU discovery
1359	   diagnostic tools that implement PLPMTUD or other MTU discovery
1360	   algorithms in user mode without help or interference by the PMTUD
1361	   algorithm present in the operating system.  This requires a
1362	   mechanism where a diagnostic application can send packets that are
1363	   larger than the operating system's notion of the current path MTU and
1364	   collect any resulting Packet Too Big Messages or other ICMP messages.
1365	   For IPv4 the diagnostic application must be able to set the DF bit.

1367	   At this time nearly all operating systems support two modes for
1368	   sending UDP datagrams: one which silently fragments packets that are
1369	   too large, and another that rejects packets that are too large.
1370	   Neither of these modes is suitable for efficiently diagnosing
1371	   problems with the MTU discovery, such as routers that return Packet
1372	   Too Big messages containing incorrect size information.

1374	5.5.5  Management interface

1376	   It is suggested that an implementation provide a way for a system
1377	   utility program to:
1378	   o  Globally disable all ICMP Packet Too Large message processing
1379	   o  Globally suppress some or all ICMP consistency checks described in
1380	      Section 5.2.4.1.  Setting this option forgoes some possible
1381	      security improvements, in exchange for making PLPMTUD behave more
1382	      like classical PMTU discovery.  (See Section 5.5.1)
1383	   o  Globally permit ICMP Packet Too Large messages to unconditionally
1384	      reduce the MTU, even if there were no lost packets.
1385	      Setting this option forgoes some possible security improvements,
1386	      in exchange for making PLPMTUD behave more like classical PMTU
1387	      discovery.  (See Section 5.5.1)
1388	   o  Globally adjust timer intervals for specific classes of probe
1389	      failures

1391	   In addition, it is important that there be a mechanism to permit per
1392	   path controls to override specific parts of the PLPMTUD algorithm.
1393	   All of these per path controls can be preset from similar global
1394	   controls.
1395	   o  Disable MTU searching on a given path, such that new MTU values
1396	      are never probed.
1397	   o  Set the initial MTU for a given path.   This could be used to
1398	      speed convergence in relatively static environments.   There
1399	      should be an option to cause PLPMTUD to choose the same initial
1400	      value as would be chosen by classical PMTU discovery (typically
1401	      the Interface MTU).   This is used in the mode described
1402	      in Section 5.5.1 where PLPMTUD is used only for black hole
1403	      detection in classical PMTU discovery.
1404	   o  Limit the maximum probed MTU for a given path.   This permits a
1405	      manual configuration to work around a link that spuriously
1406	      delivers packets that are larger than the useful path MTU.
1407	   o  Per path and per application controls to disable ICMP processing,
1408	      to further limit possible damage from malicious Packet Too Big
1409	      messages (in addition to the global controls).
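
   One way to picture the relationship between the global and per path
   controls is sketched below.  The record type, its field names, and
   the use of a destination address string as the path key are all
   assumptions made for illustration; the text does not prescribe any
   particular interface.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PathControls:
    search_enabled: bool = True            # allow probing of new MTU values
    initial_mtu: Optional[int] = None      # None: use the PLPMTUD default
    max_probed_mtu: Optional[int] = None   # None: no manual upper limit
    icmp_enabled: bool = True              # process Packet Too Big messages


def controls_for(path, global_defaults, overrides):
    """Per path controls: start from the globals, apply any overrides.

    overrides maps a path key to a dict of PathControls field values.
    """
    return replace(global_defaults, **overrides.get(path, {}))
```

   For example, with overrides of {"192.0.2.1": {"max_probed_mtu":
   1500, "icmp_enabled": False}}, that one path is limited to 1500
   bytes and ignores ICMP, while every other path keeps the global
   settings.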

1411	6.  References

1413	6.1  Normative References

1415	   [1]  Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981.

1417	   [2]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
1418	        November 1990.

1420	   [3]  McCann, J., Deering, S. and J. Mogul, "Path MTU Discovery for IP
1421	        version 6", RFC 1981, August 1996.

1423	   [4]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
1424	        Levels", BCP 14, RFC 2119, March 1997.

1426	   [5]  Kent, S. and R. Atkinson, "Security Architecture for the
1427	        Internet Protocol", RFC 2401, November 1998.

1429	   [6]  Allman, M., Floyd, S. and C. Partridge, "Increasing TCP's
1430	        Initial Window", RFC 2414, September 1998.

1432	   [7]  Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6)
1433	        Specification", RFC 2460, December 1998.

1435	   [8]  Floyd, S., "Congestion Control Principles", BCP 41, RFC 2914,
1436	        September 2000.

1438	   [9]  Stewart, R., Xie, Q., Morneault, K., Sharp, C., Schwarzbauer,
1439	        H., Taylor, T., Rytina, I., Kalla, M., Zhang, L. and V. Paxson,
1440	        "Stream Control Transmission Protocol", RFC 2960, October 2000.

1442	6.2  Informative References

1444	   [10]  Mogul, J., Kent, C., Partridge, C. and K. McCloghrie, "IP MTU
1445	         discovery options", RFC 1063, July 1988.

1447	   [11]  Knowles, S., "IESG Advice from Experience with Path MTU
1448	         Discovery", RFC 1435, March 1993.

1450	   [12]  Atkinson, R., "Default IP MTU for use over ATM AAL5", RFC 1626,
1451	         May 1994.

1453	   [13]  Sung, T., "TCP And UDP Over IPX Networks With Fixed Path MTU",
1454	         RFC 1791, April 1995.

1456	   [14]  Partridge, C., "Using the Flow Label Field in IPv6", RFC 1809,
1457	         June 1995.

1459	   [15]  Lahey, K., "TCP Problems with Path MTU Discovery", RFC 2923,
1460	         September 2000.

1462	   [16]  Stewart, R., "Stream Control Transmission Protocol (SCTP)
1463	         Implementors Guide", draft-ietf-tsvwg-sctpimpguide-10 (work in
1464	         progress), December 2003.

1466	   [17]  Kent, C. and J. Mogul, "Fragmentation considered harmful",
1467	         Proc. SIGCOMM '87 vol. 17, No. 5, October 1987.

1469	   [18]  Mathis, M., Heffner, J. and B. Chandler, "Fragmentation
1470	         Considered Very Harmful", draft-mathis-frag-harmful-00 (work in
1471	         progress), July 2004.

1473	Authors' Addresses

1475	   Matt Mathis
1476	   Pittsburgh Supercomputing Center
1477	   4400 Fifth Avenue
1478	   Pittsburgh, PA  15213
1479	   US

1481	   Phone: 412-268-3319
1482	   EMail: mathis@psc.edu

1484	   John W. Heffner
1485	   Pittsburgh Supercomputing Center
1486	   4400 Fifth Avenue
1487	   Pittsburgh, PA  15213
1488	   US

1490	   Phone: 412-268-2329
1491	   EMail: jheffner@psc.edu

1493	   Kevin Lahey
1494	   Freelance

1496	   EMail: kml@patheticgeek.net

1498	Appendix A.  Security Considerations

1500	   Under all conditions the PLPMTUD procedure described in this document
1501	   is at least as secure as the current standard path MTU discovery
1502	   procedures described in RFC 1191 [2] and RFC 1981 [3].

1504	   In the recommended configuration, PLPMTUD is significantly harder to
1505	   attack than current procedures, because ICMP messages are cached and
1506	   only processed in connection with lost packets.   This effectively
1507	   prevents blind attacks on the path MTU discovery system.

1509	   Furthermore, since this algorithm is designed for robust operation
1510	   without any ICMP (or other messages from the network), it can be
1511	   configured to ignore all ICMP messages (globally or on a per
1512	   application basis).  In this configuration it can not be attacked,
1513	   unless the attacker can identify and selectively cause probe packets
1514	   to be lost.

1516	Appendix B.  IANA considerations

1518	   None.

1520	Appendix C.  Acknowledgements

1522	   Most of the SCTP text was contributed by Randall Stewart.

1524	   Matt Mathis and John Heffner are supported in this work by a grant
1525	   from Cisco Systems, Inc.

1527	Intellectual Property Statement

1529	   The IETF takes no position regarding the validity or scope of any
1530	   Intellectual Property Rights or other rights that might be claimed to
1531	   pertain to the implementation or use of the technology described in
1532	   this document or the extent to which any license under such rights
1533	   might or might not be available; nor does it represent that it has
1534	   made any independent effort to identify any such rights. Information
1535	   on the IETF's procedures with respect to rights in IETF Documents can
1536	   be found in BCP 78 and BCP 79.

1538	   Copies of IPR disclosures made to the IETF Secretariat and any
1539	   assurances of licenses to be made available, or the result of an
1540	   attempt made to obtain a general license or permission for the use of
1541	   such proprietary rights by implementers or users of this
1542	   specification can be obtained from the IETF on-line IPR repository at
1543	   http://www.ietf.org/ipr.

1545	   The IETF invites any interested party to bring to its attention any
1546	   copyrights, patents or patent applications, or other proprietary
1547	   rights that may cover technology that may be required to implement
1548	   this standard. Please address the information to the IETF at
1549	   ietf-ipr@ietf.org.

1551	Disclaimer of Validity

1553	   This document and the information contained herein are provided on an
1554	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
1555	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
1556	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
1557	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
1558	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1559	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1561	Copyright Statement

1563	   Copyright (C) The Internet Society (2004). This document is subject
1564	   to the rights, licenses and restrictions contained in BCP 78, and
1565	   except as set forth therein, the authors retain all their rights.

1567	Acknowledgment

1569	   Funding for the RFC Editor function is currently provided by the
1570	   Internet Society.