Internet-Draft                                               Matt Mathis
                                                            John Heffner
                                                                     PSC
                                                             Kevin Lahey
                                                               Freelance
                                                            Oct 19, 2003

                           Path MTU Discovery
                     draft-ietf-pmtud-method-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   [@@ To be rewritten]

   This document describes Path MTU Discovery for the Internet.  It is
   largely derived from RFC 1191 and RFC 1981, which describe ICMP-based
   Path MTU Discovery for IP versions 4 and 6, respectively, and it adds
   a robust new algorithm.

   The general strategy of the new algorithm is to start with a small
   MTU and probe upward, testing successively larger MTUs by probing
   with single packets.  If the probe is successfully delivered, then
   the MTU is raised.  If the probe is lost, it is treated as an MTU
   limitation and not as a congestion signal.

Table of Contents

   TBD

1. Introduction

   When one Internet node has a large amount of data to send to another
   node, the data is transmitted in a series of IP packets.  It is
   usually preferable that these packets be of the largest size that can
   successfully traverse the path from the source node to the
   destination node.  This packet size is referred to as the Path MTU
   (PMTU), and it is equal to the minimum link MTU of all the links in a
   path.

   This document describes a path MTU discovery (PMTUD) method based on
   the earlier methods described in the standards track documents
   RFC1191 and RFC1981, with the addition of a new algorithm that
   searches for the proper MTU by probing with successively larger
   packets.  Large sections of this document are taken directly from
   RFC1191 and RFC1981.

   The methods described in this document apply to IPv4, IPv6, TCP, and
   other transport protocols.  This document does not define a
   protocol, but rather a method to use features of existing protocols
   to discover the path MTU.  It does not require cooperation from the
   far node or from the lower layers, except that the lower layers must
   be consistent about which packet sizes they accept.  Variations
   among implementations will not cause interoperability problems.

   For the sake of clarity we uniformly prefer TCP and IPv6
   terminology.  In the terminology section we also present the
   analogous IPv4 terms and concepts for the IPv6 terminology.  In a few
   situations we describe specific details that are different between
   IPv4 and IPv6.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   [[This document still bears markup notes, indicated with square
   brackets [] or @@@@ signs.]]

2. Terminology

   IP          - Either IPv4 [IPv4-SPEC] or IPv6 [IPv6-SPEC].

   node        - A device that implements IP.

   router      - A node that forwards IP packets not explicitly
                 addressed to itself.

   host        - Any node that is not a router.

   upper layer - A protocol layer immediately above IP.  Examples are
                 transport protocols such as TCP and UDP, control
                 protocols such as ICMP, routing protocols such as OSPF,
                 and Internet or lower-layer protocols being "tunneled"
                 over (i.e., encapsulated in) IP, such as IPX,
                 AppleTalk, or IP itself.

   link        - A communication facility or medium over which nodes can
                 communicate at the link layer, i.e., the layer
                 immediately below IP.  Examples are Ethernets (simple
                 or bridged); PPP links; X.25, Frame Relay, or ATM
                 networks; and Internet (or higher) layer "tunnels",
                 such as tunnels over IPv4 or IPv6 itself.

   interface   - A node's attachment to a link.

   address     - An IP-layer identifier for an interface or a set of
                 interfaces.

   packet      - An IP header plus payload.

   MTU         - Maximum Transmission Unit, the size in bytes of the
                 largest packet that can be transmitted on a link or
                 path.  Note that this could more properly be called
                 the IP MTU, to be consistent with how other standards
                 organizations use the term.  Beware that the definition
                 used in this and other IETF documents is not the same
                 as the definition used in other contexts.

   link MTU    - The Maximum Transmission Unit, i.e., maximum packet
                 size in octets, that can be conveyed in one piece over
                 a link.

   path        - The set of links traversed by a packet between a source
                 node and a destination node.

   path MTU    - The minimum link MTU of all the links in a path between
                 a source node and a destination node.

   PMTU        - Path MTU.

   Path MTU Discovery,
   PMTUD       - Process by which a node learns the PMTU of a path.

   Packet Too Big message
               - An ICMP message reporting that an IP packet is too
                 large to forward.  This is the IPv6 term that
                 corresponds to the IPv4 ICMP Destination Unreachable
                 message with code "fragmentation needed and DF set"
                 (informally, "Can't Fragment").

   flow id     - A combination of a source address and a non-zero
                 IPv6 flow label.

   packetization protocol
               - The layer of the network stack which segments data into
                 packets.

   flow        - A context in which MTU discovery is applied.  This is
                 naturally an instance of the packetization protocol,
                 e.g., half (one direction) of a TCP connection.

   MPS         - The maximum payload size available to a flow, usually
                 over a specific path.  As an example, this is the
                 maximum TCP segment size, including TCP headers but not
                 including IP headers.

   probe packet
               - A packet which is being used to test for a larger MTU.

   probe size  - The size of a packet being used to probe for a larger
                 MTU.

   successful probe
               - The probe packet was delivered through the network.

   inconclusive probe
               - The probe packet was not delivered, but other packets
                 were lost too close to the probe.  By implication the
                 probe might have been lost due to something other than
                 MTU, so the results are inconclusive.

   failed probe
               - The probe packet was not delivered and no other packets
                 were lost close to the probe.

   probe gap   - The L3 payload data that will need to be retransmitted
                 if the probe is not delivered.

[[Deprecated terms - these terms should only appear in very specific
parts of the document.

ICMP

Can't fragment messages

lower layers

@@@ remove as the document matures]]

3. Overview

   This document describes a technique to dynamically discover the MTU
   of a path.  These procedures are applicable to TCP and other
   transport- or application-level packetization protocols which
   implement similar features.

   The general strategy of the new procedure is to find the proper MTU
   by starting a connection using relatively small packets and then
   probing with progressively larger packets (containing application
   data).  If a probe packet is successfully delivered, then the path
   MTU is raised.  The isolated loss of a probe packet (with or without
   a Packet Too Big message) is treated as an indication of an MTU
   limit, and not as a congestion indicator.

   PMTUD can optionally process Packet Too Big messages for faster
   convergence in exchange for a slight decrease in robustness.
   Processing malicious or erroneous Packet Too Big messages can cause
   PMTU discovery to arrive at the incorrect MTU for a path, which is
   likely to reduce protocol performance.  This document describes three
   options for processing Packet Too Big messages: ignore them
   completely; accept them only in response to probes; or accept all
   Packet Too Big messages (the previous approach).

   In addition, PMTUD can be extended with heuristics that use alternate
   criteria to select the PMTU.  For example, on a path that is so
   congested that the fair share window is too small (smaller than
   5 kB), TCP may be better behaved with 512-byte packets than with
   1500-byte packets.  With the larger packets such a window holds at
   most three segments, too few to generate the three duplicate
   acknowledgments that trigger Fast Retransmit, so a loss is likely to
   be repaired only by a retransmission timeout.
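
   The arithmetic behind this example can be checked with a short
   sketch.  It assumes a 5000-byte window, treats 1500 and 512 bytes as
   segment sizes (ignoring header overhead), and assumes that Fast
   Retransmit needs three duplicate ACKs and therefore at least four
   segments in flight.  The helper names are hypothetical.

```python
# Hypothetical helpers: why a small window favors small packets.

DUPACK_THRESHOLD = 3  # duplicate ACKs that trigger Fast Retransmit


def segments_in_window(window_bytes: int, segment_bytes: int) -> int:
    """Whole segments that fit in the window (header overhead ignored)."""
    return window_bytes // segment_bytes


def can_fast_retransmit(window_bytes: int, segment_bytes: int) -> bool:
    """True if a lost segment can be followed by enough segments to
    produce DUPACK_THRESHOLD duplicate ACKs."""
    return segments_in_window(window_bytes, segment_bytes) >= DUPACK_THRESHOLD + 1


for size in (1500, 512):
    print(size, segments_in_window(5000, size), can_fast_retransmit(5000, size))
# prints "1500 3 False" then "512 9 True"
```

   With 512-byte packets the same window holds nine segments, enough to
   recover a single loss with Fast Retransmit.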

   Relatively few details of this procedure affect interoperability with
   other standards or Internet protocols.  These details are specified
   in RFC 2119 language in the Requirements section.  The vast majority
   of the implementation details are recommendations based on
   experience with earlier versions of path MTU discovery.  These are
   motivated by a desire to maximize robustness in the presence of
   less-than-ideal implementations as they exist in the field.

4. Requirements

   All Internet nodes SHOULD implement Path MTU Discovery in order to
   discover and take advantage of the largest MTU supported along the
   Internet path.

   Nodes not implementing Path MTU Discovery must use a default MTU as
   specified by the respective IP protocols.  For IPv6 the default MTU
   is 1280 bytes, the minimum link MTU as defined in [IPv6-SPEC].  For
   IPv4 it is 576 bytes, as specified in [IPv4-SPEC].

   Links MUST NOT deliver packets that are larger than their true MTU.
   Links that have parametric limitations (e.g., MTU bounds due to
   limited clock stability) MUST include explicit mechanisms to
   consistently reject packets that might otherwise be
   nondeterministically delivered.

   When a packet is too large to traverse a link, the attached router,
   if any, SHOULD send a Packet Too Big message (IPv6) or an ICMP
   "fragmentation needed and DF set" message (IPv4 with DF set), as
   appropriate.

   The requirements below only apply to those implementations that
   include Path MTU Discovery.

   A flow MUST NOT send a probe packet until at least one packet of its
   full current MPS is acknowledged.  This implicitly limits successful
   probes to at most one per two round trips.  To make the algorithm
   more robust in the presence of multi-path routing, a flow SHOULD NOT
   send a probe packet until at least a full window or an appropriately
   large quantity of packets have been successfully acknowledged.
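
   The two gating rules above can be sketched as per-flow counters.
   This is a minimal illustration, not a required design: the class and
   field names are hypothetical, sizes are in MPS units (payload
   including the TCP header), and the counters are assumed to be reset
   whenever the current MPS changes.

```python
from dataclasses import dataclass


@dataclass
class ProbeGate:
    """Hypothetical per-flow bookkeeping for the probe-gating rules."""
    current_mps: int
    full_size_acked: int = 0  # ACKed packets of the full current MPS
    bytes_acked: int = 0      # all bytes ACKed since the last MPS change

    def on_ack(self, packet_size: int) -> None:
        self.bytes_acked += packet_size
        if packet_size >= self.current_mps:
            self.full_size_acked += 1

    def may_probe(self) -> bool:
        # MUST: at least one full-size packet has been acknowledged.
        return self.full_size_acked >= 1

    def should_probe(self, window_bytes: int) -> bool:
        # SHOULD: also wait until a full window has been acknowledged,
        # which adds robustness when packets take multiple paths.
        return self.may_probe() and self.bytes_acked >= window_bytes
```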

   Before a probe can be sent, the flow MUST be able to produce a packet
   containing a payload of at least the candidate MPS.  That is, it must
   have enough data or be able to pad the packet to the full desired
   size.  If the flow is able to send a probe with the exception of
   having enough data to

   Failed and inconclusive probes MUST NOT be sent more frequently than
   the normal congestion interval for the current average window size.

   A packetization protocol which does loss recovery MUST use a loss
   detection mechanism which does not result in spurious retransmission
   of any additional data when a probe packet is lost.

   During the probe, the normal congestion control machinery should
   remain in effect except when only the probe gap is detected as lost.
   In this case the normal multiplicative congestion window reduction is
   suppressed.  If any other data is detected as lost, all normal
   congestion control MUST take place.
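
   A minimal sketch of this congestion response, assuming cwnd and
   ssthresh are kept in packets, sequence ranges are half-open
   [start, end) pairs, and the "normal" response is simplified to a
   Reno-style halving with a floor of two packets.  The function name
   and signature are hypothetical.

```python
Range = tuple[int, int]  # half-open sequence range [start, end)


def on_loss_during_probe(cwnd: float, ssthresh: float,
                         lost: list[Range], probe_gap: Range) -> tuple[float, float]:
    """Hypothetical: return the new (cwnd, ssthresh) after loss detection
    while a probe is outstanding."""
    if not lost:
        return cwnd, ssthresh
    gap_start, gap_end = probe_gap
    only_gap_lost = all(gap_start <= s and e <= gap_end for s, e in lost)
    if only_gap_lost:
        # Loss attributed to the MTU, not to congestion: suppress the
        # multiplicative window reduction.
        return cwnd, ssthresh
    # Other data was lost as well: normal congestion control applies
    # (simplified here to a Reno-style halving).
    ssthresh = max(cwnd / 2, 2)
    return ssthresh, ssthresh
```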

   If the probe is successful, the current MPS is updated to the
   candidate MPS.  If window and other congestion state variables are
   kept in units of packets, they MUST be rescaled to preserve the
   current window size in bytes.
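
   The rescaling rule keeps the product of each packet-denominated
   variable and the MPS constant.  A minimal sketch, assuming the
   congestion state is held in a dictionary of packet counts
   (hypothetical names) and that fractional results are acceptable; an
   implementation that keeps integer counts would also need a rounding
   policy.

```python
def rescale_packet_counts(state: dict[str, float], old_mps: int,
                          new_mps: int) -> dict[str, float]:
    """Hypothetical helper: rescale packet counts so that value * MPS
    (the size in bytes) is unchanged when the MPS changes."""
    factor = old_mps / new_mps
    return {name: value * factor for name, value in state.items()}


before = {"cwnd": 20, "ssthresh": 40}
after = rescale_packet_counts(before, old_mps=1000, new_mps=2000)
print(after)  # {'cwnd': 10.0, 'ssthresh': 20.0}
```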

5. Implementation Issues

   This section discusses a number of issues related to the
   implementation of Path MTU Discovery.  This is not a specification,
   but rather a set of notes provided as an aid for implementers.

   The issues include:

   - What layer or layers implement Path MTU Discovery?

   - How are headers accounted for?

   - How is the PMTU information cached?

   - How are ICMP messages processed?

   - How is stale PMTU information removed?

   - How is PMTUD implemented with TCP?

   - What should other transport and higher layers do?

   - What should tunnels above IP do?

5.1. Layering

   In the IP architecture, the choice of what size packet to send is
   made by a protocol at a layer above IP.  This memo refers to such a
   protocol as a "packetization protocol".  Packetization protocols are
   usually transport protocols (for example, TCP) but can also be
   higher-layer protocols (for example, protocols built on top of UDP).

   This memo uses the concept of a "flow" to define the scope in which
   path MTU information is used.  Each flow locally stores its maximum
   payload size (MPS), which is used for packetizing data.  Flows may
   communicate with the IP layer to store or access cached PMTU values,
   providing a means by which similar flows may share information.  To
   do so, a flow must convert between its MPS and the PMTU by adding or
   subtracting the size of the IP header plus any additional
   intermediate headers.  The IP layer also stores PMTU information from
   the ICMP layer when it receives Packet Too Big messages.
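
   The conversion a flow performs before storing or reading a cached
   PMTU is a simple subtraction or addition.  The sketch below assumes
   option-free IPv4 (20-byte) and IPv6 (40-byte) headers; the function
   names and the optional intermediate-header argument (for example,
   IPsec overhead counted outside the IP header) are illustrative.

```python
IPV4_HEADER = 20  # IPv4 header without options
IPV6_HEADER = 40  # fixed IPv6 header without extension headers


def mps_from_pmtu(pmtu: int, ip_header: int, intermediate: int = 0) -> int:
    """Hypothetical: MPS = PMTU minus the IP header and any
    intermediate headers."""
    return pmtu - ip_header - intermediate


def pmtu_from_mps(mps: int, ip_header: int, intermediate: int = 0) -> int:
    """Hypothetical inverse, used when a flow stores its result."""
    return mps + ip_header + intermediate


print(mps_from_pmtu(1500, IPV4_HEADER))  # 1480 (1460-byte MSS + 20-byte TCP header)
print(mps_from_pmtu(1280, IPV6_HEADER))  # 1240
```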

   It is possible that a packetization layer, perhaps a UDP application
   outside the kernel, is unable to change the size of messages it
   sends.  This may result in a packet size that exceeds the Path MTU.

   In such situations, the packets must be fragmented by the IP layer.
   To accommodate this, IPv6 defines a mechanism that allows large
   payloads to be divided into fragments, with each fragment sent in a
   separate packet (see [IPv6-SPEC] section "Fragment Header").  It is
   also recommended that IPv4 fragment the packets at the end system.
   @@@ Should it also set the DF flag to mimic IPv6? @@@

   However, packetization layers are encouraged to avoid sending
   messages that will require fragmentation (for the case against
   fragmentation, see [FRAG]).

5.2. Accounting for headers

   Packetization is done at or near the top of the protocol stack, but
   the final packet size, which is what determines whether a link can
   transmit the packet, is known only at the bottom of the stack.
   Therefore the lower layers must either deterministically accept all
   payloads of a uniform size, or communicate their header sizes to the
   upper layer prior to packetization.

   This document does not take a position on the layering boundaries of
   IPsec, which logically sits between IP and TCP or another
   packetization layer.  IPsec can be treated either as part of IP or as
   part of the packetization layer, as long as the accounting is
   consistent within any given implementation.  If IPsec is treated as
   part of the IP layer, then each security association that adds a
   security header of a different length may need to be treated as a
   separate path.  If IPsec is treated as part of the packetization
   layer, then the MPS to PMTU calculation must include the IPsec header
   size for that flow.

5.3. Storing PMTU information

   Ideally, a PMTU value should be associated with a specific path
   traversed by packets exchanged between the source and destination
   nodes.  However, in most cases a node will not have enough
   information to completely and accurately identify such a path.
   Rather, a node must associate a PMTU value with some local
   representation of a path.  It is left to the implementation to select
   the local representation of a path.

   In the case of a multicast destination address, copies of a packet
   may traverse many different paths to reach many different nodes.  The
   local representation of the "path" to a multicast destination must in
   fact represent a potentially large set of paths.

   Minimally, an implementation could maintain a single PMTU value to be
   used for all packets originated from the node.  This PMTU value would
   be the minimum PMTU learned across the set of all paths in use by the
   node.  This approach is likely to result in the use of smaller
   packets than is necessary for many paths.

   An implementation could use the destination address as the local
   representation of a path.  The PMTU value associated with a
   destination would be the minimum PMTU learned across the set of all
   paths in use to that destination.  The set of paths in use to a
   particular destination is expected to be small, in many cases
   consisting of a single path.  This approach will result in the use of
   optimally sized packets on a per-destination basis.  This approach
   integrates nicely with the conceptual model of a host as described in
   [ND]: a PMTU value could be stored with the corresponding entry in
   the destination cache.

   If IPv6 flows [IPv6-SPEC] are in use, an implementation could use the
   flow id as the local representation of a path.  Packets sent to a
   particular destination but belonging to different flows may use
   different paths, with the choice of path depending on the flow id.
   This approach will result in the use of optimally sized packets on a
   per-flow basis, providing finer granularity than PMTU values
   maintained on a per-destination basis.

   For source-routed packets (i.e., packets containing an IPv6 Routing
   header [IPv6-SPEC]), the source route may further qualify the local
   representation of a path.  In particular, a packet containing a type
   0 Routing header in which all bits in the Strict/Loose Bit Map are
   equal to 1 contains a complete path specification.  An implementation
   could use source route information in the local representation of a
   path.
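
   The choices above can be summarized as a function that maps a packet
   to a cache key.  This is a sketch only: the policy names, the key
   tuples, and the fallback from flow id to destination when the flow
   label is zero are illustrative assumptions, not part of this
   document.

```python
from enum import Enum


class PathKeyPolicy(Enum):
    """Hypothetical policies for the local representation of a path."""
    GLOBAL = "global"            # one PMTU for every packet the node sends
    DESTINATION = "destination"  # one PMTU per destination address
    FLOW_ID = "flow_id"          # one PMTU per (source, non-zero flow label)


def path_key(policy: PathKeyPolicy, src: str, dst: str,
             flow_label: int = 0, source_route: tuple = ()) -> tuple:
    """Return a hashable key identifying the path under the policy."""
    if policy is PathKeyPolicy.GLOBAL:
        return ("*",)
    if policy is PathKeyPolicy.FLOW_ID and flow_label != 0:
        key = ("flow", src, flow_label)
    else:
        # Destination-based; also the fallback when there is no flow label.
        key = ("dst", dst)
    if source_route:
        # A source route may further qualify the path.
        key += ("via",) + tuple(source_route)
    return key
```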

   Note: Some paths may be further distinguished by different security
   classifications.  The details of such classifications are beyond the
   scope of this memo.  @@@ this should be in scope

5.4. Probing method using TCP

   A new candidate MPS is tested by sending one "probe segment", which
   is larger than the current MPS.  We present here two possible probing
   methods for TCP.

   In the first method, after a probe segment (of size candidate MPS)
   has been sent, the subsequent segment(s) may be sent as though the
   probe segment were not oversized.  Thus if the probe segment is lost,
   it will leave a gap in the sequence space that is exactly one current
   MPS minus the TCP header size.  We refer to this potential hole as
   the probe gap.  Note that the length of the probe segment is
   determined by the candidate MPS under consideration, but the length
   of the probe gap is determined by the current MPS.  If the probe
   segment is lost, this gap can be filled by a single retransmitted
   segment.

   This method will create duplicate acknowledgments if the probe is
   successful.  The sender must be capable of dealing with these
   expected duplicate acknowledgments in a manner which will not cause
   unnecessary retransmission or congestion window reduction.

   In the second method, after a probe segment has been sent, subsequent
   segments are sent in a non-overlapping manner.  If the probe segment
   is lost, it will leave a gap which will require retransmission of
   multiple segments to fill.
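
   The sequence-space difference between the two methods can be
   sketched as follows.  The sketch assumes a 20-byte TCP header without
   options and counts MPS values including that header, as in the MPS
   definition in Section 2; the function names are hypothetical.

```python
import math

TCP_HEADER = 20  # assumption: TCP header without options


def probe_layout(probe_seq: int, current_mps: int, candidate_mps: int,
                 overlapping: bool) -> tuple[int, int]:
    """Hypothetical: return (probe_end, gap_end) for a probe whose payload
    starts at probe_seq.  The next new segment is sent starting at
    gap_end."""
    probe_end = probe_seq + candidate_mps - TCP_HEADER
    if overlapping:
        # First method: continue as if the probe carried only the current
        # MPS, so the gap is one current MPS minus the TCP header.
        return probe_end, probe_seq + current_mps - TCP_HEADER
    # Second method: continue after the probe's full payload, so the
    # whole probe payload is the gap.
    return probe_end, probe_end


def segments_to_fill(gap_bytes: int, current_mps: int) -> int:
    """Current-MPS retransmissions needed to fill a gap of gap_bytes."""
    return math.ceil(gap_bytes / (current_mps - TCP_HEADER))
```

   With a current MPS of 1480 and a candidate MPS of 2000, the first
   method leaves a 1460-byte gap that one retransmission fills, while
   the second leaves a 1980-byte gap that needs two.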

   The probe is completed when the acknowledgment sequence advances past
   the probe gap.  If, when the probe is complete, the probe gap was not
   retransmitted, the probe was successful.  If the probe gap was
   retransmitted and there were no other retransmissions, the candidate
   MPS failed.  If there were any other retransmissions, the probe was
   inconclusive.
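
   The classification rule reduces to two inputs observed once the
   probe completes.  A sketch with hypothetical names; it follows the
   text literally, so a probe whose gap was never retransmitted counts
   as successful even if other data was retransmitted.

```python
from enum import Enum


class ProbeResult(Enum):
    SUCCESSFUL = "successful"
    FAILED = "failed"
    INCONCLUSIVE = "inconclusive"


def classify_probe(gap_retransmitted: bool, other_retransmissions: int) -> ProbeResult:
    """Hypothetical: classify a probe once the cumulative ACK has
    advanced past the probe gap."""
    if not gap_retransmitted:
        return ProbeResult.SUCCESSFUL
    if other_retransmissions == 0:
        return ProbeResult.FAILED
    return ProbeResult.INCONCLUSIVE
```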

   If the probe was successful, the current MPS is updated to the
   candidate MPS.  @@@ add robustness language re: more losses

   If the probe failed or was inconclusive, the probe countdown is set
   to COUNTDOWN_SCALE times the square of the current window size in
   packets.
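
   COUNTDOWN_SCALE is not given a value here, and the unit of the
   countdown is not specified.  The sketch below assumes the countdown
   counts packets sent and uses 2/3 purely for illustration.  That value
   comes from the steady-state Reno model W = sqrt(3/(2p)), under which a
   flow with a window of W packets sees roughly one loss every
   (2/3)W^2 packets, which is one way to read the requirement that
   failed and inconclusive probes be no more frequent than the normal
   congestion interval.

```python
import math

# Hypothetical value: COUNTDOWN_SCALE is left unspecified in the text.
COUNTDOWN_SCALE = 2 / 3


def probe_countdown(window_packets: float, scale: float = COUNTDOWN_SCALE) -> int:
    """Hypothetical: packets to send after a failed or inconclusive probe
    before the next probe is allowed (assumed unit: packets)."""
    return math.ceil(scale * window_packets ** 2)


print(probe_countdown(10))  # 67
```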

   If a Packet Too Big message is received, it can be used to compute
   an MPS limit by deducting the IP header size from the MTU reported in
   the ICMP message.  If the MPS limit is between the current MPS and
   the candidate MPS, the current MPS is updated to the MPS limit;
   otherwise the message is ignored.  If the current MPS is updated,
   then the probe strategy is forced into the Monitor state described
   below.
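
   A sketch of this check.  It reads "between" as strictly between the
   current and candidate MPS (an interpretation), uses a hypothetical
   function name, and leaves the transition to the Monitor state to the
   caller through the returned flag.

```python
def on_packet_too_big_during_probe(reported_mtu: int, ip_header: int,
                                   current_mps: int,
                                   candidate_mps: int) -> tuple[int, bool]:
    """Hypothetical: return (new_current_mps, enter_monitor_state)."""
    mps_limit = reported_mtu - ip_header
    if current_mps < mps_limit < candidate_mps:
        return mps_limit, True
    # Outside the probed range: the message is ignored.
    return current_mps, False
```

   For example, with a 20-byte IPv4 header, a report of 1500 while
   probing from an MPS of 1004 toward 2028 raises the current MPS to
   1480, whereas reports of 9000 or 576 leave it unchanged.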

5.5. Probing method using SCTP

   @@@@ to be written

5.6. General probing methods

   @@@@ to be written

5.7. Probe strategy

   The probe strategy described here is a recommended baseline
   algorithm.  It is not presented in formal standards language because
   the probe strategy can include heuristics to help select an optimal
   MSS for a given path.  As a consequence there is opportunity for
   future improvements to this algorithm.

   The probing strategy has three major states: Search, Monitor and
   Suspend.  In the Search state, it sequentially searches for the
   largest MSS that the path can support.  Once the appropriate MPS has
   been discovered, the probing algorithm enters the Monitor state where
   it probes infrequently to detect if the path MPS has become larger.

   If the MPS probing persistently fails, it may be desirable to suspend
   MPS probing and heuristically select one of the common default MSSs:
   576, 1240, or 1460 bytes.

5.7.1. Search

   The recommended search strategy is a multi-phase scan: first, a
   coarse scan for the approximate MTU using factor-of-2 steps starting
   at 1024 bytes until a probe fails, followed by successively finer
   scans between the largest previously successful and unsuccessful
   probes.  TCP should use its best knowledge of the lower layer header
   sizes to determine the MPS corresponding to each MTU listed in the
   table below.

          Table 1: Recommended MTU scanning sequence
          (Coarse scan down column 1, fine scan across each row)

          512     [Use only after repeated timeouts]
          1024    1492, 1500, 2002
          2048
          4096    4352
          8192    9000
          16384   17914
          32768
          64512
          ((Additional values needed))

   During the scan it is recommended that the MPS not be raised if cwnd
   is too small, as determined by a heuristic.  The recommended
   heuristic is that the MPS is only raised when cwnd is larger than 20
   segments.  @@@ This may be too high.
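
   The Search state and its cwnd heuristic can be sketched as below.
   This is a simplification under stated assumptions: probe_ok is a
   hypothetical synchronous stand-in for sending one probe and learning
   its outcome (a real probe takes at least a round trip and is paced by
   the rules in Section 4), the fine phase is collapsed into a single
   ascending pass over Table 1, the 512 entry is omitted, and the values
   are MTUs that TCP would convert to MPS values.

```python
import math
from typing import Callable, Optional

# Table 1 without the 512 entry: coarse step -> finer values on its row.
TABLE_1 = {
    1024: [1492, 1500, 2002],
    2048: [],
    4096: [4352],
    8192: [9000],
    16384: [17914],
    32768: [],
    64512: [],
}
CWND_GATE_SEGMENTS = 20  # only raise the MPS when cwnd exceeds this


def fine_candidates(largest_ok: int, smallest_bad: float) -> list[int]:
    """Table 1 values strictly between the two bounds, in ascending order."""
    values = sorted(v for step, row in TABLE_1.items() for v in [step, *row])
    return [v for v in values if largest_ok < v < smallest_bad]


def search(probe_ok: Callable[[int], bool]) -> Optional[int]:
    """Return the largest MTU that probed successfully, or None if even
    1024 failed."""
    largest_ok: Optional[int] = None
    smallest_bad: float = math.inf
    for step in sorted(TABLE_1):  # coarse scan down column 1
        if probe_ok(step):
            largest_ok = step
        else:
            smallest_bad = step
            break
    if largest_ok is None:
        return None
    for mtu in fine_candidates(largest_ok, smallest_bad):  # fine scan
        if not probe_ok(mtu):
            break
        largest_ok = mtu
    return largest_ok


def may_raise_mps(cwnd_segments: int) -> bool:
    """Recommended heuristic: raise the MPS only if cwnd > 20 segments."""
    return cwnd_segments > CWND_GATE_SEGMENTS
```

   On a simulated path with a 1500-byte MTU, the coarse scan succeeds
   at 1024 and fails at 2048; the fine scan then accepts 1492 and 1500
   and stops at 2002, so search returns 1500.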

5.7.2. Monitor

   Once the scan has found an appropriate MPS, the probe strategy enters
   the Monitor state, where it re-probes the most recent failed MTU
   once every MONITOR_INTERVAL seconds.  If the probe fails, it remains
   in the Monitor state.  If it succeeds, it re-enters the Search state.

   If the network becomes too congested during either the Search or the
   Monitor state, it is recommended that the MPS be reduced to a smaller
   size as determined by a heuristic.  The recommended heuristic is to
   reduce the MSS if ssthresh is reduced to 5 segments or smaller.  The
   recommended reduction is to the next smaller coarse step in Table 1.
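
   The congestion heuristic can be sketched as a step down column 1 of
   Table 1.  The names are hypothetical, and the sketch assumes that no
   reduction below 1024 happens here, since Table 1 reserves 512 for
   repeated timeouts; after a reduction, packet-denominated congestion
   variables would be rescaled as required in Section 4.

```python
from typing import Optional

COARSE_STEPS = [1024, 2048, 4096, 8192, 16384, 32768, 64512]  # Table 1, column 1
SSTHRESH_TRIGGER = 5  # segments


def next_smaller_coarse_step(mtu: int) -> Optional[int]:
    """Largest coarse step strictly below mtu, or None if there is none."""
    smaller = [s for s in COARSE_STEPS if s < mtu]
    return smaller[-1] if smaller else None


def mtu_after_ssthresh_update(current_mtu: int, ssthresh_segments: float) -> int:
    """Hypothetical: apply the congestion heuristic and return the
    (possibly reduced) MTU."""
    if ssthresh_segments <= SSTHRESH_TRIGGER:
        step = next_smaller_coarse_step(current_mtu)
        if step is not None:
            return step
    return current_mtu
```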

   When there are repeated timeouts (MAX_TIMO or more retransmissions
   without any received ACKs), it is presumed that the connection was
   re-routed onto a link with a smaller MSS, and that ICMP messages are
   not being delivered.  The MSS probing algorithm is reset by pulling
   back the MSS to 1024 bytes, rescaling the congestion control
   variables, and re-entering the Search state.

5.7.3. Suspend

   If there is a timeout, and cwnd prior to the timeout was smaller than
   6 packets, then the probe strategy can enter the Suspend state and
   set the MSS to 512 or 1240 bytes.  This has the effect of reducing
   the minimum data rate that TCP can stably manage.
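
   The transitions among the three states can be summarized as a small
   state machine.  This is a sketch under assumptions: MAX_TIMO has no
   value in this document, so 3 is hypothetical; the repeated-timeout
   check is given priority over the Suspend check; 512 is chosen from
   the two Suspend sizes; the sizes are the byte values the text gives;
   and leaving the Suspend state and rescaling congestion variables are
   not modeled because they are not specified here.

```python
from enum import Enum, auto

MAX_TIMO = 3            # hypothetical value
SMALL_CWND_PACKETS = 6  # Suspend if cwnd before the timeout was below this
RESET_SIZE = 1024       # size used when restarting the Search state
SUSPEND_SIZE = 512      # the text allows 512 or 1240


class State(Enum):
    SEARCH = auto()
    MONITOR = auto()
    SUSPEND = auto()


class ProbeStrategy:
    """Hypothetical driver for the Search/Monitor/Suspend states."""

    def __init__(self) -> None:
        self.state = State.SEARCH
        self.size = RESET_SIZE

    def on_search_complete(self, size: int) -> None:
        self.size = size
        self.state = State.MONITOR

    def on_monitor_probe(self, succeeded: bool) -> None:
        # A failed re-probe stays in Monitor; a success resumes Search.
        if self.state is State.MONITOR and succeeded:
            self.state = State.SEARCH

    def on_timeout(self, consecutive_timeouts: int, cwnd_before: float) -> None:
        if consecutive_timeouts >= MAX_TIMO:
            # Presumed re-route onto a smaller-MTU path with ICMP lost.
            self.size = RESET_SIZE
            self.state = State.SEARCH
        elif cwnd_before < SMALL_CWND_PACKETS:
            self.size = SUSPEND_SIZE
            self.state = State.SUSPEND
```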

548	5.8.  Processing Packet Too Big messages

550	   @@@ Add language re: optional processing
551	   When a Packet Too Big message is received, the node determines which
552	   path the message applies to based on the contents of the Packet Too
553	   Big message.  For example, if the destination address is used as the
554	   local representation of a path, the destination address from the
555	   original packet would be used to determine which path the message
556	   applies to.

      Note: if the original packet contained an IPv6 Routing header, the
      Routing header should be used to determine the location of the
      destination address within the original packet.  If Segments Left
      is equal to zero, the destination address is in the Destination
      Address field in the IPv6 header.  If Segments Left is greater
      than zero, the destination address is the last address
      (Address[n]) in the Routing header.
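   The rule in the note above can be sketched for a quoted IPv6 packet
   as follows.  The sketch makes these assumptions:
   - The Routing header uses the Type 0 layout: an 8-byte fixed part
     followed by 16-byte addresses, with Hdr Ext Len equal to twice the
     number of addresses.
   - Only Hop-by-Hop and Destination Options headers are skipped ahead
     of the Routing header.
   - When the quoted packet is too short to decide, the sketch falls
     back to the Destination Address field.
   The function name is illustrative.

```python
import ipaddress

HOP_BY_HOP = 0
ROUTING = 43
DEST_OPTIONS = 60


def original_destination(pkt: bytes) -> ipaddress.IPv6Address:
    """Final destination of the original packet quoted in a Packet Too Big."""
    if len(pkt) < 40:
        raise ValueError("quoted IPv6 header is truncated")
    dst = ipaddress.IPv6Address(bytes(pkt[24:40]))
    next_hdr, off = pkt[6], 40

    # Skip extension headers that may precede a Routing header.
    while next_hdr in (HOP_BY_HOP, DEST_OPTIONS):
        if off + 2 > len(pkt):
            return dst  # truncated: best effort
        next_hdr = pkt[off]
        off += (pkt[off + 1] + 1) * 8

    if next_hdr != ROUTING or off + 8 > len(pkt):
        return dst
    hdr_ext_len, segments_left = pkt[off + 1], pkt[off + 3]
    if segments_left == 0:
        return dst  # the Destination Address field is already final

    n = hdr_ext_len // 2           # number of 16-byte addresses (Type 0)
    last = off + 8 + (n - 1) * 16  # offset of Address[n]
    if n == 0 or last + 16 > len(pkt):
        return dst  # truncated: best effort
    return ipaddress.IPv6Address(bytes(pkt[last:last + 16]))
```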

      If the original packet contained an IPv4 Source Route Option .....
      @@@@ write

569	   The node then uses the value in the MTU field in the Packet Too Big
570	   message as a tentative PMTU value, and compares the tentative PMTU to
571	   the existing PMTU.  If the tentative PMTU is less than the existing
572	   PMTU estimate, the tentative PMTU replaces the existing PMTU as the
573	   PMTU value for the path.

575	   The packetization layers must be notified about decreases in the
576	   PMTU.  Any packetization layer instance (for example, a TCP
577	   connection) that is actively using the path must be notified if the
578	   PMTU estimate is decreased.

580	      Note: even if the Packet Too Big message contains an Original
581	      Packet Header that refers to a UDP packet, the TCP layer must be
582	      notified if any of its connections use the given path.

584	   Also, the instance that sent the packet that elicited the Packet Too
585	   Big message should be notified that its packet has been dropped, even
586	   if the PMTU estimate has not changed, so that it may retransmit the
587	   dropped data.
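   The update and notification rules above can be sketched as a
   per-path cache keyed by destination address, following the example
   earlier in this section.  PathCache and its methods are hypothetical.
   Plain callbacks stand in for packetization layer instances.

```python
from collections import defaultdict
from typing import Callable, Dict, Set

# A packetization layer instance is modeled as a callback that takes an
# event name and the current PMTU estimate (hypothetical interface).
Listener = Callable[[str, int], None]


class PathCache:
    def __init__(self, first_hop_mtu: int) -> None:
        self.first_hop_mtu = first_hop_mtu
        self.pmtu: Dict[str, int] = {}
        self.users: Dict[str, Set[Listener]] = defaultdict(set)

    def estimate(self, path: str) -> int:
        return self.pmtu.get(path, self.first_hop_mtu)

    def on_packet_too_big(self, path: str, mtu: int, sender: Listener) -> None:
        # The MTU field is a tentative PMTU; only a decrease replaces it.
        if mtu < self.estimate(path):
            self.pmtu[path] = mtu
            # Every instance actively using the path learns of the decrease.
            for user in self.users[path]:
                user("pmtu_decreased", mtu)
        # The sender is told its packet was dropped even if the PMTU did not
        # change, so that it may retransmit the dropped data.
        sender("packet_dropped", self.estimate(path))
```

   Because the sketch notifies every registered user of the path, a TCP
   connection learns of a decrease even when the Packet Too Big message
   quoted a UDP packet.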

589	      Note: An implementation can avoid the use of an asynchronous
590	      notification mechanism for PMTU decreases by postponing
591	      notification until the next attempt to send a packet larger than
592	      the PMTU estimate.  In this approach, when an attempt is made to
593	      SEND a packet that is larger than the PMTU estimate, the SEND
594	      function should fail and return a suitable error indication.  This
595	      approach may be more suitable to a connectionless packetization
596	      layer (such as one using UDP), which (in some implementations) may
597	      be hard to "notify" from the ICMP layer.  In this case, the normal
598	      timeout-based retransmission mechanisms would be used to recover
599	      from the dropped packets.    @@@@ why "SEND"?
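   The deferred approach in the note above can be sketched as a send
   routine that consults the cached estimate at send time and fails,
   instead of relying on an asynchronous notification.  Reporting the
   failure as OSError with errno.EMSGSIZE is an assumption, chosen to
   resemble common socket APIs.  LazyPath and its send method are
   hypothetical.

```python
import errno
import os


class LazyPath:
    """Path state consulted only at send time (no asynchronous notification)."""

    def __init__(self, pmtu: int) -> None:
        self.pmtu = pmtu  # lowered by the ICMP layer on Packet Too Big

    def send(self, packet: bytes) -> int:
        if len(packet) > self.pmtu:
            # The caller learns of the smaller PMTU here and must repacketize.
            raise OSError(errno.EMSGSIZE, os.strerror(errno.EMSGSIZE))
        return len(packet)  # stand-in for handing the packet to the IP layer
```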

601	   It is important to understand that the notification of the
602	   packetization layer instances using the path about the change in the
603	   PMTU is distinct from the notification of a specific instance that a
604	   packet has been dropped.  The latter should be done as soon as
605	   practical (i.e., asynchronously from the point of view of the
606	   packetization layer instance), while the former may be delayed until
607	   a packetization layer instance wants to create a packet.
608	   Retransmission should be done for only those packets that are known
609	   to be dropped, as indicated by a Packet Too Big message.

611	5.9. Purging stale PMTU information

613	   @@@ update

615	   Internetwork topology is dynamic; routes change over time.  While the
616	   local representation of a path may remain constant, the actual
617	   path(s) in use may change.  Thus, PMTU information cached by a node
618	   can become stale.

620	   If the stale PMTU value is too large, this will be discovered almost
621	   immediately once a large enough packet is sent on the path.  No such
622	   mechanism exists for realizing that a stale PMTU value is too small,
623	   so an implementation should "age" cached values.  When a PMTU value
624	   has not been decreased for a while (on the order of 10 minutes), the
625	   PMTU estimate should be set to the MTU of the first-hop link, and the
626	   packetization layers should be notified of the change.  This will
627	   cause the complete Path MTU Discovery process to take place again.

629	      Note: an implementation should provide a means for changing the
630	      timeout duration, including setting it to "infinity".  For
631	      example, nodes attached to an FDDI link which is then attached to
632	      the rest of the Internet via a small MTU serial line are never
633	      going to discover a new non-local PMTU, so they should not have to
634	      put up with dropped packets every 10 minutes.

636	   An upper layer must not retransmit data in response to an increase in
637	   the PMTU estimate, since this increase never comes in response to an
638	   indication of a dropped packet.

640	   One approach to implementing PMTU aging is to associate a timestamp
641	   field with a PMTU value.  This field is initialized to a "reserved"
642	   value, indicating that the PMTU is equal to the MTU of the first hop
643	   link.  Whenever the PMTU is decreased in response to a Packet Too Big
644	   message, the timestamp is set to the current time.

646	   Once a minute, a timer-driven procedure runs through all cached PMTU
647	   values, and for each PMTU whose timestamp is not "reserved" and is
648	   older than the timeout interval:

650	   - The PMTU estimate is set to the MTU of the first hop link.

652	   - The timestamp is set to the "reserved" value.

654	   - Packetization layers using this path are notified of the increase.
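   The timestamp scheme above can be sketched as follows, with these
   assumptions:
   - The default timeout is 600 seconds, matching the 10-minute order
     of magnitude given earlier in this section.
   - The "reserved" timestamp is represented as None.
   - A timeout of float("inf") disables aging, as the earlier note
     recommends.
   Times are passed in explicitly, in seconds.  AgingCache and its
   methods are illustrative names.

```python
from typing import Callable, Dict, List, Optional

RESERVED: Optional[float] = None  # "reserved": PMTU equals first-hop MTU


class AgingCache:
    def __init__(self, first_hop_mtu: int, timeout: float = 600.0) -> None:
        self.first_hop_mtu = first_hop_mtu
        self.timeout = timeout  # seconds; float("inf") disables aging
        self.pmtu: Dict[str, int] = {}
        self.stamp: Dict[str, Optional[float]] = {}
        self.notify: Callable[[str, int], None] = lambda path, mtu: None

    def decrease(self, path: str, mtu: int, now: float) -> None:
        """Called when a Packet Too Big message lowers the estimate."""
        if mtu < self.pmtu.get(path, self.first_hop_mtu):
            self.pmtu[path] = mtu
            self.stamp[path] = now

    def sweep(self, now: float) -> List[str]:
        """Run once a minute: age out estimates older than the timeout."""
        aged = []
        for path, ts in self.stamp.items():
            if ts is not RESERVED and now - ts > self.timeout:
                self.pmtu[path] = self.first_hop_mtu
                self.stamp[path] = RESERVED
                self.notify(path, self.first_hop_mtu)  # report the increase
                aged.append(path)
        return aged
```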

656	5.10. TCP layer actions

   The TCP layer must track the PMTU for the path(s) in use by a
   connection; it should not send segments that would result in packets
   larger than the PMTU except to probe the path MTU.  A simple
   implementation could ask the IP layer for this value each time it
   creates a new segment, but this could be inefficient.  Moreover, TCP
   implementations that follow the "slow-start" congestion-avoidance
   algorithm [CONG] typically calculate and cache several other values
   derived from the PMTU.  It may be simpler to receive asynchronous
   notification when the PMTU changes, so that these variables may be
   updated.

669	   A TCP implementation must also store the MSS value received from its
670	   peer, and must not send any segment larger than this MSS, regardless
671	   of the PMTU.  In 4.xBSD-derived implementations, this may require
672	   adding an additional field to the TCP state record.
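   The two limits above combine as in the following sketch.  It assumes
   fixed header sizes: 40 bytes for IPv6 (or 20 for IPv4) and 20 bytes
   for TCP, with no options or extension headers.  A real implementation
   must account for options and extension headers as well.  The function
   name send_mss is illustrative.

```python
IPV6_HEADER = 40  # bytes, no extension headers (assumption)
IPV4_HEADER = 20  # bytes, no options (assumption)
TCP_HEADER = 20   # bytes, no TCP options (assumption)


def send_mss(pmtu: int, peer_mss: int, ip_header: int = IPV6_HEADER) -> int:
    """Largest TCP payload that fits the PMTU and respects the peer's MSS."""
    return min(pmtu - ip_header - TCP_HEADER, peer_mss)
```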

   The value sent in the TCP MSS option is independent of the PMTU.
   This MSS option value is used by the other end of the connection,
   which may be using an unrelated PMTU value.  See [IPv6-SPEC] sections
   "Packet Size Issues" and "Maximum Upper-Layer Payload Size" for
   information on selecting a value for the TCP MSS option.

   When a Packet Too Big message is received, it implies that a packet
   was dropped by the node that sent the ICMP message.  It is sufficient
   to treat this as any other dropped segment, and wait until the
   retransmission timer expires to cause retransmission of the segment.
   If the Path MTU Discovery process requires several steps to find the
   PMTU of the full path, this could delay the connection by many
   round-trip times.

687	   @@@ Add IPv4 text

   [@@@deprecate?  Alternatively, the retransmission could be done in
   immediate response to a notification that the Path MTU has changed,
   but only for the specific connection specified by the Packet Too Big
   message.  The packet size used in the retransmission should be no
   larger than the new PMTU. ]

      Note: A packetization layer must not retransmit in response to
      every Packet Too Big message, since a burst of several oversized
      segments will give rise to several such messages and hence several
      retransmissions of the same data.  If the new estimated PMTU is
      still wrong, the process repeats, and there is an exponential
      growth in the number of superfluous segments sent.

701	      This means that the TCP layer must be able to recognize when a
702	      Packet Too Big notification actually decreases the PMTU that it
703	      has already used to send a packet on the given connection, and
704	      should ignore any other notifications.
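   One way to meet this requirement is for each connection to remember
   the largest packet it has actually sent.  It then acts only on
   notifications that lower its PMTU below that size.  The sketch
   assumes that bookkeeping.  Connection and its methods are
   hypothetical, and the method returns the decision to retransmit
   instead of performing the retransmission.

```python
class Connection:
    def __init__(self, pmtu: int) -> None:
        self.pmtu = pmtu       # PMTU estimate used for new segments
        self.largest_sent = 0  # largest packet sent so far on this connection

    def sent(self, packet_size: int) -> None:
        self.largest_sent = max(self.largest_sent, packet_size)

    def on_packet_too_big(self, new_pmtu: int) -> bool:
        """Return True only if this notification should cause a retransmission."""
        # Ignore notifications that do not decrease a PMTU this connection
        # has already used: nothing it sent is affected, or it already knows.
        if new_pmtu >= self.largest_sent or new_pmtu >= self.pmtu:
            return False
        self.pmtu = new_pmtu
        return True
```

   Here only the first of three notifications caused by the same burst
   triggers a retransmission.  The duplicates are ignored.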

706	   Many TCP implementations incorporate "congestion avoidance" and
707	   "slow-start" algorithms to improve performance [CONG].  Unlike a
708	   retransmission caused by a TCP retransmission timeout, a
709	   retransmission caused by a Packet Too Big message should not change
710	   the congestion window.  It should, however, trigger the slow-start
711	   mechanism (i.e., only one segment should be retransmitted until
712	   acknowledgments begin to arrive again).

   TCP performance can be reduced if the sender's maximum window size is
   not an exact multiple of the segment size in use (this is not the
   congestion window size, which is always a multiple of the segment
   size).  In many systems (such as those derived from 4.2BSD), the
   segment size is often set to 1024 octets, and the maximum window size
   (the "send space") is usually a multiple of 1024 octets, so the
   proper relationship holds by default.  If Path MTU Discovery is used,
   however, the segment size may not be a sub-multiple of the send
   space, and it may change during a connection; this means that the TCP
   layer may need to change the transmission window size when Path MTU
   Discovery changes the PMTU value.  The maximum window size should be
   set to the greatest multiple of the segment size that is less than or
   equal to the sender's buffer space size.
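   The sizing rule above reduces to integer arithmetic.  The sketch
   below rounds the send space down to a whole number of segments; the
   function name is illustrative.

```python
def max_window(send_space: int, segment_size: int) -> int:
    """Greatest multiple of segment_size that is <= send_space (octets)."""
    if segment_size <= 0:
        raise ValueError("segment size must be positive")
    return (send_space // segment_size) * segment_size
```

   For example, a 16384-octet send space holds exactly 16 segments of
   1024 octets.  If Path MTU Discovery changes the segment size to 1460
   octets, the maximum window becomes 11 * 1460 = 16060 octets.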

728	5.11.  Issues for other transport protocols

730	   Some transport protocols (such as ISO TP4 [ISOTP]) are not allowed to
731	   repacketize when doing a retransmission.  That is, once an attempt is
732	   made to transmit a segment of a certain size, the transport cannot
733	   split the contents of the segment into smaller segments for
734	   retransmission.  In such a case, the original segment can be
735	   fragmented by the IP layer during retransmission.  Subsequent
736	   segments, when transmitted for the first time, should be no larger
737	   than allowed by the Path MTU.

739	   The Sun Network File System (NFS) uses a Remote Procedure Call (RPC)
740	   protocol [RPC] that, when used over UDP, in many cases will generate
741	   payloads that must be fragmented even for the first-hop link.  This
742	   might improve performance in certain cases, but it is known to cause
743	   reliability and performance problems, especially when the client and
744	   server are separated by routers.

746	   It is recommended that NFS implementations use Path MTU Discovery
747	   whenever routers are involved.  Most NFS implementations allow the
748	   RPC datagram size to be changed at mount-time (indirectly, by
749	   changing the effective file system block size), but might require
750	   some modification to support changes later on.

   Also, since a single NFS operation cannot be split across several UDP
   datagrams, certain operations (primarily, those operating on file
   names and directories) require a minimum payload size that, if sent
   in a single packet, would exceed the PMTU.  NFS implementations
   should not reduce the payload size below this threshold, even if
   Path MTU Discovery suggests a lower value.  In this case the payload
   will be fragmented by the IP layer.

760	5.12.  Issues for tunnels

762	   @@@ to be written

5.13.  Diagnostic tools

   All implementations MUST include a mechanism for implementing
   diagnostic tools that do not rely on the operating system's
   implementation of path MTU discovery.  This requires a mechanism by
   which an application can send oversized packets that are not subject
   to the operating system's notion of the current path MTU, up to the
   physical MTU limit supported by the network interface.  It also
   requires a mechanism to collect any Packet Too Big messages.

774	5.14.  Management interface

   It is suggested that an implementation provide a way for a system
   utility program to:

   - Specify that Path MTU Discovery not be done on a given path.

   - Change the PMTU value associated with a given path.

   - Set global controls on ICMP processing.

   - Set per-connection or per-application controls on ICMP processing.

   The first of these can be accomplished by associating a flag with the
   path; when a packet is sent on a path with this flag set, the IP
   layer does not send packets larger than the IPv6 minimum link MTU.

791	   These features might be used to work around an anomalous situation,
792	   or by a routing protocol implementation that is able to obtain Path
793	   MTU values.

795	   The implementation should also provide a way to change the timeout
796	   period for aging stale PMTU information.

798	6. Normative references

 [RFC1191]  Path MTU discovery. J.C. Mogul, S.E. Deering. November
            1990. (Obsoletes RFC1063) (Status: DRAFT STANDARD)

804	 [RFC1981]  Path MTU Discovery for IP version 6. J. McCann, S. Deering,
805	            J. Mogul. August 1996. (Status: PROPOSED STANDARD)

807	 [RFC2119]  Key words for use in RFCs to Indicate Requirement Levels. S.
808	            Bradner.  March 1997. (Status: BEST CURRENT PRACTICE)

810	7. Informative references

 [RFC1063]  IP MTU discovery options. J.C. Mogul, C.A. Kent, C.
            Partridge, K. McCloghrie. July 1988. (Obsoleted by RFC1191)

 [RFC1435]  IESG Advice from Experience with Path MTU Discovery. S.
            Knowles. March 1993. (Status: INFORMATIONAL)

819	 [RFC1626]  Default IP MTU for use over ATM AAL5. R. Atkinson. May 1994.
820	            (Status: PROPOSED STANDARD)

822	 [RFC1791]  TCP And UDP Over IPX Networks With Fixed Path MTU. T. Sung.
823	            April 1995. (Status: EXPERIMENTAL)

825	 [RFC2923]  TCP Problems with Path MTU Discovery. K. Lahey. September
826	            2000. (Status: INFORMATIONAL)

828	8. Security considerations

830	   Since the MTU reported in the ICMP messages is constrained to be
831	   between the old MTU and the candidate MTU, this algorithm is more
832	   difficult to attack through fraudulent ICMP messages.
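   The constraint described above suggests a simple plausibility check.
   The sketch assumes that a reported MTU is accepted only if it is no
   smaller than the old (working) MTU and strictly smaller than the
   candidate size being probed.  The boundary handling and the function
   name are assumptions, not requirements of this document.

```python
def plausible_ptb(reported_mtu: int, old_mtu: int, candidate_mtu: int) -> bool:
    """Accept a Packet Too Big MTU only if it lies between old and candidate."""
    return old_mtu <= reported_mtu < candidate_mtu
```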

   Furthermore, since this algorithm can function properly without ICMP
   messages, that part of the algorithm can be disabled for additional
   robustness in hostile environments.

838	9. IANA considerations

840	10. Contributors

842	11. Acknowledgements

   Matt Mathis and John Heffner are supported by a grant from Cisco
   Systems, Inc.

12. Authors' addresses

849	   Please send comments and suggestions to mtu@psc.edu.

851	   Matt Mathis and John Heffner
852	   Pittsburgh Supercomputing Center
853	   4400 Fifth Ave.
854	   Pittsburgh, PA 15213
855	   mathis@psc.edu
856	   jheffner@psc.edu

858	   Kevin Lahey
859	   Freelance
860	   kml@patheticgeek.net

862	13. Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights.  Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11.  Copies of
   claims of rights made available for publication and any assurances of
   licenses to be made available, or the result of an attempt made to
   obtain a general license or permission for the use of such
   proprietary rights by implementers or users of this specification can
   be obtained from the IETF Secretariat.

878	   The IETF invites any interested party to bring to its attention any
879	   copyrights, patents or patent applications, or other proprietary
880	   rights which may cover technology that may be required to practice
881	   this standard.  Please address the information to the IETF Executive
882	   Director.

884	14. Full copyright statement

886	   Copyright (C) The Internet Society Oct 19, 2003. All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

901	   The limited permissions granted above are perpetual and will not be
902	   revoked by the Internet Society or its successors or assigns.