idnits 2.17.1

draft-ietf-ipngwg-pmtuv6-01.txt:

Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------

** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-25) according to https://trustee.ietf.org/license-info :

   IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:

      This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:

      Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.

   IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:

      This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------

** Missing expiration date. The document expiration date should appear on the first and last page.

** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents.

** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error?

** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts.

** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories.

== No 'Intended status' indicated for this document; assuming Proposed Standard

Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------

** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.)

** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references.

** The document seems to lack both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords.

   RFC 2119 keyword, line 197: '...et Too Big message, it MUST reduce its...'
   RFC 2119 keyword, line 204: '...oo Big message, a node MUST attempt to...'
   RFC 2119 keyword, line 205: '...ges in the near future. The node MUST...'
   RFC 2119 keyword, line 210: '... node MUST force the Path MTU Discov...'
   RFC 2119 keyword, line 212: '...th MTU Discovery MUST detect decreases...'
   (5 more instances...)

Miscellaneous warnings:
----------------------------------------------------------------------------

-- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008.
   If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.)

-- Couldn't find a document date in the document -- date freshness check skipped.

Checking references for intended status: Proposed Standard
----------------------------------------------------------------------------

(See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs)

-- Possible downref: Non-RFC (?) normative reference: ref. 'CONG'

-- Possible downref: Non-RFC (?) normative reference: ref. 'FRAG'

** Obsolete normative reference: RFC 1885 (ref. 'ICMPv6') (Obsoleted by RFC 2463)

** Obsolete normative reference: RFC 1883 (ref. 'IPv6-SPEC') (Obsoleted by RFC 2460)

** Downref: Normative reference to an Unknown state RFC: RFC 905 (ref. 'ISOTP')

== Outdated reference: A later version (-06) exists of draft-ietf-ipngwg-discovery-04

** Downref: Normative reference to an Informational RFC: RFC 1057 (ref. 'RPC')

Summary: 13 errors (**), 0 flaws (~~), 2 warnings (==), 4 comments (--).

Run idnits with the --verbose option for more detailed information about the items above.

--------------------------------------------------------------------------------

INTERNET-DRAFT                   J. McCann, Digital Equipment Corporation
February 21, 1996                                  S. Deering, Xerox PARC
                                 J. Mogul, Digital Equipment Corporation

                   Path MTU Discovery for IP version 6

                     draft-ietf-ipngwg-pmtuv6-01.txt

Abstract

This document describes Path MTU Discovery for IP version 6. It is largely derived from RFC-1191, which describes Path MTU Discovery for IP version 4.

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

Expiration

August 21, 1996

Contents

   Abstract
   Status of this Memo
   Contents
   1. Introduction
   2. Terminology
   3. Protocol overview
   4. Protocol Requirements
   5. Implementation suggestions
      5.1. Layering
      5.2. Storing PMTU information
      5.3. Purging stale PMTU information
      5.4. TCP layer actions
      5.5. Issues for other transport protocols
      5.6. Management interface
   6. Security considerations
   Acknowledgements
   Appendix A - Comparison to RFC 1191
   References
   Authors' Addresses

1. Introduction

When one IPv6 node has a large amount of data to send to another node, the data is transmitted in a series of IPv6 packets. It is usually preferable that these packets be of the largest size that can successfully traverse the path from the source node to the destination node. This packet size is referred to as the Path MTU (PMTU), and it is equal to the minimum link MTU of all the links in a path. IPv6 defines a standard mechanism for a node to discover the PMTU of an arbitrary path.

Nodes not implementing Path MTU Discovery use the IPv6 minimum link MTU defined in [IPv6-SPEC] as the maximum packet size. In most cases, this will result in the use of smaller packets than necessary, because most paths have a PMTU greater than the IPv6 minimum link MTU. A node sending packets much smaller than the Path MTU allows is wasting network resources and probably getting suboptimal throughput.

2. Terminology

   node        - a device that implements IPv6.

   router      - a node that forwards IPv6 packets not explicitly addressed to itself.

   host        - any node that is not a router.

   upper layer - a protocol layer immediately above IPv6. Examples are transport protocols such as TCP and UDP, control protocols such as ICMP, routing protocols such as OSPF, and internet or lower-layer protocols being "tunneled" over (i.e., encapsulated in) IPv6 such as IPX, AppleTalk, or IPv6 itself.

   link        - a communication facility or medium over which nodes can communicate at the link layer, i.e., the layer immediately below IPv6. Examples are Ethernets (simple or bridged); PPP links; X.25, Frame Relay, or ATM networks; and internet (or higher) layer "tunnels", such as tunnels over IPv4 or IPv6 itself.

   interface   - a node's attachment to a link.

   address     - an IPv6-layer identifier for an interface or a set of interfaces.

   packet      - an IPv6 header plus payload.

   link MTU    - the maximum transmission unit, i.e., maximum packet size in octets, that can be conveyed in one piece over a link.

   path        - the set of links traversed by a packet between a source node and a destination node.

   path MTU    - the minimum link MTU of all the links in a path between a source node and a destination node.

   PMTU        - path MTU.

   Path MTU
   Discovery   - the process by which a node learns the PMTU of a path.

   flow        - a sequence of packets sent from a particular source to a particular (unicast or multicast) destination for which the source desires special handling by the intervening routers.

   flow id     - a combination of a source address and a non-zero flow label.
3. Protocol overview

This memo describes a technique to dynamically discover the PMTU of a path. The basic idea is that a source node initially assumes that the PMTU of a path is the (known) MTU of the first hop in the path. If any of the packets sent on that path are too large to be forwarded by some node along the path, that node will discard them and return ICMPv6 Packet Too Big messages [ICMPv6]. Upon receipt of such a message, the source node reduces its assumed PMTU for the path based on the MTU of the constricting hop as reported in the Packet Too Big message.

The Path MTU Discovery process ends when the node's estimate of the PMTU is less than or equal to the actual PMTU. Note that several iterations of the packet-sent/Packet-Too-Big-message-received cycle may occur before the Path MTU Discovery process ends, as there may be links with smaller MTUs further along the path.

Alternatively, the node may elect to end the discovery process by ceasing to send packets larger than the IPv6 minimum link MTU.

The PMTU of a path may change over time, due to changes in the routing topology. Reductions of the PMTU are detected by Packet Too Big messages. To detect increases in a path's PMTU, a node periodically increases its assumed PMTU. This will almost always result in packets being discarded and Packet Too Big messages being generated, because in most cases the PMTU of the path will not have changed. Therefore, attempts to detect increases in a path's PMTU should be done infrequently.

Path MTU Discovery supports multicast as well as unicast destinations. In the case of a multicast destination, copies of a packet may traverse many different paths to many different nodes. Each path may have a different PMTU, and a single multicast packet may result in multiple Packet Too Big messages, each reporting a different next-hop MTU. The minimum PMTU value across the set of paths in use determines the size of subsequent packets sent to the multicast destination.

Note that Path MTU Discovery must be performed even in cases where a node "thinks" a destination is attached to the same link as itself. In a situation such as when a neighboring router acts as proxy [ND] for some destination, the destination can appear to be directly connected but is in fact more than one hop away.
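As a non-normative illustration of the overview above, the sender-side loop can be sketched in a few lines of Python. The helper send_probe() and its return convention (the MTU reported by a Packet Too Big message, or None if no such message arrived) are invented for this example and are not part of any defined interface.

   IPV6_MINIMUM_LINK_MTU = 576   # per [IPv6-SPEC] (RFC 1883); later IPv6 specifications use 1280

   def discover_pmtu(send_probe, first_hop_mtu):
       """Lower the PMTU estimate until packets of that size stop eliciting
       Packet Too Big messages, or until the minimum link MTU is reached."""
       pmtu = first_hop_mtu                 # initial assumption: first-hop MTU
       while pmtu > IPV6_MINIMUM_LINK_MTU:
           reported_mtu = send_probe(pmtu)  # None: no Packet Too Big received
           if reported_mtu is None or reported_mtu >= pmtu:
               break                        # estimate <= actual PMTU (or bogus report)
           pmtu = max(reported_mtu, IPV6_MINIMUM_LINK_MTU)
       return pmtu

   # Example: a path whose narrowest link has an MTU of 1400 octets.
   print(discover_pmtu(lambda size: 1400 if size > 1400 else None, 1500))   # 1400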
4. Protocol Requirements

When a node receives a Packet Too Big message, it MUST reduce its estimate of the PMTU for the relevant path, based on the value of the MTU field in the message. The precise behavior of a node in this circumstance is not specified, since different applications may have different requirements, and since different implementation architectures may favor different strategies.

After receiving a Packet Too Big message, a node MUST attempt to avoid eliciting more such messages in the near future. The node MUST reduce the size of the packets it is sending along the path. Using a PMTU estimate larger than the IPv6 minimum link MTU may continue to elicit Packet Too Big messages. Since each of these messages (and the dropped packets they respond to) consumes network resources, the node MUST force the Path MTU Discovery process to end.

Nodes using Path MTU Discovery MUST detect decreases in PMTU as fast as possible. Nodes MAY detect increases in PMTU, but because doing so requires sending packets larger than the current estimated PMTU, and because the likelihood is that the PMTU will not have increased, this MUST be done at infrequent intervals. An attempt to detect an increase (by sending a packet larger than the current estimate) MUST NOT be done less than 5 minutes after a Packet Too Big message has been received for the given path. The recommended setting for this timer is twice its minimum value (10 minutes).

A node MUST NOT reduce its estimate of the Path MTU below the IPv6 minimum link MTU.

   Note: A node may receive a Packet Too Big message reporting a next-hop MTU that is less than the IPv6 minimum link MTU. In that case, the node is not required to reduce the size of subsequent packets sent on the path to less than the IPv6 minimum link MTU, but rather must include a Fragment header in those packets [IPv6-SPEC].

A node MUST NOT increase its estimate of the Path MTU in response to the contents of a Packet Too Big message. A message purporting to announce an increase in the Path MTU might be a stale packet that has been floating around in the network, a false packet injected as part of a denial-of-service attack, or the result of having multiple paths to the destination, each with a different PMTU.
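These requirements can be restated as a small decision procedure applied to each received Packet Too Big message. The following Python sketch is one possible, non-normative reading; the function name and the returned flag are invented for illustration.

   IPV6_MINIMUM_LINK_MTU = 576   # per [IPv6-SPEC] (RFC 1883); later IPv6 specifications use 1280

   def apply_packet_too_big(current_pmtu, reported_mtu):
       """Return (new_pmtu, add_fragment_header) for one Packet Too Big message.

       The estimate is only ever reduced, never increased, by such a message,
       and never below the IPv6 minimum link MTU.  If the reported MTU is
       below that minimum, subsequent packets instead carry a Fragment header."""
       if reported_mtu >= current_pmtu:
           return current_pmtu, False           # MUST NOT increase the estimate
       if reported_mtu < IPV6_MINIMUM_LINK_MTU:
           return IPV6_MINIMUM_LINK_MTU, True   # keep the floor, add a Fragment header
       return reported_mtu, False

   assert apply_packet_too_big(1500, 1400) == (1400, False)
   assert apply_packet_too_big(1400, 9000) == (1400, False)   # purported increase ignored
   assert apply_packet_too_big(1400, 500)  == (576, True)     # below the minimum link MTU

An analogous guard, not shown, would keep a per-path timer so that an attempt to detect a larger PMTU is made no sooner than 5 minutes (recommended: 10 minutes) after the last Packet Too Big message for that path.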
5. Implementation suggestions

This section discusses a number of issues related to the implementation of Path MTU Discovery. This is not a specification, but rather a set of notes provided as an aid for implementors.

The issues include:

   - What layer or layers implement Path MTU Discovery?

   - How is the PMTU information cached?

   - How is stale PMTU information removed?

   - What must transport and higher layers do?

5.1. Layering

In the IP architecture, the choice of what size packet to send is made by a protocol at a layer above IP. This memo refers to such a protocol as a "packetization protocol". Packetization protocols are usually transport protocols (for example, TCP) but can also be higher-layer protocols (for example, protocols built on top of UDP).

Implementing Path MTU Discovery in the packetization layers simplifies some of the inter-layer issues, but has several drawbacks: the implementation may have to be redone for each packetization protocol, it becomes hard to share PMTU information between different packetization layers, and the connection-oriented state maintained by some packetization layers may not easily extend to save PMTU information for long periods.

It is therefore suggested that the IP layer store PMTU information and that the ICMP layer process received Packet Too Big messages. The packetization layers may respond to changes in the PMTU by changing the size of the messages they send. To support this layering, packetization layers require a way to learn of changes in the value of MMS_S, the "maximum send transport-message size". The MMS_S is derived from the Path MTU by subtracting the size of the IPv6 header plus space reserved by the IP layer for additional headers (if any).
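For example (the numbers are illustrative only): with a PMTU of 1500 octets, a 40-octet IPv6 header, and no space reserved for additional headers, MMS_S is 1460 octets. A minimal sketch of the derivation:

   IPV6_HEADER_LEN = 40   # fixed IPv6 header size, in octets

   def mms_s(pmtu, reserved_header_len=0):
       """Maximum send transport-message size for a given Path MTU.
       reserved_header_len is whatever the IP layer sets aside for
       extension headers (if any)."""
       return pmtu - IPV6_HEADER_LEN - reserved_header_len

   print(mms_s(1500))      # 1460
   print(mms_s(1500, 8))   # 1452, e.g. with 8 octets reserved for a Fragment header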
It is possible that a packetization layer, perhaps a UDP application outside the kernel, is unable to change the size of messages it sends. This may result in a packet size that exceeds the Path MTU. To accommodate such situations, IPv6 defines a mechanism that allows large payloads to be divided into fragments, with each fragment sent in a separate packet (see [IPv6-SPEC] section "Fragment Header"). However, packetization layers are encouraged to avoid sending messages that will require fragmentation (for the case against fragmentation, see [FRAG]).

5.2. Storing PMTU information

Ideally, a PMTU value should be associated with a specific path traversed by packets exchanged between the source and destination nodes. However, in most cases a node will not have enough information to completely and accurately identify such a path. Rather, a node must associate a PMTU value with some local representation of a path. It is left to the implementation to select the local representation of a path.

In the case of a multicast destination address, copies of a packet may traverse many different paths to reach many different nodes. The local representation of the "path" to a multicast destination must in fact represent a potentially large set of paths.

Minimally, an implementation could maintain a single PMTU value to be used for all packets originated from the node. This PMTU value would be the minimum PMTU learned across the set of all paths in use by the node. This approach is likely to result in the use of smaller packets than is necessary for many paths.

An implementation could use the destination address as the local representation of a path. The PMTU value associated with a destination would be the minimum PMTU learned across the set of all paths in use to that destination. The set of paths in use to a particular destination is expected to be small, in many cases consisting of a single path. This approach will result in the use of optimally sized packets on a per-destination basis. This approach integrates nicely with the conceptual model of a host as described in [ND]: a PMTU value could be stored with the corresponding entry in the destination cache.

If flows [IPv6-SPEC] are in use, an implementation could use the flow id as the local representation of a path. Packets sent to a particular destination but belonging to different flows may use different paths, with the choice of path depending on the flow id. This approach will result in the use of optimally sized packets on a per-flow basis, providing finer granularity than PMTU values maintained on a per-destination basis.

For source routed packets (i.e., packets containing an IPv6 Routing header [IPv6-SPEC]), the source route may further qualify the local representation of a path. In particular, a packet containing a type 0 Routing header in which all bits in the Strict/Loose Bit Map are equal to 1 contains a complete path specification. An implementation could use source route information in the local representation of a path.

   Note: Some paths may be further distinguished by different security classifications. The details of such classifications are beyond the scope of this memo.

Initially, the PMTU value for a path is assumed to be the (known) MTU of the first-hop link.

When a Packet Too Big message is received, the node determines which path the message applies to based on the contents of the Packet Too Big message. For example, if the destination address is used as the local representation of a path, the destination address from the original packet would be used to determine which path the message applies to.

   Note: if the original packet contained a Routing header, the Routing header should be used to determine the location of the destination address within the original packet. If Segments Left is equal to zero, the destination address is in the Destination Address field in the IPv6 header. If Segments Left is greater than zero, the destination address is the last address (Address[n]) in the Routing header.

The node then uses the value in the MTU field in the Packet Too Big message as a tentative PMTU value, and compares the tentative PMTU to the existing PMTU. If the tentative PMTU is less than the existing PMTU estimate, the tentative PMTU replaces the existing PMTU as the PMTU value for the path.
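The per-destination bookkeeping described above might look like the following non-normative sketch; the class and method names are invented for illustration, and in the conceptual host model of [ND] the same state would simply live in the destination cache.

   import time

   IPV6_MINIMUM_LINK_MTU = 576   # per [IPv6-SPEC] (RFC 1883)

   class PmtuCache:
       """PMTU estimates keyed by destination address (one possible local
       representation of a path)."""

       def __init__(self, first_hop_mtu):
           self.first_hop_mtu = first_hop_mtu
           self.entries = {}   # destination -> (pmtu, time of last decrease)

       def lookup(self, dst):
           # A destination with no entry starts at the first-hop link MTU.
           return self.entries.get(dst, (self.first_hop_mtu, None))[0]

       def packet_too_big(self, dst, reported_mtu):
           """Apply the MTU field of a Packet Too Big message to one path.
           Returns True if the estimate decreased, in which case the
           packetization layers using the path must be notified."""
           tentative = max(reported_mtu, IPV6_MINIMUM_LINK_MTU)
           if tentative < self.lookup(dst):
               self.entries[dst] = (tentative, time.monotonic())
               return True
           return False

   cache = PmtuCache(first_hop_mtu=1500)
   cache.packet_too_big("2001:db8::1", 1400)   # hypothetical destination address
   print(cache.lookup("2001:db8::1"))          # 1400

A finer-grained variant could key the same structure on a (destination, flow label) pair or on a source route, as discussed earlier in this section.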
The packetization layers must be notified about decreases in the PMTU. Any packetization layer instance (for example, a TCP connection) that is actively using the path must be notified if the PMTU estimate is decreased.

   Note: even if the Packet Too Big message contains an Original Packet Header that refers to a UDP packet, the TCP layer must be notified if any of its connections use the given path.

Also, the instance that sent the packet that elicited the Packet Too Big message should be notified that its packet has been dropped, even if the PMTU estimate has not changed, so that it may retransmit the dropped data.

   Note: An implementation can avoid the use of an asynchronous notification mechanism for PMTU decreases by postponing notification until the next attempt to send a packet larger than the PMTU estimate. In this approach, when an attempt is made to SEND a packet that is larger than the PMTU estimate, the SEND function should fail and return a suitable error indication. This approach may be more suitable to a connectionless packetization layer (such as one using UDP), which (in some implementations) may be hard to "notify" from the ICMP layer. In this case, the normal timeout-based retransmission mechanisms would be used to recover from the dropped packets.

It is important to understand that the notification of the packetization layer instances using the path about the change in the PMTU is distinct from the notification of a specific instance that a packet has been dropped. The latter should be done as soon as practical (i.e., asynchronously from the point of view of the packetization layer instance), while the former may be delayed until a packetization layer instance wants to create a packet. Retransmission should be done only for those packets that are known to be dropped, as indicated by a Packet Too Big message.

5.3. Purging stale PMTU information

Internetwork topology is dynamic; routes change over time. While the local representation of a path may remain constant, the actual path(s) in use may change. Thus, PMTU information cached by a node can become stale.

If the stale PMTU value is too large, this will be discovered almost immediately once a large enough packet is sent on the path. No such mechanism exists for realizing that a stale PMTU value is too small, so an implementation should "age" cached values. When a PMTU value has not been decreased for a while (on the order of 10 minutes), the PMTU estimate should be set to the MTU of the first-hop link, and the packetization layers should be notified of the change. This will cause the complete Path MTU Discovery process to take place again.

   Note: an implementation should provide a means for changing the timeout duration, including setting it to "infinity". For example, nodes attached to an FDDI link which is then attached to the rest of the Internet via a small MTU serial line are never going to discover a new non-local PMTU, so they should not have to put up with dropped packets every 10 minutes.

An upper layer must not retransmit data in response to an increase in the PMTU estimate, since this increase never comes in response to an indication of a dropped packet.

One approach to implementing PMTU aging is to associate a timestamp field with a PMTU value. This field is initialized to a "reserved" value, indicating that the PMTU is equal to the MTU of the first-hop link. Whenever the PMTU is decreased in response to a Packet Too Big message, the timestamp is set to the current time.

Once a minute, a timer-driven procedure runs through all cached PMTU values, and for each PMTU whose timestamp is not "reserved" and is older than the timeout interval (see the sketch after this list):

   - The PMTU estimate is set to the MTU of the first-hop link.

   - The timestamp is set to the "reserved" value.

   - Packetization layers using this path are notified of the increase.
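A non-normative sketch of that timer-driven procedure, assuming cache entries shaped like those in the Section 5.2 sketch (a destination mapped to a (pmtu, timestamp) pair, with None as the "reserved" timestamp):

   import time

   RESERVED = None          # "reserved" timestamp: PMTU equals the first-hop MTU
   PMTU_AGE_TIMEOUT = 600   # timeout interval in seconds; 10 minutes is the suggested default

   def age_pmtu_cache(entries, first_hop_mtu, notify_increase, now=None):
       """Run once a minute: reset estimates that have not decreased recently."""
       now = time.monotonic() if now is None else now
       for dst, (_, stamp) in entries.items():
           if stamp is not RESERVED and now - stamp > PMTU_AGE_TIMEOUT:
               entries[dst] = (first_hop_mtu, RESERVED)   # back to the first-hop MTU
               notify_increase(dst, first_hop_mtu)        # tell the packetization layers

   # Example: an entry whose PMTU was last decreased 15 minutes ago is reset.
   cache = {"2001:db8::1": (1400, time.monotonic() - 900)}
   age_pmtu_cache(cache, 1500, lambda dst, mtu: print(dst, "->", mtu))
   print(cache)   # {'2001:db8::1': (1500, None)}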
5.4. TCP layer actions

The TCP layer must track the PMTU for the path(s) in use by a connection; it should not send segments that would result in packets larger than the PMTU. A simple implementation could ask the IP layer for this value each time it created a new segment, but this could be inefficient. Moreover, TCP implementations that follow the "slow-start" congestion-avoidance algorithm [CONG] typically calculate and cache several other values derived from the PMTU. It may be simpler to receive asynchronous notification when the PMTU changes, so that these variables may be updated.

A TCP implementation must also store the MSS value received from its peer, and must not send any segment larger than this MSS, regardless of the PMTU. In 4.xBSD-derived implementations, this may require adding an additional field to the TCP state record.

The value sent in the TCP MSS option is independent of the PMTU. This MSS option value is used by the other end of the connection, which may be using an unrelated PMTU value. See [IPv6-SPEC] sections "Packet Size Issues" and "Maximum Upper-Layer Payload Size" for information on selecting a value for the TCP MSS option.

When a Packet Too Big message is received, it implies that a packet was dropped by the node that sent the ICMP message. It is sufficient to treat this as any other dropped segment, and wait until the retransmission timer expires to cause retransmission of the segment. If the Path MTU Discovery process requires several steps to find the PMTU of the full path, this could delay the connection by many round-trip times.

Alternatively, the retransmission could be done in immediate response to a notification that the Path MTU has changed, but only for the specific connection specified by the Packet Too Big message. The packet size used in the retransmission should be no larger than the new PMTU.

   Note: A packetization layer must not retransmit in response to every Packet Too Big message, since a burst of several oversized segments will give rise to several such messages and hence several retransmissions of the same data. If the new estimated PMTU is still wrong, the process repeats, and there is an exponential growth in the number of superfluous segments sent. This means that the TCP layer must be able to recognize when a Packet Too Big notification actually decreases the PMTU that it has already used to send a packet on the given connection, and should ignore any other notifications.

Many TCP implementations incorporate "congestion avoidance" and "slow-start" algorithms to improve performance [CONG]. Unlike a retransmission caused by a TCP retransmission timeout, a retransmission caused by a Packet Too Big message should not change the congestion window. It should, however, trigger the slow-start mechanism (i.e., only one segment should be retransmitted until acknowledgements begin to arrive again).

TCP performance can be reduced if the sender's maximum window size is not an exact multiple of the segment size in use (this is not the congestion window size, which is always a multiple of the segment size). In many systems (such as those derived from 4.2BSD), the segment size is often set to 1024 octets, and the maximum window size (the "send space") is usually a multiple of 1024 octets, so the proper relationship holds by default. If Path MTU Discovery is used, however, the segment size may not be a submultiple of the send space, and it may change during a connection; this means that the TCP layer may need to change the transmission window size when Path MTU Discovery changes the PMTU value. The maximum window size should be set to the greatest multiple of the segment size that is less than or equal to the sender's buffer space size.
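As a non-normative illustration of the two limits on segment size (the PMTU-derived limit and the peer's MSS) and of rounding the window to a multiple of the segment size, assuming a plain IPv6 header and a 20-octet TCP header with no options:

   IPV6_HEADER_LEN = 40
   TCP_HEADER_LEN = 20    # without TCP options

   def tcp_segment_size(pmtu, peer_mss, reserved_header_len=0):
       """Largest segment to send: bounded by the path (PMTU minus the IPv6
       and TCP headers) and by the MSS advertised by the peer."""
       path_limit = pmtu - IPV6_HEADER_LEN - reserved_header_len - TCP_HEADER_LEN
       return min(path_limit, peer_mss)

   def max_window(send_buffer, segment_size):
       """Greatest multiple of the segment size not exceeding the send buffer."""
       return (send_buffer // segment_size) * segment_size

   seg = tcp_segment_size(pmtu=1500, peer_mss=1440)
   print(seg)                        # 1440
   print(max_window(65536, seg))     # 64800 (45 segments of 1440 octets)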
5.5. Issues for other transport protocols

Some transport protocols (such as ISO TP4 [ISOTP]) are not allowed to repacketize when doing a retransmission. That is, once an attempt is made to transmit a segment of a certain size, the transport cannot split the contents of the segment into smaller segments for retransmission. In such a case, the original segment can be fragmented by the IP layer during retransmission. Subsequent segments, when transmitted for the first time, should be no larger than allowed by the Path MTU.

The Sun Network File System (NFS) uses a Remote Procedure Call (RPC) protocol [RPC] that, when used over UDP, in many cases will generate payloads that must be fragmented even for the first-hop link. This might improve performance in certain cases, but it is known to cause reliability and performance problems, especially when the client and server are separated by routers.

It is recommended that NFS implementations use Path MTU Discovery whenever routers are involved. Most NFS implementations allow the RPC datagram size to be changed at mount-time (indirectly, by changing the effective file system block size), but might require some modification to support changes later on.

Also, since a single NFS operation cannot be split across several UDP datagrams, certain operations (primarily, those operating on file names and directories) require a minimum payload size that, if sent in a single packet, would exceed the PMTU. NFS implementations should not reduce the payload size below this threshold, even if Path MTU Discovery suggests a lower value. In this case the payload will be fragmented by the IP layer.

5.6. Management interface

It is suggested that an implementation provide a way for a system utility program to:

   - Specify that Path MTU Discovery not be done on a given path.

   - Change the PMTU value associated with a given path.

The former can be accomplished by associating a flag with the path; when a packet is sent on a path with this flag set, the IP layer does not send packets larger than the IPv6 minimum link MTU.

These features might be used to work around an anomalous situation, or by a routing protocol implementation that is able to obtain Path MTU values.

The implementation should also provide a way to change the timeout period for aging stale PMTU information.
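A minimal, non-normative sketch of such a management interface; the configuration layout and names are invented for illustration:

   IPV6_MINIMUM_LINK_MTU = 576   # per [IPv6-SPEC] (RFC 1883)

   # Hypothetical per-path management state: a utility program can disable
   # Path MTU Discovery on a path, or set the path's PMTU value directly.
   path_config = {
       "2001:db8::1": {"pmtu_discovery": False, "pmtu": None},
       "2001:db8::2": {"pmtu_discovery": True,  "pmtu": 1400},
   }

   def max_packet_size(dst, first_hop_mtu):
       cfg = path_config.get(dst, {"pmtu_discovery": True, "pmtu": None})
       if not cfg["pmtu_discovery"]:
           # Discovery disabled on this path: never exceed the minimum link MTU.
           return IPV6_MINIMUM_LINK_MTU
       return cfg["pmtu"] if cfg["pmtu"] is not None else first_hop_mtu

   print(max_packet_size("2001:db8::1", 1500))   # 576
   print(max_packet_size("2001:db8::2", 1500))   # 1400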
Deering, "Internet Control Message 631 Protocol (ICMPv6) for the Internet Protocol Version 6 632 (IPv6) Specification", RFC 1885, December 1995 634 [IPv6-SPEC] S. Deering and R. Hinden, "Internet Protocol, Version 6 635 (IPv6) Specification", RFC 1883, December 1995 637 [ISOTP] ISO. ISO Transport Protocol Specification: ISO DP 8073. 638 RFC 905, SRI Network Information Center, April, 1984. 640 [ND] T. Narten, E. Nordmark, and W. Simpson, "Neighbor 641 Discovery for IP Version 6 (IPv6)", work in progress 642 draft-ietf-ipngwg-discovery-04.txt, February 1996. 644 [RFC-1191] J. Mogul and S. Deering, "Path MTU Discovery", 645 November 1990 647 [RPC] Sun Microsystems, Inc. RPC: Remote Procedure Call 648 Protocol. RFC 1057, SRI Network Information Center, 649 June, 1988. 651 Authors' Addresses 653 Jack McCann 654 Digital Equipment Corporation 655 110 Spitbrook Road, ZKO3-3/U14 656 Nashua, NH 03062 657 Phone: +1 603 881 2608 658 Fax: +1 603 881 0120 659 Email: mccann@zk3.dec.com 661 Stephen E. Deering 662 Xerox Palo Alto Research Center 663 3333 Coyote Hill Road 664 Palo Alto, CA 94304 665 Phone: +1 415 812 4839 666 Fax: +1 415 812 4471 667 Email: deering@parc.xerox.com 669 Jeffrey Mogul 670 Digital Equipment Corporation Western Research Laboratory 671 250 University Avenue 672 Palo Alto, CA 94301 673 Phone: +1 415 617 3304 674 Email: mogul@pa.dec.com 676 Expiration 678 August 21, 1996