1 INTERNET-DRAFT J. McCann, Digital Equipment Corporation 2 May 23, 1996 S. Deering, Xerox PARC 3 J. Mogul, Digital Equipment Corporation 5 Path MTU Discovery for IP version 6 7 draft-ietf-ipngwg-pmtuv6-03.txt 9 Abstract 11 This document describes Path MTU Discovery for IP version 6. It is 12 largely derived from RFC-1191, which describes Path MTU Discovery for 13 IP version 4. 15 Status of this Memo 17 This document is an Internet-Draft. Internet-Drafts are working 18 documents of the Internet Engineering Task Force (IETF), its areas, 19 and its working groups. Note that other groups may also distribute 20 working documents as Internet-Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference 25 material or to cite them other than as ``work in progress.'' 27 To learn the current status of any Internet-Draft, please check the 28 ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow 29 Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), 30 munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or 31 ftp.isi.edu (US West Coast). 33 Distribution of this document is unlimited. 35 Expiration 37 November 23, 1996 39 Contents 41 Abstract........................................................1 43 Status of this Memo.............................................1 45 Contents........................................................2 47 1. Introduction.................................................3 49 2. Terminology..................................................3 51 3. Protocol overview............................................4 53 4. Protocol Requirements........................................5 55 5. Implementation Issues........................................6 57 5.1. Layering...................................................6 59 5.2. Storing PMTU information...................................7 61 5.3. Purging stale PMTU information.............................9 63 5.4.
TCP layer actions.........................................10 65 5.5. Issues for other transport protocols......................11 67 5.6. Management interface......................................12 69 6. Security considerations.....................................12 71 Acknowledgements...............................................13 73 Appendix A - Comparison to RFC 1191............................14 75 References.....................................................15 77 Authors' Addresses.............................................16 79 1. Introduction 81 When one IPv6 node has a large amount of data to send to another 82 node, the data is transmitted in a series of IPv6 packets. It is 83 usually preferable that these packets be of the largest size that can 84 successfully traverse the path from the source node to the 85 destination node. This packet size is referred to as the Path MTU 86 (PMTU), and it is equal to the minimum link MTU of all the links in a 87 path. IPv6 defines a standard mechanism for a node to discover the 88 PMTU of an arbitrary path. 90 IPv6 nodes SHOULD implement Path MTU Discovery in order to discover 91 and take advantage of paths with PMTU greater than the IPv6 minimum 92 link MTU [IPv6-SPEC]. A minimal IPv6 implementation (e.g., in a boot 93 ROM) may choose to omit implementation of Path MTU Discovery. 95 Nodes not implementing Path MTU Discovery use the IPv6 minimum link 96 MTU defined in [IPv6-SPEC] as the maximum packet size. In most 97 cases, this will result in the use of smaller packets than necessary, 98 because most paths have a PMTU greater than the IPv6 minimum link 99 MTU. A node sending packets much smaller than the Path MTU allows is 100 wasting network resources and probably getting suboptimal throughput. 102 2. Terminology 104 node - a device that implements IPv6. 106 router - a node that forwards IPv6 packets not explicitly 107 addressed to itself. 109 host - any node that is not a router. 111 upper layer - a protocol layer immediately above IPv6. Examples are 112 transport protocols such as TCP and UDP, control 113 protocols such as ICMP, routing protocols such as OSPF, 114 and internet or lower-layer protocols being "tunneled" 115 over (i.e., encapsulated in) IPv6 such as IPX, 116 AppleTalk, or IPv6 itself. 118 link - a communication facility or medium over which nodes can 119 communicate at the link layer, i.e., the layer 120 immediately below IPv6. Examples are Ethernets (simple 121 or bridged); PPP links; X.25, Frame Relay, or ATM 122 networks; and internet (or higher) layer "tunnels", 123 such as tunnels over IPv4 or IPv6 itself. 125 interface - a node's attachment to a link. 127 address - an IPv6-layer identifier for an interface or a set of 128 interfaces. 130 packet - an IPv6 header plus payload. 132 link MTU - the maximum transmission unit, i.e., maximum packet 133 size in octets, that can be conveyed in one piece over 134 a link. 136 path - the set of links traversed by a packet between a source 137 node and a destination node 139 path MTU - the minimum link MTU of all the links in a path between 140 a source node and a destination node. 142 PMTU - path MTU 144 Path MTU 145 Discovery - process by which a node learns the PMTU of a path 147 flow - a sequence of packets sent from a particular source 148 to a particular (unicast or multicast) destination for 149 which the source desires special handling by the 150 intervening routers. 152 flow id - a combination of a source address and a non-zero 153 flow label. 155 3. 
Protocol overview 157 This memo describes a technique to dynamically discover the PMTU of a 158 path. The basic idea is that a source node initially assumes that 159 the PMTU of a path is the (known) MTU of the first hop in the path. 160 If any of the packets sent on that path are too large to be forwarded 161 by some node along the path, that node will discard them and return 162 ICMPv6 Packet Too Big messages [ICMPv6]. Upon receipt of such a 163 message, the source node reduces its assumed PMTU for the path based 164 on the MTU of the constricting hop as reported in the Packet Too Big 165 message. 167 The Path MTU Discovery process ends when the node's estimate of the 168 PMTU is less than or equal to the actual PMTU. Note that several 169 iterations of the packet-sent/Packet-Too-Big-message-received cycle 170 may occur before the Path MTU Discovery process ends, as there may be 171 links with smaller MTUs further along the path. 173 Alternatively, the node may elect to end the discovery process by 174 ceasing to send packets larger than the IPv6 minimum link MTU. 176 The PMTU of a path may change over time, due to changes in the 177 routing topology. Reductions of the PMTU are detected by Packet Too 178 Big messages. To detect increases in a path's PMTU, a node 179 periodically increases its assumed PMTU. This will almost always 180 result in packets being discarded and Packet Too Big messages being 181 generated, because in most cases the PMTU of the path will not have 182 changed. Therefore, attempts to detect increases in a path's PMTU 183 should be done infrequently. 185 Path MTU Discovery supports multicast as well as unicast 186 destinations. In the case of a multicast destination, copies of a 187 packet may traverse many different paths to many different nodes. 188 Each path may have a different PMTU, and a single multicast packet 189 may result in multiple Packet Too Big messages, each reporting a 190 different next-hop MTU. The minimum PMTU value across the set of 191 paths in use determines the size of subsequent packets sent to the 192 multicast destination. 194 Note that Path MTU Discovery must be performed even in cases where a 195 node "thinks" a destination is attached to the same link as itself. 196 In a situation such as when a neighboring router acts as proxy [ND] 197 for some destination, the destination can appear to be directly 198 connected but is in fact more than one hop away. 200 4. Protocol Requirements 202 As discussed in section 1, IPv6 nodes are not required to implement 203 Path MTU Discovery. The requirements in this section apply only to 204 those implementations that include Path MTU Discovery. 206 When a node receives a Packet Too Big message, it MUST reduce its 207 estimate of the PMTU for the relevant path, based on the value of the 208 MTU field in the message. The precise behavior of a node in this 209 circumstance is not specified, since different applications may have 210 different requirements, and since different implementation 211 architectures may favor different strategies. 213 After receiving a Packet Too Big message, a node MUST attempt to 214 avoid eliciting more such messages in the near future. The node MUST 215 reduce the size of the packets it is sending along the path. Using a 216 PMTU estimate larger than the IPv6 minimum link MTU may continue to 217 elicit Packet Too Big messages.
Since each of these messages (and 218 the dropped packets they respond to) consume network resources, the 219 node MUST force the Path MTU Discovery process to end. 221 Nodes using Path MTU Discovery MUST detect decreases in PMTU as fast 222 as possible. Nodes MAY detect increases in PMTU, but because doing 223 so requires sending packets larger than the current estimated PMTU, 224 and because the likelihood is that the PMTU will not have increased, 225 this MUST be done at infrequent intervals. An attempt to detect an 226 increase (by sending a packet larger than the current estimate) MUST 227 NOT be done less than 5 minutes after a Packet Too Big message has 228 been received for the given path. The recommended setting for this 229 timer is twice its minimum value (10 minutes). 231 A node MUST NOT reduce its estimate of the Path MTU below the IPv6 232 minimum link MTU. 234 Note: A node may receive a Packet Too Big message reporting a 235 next-hop MTU that is less than the IPv6 minimum link MTU. In that 236 case, the node is not required to reduce the size of subsequent 237 packets sent on the path to less than the IPv6 minimum link MTU, 238 but rather must include a Fragment header in those packets [IPv6- 239 SPEC]. 241 A node MUST NOT increase its estimate of the Path MTU in response to 242 the contents of a Packet Too Big message. A message purporting to 243 announce an increase in the Path MTU might be a stale packet that has 244 been floating around in the network, a false packet injected as part 245 of a denial-of-service attack, or the result of having multiple paths 246 to the destination, each with a different PMTU. 248 5. Implementation Issues 250 This section discusses a number of issues related to the 251 implementation of Path MTU Discovery. This is not a specification, 252 but rather a set of notes provided as an aid for implementors. 254 The issues include: 256 - What layer or layers implement Path MTU Discovery? 258 - How is the PMTU information cached? 260 - How is stale PMTU information removed? 262 - What must transport and higher layers do? 264 5.1. Layering 266 In the IP architecture, the choice of what size packet to send is 267 made by a protocol at a layer above IP. This memo refers to such a 268 protocol as a "packetization protocol". Packetization protocols are 269 usually transport protocols (for example, TCP) but can also be 270 higher-layer protocols (for example, protocols built on top of UDP). 272 Implementing Path MTU Discovery in the packetization layers 273 simplifies some of the inter-layer issues, but has several drawbacks: 274 the implementation may have to be redone for each packetization 275 protocol, it becomes hard to share PMTU information between different 276 packetization layers, and the connection-oriented state maintained by 277 some packetization layers may not easily extend to save PMTU 278 information for long periods. 280 It is therefore suggested that the IP layer store PMTU information 281 and that the ICMP layer process received Packet Too Big messages. 282 The packetization layers may respond to changes in the PMTU by 283 changing the size of the messages they send. To support this 284 layering, packetization layers require a way to learn of changes in 285 the value of MMS_S, the "maximum send transport-message size". The 286 MMS_S is derived from the Path MTU by subtracting the size of the 287 IPv6 header plus space reserved by the IP layer for additional 288 headers (if any).
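
   As an informal illustration of this layering (not part of the
   specification), the C fragment below sketches how an IP layer might
   derive MMS_S from a cached PMTU value.  The constant and function
   names are hypothetical; the sketch assumes the fixed 40-octet IPv6
   header and takes any space the IP layer reserves for additional
   headers as a parameter.

      #include <stdint.h>
      #include <stdio.h>

      #define IPV6_HDR_LEN  40     /* fixed IPv6 header size [IPv6-SPEC] */
      #define IPV6_MIN_MTU  1280   /* assumed IPv6 minimum link MTU */

      /*
       * Hypothetical helper: derive MMS_S, the maximum send
       * transport-message size, from the current PMTU estimate for a
       * path.  "reserved_hdr_len" is whatever space the IP layer sets
       * aside for additional headers (Routing header, Fragment header,
       * etc.); zero if none are in use.
       */
      static uint32_t mms_s_from_pmtu(uint32_t pmtu,
                                      uint32_t reserved_hdr_len)
      {
          if (pmtu < IPV6_MIN_MTU)
              pmtu = IPV6_MIN_MTU;  /* estimate is never below the minimum link MTU */
          return pmtu - IPV6_HDR_LEN - reserved_hdr_len;
      }

      int main(void)
      {
          /* e.g., an Ethernet-sized first hop, no extension headers reserved */
          printf("MMS_S = %u\n", (unsigned) mms_s_from_pmtu(1500, 0));  /* 1460 */
          return 0;
      }

   A packetization layer would be handed this value, or notified when
   it changes, and would size its messages accordingly.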
290 It is possible that a packetization layer, perhaps a UDP application 291 outside the kernel, is unable to change the size of messages it 292 sends. This may result in a packet size that exceeds the Path MTU. 293 To accommodate such situations, IPv6 defines a mechanism that allows 294 large payloads to be divided into fragments, with each fragment sent 295 in a separate packet (see [IPv6-SPEC] section "Fragment Header"). 296 However, packetization layers are encouraged to avoid sending 297 messages that will require fragmentation (for the case against 298 fragmentation, see [FRAG]). 300 5.2. Storing PMTU information 302 Ideally, a PMTU value should be associated with a specific path 303 traversed by packets exchanged between the source and destination 304 nodes. However, in most cases a node will not have enough 305 information to completely and accurately identify such a path. 306 Rather, a node must associate a PMTU value with some local 307 representation of a path. It is left to the implementation to select 308 the local representation of a path. 310 In the case of a multicast destination address, copies of a packet 311 may traverse many different paths to reach many different nodes. The 312 local representation of the "path" to a multicast destination must in 313 fact represent a potentially large set of paths. 315 Minimally, an implementation could maintain a single PMTU value to be 316 used for all packets originated from the node. This PMTU value would 317 be the minimum PMTU learned across the set of all paths in use by the 318 node. This approach is likely to result in the use of smaller 319 packets than is necessary for many paths. 321 An implementation could use the destination address as the local 322 representation of a path. The PMTU value associated with a 323 destination would be the minimum PMTU learned across the set of all 324 paths in use to that destination. The set of paths in use to a 325 particular destination is expected to be small, in many cases 326 consisting of a single path. This approach will result in the use of 327 optimally sized packets on a per-destination basis. This approach 328 integrates nicely with the conceptual model of a host as described in 329 [ND]: a PMTU value could be stored with the corresponding entry in 330 the destination cache. 332 If flows [IPv6-SPEC] are in use, an implementation could use the flow 333 id as the local representation of a path. Packets sent to a 334 particular destination but belonging to different flows may use 335 different paths, with the choice of path depending on the flow id. 336 This approach will result in the use of optimally sized packets on a 337 per-flow basis, providing finer granularity than PMTU values 338 maintained on a per-destination basis. 340 For source routed packets (i.e. packets containing an IPv6 Routing 341 header [IPv6-SPEC]), the source route may further qualify the local 342 representation of a path. In particular, a packet containing a type 343 0 Routing header in which all bits in the Strict/Loose Bit Map are 344 equal to 1 contains a complete path specification. An implementation 345 could use source route information in the local representation of a 346 path. 348 Note: Some paths may be further distinguished by different 349 security classifications. The details of such classifications are 350 beyond the scope of this memo. 352 Initially, the PMTU value for a path is assumed to be the (known) MTU 353 of the first-hop link. 
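
   As a non-normative sketch of the per-destination approach
   (hypothetical names and structures, written in C), the fragment
   below stores a PMTU estimate in a destination cache entry as
   suggested by the host model of [ND], initializes it to the
   first-hop link MTU, and applies the rules of section 4 when a
   Packet Too Big message arrives: the estimate is never raised in
   response to such a message and never reduced below the IPv6
   minimum link MTU.

      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>
      #include <time.h>

      #define IPV6_MIN_MTU  1280   /* assumed IPv6 minimum link MTU */

      /* stand-in for struct in6_addr, to keep the sketch self-contained */
      struct in6_addr_ex { uint8_t s6_addr[16]; };

      /* one destination cache entry; the destination address is the
         local representation of the path */
      struct dest_cache_entry {
          struct in6_addr_ex dst;
          uint32_t pmtu;             /* current PMTU estimate for this destination */
          time_t   pmtu_timestamp;   /* 0 = "reserved": estimate equals first-hop MTU */
      };

      /* initially, the PMTU for a path is the (known) first-hop link MTU */
      static void dest_entry_init(struct dest_cache_entry *e,
                                  const struct in6_addr_ex *dst,
                                  uint32_t first_hop_mtu)
      {
          memcpy(&e->dst, dst, sizeof(*dst));
          e->pmtu = first_hop_mtu;
          e->pmtu_timestamp = 0;
      }

      /*
       * Apply the MTU field of a received Packet Too Big message to this
       * path.  Returns 1 if the estimate decreased (packetization layers
       * should then be notified), 0 otherwise.  The estimate is never
       * raised by this function and never reduced below IPV6_MIN_MTU;
       * smaller reported values mean "include a Fragment header" instead.
       */
      static int dest_entry_packet_too_big(struct dest_cache_entry *e,
                                           uint32_t reported_mtu)
      {
          uint32_t tentative = reported_mtu;

          if (tentative < IPV6_MIN_MTU)
              tentative = IPV6_MIN_MTU;
          if (tentative >= e->pmtu)
              return 0;                    /* stale or bogus report; ignore */

          e->pmtu = tentative;
          e->pmtu_timestamp = time(NULL);  /* remember the decrease, for later aging */
          return 1;
      }

      int main(void)
      {
          struct in6_addr_ex dst = { { 0x20, 0x01, 0x0d, 0xb8 } };  /* example */
          struct dest_cache_entry e;

          dest_entry_init(&e, &dst, 1500);          /* Ethernet-sized first hop */
          if (dest_entry_packet_too_big(&e, 1400))  /* a router reports MTU 1400 */
              printf("PMTU reduced to %u\n", (unsigned) e.pmtu);
          return 0;
      }

   An implementation keying its cache by flow id or by source route,
   as discussed above, could attach the same two fields to those
   entries instead.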
355 When a Packet Too Big message is received, the node determines which 356 path the message applies to based on the contents of the Packet Too 357 Big message. For example, if the destination address is used as the 358 local representation of a path, the destination address from the 359 original packet would be used to determine which path the message 360 applies to. 362 Note: if the original packet contained a Routing header, the 363 Routing header should be used to determine the location of the 364 destination address within the original packet. If Segments Left 365 is equal to zero, the destination address is in the Destination 366 Address field in the IPv6 header. If Segments Left is greater 367 than zero, the destination address is the last address 368 (Address[n]) in the Routing header. 370 The node then uses the value in the MTU field in the Packet Too Big 371 message as a tentative PMTU value, and compares the tentative PMTU to 372 the existing PMTU. If the tentative PMTU is less than the existing 373 PMTU estimate, the tentative PMTU replaces the existing PMTU as the 374 PMTU value for the path. 376 The packetization layers must be notified about decreases in the 377 PMTU. Any packetization layer instance (for example, a TCP 378 connection) that is actively using the path must be notified if the 379 PMTU estimate is decreased. 381 Note: even if the Packet Too Big message contains an Original 382 Packet Header that refers to a UDP packet, the TCP layer must be 383 notified if any of its connections use the given path. 385 Also, the instance that sent the packet that elicited the Packet Too 386 Big message should be notified that its packet has been dropped, even 387 if the PMTU estimate has not changed, so that it may retransmit the 388 dropped data. 390 Note: An implementation can avoid the use of an asynchronous 391 notification mechanism for PMTU decreases by postponing 392 notification until the next attempt to send a packet larger than 393 the PMTU estimate. In this approach, when an attempt is made to 394 SEND a packet that is larger than the PMTU estimate, the SEND 395 function should fail and return a suitable error indication. This 396 approach may be more suitable to a connectionless packetization 397 layer (such as one using UDP), which (in some implementations) may 398 be hard to "notify" from the ICMP layer. In this case, the normal 399 timeout-based retransmission mechanisms would be used to recover 400 from the dropped packets. 402 It is important to understand that the notification of the 403 packetization layer instances using the path about the change in the 404 PMTU is distinct from the notification of a specific instance that a 405 packet has been dropped. The latter should be done as soon as 406 practical (i.e., asynchronously from the point of view of the 407 packetization layer instance), while the former may be delayed until 408 a packetization layer instance wants to create a packet. 409 Retransmission should be done only for those packets that are 410 known to be dropped, as indicated by a Packet Too Big message. 412 5.3. Purging stale PMTU information 414 Internetwork topology is dynamic; routes change over time. While the 415 local representation of a path may remain constant, the actual 416 path(s) in use may change. Thus, PMTU information cached by a node 417 can become stale. 419 If the stale PMTU value is too large, this will be discovered almost 420 immediately once a large enough packet is sent on the path.
No such 421 mechanism exists for realizing that a stale PMTU value is too small, 422 so an implementation should "age" cached values. When a PMTU value 423 has not been decreased for a while (on the order of 10 minutes), the 424 PMTU estimate should be set to the MTU of the first-hop link, and the 425 packetization layers should be notified of the change. This will 426 cause the complete Path MTU Discovery process to take place again. 428 Note: an implementation should provide a means for changing the 429 timeout duration, including setting it to "infinity". For 430 example, nodes attached to an FDDI link which is then attached to 431 the rest of the Internet via a small MTU serial line are never 432 going to discover a new non-local PMTU, so they should not have to 433 put up with dropped packets every 10 minutes. 435 An upper layer must not retransmit data in response to an increase in 436 the PMTU estimate, since this increase never comes in response to an 437 indication of a dropped packet. 439 One approach to implementing PMTU aging is to associate a timestamp 440 field with a PMTU value. This field is initialized to a "reserved" 441 value, indicating that the PMTU is equal to the MTU of the first hop 442 link. Whenever the PMTU is decreased in response to a Packet Too Big 443 message, the timestamp is set to the current time. 445 Once a minute, a timer-driven procedure runs through all cached PMTU 446 values, and for each PMTU whose timestamp is not "reserved" and is 447 older than the timeout interval: 449 - The PMTU estimate is set to the MTU of the first hop link. 451 - The timestamp is set to the "reserved" value. 453 - Packetization layers using this path are notified of the increase. 455 5.4. TCP layer actions 457 The TCP layer must track the PMTU for the path(s) in use by a 458 connection; it should not send segments that would result in packets 459 larger than the PMTU. A simple implementation could ask the IP layer 460 for this value each time it created a new segment, but this could be 461 inefficient. Moreover, TCP implementations that follow the "slow- 462 start" congestion-avoidance algorithm [CONG] typically calculate and 463 cache several other values derived from the PMTU. It may be simpler 464 to receive asynchronous notification when the PMTU changes, so that 465 these variables may be updated. 467 A TCP implementation must also store the MSS value received from its 468 peer, and must not send any segment larger than this MSS, regardless 469 of the PMTU. In 4.xBSD-derived implementations, this may require 470 adding an additional field to the TCP state record. 472 The value sent in the TCP MSS option is independent of the PMTU. 473 This MSS option value is used by the other end of the connection, 474 which may be using an unrelated PMTU value. See [IPv6-SPEC] sections 475 "Packet Size Issues" and "Maximum Upper-Layer Payload Size" for 476 information on selecting a value for the TCP MSS option. 478 When a Packet Too Big message is received, it implies that a packet 479 was dropped by the node that sent the ICMP message. It is sufficient 480 to treat this as any other dropped segment, and wait until the 481 retransmission timer expires to cause retransmission of the segment. 482 If the Path MTU Discovery process requires several steps to find the 483 PMTU of the full path, this could delay the connection by many 484 round-trip times. 
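
   As a rough, non-normative illustration of the sizing rules above
   (hypothetical names, in C), the fragment below computes the largest
   segment a TCP sender should build for a connection: the smaller of
   the MSS advertised by the peer and the size permitted by the
   current PMTU estimate.  It assumes a packet with no extension
   headers and a TCP header without options.

      #include <stdint.h>
      #include <stdio.h>

      #define IPV6_HDR_LEN  40   /* fixed IPv6 header */
      #define TCP_HDR_LEN   20   /* TCP header without options (assumed) */

      /*
       * Hypothetical helper: the largest TCP segment (payload) to build
       * for a connection, limited both by the peer's advertised MSS and
       * by the current PMTU estimate for the path, so that no packet
       * larger than the PMTU is generated.
       */
      static uint32_t tcp_effective_mss(uint32_t peer_mss, uint32_t pmtu)
      {
          uint32_t from_pmtu = pmtu - IPV6_HDR_LEN - TCP_HDR_LEN;
          return peer_mss < from_pmtu ? peer_mss : from_pmtu;
      }

      int main(void)
      {
          /* the peer advertised MSS 8000, but the path only carries
             1500-octet packets */
          printf("effective MSS = %u\n",
                 (unsigned) tcp_effective_mss(8000, 1500));   /* 1440 */
          return 0;
      }

   When a Packet Too Big message lowers the PMTU estimate, a
   connection would recompute this limit before retransmitting the
   dropped segment.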
486 Alternatively, the retransmission could be done in immediate response 487 to a notification that the Path MTU has changed, but only for the 488 specific connection specified by the Packet Too Big message. The 489 packet size used in the retransmission should be no larger than the 490 new PMTU. 492 Note: A packetization layer must not retransmit in response to 493 every Packet Too Big message, since a burst of several oversized 494 segments will give rise to several such messages and hence several 495 retransmissions of the same data. If the new estimated PMTU is 496 still wrong, the process repeats, and there is an exponential 497 growth in the number of superfluous segments sent. 499 This means that the TCP layer must be able to recognize when a 500 Packet Too Big notification actually decreases the PMTU that it 501 has already used to send a packet on the given connection, and 502 should ignore any other notifications. 504 Many TCP implementations incorporate "congestion avoidance" and 505 "slow-start" algorithms to improve performance [CONG]. Unlike a 506 retransmission caused by a TCP retransmission timeout, a 507 retransmission caused by a Packet Too Big message should not change 508 the congestion window. It should, however, trigger the slow-start 509 mechanism (i.e., only one segment should be retransmitted until 510 acknowledgements begin to arrive again). 512 TCP performance can be reduced if the sender's maximum window size is 513 not an exact multiple of the segment size in use (this is not the 514 congestion window size, which is always a multiple of the segment 515 size). In many systems (such as those derived from 4.2BSD), the 516 segment size is often set to 1024 octets, and the maximum window size 517 (the "send space") is usually a multiple of 1024 octets, so the 518 proper relationship holds by default. If Path MTU Discovery is used, 519 however, the segment size may not be a submultiple of the send space, 520 and it may change during a connection; this means that the TCP layer 521 may need to change the transmission window size when Path MTU 522 Discovery changes the PMTU value. The maximum window size should be 523 set to the greatest multiple of the segment size that is less than or 524 equal to the sender's buffer space size. 526 5.5. Issues for other transport protocols 528 Some transport protocols (such as ISO TP4 [ISOTP]) are not allowed to 529 repacketize when doing a retransmission. That is, once an attempt is 530 made to transmit a segment of a certain size, the transport cannot 531 split the contents of the segment into smaller segments for 532 retransmission. In such a case, the original segment can be 533 fragmented by the IP layer during retransmission. Subsequent 534 segments, when transmitted for the first time, should be no larger 535 than allowed by the Path MTU. 537 The Sun Network File System (NFS) uses a Remote Procedure Call (RPC) 538 protocol [RPC] that, when used over UDP, in many cases will generate 539 payloads that must be fragmented even for the first-hop link. This 540 might improve performance in certain cases, but it is known to cause 541 reliability and performance problems, especially when the client and 542 server are separated by routers. 544 It is recommended that NFS implementations use Path MTU Discovery 545 whenever routers are involved. 
Most NFS implementations allow the 546 RPC datagram size to be changed at mount-time (indirectly, by 547 changing the effective file system block size), but might require 548 some modification to support changes later on. 550 Also, since a single NFS operation cannot be split across several UDP 551 datagrams, certain operations (primarily, those operating on file 552 names and directories) require a minimum payload size that if sent in 553 a single packet would exceed the PMTU. NFS implementations should 554 not reduce the payload size below this threshold, even if Path MTU 555 Discovery suggests a lower value. In this case the payload will be 556 fragmented by the IP layer. 558 5.6. Management interface 560 It is suggested that an implementation provide a way for a system 561 utility program to: 563 - Specify that Path MTU Discovery not be done on a given path. 565 - Change the PMTU value associated with a given path. 567 The former can be accomplished by associating a flag with the path; 568 when a packet is sent on a path with this flag set, the IP layer does 569 not send packets larger than the IPv6 minimum link MTU. 571 These features might be used to work around an anomalous situation, 572 or by a routing protocol implementation that is able to obtain Path 573 MTU values. 575 The implementation should also provide a way to change the timeout 576 period for aging stale PMTU information. 578 6. Security considerations 580 This Path MTU Discovery mechanism makes possible two denial-of- 581 service attacks, both based on a malicious party sending false Packet 582 Too Big messages to a node. 584 In the first attack, the false message indicates a PMTU much smaller 585 than reality. This should not entirely stop data flow, since the 586 victim node should never set its PMTU estimate below the IPv6 minimum 587 link MTU. It will, however, result in suboptimal performance. 589 In the second attack, the false message indicates a PMTU larger than 590 reality. If believed, this could cause temporary blockage as the 591 victim sends packets that will be dropped by some router. Within one 592 round-trip time, the node would discover its mistake (receiving 593 Packet Too Big messages from that router), but frequent repetition of 594 this attack could cause lots of packets to be dropped. A node, 595 however, should never raise its estimate of the PMTU based on a 596 Packet Too Big message, so should not be vulnerable to this attack. 598 A malicious party could also cause problems if it could stop a victim 599 from receiving legitimate Packet Too Big messages, but in this case 600 there are simpler denial-of-service attacks available. 602 Acknowledgements 604 We would like to acknowledge the authors of and contributors to 605 [RFC-1191], from which the majority of this document was derived. We 606 would also like to acknowledge the members of the IPng working group 607 for their careful review and constructive criticisms. 609 Appendix A - Comparison to RFC 1191 611 This document is based in large part on RFC 1191, which describes 612 Path MTU Discovery for IPv4. 
Certain portions of RFC 1191 were not 613 needed in this document: 615 router specification - Packet Too Big messages and corresponding 616 router behavior are defined in [ICMPv6] 618 Don't Fragment bit - there is no DF bit in IPv6 packets 620 TCP MSS discussion - selecting a value to send in the TCP MSS 621 option is discussed in [IPv6-SPEC] 623 old-style messages - all Packet Too Big messages report the 624 MTU of the constricting link 626 MTU plateau tables - not needed because there are no old-style 627 messages 629 References 631 [CONG] Van Jacobson. Congestion Avoidance and Control. Proc. 632 SIGCOMM '88 Symposium on Communications Architectures and 633 Protocols, pages 314-329. Stanford, CA, August, 1988. 635 [FRAG] C. Kent and J. Mogul. Fragmentation Considered Harmful. 636 In Proc. SIGCOMM '87 Workshop on Frontiers in Computer 637 Communications Technology. August, 1987. 639 [ICMPv6] A. Conta and S. Deering, "Internet Control Message 640 Protocol (ICMPv6) for the Internet Protocol Version 6 641 (IPv6) Specification", RFC 1885, December 1995 643 [IPv6-SPEC] S. Deering and R. Hinden, "Internet Protocol, Version 6 644 (IPv6) Specification", RFC 1883, December 1995 646 [ISOTP] ISO. ISO Transport Protocol Specification: ISO DP 8073. 647 RFC 905, SRI Network Information Center, April, 1984. 649 [ND] T. Narten, E. Nordmark, and W. Simpson, "Neighbor 650 Discovery for IP Version 6 (IPv6)", work in progress 651 draft-ietf-ipngwg-discovery-04.txt, February 1996. 653 [RFC-1191] J. Mogul and S. Deering, "Path MTU Discovery", 654 November 1990 656 [RPC] Sun Microsystems, Inc. RPC: Remote Procedure Call 657 Protocol. RFC 1057, SRI Network Information Center, 658 June, 1988. 660 Authors' Addresses 662 Jack McCann 663 Digital Equipment Corporation 664 110 Spitbrook Road, ZKO3-3/U14 665 Nashua, NH 03062 666 Phone: +1 603 881 2608 667 Fax: +1 603 881 0120 668 Email: mccann@zk3.dec.com 670 Stephen E. Deering 671 Xerox Palo Alto Research Center 672 3333 Coyote Hill Road 673 Palo Alto, CA 94304 674 Phone: +1 415 812 4839 675 Fax: +1 415 812 4471 676 Email: deering@parc.xerox.com 678 Jeffrey Mogul 679 Digital Equipment Corporation Western Research Laboratory 680 250 University Avenue 681 Palo Alto, CA 94301 682 Phone: +1 415 617 3304 683 Email: mogul@pa.dec.com 685 Expiration 687 November 23, 1996