idnits 2.17.1 draft-ietf-tuba-mtu-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-26) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity. ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. 
** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard == It seems as if not all pages are separated by form feeds - found 0 form feeds but 17 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 81 instances of too long lines in the document, the longest one being 8 characters in excess of 72. ** There are 2 instances of lines with control characters in the document. ** The abstract seems to contain references ([6], [1]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 123: '... the path, hosts MUST NOT suppress err...' RFC 2119 keyword, line 124: '...ing (i.e., hosts MUST set the Error Re...' RFC 2119 keyword, line 164: '...t implement PMTU MUST implement both t...' RFC 2119 keyword, line 179: '...STRONGLY RECOMMENDED that under the sa...' RFC 2119 keyword, line 199: '...es a Datagram Too Big message, it MUST...' (23 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). 
Found 'SHOULD not' in this paragraph: Hosts SHOULD not lower the value they send in the MSS option; doing so prevents the PMTU Discovery mechanism from discovering PMTUs larger than the default TCP MSS. For TUBA/CLNP hosts, the TCP MSS option should be 74 octets less than the size of the largest datagram the host is able to reassemble (MMS_R, as defined in RFC 1122 [8]). In many cases, this will be the architectural limit of 65461 (65535 -74) octets. A host MAY send an MSS value derived from the MTU of its connected network (the maximum MTU over its connected networks, for a multi-homed host); this should not cause problems for PMTU Discovery, and may dissuade a broken peer from sending enormous datagrams. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: An upper layer MUST not retransmit datagrams in response to an increase in the PMTU estimate, since this increase never comes in response to an indication of a dropped datagram. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: Note: Retransmissions MUST not be sent in response to every Datagram Too Big message. A burst of oversized segments will give rise to several such messages and hence several retransmissions of the same data; if the new estimated PMTU is still wrong, the process repeats, and there is an exponential growth in the number of superfluous segments sent. 
This means that he TCP layer must be able to recognize when a Datagram Too Big notification actually decreases the PMTU that it has already used to send a datagram on the given connection, and should ignore any other notifications. -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (26 May 1994) is 10928 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Possible downref: Non-RFC (?) normative reference: ref. '2' -- Possible downref: Non-RFC (?) normative reference: ref. '4' ** Obsolete normative reference: RFC 879 (ref. '5') (Obsoleted by RFC 7805, RFC 9293) ** Downref: Normative reference to an Experimental RFC: RFC 1561 (ref. '6') -- Possible downref: Non-RFC (?) normative reference: ref. '9' -- Possible downref: Non-RFC (?) normative reference: ref. '10' -- Possible downref: Non-RFC (?) normative reference: ref. '11' ** Obsolete normative reference: RFC 1323 (ref. '12') (Obsoleted by RFC 7323) -- Possible downref: Non-RFC (?) normative reference: ref. '13' ** Downref: Normative reference to an Informational RFC: RFC 1057 (ref. '14') Summary: 17 errors (**), 0 flaws (~~), 5 warnings (==), 8 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 TUBA Working Group D. 
Piscitello 2 Internet Draft Core Competence, Inc. 3 Expires 26 November 1994 26 May 1994 5 File name draft-ietf-tuba-mtu-01.txt 7 CLNP Path MTU Discovery 9 Status of this Memo 11 This document is an Internet Draft. Internet Drafts are 12 working documents of the Internet Engineering Task Force 13 (IETF), its Areas, and its Working Groups. Note that other 14 groups may also distribute working documents as Internet 15 Drafts. 17 Internet Drafts are draft documents valid for a maximum of 18 six months. Internet Drafts may be updated, replaced, or 19 obsoleted by other documents at any time. It is inappropriate 20 to use Internet Drafts as reference material or to cite them 21 other than as a "working draft" or "work in progress." 23 Please check the 1id-abstracts.txt listing contained in the 24 internet-drafts Shadow Directories on nic.ddn.mil, 25 nnsc.nsf.net, nic.nordu.net, ftp.nisc.sri.com, or 26 munnari.oz.au to learn the current status of any Internet 27 Draft. 29 Distribution of this memo is unlimited. Comments should be 30 submitted to the tuba@lanl.gov mailing list. 32 Abstract 34 This memo describes a technique for dynamically discovering 35 the maximum transmission unit (MTU) of an arbitrary CLNP 36 path. The mechanism described here is applicable to both 37 "pure-stack" OSI as well as TUBA/CLNP [6] environments, i.e., 38 environments where Internet transport protocols (UDP and TCP) 39 are operated over CLNP. This technique might not in all cases 40 discover the optimum Path MTU, but it will always choose a 41 Path MTU as accurate as, and in many cases more accurate 42 than, the Path MTU that would be chosen by current practice. 44 Acknowledgements 46 The mechanism proposed here was first suggested by Geof 47 Cooper,and incorporated into RFC 1191 [1], Path MTU 48 Discovery, by Jeff Mogul and Steve Deering. The excellent 49 work of these folks readily extends to CLNP-based internets. 
50 Thanks also to Steve Deering and Mike Shand, for their 51 comments on early drafts. 53 TUBA Working Group CLNP Path MTU Discovery 55 1. Introduction 57 ISO/IEC 8473, Protocol for Providing the Connectionless 58 Network Service, [2] is a network layer datagram protocol. As 59 is the case for hosts in IP-based internets, a CLNP-based 60 host that has a large amount of data to send to another CLNP-based 61 host transmits that data as a series of CLNP datagrams. 62 The desire to reduce or eliminate fragmentation is the same 63 in CLNP-based internetworking environments as for IP [3]. 64 (Refer to [4] for arguments against fragmentation.) It is 65 thus desirable to define a mechanism that determines the 66 largest size datagram that does not require fragmentation 67 anywhere along the path from the source to the destination; 68 this is referred to as the Path MTU (PMTU), and it is equal 69 to the minimum of the MTUs of each hop in the path. 71 A shortcoming of the OSI protocol suite is the lack of a 72 standard mechanism for a host to discover the PMTU of an 73 arbitrary path. This document addresses this shortcoming by 74 applying a mechanism demonstrated to be effective on IP-based 75 internets. 77 ISO/IEC 8473 indicates that the minimum subnetwork service data 78 unit size an underlying service must offer to CLNP is 512 79 octets. This is as close as OSI comes to specifying a host 80 requirement on what is referred to in Internet literature as 81 a maximum segment size (MSS, [5]). The current practice in 82 CLNP-based internets is to use the smaller of 512 and the 83 first-hop MTU as the PMTU for any destination that is not 84 connected to the same subnetwork as the source. This often 85 results in the use of smaller CLNP datagrams than necessary, 86 because it is increasingly the case that paths supporting 87 CLNP offer a PMTU greater than 512.
As is the case with IP, a 88 host that sends CLNP datagrams smaller than the Path MTU 89 allows wastes Internet resources and applications operating 90 on that host are provided suboptimal throughput. 92 Future routing protocols may be required to provide accurate 93 PMTU information within a routing domain, although perhaps 94 not across multi-level routing hierarchies. Like IP networks, 95 CLNP-based networks need a simple mechanism that discovers 96 PMTUs without wasting resources within a routing domain, and 97 in interdomain communications exchanges as well. The 98 mechanism described here should serve the community until 99 (and perhaps beyond) such time as routing protocol extensions 100 are developed and deployed. 102 The initial mechanism described does not rely on changes to 103 CLNP. Improvements in the mechanism can be achieved through 104 the addition of a new option to the CLNP Error Report. 106 TUBA Working Group CLNP Path MTU Discovery 108 2. Protocol overview 110 The RFC 1191 technique of using the Don't Fragment (DF) bit 111 in the IP header to dynamically discover the PMTU of an IP 112 path is easily extended to CLNP by using the Segmentation 113 Permitted (SP) flag in the CLNP header. A source CLNP host 114 initially assumes that the MTU of a path is the (known) MTU 115 of its first hop, and sends all datagrams on that path with 116 segmentation disabled (i.e., the SP = FALSE). If any of the datagrams are too 117 large to be forwarded without fragmentation by some router along the path, that 118 router will discard them and return a CLNP Error Report message with the Reason 119 for Discard parameter set to the value indicating "segmentation needed but not 120 permitted". Upon receipt of such a message (consistent with RFC 1191, this is 121 referred to as a "Datagram Too Big" message), the source host reduces its 122 assumed PMTU for the path. 
Since the mechanism relies on the generation of an 123 Error Report message by a router along the path, hosts MUST NOT suppress error 124 reporting (i.e., hosts MUST set the Error Report flag to TRUE in CLNP headers 125 when attempting Path MTU discovery). 127 The PMTU discovery process ends when a host's estimate of the 128 PMTU is low enough that its datagrams can be delivered 129 without fragmentation. Alternatively, the host could end the 130 discovery process by enabling segmentation (SP = TRUE) in the 131 datagram headers; it could do so, for example, because it is 132 willing to have datagrams fragmented in some circumstances. 133 Normally, the host continues to set SP = FALSE in all datagrams, so that if the 134 route changes and the new PMTU 135 is lower, the lower PMTU will be discovered. 137 2.1 Datagram Too Big message considerations 139 The Datagram Too Big message as originally specified in ICMP 140 [7] did not report the MTU of the hop for which the rejected 141 datagram was too big; the CLNP Error Report fails in this 142 regard as well, so again, the source host cannot tell exactly 143 how much to reduce its assumed PMTU given the information 144 returned in the Error Report. To remedy this, a new option 145 is defined for CLNP Error Reports in Appendix A. The Next-Hop-MTU option should 146 convey the same semantics as the corresponding parameter in the ICMP header as 147 specified in RFC 1191; i.e., this field is used to report the MTU of what RFC 148 1191 refers to as the "constricting (next) hop". 150 Although this is the only change needed for routers to fully 151 support CLNP PMTU Discovery, it will not be possible to take 152 advantage of this explicit feedback mechanism until all 153 routers are upgraded, because the processing of CLNP options 157 requires that Error Reports containing unrecognized options 158 be (silently) discarded.
Until such time as routers are updated, hosts may 159 search for an accurate PMTU estimate by continuing to send datagrams with SP 160 = FALSE while varying datagram sizes. By using the search strategy described in 161 Section 7, hosts can discover an optimum (or at least better) PMTU with good 162 performance. 164 This memo recommends that all hosts that implement PMTU Discovery MUST implement both the 165 search method described in Section 7 and the option method described here and in 166 Appendix A, with preference given to the option, if present in the Error Report. 168 2.2 Path MTU changes 170 The MTU of a path may change over time, due to changes in 171 the routing topology. Reductions of the PMTU are indicated by 172 Datagram Too Big messages. 174 Hosts that choose to implement MTU discovery and cease the 175 process by enabling segmentation (SP = TRUE) change the composition of the CLNP 176 header, by forcing the addition of a segmentation part. RFC 1191 suggests that 177 IP hosts that implement MTU discovery will normally continue to set the DF bit 178 in all datagrams to detect PMTU changes resulting from routing changes; it is 179 STRONGLY RECOMMENDED that under the same circumstances, CLNP hosts follow suit, 180 and continue to transmit datagrams in the discovery mode. 182 A host may periodically increase its assumed PMTU to detect 183 increases in a PMTU. As is the case with IPv4, this will 184 almost always result in CLNP datagrams being discarded and 185 Datagram Too Big messages being generated, because in most 186 cases the PMTU of the path will not have changed, so the 187 increase "probe" should be done infrequently.
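The preference stated in Section 2.1 (use the Next-Hop-MTU option when the Error Report carries it, otherwise fall back to a search) might be sketched as follows. This is a minimal illustration only, not part of the specification: the names `reduce_pmtu_estimate` and `halve`, the parameters, and the halving fallback are assumptions made for the example. The search strategy the memo actually intends is the one it describes in Section 7.

```python
from typing import Callable, Optional


def reduce_pmtu_estimate(
    current_pmtu: int,
    next_hop_mtu: Optional[int],
    search: Callable[[int], int],
) -> int:
    """Return a new PMTU estimate after a Datagram Too Big message.

    next_hop_mtu is the value carried in the Next-Hop-MTU option, or None
    when the Error Report did not include it (hypothetical representation).
    search is the fallback used when the option is absent.
    """
    if next_hop_mtu is not None:
        # Explicit feedback from the router is preferred when present.
        candidate = next_hop_mtu
    else:
        candidate = search(current_pmtu)
    # A Datagram Too Big message only ever signals a reduction,
    # so the estimate is never raised here.
    return min(current_pmtu, candidate)


def halve(pmtu: int) -> int:
    """Placeholder fallback for illustration; not the memo's search strategy."""
    return pmtu // 2
```

For instance, `reduce_pmtu_estimate(4352, 1500, halve)` uses the reported value, while passing `None` for the option falls back to the supplied search function.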
189 Note: this mechanism essentially guarantees that a CLNP host 190 will not receive fragments from a peer doing PMTU Discovery, 191 so a host that continues to operate in MTU discovery mode 192 will interoperate with "segmentation-challenged" hosts; i.e., 193 hosts that are unable to reassemble fragmented datagrams as a 194 result of having implemented the non-segmenting subset rather 195 than the full version of CLNP. 197 3. Host specification 199 When a host receives a Datagram Too Big message, it MUST 200 reduce its estimate of the PMTU for the relevant path. The 201 precise behavior of a host in this circumstance is not 202 specified here, since different applications may have 206 different requirements, and different implementation 207 architectures may favor different strategies. 209 After receiving a Datagram Too Big message, a host MUST avoid 210 eliciting more such messages in the near future. The host has 211 two choices: (1) reduce the size of the datagrams it sends 212 along the path, or (2) set the Segmentation Permitted flag (SP = TRUE) in the CLNP 213 header and use segmentation. A host MUST force the PMTU 214 Discovery process to converge. 216 Hosts performing PMTU Discovery MUST detect decreases in Path 217 MTU as fast as possible. Hosts MAY detect increases in PMTU, 218 but since doing so requires sending datagrams larger than the 219 current estimated PMTU, and since it is likely that the 220 PMTU will not have increased, this MUST be done at infrequent 221 intervals. Consistent with RFC 1191 recommendations for IP, 222 an attempt to detect an increase by sending a CLNP datagram 223 larger than the current estimate MUST NOT be done less than 5 224 minutes after a Datagram Too Big message has been received 225 for the given destination, or less than one minute after a 226 previous, successful attempted increase.
The recommended 227 setting of these timers is twice their minimum values (10 and 228 2 minutes, respectively). 230 RFC 1191 requires that a host MUST never reduce its 231 estimate of the PMTU below 68 octets (the value of 68 octets 232 guarantees that 8 octets of data can be transmitted given an 233 IPv4 header of 60 octets, see RFC 791). CLNP implementations 234 SHOULD NOT allow the MTU size to be configured to be less 235 than 512 octets. A CLNP host SHOULD NEVER reduce its estimate 236 of the PMTU below 512 octets. 238 3.1. TCP MSS Option 240 A host performing CLNP PMTU Discovery must obey the rule that 241 it not send datagrams larger than 512 octets unless it has 242 permission from the receiver. For TCP connections, this means 243 that a TUBA/CLNP host must not send datagrams larger than 74 244 octets plus the Maximum Segment Size (MSS) sent by its peer. 246 Note: In RFC 879, the TCP MSS is defined to be the relevant 247 IP datagram size minus 40, where 40 represents what is 248 referred to as the "liberal or optimistic" assumption 249 regarding TCP and IP header size (20 octets each); the 250 default of 576 octets for the maximum IP datagram size in 251 this scenario yields a default of 536 octets for the TCP MSS. 252 Using CLNP, with a correspondingly liberal and optimistic 253 assumption about CLNP header size (54 octets), the default 254 CLNP MSS of 512 octets yields a default of 438 octets for the 255 TCP MSS. 259 Hosts SHOULD NOT lower the value they send in the MSS option; 260 doing so prevents the PMTU Discovery mechanism from 261 discovering PMTUs larger than the default TCP MSS. For 262 TUBA/CLNP hosts, the TCP MSS option should be 74 octets less 263 than the size of the largest datagram the host is able to 264 reassemble (MMS_R, as defined in RFC 1122 [8]). In many 265 cases, this will be the architectural limit of 65461 (65535 - 266 74) octets.
A host MAY send an MSS value derived from the MTU 267 of its connected network (the maximum MTU over its connected 268 networks, for a multi-homed host); this should not cause 269 problems for PMTU Discovery, and may dissuade a broken peer 270 from sending enormous datagrams. 272 Note: RFC 1191 recommends that hosts refrain from sending an 273 MSS greater than the architectural limit of 65535 minus the 274 IP header size. This recommendation applies for TUBA/CLNP 275 hosts as well (i.e., do not use a value greater than 65461). 277 4. Router specification 279 When a router is unable to forward a datagram because (a) the 280 datagram length exceeds the MTU of the next-hop network, (b) 281 segmentation is disabled (SP = FALSE), and (c) the Suppress Error Reports flag 282 is reset, the router MUST attempt to return an Error Report message to the 283 source of the datagram, with the Reason for Discard parameter code set to 284 indicate "segmentation required but not permitted". 286 To support MTU discovery, all routers MUST recognize the option specified in 287 Appendix A and are STRONGLY ENCOURAGED to be capable of generating the option. 288 Having all the routers recognize the option will allow the option to be returned 289 to the host engaged in MTU discovery. (It is recommended that a router's ability 290 to generate the option be operator-configurable; generation of the option can 291 then be implemented in an incremental fashion.) 293 5. Host processing of Error Report messages 295 RFC 1191 outlines several possible strategies a host may 296 follow upon receiving a Datagram Too Big message from a 297 router that has not implemented the next-hop-MTU parameter. 298 This section describes the strategies as they apply to 299 TUBA/CLNP hosts; however, the discussion here is limited to 300 the strategies that RFC 1191 identifies as tractable.
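Before turning to the host strategies, the router-side rule of Section 4 that produces a Datagram Too Big message can be sketched as below. This is an illustrative model under stated assumptions, not an implementation of ISO/IEC 8473: the `Datagram` class, its field names, and the returned action strings are hypothetical. Only the branch that returns an Error Report is taken from Section 4; the "segment" and silent "discard" branches are assumed here from the general behavior the memo implies.

```python
from dataclasses import dataclass


@dataclass
class Datagram:
    length: int                    # total datagram length in octets
    segmentation_permitted: bool   # the SP flag
    error_report_requested: bool   # True when error reports are not suppressed


def router_action(d: Datagram, next_hop_mtu: int) -> str:
    """Decide what a router does with a datagram bound for the next hop."""
    if d.length <= next_hop_mtu:
        return "forward"
    if d.segmentation_permitted:
        # Assumed: with SP = TRUE the router may segment instead of discarding.
        return "segment"
    if d.error_report_requested:
        # Conditions (a), (b), and (c) of Section 4 all hold: discard and
        # return "segmentation required but not permitted".
        return "discard-and-send-error-report"
    # Assumed: error reporting suppressed, so the discard is silent.
    return "discard"
```

The last branch shows why Section 2 requires hosts to request error reports during discovery: without them an oversized datagram would vanish with no feedback.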
302 The simplest thing for a CLNP host to do in response to a 303 Datagram Too Big message is to assume that the PMTU is the 304 minimum of its currently-assumed PMTU and 512, and to enable 305 segmentation (SP = TRUE) in datagrams sent on that path. 309 Thus, the host falls back to the same PMTU as it would choose under current 310 practice. This strategy terminates quickly and does no worse than existing 311 practice, but it fails to avoid 312 fragmentation in some cases, and fails to make the most 313 efficient utilization of the internetwork in other cases. More 314 sophisticated strategies involve "searching" for an accurate 315 PMTU estimate, by continuing to send datagrams with SP = FALSE while varying 316 datagram sizes. 318 A good search strategy is one that obtains an accurate 319 estimate of the PMTU without causing many packets to be lost 320 in the process. The "MTU Plateau" strategy recommended in RFC 321 1191 for IP applies to CLNP hosts. The strategy begins with 322 the assumption that there are relatively few MTU values in 323 use in the Internet, so the search can be constrained to 324 include only the MTU values that are likely to appear. Mogul 325 and Deering make the assumption that designers tend to choose 326 MTUs in similar ways, so they collect groups of similar MTU 327 values and use the lowest value in the group as a search 328 "plateau", suggesting that it is better to underestimate an 329 MTU by a few per cent than to overestimate it by one. 331 Section 7 provides a table of representative MTU plateaus for 332 use in PMTU estimation, derived from RFC 1191, but extended 333 to include technologies that have emerged since its 334 publication. With this table, convergence is as good as 335 binary search in the worst case, and is far better in common 336 cases.
Since the plateaus lie near powers of two, if an MTU 337 is not represented in this table, the algorithm will not 338 underestimate it by more than a factor of 2. 340 In RFC 1191, Mogul and Deering note that any search strategy 341 must have some "memory" of previous estimates in order to 342 choose the next one, and suggest that the information 343 available in the Datagram Too Big message itself can be used 344 for this purpose. Like ICMP Destination Unreachable messages, 345 CLNP Error Report messages contain the header of the original datagram, which 346 contains the Total Length of the datagram that was too big to be forwarded without 347 fragmentation (note: when SP = FALSE, the total length of the CLNP datagram is 348 recorded in the Segment Length field). Since this Total Length may be less than 349 the current PMTU estimate, but is nonetheless larger than the actual PMTU, it 350 may be a good input to the method for choosing the next PMTU estimate. 352 Consistent with the strategy recommended for IP in RFC 1191, CLNP hosts shall 353 use as the next PMTU estimate the greatest plateau value that is less than the 354 returned Total Length field. 358 6. Host implementation 360 The RFC 1191 discussion of how PMTU Discovery is implemented 361 in host software is relevant here. The issues that are applicable to CLNP MTU 362 Discovery include: 364 - What layer or layers implement PMTU Discovery? 365 - Where is the PMTU information cached? 366 - How is stale PMTU information removed? 367 - What must transport and higher layers do? 369 6.1. Layering 371 In the IP architecture, the choice of what size datagram to 372 send is made by a transport or higher layer protocol, i.e., a 373 layer above IP.
Mogul and Deering call such protocols 374 "packetization protocols", and explain how implementing PMTU 375 Discovery in the packetization layers simplifies some of the 376 inter-layer issues, but has several drawbacks, and conclude 377 that the IP layer should store PMTU information and that the 378 ICMP layer should process received Datagram Too Big messages. 380 In OSI, the functions ascribed to ICMP and IP are both 381 provided in the network layer. The division of function between the 382 packetization and network layer changes slightly. The packetization layers must 383 still respond to changes in the Path MTU by changing the size of the datagrams 384 they send, and must also be able to specify when segmentation of datagrams is 385 not permitted (SP = FALSE). (As is the case with IP, the network (CLNP) layer 386 does not simply set SP = FALSE in every packet, since it is possible that a 387 packetization layer, i.e., UDP or an application outside the kernel, is unable 388 to change its datagram size.) 390 To support this layering in CLNP, packetization layers require an extension of 391 the network service interface defined in [8]. The extension provides a way to 392 learn of changes in the value of MMS_S, the "maximum send transport-message 393 size", which is derived from the Path MTU by subtracting the minimum CLNP header 394 size (52 octets). This interaction might take the form of an OSI network service 395 primitive; i.e., an N-MSS_S-CHANGE.indication. (For completeness, one may wish 396 to 397 extend the N-UNITDATA.request primitive in [9] to allow 398 transport-entities to control the setting of the SP flag.) 400 6.2. Storing PMTU information 402 The general guidelines for storing PMTU information are the 403 same for CLNP as IP. 
The network (CLNP) layer should 404 associate each PMTU value that it has learned with a specific 406 TUBA Working Group CLNP Path MTU Discovery 408 path, identified by a source address, a destination address, 409 a CLNP quality-of-service, and if implemented, a security 410 classification. This association can be stored as a field in 411 the routing table entries. A host will not have a route for 412 every possible destination, but it should be able to cache a 413 per-host route for every active destination (A requirement 414 already imposed by the need to process ES-IS Redirects [10].) 416 PMTU storage guidelines for IP also apply to CLNP. When the first packet is sent 417 to a host for which no per-host route exists, a route is chosen either from the 418 set of per-network routes, or from the set of default routes. The PMTU fields in 419 these route entries should be initialized to be the MTU of the associated 420 first-hop data link, and must never be changed by the PMTU Discovery process. 421 (PMTU Discovery only creates or changes entries for per-host routes). The PMTU 422 associated with the initially-chosen route is presumed to be accurate until a 423 Datagram Too Big message is received. 425 When a Datagram Too Big message is received, the network 426 layer determines a new estimate for the Path MTU. If a per- 427 host route for this path does not exist, then one is created 428 (as if a per-host ES-IS Redirect is being processed; the new 429 route uses the same first-hop router as the current route). 430 If the PMTU estimate associated with the per-host route is 431 higher than the new estimate, then the value in the routing 432 entry is changed. 434 The packetization layers must be notified about decreases in 435 the PMTU (for example, through an implementation equivalent 436 of the primitive earlier described). 
Any packetization layer 437 instance (for example, a TCP connection) that is actively 438 using the path must be notified if the PMTU estimate is 439 decreased. Even if the Datagram Too Big message contains an 440 original datagram header that refers to a UDP packet, the TCP 441 layer must be notified if any of its connections use the 442 given path. (The same would be true for CLTP and TP-4 443 connections in OSI internets.) 445 The packetization layer instance that sent the CLNP datagram 446 that elicited the Datagram Too Big message should be notified 447 that its datagram has been dropped, even if the PMTU estimate 448 has not changed, so that it may retransmit the dropped 449 datagram. This notification can be asynchronously generated 450 by the network (CLNP) layer, or the notification can be 451 postponed until the packetization instance next attempts to 452 send a CLNP datagram larger than the PMTU estimate. In the 453 latter approach, if one assumes that an N-UNITDATA.request is 454 used to model the request to send a datagram, and the 455 primitive is extended to include the ability to set the 459 SP flag, and the datagram is larger than the PMTU estimate, 460 the send function should fail and return a suitable error 461 indication. In RFC 1191, Mogul and Deering suggest that this 462 approach may be more suitable to a connectionless 463 packetization layer (such as one using UDP), which may be 464 hard to "notify" from the ICMP (or network) layer; this 465 should not be the case for CLNP. If it is, however, the normal 466 timeout-based retransmission mechanisms would be used to 467 recover from the dropped datagrams. 469 Mogul and Deering are careful to note that the notification 470 to the packetization layer instances using the path about the 471 change in the PMTU is distinct from the notification of a 472 specific instance that a packet has been dropped.
The latter 473 should be done as soon as practical (i.e., asynchronously 474 from the point of view of the packetization layer instance), 475 while the former may be delayed until a packetization layer 476 instance wants to create a packet. Retransmission should be 477 done for only those packets that are known to be dropped, as 478 indicated by a Datagram Too Big message. This applies to CLNP 479 Path MTU discovery for TUBA/CLNP environments as well. 481 6.3. Purging stale PMTU information 483 RFC 1191 provides guidelines for aging PMTU information. 484 Similar guidelines apply for TUBA/CLNP MTU discovery. 486 Because (under normal circumstances) a host performing CLNP 487 PMTU Discovery always disables segmentation, a stale PMTU value (one that is too 488 large) will be discovered almost immediately once a datagram is sent to the 489 given destination. No such mechanism exists for determining that a stored PMTU 490 value is too small, so an implementation SHOULD "age" cached PMTU values. When a 491 PMTU value has not decreased for some time (on the order of 10 minutes), the 492 PMTU estimate SHOULD be set to the first-hop data-link MTU, and the 493 packetization layers should be notified of the change. This will cause the 494 complete PMTU Discovery process to take place again. 496 Note: an implementation should provide a means for changing 497 the timeout duration, including setting it to "infinity". In 498 RFC 1191, Mogul and Deering cite the example of hosts 499 attached to an FDDI network, which is then attached to the 500 rest of the Internet via a slow serial line; such hosts will 501 never discover a larger, non-local PMTU, so they should not 502 be subjected to dropped datagrams every 10 minutes. 504 An upper layer MUST NOT retransmit datagrams in response to 505 an increase in the PMTU estimate, since this increase never 506 comes in response to an indication of a dropped datagram.
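The aging rule in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the `PmtuEntry` class, its attribute names, and the use of seconds for time values are hypothetical, and a real implementation would attach this state to routing table entries rather than to a standalone object. The 600-second default reflects the "on the order of 10 minutes" figure above, and `math.inf` models the "infinity" setting.

```python
import math
from typing import Optional


class PmtuEntry:
    """Cached PMTU estimate for one path (hypothetical structure)."""

    def __init__(self, first_hop_mtu: int) -> None:
        self.first_hop_mtu = first_hop_mtu
        self.pmtu = first_hop_mtu
        # None means the estimate has never been decreased.
        self.last_decrease: Optional[float] = None

    def on_datagram_too_big(self, new_estimate: int, now: float) -> None:
        """Record a decrease triggered by a Datagram Too Big message."""
        if new_estimate < self.pmtu:
            self.pmtu = new_estimate
            self.last_decrease = now

    def age(self, now: float, timeout: float = 600.0) -> bool:
        """Reset a stale estimate to the first-hop MTU.

        Returns True when the estimate was reset, in which case the
        packetization layers should be notified of the increase.
        """
        if self.last_decrease is None:
            return False
        if now - self.last_decrease < timeout:
            return False
        self.pmtu = self.first_hop_mtu
        self.last_decrease = None
        return True
```

With `timeout=math.inf` the comparison never fails, so `age` never resets the estimate; this matches the case of a host behind a slow serial line that should not be subjected to periodic probe losses.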
RFC 1191 and this memo recommend that PMTU aging be implemented by
adding a timestamp field to the routing table entry. This field SHOULD
be initialized to a "reserved" value that indicates that the PMTU has
never been changed. Whenever the PMTU is decreased in response to a
Datagram Too Big message, the timestamp is set to the current time.
Once a minute thereafter, a timer-driven procedure should run through
the routing table, and for each entry whose timestamp is not
"reserved" and is older than the timeout interval,

- set the PMTU estimate to the MTU of the associated first hop

- notify the packetization layers using this route of the increase.

PMTU estimates may disappear from the routing table if the per-host
routes are removed; this can happen in response to an ES-IS Redirect
message, or because certain routing-table daemons delete old routes
after several minutes. Also, on a multi-homed host a topology change
may result in the use of a different source interface. When this
happens, if the packetization layer is not notified, then it may
continue to use a cached PMTU value that is now too small. RFC 1191
and this memo suggest that the packetization layer be notified of a
possible PMTU change whenever a Redirect message causes a route
change, and whenever a route is deleted from the routing table.

6.4. TCP layer actions

RFC 1191 provides guidelines for TCP layers when Path MTU discovery is
being performed. Similar guidelines apply for TUBA/CLNP MTU discovery.

The TCP layer must track the PMTU for the destination of a connection;
it should not send datagrams that would be larger than this. A simple
implementation could ask the network (CLNP) layer for this value
(using a TUBA/CLNP equivalent of the GET_MAXSIZES interface described
in [8]) each time it creates a new segment, but this could be
inefficient. Moreover, TCP implementations that follow the
"slow-start" congestion-avoidance algorithm [11] typically calculate
and cache several other values derived from the PMTU. It may be
simpler to receive asynchronous notification when the PMTU changes, so
that these variables may be updated.

A TCP implementation must also store the MSS value received from its
peer (which defaults to 440), and not send any segment larger than
this MSS, regardless of the PMTU.

When a Datagram Too Big message is received, it implies that a
datagram was dropped by the router that sent the Error Report message.
It is sufficient to treat this as any other dropped segment, and wait
until the retransmission timer expires to cause retransmission of the
segment. If the PMTU discovery process requires several steps to
estimate the right PMTU, this could delay the connection by many
round-trip times. Alternatively, the retransmission could be done in
immediate response to a notification that the Path MTU has changed,
but only for the specific connection specified by the Datagram Too Big
message. The datagram size used in the retransmission should be no
larger than the new PMTU.

Note: Retransmissions MUST NOT be sent in response to every Datagram
Too Big message. A burst of oversized segments will give rise to
several such messages and hence several retransmissions of the same
data; if the new estimated PMTU is still wrong, the process repeats,
and there is an exponential growth in the number of superfluous
segments sent.
This means that the TCP layer must be able to recognize when a
Datagram Too Big notification actually decreases the PMTU that it has
already used to send a datagram on the given connection, and should
ignore any other notifications.

Many TCP implementations now incorporate "congestion avoidance" and
"slow-start" algorithms to improve performance [11, 12]. Unlike a
retransmission caused by a TCP retransmission timeout, a
retransmission caused by a Datagram Too Big message should not change
the congestion window. It should, however, trigger the slow-start
mechanism (i.e., only one segment should be retransmitted until
acknowledgements begin to arrive again).

TCP performance can be reduced if the sender's maximum window size is
not an exact multiple of the segment size in use (this is not the
congestion window size, which is always a multiple of the segment
size). In many systems (such as those derived from 4.2BSD), the
segment size is often set to 1024 octets, and the maximum window size
(the "send space") is usually a multiple of 1024 octets, so the proper
relationship holds by default. If PMTU Discovery is used, however, the
segment size may not be a submultiple of the send space, and it may
change during a connection; this means that the TCP layer may need to
change the transmission window size when PMTU Discovery changes the
PMTU value. The maximum window size should be set to the greatest
multiple of the segment size (PMTU - 74) that is less than or equal to
the sender's buffer space size.

PMTU Discovery does not affect the value sent in the TCP MSS option,
because that value is used by the other end of the connection, which
may be using an unrelated PMTU value.

6.5. Issues for other transport protocols

Some transport protocols (such as OSI TP4 [13]) are not allowed to
repacketize when doing a retransmission; once an attempt is made to
transmit a datagram of a certain size, its contents cannot be split
into smaller datagrams for retransmission. In such a case, the
original CLNP datagram should be retransmitted with segmentation
enabled, allowing it to be fragmented as necessary to reach its
destination. Subsequent datagrams, when transmitted for the first
time, should be no larger than allowed by the Path MTU, and should
have SP = FALSE.

The Sun Network File System (NFS) uses a Remote Procedure Call (RPC)
protocol [14] that, in many cases, sends datagrams that must be
fragmented even for the first-hop link. This might improve performance
in certain cases, but it is known to cause reliability and performance
problems, especially when the client and server are separated by
routers. NFS implementations SHOULD use PMTU Discovery whenever
routers are involved. Most NFS implementations allow the RPC datagram
size to be changed at mount-time (indirectly, by changing the
effective file system block size), but might require some modification
to support changes later on.

Also, since a single NFS operation cannot be split across several UDP
datagrams, certain operations (primarily, those operating on file
names and directories) require a minimum datagram size that may be
larger than the PMTU. NFS implementations SHOULD NOT reduce the
datagram size below this threshold, even if PMTU Discovery suggests a
lower value. (In this case datagrams should not be sent with
segmentation disabled.)

6.6. Management interface

In RFC 1191, Mogul and Deering suggest that an implementation provide
a way for a system utility program to:

- Specify that PMTU Discovery not be done on a given route

- Change the PMTU value associated with a given route

The former can be accomplished by associating a flag with the routing
entry; when a packet is sent via a route with this flag set, the IP
layer leaves the DF bit clear no matter what the upper layer requests.
The same can be provided for CLNP PMTU discovery; when a packet is
sent via a route with a "suppress PMTU discovery" flag set, the CLNP
layer leaves the SP flag set (segmentation permitted) irrespective of
upper layer requests. (The implementation should also provide a way to
change the timeout period for aging stale PMTU information.)

7. Likely values for Path MTUs

The algorithm recommended in section 5 for "searching" the space of
Path MTUs is based on a table of values that severely restricts the
search space. In RFC 1191, Mogul and Deering describe a table of MTU
values that represented all major data-link technologies in use in the
Internet.

In this memo, Table 7-1 has been revised to consider technologies that
have been introduced to the Internet since the publication of RFC
1191. The author has also removed technologies that seem unlikely to
serve as transmission media for CLNP; notably, 1822/ARPANET, ARCNET,
SLIP, Experimental Ethernets, and WIDEBAND. Implementors should also
make it convenient for customers without source code to update the
table values in their systems.

Plateau    MTU    Comments                      Reference
------     ---    --------                      ---------
           65535  Official maximum MTU          RFC 791
           65535  Official maximum NSDU         ISO 8348
           65535  Hyperchannel                  RFC 1044
65535
32000             Just in case
           17914  16Mb IBM Token Ring           (RFC 1191)
17914
           9180   SMDS                          RFC 1209
           9180   ATM over AAL5                 RFC iiii
9180
           8166   IEEE 802.4                    RFC 1042
8166
           4464   IEEE 802.5 (4Mb max)          RFC 1042
           4352   FDDI (Revised)                RFC 1188
4352
           1600   Frame Relay (recommended)     RFC 1490
           1600   X.25 Networks                 RFC 1356
           1500   Ethernet Networks             RFC 894
           1500   Point-to-Point (default)      RFC 1548
           1492   IEEE 802.3                    RFC 1042
1492
           512    NETBIOS                       RFC 1088
           512    Minimum SNSDU size            ISO 8473
512

           Table 7-1: CLNP MTUs in the Internet

Table 7-1 lists data links in order of decreasing MTU, and groups them
so that each set of similar MTUs is associated with a "plateau" equal
to the lowest MTU in the group. As indicated in RFC 1191, the values
in the table, especially for higher MTU levels, will not remain valid
forever; they are presented here as an implementation suggestion, NOT
as a specification or requirement. Implementors should use up-to-date
references to pick a set of plateaus. It is important that the table
not contain too many entries or the process of searching for a PMTU
might waste Internet resources.

7.1. A better way to detect PMTU increases

Rather than detecting increases in the PMTU value by periodically
increasing the PMTU estimate to the first-hop MTU, it is possible to
periodically increase a PMTU estimate to the lesser of the
next-highest value in the plateau table or the first-hop MTU. If the
increased estimate is wrong, at most one round-trip time is wasted
before the correct value is rediscovered. If the increased estimate is
still too low, a higher estimate will be attempted somewhat later.
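
The incremental probing described in this section might be sketched in
Python as follows. This is a non-normative illustration: the PLATEAUS
list is copied from the Plateau column of Table 7-1, and the function
name next_probe is an assumption of this sketch, not part of this memo.

```python
# Plateau values from Table 7-1, in decreasing order.
PLATEAUS = [65535, 32000, 17914, 9180, 8166, 4352, 1492, 512]


def next_probe(estimate, first_hop_mtu):
    """Return the next PMTU estimate to try when probing upward.

    The result is the lesser of the next-highest plateau above the
    current estimate and the first-hop MTU.
    """
    higher = [p for p in PLATEAUS if p > estimate]
    # If no plateau is higher, the estimate is already at the top.
    candidate = min(higher) if higher else estimate
    return min(candidate, first_hop_mtu)
```

For example, a host on an Ethernet (first-hop MTU 1500) whose estimate
has fallen to 512 would next try 1492, and then 1500 on the following
attempt, rather than jumping straight back to the first-hop MTU.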

Because it may take several such periods to discover a significant
increase in the PMTU, a short timeout period should be used after the
estimate is increased, and a longer timeout be used after the PMTU
estimate is decreased because of a Datagram Too Big message. For
example, after the PMTU estimate is decreased, the timeout should be
set to 10 minutes; once this timer expires and a larger MTU is
attempted, the timeout can be set to a much smaller value (say, 2
minutes). In no case should the timeout be shorter than the estimated
round-trip time, if this is known.

8. Security considerations

A malicious party could cause problems if it could stop a victim from
receiving legitimate Datagram Too Big messages, but in this case there
are simpler denial-of-service attacks. Other, more likely forms of
denial-of-service attacks against an IP host attempting MTU discovery
are based on tampering with the value announced in the ICMP
NEXT-HOP-MTU parameter (see also Appendix A).

9. References

[1]  Mogul, J., and S. Deering. Path MTU Discovery. RFC 1191,
     Internet Network Information Center, November 1990.

[2]  ISO/IEC 8473-1992. ISO - Data Communications - Protocol
     Providing the Connectionless Network Service, Edition 2.

[3]  Postel, J. Internet Protocol. RFC 791, Internet Network
     Information Center, September 1981.

[4]  Kent, C., and J. Mogul. Fragmentation Considered Harmful. Proc.
     SIGCOMM '87 Workshop on Frontiers in Computer Communications
     Technology. August 1987.

[5]  Postel, J. The TCP Maximum Segment Size and Related Topics.
     RFC 879, Internet Network Information Center, November 1983.

[6]  Piscitello, D. Use of ISO CLNP in TUBA Environments. RFC 1561,
     Internet Network Information Center, December 1993.

[7]  Postel, J. Internet Control Message Protocol. RFC 792,
     Internet Network Information Center, September 1981.

[8]  Braden, R., ed. Requirements for Internet Hosts --
     Communication Layers. RFC 1122, Internet Network Information
     Center, October 1989.

[9]  ISO/IEC 8348-1992. International Standards Organization. OSI
     Network Service Definition.

[10] ISO/IEC 9542-1992. International Standards Organization.
     End-system to Intermediate-system exchange protocol for use in
     conjunction with ISO/IEC 8473.

[11] Jacobson, V. Congestion Avoidance and Control. In Proc.
     SIGCOMM '88 Symposium on Communications Architectures and
     Protocols, pages 314-329. Stanford, CA, August 1988.

[12] Jacobson, V., R. Braden, and D. Borman. TCP Extensions for High
     Performance. RFC 1323, Internet Network Information Center,
     May 1992.

[13] ISO/IEC 8072-1986. International Standards Organization. ISO
     Transport Protocol Specification.

[14] Sun Microsystems, Inc. Remote Procedure Call Protocol. RFC 1057,
     SRI Network Information Center, June 1988.

Author's Address

David M. Piscitello
Core Competence, Inc.
1620 Tuckerstown Road
Dresher, PA 19025 USA
dave@corecom.com

Appendix A. NEXT-HOP-MTU parameter for CLNP Error Reports

To support Path MTU Discovery more efficiently, a new parameter is
defined for CLNP Error Reports. The "Next-Hop-MTU" parameter has the
same semantics as the corresponding parameter in the ICMP header as
specified in RFC 1191; i.e., this field shall be used to report the
"constricting (next) hop" MTU. As part of its specification, ISO/IEC
8473 MUST indicate that a router MUST include the MTU of the
constricting next-hop network in the new parameter in the Error Report
header.
The format of the parameter is:

 0          1          2          3
 01234567 89012345 67890123 45678901
+--------+--------+--------+--------+
|  Code  | Length |   (value of)    |
|11000010|  (4)   |  Next-Hop-MTU   |
+--------+--------+--------+--------+

The value of the Next-Hop-MTU field is the size in octets of the
largest CLNP datagram that could be forwarded, along the path of the
original datagram, without being fragmented at this router. The size
includes the CLNP header and data, and does not include any lower
level headers. This field MUST NOT contain a value less than 512. When
a host receives a Datagram Too Big message, it MUST reduce its
estimate of the PMTU for the relevant path, based on the value of the
Next-Hop-MTU field in the Error Report.

The specification of this parameter introduces additional security
considerations for PMTU Discovery. The CLNP Path MTU Discovery
mechanism is vulnerable to the same denial-of-service attacks as IP.
Both attacks are based on a malicious party sending false Datagram Too
Big messages to a host. The RFC 1191 description of these attacks is
repeated here.

In the first attack, the false message indicates a PMTU much smaller
than reality. This should not entirely stop data flow, since the
victim host should never set its PMTU estimate below the absolute
minimum. Since the minimum MTU is 512, this has less impact than with
IP but is nonetheless intrusive. In the other attack, the false
message indicates a larger PMTU than reality. If believed, this could
cause temporary blockage as the victim sends datagrams that will be
dropped by some router. The host would discover its mistake within one
RTT, by receiving Datagram Too Big messages, but frequent repetition
of this attack could cause many discards. A host should never raise
its estimate of the PMTU based on a Datagram Too Big message, so it
should not be vulnerable to this attack.
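
The two defensive rules above (never raise the estimate in response to
a Datagram Too Big message, and never let the estimate fall below the
512-octet minimum) might be sketched in Python as follows. The
function and constant names are assumptions made for illustration
only; they are not defined by this memo.

```python
# Minimum PMTU for CLNP as stated in this appendix.
MIN_PMTU = 512


def apply_datagram_too_big(current_estimate, next_hop_mtu):
    """Return the PMTU estimate after a Datagram Too Big message.

    next_hop_mtu is the value carried in the Next-Hop-MTU parameter.
    """
    if next_hop_mtu >= current_estimate:
        # A Datagram Too Big message must never raise the estimate;
        # this blunts the "larger PMTU than reality" attack.
        return current_estimate
    # Accept the decrease, but never go below the absolute minimum;
    # this limits the "much smaller than reality" attack.
    return max(next_hop_mtu, MIN_PMTU)
```

A forged message can therefore push the estimate down to 512 octets at
worst, and cannot push it up at all.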