2 Network Working Group I.
van Beijnum 3 Internet-Draft Institute IMDEA Networks 4 Intended status: Experimental March 21, 2016 5 Expires: September 22, 2016 7 Extensions for Multi-MTU Subnets 8 draft-van-beijnum-multi-mtu-05 10 Abstract 12 In the early days of the internet, many different link types with 13 many different maximum packet sizes were in use. For point-to-point 14 or point-to-multipoint links, there are still some other link types 15 (PPP, ATM, Packet over SONET), but multipoint subnets are now almost 16 exclusively implemented as Ethernets. Even though the relevant 17 standards mandate a 1500 byte maximum packet size for Ethernet, more 18 and more Ethernet equipment is capable of handling packets bigger 19 than 1500 bytes. However, since this capability isn't standardized, 20 it is seldom used today, despite the potential performance benefits 21 of using larger packets. This document specifies mechanisms to 22 negotiate per-neighbor maximum packet sizes so that nodes on a 23 multipoint subnet may use the maximum mutually supported packet size 24 between them without being limited by nodes with smaller maximum 25 sizes on the same subnet. 27 Status of This Memo 29 This Internet-Draft is submitted in full conformance with the 30 provisions of BCP 78 and BCP 79. 32 Internet-Drafts are working documents of the Internet Engineering 33 Task Force (IETF). Note that other groups may also distribute 34 working documents as Internet-Drafts. The list of current Internet- 35 Drafts is at http://datatracker.ietf.org/drafts/current/. 37 Internet-Drafts are draft documents valid for a maximum of six months 38 and may be updated, replaced, or obsoleted by other documents at any 39 time. It is inappropriate to use Internet-Drafts as reference 40 material or to cite them other than as "work in progress." 42 This Internet-Draft will expire on September 22, 2016. 44 Copyright Notice 46 Copyright (c) 2016 IETF Trust and the persons identified as the 47 document authors. All rights reserved. 
49 This document is subject to BCP 78 and the IETF Trust's Legal 50 Provisions Relating to IETF Documents 51 (http://trustee.ietf.org/license-info) in effect on the date of 52 publication of this document. Please review these documents 53 carefully, as they describe your rights and restrictions with respect 54 to this document. Code Components extracted from this document must 55 include Simplified BSD License text as described in Section 4.e of 56 the Trust Legal Provisions and are provided without warranty as 57 described in the Simplified BSD License. 59 Table of Contents 61 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 62 2. Notational Conventions . . . . . . . . . . . . . . . . . . . 4 63 3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 64 4. Overview of operation . . . . . . . . . . . . . . . . . . . . 5 65 5. The ND NODEMTU option . . . . . . . . . . . . . . . . . . . . 6 66 6. The MTUTEST packet format . . . . . . . . . . . . . . . . . . 7 67 7. Changes to the RA MTU option semantics . . . . . . . . . . . 8 68 8. The TCP MSS option . . . . . . . . . . . . . . . . . . . . . 9 69 9. Operation . . . . . . . . . . . . . . . . . . . . . . . . . . 9 70 9.1. Initialization . . . . . . . . . . . . . . . . . . . . . 9 71 9.2. Probing . . . . . . . . . . . . . . . . . . . . . . . . . 10 72 9.3. Monitoring . . . . . . . . . . . . . . . . . . . . . . . 14 73 9.4. Neighbor MTU garbage collection . . . . . . . . . . . . . 16 74 10. Applicability . . . . . . . . . . . . . . . . . . . . . . . . 16 75 11. IANA considerations . . . . . . . . . . . . . . . . . . . . . 16 76 12. Security considerations . . . . . . . . . . . . . . . . . . . 16 77 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 17 78 14. References . . . . . . . . . . . . . . . . . . . . . . . . . 17 79 14.1. Normative References . . . . . . . . . . . . . . . . . . 17 80 14.2. Informative References . . . . . . . . . . . . . . . . . 18 81 Appendix A. 
Document and discussion information . . . . . . . . 19 82 Appendix B. Advantages and disadvantages of larger packets . . . 19 83 B.1. Clock skew . . . . . . . . . . . . . . . . . . . . . . . 19 84 B.2. ECMP over paths with different MTUs . . . . . . . . . . . 20 85 B.3. Delay and jitter . . . . . . . . . . . . . . . . . . . . 20 86 B.4. Path MTU Discovery problems . . . . . . . . . . . . . . . 21 87 B.5. Packet loss through bit errors . . . . . . . . . . . . . 21 88 B.6. Undetected bit errors . . . . . . . . . . . . . . . . . . 22 89 B.7. Interaction TCP congestion control . . . . . . . . . . . 23 90 B.8. IEEE 802.3 compatibility . . . . . . . . . . . . . . . . 23 91 B.9. Conclusion . . . . . . . . . . . . . . . . . . . . . . . 24 92 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 24 94 1. Introduction 96 Some protocols inherently generate small packets. Examples are VoIP, where it is necessary to send packets frequently before much data can be gathered to fill up the packet, and the DNS, where the queries are inherently small and the returned results also often do not fill up a full 1500-byte packet. However, most data that is transferred across the internet and private networks is part of long-lived sessions and requires segmentation by a transport protocol, which is almost always TCP. These types of data transfers can benefit from larger packets in several ways: 106 1. A higher data-to-header ratio makes for fewer overhead bytes 108 2. Fewer packets means fewer per-packet operations for the source and destination hosts 111 3. Fewer packets also means fewer per-packet operations in routers and middleboxes 114 4.
TCP performance increases with larger packet sizes 116 Even though the capability to use larger packets (often called jumboframes) is present in much Ethernet hardware today, this capability typically isn't used because IP assumes a common MTU size for all nodes connected to a link or subnet. In practice, this means that using a larger MTU requires manual configuration of the non-standard MTU size on all hosts and routers and possibly on layer 2 switches connected to a subnet. Also, the MTU size for a subnet is limited to that of the least capable router, host or switch. 125 Perhaps in the future, when hosts support packetization layer path MTU discovery ([RFC4821], "Packetization Layer Path MTU Discovery") in all relevant transport protocols, it will be possible to simply ignore MTU limitations by sending at the maximum locally supported size and determining the maximum packet size towards a correspondent from acknowledgements that come back for packets of different sizes. However, [RFC4821] must be implemented in every transport protocol, and problems arise in the case where hosts implementing [RFC4821] interact with hosts that don't implement this mechanism, but do use a larger than standard MTU. 136 This document provides a set of mechanisms that allow the use of larger packets between nodes that support them, and that interact well with both manually configured non-standard MTUs and expected future [RFC4821] operation with larger MTUs. This is done using a new IPv6 Neighbor Discovery option and a new UDP-based protocol for exchanging MTU information and testing whether jumboframes can be transmitted successfully. 144 Appendix B discusses several potential issues with larger packets, such as head-of-line blocking delays, path MTU discovery black holes and the strength of the CRC32 with increasing packet sizes. 148 2.
Notational Conventions 150 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 151 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 152 document are to be interpreted as described in [RFC2119]. 154 Note that this specification is not standards track, and as such, 155 can't overrule existing specifications. Whenever [RFC2119] language 156 is used, this must be interpreted within the context of this 157 specification: while the specification as a whole is optional and 158 non-standard, whenever it is implemented, such an implementation can 159 only function properly when all MUSTs are observed. 161 3. Terminology 163 Advertised MTU: The MTU size announced by a node to other nodes on 164 the local subnet. 166 Confirmed MTU: The largest packet size successfully received from 167 the neighbor or the largest packet size sent to the neighbor for 168 which an acknowledgment was received; whichever size is greater. 170 Confirmed Time: When a packet the size of the confirmed MTU was last 171 received or acknowledged. 173 Local MTU: The MTU configured on an interface. By default, this is 174 the largest MTU size supported by the hardware, but the Local MTU 175 may be lowered administratively or automatically based on policy. 176 (For instance, the MTU may be set to the Standard MTU if the link 177 speed is below 1000 Mbps.) 179 MRU: Maximum Receive Unit. The size of the largest IP packet that 180 can be received on an interface. This document doesn't use the 181 term MRU, and assumes that the MRU is equal to the MTU. 183 MTU: Maximum Transfer Unit. The size of the largest IP packet that 184 can be transmitted on an interface, considering hardware (and 185 administrative) limitations. 187 Neighbor: Another node on a connected subnet. Neighbors are 188 identified by the combination of a link address and an IP version. 
190 The MTU may be set to different values for IPv4 and IPv6 administratively, but it is assumed that if a node has multiple IPv4 or IPv6 addresses, the MTU for each set of addresses is the same. 195 Neighbor MTU: The currently used MTU towards a neighboring node on a subnet. The Neighbor MTU reflects the current best understanding of the maximum packet size that can successfully be transmitted towards that neighbor. 200 Safe MTU: The maximum packet size that is assumed to work without testing. Defaults to the Standard MTU, but may be set to a subnet-wide higher or lower value administratively, or to a lower value using the MTU option in IPv6 Router Advertisements. 205 Standard MTU: The MTU specified in the relevant IPv4-over-... or IPv6-over-... document, which is 1500 for Ethernet ([RFC0894] and [RFC2464]). 209 4. Overview of operation 211 The mechanisms described in this document come into play when a node is connected to a subnet using an interface that supports an MTU size larger than the standard MTU size for that link type. 215 For each remote node connected to such a subnet, the local node maintains a neighbor MTU setting. The length of packets transmitted to a neighbor is always limited to the neighbor MTU size. 219 When a node starts communicating with another node on the same subnet, it uses the following procedure: 222 1. Initialization: the neighbor MTU is set to the local maximum MTU for the interface used to reach the neighbor. 225 2. Discovery: learning the other node's MTU. 227 3. Probing: determining the maximum packet size that can successfully be transmitted to and received from the other node, considering the (unknown) maximum packet size supported by the layer 2 infrastructure. 232 4. Monitoring: making sure that when large packets are transmitted, they are not silently discarded, for instance as the result of a layer 2 reconfiguration.
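The four-stage lifecycle above can be sketched as a per-neighbor state record (a minimal Python illustration; the NeighborState and Stage names are invented here and are not part of the specification):

```python
from dataclasses import dataclass
from enum import Enum, auto

STANDARD_MTU = 1500  # the safe MTU defaults to the standard Ethernet MTU

class Stage(Enum):
    DISCOVERY = auto()   # learning the neighbor's advertised MTU
    PROBING = auto()     # finding the largest size that actually works
    MONITORING = auto()  # watching for silent loss of large packets

@dataclass
class NeighborState:
    link_addr: bytes
    ip_version: int
    neighbor_mtu: int            # packets to this neighbor never exceed this
    safe_mtu: int = STANDARD_MTU
    stage: Stage = Stage.DISCOVERY

    @classmethod
    def initialize(cls, link_addr, ip_version, local_mtu):
        # Stage 1: the neighbor MTU starts at the local interface MTU
        return cls(link_addr, ip_version, neighbor_mtu=local_mtu)

    def monitoring_failed(self):
        # Large packets no longer get through: fall back to the safe
        # MTU and return to the probing stage
        self.neighbor_mtu = self.safe_mtu
        self.stage = Stage.PROBING
```

A node would keep one such record per combination of link-layer address and IP version, as described in the terminology section.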
236 During the discovery and probing stages, the neighbor MTU is adjusted as new information becomes available. The monitoring stage is ongoing. If during the monitoring stage it is determined that large packets aren't successfully exchanged with the neighboring node, the neighbor MTU is set to the safe MTU and the node returns to the probing stage. 243 Unless administrative configuration or policy specifies otherwise, the link, IPv4 and IPv6 MTU sizes are set to the maximum supported by the hardware. This means that when TCP sessions are created, they carry a maximum segment size (MSS) option that reflects the larger-than-standard MTU. 249 5. The ND NODEMTU option 251 All MTU values are 32-bit unsigned integers in network byte order. All other values are also unsigned and in network byte order. 254 The MTU size and an optional MTU hint are exchanged as an IPv6 Neighbor Discovery option. The new option, as well as the MTU value it advertises, are named "NODEMTU". 258 1 2 3 259 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 260 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 261 | Type | Length | Reserved | 262 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 263 | NodeMTU | 264 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 265 / HintMTU (optional) / 266 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 268 Type: TBD 270 Length: 1 or 2 272 Reserved: Set to 0 on transmission, ignored when received. 274 NodeMTU: The maximum packet size the node wishes to receive on this interface. 277 HintMTU: The maximum packet size the node believes it can successfully receive on this interface at this time. If the HintMTU is equal to the NodeMTU or no value for HintMTU is known, this field may be omitted and the Length field is set to 1. If the HintMTU field is present, the Length field is set to 2.
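As a non-normative illustration, the option could be encoded and decoded as follows (a Python sketch; the type value is a placeholder pending IANA assignment, and the four padding octets in the Length = 2 form are an assumption here, since ND option lengths are expressed in units of 8 octets):

```python
import struct

ND_OPT_NODEMTU = 0xFE  # placeholder: the real type value is TBD (IANA)

def build_nodemtu(node_mtu, hint_mtu=None):
    """Encode a NODEMTU option; Length is in units of 8 octets."""
    if hint_mtu is None or hint_mtu == node_mtu:
        # HintMTU omitted, Length = 1 (8 octets)
        return struct.pack("!BBHI", ND_OPT_NODEMTU, 1, 0, node_mtu)
    # HintMTU present, Length = 2; pad to 16 octets (assumption: the
    # diagram shows 12 octets, but ND options are 8-octet multiples)
    return struct.pack("!BBHII4x", ND_OPT_NODEMTU, 2, 0, node_mtu, hint_mtu)

def parse_nodemtu(data):
    """Decode NodeMTU and HintMTU; an absent HintMTU equals NodeMTU."""
    typ, length, _reserved, node_mtu = struct.unpack_from("!BBHI", data)
    hint_mtu = node_mtu
    if length == 2:
        (hint_mtu,) = struct.unpack_from("!I", data, 8)
    return node_mtu, hint_mtu
```

All multi-octet fields are in network byte order, matching the statement at the start of this section.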
283 When a node's interface speed changes, it MAY advertise a new MTU, 284 but it SHOULD remain prepared to receive packets of the maximum size 285 advertised to neighbors previously (if the old maximum size is larger 286 than the newly advertised one). 288 6. The MTUTEST packet format 290 The packets used to test whether large packets can be transmitted 291 successfully and communicate status are sent using UDP ([RFC0768]). 292 Their format is as follows: 294 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 295 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 296 | Source Port | Destination Port | 297 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 298 | Length | Checksum | 299 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 300 | 'M' | 'T' | 'U' | 'T' | 301 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 302 |R|B| Reserved | Nonce | 303 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 304 | NodeMTU | 305 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 306 | HintMTU | 307 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 308 | Padding | 309 ~ ~ 310 | | 311 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 313 Source port (UDP): For outgoing requests: an ephemeral port number. 314 For replies: 1022. (16 bits.) 316 Destination port (UDP): For outgoing requests: 1022. For replies: 317 the source port used in the request being replied to. (16 bits.) 319 Length (UDP): for IPv4 and IPv6 packets smaller than or equal to 320 65575 bytes, the length of the UDP segment. For IPv6 packets 321 larger than 65575 bytes, 0 (as per [RFC2675]). (16 bits.) 323 Checksum (UDP): the UDP checksum. (16 bits.) 325 R: reply request flag. If set to 0, no reply is sent. If set to 1, 326 the receiver is asked to send a reply. (1 bit.) 
328 MTUT: The value corresponding to the ASCII string "MTUT", used to 329 differentiate MTUTEST packets from other UDP packets that use port 330 1022. Packets with a value other than "MTUT" at the beginning of 331 the UDP payload MUST be ignored. (32 bits.) 333 B: big reply request flag. If set to 0, replies are not padded. If 334 set to 1, replies are padded to be the same size as the request. 335 (1 bit.) 337 Reserved: set to 0 on transmission, ignored on reception. (6 bits.) 339 Nonce: a hard-to-guess value. (24 bits.) 341 NodeMTU: The maximum packet size that the sender is prepared to 342 receive at this time. (32 bits.) 344 HintMTU: The maximum packet size that the sender believes it can 345 successfully receive at this time. (32 bits.) 347 Padding: Filled with 0 or more all-zero bytes on transmission, 348 ignored on reception. 350 In addition to the fields listed above, the following IP and link 351 layer fields are taken into consideration: 353 Source link-layer address: On transmission: set automatically by the 354 networking stack. On reception: used to identify a neighbor. 356 IP version: On transmission: set automatically by the networking 357 stack. On reception: used to identify a neighbor. (The IP 358 version may also be identified implicitly through the API without 359 directly observing the version field.) 361 Time To Live / Hop Limit: On transmission: set to 255. On 362 reception: if 255, the packet is processed. If other than 255, 363 the packet is silently discarded. (To enforce that the protocol 364 is only used within a local subnet.) 366 Source IP address: On transmission, for requests: set to the address 367 the node intends to use to communicate with the neighbor. For 368 replies: set to the destination IP address in the request being 369 replied to. On reception: used to identify a neighbor. 371 Destination IP address: On transmission, for requests: set to the 372 address the node intends to use to communicate with the neighbor. 
373 For replies: set to the source IP address in the request being replied to. 376 7. Changes to the RA MTU option semantics 378 Section 6.3.4 of [RFC4861] specifies: 380 "If the MTU option is present, hosts SHOULD copy the option's value into LinkMTU so long as the value is greater than or equal to the minimum link MTU and does not exceed the maximum LinkMTU value specified in the link-type-specific document" 385 This document changes the handling of the Router Advertisement MTU option such that it may also be used by routers to tell hosts that they SHOULD use an MTU larger than the LinkMTU and update their SafeMTU value. If multiple routers advertise different MTUs that are higher or lower than the standard MTU, behavior is undefined. MTU options containing the standard MTU SHOULD be ignored. 392 The ability to advertise a larger-than-standard MTU must be used with extreme care by network administrators, as advertising an MTU size that exceeds the capabilities of routers or the layer 2 infrastructure will lead to reachability problems. 397 If the advertised larger-than-standard MTU is ignored or not supported by some hosts connected to the subnet, TCP will presumably still work because the MSS option ([RFC0793]) limits the size of transmitted TCP segments to what the receiver supports. However, non-TCP protocols that use large packets will likely fail. The most prominent example of this is DNS over UDP with EDNS0 when requesting large records, such as those used for DNSSEC ([RFC6891]). 405 8. The TCP MSS option 407 Hosts SHOULD advertise the maximum MTU size they are prepared to use on a link in the TCP MSS value, even during times when probing has failed: should larger neighbor MTUs be established later, it will not be possible to adjust the MSS for ongoing sessions. 412 9. Operation 414 9.1.
Initialization 416 When an interface is activated, an appropriate local MTU is determined, based on hardware limitations and administrative settings. Additionally, a policy may be in place to constrain packet sizes when operating at lower bandwidths, to avoid excessive delays as queues of large packets build up and cause significant head-of-line blocking for subsequent time-sensitive packets. Also, layer 2 devices operating at lower interface speeds are less likely to support non-standard MTUs. 425 In the absence of operational experience, this document RECOMMENDS limiting the use of larger than standard MTUs to interfaces operating at 400 Mbps or faster; and if a larger MTU is used for interfaces operating at lower speeds, a "mini jumbo" size of 1982 bytes or less is used for Ethernets. 431 For IPv4, the local MTU is limited to 65535 bytes. For IPv6, if [RFC2675] jumbograms are not supported, the local MTU is limited to 65575 bytes. These limits apply even if the interface hardware supports a larger MTU. IPv6 nodes that implement [RFC2675] jumbograms MAY use MTU sizes larger than 65575 bytes. 437 When the interface speed changes, the local MTU MAY be changed to reflect the new speed. However, the node SHOULD remain prepared to receive packets of the size of a previously advertised MTU. 441 The local MTU MAY be different for IPv4 and IPv6. The local MTU is the size used to calculate the value of the TCP MSS option. The HintMTU is set to undefined. 445 When sending Neighbor Solicitations and Neighbor Advertisements, a node includes its local MTU in the NodeMTU field of the NODEMTU option. If the size of the HintMTU is known, it is also included. 449 9.2. Probing 451 When a node starts communicating with a new IPv4 or IPv6 neighbor, the probing procedure is started.
This can happen when ARP [RFC0826] or Neighbor Discovery messages are exchanged, or when an incoming TCP SYN is received. 456 The node sends an MTUTEST packet to the new neighbor and sets the neighbor MTU to the safe MTU. The MTUTEST packet has the local MTU in the NodeMTU field. If a hint MTU is known, it is included in the HintMTU field. The R and B flags are set to 0. No padding is included. 462 Upon reception of a Neighbor Solicitation or a Neighbor Advertisement with the NODEMTU option or an MTUTEST packet, the node determines if the packet is received from a known neighbor IP address and a known neighbor link layer address. If the values match the values stored for a known neighbor, no action occurs. 468 If the values match the values for a known link layer address and IP version, but an unknown IP address, the IP address is added to the list of IP addresses for the neighbor in question and the known neighbor MTU for the neighbor is applied to the new address. 473 If the NodeMTU matches the NodeMTU previously sent by a known neighbor but the HintMTU has a different non-zero value, the HintMTU is updated. 477 If the HintMTU sent by a known neighbor is 0, the neighbor MTU is set to the safe MTU, the HintMTU for the neighbor is set to unknown and the probing procedure is started. 481 If the combination of link layer address and IP version is unknown, the neighbor MTU is set to the safe MTU, the HintMTU is set to the HintMTU value in the packet and the probing procedure is started. 485 Before starting the probing procedure, a node compares its link layer address to the neighbor's link layer address. If the node's link layer address is numerically larger than the neighbor's link layer address, the node applies a waiting period before starting the probing procedure. The waiting period SHOULD be at least 250 milliseconds and at most 1 second.
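As a non-normative illustration of the packet format from Section 6, the UDP payload of such a probe could be built and parsed like this (a Python sketch; socket handling, port 1022 and the Hop Limit of 255 are left to the caller):

```python
import os
import struct

MTUTEST_PORT = 1022
MAGIC = b"MTUT"  # differentiates MTUTEST packets from other port-1022 traffic

def build_mtutest(node_mtu, hint_mtu, size=0, reply=False, big=False, nonce=None):
    """Build the UDP payload of an MTUTEST packet, padded to 'size' bytes."""
    if nonce is None:
        nonce = int.from_bytes(os.urandom(3), "big")  # 24-bit hard-to-guess value
    flags = (0x80 if reply else 0) | (0x40 if big else 0)  # R and B flags
    payload = MAGIC + struct.pack("!B3sII", flags, nonce.to_bytes(3, "big"),
                                  node_mtu, hint_mtu)
    if size > len(payload):
        payload += b"\x00" * (size - len(payload))  # all-zero padding
    return nonce, payload

def parse_mtutest(payload):
    """Return the decoded fields, or None for packets that MUST be ignored."""
    if payload[:4] != MAGIC:
        return None
    flags, nonce3, node_mtu, hint_mtu = struct.unpack_from("!B3sII", payload, 4)
    return {"R": bool(flags & 0x80), "B": bool(flags & 0x40),
            "nonce": int.from_bytes(nonce3, "big"),
            "node_mtu": node_mtu, "hint_mtu": hint_mtu}
```

For the initial probe described above, a node would call build_mtutest with reply and big left at 0 and no padding; the receiver matches the nonce and source port before accepting a reply.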
492 The following is pseudo-code for a probing procedure. Note that it differs from the one outlined in [RFC4821]. The latter favors conservative probing because lost probes can't easily be differentiated from congestion losses, so lost probes are expensive. For this specification, probes merely cost bandwidth and losses are less problematic, so more aggressive probing and failing quickly is more appropriate. 500 Neighbor.ConfirmedTime = UNDEFINED 502 if LocalMTU > Neighbor.AdvertisedMTU 503 let Max = Neighbor.AdvertisedMTU 504 else 505 let Max = LocalMTU 507 # test with maximum supported packet size first 508 # and finish probing upon success 509 test (Max) 510 if Success: 511 Neighbor.MTU = Max 512 return 514 # maximum size doesn't work, now find 515 # what does work 516 # assumption: 256 works for IPv4, 1280 for IPv6 517 let WorksNo = Max 518 if IPv6: 519 let Neighbor.ConfirmedMTU = 1280 520 if IPv4: 521 let Neighbor.ConfirmedMTU = 256 523 # test with the hinted size 524 # if successful, this becomes the minimum for further tests 525 # if unsuccessful, this becomes the maximum 526 test (HintMTU) 527 if Success: 528 let Neighbor.ConfirmedMTU = HintMTU 529 else 530 let WorksNo = HintMTU 532 # test the smallest usable size larger than 533 # the standard MTU (if that size is still 534 # in the range to be tested) so we avoid wasting 535 # time probing non-jumbo-capable nodes 536 if (StandardMTU + 8 > Neighbor.ConfirmedMTU and \ 537 StandardMTU + 8 < WorksNo) 538 test (StandardMTU + 8) 539 if Success: 540 let Neighbor.ConfirmedMTU = StandardMTU + 8 541 else 542 let WorksNo = StandardMTU + 8 544 # to establish an upper bound quickly, 545 # test 320, 640, 1280, 2560, 5120, 10240, 20480, 40960, ...
546 let Current = 320 547 while (Current < WorksNo) 548 if (Current > Neighbor.ConfirmedMTU) 549 test (Current) 550 if Success: 551 let Neighbor.ConfirmedMTU = Current 552 else 553 let WorksNo = Current 554 let Current = Current * 2 556 # we have now established that 557 # WorksNo =< Neighbor.ConfirmedMTU * 2 559 # further testing is based on a list of hints. 560 # there SHOULD be a mechanism for administrators 561 # to add hints. 562 # 563 # hint sources: 564 # 576: common PPP low delay 565 # 1492: PPP over Ethernet [RFC2516] 566 # 1500: Ethernet II 567 # 1982: IEEE Std 802.3as-2006 568 # 2304: IEEE 802.11 569 # 2482: Fibre Channel over Ethernet (FCoE) 570 # [CATALYST]: 571 # 9216, 8092, 1600, 1998, 2000, 1546, 1530, 17976, 2018 572 # sizes observed by the author: 574 # 576, 1982, 4070, 9000, 16384, 64000 575 let Hints = 576, 1492, 1530, 1982, 2304, 4070, 8092, 9000, \ 576 16384, 32000, 64000 578 foreach Size in Hints 579 if Size > Neighbor.ConfirmedMTU and Size < WorksNo 580 test (Size) 581 if Success: 582 let Neighbor.ConfirmedMTU = Size 583 else 584 let WorksNo = Size 586 # finished testing, maximum working packet size 587 # is now known to within about a factor 1.5, 588 # depending on the number of hints 590 if Neighbor.ConfirmedTime <> UNDEFINED 591 # we got at least one probe back, use discovered MTU 592 Neighbor.MTU = Neighbor.ConfirmedMTU 593 else 594 # we never got any probes back, neighbor probably does 595 # not implement MTUTEST protocol, so we use the safe MTU 596 Neighbor.MTU = SafeMTU 598 # done! 
599 return 601 # sending probes 602 function test (Size) 604 # wait 20 milliseconds between sending probes 605 let MsecSinceProbe = now () - ProbeTime 607 if (MsecSinceProbe < 20) 608 sleep (20 - MsecSinceProbe) 610 # create probe, request reply (but not a big one) 611 let Probe.TTL = 255 612 let Probe.ReplyFlag = 1 613 Let Probe.BigFlag = 0 614 Let Nonce = rand () 615 Let Probe.Nonce = Nonce 616 let Probe.NodeMTU = LocalMTU 617 let Probe.HintMTU = HintMTU 618 let Probe.Padding = pad (Size - sizeof (Probe)) 619 send (Probe) 621 let ProbeTime = now () 622 # wait 2000 milliseconds for reply 623 # (this also avoids sending packets that are too large more 624 # than once every two seconds) 625 let Success = receive (Reply, 2000) 627 if not Success 628 return false 630 if not (Reply.TTL = 255 and Reply.Nonce = Nonce 631 and Reply.LinkAddress = Neighbor.LinkAddress) 632 return false 634 # valid reply received 635 # note that Neighbor.MTU is not updated yet, 636 # this happens after probing has finished 637 Neighbor.ConfirmedMTU = Reply.NodeMTU 638 Neighbor.ConfirmedTime = now () 639 Neighbor.HintMTU = Reply.HintMTU; 640 if HintMTU < Size 641 HintMTU = Size 642 return true 644 If at any time an unsolicited packet arrives from the neighbor and 645 the ConfirmedMTU of that neighbor is smaller than the size of the 646 packet received, the HintMTU for the neighbor is set to the size of 647 the received packet and a probe of that size may be sent. However, 648 as the maximum size of incoming packets may be different than the 649 maximum supported size of outgoing packets, reception of a large 650 packet is not sufficient to update the ConfirmedMTU. The packets 651 that update the HintMTU do not have to be MTUTEST protocol packets. 653 There are no retransmissions. Both nodes run the probing procedure, 654 so there are two opportunities to succeed. 
However, if both fail to determine the maximum packet size that can be used because of lost packets, the hosts will have to use a smaller packet size. 658 It is assumed that the maximum packet size that A can send to B is the same as the maximum packet size that B can send to A. As such, the reception of a large packet is treated the same as receiving an acknowledgment for a sent large packet. 663 9.3. Monitoring 665 Once a working neighbor MTU is found, large packets can be exchanged. Presumably, this situation will persist indefinitely. However, it is possible that the network is reconfigured and then no longer supports the MTU used between two nodes. The aim of the monitoring phase is to detect this when it happens and establish a working MTU value before sessions time out. 672 For each neighbor (as defined by a unique combination of link layer address and IP version) with a neighbor MTU larger than the safe MTU, the ability to successfully send or receive large packets is monitored. In the monitoring phase, a node tracks whether it sends any packets larger than the safe MTU to a neighbor and whether it receives either acknowledgments for those packets or packets of length neighbor MTU from that neighbor. (So acknowledged outgoing packets don't have to be the maximum size supported to/from the neighbor, but incoming packets do.) 682 The ability to track acknowledgment of non-MTUTEST packets is not required. However, it is expected that hosts will be able to do this for TCP packets because the TCP state is readily available. 686 Monitoring happens in intervals. This document RECOMMENDS that this interval is between 25 and 35 seconds for hosts and between 35 and 45 seconds for routers. At the end of each monitoring interval, if acknowledgments or large packets were received, everything is fine and the neighbor confirmed time is updated.
   At the end of a monitoring interval, if no large packets were
   sent, everything is fine and nothing happens.

   At the end of a monitoring interval, if large packets were sent
   but no acknowledgments or incoming maximum size packets were
   seen, there may have been a network reconfiguration that has made
   it impossible for large packets to be transmitted successfully
   between the two nodes.  To determine whether this is the case,
   the node sends an MTUTEST packet with length neighbor MTU.  The R
   flag is set to 1 and the B flag SHOULD be set to 0.  A random
   nonce, the local MTU and the hint MTU are included.

   The node waits 2 seconds for a reply.  If there is no reply, the
   probe is retransmitted and the node waits 4 seconds for a reply.
   If after 4 seconds there is still no reply, the node sets the
   hint MTU to 0 and reinitializes all of the neighbor's MTU-related
   information to its initial values.  Most notably, this means that
   the neighbor MTU is set to the safe MTU.

   If the node sets its own hint MTU to 0 or receives a hint MTU of
   0 from a neighbor in an ND or MTUTEST packet, the node MAY start
   sending probes to other neighbors before the monitoring interval
   expires.  However, nodes SHOULD limit the number of probes for
   all neighbors combined to no more than one every two seconds.  If
   a node has many neighbors and sending probes at one every two
   seconds would take too long, it MAY reset the neighbor MTUs of
   all of its neighbors to the safe MTU without sending probes if at
   least two neighbors appear to be affected by a reduction of the
   maximum working packet size.

9.4.  Neighbor MTU garbage collection

   The MTU size for a neighbor is garbage collected along with the
   neighbor's link address in accordance with regular ARP and
   neighbor discovery timeouts.  Additionally, a neighbor's MTU size
   is reset to unknown after dead neighbor detection declares a
   neighbor "dead".
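   The end-of-interval logic of the monitoring phase above can be
   summarized in a short sketch.  This is illustrative only and not
   part of the protocol; the function name and the returned action
   names are hypothetical.

```python
def end_of_monitoring_interval(sent_large: bool, confirmed: bool) -> str:
    """Decide what to do when a neighbor's monitoring interval ends.

    sent_large -- packets larger than the safe MTU were sent to the
                  neighbor during the interval
    confirmed  -- acknowledgments for large packets, or incoming
                  packets of length neighbor MTU, were seen
    """
    if confirmed:
        # Everything is fine: update the neighbor confirmed time.
        return "update-confirmed-time"
    if not sent_large:
        # No large packets were sent, so there is nothing to confirm.
        return "no-action"
    # Large packets went unconfirmed: send an MTUTEST probe of
    # length neighbor MTU with the R flag set to 1 and B set to 0.
    return "send-probe"
```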
10.  Applicability

   As discussed in Appendix B, all larger packets, but especially
   very large packet sizes, have the potential to be problematic in
   various ways.  However, jumboframes of 9000 or 9216 bytes have
   been supported by various vendors for a long time.  As such, MTUs
   of up to 9 kilobytes seem safe enough for larger scale
   experimentation at this time, but experiments with packet sizes
   larger than 11 kilobytes are best done in confined and closely
   monitored settings.

11.  IANA considerations

   IANA is requested to assign a neighbor discovery option type
   value.

   [TO BE REMOVED: This registration should take place at the
   following location:
   http://www.iana.org/assignments/icmpv6-parameters]

   UDP port 1022 is used in accordance with [RFC4727].  Presumably,
   unlike an ND option type value, a UDP port would be relatively
   easy to change when experimentation makes way for production
   deployment.

12.  Security considerations

   Generating false neighbor discovery and MTUTEST packets with
   large MTUs may lead to a denial-of-service condition, just like
   the advertisement of other false link parameters.  Requests are
   large and replies typically short to avoid the MTUTEST protocol
   being used as an amplification vector.  The nonce is used
   together with the ephemeral UDP port number to make sure that
   malicious nodes cannot generate a reply to a request in the
   blind.  Enforcement of the value 255 for the Hop Limit makes sure
   that off-link attackers can't use the protocol to influence
   packet sizes remotely.

   A malicious node may negotiate the use of large packets and cause
   head-of-line blocking, especially on slower links.  However, this
   can only happen if the neighbor is prepared to use large packets
   in the first place.

13.  Acknowledgements

   This document benefited from feedback by Dave Thaler, Jari Arkko,
   Joe Touch, Pat Thaler, David Black, Brian Carpenter, Fred
   Templin, Jeffrey Hammond, Mikael Abrahamsson and others.

14.  References

14.1.  Normative References

   [RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768,
              DOI 10.17487/RFC0768, August 1980.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, DOI 10.17487/RFC0793, September 1981.

   [RFC0826]  Plummer, D., "An Ethernet Address Resolution Protocol:
              Or Converting Network Protocol Addresses to 48.bit
              Ethernet Address for Transmission on Ethernet
              Hardware", STD 37, RFC 826, DOI 10.17487/RFC0826,
              November 1982.

   [RFC0894]  Hornig, C., "A Standard for the Transmission of IP
              Datagrams over Ethernet Networks", STD 41, RFC 894,
              DOI 10.17487/RFC0894, April 1984.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2464]  Crawford, M., "Transmission of IPv6 Packets over
              Ethernet Networks", RFC 2464, DOI 10.17487/RFC2464,
              December 1998.

   [RFC2675]  Borman, D., Deering, S., and R. Hinden, "IPv6
              Jumbograms", RFC 2675, DOI 10.17487/RFC2675,
              August 1999.

   [RFC2992]  Hopps, C., "Analysis of an Equal-Cost Multi-Path
              Algorithm", RFC 2992, DOI 10.17487/RFC2992,
              November 2000.

   [RFC4727]  Fenner, B., "Experimental Values In IPv4, IPv6,
              ICMPv4, ICMPv6, UDP, and TCP Headers", RFC 4727,
              DOI 10.17487/RFC4727, November 2006.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path
              MTU Discovery", RFC 4821, DOI 10.17487/RFC4821,
              March 2007.

   [RFC4861]  Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
              "Neighbor Discovery for IP version 6 (IPv6)",
              RFC 4861, DOI 10.17487/RFC4861, September 2007.

   [RFC6891]  Damas, J., Graff, M., and P. Vixie, "Extension
              Mechanisms for DNS (EDNS(0))", STD 75, RFC 6891,
              DOI 10.17487/RFC6891, April 2013.

   [ETHERNETII]
              Digital Equipment Corporation, Intel Corporation, and
              Xerox Corporation, "The Ethernet - A Local Area
              Network", September 1980.

14.2.  Informative References

   [RFC2516]  Mamakos, L., Lidl, K., Evarts, J., Carrel, D., Simone,
              D., and R. Wheeler, "A Method for Transmitting PPP
              Over Ethernet (PPPoE)", RFC 2516,
              DOI 10.17487/RFC2516, February 1999.

   [IEEE.802.3AS_2006]
              IEEE, "IEEE Standard for Information Technology
              Telecommunications and Information Exchange Between
              Systems Local and Metropolitan Area Networks Specific
              Requirements Part 3: Carrier Sense Multiple Access
              With Collision Detection (CSMA/CD) Access Method and
              Physical Layer Specifications Amendment 3: Frame
              Format Extensions", IEEE 802.3as-2006,
              DOI 10.1109/ieeestd.2006.248146, November 2006.

   [IEEE.802.3_2012]
              IEEE, "802.3-2012", IEEE 802.3-2012,
              DOI 10.1109/ieeestd.2012.6419735, January 2013.

   [CRC]      Jain, R., "Error Characteristics of Fiber Distributed
              Data Interface (FDDI)", IEEE Transactions on
              Communications, August 1990.

   [CATALYST] Cisco, "Jumbo/Giant Frame Support on Catalyst Switches
              Configuration Example".

Appendix A.  Document and discussion information

   The latest version of this document will always be available at
   http://www.muada.com/drafts/.  Please direct questions and
   comments to the int-area mailing list or directly to the author.

Appendix B.  Advantages and disadvantages of larger packets

   Although often desirable, the use of larger packets isn't
   universally advantageous, for the following reasons:

   1.  Clock skew

   2.  ECMP over paths with different MTUs

   3.  Increased delay and jitter

   4.  Increased reliance on path MTU discovery

   5.  Increased packet loss through bit errors

   6.  Increased risk of undetected bit errors

B.1.  Clock skew

   Ethernet hardware has to compensate for clocking differences
   between the sender and the receiver through a FIFO buffer.  As
   packets get larger, more buffer capacity is required.  This
   places a limit on packet sizes.

   As jumboframes have been widely supported since the introduction
   of Gigabit Ethernet, and in the absence of information to the
   contrary, it seems safe to assume that the packet sizes that may
   be set administratively fall within the capabilities of the
   hardware.  Administrators are encouraged to monitor the fraction
   of packets lost to different types of corruption and adjust MTU
   sizes accordingly.

B.2.  ECMP over paths with different MTUs

   Should Equal Cost Multipath [RFC2992] be in effect between two
   nodes implementing this specification, with the different paths
   having different MTUs, then there is a high risk that probing
   will detect the larger of the supported MTU sizes while some data
   packets flow over the path with the smaller MTU size.  In this
   situation, packets will be lost consistently and the protocol
   will not be able to recover.

   As such, configuring paths used for ECMP with different MTU sizes
   MUST be avoided.

B.3.  Delay and jitter

   On low-bandwidth links, the additional time it takes to transmit
   larger packets may lead to unacceptable delays.  For instance,
   transmitting a 9000-byte packet takes 7.23 milliseconds at 10
   Mbps, while transmitting a 1500-byte packet takes only 1.23 ms.
   Once transmission of a packet has started, additional traffic
   must wait for the transmission to finish, so a larger maximum
   packet size immediately leads to a higher worst-case head-of-line
   blocking delay, and thus, to a bigger difference between the best
   and worst cases (jitter).
   The increase in average delay depends on the number of packets
   that are buffered, the average packet size and the queuing
   strategy in use.  Buffer sizes vary greatly between
   implementations, from only a few buffers in some switches and on
   low-speed interfaces in routers, to hundreds of megabytes of
   buffer space on 10 Gbps interfaces in some routers.

   If we assume that the delays involved with 1500-byte packets on
   100 Mbps Ethernet are acceptable for most, if not all,
   applications, then the conclusion must be that 15000-byte packets
   on 1 Gbps Ethernet should also be acceptable, as the delay is the
   same.  On 10 Gbps Ethernet, much larger packet sizes could be
   accommodated without adverse impact on delay-sensitive
   applications.  Below 100 Mbps, larger packet sizes are probably
   not advisable.

   When very tight QoS bounds are required, it may be appropriate to
   limit MTU sizes and forego larger MTUs.  With IPv6 this can be
   accomplished by advertising a limited MTU size in Router
   Advertisements.  With IPv4, it is necessary to configure each
   node to limit its MTU size.

B.4.  Path MTU Discovery problems

   PMTUD issues arise when routers can't fragment packets in transit
   because the DF bit is set or because the packet is IPv6, but the
   packet is too large to be forwarded over the next link, and the
   resulting "packet too big" ICMP messages from the router don't
   make it back to the sending host.  If there is a PMTUD black
   hole, this will typically happen when there is an MTU bottleneck
   somewhere in the middle of the path.  If the MTU bottleneck is
   located at either end, the TCP MSS (maximum segment size) option
   makes sure that TCP packets conform to the smallest MTU in the
   path.
   PMTUD problems are of course possible with non-TCP protocols, but
   this is rare in practice because non-TCP protocols are generally
   not capable of adjusting their packet size on the fly and
   therefore use more conservative packet sizes that won't trigger
   PMTUD issues.

   Taking the delay and jitter issues to heart, maximum packet sizes
   should be larger for faster links and smaller for slower links.
   This means that in the majority of cases, the MTU bottleneck will
   tend to be at, or close to, one of the ends of a path rather than
   somewhere in the middle: in today's internet, the core of the
   network is quite fast, while users usually connect to the core at
   lower speeds.

   A crucial difference between PMTUD problems that result from MTUs
   smaller than the de facto standard 1500 bytes and PMTUD problems
   that result from MTUs larger than 1500 bytes is that in the
   latter case, only the party that's actually using the
   non-standard MTU is affected.  This puts the potential problems,
   the potential benefits and the ability to solve any resulting
   problems in the same place: it's always possible to revert to a
   1500-byte MTU if PMTUD problems can't be resolved otherwise.

   Considering the above and the work that's going on in the IETF to
   resolve PMTUD issues as they exist today, increasing MTUs where
   desired doesn't seem to involve undue risks.

B.5.  Packet loss through bit errors

   All transmission media are subject to bit errors.  In many cases,
   a bit error leads to a CRC failure, after which the packet is
   lost.  In other cases, packets are retransmitted a number of
   times, but if error conditions are severe, packets may still be
   lost because an error occurred at every try.  Using larger
   packets means that the chance of a packet being lost due to
   errors increases.  And when a packet is lost, more data has to be
   retransmitted.
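   To put rough numbers on the loss increase, assume independent bit
   errors: the probability that a packet of a given size is hit by
   at least one error is 1 - (1 - BER)^bits.  The sketch below is
   illustrative only; it uses the 1000BASE-T maximum BER of 10^-10
   that appears later in this appendix.

```python
def packet_error_probability(size_bytes: int, ber: float) -> float:
    """Probability that at least one bit of the packet is corrupted,
    assuming independent bit errors at rate ber."""
    bits = 8 * size_bytes
    return 1.0 - (1.0 - ber) ** bits

BER = 1e-10  # maximum bit error rate specified for 1000BASE-T

p_1500 = packet_error_probability(1500, BER)  # ~1.2e-6
p_9000 = packet_error_probability(9000, BER)  # ~7.2e-6
```

   A 9000-byte packet is thus about six times as likely to be lost
   as a 1500-byte packet, and each loss also forces six times as
   much data to be retransmitted.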
   Both per-packet overhead and loss through errors reduce the
   amount of usable data transferred.  The optimum tradeoff is
   reached when both types of loss are equal.  If we make the
   simplifying assumption that the packet loss probability is linear
   with packet size for reasonable bit error rates, setting the
   overhead fraction (overhead bytes divided by packet size) equal
   to the loss fraction (8 times the packet size in bytes times the
   bit error rate) gives the optimum packet size in bytes:

      packet size = sqrt( overhead bytes / ( 8 * bit error rate ) )

   According to this, the optimum packet size is one or more orders
   of magnitude larger than what's commonly used today.  For
   instance, the maximum BER for 1000BASE-T is 10^-10, which implies
   an optimum packet size of 312250 bytes with Ethernet framing and
   IP overhead.

B.6.  Undetected bit errors

   Nearly all link layers employ some kind of checksum to detect bit
   errors so that packets with errors can be discarded.  In the case
   of Ethernet, this is a frame check sequence in the form of a
   32-bit CRC.  Assuming a strong frame check sequence algorithm, a
   32-bit checksum suggests that there is a 1 in 2^32 chance that a
   packet with one or more bit errors has the same checksum as the
   original packet, so that the bit errors go undetected and data is
   corrupted.  However, according to [CRC], the CRC-32 used for FDDI
   and Ethernet has the property that packets between 375 and 11453
   bytes long (inclusive) have a Hamming distance of 4.  (Smaller
   packets have a larger Hamming distance, larger packets a smaller
   one.)  As a result, all errors where only one, two or three bits
   are flipped will be detected, because they can't result in the
   same CRC as the original packet.
   The probability of a packet having undetected bit errors can be
   approximated as follows for a 32-bit CRC:

      PER = (PL * BER) ^ H / 2^32

   Where PER is the packet error rate, BER is the bit error rate, PL
   is the packet length in bits and H is the Hamming distance.
   Another consideration is the impact of packet length on a
   multi-packet transmission of a given size.  This would be:

      TER = transmission length / PL * PER

   So:

      TER = transmission length * PL ^ (H - 1) * BER ^ H / 2^32

   Where TER is the transmission error rate.

   In the case of the Ethernet FCS and a Hamming distance of 4 for a
   large range of packet sizes, this means that the risk of
   undetected errors goes up with the cube of the packet length, but
   goes down with the fourth power of the bit error rate.  This
   suggests that for a given acceptable risk of undetected errors, a
   maximum packet size can be calculated from the expected bit error
   rate.  It also suggests that given the low bit error rates
   mandated for Gigabit Ethernet, packet sizes of up to 11453 bytes
   should be acceptable.

   Additionally, unlike properties such as the packet length, the
   frame check sequence can be made dependent on the physical media,
   so it should be possible to define a stronger FCS in future
   Ethernet standards, or to negotiate a stronger FCS between two
   stations on a point-to-point Ethernet link (i.e., a host and a
   switch or a router and a switch).

B.7.  Interaction with TCP congestion control

   TCP throughput is inversely proportional to the square root of
   the packet loss probability, and proportional to the segment
   size.  Using larger and thus fewer packets is therefore a
   competitive advantage.  Larger packets increase burstiness, which
   can be problematic in some circumstances.  Larger packets also
   allow TCP to ramp up its transmission speed faster, which is
   helpful on fast links, where large packets will be more common.
   In general, it would seem advantageous for an individual user to
   use larger packets, but under some circumstances, users using
   smaller packets may be put at a slight disadvantage.

B.8.  IEEE 802.3 compatibility

   According to the IEEE 802.3 standard ([IEEE.802.3_2012]), the
   field following the Ethernet addresses is a length field.
   However, [RFC0894] uses this field as a type field.  Ambiguity is
   largely avoided by numbering type codes from 1536 upward, above
   any valid length value.  The mechanisms described in this memo
   only apply to the standard [RFC0894] and [RFC2464] encapsulation
   of IPv4 and IPv6 in Ethernet, not to possible encapsulations of
   IPv4 or IPv6 in IEEE 802.3/IEEE 802.2 frames, so there is no
   change to the current use of the Ethernet length/type field.

   The 2006 revision of IEEE 802.3 ([IEEE.802.3AS_2006]) adds "frame
   expansion" to 2000 bytes (allowing for 1982-byte IP packets).  As
   a result, layer 2 networks supporting MTUs of 1982 bytes are
   becoming more common.  However, as [RFC0894] and [RFC2464]
   (encapsulation of IPv4 and IPv6 in Ethernet) are based on
   [ETHERNETII], the IEEE 802.3 standard has little bearing on the
   problem at hand.

B.9.  Conclusion

   Larger packets aren't universally desirable.  The factors that
   play into the decision to use larger packets include:

   o  A link's bit error rate

   o  The number of bits per symbol on a link, and hence the
      likelihood of multiple bit errors in a single packet

   o  The strength of the frame check sequence

   o  The link speed

   o  The number of buffers

   o  The queuing strategy

   o  The number of sessions on shared links and paths

   This means that choosing a good maximum packet size is, initially
   at least, the responsibility of hardware builders.  A
   conservative approach may be called for, but even under
   conservative assumptions, 9000-byte jumboframes on Gigabit
   Ethernet links seem reasonable.
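   The formulas in Appendices B.5 and B.6 can be checked
   numerically.  The sketch below is illustrative; the 78-byte
   overhead figure (Ethernet framing plus IPv4 and TCP headers) is
   an assumption, not a value given in the text, and the factor of 8
   converts bytes to bits.

```python
import math

def optimum_packet_size(overhead_bytes: float, ber: float) -> float:
    """B.5: packet size at which overhead loss equals error loss."""
    return math.sqrt(overhead_bytes / (8 * ber))

def packet_error_rate(pl_bits: float, ber: float, hamming: int = 4) -> float:
    """B.6: approximate undetected-error rate behind a 32-bit CRC."""
    return (pl_bits * ber) ** hamming / 2 ** 32

# 1000BASE-T maximum BER with assumed 78 bytes of total overhead:
size = optimum_packet_size(78, 1e-10)     # ~312250 bytes, as in B.5

# Undetected-error rate for a 9000-byte packet at the same BER:
per = packet_error_rate(8 * 9000, 1e-10)  # vanishingly small
```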
Author's Address

   Iljitsch van Beijnum
   Institute IMDEA Networks
   Avda. del Mar Mediterraneo, 22
   Leganes, Madrid  28918
   Spain

   Email: iljitsch@muada.com