< draft-van-beijnum-multi-mtu-00.txt   draft-van-beijnum-multi-mtu-01.txt >
Network Working Group I. van Beijnum Network Working Group I. van Beijnum
Internet-Draft Consultant Internet-Draft Consultant
Expires: December 29, 2007 June 29, 2007 Expires: February 29, 2008 August 29, 2007
IPv6 Extensions for Multi-MTU Subnets IPv6 Extensions for Multi-MTU Subnets
draft-van-beijnum-multi-mtu-00 draft-van-beijnum-multi-mtu-01
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 33 skipping to change at page 1, line 33
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 29, 2007. This Internet-Draft will expire on February 28, 2008.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2007). Copyright (C) The IETF Trust (2007).
Abstract Abstract
In the early days of the internet, many different link types with many In the early days of the internet, many different link types with many
different maximum packet sizes were in use. For point-to-point or different maximum packet sizes were in use. For point-to-point or
point-to-multipoint links, there are still some other link types (PPP, point-to-multipoint links, there are still some other link types (PPP,
ATM, Packet over SONET), but shared subnets are almost exclusively ATM, Packet over SONET), but shared subnets are almost exclusively
implemented as ethernets. Even though the relevant standards mandate a implemented as ethernets. Even though the relevant standards mandate a
1500 byte maximum packet size for ethernet, more and more ethernet 1500 octet maximum packet size for ethernet, more and more ethernet
equipment is capable of handling packets bigger than 1500 bytes. equipment is capable of handling packets bigger than 1500 octets.
However, since this capability isn't standardized, it's seldom used However, since this capability isn't standardized, it's seldom used
today, despite the potential performance benefits of using larger today, despite the potential performance benefits of using larger
packets. This document specifies a mechanism to negotiate per-neighbor packets. This document specifies a mechanism to negotiate per-neighbor
maximum packet sizes so that nodes on a shared subnet may use the maximum packet sizes so that nodes on a shared subnet may use the
maximum mutually supported packet size between them without being maximum mutually supported packet size between them without being
limited by nodes with smaller maximum sizes on the same subnet. limited by nodes with smaller maximum sizes on the same subnet.
1 Introduction 1 Introduction
Some protocols inherently generate small packets. Examples are VoIP, Some protocols inherently generate small packets. Examples are VoIP,
where it's necessary to send packets frequently before much data can where it's necessary to send packets frequently before much data can
be gathered to fill up the packet, and the DNS, where the queries are be gathered to fill up the packet, and the DNS, where the queries are
inherently small and the returned results also rarely fill up a full inherently small and the returned results also rarely fill up a full
1500-byte packet. However, most data that is transferred across the 1500-octet packet. However, most data that is transferred across the
internet and private networks is at least several kilobytes in size internet and private networks is at least several kilobytes in size
(often much larger) and requires segmentation by TCP or another (often much larger) and requires segmentation by TCP or another
transport protocol. These types of data transfer can benefit from transport protocol. These types of data transfer can benefit from
larger packets in several ways: larger packets in several ways:
1. A higher data-to-header ratio makes for fewer overhead bytes 1. A higher data-to-header ratio makes for fewer overhead bytes
2. Fewer packets means fewer per-packet operations on the source and 2. Fewer packets means fewer per-packet operations on the source and
destination hosts destination hosts
3. Fewer packets also means fewer per-packet operations in routers and 3. Fewer packets also means fewer per-packet operations in routers and
middleboxes middleboxes
4. TCP performance tends to increase with larger packet sizes 4. TCP performance tends to increase with larger packet sizes
Even though today, the capability to use larger packets (often called Even though today, the capability to use larger packets (often called
jumboframes) is present in a lot of ethernet hardware, this capability jumbo frames) is present in a lot of ethernet hardware, this
isn't used because IP assumes a common MTU size for all nodes capability isn't used because IP assumes a common MTU size for all
connected to a link or subnet. In practice, this means that using a nodes connected to a link or subnet. In practice, this means that
larger MTU requires manual configuration of the non-standard MTU using a larger MTU requires manual configuration of the
size on all hosts and routers and possibly on switches. Also, the MTU non-standard MTU size on all hosts and routers and possibly on
size for a subnet is limited to that of the least capable router, host switches. Also, the MTU size for a subnet is limited to that of
or switch. the least capable router, host or switch.
This document proposes to end this situation using several new IPv6 This document proposes to end this situation using several new
options and messages: options and messages:
1. An additional router advertisement MTU option to limit higher 1. An additional router advertisement MTU option to limit higher
maximum packet sizes maximum packet sizes
2. A new switch advertisement message, similar to a router 2. A neighbor discovery option that allows nodes to inform their
advertisement message, so that switches can announce the maximum
packet size they support
3. A neighbor discovery option that allows nodes to inform their
neighbors of the maximum packet size they support neighbors of the maximum packet size they support
4. A new ICMPv6 message for confirming that packets with an increased
maximum size can be transmitted and received successfully
Nodes running IPv6 may take advantage of these mechanisms to send 3. A neighbor discovery option for padding messages to make them
packets larger than the standard maximum size. Since IPv4 doesn't suitable for probing a neighbor's MTU and link-layer MTU
support equivalent mechanisms, support for IPv4 requires additional limitations
work that is best carried out after deployment experience with IPv6. 4. Padding for ARP messages to make them suitable for probing a
neighbor's MTU and link-layer MTU limitations
2 Terminology 2 Terminology
Local MTU:
The maximum packet size considered usable on an interface,
based on the physical MTU, the MTU advertised by routers and
administrative settings.
MTU: MTU:
Maximum Transmission Unit. This is the maximum IP packet size in Maximum Transmission Unit. This is the maximum IP packet size in
bytes supported on a link, towards a neighbor or towards a remote octets supported on a link, towards a neighbor or towards a remote
correspondent. In some cases, the term MRU (maximum receive unit) correspondent. In some cases, the term MRU (maximum receive unit)
would be more appropriate, but for consistency, the term MTU is would be more appropriate, but for consistency, the term MTU is
used throughout this document. used throughout this document.
Advised MTU:
The MTU that is considered the best or safe choice at a given time
on a given link.
Allowed MTU:
The maximum MTU allowed administratively.
Local MTU:
The maximum packet size considered usable on a node, based on the
physical MTU, the allowed MTU and advised MTUs.
Neighbor MTU: Neighbor MTU:
The maximum packet size that may be used towards a given on-link The maximum packet size that may be used towards a given
neighbor. on-link neighbor.
Off-link MTU: Node:
The maximum packet size that is appropriate for communicating with A host or router running IPv4 or IPv6.
off-link correspondents.
Oversized packet:
A packet exceeding the size defined in the relevant
IPv6-over-... or IP-over-... RFC.
Physical MTU: Physical MTU:
The MTU reported by the driver for an interface when operating at The MTU reported by the driver for an interface when operating at
a given link speed. a given link speed.
Tentative neighbor MTU: Probe:
The maximum packet size advertised by a neighbor. An ARP or neighbor solicitation packet of a specific (oversized)
size sent for the purpose of determining whether a neighbor can
successfully receive packets of this size sent by the local node.
3 Disadvantages of larger packets 3 Disadvantages of larger packets
Although often desirable, the use of larger packets isn't universally Although often desirable, the use of larger packets isn't universally
advantageous for the following reasons: advantageous for the following reasons:
1. Increased delay and jitter 1. Increased delay and jitter
2. Increased reliance on path MTU discovery 2. Increased reliance on path MTU discovery
3. Increased packet loss through bit errors 3. Increased packet loss through bit errors
4. Increased risk of undetected bit errors 4. Increased risk of undetected bit errors
skipping to change at page 4, line 9 skipping to change at page 4, line 4
3.1 Delay and jitter 3.1 Delay and jitter
On low-bandwidth links, the additional time it takes to transmit On low-bandwidth links, the additional time it takes to transmit
larger packets may lead to unacceptable delays. For instance, larger packets may lead to unacceptable delays. For instance,
transmitting a 9000-byte packet takes 7.23 milliseconds at 10 Mbps, transmitting a 9000-octet packet takes 7.23 milliseconds at 10 Mbps,
while transmitting a 1500-byte packet takes only 1.23 ms. Once while transmitting a 1500-octet packet takes only 1.23 ms. Once
transmission of a packet has started, additional traffic must wait for transmission of a packet has started, additional traffic must wait for
the transmission to finish, so a larger maximum packet size the transmission to finish, so a larger maximum packet size
immediately leads to a higher worst-case head-of-line blocking delay, immediately leads to a higher worst-case head-of-line blocking delay,
and as such, to a bigger difference between the best and worst cases and as such, to a bigger difference between the best and worst cases
(jitter). The increase in average delay depends on the number of (jitter). The increase in average delay depends on the number of
packets that are buffered, the average packet size and the queuing packets that are buffered, the average packet size and the queuing
strategy in use. Buffer sizes vary greatly, but assuming 40 buffers strategy in use. Buffer sizes vary greatly, but assuming 40 buffers
(not uncommon) leads to the following results: (not uncommon) leads to the following results:
Speed 500 1500 4500 9000 16384 65535 Speed 500 1500 4500 9000 16384 65535
10 Mbps 17.22 49.21 145.22 289.22 525.50 2098.34 10 Mbps 17.22 49.21 145.22 289.22 525.50 2098.34
100 Mbps 1.72 4.92 14.52 28.92 52.55 209.83 100 Mbps 1.72 4.92 14.52 28.92 52.55 209.83
1 Gbps 0.17 0.49 1.45 2.89 5.26 20.98 1 Gbps 0.17 0.49 1.45 2.89 5.26 20.98
10 Gbps 0.02 0.05 0.15 0.29 0.52 2.01 10 Gbps 0.02 0.05 0.15 0.29 0.52 2.01
In milliseconds and counting 38 additional bytes of ethernet overhead. In milliseconds and counting 38 additional octets of ethernet
overhead.
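The figures in the table can be reproduced with a short calculation. The following is a minimal sketch, assuming that the delay is simply the time needed to serialize 40 full-sized packets plus 38 octets of ethernet overhead each; the function and constant names are illustrative only and don't appear in the draft.

```python
# Sketch: worst-case queuing delay for a number of buffered full-sized packets.
# Names are illustrative assumptions, not part of the draft.

ETHERNET_OVERHEAD_OCTETS = 38   # per-packet overhead counted in the table
BUFFERED_PACKETS = 40           # buffer depth assumed in the text


def queue_delay_ms(packet_octets, link_bps,
                   buffers=BUFFERED_PACKETS,
                   overhead_octets=ETHERNET_OVERHEAD_OCTETS):
    """Time in milliseconds to transmit `buffers` packets of `packet_octets`."""
    bits_on_wire = (packet_octets + overhead_octets) * 8 * buffers
    return bits_on_wire / link_bps * 1000.0


if __name__ == "__main__":
    for size in (500, 1500, 4500, 9000):
        print(size, round(queue_delay_ms(size, 100e6), 2))
```

Setting buffers to 1 gives the single-packet transmission times quoted at the start of this section.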
If we assume that the delays involved with 1500-byte packets on 100 If we assume that the delays involved with 1500-octet packets on 100
Mbps ethernet are acceptable for most, if not all, applications, then Mbps ethernet are acceptable for most, if not all, applications, then
the conclusion must be that 9000-byte packets on 1 Gbps ethernet the conclusion must be that 9000-octet packets on 1 Gbps ethernet
should also be acceptable. At 10 Gbps ethernet, much larger packet should also be acceptable. At 10 Gbps ethernet, much larger packet
sizes could be accommodated without adverse impact on delay-sensitive sizes could be accommodated without adverse impact on delay-sensitive
applications. Below 100 Mbps, larger packet sizes are probably not applications. Below 100 Mbps, larger packet sizes are probably not
advisable. advisable.
3.2 Path MTU Discovery problems 3.2 Path MTU Discovery problems
PMTUD issues arise when routers can't fragment packets in transit PMTUD issues arise when routers can't fragment packets in transit
because the DF bit is set or because the packet is IPv6, but the because the DF bit is set or because the packet is IPv6, but the
packet is too large to be forwarded over the next link, and the packet is too large to be forwarded over the next link, and the
skipping to change at page 5, line 11 skipping to change at page 5, line 8
size) option makes sure that TCP packets conform to the limited MTU. size) option makes sure that TCP packets conform to the limited MTU.
PMTUD problems are of course possible with non-TCP protocols, but this PMTUD problems are of course possible with non-TCP protocols, but this
is rare in practice. is rare in practice.
Taking the delay and jitter issues to heart, maximum packet sizes Taking the delay and jitter issues to heart, maximum packet sizes
should be larger for faster links. This means that in the majority of should be larger for faster links. This means that in the majority of
cases, the MTU bottleneck will tend to be at one of the ends of a cases, the MTU bottleneck will tend to be at one of the ends of a
path, rather than somewhere in the middle. path, rather than somewhere in the middle.
A crucial difference between PMTUD problems that result from MTUs A crucial difference between PMTUD problems that result from MTUs
smaller than the standard 1500 bytes and PMTUD problems that result smaller than the standard 1500 octets and PMTUD problems that result
from MTUs larger than the standard 1500 bytes is that in the latter from MTUs larger than the standard 1500 octets is that in the latter
case, only a party that's actually using the non-standard MTU is case, only a party that's actually using the non-standard MTU is
affected. This puts potential problems and potential benefits in the affected. This puts potential problems and potential benefits in the
same place so it's always possible to revert to a 1500-byte MTU if same place so it's always possible to revert to a 1500-octet MTU if
PMTUD problems can't be resolved otherwise. PMTUD problems can't be resolved otherwise.
Given the above and the work that's going on in the IETF to Given the above and the work that's going on in the IETF to
resolve PMTUD issues as they exist today, increasing MTUs resolve PMTUD issues as they exist today, increasing MTUs
where desired doesn't involve undue risks. where desired doesn't involve undue risks.
3.3 Packet loss through bit errors 3.3 Packet loss through bit errors
All transmission media are subject to bit errors. In many cases, a bit All transmission media are subject to bit errors. In many cases, a bit
error leads to a CRC failure, after which the packet is lost. In other error leads to a CRC failure, after which the packet is lost. In other
skipping to change at page 5, line 39 skipping to change at page 5, line 36
packet being lost due to errors increases. And when a packet is lost, packet being lost due to errors increases. And when a packet is lost,
more data has to be retransmitted. more data has to be retransmitted.
Both per-packet overhead and loss through errors reduce the amount of Both per-packet overhead and loss through errors reduce the amount of
usable data transferred. The optimum tradeoff is reached when both usable data transferred. The optimum tradeoff is reached when both
types of loss are equal. If we make the simplifying assumption that types of loss are equal. If we make the simplifying assumption that
the relationship between the bit error rate of a medium and the the relationship between the bit error rate of a medium and the
resulting number of lost packets is linear with packet size, the resulting number of lost packets is linear with packet size, the
optimum packet size is computed as follows: optimum packet size is computed as follows:
packet size = sqrt(overhead bytes / bit error rate) packet size = sqrt(overhead octets / bit error rate)
For IPv6 in ethernet framing, with 14 bytes of ethernet header, 40 For IPv6 in ethernet framing, with 14 octets of ethernet header, 40
bytes of IPv6 header, 20 bytes of TCP header and 32 bits of ethernet octets of IPv6 header, 20 octets of TCP header and 32 bits of ethernet
CRC the total number of bytes transmitted is 1538 while the useful CRC the total number of octets transmitted is 1538 while the useful
data is 1440. (The preamble and inter frame gap are not relevant for data is 1440. (The preamble and inter frame gap are not relevant for
error rate purposes.) 78 bytes of overhead would result in a 1518-byte error rate purposes.) 78 octets of overhead would result in a
frame length for a bit error rate of 10^-5.3. 1518-octet frame length for a bit error rate of 10^-5.3.
Note that the minimum BER for 1000BASE-T is 10^-10, which implies an Note that the minimum BER for 1000BASE-T is 10^-10, which implies an
optimum packet size of 312250 bytes. optimum packet size of 312250 octets.
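As a sketch of the arithmetic, the 312250 figure is reproduced when the 78 octets of overhead enter the formula as 78 / 8 = 9.75. That scaling is inferred from the stated result and is not spelled out in the text, so it should be read as an assumption; the names below are illustrative.

```python
import math

# 14 (ethernet header) + 40 (IPv6) + 20 (TCP) + 4 (CRC) octets of overhead
OVERHEAD_OCTETS = 14 + 40 + 20 + 4


def optimum_packet_size(overhead, bit_error_rate):
    """packet size = sqrt(overhead / bit error rate), as given in the text."""
    return math.sqrt(overhead / bit_error_rate)


# Assumption: the overhead term is scaled by 1/8, which is the reading that
# matches the 1000BASE-T figure quoted in the text.
size_1000base_t = optimum_packet_size(OVERHEAD_OCTETS / 8, 1e-10)

if __name__ == "__main__":
    print(round(size_1000base_t))
```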
In practice, it's better to err on the side of smaller packets and In practice, it's better to err on the side of smaller packets and
lower packet loss to avoid triggering TCP congestion mechanisms. lower packet loss to avoid triggering TCP congestion mechanisms.
However, it's obvious that current maximum packet sizes are far below However, it's obvious that current maximum packet sizes are far below
the optimum size with respect to optimum throughput. the optimum size with respect to optimum throughput.
3.4 Undetected bit errors 3.4 Undetected bit errors
Nearly all link layers employ some kind of checksum to detect bit Nearly all link layers employ some kind of checksum to detect bit
errors so that packets with errors can be discarded. In the case of errors so that packets with errors can be discarded. In the case of
ethernet, this is a frame check sequence in the form of a 32-bit CRC. ethernet, this is a frame check sequence in the form of a 32-bit CRC.
The error detecting properties of the CRC are twofold: the minimum The error detecting properties of the CRC are twofold: the minimum
Hamming distance and the statistical unlikeliness of two packets Hamming distance and the statistical unlikeliness of two packets
resulting in the same CRC. Depending on the size of the packet, there resulting in the same CRC. Depending on the size of the packet, there
is a minimum Hamming distance between two possible packets that result is a minimum Hamming distance between two possible packets that result
in the same CRC. For ethernet packets between 376 and 11454 bytes long in the same CRC. For ethernet packets between 376 and 11454 octets
(inclusive), the Hamming distance is 3 [CRC]. So all packets where long (inclusive), the Hamming distance is 3 [CRC]. So all packets
transmission errors resulted in one or two flipped bits are detected. where transmission errors resulted in one or two flipped bits are
If 3 or more bits are flipped, most errors are caught because only in detected. If 3 or more bits are flipped, most errors are caught
very few cases, the new bit pattern results in the same CRC as the old because only in very few cases, the new bit pattern results in the
bit pattern. In theory, the chance of two packets having the same same CRC as the old bit pattern. In theory, the chance of two
CRC-32 is 1 in 2^32, but this assumes the CRC is as strong as it packets having the same CRC-32 is 1 in 2^32, but this assumes the
possibly could be. CRC is as strong as it possibly could be.
It has been suggested that increasing packet lengths reduce the It has been suggested that increasing packet lengths reduce the
effectiveness of the CRC-32. For the statistical aspect of the CRC, effectiveness of the CRC-32. For the statistical aspect of the CRC,
this isn't true. Again, assuming a linear relationship between the this isn't true. Again, assuming a linear relationship between the
likelihood of bit errors in a packet and the bit error rate, doubling likelihood of bit errors in a packet and the bit error rate, doubling
the packet size means doubling the chance of a given number of bit the packet size means doubling the chance of a given number of bit
errors in the packet. In turn, this doubles the chance of a packet errors in the packet. In turn, this doubles the chance of a packet
with bit errors going undetected by the CRC. However, because the with bit errors going undetected by the CRC. However, because the
packet is twice as long, only half the number of packets is required packet is twice as long, only half the number of packets is required
to transmit any given amount of data. These aspects cancel each other to transmit any given amount of data. These aspects cancel each other
skipping to change at page 7, line 4 skipping to change at page 6, line 50
having enough bit errors to satisfy a given Hamming distance (packet having enough bit errors to satisfy a given Hamming distance (packet
error rate) and then generate the same CRC is: error rate) and then generate the same CRC is:
PER = (packet length in bits * BER) ^ H / 2^32 PER = (packet length in bits * BER) ^ H / 2^32
The likelihood of a packet with enough bit errors to meet the Hamming The likelihood of a packet with enough bit errors to meet the Hamming
distance and then generate an identical CRC in a transmission of a distance and then generate an identical CRC in a transmission of a
certain number of bits is: certain number of bits is:
TER = transmission length / packet length * PER TER = transmission length / packet length * PER
In other words: In other words:
TER = transmission length * packet length ^ (H - 1) * BER ^ H / 2^32 TER = transmission length * packet length ^ (H - 1) * BER ^ H / 2^32
(Hence the irrelevance of the packet length for a Hamming distance of (Hence the irrelevance of the packet length for a Hamming distance of
1.) 1.)
For a 400 GB (approximately one hour) transmission over 1000BASE-T For a 400 GB (approximately one hour) transmission over 1000BASE-T
with a BER of 10^-10 and a 1518-byte ethernet frame length this means: with a BER of 10^-10 and a 1518-octet ethernet frame length this
means:
TER = 3.44*10^12 * 12144 ^ 2 * 10^-10 ^ 3 / 2^32 = 1.18*10^-19 TER = 3.44*10^12 * 12144 ^ 2 * 10^-10 ^ 3 / 2^32 = 1.18*10^-19
For 11454-byte packets this becomes: For 11454-octet packets this becomes:
TER = 3.44*10^12 * 91632 ^ 2 * 10^-10 ^ 3 / 2^32 = 6.73*10^-18 TER = 3.44*10^12 * 91632 ^ 2 * 10^-10 ^ 3 / 2^32 = 6.73*10^-18
Please note that this is 14 orders of magnitude better than the naive Please note that this is 14 orders of magnitude better than the naive
assumption of a Hamming distance of 1 suggests for standard 1518-byte assumption of a Hamming distance of 1 suggests for standard 1518-octet
ethernet frames: ethernet frames:
TER = 3.44*10^12 * 12144 ^ 0 * 10^-10 ^ 1 / 2^32 = 9.73*10^-4 TER = 3.44*10^12 * 12144 ^ 0 * 10^-10 ^ 1 / 2^32 = 9.73*10^-4
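The two calculations for a Hamming distance of 3 can be checked with a few lines of code. This is a minimal sketch that follows the PER and TER definitions given earlier, with all lengths in bits; the function and variable names are illustrative assumptions.

```python
def transmission_error_rate(transmission_bits, packet_bits, ber, hamming_distance):
    """Likelihood of an undetected error over a whole transmission."""
    # PER = (packet length in bits * BER) ^ H / 2^32
    per = (packet_bits * ber) ** hamming_distance / 2 ** 32
    # TER = transmission length / packet length * PER
    return transmission_bits / packet_bits * per


TRANSMISSION_BITS = 3.44e12   # the 400 GB transfer used in the text
BER = 1e-10                   # 1000BASE-T figure used in the text

ter_1518 = transmission_error_rate(TRANSMISSION_BITS, 1518 * 8, BER, 3)
ter_11454 = transmission_error_rate(TRANSMISSION_BITS, 11454 * 8, BER, 3)

if __name__ == "__main__":
    print(ter_1518, ter_11454, ter_11454 / ter_1518)
```

The ratio between the two results equals the square of the ratio between the packet lengths, which is the scaling behavior for a Hamming distance of 3.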
So the strength of the CRC, assuming a Hamming distance of 3, goes So the strength of the CRC, assuming a Hamming distance of 3, goes
down with the square of the factor by which the packet length is down with the square of the factor by which the packet length is
increased. And it goes down with the third power of any increase of increased. And it goes down with the third power of any increase of
the bit error rate. However, this discussion is largely academic the bit error rate. However, this discussion is largely academic
because of the assumption that bit errors happen in isolation. For because of the assumption that bit errors happen in isolation. For
instance, 1000BASE-T transmits two bits per symbol over four wire instance, 1000BASE-T transmits two bits per symbol over four wire
skipping to change at page 8, line 4 skipping to change at page 7, line 49
Larger packets aren't universally desirable. The factors that factor Larger packets aren't universally desirable. The factors that factor
into the decision to use larger packets include: into the decision to use larger packets include:
- A link's bit error rate - A link's bit error rate
- The number of bits per symbol on a link and hence the likelihood of - The number of bits per symbol on a link and hence the likelihood of
multiple bit errors in a single packet multiple bit errors in a single packet
- The strength of the Frame Check Sequence - The strength of the Frame Check Sequence
- The link speed - The link speed
- The number of buffers - The number of buffers
- Queuing strategy - Queuing strategy
This means that choosing a good maximum packet size is, initially at This means that choosing a good maximum packet size is, initially at
least, the responsibility of hardware vendors. On top of that, robust least, the responsibility of hardware vendors. On top of that, robust
mechanisms must be available to operators to further limit maximum mechanisms must be available to operators to further limit maximum
packet sizes where appropriate. packet sizes where appropriate.
4 The protocol mechanisms 4 The protocol mechanisms
The basic idea is that nodes are free to negotiate larger MTUs with The basic idea is that nodes are free to negotiate larger MTUs with
neighbors. However, to avoid problems, test packets are sent first neighbors on a subnet. However, to avoid problems, probe packets
before larger packets are used for actual traffic, and routers and are sent first before larger packets are used for actual traffic,
switches may inform nodes of MTU limitations that are best observed and routers may inform hosts of MTU limitations that should be
or are mandatory to observe. observed for three common ranges of link speeds. The rationale for
having different MTU limitations for different link speeds is that
it's common for devices operating at the link layer to support
larger MTUs if they support and/or operate at higher link speeds.
E.g., a LAN could consist of a gigabit ethernet switch with jumbo
frame capabilities connected to a 10/100 Mbps ethernet switch which
doesn't support jumbo frames. By limiting the use of oversized
packets to nodes operating at 1000 Mbps, the 10/100 Mbps switch
isn't exposed to oversized packets which would result in error
conditions and use up unnecessary bandwidth. Additionally, it may
be desirable to limit packet sizes at lower speeds even if a large
MTU is supported for QoS purposes.
4.1 The variable MTU router advertisement option Additionally, routers send out two flags. One is intended to signal
hosts to be conservative in the number of probes they transmit to
avoid triggering undesired behavior by link-layer devices seeing a
large number of out-of-spec packets. The other flag suppresses
probing for compatibility with the existing practice where all
nodes on a subnet are administratively configured with a
non-standard MTU.
Probing consists of sending a large neighbor discovery or ARP
packet to a neighbor. If the neighbor sends a reply, it managed to
successfully receive the probe so the per-neighbor MTU for this
neighbor can be set to the size of the probe packet and data
packets of that size can now be sent.
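The probing behavior described above could look roughly as follows. This is only a sketch under stated assumptions: trying candidate sizes from largest to smallest is one possible strategy and isn't mandated here, and send_probe is a hypothetical callback that returns True when the neighbor replied to a probe of the given size.

```python
def neighbor_mtu(probe_sizes, standard_mtu, max_mtu, send_probe, no_probe=False):
    """Pick the per-neighbor MTU.

    probe_sizes  -- candidate oversized packet sizes (assumed, not from the draft)
    standard_mtu -- the standard MTU for the link type, used as the fallback
    max_mtu      -- the limit advertised by the router for this link speed
    send_probe   -- hypothetical callback: size -> True if a reply was received
    no_probe     -- when set, probing is suppressed and max_mtu is used as is
    """
    if no_probe:
        return max_mtu
    # Assumption: try the largest candidate first and stop at the first reply.
    for size in sorted(probe_sizes, reverse=True):
        if standard_mtu < size <= max_mtu and send_probe(size):
            return size
    # No oversized probe was answered: stay at the standard size.
    return standard_mtu


if __name__ == "__main__":
    # Pretend the neighbor (or a switch in between) handles up to 4500 octets.
    replies = lambda size: size <= 4500
    print(neighbor_mtu([3000, 4500, 9000], 1500, 9000, replies))
```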
4.1 The multi-MTU router advertisement option
Routers use this option to inform hosts on connected subnets about the Routers use this option to inform hosts on connected subnets about the
maximum allowed MTU for a given link speed and the off-link MTU that maximum allowed MTU for three ranges of link speeds.
should be used towards off-link destinations.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Length | Reserved | | Type | Length |C|N| Reserved | Pri |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Off-link MTU | | MAXMTU1000 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Reserved | Pri | Link speed | | MAXMTU100 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Allowed MTU | | MAXMTU10 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Type: TBD Type: TBD
Length: 2 Length:
1 or 2. A length of more than 2 indicates a future extension with
additional fields and MUST NOT be treated as an error; the
additional fields MUST be ignored.
Reserved: 0 on transmission, ignored on reception. C:
"Conservative" flag: when set, nodes should reduce the number of
large packets sent by using conservative timing and probing
algorithms, if possible avoiding sending more than one
unsuccessful probe per 60 seconds. When the flag is cleared,
nodes may send several oversized packets per second when
probing.
Off-link MTU: N:
This is the maximum packet size that a router can forward to other "No probe" flag: when set to 0, hosts MUST probe before using
links it connects to. Hosts SHOULD use a TCP MSS option based on oversized packets towards a neighbor. When set to 1, hosts MUST
this value in all TCP sessions and limit packets sent to off-link NOT send probes and use the relevant MAXMTU field as their MTU.
destinations to this maximum. The off-link MTU must be at least If MAXMTU is larger than the physical MTU, an error is logged.
1280. A value of 0 means the off-link MTU is undefined and hosts
should use their physical MTU in TCP MSS options and limit packets Reserved: 0 on transmission, ignored on reception.
sent to routers to the maximum MTU the router supports as
discovered through the neighbor discovery option.
Pri: Pri:
Priority. Values have the following meaning: Priority. Values have the following meaning:
000: Vendor default 000: Vendor default
001: Local override of 000 001: Local override of 000
010: Site default 010: Site default
011: Local override of 010 011: Local override of 010
100: Subnet default 100: Subnet default
101: Local override of 100 101: Local override of 100
110: Per-node setting 110: Per-node setting
111: Local override of 110 111: Local override of 110
Vendors may only use priority 000 in default configurations. Vendors may only use priority 000 in default configurations.
Site-wide administrative settings may only use 000 and 010. Site-wide administrative settings may only use 000 and 010.
Subnet-specific administrative settings may use 000, 010 or 110, Subnet-specific administrative settings may use 000, 010 or 110,
but not 001, 011, 101 or 111. but not 001, 011, 101 or 111.
Link speed: MAXMTU1000:
Minimum link speed the option may apply to. Values from 0 to 49151 The maximum packet size allowed on a link operating at a speed
indicate a link speed in megabits per second. Values from 49152 to of 300 Mbps or more. Packets larger than this value SHOULD NOT
65535 are reserved for future use, but imply a link speed of more be sent over the link in question. The MAXMTU1000 MUST be at
than 49151 Mbps. Hosts MUST ignore all options with a link speed least the MTU size specified in the relevant IPv6-over-... RFC.
value that's higher than the current link speed of the interface A value of 0 means that the MTU size is undefined and no
the option is received over. For instance, if a host has an maximum size is enforced for this link speed.
interface that supports 10, 100 and 1000 Mbps ethernet which
currently operates at 100 Mbps, and the host receives options
with link speed values of 100 and 1000 over that interface, the
option with the link speed of 100 is processed and the option
with the link speed of 1000 is ignored.
Allowed MTU: MAXMTU100:
The maximum packet size allowed on a link. Packets larger than The maximum packet size allowed on a link operating at a speed
this value MUST NOT be sent over the link in question. The allowed of 30 to 299 Mbps and links operating at an unknown speed if
MTU MUST be at least 1500. A value of 0 means that the allowed that speed can be 30 Mbps or higher. Packets larger than
MTU is undefined and no maximum MTU is enforced. this value SHOULD NOT be sent over the link in question. The
MAXMTU100 MUST be at least the MTU size specified in the
relevant IPv6-over-... RFC. A value of 0 means that the MTU
size is undefined and no maximum size is enforced for this link
speed.
The number of variable MTU options in router advertisements is limited MAXMTU10:
to a maximum of 4. The maximum packet size allowed on a link operating at a speed
of less than 30 Mbps. Packets larger than this value SHOULD NOT
be sent over the link in question. The MAXMTU10 MUST be at
least the MTU size specified in the relevant IPv6-over-... RFC.
A value of 0 means that the MTU size is undefined and no
maximum size is enforced for this link speed.
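The layout of the multi-MTU option in the -01 column can be illustrated with a small parser. This is only a sketch under stated assumptions: the option Type is still TBD, the C and N flags are taken to be the two most significant bits of the 16-bit word following Length, and Pri is taken to be its three least significant bits, as the diagram suggests. The names MultiMtuOption and parse_multi_mtu_option are hypothetical and do not come from the draft.

```python
import struct
from typing import NamedTuple


class MultiMtuOption(NamedTuple):
    opt_type: int       # option Type (TBD in the draft)
    conservative: bool  # C flag
    no_probe: bool      # N flag
    priority: int       # Pri, 0..7
    maxmtu1000: int
    maxmtu100: int
    maxmtu10: int


def parse_multi_mtu_option(data: bytes) -> MultiMtuOption:
    """Parse one multi-MTU option that starts at data[0].

    Assumed bit positions (not confirmed by the draft text):
    C = 0x8000, N = 0x4000, Pri = low 3 bits of the flags word.
    """
    if len(data) < 8:
        raise ValueError("option shorter than 8 octets")
    opt_type, length, flags = struct.unpack_from("!BBH", data, 0)
    if length < 1:
        raise ValueError("Length must be at least 1")
    if len(data) < length * 8:
        raise ValueError("truncated option")
    (maxmtu1000,) = struct.unpack_from("!I", data, 4)
    if length >= 2:
        # Full form; anything beyond 16 octets is a future extension
        # and is ignored rather than treated as an error.
        maxmtu100, maxmtu10 = struct.unpack_from("!II", data, 8)
    else:
        # Short form (Length 1): only MAXMTU1000 is present and it
        # applies to all three link speed classes.
        maxmtu100 = maxmtu10 = maxmtu1000
    return MultiMtuOption(
        opt_type,
        bool(flags & 0x8000),
        bool(flags & 0x4000),
        flags & 0x0007,
        maxmtu1000,
        maxmtu100,
        maxmtu10,
    )
```

A receiver written this way accepts a Length of 1, 2 or more without special cases, which matches the requirement that longer options are not an error.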
Hosts are expected to recover the variable MTU options from the router When MAXMTU1000, MAXMTU100 and MAXMTU10 all contain the same value,
it is allowed to omit MAXMTU100 and MAXMTU10 so the option has a
length of 1 (8 octets) rather than 2 (16 octets). The receiver of
the option should treat the shorter option the same as a full length
option where the three MAXMTU fields all contain the value from
MAXMTU1000.
Hosts are expected to recover the multi-MTU options from the router
advertisements of at least the router they select as a default router, advertisements of at least the router they select as a default router,
but it's allowed (not required) to recover options from multiple but it's encouraged (not required) to recover options from multiple
routers. The same option, or data constituting the same information, routers. The same option, or data constituting the same information,
may be learned from other sources, such as local configuration and/or may be learned from other sources, such as local configuration and/or
DHCPv6. Hosts MUST only consider variable MTU options where the value DHCPv6. Hosts SHOULD use the MAXMTU value relevant for the link
of the link speed field doesn't exceed that of the current link speed speed the interface is currently operating at from the option or
of the associated interface. Any options (or equivalents) that satisfy equivalent information with the largest priority value. If the
this condition are ordered by the priority, link speed and allowed MTU relevant MAXMTU field is unspecified (zero) in the option or
fields, in that order. Hosts SHOULD copy the allowed MTU and off-link information with the highest priority, the field from the option
MTU information, if specified, from the option (or equivalent) with or information with the next highest priority is considered, and
the largest value for the concatenation of these three fields. so on. If no information is available because no option or
equivalent is available, or the relevant MAXMTU field never has a
4.2 Changes to the RA MTU option semantics non-zero value, the host SHOULD use its physical MTU as the
MAXMTU.
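The -01 selection rule (take the MAXMTU field for the current link speed from the highest-priority source, fall back to lower priorities when that field is zero, and finally fall back to the physical MTU) can be sketched as follows. The tuple layout, the function names and the use of None for an unknown link speed are assumptions made for this illustration only.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical representation of one option or equivalent information:
# (priority, MAXMTU1000, MAXMTU100, MAXMTU10)
Source = Tuple[int, int, int, int]


def speed_class(link_mbps: Optional[int]) -> int:
    """Return 0, 1 or 2 for MAXMTU1000, MAXMTU100 or MAXMTU10."""
    if link_mbps is None:
        # Assumption: an unknown speed that may be 30 Mbps or higher
        # is handled with MAXMTU100.
        return 1
    if link_mbps >= 300:
        return 0
    if link_mbps >= 30:
        return 1
    return 2


def effective_maxmtu(sources: Iterable[Source],
                     link_mbps: Optional[int],
                     physical_mtu: int) -> int:
    """Pick the MAXMTU to use on an interface."""
    idx = speed_class(link_mbps)
    # Highest priority value first.
    for source in sorted(sources, key=lambda s: s[0], reverse=True):
        value = source[1 + idx]
        if value != 0:  # zero means "unspecified", try the next source
            return value
    # No option available, or the relevant field is never non-zero.
    return physical_mtu
```

For example, with a priority 4 source that leaves MAXMTU1000 unspecified and a priority 2 source that sets it to 9000, a gigabit interface ends up with 9000.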
Hosts are currently supposed to ignore an MTU of more than 1500 in the
MTU option in router advertisements on ethernet links [RFC2464]. This
makes it impossible to use an MTU larger than 1500 bytes for multicast
packets. In order to lift this limitation, routers and hosts that
implement variable MTU subnets may advertise and accept, respectively,
an MTU option with an MTU larger than 1500. Hosts should use the
minimum of the maximum feasible MTU and the MTU in the RA MTU option
for the transmission of multicast packets.
Note that advertising an MTU option larger than 1500 can only work on
subnets where all the hosts implement variable MTU subnets.
4.3 The switch MTU advertisement message
Switches and other layer 2 devices MAY advertise the maximum MTU they
support in an ICMPv6 [RFC2463] message sent to multicast address TBD.
The format of this ICMPv6 message is as follows:
1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Code | Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Number of MTUs |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+ Switch identifier +
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Reserved | Link speed 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Advised MTU 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Reserved | Link speed 2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Advised MTU 2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
...
| Reserved | Link speed N |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Advised MTU N |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Type: TBD (informational)
Code: TBD
Checksum: see [RFC2463]
Number of MTUs:
Number of times the reserved/link speed/advised MTU fields are
repeated for different link speed values. The minimum is 1, the
maximum 4.
Switch identifier: a 64-bit value that is unique to the switch.
Reserved: 0 on transmission, ignored on reception.
Link speed: When a node's interface speed changes, it MAY reinitiate
Minimum link speed the option may apply to. Values from 0 to 49151 negotiation of per-neighbor MTUs, but it SHOULD remain prepared to
indicate a link speed in megabits per second. Values from 49152 to receive packets of the maximum size indicated to neighbors
65535 are reserved for future use, but imply a link speed of more previously.
than 49151 Mbps. Hosts MUST ignore all options with a link speed
value that's lower than the current link speed of the interface
the option is received over. Note that this is the opposite
behavior of that specified for the link speed in the RA variable
MTU option.
Advised MTU: Devices not acting as IPv6 routers that need to inform hosts on the
The IPv6 MTU the switch supports on ports operating at the local subnet of MTU limitations MAY send out a router advertisement
indicated link speed. In the case of ethernet, the IPv6 MTU is the with a Router Lifetime of 0 [RFC2461] and the pertinent information
maximum frame size after subtracting the size of the VLAN tag, the in a multi-MTU option.
14-byte Ethernet II header and the frame check sequence.
Switch MTU advertisements should be sent out at 5-minute intervals. 4.2 Changes to the RA MTU option semantics
When a port transitions from an inactive or disconnected to an active
state, the interval MAY be reduced to 60 seconds, such that if it has
been 60 seconds or longer ago that the last switch MTU advertisement
was sent out, a switch MTU advertisement is sent out immediately.
If the switch doesn't otherwise implement IPv6, or the IPv6 protocol Hosts are currently supposed to ignore an MTU of more than 1500 in
is inactive, the IPv6 source address should be the unspecified the MTU option in router advertisements on ethernet links
address. Since all the information in the message is thus known in [RFC2464]. This makes it impossible to use an MTU larger than 1500
advance, the entire message, including the checksum, may be octets for multicast packets. In order to lift this limitation,
pre-calculated without the need to implement IPv6 in the switch. routers and hosts that implement multi-MTU subnets may advertise
and accept, respectively, an MTU option with an MTU larger than
1500. Hosts should use the minimum of the MAXMTU for their link
speed and the MTU in the RA MTU option for the transmission of
multicast packets.
Hosts SHOULD monitor switch MTU advertisement messages, using the Note that advertising an MTU option larger than 1500 can only work on
switch identifier field to detect refreshes/duplicates, and retain all subnets where all the hosts implement multi-MTU subnets.
switch MTU advertisements for 10 minutes. When the switch MTU
advertisement information changes (new advertisements, new information
in previously known advertisements, advertisements expire), hosts
SHOULD select the minimum advised MTU value where the associated link
speed is equal to or higher than the current link speed on the
associated interface. The thusly recovered advised MTU for the link is
the minimum of the MTUs supported by all the switches for this
particular link speed if all switches implement the switch MTU
advertisement mechanism.
4.4 The neighbor discovery MTU option 4.3 The IPv6 neighbor discovery MTU and padding options
A node that implements the variable MTU subnet capability SHOULD A node that implements the multi-MTU subnet capability SHOULD
include an MTU option in both neighbor solicitation and neighbor include an MTU option in both neighbor solicitation and neighbor
advertisement messages [RFC2461]. A node MAY omit the option if the advertisement messages [RFC2461]. A node MAY omit the option if the
use of a larger MTU isn't desired at that time or if the MTU it would use of a larger MTU isn't desired at that time or if the MTU it would
advertise is equal to or lower than the MTU that would otherwise be advertise is equal to or lower than the MTU that would otherwise be
used. However, there is no requirement to omit the option depending on used. However, there is no requirement to omit the option depending on
the value of the different MTU variables as the receiver must the value of the different MTU variables as the receiver must
implement the logic required to determine which MTU to use anyway. implement the logic required to determine which MTU to use anyway.
The format of the neighbor discovery MTU option is as follows: The format of the neighbor discovery MTU option is as follows:
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Length | Reserved | | Type | Length | Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MTU | | MTU |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Type: TBD Type: TBD
Length: 1 Length: 1
Reserved: set to 0 on transmission, ignored on reception. Reserved: set to 0 on transmission, ignored on reception.
MTU: MTU:
The maximum packet size the node is prepared to send and receive, The maximum packet size in octets that the node is prepared to
which is copied from the local MTU. The minimum valid value is receive. The minimum valid value is 1280.
1280.
Reception of a neighbor solicitation or a neighbor advertisement
triggers the sending of an ICMPv6 MTU detection message.
The MTU detection message
Since it's possible that there are layer 2 devices that don't The format of the neighbor discovery MTU option is as follows:
implement the switch MTU advertisement message in the path between two
nodes, it's necessary to make sure that it is indeed possible to send and
receive packets larger than the standard MTU. This is what the ICMPv6
MTU detection message is for. It has the following format:
1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Code | Checksum | | Type | Length |R| Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R| Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Packet size |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Padding | | Padding |
... ~ ~
| | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Type: TBD (informational) Type: TBD
Code: TBD Length: see below.
Checksum: see [RFC2463] R: reply flag.
R (reply requested): 0: no reply requested, 1: reply requested Reserved: set to 0 on transmission, ignored on reception.
Reserved: 0 on transmission, ignored on reception Padding: 0 or more all-zero octets.
Packet size: The MTU option is included in all neighbor advertisement and
Size of this packet, including IPv6 and other headers. A value of neighbor solicitation messages.
0 indicates no padding is present and the size of the packet
shouldn't be considered.
Padding: Reception of a neighbor solicitation or a neighbor advertisement
0 or more 0 bytes to bring the packet to the specified packet for a neighbor for which no per-neighbor MTU is known
size. triggers, in addition to the normal response if it's a neighbor
solicitation, the sending of a neighbor solicitation message with
the MTU and padding options in it. The size of this message may
vary between the IPv6-over-... size + 1 for the link and the
minimum of the relevant MAXMTU, the physical MTU and the neighbor's
MTU as advertised in the MTU option of the packet received. See
below for considerations about the packet sizes to choose. The
padding option is used to bring the neighbor solicitation message
to this size. The padding option MUST be the last option in the
packet.
In order to avoid sending large numbers of packets that can't be There are two possible ways to determine the value of the length
handled properly by switches or other layer 2 devices, after sending a field:
large MTU detection packet, no other maximum size MTU detection
packets may be transmitted on the same interface for 60 seconds or
until a large MTU detection packet has been received, whichever
happens first. In this context, "large" means larger than the standard
MTU size for the link type, i.e., 1500 bytes for ethernet.
When variable MTU subnet capability is detected for a neighbor by the 1. Set it to 0. As the "length" field in options has a granularity
presence of an MTU option in a neighbor solicitation or neighbor of 8 octets and the behavior of nodes that receive a
discovery message, an MTU detection message is constructed as follows: neighbor solicitation packet whose total length
doesn't match the length of the packet contents is unspecified, an option
length of 0 is used to make sure that hosts that don't
understand the padding option will silently discard the packet.
R: 2. If the intended packet length allows a valid value for the
Set to 0 if the neighbor MTU is known and confirmed, set to 1 length field, the length field MAY be set to that value. The
otherwise. node MAY reduce the size of the intended packet to accommodate
the requirement that the size field is a multiple of 8 octets.
I.e., if the intended packet size is 4470 octets with 40 and 24
octets for the IPv6 and neighbor solicitation headers,
respectively, the padding option would have to be 4406 octets
long, which can't be expressed in the length field. The node may
choose to use a packet size of 4464 instead, which results in a
length field value of 550.
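The arithmetic of the second method can be checked with a short calculation. This is a sketch only: the function name is hypothetical, the 64-octet overhead is the 40 + 24 octets used in the example above, other options in the packet are not counted, and no check is made against the width of the length field on the wire.

```python
from typing import Tuple


def fit_padding_option(intended_size: int,
                       header_octets: int = 64) -> Tuple[int, int]:
    """Return (packet_size, length_field) for a padded probe.

    The padding option must be a multiple of 8 octets, so the packet
    is shrunk to the largest size not exceeding intended_size for
    which that holds.
    """
    padding = intended_size - header_octets
    if padding < 8:
        raise ValueError("no room for a padding option")
    length_field = padding // 8               # units of 8 octets
    packet_size = header_octets + length_field * 8
    return packet_size, length_field
```

With an intended size of 4470 octets the padding would be 4406 octets, which is not a multiple of 8; rounding down gives 4400 octets of padding, a packet size of 4464 and a length field value of 550.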
Packet size: A neighbor solicitation message with the padding option is always
Equal to the minimum of the local MTU and the (tentative) neighbor sent in addition to a regular neighbor solicitation message, rather
MTU. than in place of one.
When an MTU detection packet is received, the size of the packet is When a node receives a neighbor solicitation message with the
checked against the value in the packet size field to detect padding option, it stops evaluating options when it reaches the
truncation in transit. If the packet size and the packet size field padding option and returns a regular neighbor advertisement
don't match, or if the packet size is smaller than 1280 bytes, the message, which includes the MTU option with the R flag set to 1.
message is silently discarded. Whenever the neighbor advertisement is not the result of receiving
a neighbor solicitation with a padding option, the R flag is set to
0.
If the received message has the R flag set to 1, a reply is When a node receives a neighbor advertisement message, it must
constructed as follows: determine whether the message is in reaction to a locally sent
neighbor solicitation with the padding option or not. If the MTU
option is included in the message received, an R flag of 1
indicates that it is indeed a reply. In the absence of the MTU
option the node must use heuristics relating to the timing of the
messages it sent with and without the option, and the reception of
the current message. If the message was a reply, the node sets the
neighbor MTU to the size of the neighbor solicitation message that
was replied to.
R: 0 If no reply is received after some time, either the neighbor is
incapable of receiving packets of the size that was used, or a
device operating at the link layer was incapable of forwarding the
frame. (Incidental packet loss is also a possibility.) In order to
determine a workable MTU even in the presence of unknown
limitations, a node may repeat sending a solicitation with the
padding option. However, since presumably, some equipment may react
badly to a large number of out-of-spec packets, it's important that
nodes adjust their behavior in the presence of the C (conservative)
flag in router advertisements.
Packet size: The above allows for two strategies in determining a neighbor's
Equal to the minimum of the local MTU and the neighbor MTU. MTU: the node can depend on the presence of the mechanisms
described in this document, including setting the padding option
length field to 0, or it can try to interoperate with nodes that do
have the capability of using larger packet sizes, but don't
implement any of the mechanisms described. In that case, the
padding option must conform to [RFC2461] and care must be taken to
avoid overly aggressive probing of nodes that do not support larger
packets.
The neighbor MTU overrules information in the TCP MSS option in TCP Nodes MUST support reception of both types of probes, but MAY be
sessions towards that neighbor. Neighbor MTU information expires along limited to generating only one type.
with link addresses learned through neighbor discovery and upon dead
neighbor detection.
4.5 Determining the local MTU 4.4 IPv4 ethernet jumbo ARP message
The local MTU is the value communicated to neighbors. It is the Due to lack of neighbor discovery, with IPv4, it's necessary to use
minimum of the physical MTU for an interface and the allowed MTU as ARP to probe for non-standard MTU capabilities. This is done by
advertised by a router or learned through other means. The local MTU simply probing with an ARP packet padded to the desired size. If a
may be further reduced by the reception of switch MTU advertisements. reply comes back, the neighbor supports the probed MTU size.
4.5 Probe considerations
In cases where the neighbor's MTU was advertised in an MTU option,
it makes sense to try with this size. If that probe fails or the
neighbor's MTU is unknown, the best choice for a probe size would
be the smallest possible non-standard MTU. This could be the
IPv6-over-... RFC's MTU size + 1, or a slightly larger value that
represents the first larger size that is actually useful, such as
1508 or 1520 for ethernet. Failure at this size wastes relatively
little bandwidth and indicates that further probes are unnecessary.
If this probe is successful, further choices for the probe size may
be common MTU sizes such as 1508, 1530, 1536, 1546, 1998, 2000,
2018, 4464, 4470, 8092, 8192, 9000, 9176, 9180, 9216, 17976, 64000
and 65280 octets.
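The order of probe sizes suggested above can be sketched as a small planning function. This is an illustration under assumptions, not a required algorithm: the function and parameter names are hypothetical, upper_bound stands for the minimum of the relevant MAXMTU, the physical MTU and the neighbor's advertised MTU, and trying the common sizes in ascending order is a choice made here that the text does not mandate.

```python
from typing import List, Optional

# Common MTU sizes listed in the text above.
COMMON_MTUS = (1508, 1530, 1536, 1546, 1998, 2000, 2018, 4464, 4470,
               8092, 8192, 9000, 9176, 9180, 9216, 17976, 64000, 65280)


def probe_plan(standard_mtu: int,
               upper_bound: int,
               advertised: Optional[int] = None,
               first_step: Optional[int] = None) -> List[int]:
    """Return candidate probe sizes in the order they would be tried.

    The caller stops at the first success for the advertised size, and
    gives up entirely if the smallest non-standard size fails.
    """
    plan: List[int] = []
    # 1. The neighbor's advertised MTU, if it is usable.
    if advertised is not None and standard_mtu < advertised <= upper_bound:
        plan.append(advertised)
    # 2. The smallest non-standard size (standard MTU + 1 by default,
    #    or a slightly larger useful value such as 1508 for ethernet).
    smallest = first_step if first_step is not None else standard_mtu + 1
    if smallest <= upper_bound and smallest not in plan:
        plan.append(smallest)
    # 3. Common sizes between the smallest step and the upper bound.
    for size in COMMON_MTUS:
        if smallest < size <= upper_bound and size not in plan:
            plan.append(size)
    return plan
```

For an ethernet neighbor that advertised 9000, probe_plan(1500, 9000, advertised=9000) starts with 9000, then 1501, then the common sizes up to 8192.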
There is no requirement that a node tries a number of probes of
different sizes; only that before oversized packets are sent, a
reply for a probe of that size or larger MUST have been received
from the neighbor in question, unless the N flag is set to 1. A
simple strategy that would be appropriate when the C flag is set to
1, but may also be used otherwise, would be to initially send just
one probe sized at the local MTU value, and if unsuccessful, only
send a second probe when a probe from the neighbor is received. The
second probe is made the same size as the neighbor's probe.
Probes MUST be sent as unicast.
4.6 Neighbor MTU garbage collection
The MTU size for a neighbor is garbage collected along with a
neighbor's link address in accordance with regular ARP and neighbor
discovery timeouts. Additionally, a neighbor's MTU size is reset to
unknown after dead neighbor detection declares a neighbor "dead".
5 References 5 References
5.1 Normative References 5.1 Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2461] Narten, T., Nordmark, E., and W. Simpson, "Neighbor [RFC2461] Narten, T., Nordmark, E., and W. Simpson, "Neighbor
Discovery for IP Version 6 (IPv6)", RFC 2461, Discovery for IP Version 6 (IPv6)", RFC 2461,
skipping to change at page 15, line 7 skipping to change at page 15, line 34
Autoconfiguration", RFC 2462, December 1998. Autoconfiguration", RFC 2462, December 1998.
5.2 Informative References 5.2 Informative References
[CRC] Jain, R., "Error Characteristics of Fiber Distributed [CRC] Jain, R., "Error Characteristics of Fiber Distributed
Data Interface (FDDI)", IEEE Transactions on Data Interface (FDDI)", IEEE Transactions on
Communications, August 1990. Communications, August 1990.
6 Document and Author Information 6 Document and Author Information
This document expires December, 2007. The latest version will always This document expires February, 2008. The latest version will always
be available at http://www.muada.com/drafts/. Please direct questions be available at http://www.muada.com/drafts/. Please direct questions
and comments to the ipv6 or int area mailing lists or directly to the and comments to the ipv6 or int area mailing lists or directly to the
author: author:
Iljitsch van Beijnum Iljitsch van Beijnum
Email: iljitsch@muada.com Email: iljitsch@muada.com
Full Copyright Statement Full Copyright Statement
 End of changes. 84 change blocks. 
288 lines changed or deleted 309 lines changed or added
