< draft-savola-mtufrag-network-tunneling-04.txt   draft-savola-mtufrag-network-tunneling-05.txt >
Internet Engineering Task Force P. Savola Internet Engineering Task Force P. Savola
Internet-Draft CSC/FUNET Internet-Draft CSC/FUNET
Expires: November 24, 2005 May 23, 2005 Expires: April 8, 2006 October 5, 2005
MTU and Fragmentation Issues with In-the-Network Tunneling MTU and Fragmentation Issues with In-the-Network Tunneling
draft-savola-mtufrag-network-tunneling-04.txt draft-savola-mtufrag-network-tunneling-05.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 33 skipping to change at page 1, line 33
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on November 24, 2005. This Internet-Draft will expire on April 8, 2006.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2005). Copyright (C) The Internet Society (2005).
Abstract Abstract
Tunneling techniques such as IP-in-IP when deployed in the middle of Tunneling techniques such as IP-in-IP when deployed in the middle of
the network, typically between routers, have certain issues regarding the network, typically between routers, have certain issues regarding
how large packets can be handled: whether such packets would be how large packets can be handled: whether such packets would be
skipping to change at page 2, line 10 skipping to change at page 2, line 10
would be used, or how this scenario could be operationally avoided. would be used, or how this scenario could be operationally avoided.
This memo justifies why this is a common, non-trivial problem, and This memo justifies why this is a common, non-trivial problem, and
goes on to describe the different solutions and their characteristics goes on to describe the different solutions and their characteristics
at some length. at some length.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Problem Statement . . . . . . . . . . . . . . . . . . . . . . 4 2. Problem Statement . . . . . . . . . . . . . . . . . . . . . . 4
3. Description of Solutions . . . . . . . . . . . . . . . . . . . 5 3. Description of Solutions . . . . . . . . . . . . . . . . . . . 5
3.1 Fragmentation and Reassembly by the Tunnel Endpoints . . . 5 3.1. Fragmentation and Reassembly by the Tunnel Endpoints . . . 5
3.2 Signalling the Lower MTU to the Sources . . . . . . . . . 6 3.2. Signalling the Lower MTU to the Sources . . . . . . . . . 6
3.3 Encapsulate Only When There is Free MTU . . . . . . . . . 7 3.3. Encapsulate Only When There is Free MTU . . . . . . . . . 7
3.4 Fragmentation of the Inner Packet . . . . . . . . . . . . 8 3.4. Fragmentation of the Inner Packet . . . . . . . . . . . . 8
4. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 9 4. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 9
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11
6. Security Considerations . . . . . . . . . . . . . . . . . . . 11 6. Security Considerations . . . . . . . . . . . . . . . . . . . 11
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11
8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 12 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 12
8.1 Normative References . . . . . . . . . . . . . . . . . . . 12 8.1. Normative References . . . . . . . . . . . . . . . . . . . 12
8.2 Informative References . . . . . . . . . . . . . . . . . . 12 8.2. Informative References . . . . . . . . . . . . . . . . . . 13
Author's Address . . . . . . . . . . . . . . . . . . . . . . . 13 Appendix A. MTU of the Tunnel . . . . . . . . . . . . . . . . . . 13
A. MTU of the Tunnel . . . . . . . . . . . . . . . . . . . . . . 13 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 13
Intellectual Property and Copyright Statements . . . . . . . . 14 Intellectual Property and Copyright Statements . . . . . . . . . . 14
1. Introduction 1. Introduction
A large number of ways to encapsulate datagrams in other packets, A large number of ways to encapsulate datagrams in other packets,
i.e., tunneling mechanisms, have been specified over the years: for i.e., tunneling mechanisms, have been specified over the years: for
example, IP-in-IP (e.g., [1], [2]), GRE [3], L2TP [4], or IPsec [5] example, IP-in-IP (e.g., [1] [2], [3]), GRE [4], L2TP [5], or IPsec
in tunnel mode -- any of which might run on top of IPv4, IPv6, or [6] in tunnel mode -- any of which might run on top of IPv4, IPv6, or
some other protocol and carrying the same or a different protocol. some other protocol and carrying the same or a different protocol.
All of these can be run so that the endpoints of the inner protocol All of these can be run so that the endpoints of the inner protocol
are co-located with the endpoints of the outer protocol; in a typical are co-located with the endpoints of the outer protocol; in a typical
scenario, this would correspond to "host-to-host" tunneling. It is scenario, this would correspond to "host-to-host" tunneling. It is
also possible to have one set of endpoints co-located, i.e., host-to- also possible to have one set of endpoints co-located, i.e., host-to-
router or router-to-host tunneling. Finally, many of these router or router-to-host tunneling. Finally, many of these
mechanisms are also employed between the routers for all or a part of mechanisms are also employed between the routers for all or a part of
the traffic that passes between them, resulting in router-to-router the traffic that passes between them, resulting in router-to-router
tunneling. tunneling.
All these protocols and scenarios have one issue in common: how does All these protocols and scenarios have one issue in common: how does
the source select the maximum packet size so that the packets will the source select the maximum packet size so that the packets will
fit, even encapsulated, in the largest Maximum Transfer Unit (MTU) of fit, even encapsulated, in the smallest Maximum Transfer Unit (MTU)
the traversed path in the network; and if you cannot affect the of the traversed path in the network; and if you cannot affect the
packet sizes, what do you do to be able to encapsulate them in any packet sizes, what do you do to be able to encapsulate them in any
case? The four main solutions are (these will be elaborated in case? The four main solutions are (these will be elaborated in
Section 3): Section 3):
1. Fragmenting all too big encapsulated packets to fit in the paths, 1. Fragmenting all too big encapsulated packets to fit in the paths,
and reassembling them at the tunnel end-points. and reassembling them at the tunnel end-points.
2. Signal to all the sources whose traffic must be encapsulated, and 2. Signal to all the sources whose traffic must be encapsulated, and
is larger than what fits, to send smaller packets, e.g., using is larger than what fits, to send smaller packets, e.g., using
Path MTU Discovery [6] [7]. Path MTU Discovery [7][8].
3. Ensure that in the specific environment, the encapsulated packets 3. Ensure that in the specific environment, the encapsulated packets
will fit in all the paths in the network, e.g., by using MTU will fit in all the paths in the network, e.g., by using MTU
bigger than 1500 in the backbone used for encapsulation. bigger than 1500 in the backbone used for encapsulation.
4. Fragmenting the original too big packets so that their fragments 4. Fragmenting the original too big packets so that their fragments
will fit, even encapsulated, in the paths, and reassembling them will fit, even encapsulated, in the paths, and reassembling them
at the destination nodes. Note that this approach is only at the destination nodes. Note that this approach is only
available for IPv4 under certain assumptions (see Section 3.4). available for IPv4 under certain assumptions (see Section 3.4).
skipping to change at page 4, line 14 skipping to change at page 4, line 14
The tunneling packet size issues are relatively straightforward in The tunneling packet size issues are relatively straightforward in
host-to-host tunneling or host-to-router tunneling where Path MTU host-to-host tunneling or host-to-router tunneling where Path MTU
Discovery only needs to signal to one source node. The issues are Discovery only needs to signal to one source node. The issues are
significantly more difficult in router-to-router and certain router- significantly more difficult in router-to-router and certain router-
to-host scenarios, which are the focus of this memo. to-host scenarios, which are the focus of this memo.
It is worth noting that most of this discussion applies to a more It is worth noting that most of this discussion applies to a more
generic case, where there exists a link with lower MTU in the path. generic case, where there exists a link with lower MTU in the path.
A concrete and widely deployed example of this is the usage of PPP A concrete and widely deployed example of this is the usage of PPP
over Ethernet (PPPoE) [10] at the customers' access link. These over Ethernet (PPPoE) [11] at the customers' access link. These
lower-MTU links, and particularly PPPoE links, are typically not lower-MTU links, and particularly PPPoE links, are typically not
deployed in topologies where fragmentation and reassembly might be deployed in topologies where fragmentation and reassembly might be
unfeasible (e.g., a backbone), so this may be a slightly easier unfeasible (e.g., a backbone), so this may be a slightly easier
problem. However, this more generic case is considered out of scope problem. However, this more generic case is considered out of scope
of this memo. of this memo.
There are also known challenges in specifying and implementing a There are also known challenges in specifying and implementing a
mechanism which would be used at the tunnel end-point to obtain the mechanism which would be used at the tunnel end-point to obtain the
best suitable packet size to use for encapsulation: if a static value best suitable packet size to use for encapsulation: if a static value
is chosen, a lot of fragmentation might end up being performed. On is chosen, a lot of fragmentation might end up being performed. On
the other hand, if PMTUD is used, the implementation would need to the other hand, if PMTUD is used, the implementation would need to
update the discovered interface MTU based on the ICMP Packet Too Big update the discovered interface MTU based on the ICMP Packet Too Big
messages and originate ICMP Packet Too Big message(s) back to the messages and originate ICMP Packet Too Big message(s) back to the
source(s) of the encapsulated packets; this also assumes that source(s) of the encapsulated packets; this also assumes that
sufficient data has been piggybacked on the ICMP messages (beyond the sufficient data has been piggybacked on the ICMP messages (beyond the
required 64 bits beyond the ICMPv4 header). We'll discuss using required 64 bits after the IPv4 header). We'll discuss using PMTUD
PMTUD to signal the sources briefly in Section 3.2, but in-depth to signal the sources briefly in Section 3.2, but in-depth
specification and analysis is described elsewhere (e.g., in [3] and specification and analysis is described elsewhere (e.g., in [4] and
[1]) and is out of scope of this memo. [2]) and is out of scope of this memo.
Section 2 includes a problem statement, section 3 describes the Section 2 includes a problem statement, section 3 describes the
different solutions with their drawbacks and advantages, and section different solutions with their drawbacks and advantages, and section
4 presents conclusions. 4 presents conclusions.
2. Problem Statement 2. Problem Statement
It is worth considering why exactly this is considered a problem. It is worth considering why exactly this is considered a problem.
It is possible to fix all the packet size issues using the solution It is possible to fix all the packet size issues using the solution
1, fragmenting the resulting encapsulated packet, and reassembling it 1, fragmenting the resulting encapsulated packet, and reassembling it
by the tunnel endpoint. However, this is considered problematic for by the tunnel endpoint. However, this is considered problematic for
at least three reasons, as described in Section 3.1. at least three reasons, as described in Section 3.1.
Therefore it is desirable to avoid fragmentation and reassembly if Therefore it is desirable to avoid fragmentation and reassembly if
possible. On the other hand, the other solutions may not be possible. On the other hand, the other solutions may not be
practical either: especially in router-to-router or router-to-host practical either: especially in router-to-router or router-to-host
tunneling, Path MTU Discovery might be very disadvantageous -- tunneling, Path MTU Discovery might be very disadvantageous --
consider the case where a backbone router would send an ICMP Packet consider the case where a backbone router would send ICMP Packet Too
Too Big messages to every source who would try to send packets Big messages to every source who would try to send packets through
through it. Fragmenting before encapsulation is also not available it. Fragmenting before encapsulation is also not available in IPv6,
in IPv6, and not available when the Don't Fragment (DF) bit has been and not available when the Don't Fragment (DF) bit has been set (see
set (unless the implementation ignores the DF bit). Ensuring high Section 3.4 for more). Ensuring high enough MTU so encapsulation is
enough MTU so encapsulation is always possible is of course a valid always possible is of course a valid approach, but requires careful
approach, but requires careful operational planning, and may not be a operational planning, and may not be a feasible assumption for
feasible assumption for implementors. implementors.
This yields that there is no trivial solution to this problem, and it This yields that there is no trivial solution to this problem, and it
needs to be further explored to consider the tradeoffs, as is done in needs to be further explored to consider the tradeoffs, as is done in
this memo. this memo.
3. Description of Solutions 3. Description of Solutions
This section describes the potential solutions in a bit more detail. This section describes the potential solutions in a bit more detail.
3.1 Fragmentation and Reassembly by the Tunnel Endpoints 3.1. Fragmentation and Reassembly by the Tunnel Endpoints
The seemingly simplest solution to tunneling packet size issues is The seemingly simplest solution to tunneling packet size issues is
fragmentation of the outer packet by the encapsulator, and reassembly fragmentation of the outer packet by the encapsulator, and reassembly
by the decapsulator. However, this is highly problematic for at by the decapsulator. However, this is highly problematic for at
least three reasons: least three reasons:
o Fragmentation causes overhead: every fragment requires the IP o Fragmentation causes overhead: every fragment requires the IP
header (20 or 40 bytes), and with IPv6, additional 8 bytes for the header (20 or 40 bytes), and with IPv6, additional 8 bytes for the
Fragment Header. Fragment Header.
skipping to change at page 5, line 44 skipping to change at page 5, line 45
implementations may not be able to perform these operations at implementations may not be able to perform these operations at
line rate. line rate.
o At the time of reassembly, all the information (i.e., all the o At the time of reassembly, all the information (i.e., all the
fragments) is normally not available; when the first fragment fragments) is normally not available; when the first fragment
arrives to be reassembled, a buffer of the maximum possible size arrives to be reassembled, a buffer of the maximum possible size
may have to be allocated because the total length of the may have to be allocated because the total length of the
reassembled datagram is not known at that time. Further, as reassembled datagram is not known at that time. Further, as
fragments might get lost, be reordered or delayed, the reassembly fragments might get lost, be reordered or delayed, the reassembly
engine has to wait with the partial packet for some time (for engine has to wait with the partial packet for some time (for
example, 60 seconds [8]). When this would have to be done at the example, 60 seconds [9]). When this would have to be done at the
line rate, with e.g., 10 Gbit/s speed, the length of the buffers line rate, with e.g., 10 Gbit/s speed, the length of the buffers
that reassembly might require would be prohibitive. that reassembly might require would be prohibitive.
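The overhead named in the first reason above can be estimated with a short calculation. The following is only a rough sketch: the function name and the assumption of option-less IPv4 headers and extension-free IPv6 headers are illustrative and do not come from the draft.

```python
import math

IPV4_HEADER = 20       # bytes, assuming no IPv4 options
IPV6_HEADER = 40       # bytes, fixed IPv6 header only
IPV6_FRAG_HEADER = 8   # bytes, IPv6 Fragment Header


def fragment_overhead(packet_len, path_mtu, ip_version=4):
    """Return (fragment_count, extra_header_bytes) when an outer packet of
    packet_len bytes (outer header included) crosses a path with path_mtu."""
    if ip_version == 4:
        base, per_fragment = IPV4_HEADER, IPV4_HEADER
    elif ip_version == 6:
        base, per_fragment = IPV6_HEADER, IPV6_HEADER + IPV6_FRAG_HEADER
    else:
        raise ValueError("ip_version must be 4 or 6")
    if packet_len <= path_mtu:
        return 1, 0  # fits as-is, no fragmentation needed
    payload = packet_len - base
    # Every fragment except the last carries a multiple of 8 payload bytes.
    max_chunk = ((path_mtu - per_fragment) // 8) * 8
    if max_chunk <= 0:
        raise ValueError("path_mtu too small to carry any payload")
    count = math.ceil(payload / max_chunk)
    extra = count * per_fragment - base
    return count, extra


if __name__ == "__main__":
    # A 1500-byte inner packet plus an outer header, over a 1500-byte path.
    print(fragment_overhead(1500 + IPV4_HEADER, 1500, ip_version=4))
    print(fragment_overhead(1500 + IPV6_HEADER, 1500, ip_version=6))
```

Under these assumptions, a full-sized 1500-byte inner packet always becomes two outer fragments, the second of which carries only a few tens of bytes of payload.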
When examining router-to-router tunneling, the third problem is When examining router-to-router tunneling, the third problem is
likely the worst; certainly, a hardware computation and likely the worst; certainly, a hardware computation and
implementation requirement would also be significant, but not all implementation requirement would also be significant, but not all
that difficult in the end -- and the link capacity wasted in the that difficult in the end -- and the link capacity wasted in the
backbones by additional overhead might not be a huge problem either. backbones by additional overhead might not be a huge problem either.
However, the IPv4 Identification field is only 16 bits long (compared However, the IPv4 Identification field is only 16 bits long (compared
to 32 bits in IPv6), and if a large number of packets are being to 32 bits in IPv6), and if a large number of packets are being
tunneled between two IP addresses, the ID is very likely to wrap and tunneled between two IP addresses, the ID is very likely to wrap and
cause data misassociation. Reassembly that wrongly combines data cause data misassociation. Reassembly that wrongly combines data
from two unrelated packets violates data integrity and potentially from two unrelated packets violates data integrity and potentially
confidentiality. This problem is further described in confidentiality. This problem is further described in
[11]. [12].
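How quickly the 16-bit ID space wraps can be illustrated with simple arithmetic. This sketch assumes that every tunneled packet between the same pair of tunnel endpoint addresses consumes one ID value, and it compares the wrap time with the 60-second reassembly wait mentioned earlier; the function names are hypothetical.

```python
def id_wrap_seconds(packets_per_second, id_bits=16):
    """Seconds until the Identification space is exhausted and wraps,
    assuming one ID value is consumed per tunneled packet."""
    if packets_per_second <= 0:
        raise ValueError("packets_per_second must be positive")
    return (2 ** id_bits) / packets_per_second


def wraps_within_reassembly_timeout(packets_per_second, timeout_s=60.0,
                                    id_bits=16):
    """True if IDs can repeat while older fragments may still be held
    for reassembly, which is when misassociation becomes possible."""
    return id_wrap_seconds(packets_per_second, id_bits) < timeout_s


if __name__ == "__main__":
    for pps in (1000, 2000, 100000):
        print(pps, id_wrap_seconds(pps), wraps_within_reassembly_timeout(pps))
```

With these assumptions, a tunnel carrying only a couple of thousand packets per second already reuses ID values inside the reassembly window, so the risk is not limited to very fast links.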
IPv6, and IPv4 with the DF bit set in the encapsulating header, IPv6, and IPv4 with the DF bit set in the encapsulating header,
allow the tunnel endpoints to optimize the tunnel MTU and minimize allow the tunnel endpoints to optimize the tunnel MTU and minimize
network-based reassembly. This also prevents fragmentation of the network-based reassembly. This also prevents fragmentation of the
encapsulated packets on the tunnel path. If the IPv4 encapsulating encapsulated packets on the tunnel path. If the IPv4 encapsulating
header does not have the DF bit set, the tunnel endpoints will have to header does not have the DF bit set, the tunnel endpoints will have to
perform a significant amount of fragmentation and reassembly, while the perform a significant amount of fragmentation and reassembly, while the
use of PMTUD is minimized. use of PMTUD is minimized.
As Appendix A describes, the MTU of the tunnel is also a factor in As Appendix A describes, the MTU of the tunnel is also a factor in
which packets require fragmentation and reassembly; the worst case which packets require fragmentation and reassembly; the worst case
occurs if the tunnel MTU is "infinite" or equal to the physical occurs if the tunnel MTU is "infinite" or equal to the physical
interface MTUs. interface MTUs.
So, if reassembly could be made to work sufficiently reliably, this So, if reassembly could be made to work sufficiently reliably, this
would be one acceptable fallback solution but only for IPv6. would be one acceptable fallback solution but only for IPv6.
3.2 Signalling the Lower MTU to the Sources 3.2. Signalling the Lower MTU to the Sources
Another approach is to use techniques like Path MTU Discovery (or Another approach is to use techniques like Path MTU Discovery (or
potentially a future derivative [12]) to signal to the sources whose potentially a future derivative [13]) to signal to the sources whose
packets will be encapsulated in the network to send smaller packets packets will be encapsulated in the network to send smaller packets
so that they can be encapsulated; in particular, when done on so that they can be encapsulated; in particular, when done on
routers, this includes two separable functions: routers, this includes two separable functions:
a. Forwarding behaviour: when forwarding packets, if the IPv4-only a. Forwarding behaviour: when forwarding packets, if the IPv4-only
DF bit is set, the router sends an ICMP Packet Too Big message to DF bit is set, the router sends an ICMP Packet Too Big message to
the source if the MTU of the egress link is too small. the source if the MTU of the egress link is too small.
b. Router's "host" behaviour: when the router receives an ICMP b. Router's "host" behaviour: when the router receives an ICMP
Packet too Big message related to a tunnel, it (1) adjusts the Packet too Big message related to a tunnel, it (1) adjusts the
skipping to change at page 7, line 9 skipping to change at page 7, line 10
either immediately or by waiting for the next packet to trigger either immediately or by waiting for the next packet to trigger
an ICMP; the former minimizes the packet loss due to MTU changes. an ICMP; the former minimizes the packet loss due to MTU changes.
Note that this only works if the MTU of the tunnel is of reasonable Note that this only works if the MTU of the tunnel is of reasonable
size, and not e.g., 64 kilobytes: see Appendix A for more. size, and not e.g., 64 kilobytes: see Appendix A for more.
This approach would presuppose that PMTUD works. While it is This approach would presuppose that PMTUD works. While it is
currently working for IPv6, and critical for its operation, there is currently working for IPv6, and critical for its operation, there is
ample evidence that in IPv4, PMTUD is far from reliable due to e.g., ample evidence that in IPv4, PMTUD is far from reliable due to e.g.,
firewalls and other boxes being configured to inappropriately drop firewalls and other boxes being configured to inappropriately drop
all the ICMP packets [13], or software bugs rendering PMTUD all the ICMP packets [14], or software bugs rendering PMTUD
inoperational. inoperational.
Further, there are two scenarios where signalling from the network Further, there are two scenarios where signalling from the network
would be highly undesirable: when the encapsulation would be done in would be highly undesirable: when the encapsulation would be done in
such a prominent place in the network that a very large number of such a prominent place in the network that a very large number of
sources would need to be signalled with this information (possibly sources would need to be signalled with this information (possibly
even multiple times, depending on how long they keep their PMTUD even multiple times, depending on how long they keep their PMTUD
state), or when the encapsulation is done for passive monitoring state), or when the encapsulation is done for passive monitoring
purposes (network management, lawful interception, etc.) -- when it's purposes (network management, lawful interception, etc.) -- when it's
critical that the sources whose traffic is being encapsulated are not critical that the sources whose traffic is being encapsulated are not
aware of this happening. aware of this happening.
When desiring to avoid fragmentation, IPv4 allows two options: copy When desiring to avoid fragmentation, IPv4 requires one of two
the DF bit from the inner packets to the encapsulating header, or alternatives [1]: copy the DF bit from the inner packets to the
always set the DF bit. The latter is better especially in controlled encapsulating header, or always set the DF bit of the outer header.
environments, because it forces PMTUD to converge immediately. The latter is better especially in controlled environments, because
it forces PMTUD to converge immediately.
A related technique, which works with TCP under specific scenarios A related technique, which works with TCP under specific scenarios
only, is so-called "MSS clamping". With that technique, or rather a only, is so-called "MSS clamping". With that technique, or rather a
"hack", the TCP packets' Maximum Segment Size (MSS) is reduced by "hack", the TCP packets' Maximum Segment Size (MSS) is reduced by
tunnel endpoints so that the TCP connection automatically restricts tunnel endpoints so that the TCP connection automatically restricts
itself to the maximum available packet size. Obviously this does not itself to the maximum available packet size. Obviously this does not
work for UDP or other protocols which have no MSS. This approach is work for UDP or other protocols which have no MSS. This approach is
most applicable and used with PPPoE, but could be applied otherwise most applicable and used with PPPoE, but could be applied otherwise
as well; the approach also assumes that all the traffic goes through as well; the approach also assumes that all the traffic goes through
tunnel endpoints which do MSS clamping -- this is trivial for the tunnel endpoints which do MSS clamping -- this is trivial for the
single-homed access links, but could be a challenge otherwise. single-homed access links, but could be a challenge otherwise.
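The arithmetic behind MSS clamping can be sketched as follows. This is an illustrative calculation only, not how any particular implementation rewrites the TCP MSS option; it assumes 20-byte TCP headers without options and IP headers without options or extension headers, and the function name is hypothetical.

```python
TCP_HEADER = 20  # bytes, assuming no TCP options


def clamp_mss(advertised_mss, tunnel_mtu, ip_version=4):
    """Return the MSS a tunnel endpoint would write into a passing SYN:
    the advertised value, lowered if it would not fit in tunnel_mtu."""
    if ip_version == 4:
        ip_header = 20  # assuming no IPv4 options
    elif ip_version == 6:
        ip_header = 40  # assuming no extension headers
    else:
        raise ValueError("ip_version must be 4 or 6")
    ceiling = tunnel_mtu - ip_header - TCP_HEADER
    if ceiling <= 0:
        raise ValueError("tunnel_mtu too small for TCP over IP")
    # Never raise the MSS; only lower it when it exceeds the ceiling.
    return min(advertised_mss, ceiling)


if __name__ == "__main__":
    # PPPoE leaves a 1492-byte MTU on a 1500-byte Ethernet link.
    print(clamp_mss(1460, 1492, ip_version=4))
    print(clamp_mss(1440, 1492, ip_version=6))
```

Because only the MSS option in TCP SYN segments is adjusted, the calculation says nothing about UDP or other traffic crossing the same tunnel.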
A new approach to PMTUD is in the works [12], but it is uncertain A new approach to PMTUD is in the works [13], but it is uncertain
whether that would fix the problems -- at least not the passive whether that would fix the problems -- at least not the passive
monitoring requirements. monitoring requirements.
3.3 Encapsulate Only When There is Free MTU 3.3. Encapsulate Only When There is Free MTU
The third approach is an operational one, depending on the The third approach is an operational one, depending on the
environment where encapsulation and decapsulation are being performed. environment where encapsulation and decapsulation are being performed.
That is, if an ISP would deploy tunneling in its backbone, which That is, if an ISP would deploy tunneling in its backbone, which
would consist only of links supporting high MTUs (e.g., Gigabit would consist only of links supporting high MTUs (e.g., Gigabit
Ethernet or SDH/SONET), but all its customers and peers would have a Ethernet or SDH/SONET), but all its customers and peers would have a
lower MTU (e.g., 1500, or the backbone MTU minus the encapsulation lower MTU (e.g., 1500, or the backbone MTU minus the encapsulation
overhead), this would imply that no packets (with the encapsulation overhead), this would imply that no packets (with the encapsulation
overhead added) would have larger MTU than the "backbone MTU", and overhead added) would have larger MTU than the "backbone MTU", and
all the encapsulated packets would always fit MTU-wise in the all the encapsulated packets would always fit MTU-wise in the
skipping to change at page 8, line 29 skipping to change at page 8, line 32
Another, related approach might be having the sources use only a low Another, related approach might be having the sources use only a low
enough MTU which would fit in all the physical MTUs; for example, enough MTU which would fit in all the physical MTUs; for example,
IPv6 specifies the minimum MTU of 1280 bytes. For example, if all IPv6 specifies the minimum MTU of 1280 bytes. For example, if all
the sources whose traffic would be encapsulated would use this as the the sources whose traffic would be encapsulated would use this as the
maximum packet size, there would probably always be enough free MTU maximum packet size, there would probably always be enough free MTU
for encapsulation in the network. However, this is not the case for encapsulation in the network. However, this is not the case
today, and it would be completely unrealistic to assume that this today, and it would be completely unrealistic to assume that this
kind of approach could be made to work in general. kind of approach could be made to work in general.
It is worth remembering that while the IPv6 minimum MTU is 1280 bytes It is worth remembering that while the IPv6 minimum MTU is 1280 bytes
[9], there are scenarios where the tunnel implementation must [10], there are scenarios where the tunnel implementation must
implement fragmentation and reassembly [2]: for example, when having implement fragmentation and reassembly [3]: for example, when having
an IPv6-in-IPv6 tunnel on top of a physical interface with MTU of an IPv6-in-IPv6 tunnel on top of a physical interface with MTU of
1280 bytes, or when having two layers of IPv6 tunneling. This can 1280 bytes, or when having two layers of IPv6 tunneling. This can
only be avoided by ensuring that links on top of which IPv6 is being only be avoided by ensuring that links on top of which IPv6 is being
tunneled have a somewhat larger MTU (e.g., by 40 bytes) than 1280 bytes. tunneled have a somewhat larger MTU (e.g., by 40 bytes) than 1280 bytes.
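The IPv6-in-IPv6 cases above reduce to a small check. The sketch below assumes that each tunnel layer adds exactly one 40-byte IPv6 header with no extension headers; the names are hypothetical and used for illustration only.

```python
IPV6_MIN_MTU = 1280   # bytes, minimum MTU every IPv6 link must offer
IPV6_HEADER = 40      # bytes added per IPv6-in-IPv6 layer (assumed)


def tunnel_mtu(link_mtu, layers=1, encap_overhead=IPV6_HEADER):
    """MTU left for the innermost packets after `layers` encapsulations."""
    return link_mtu - layers * encap_overhead


def must_fragment_in_tunnel(link_mtu, layers=1, encap_overhead=IPV6_HEADER):
    """True if the tunnel cannot carry a minimum-sized 1280-byte IPv6
    packet unfragmented, so the endpoints must fragment and reassemble."""
    return tunnel_mtu(link_mtu, layers, encap_overhead) < IPV6_MIN_MTU


if __name__ == "__main__":
    for mtu, layers in ((1280, 1), (1320, 1), (1320, 2), (1360, 2)):
        print(mtu, layers, must_fragment_in_tunnel(mtu, layers))
```

Each additional layer of tunneling moves the threshold up by another 40 bytes under these assumptions.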
This conclusion can be generalized: because IP can be tunneled on top This conclusion can be generalized: because IP can be tunneled on top
of IP, no single minimum or maximum MTU can be found such that of IP, no single minimum or maximum MTU can be found such that
fragmentation or signalling to the sources would never be needed. fragmentation or signalling to the sources would never be needed.
All in all, while in certain operational environments it might be All in all, while in certain operational environments it might be
possible to avoid any problems by deployment choices, or limiting the possible to avoid any problems by deployment choices, or limiting the
MTU that the sources use, this is probably not a sufficiently good MTU that the sources use, this is probably not a sufficiently good
general solution for the equipment vendors, and other solutions must general solution for the equipment vendors, and other solutions must
also be provided. also be provided.
3.4 Fragmentation of the Inner Packet 3.4. Fragmentation of the Inner Packet
A final possibility is fragmenting the inner packet, before A final possibility is fragmenting the inner packet, before
encapsulation, in such a manner that the encapsulated packet fits in encapsulation, in such a manner that the encapsulated packet fits in
the tunnel's path MTU (discovered using PMTUD). However, one the tunnel's path MTU (discovered using PMTUD). However, one
should note that only IPv4 supports this "in-flight" fragmentation; should note that only IPv4 supports this "in-flight" fragmentation;
further, it isn't allowed for packets where the Don't Fragment (DF) bit has further, it isn't allowed for packets where the Don't Fragment (DF) bit has
been set. Even if one could ignore IPv6 completely, so many IPv4 been set. Even if one could ignore IPv6 completely, so many IPv4
host stacks send packets with the DF bit set that this would seem host stacks send packets with the DF bit set that this would seem
infeasible. infeasible.
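The rule described in this paragraph can be sketched as a small decision function. This is a hedged illustration of standards-conformant behaviour, not of any particular implementation; the `Action` names and the `tunnel_payload_mtu` parameter (the tunnel's path MTU minus encapsulation overhead) are assumptions introduced here.

```python
from enum import Enum


class Action(Enum):
    ENCAPSULATE = "encapsulate as-is"
    FRAGMENT_INNER = "fragment the inner packet, then encapsulate"
    CANNOT_FRAGMENT = "inner fragmentation not permitted"


def inner_packet_action(packet_len: int, tunnel_payload_mtu: int,
                        ip_version: int, df_set: bool = False) -> Action:
    """Decide what a conformant tunnel ingress may do with an inner packet."""
    if ip_version not in (4, 6):
        raise ValueError("ip_version must be 4 or 6")
    if packet_len <= tunnel_payload_mtu:
        return Action.ENCAPSULATE
    # Only IPv4 allows in-flight fragmentation, and only when DF is clear.
    if ip_version == 4 and not df_set:
        return Action.FRAGMENT_INNER
    # IPv6 packets and IPv4 packets with DF set need some other approach,
    # such as signalling the source or fragmenting the outer packet.
    return Action.CANNOT_FRAGMENT
```

Only one of the three oversized cases ends in inner fragmentation, which matches the paragraph's point that this approach covers too little traffic to be sufficient on its own.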
Regardless of what the specifications say, there are implementations However, there are existing implementations that violate the standard
that: that:
o Discard too big packets with DF bit not set instead of fragmenting o Discard too big packets with DF bit not set instead of fragmenting
them (this is rare), them (this is rare),
o Ignore the DF bit completely, for all or specified interfaces, or o Ignore the DF bit completely, for all or specified interfaces, or
o Clear the DF bit before encapsulation, in the egress of configured o Clear the DF bit before encapsulation, in the egress of configured
interfaces. This is typically done for all the traffic, not just interfaces. This is typically done for all the traffic, not just
too big packets (allowing configuring this is common). too big packets (allowing configuring this is common).
skipping to change at page 9, line 48 skipping to change at page 10, line 4
would not work with IPv6, it could not be considered a generic would not work with IPv6, it could not be considered a generic
solution. solution.
4. Conclusions 4. Conclusions
Fragmentation and reassembly by the tunnel endpoints is a clear and Fragmentation and reassembly by the tunnel endpoints is a clear and
simple solution to the problem, but the hardware reassembly when the simple solution to the problem, but the hardware reassembly when the
packets get lost may face significant implementation challenges which packets get lost may face significant implementation challenges which
may be insurmountable. This approach does not seem feasible may be insurmountable. This approach does not seem feasible
especially for IPv4 with high data rates due to problems with especially for IPv4 with high data rates due to problems with
wrapping of the fragment identification field [11]. Constant wrapping may wrapping of the fragment identification field [12]. Constant wrapping may
occur when the data rate is on the order of MB/s for IPv4 and on the occur when the data rate is on the order of MB/s for IPv4 and on the
order of dozens of GB/s for IPv6. However, this reassembly approach order of dozens of GB/s for IPv6. However, this reassembly approach
is probably not a problem for passive monitoring applications. is probably not a problem for passive monitoring applications.
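A rough way to see where these orders of magnitude come from: the identification field is 16 bits in IPv4 and 32 bits in the IPv6 Fragment header, and it wraps once all values have been used within one reassembly timeout. The sketch below assumes 1500-byte packets and a 60-second timeout; both values are assumptions chosen for illustration, and the function name is hypothetical.

```python
def wrap_rate_bytes_per_s(id_bits: int, packet_size: int,
                          reassembly_timeout_s: float) -> float:
    """Data rate at which the fragment ID space is exhausted within one
    reassembly timeout, assuming every packet consumes one ID."""
    return (2 ** id_bits) * packet_size / reassembly_timeout_s


ipv4_rate = wrap_rate_bytes_per_s(16, 1500, 60)   # 16-bit IPv4 Identification
ipv6_rate = wrap_rate_bytes_per_s(32, 1500, 60)   # 32-bit IPv6 Identification

print(f"IPv4: {ipv4_rate / 1e6:.1f} MB/s, IPv6: {ipv6_rate / 1e9:.1f} GB/s")
```

Under these assumptions the IPv4 figure is a little over 1 MB/s and the IPv6 figure is 65536 times larger. Smaller packets or a longer timeout lower both figures proportionally, so the results should be read as order-of-magnitude estimates only.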
PMTUD techniques, at least at the moment and especially for IPv4, PMTUD techniques, at least at the moment and especially for IPv4,
appear to be too unreliable or unscalable to be used in the appear to be too unreliable or unscalable to be used in the
backbones. It is an open question whether a future solution might backbones. It is an open question whether a future solution might
work better in this aspect. work better in this aspect.
It is clear that in some environments the operational approach to the It is clear that in some environments the operational approach to the
skipping to change at page 11, line 20 skipping to change at page 11, line 25
This document describes different issues with packet sizes and in- This document describes different issues with packet sizes and in-
the-network tunneling; this does not have security considerations on the-network tunneling; this does not have security considerations on
its own. its own.
However, different solutions might have characteristics which may However, different solutions might have characteristics which may
make them more susceptible to attacks -- for example, a router-based make them more susceptible to attacks -- for example, a router-based
fragment reassembly could easily lead to (reassembly) buffer memory fragment reassembly could easily lead to (reassembly) buffer memory
exhaustion if the attacker sent a sufficient number of exhaustion if the attacker sent a sufficient number of
fragments without sending all of them, so that the reassembly would fragments without sending all of them, so that the reassembly would
be stalled until a timeout; these and other fragment attacks (e.g., be stalled until a timeout; these and other fragment attacks (e.g.,
[14]) have already been used against e.g., firewalls and host stacks, [15]) have already been used against e.g., firewalls and host stacks,
and need to be taken into consideration in the implementations. and need to be taken into consideration in the implementations.
It is worth considering the cryptographic expense (which is typically It is worth considering the cryptographic expense (which is typically
more significant than the reassembly, if done in software) with more significant than the reassembly, if done in software) with
fragmentation of the inner or outer packet. If an outer fragment fragmentation of the inner or outer packet. If an outer fragment
goes missing, no cryptographic operations have yet been performed; if goes missing, no cryptographic operations have yet been performed; if
an inner fragment goes missing, cryptographic operations have already an inner fragment goes missing, cryptographic operations have already
been performed. Therefore, which of these approaches is preferable been performed. Therefore, which of these approaches is preferable
also depends on whether cryptography or reassembly are already also depends on whether cryptography or reassembly are already
provided in hardware; for high-speed routers, at least, one should be provided in hardware; for high-speed routers, at least, one should be
able to assume that if it is performing relatively heavy able to assume that if it is performing relatively heavy
cryptography, hardware support is already required. cryptography, hardware support is already required.
The solutions using PMTUD (and consequently ICMP) will also need to The solutions using PMTUD (and consequently ICMP) will also need to
take into account the attacks using ICMP. In particular, an attacker take into account the attacks using ICMP. In particular, an attacker
could send ICMP Packet Too Big messages indicating a very low MTU to could send ICMP Packet Too Big messages indicating a very low MTU to
reduce the throughput and/or as a fragmentation/reassembly denial-of- reduce the throughput and/or as a fragmentation/reassembly denial-of-
service attack. This attack has been described in the context of TCP service attack. This attack has been described in the context of TCP
in [15]. in [16].
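One commonly discussed sanity check against such forged messages is to never raise the path MTU on the basis of a Packet Too Big message and never lower it below the protocol minimum (68 bytes for IPv4, 1280 bytes for IPv6). The sketch below illustrates that idea only; it is not a mitigation proposed by this memo, the function name is hypothetical, and a real implementation would also need to validate that the message relates to traffic actually sent.

```python
# Protocol-defined lower bounds on the path MTU estimate.
MIN_PATH_MTU = {4: 68, 6: 1280}


def updated_path_mtu(current_mtu: int, advertised_mtu: int, ip_version: int) -> int:
    """Return the path MTU to use after receiving a Packet Too Big message."""
    if ip_version not in MIN_PATH_MTU:
        raise ValueError("ip_version must be 4 or 6")
    # A Packet Too Big message can only lower the estimate, never raise it.
    if advertised_mtu >= current_mtu:
        return current_mtu
    # Clamp absurdly small advertised values to the protocol minimum.
    return max(advertised_mtu, MIN_PATH_MTU[ip_version])
```

The IPv4 floor is still very low, so this check limits the damage rather than preventing a throughput reduction entirely.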
7. Acknowledgements 7. Acknowledgements
While the topic is far from new, recent discussions with W. Mark While the topic is far from new, recent discussions with W. Mark
Townsley on L2TP fragmentation issues caused the author to sit down Townsley on L2TP fragmentation issues caused the author to sit down
and write up the issues more generally. Michael Richardson and Mika and write up the issues more generally. Michael Richardson and Mika
Joutsenvirta provided useful feedback on the first draft. When Joutsenvirta provided useful feedback on the first draft. When
soliciting comments from the NANOG list, Carsten Bormann, Kevin Miller, soliciting comments from the NANOG list, Carsten Bormann, Kevin Miller,
Warren Kumari, Iljitsch van Beijnum, Alok Dube, and Stephen J. Wilcox Warren Kumari, Iljitsch van Beijnum, Alok Dube, and Stephen J. Wilcox
provided useful feedback. Later, Carlos Pignataro provided excellent provided useful feedback. Later, Carlos Pignataro provided excellent
input, helping in improving the document. input, helping in improving the document. Joe Touch also provided
input on the memo.
8. References 8. References
8.1 Normative References
[1] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms for 8.1. Normative References
IPv6 Hosts and Routers", draft-ietf-v6ops-mech-v2-07 (work in
progress), March 2005.
[2] Conta, A. and S. Deering, "Generic Packet Tunneling in IPv6 [1] Perkins, C., "IP Encapsulation within IP", RFC 2003,
Specification", RFC 2473, December 1998. October 1996.
[3] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina, [2] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms for
"Generic Routing Encapsulation (GRE)", RFC 2784, March 2000. IPv6 Hosts and Routers", draft-ietf-v6ops-mech-v2-07 (work in
progress), March 2005.
[4] Lau, J., Townsley, M., and I. Goyret, "Layer Two Tunneling [3] Conta, A. and S. Deering, "Generic Packet Tunneling in IPv6
Protocol - Version 3 (L2TPv3)", RFC 3931, March 2005. Specification", RFC 2473, December 1998.
[5] Kent, S. and R. Atkinson, "Security Architecture for the [4] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina,
Internet Protocol", RFC 2401, November 1998. "Generic Routing Encapsulation (GRE)", RFC 2784, March 2000.
[6] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, [5] Lau, J., Townsley, M., and I. Goyret, "Layer Two Tunneling
November 1990. Protocol - Version 3 (L2TPv3)", RFC 3931, March 2005.
[7] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery for [6] Kent, S. and K. Seo, "Security Architecture for the Internet
IP version 6", RFC 1981, August 1996. Protocol", draft-ietf-ipsec-rfc2401bis-06 (work in progress),
April 2005.
[8] Braden, R., "Requirements for Internet Hosts - Communication [7] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
Layers", STD 3, RFC 1122, October 1989. November 1990.
[9] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) [8] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery for
Specification", RFC 2460, December 1998. IP version 6", RFC 1981, August 1996.
8.2 Informative References [9] Braden, R., "Requirements for Internet Hosts - Communication
Layers", STD 3, RFC 1122, October 1989.
[10] Mamakos, L., Lidl, K., Evarts, J., Carrel, D., Simone, D., and [10] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6)
Specification", RFC 2460, December 1998.
8.2. Informative References
[11] Mamakos, L., Lidl, K., Evarts, J., Carrel, D., Simone, D., and
R. Wheeler, "A Method for Transmitting PPP Over Ethernet R. Wheeler, "A Method for Transmitting PPP Over Ethernet
(PPPoE)", RFC 2516, February 1999. (PPPoE)", RFC 2516, February 1999.
[11] Mathis, M., "Fragmentation Considered Very Harmful", [12] Mathis, M., "Fragmentation Considered Very Harmful",
draft-mathis-frag-harmful-00 (work in progress), July 2004. draft-mathis-frag-harmful-00 (work in progress), July 2004.
[12] Mathis, M., "Path MTU Discovery", draft-ietf-pmtud-method-04 [13] Mathis, M., "Path MTU Discovery", draft-ietf-pmtud-method-04
(work in progress), February 2005. (work in progress), February 2005.
[13] Medina, A., Allman, M., and S. Floyd, "Measuring the Evolution [14] Medina, A., Allman, M., and S. Floyd, "Measuring the Evolution
of Transport Protocols in the Internet", Computer of Transport Protocols in the Internet", Computer
Communications Review, Apr 2005, <http://www.icir.org/tbit/>. Communications Review, Apr 2005, <http://www.icir.org/tbit/>.
[14] Miller, I., "Protection Against a Variant of the Tiny Fragment [15] Miller, I., "Protection Against a Variant of the Tiny Fragment
Attack (RFC 1858)", RFC 3128, June 2001. Attack (RFC 1858)", RFC 3128, June 2001.
[15] Gont, F., "ICMP attacks against TCP", [16] Gont, F., "ICMP attacks against TCP",
draft-gont-tcpm-icmp-attacks-03 (work in progress), draft-gont-tcpm-icmp-attacks-04 (work in progress),
December 2004. September 2005.
Author's Address
Pekka Savola
CSC/FUNET
Espoo
Finland
Email: psavola@funet.fi
Appendix A. MTU of the Tunnel Appendix A. MTU of the Tunnel
Different tunneling mechanisms may treat the tunnel links as having Different tunneling mechanisms may treat the tunnel links as having
different kind of MTU values. Some might use the same default MTU as different kind of MTU values. Some might use the same default MTU as
for other interfaces; some others might use the default MTU minus the for other interfaces; some others might use the default MTU minus the
expected IP overhead (e.g., 20, 28, or 40 bytes); some others might expected IP overhead (e.g., 20, 28, or 40 bytes); some others might
even treat the tunnel as having "infinite MTU", e.g., 64 kilobytes. even treat the tunnel as having "infinite MTU", e.g., 64 kilobytes.
As [1] describes, having an infinite MTU, i.e., always fragmenting As [2] describes, having an infinite MTU, i.e., always fragmenting
the outer packet (and never the inner packet) and never performing the outer packet (and never the inner packet) and never performing
PMTUD for the tunnel path is a very bad idea, especially in host-to- PMTUD for the tunnel path is a very bad idea, especially in host-to-
router scenarios. (It could be argued that if the nodes are sure router scenarios. (It could be argued that if the nodes are sure
that this is a host-to-host tunnel, a larger MTU might make sense if that this is a host-to-host tunnel, a larger MTU might make sense if
fragmentation and reassembly is more efficient than just sending fragmentation and reassembly is more efficient than just sending
properly sized packets -- but this seems like a stretch.) properly sized packets -- but this seems like a stretch.)
Intellectual Property Statement Author's Address
Pekka Savola
CSC/FUNET
Espoo
Finland
Email: psavola@funet.fi
Full Copyright Statement
Copyright (C) The Internet Society (2005).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79. found in BCP 78 and BCP 79.
skipping to change at page 14, line 29 skipping to change at page 15, line 7
such proprietary rights by implementers or users of this such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr. http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at this standard. Please address the information to the IETF at
ietf-ipr@ietf.org. ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2005). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment Acknowledgment
Funding for the RFC Editor function is currently provided by the Funding for the RFC Editor function is currently provided by the
Internet Society. Internet Society.
 End of changes. 48 change blocks. 
107 lines changed or deleted 114 lines changed or added
