| draft-savola-mtufrag-network-tunneling-04.txt | draft-savola-mtufrag-network-tunneling-05.txt | |||
|---|---|---|---|---|
| Internet Engineering Task Force P. Savola | Internet Engineering Task Force P. Savola | |||
| Internet-Draft CSC/FUNET | Internet-Draft CSC/FUNET | |||
| Expires: November 24, 2005 May 23, 2005 | Expires: April 8, 2006 October 5, 2005 | |||
| MTU and Fragmentation Issues with In-the-Network Tunneling | MTU and Fragmentation Issues with In-the-Network Tunneling | |||
| draft-savola-mtufrag-network-tunneling-04.txt | draft-savola-mtufrag-network-tunneling-05.txt | |||
| Status of this Memo | Status of this Memo | |||
| By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
| applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
| have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
| aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| skipping to change at page 1, line 33 | skipping to change at page 1, line 33 | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on November 24, 2005. | This Internet-Draft will expire on April 8, 2006. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The Internet Society (2005). | Copyright (C) The Internet Society (2005). | |||
| Abstract | Abstract | |||
| Tunneling techniques such as IP-in-IP when deployed in the middle of | Tunneling techniques such as IP-in-IP when deployed in the middle of | |||
| the network, typically between routers, have certain issues regarding | the network, typically between routers, have certain issues regarding | |||
| how large packets can be handled: whether such packets would be | how large packets can be handled: whether such packets would be | |||
| skipping to change at page 2, line 10 | skipping to change at page 2, line 10 | |||
| would be used, or how this scenario could be operationally avoided. | would be used, or how this scenario could be operationally avoided. | |||
| This memo justifies why this is a common, non-trivial problem, and | This memo justifies why this is a common, non-trivial problem, and | |||
| goes on to describe the different solutions and their characteristics | goes on to describe the different solutions and their characteristics | |||
| at some length. | at some length. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2. Problem Statement . . . . . . . . . . . . . . . . . . . . . . 4 | 2. Problem Statement . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 3. Description of Solutions . . . . . . . . . . . . . . . . . . . 5 | 3. Description of Solutions . . . . . . . . . . . . . . . . . . . 5 | |||
| 3.1 Fragmentation and Reassembly by the Tunnel Endpoints . . . 5 | 3.1. Fragmentation and Reassembly by the Tunnel Endpoints . . . 5 | |||
| 3.2 Signalling the Lower MTU to the Sources . . . . . . . . . 6 | 3.2. Signalling the Lower MTU to the Sources . . . . . . . . . 6 | |||
| 3.3 Encapsulate Only When There is Free MTU . . . . . . . . . 7 | 3.3. Encapsulate Only When There is Free MTU . . . . . . . . . 7 | |||
| 3.4 Fragmentation of the Inner Packet . . . . . . . . . . . . 8 | 3.4. Fragmentation of the Inner Packet . . . . . . . . . . . . 8 | |||
| 4. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 9 | 4. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 | 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 6. Security Considerations . . . . . . . . . . . . . . . . . . . 11 | 6. Security Considerations . . . . . . . . . . . . . . . . . . . 11 | |||
| 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11 | 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 12 | 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 12 | |||
| 8.1 Normative References . . . . . . . . . . . . . . . . . . . 12 | 8.1. Normative References . . . . . . . . . . . . . . . . . . . 12 | |||
| 8.2 Informative References . . . . . . . . . . . . . . . . . . 12 | 8.2. Informative References . . . . . . . . . . . . . . . . . . 13 | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . 13 | Appendix A. MTU of the Tunnel . . . . . . . . . . . . . . . . . . 13 | |||
| A. MTU of the Tunnel . . . . . . . . . . . . . . . . . . . . . . 13 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| Intellectual Property and Copyright Statements . . . . . . . . 14 | Intellectual Property and Copyright Statements . . . . . . . . . . 14 | |||
| 1. Introduction | 1. Introduction | |||
| A large number of ways to encapsulate datagrams in other packets, | A large number of ways to encapsulate datagrams in other packets, | |||
| i.e., tunneling mechanisms, have been specified over the years: for | i.e., tunneling mechanisms, have been specified over the years: for | |||
| example, IP-in-IP (e.g., [1], [2]), GRE [3], L2TP [4], or IPsec [5] | example, IP-in-IP (e.g., [1] [2], [3]), GRE [4], L2TP [5], or IPsec | |||
| in tunnel mode -- any of which might run on top of IPv4, IPv6, or | [6] in tunnel mode -- any of which might run on top of IPv4, IPv6, or | |||
| some other protocol and carrying the same or a different protocol. | some other protocol and carrying the same or a different protocol. | |||
| All of these can be run so that the endpoints of the inner protocol | All of these can be run so that the endpoints of the inner protocol | |||
| are co-located with the endpoints of the outer protocol; in a typical | are co-located with the endpoints of the outer protocol; in a typical | |||
| scenario, this would correspond to "host-to-host" tunneling. It is | scenario, this would correspond to "host-to-host" tunneling. It is | |||
| also possible to have one set of endpoints co-located, i.e., host-to- | also possible to have one set of endpoints co-located, i.e., host-to- | |||
| router or router-to-host tunneling. Finally, many of these | router or router-to-host tunneling. Finally, many of these | |||
| mechanisms are also employed between the routers for all or a part of | mechanisms are also employed between the routers for all or a part of | |||
| the traffic that passes between them, resulting in router-to-router | the traffic that passes between them, resulting in router-to-router | |||
| tunneling. | tunneling. | |||
| All these protocols and scenarios have one issue in common: how does | All these protocols and scenarios have one issue in common: how does | |||
| the source select the maximum packet size so that the packets will | the source select the maximum packet size so that the packets will | |||
| fit, even encapsulated, in the largest Maximum Transfer Unit (MTU) of | fit, even encapsulated, in the smallest Maximum Transfer Unit (MTU) | |||
| the traversed path in the network; and if you cannot affect the | of the traversed path in the network; and if you cannot affect the | |||
| packet sizes, what do you do to be able to encapsulate them in any | packet sizes, what do you do to be able to encapsulate them in any | |||
| case? The four main solutions are (these will be elaborated in | case? The four main solutions are (these will be elaborated in | |||
| Section 3): | Section 3): | |||
| 1. Fragmenting all too big encapsulated packets to fit in the paths, | 1. Fragmenting all too big encapsulated packets to fit in the paths, | |||
| and reassembling them at the tunnel end-points. | and reassembling them at the tunnel end-points. | |||
| 2. Signal to all the sources whose traffic must be encapsulated, and | 2. Signal to all the sources whose traffic must be encapsulated, and | |||
| is larger than what fits, to send smaller packets, e.g., using | is larger than what fits, to send smaller packets, e.g., using | |||
| Path MTU Discovery [6] [7]. | Path MTU Discovery [7][8]. | |||
| 3. Ensure that in the specific environment, the encapsulated packets | 3. Ensure that in the specific environment, the encapsulated packets | |||
| will fit in all the paths in the network, e.g., by using MTU | will fit in all the paths in the network, e.g., by using MTU | |||
| bigger than 1500 in the backbone used for encapsulation. | bigger than 1500 in the backbone used for encapsulation. | |||
| 4. Fragmenting the original too big packets so that their fragments | 4. Fragmenting the original too big packets so that their fragments | |||
| will fit, even encapsulated, in the paths, and reassembling them | will fit, even encapsulated, in the paths, and reassembling them | |||
| at the destination nodes. Note that this approach is only | at the destination nodes. Note that this approach is only | |||
| available for IPv4 under certain assumptions (see Section 3.4). | available for IPv4 under certain assumptions (see Section 3.4). | |||
| skipping to change at page 4, line 14 | skipping to change at page 4, line 14 | |||
| The tunneling packet size issues are relatively straightforward in | The tunneling packet size issues are relatively straightforward in | |||
| host-to-host tunneling or host-to-router tunneling where Path MTU | host-to-host tunneling or host-to-router tunneling where Path MTU | |||
| Discovery only needs to signal to one source node. The issues are | Discovery only needs to signal to one source node. The issues are | |||
| significantly more difficult in router-to-router and certain router- | significantly more difficult in router-to-router and certain router- | |||
| to-host scenarios, which are the focus of this memo. | to-host scenarios, which are the focus of this memo. | |||
| It is worth noting that most of this discussion applies to a more | It is worth noting that most of this discussion applies to a more | |||
| generic case, where there exists a link with lower MTU in the path. | generic case, where there exists a link with lower MTU in the path. | |||
| A concrete and widely deployed example of this is the usage of PPP | A concrete and widely deployed example of this is the usage of PPP | |||
| over Ethernet (PPPoE) [10] at the customers' access link. These | over Ethernet (PPPoE) [11] at the customers' access link. These | |||
| lower-MTU links, and particularly PPPoE links, are typically not | lower-MTU links, and particularly PPPoE links, are typically not | |||
| deployed in topologies where fragmentation and reassembly might be | deployed in topologies where fragmentation and reassembly might be | |||
| unfeasible (e.g., a backbone), so this may be a slightly easier | unfeasible (e.g., a backbone), so this may be a slightly easier | |||
| problem. However, this more generic case is considered out of scope | problem. However, this more generic case is considered out of scope | |||
| of this memo. | of this memo. | |||
| There are also known challenges in specifying and implementing a | There are also known challenges in specifying and implementing a | |||
| mechanism which would be used at the tunnel end-point to obtain the | mechanism which would be used at the tunnel end-point to obtain the | |||
| best suitable packet size to use for encapsulation: if a static value | best suitable packet size to use for encapsulation: if a static value | |||
| is chosen, a lot of fragmentation might end up being performed. On | is chosen, a lot of fragmentation might end up being performed. On | |||
| the other hand, if PMTUD is used, the implementation would need to | the other hand, if PMTUD is used, the implementation would need to | |||
| update the discovered interface MTU based on the ICMP Packet Too Big | update the discovered interface MTU based on the ICMP Packet Too Big | |||
| messages and originate ICMP Packet Too Big message(s) back to the | messages and originate ICMP Packet Too Big message(s) back to the | |||
| source(s) of the encapsulated packets; this also assumes that | source(s) of the encapsulated packets; this also assumes that | |||
| sufficient data has been piggybacked on the ICMP messages (beyond the | sufficient data has been piggybacked on the ICMP messages (beyond the | |||
| required 64 bits beyond the ICMPv4 header). We'll discuss using | required 64 bits after the IPv4 header). We'll discuss using PMTUD | |||
| PMTUD to signal the sources briefly in Section 3.2, but in-depth | to signal the sources briefly in Section 3.2, but in-depth | |||
| specification and analysis is described elsewhere (e.g., in [3] and | specification and analysis is described elsewhere (e.g., in [4] and | |||
| [1]) and is out of scope of this memo. | [2]) and is out of scope of this memo. | |||
| Section 2 includes a problem statement, Section 3 describes the | Section 2 includes a problem statement, Section 3 describes the | |||
| different solutions with their drawbacks and advantages, and Section | different solutions with their drawbacks and advantages, and Section | |||
| 4 presents conclusions. | 4 presents conclusions. | |||
| 2. Problem Statement | 2. Problem Statement | |||
| It is worth considering why exactly this is considered a problem. | It is worth considering why exactly this is considered a problem. | |||
| It is possible to fix all the packet size issues using the solution | It is possible to fix all the packet size issues using the solution | |||
| 1, fragmenting the resulting encapsulated packet, and reassembling it | 1, fragmenting the resulting encapsulated packet, and reassembling it | |||
| by the tunnel endpoint. However, this is considered problematic for | by the tunnel endpoint. However, this is considered problematic for | |||
| at least three reasons, as described in Section 3.1. | at least three reasons, as described in Section 3.1. | |||
| Therefore it is desirable to avoid fragmentation and reassembly if | Therefore it is desirable to avoid fragmentation and reassembly if | |||
| possible. On the other hand, the other solutions may not be | possible. On the other hand, the other solutions may not be | |||
| practical either: especially in router-to-router or router-to-host | practical either: especially in router-to-router or router-to-host | |||
| tunneling, Path MTU Discovery might be very disadvantageous -- | tunneling, Path MTU Discovery might be very disadvantageous -- | |||
| consider the case where a backbone router would send an ICMP Packet | consider the case where a backbone router would send ICMP Packet Too | |||
| Too Big messages to every source who would try to send packets | Big messages to every source who would try to send packets through | |||
| through it. Fragmenting before encapsulation is also not available | it. Fragmenting before encapsulation is also not available in IPv6, | |||
| in IPv6, and not available when the Don't Fragment (DF) bit has been | and not available when the Don't Fragment (DF) bit has been set (see | |||
| set (unless the implementation ignores the DF bit). Ensuring high | Section 3.4 for more). Ensuring high enough MTU so encapsulation is | |||
| enough MTU so encapsulation is always possible is of course a valid | always possible is of course a valid approach, but requires careful | |||
| approach, but requires careful operational planning, and may not be a | operational planning, and may not be a feasible assumption for | |||
| feasible assumption for implementors. | implementors. | |||
| It follows that there is no trivial solution to this problem, and it | It follows that there is no trivial solution to this problem, and it | |||
| needs to be further explored to consider the tradeoffs, as is done in | needs to be further explored to consider the tradeoffs, as is done in | |||
| this memo. | this memo. | |||
| 3. Description of Solutions | 3. Description of Solutions | |||
| This section describes the potential solutions in a bit more detail. | This section describes the potential solutions in a bit more detail. | |||
| 3.1 Fragmentation and Reassembly by the Tunnel Endpoints | 3.1. Fragmentation and Reassembly by the Tunnel Endpoints | |||
| The seemingly simplest solution to tunneling packet size issues is | The seemingly simplest solution to tunneling packet size issues is | |||
| fragmentation of the outer packet by the encapsulator, and reassembly | fragmentation of the outer packet by the encapsulator, and reassembly | |||
| by the decapsulator. However, this is highly problematic for at | by the decapsulator. However, this is highly problematic for at | |||
| least three reasons: | least three reasons: | |||
| o Fragmentation causes overhead: every fragment requires the IP | o Fragmentation causes overhead: every fragment requires the IP | |||
| header (20 or 40 bytes), and with IPv6, additional 8 bytes for the | header (20 or 40 bytes), and with IPv6, additional 8 bytes for the | |||
| Fragment Header. | Fragment Header. | |||
| skipping to change at page 5, line 44 | skipping to change at page 5, line 45 | |||
| implementations may not be able to perform these operations at | implementations may not be able to perform these operations at | |||
| line rate. | line rate. | |||
| o At the time of reassembly, all the information (i.e., all the | o At the time of reassembly, all the information (i.e., all the | |||
| fragments) is normally not available; when the first fragment | fragments) is normally not available; when the first fragment | |||
| arrives to be reassembled, a buffer of the maximum possible size | arrives to be reassembled, a buffer of the maximum possible size | |||
| may have to be allocated because the total length of the | may have to be allocated because the total length of the | |||
| reassembled datagram is not known at that time. Further, as | reassembled datagram is not known at that time. Further, as | |||
| fragments might get lost, be reordered or delayed, the reassembly | fragments might get lost, be reordered or delayed, the reassembly | |||
| engine has to wait with the partial packet for some time (for | engine has to wait with the partial packet for some time (for | |||
| example, 60 seconds [8]). When this would have to be done at the | example, 60 seconds [9]). When this would have to be done at the | |||
| line rate, with e.g., 10 Gbit/s speed, the length of the buffers | line rate, with e.g., 10 Gbit/s speed, the length of the buffers | |||
| that reassembly might require would be prohibitive. | that reassembly might require would be prohibitive. | |||
| When examining router-to-router tunneling, the third problem is | When examining router-to-router tunneling, the third problem is | |||
| likely the worst; certainly, a hardware computation and | likely the worst; certainly, a hardware computation and | |||
| implementation requirement would also be significant, but not all | implementation requirement would also be significant, but not all | |||
| that difficult in the end -- and the link capacity wasted in the | that difficult in the end -- and the link capacity wasted in the | |||
| backbones by additional overhead might not be a huge problem either. | backbones by additional overhead might not be a huge problem either. | |||
| However, the IPv4 Identification field is only 16 bits long (compared | However, the IPv4 Identification field is only 16 bits long (compared | |||
| to 32 bits in IPv6), and if a larger number of packets are being | to 32 bits in IPv6), and if a larger number of packets are being | |||
| tunneled between two IP addresses, the ID is very likely to wrap and | tunneled between two IP addresses, the ID is very likely to wrap and | |||
| cause data misassociation. Reassembly that wrongly combines data | cause data misassociation. Reassembly that wrongly combines data | |||
| from two unrelated packets causes a data integrity violation and potentially a | from two unrelated packets causes a data integrity violation and potentially a | |||
| confidentiality violation. This problem is further described in | confidentiality violation. This problem is further described in | |||
| [11]. | [12]. | |||
| IPv6, and IPv4 with the DF bit set in the encapsulating header, | IPv6, and IPv4 with the DF bit set in the encapsulating header, | |||
| allow the tunnel endpoints to optimize the tunnel MTU and minimize | allow the tunnel endpoints to optimize the tunnel MTU and minimize | |||
| network-based reassembly. This also prevents fragmentation of the | network-based reassembly. This also prevents fragmentation of the | |||
| encapsulated packets on the tunnel path. If the IPv4 encapsulating | encapsulated packets on the tunnel path. If the IPv4 encapsulating | |||
| header does not have the DF bit set, the tunnel endpoints will have to | header does not have the DF bit set, the tunnel endpoints will have to | |||
| perform a significant amount of fragmentation and reassembly, while the | perform a significant amount of fragmentation and reassembly, while the | |||
| use of PMTUD is minimized. | use of PMTUD is minimized. | |||
| As Appendix A describes, the MTU of the tunnel is also a factor in | As Appendix A describes, the MTU of the tunnel is also a factor in | |||
| which packets require fragmentation and reassembly; the worst case | which packets require fragmentation and reassembly; the worst case | |||
| occurs if the tunnel MTU is "infinite" or equal to the physical | occurs if the tunnel MTU is "infinite" or equal to the physical | |||
| interface MTUs. | interface MTUs. | |||
| So, if reassembly could be made to work sufficiently reliably, this | So, if reassembly could be made to work sufficiently reliably, this | |||
| would be one acceptable fallback solution but only for IPv6. | would be one acceptable fallback solution but only for IPv6. | |||
| 3.2 Signalling the Lower MTU to the Sources | 3.2. Signalling the Lower MTU to the Sources | |||
| Another approach is to use techniques like Path MTU Discovery (or | Another approach is to use techniques like Path MTU Discovery (or | |||
| potentially a future derivative [12]) to signal to the sources whose | potentially a future derivative [13]) to signal to the sources whose | |||
| packets will be encapsulated in the network to send smaller packets | packets will be encapsulated in the network to send smaller packets | |||
| so that they can be encapsulated; in particular, when done on | so that they can be encapsulated; in particular, when done on | |||
| routers, this includes two separable functions: | routers, this includes two separable functions: | |||
| a. Forwarding behaviour: when forwarding packets, if the IPv4-only | a. Forwarding behaviour: when forwarding packets, if the IPv4-only | |||
| DF bit is set, the router sends an ICMP Packet Too Big message to | DF bit is set, the router sends an ICMP Packet Too Big message to | |||
| the source if the MTU of the egress link is too small. | the source if the MTU of the egress link is too small. | |||
| b. Router's "host" behaviour: when the router receives an ICMP | b. Router's "host" behaviour: when the router receives an ICMP | |||
| Packet too Big message related to a tunnel, it (1) adjusts the | Packet too Big message related to a tunnel, it (1) adjusts the | |||
| skipping to change at page 7, line 9 | skipping to change at page 7, line 10 | |||
| either immediately or by waiting for the next packet to trigger | either immediately or by waiting for the next packet to trigger | |||
| an ICMP; the former minimizes the packet loss due to MTU changes. | an ICMP; the former minimizes the packet loss due to MTU changes. | |||
| Note that this only works if the MTU of the tunnel is of reasonable | Note that this only works if the MTU of the tunnel is of reasonable | |||
| size, and not e.g., 64 kilobytes: see Appendix A for more. | size, and not e.g., 64 kilobytes: see Appendix A for more. | |||
| This approach would presuppose that PMTUD works. While it is | This approach would presuppose that PMTUD works. While it is | |||
| currently working for IPv6, and critical for its operation, there is | currently working for IPv6, and critical for its operation, there is | |||
| ample evidence that in IPv4, PMTUD is far from reliable due to e.g., | ample evidence that in IPv4, PMTUD is far from reliable due to e.g., | |||
| firewalls and other boxes being configured to inappropriately drop | firewalls and other boxes being configured to inappropriately drop | |||
| all the ICMP packets [13], or software bugs rendering PMTUD | all the ICMP packets [14], or software bugs rendering PMTUD | |||
| inoperational. | inoperational. | |||
| Further, there are two scenarios where signalling from the network | Further, there are two scenarios where signalling from the network | |||
| would be highly undesirable: when the encapsulation would be done in | would be highly undesirable: when the encapsulation would be done in | |||
| such a prominent place in the network that a very large number of | such a prominent place in the network that a very large number of | |||
| sources would need to be signalled with this information (possibly | sources would need to be signalled with this information (possibly | |||
| even multiple times, depending on how long they keep their PMTUD | even multiple times, depending on how long they keep their PMTUD | |||
| state), or when the encapsulation is done for passive monitoring | state), or when the encapsulation is done for passive monitoring | |||
| purposes (network management, lawful interception, etc.) -- when it's | purposes (network management, lawful interception, etc.) -- when it's | |||
| critical that the sources whose traffic is being encapsulated are not | critical that the sources whose traffic is being encapsulated are not | |||
| aware of this happening. | aware of this happening. | |||
| When desiring to avoid fragmentation, IPv4 allows two options: copy | When desiring to avoid fragmentation, IPv4 requires one of two | |||
| the DF bit from the inner packets to the encapsulating header, or | alternatives [1]: copy the DF bit from the inner packets to the | |||
| always set the DF bit. The latter is better especially in controlled | encapsulating header, or always set the DF bit of the outer header. | |||
| environments, because it forces PMTUD to converge immediately. | The latter is better especially in controlled environments, because | |||
| | it forces PMTUD to converge immediately. | |||
| A related technique, which works with TCP under specific scenarios | A related technique, which works with TCP under specific scenarios | |||
| only, is the so-called "MSS clamping". With that technique, or rather a | only, is the so-called "MSS clamping". With that technique, or rather a | |||
| "hack", the TCP packets' Maximum Segment Size (MSS) is reduced by | "hack", the TCP packets' Maximum Segment Size (MSS) is reduced by | |||
| tunnel endpoints so that the TCP connection automatically restricts | tunnel endpoints so that the TCP connection automatically restricts | |||
| itself to the maximum available packet size. Obviously this does not | itself to the maximum available packet size. Obviously this does not | |||
| work for UDP or other protocols which have no MSS. This approach is | work for UDP or other protocols which have no MSS. This approach is | |||
| most applicable and used with PPPoE, but could be applied otherwise | most applicable and used with PPPoE, but could be applied otherwise | |||
| as well; the approach also assumes that all the traffic goes through | as well; the approach also assumes that all the traffic goes through | |||
| tunnel endpoints which do MSS clamping -- this is trivial for the | tunnel endpoints which do MSS clamping -- this is trivial for the | |||
| single-homed access links, but could be a challenge otherwise. | single-homed access links, but could be a challenge otherwise. | |||
| A new approach to PMTUD is in the works [12], but it is uncertain | A new approach to PMTUD is in the works [13], but it is uncertain | |||
| whether that would fix the problems -- at least not the passive | whether that would fix the problems -- at least not the passive | |||
| monitoring requirements. | monitoring requirements. | |||
| 3.3 Encapsulate Only When There is Free MTU | 3.3. Encapsulate Only When There is Free MTU | |||
| The third approach is an operational one, depending on the | The third approach is an operational one, depending on the | |||
| environment where encapsulation and decapsulation are being performed. | environment where encapsulation and decapsulation are being performed. | |||
| That is, if an ISP would deploy tunneling in its backbone, which | That is, if an ISP would deploy tunneling in its backbone, which | |||
| would consist only of links supporting high MTUs (e.g., Gigabit | would consist only of links supporting high MTUs (e.g., Gigabit | |||
| Ethernet or SDH/SONET), but all its customers and peers would have a | Ethernet or SDH/SONET), but all its customers and peers would have a | |||
| lower MTU (e.g., 1500, or the backbone MTU minus the encapsulation | lower MTU (e.g., 1500, or the backbone MTU minus the encapsulation | |||
| overhead), this would imply that no packets (with the encapsulation | overhead), this would imply that no packets (with the encapsulation | |||
| overhead added) would be larger than the "backbone MTU", and | overhead added) would be larger than the "backbone MTU", and | |||
| all the encapsulated packets would always fit MTU-wise in the | all the encapsulated packets would always fit MTU-wise in the | |||
| skipping to change at page 8, line 29 | skipping to change at page 8, line 32 | |||
| Another, related approach might be having the sources use only a low | Another, related approach might be having the sources use only a low | |||
| enough MTU which would fit in all the physical MTUs; for example, | enough MTU which would fit in all the physical MTUs; for example, | |||
| IPv6 specifies the minimum MTU of 1280 bytes. For example, if all | IPv6 specifies the minimum MTU of 1280 bytes. For example, if all | |||
| the sources whose traffic would be encapsulated would use this as the | the sources whose traffic would be encapsulated would use this as the | |||
| maximum packet size, there would probably always be enough free MTU | maximum packet size, there would probably always be enough free MTU | |||
| for encapsulation in the network. However, this is not the case | for encapsulation in the network. However, this is not the case | |||
| today, and it would be completely unrealistic to assume that this | today, and it would be completely unrealistic to assume that this | |||
| kind of approach could be made to work in general. | kind of approach could be made to work in general. | |||
| It is worth remembering that while the IPv6 minimum MTU is 1280 bytes | It is worth remembering that while the IPv6 minimum MTU is 1280 bytes | |||
| [9], there are scenarios where the tunnel implementation must | [10], there are scenarios where the tunnel implementation must | |||
| implement fragmentation and reassembly [2]: for example, when having | implement fragmentation and reassembly [3]: for example, when having | |||
| an IPv6-in-IPv6 tunnel on top of a physical interface with MTU of | an IPv6-in-IPv6 tunnel on top of a physical interface with MTU of | |||
| 1280 bytes, or when having two layers of IPv6 tunneling. This can | 1280 bytes, or when having two layers of IPv6 tunneling. This can | |||
| only be avoided by ensuring that links on top of which IPv6 is being | only be avoided by ensuring that links on top of which IPv6 is being | |||
| tunneled have an MTU somewhat larger (e.g., by 40 bytes) than 1280 bytes. | tunneled have an MTU somewhat larger (e.g., by 40 bytes) than 1280 bytes. | |||
| This conclusion can be generalized: because IP can be tunneled on top | This conclusion can be generalized: because IP can be tunneled on top | |||
| of IP, no single minimum or maximum MTU can be found such that | of IP, no single minimum or maximum MTU can be found such that | |||
| fragmentation or signalling to the sources would never be needed. | fragmentation or signalling to the sources would never be needed. | |||
| All in all, while in certain operational environments it might be | All in all, while in certain operational environments it might be | |||
| possible to avoid any problems by deployment choices, or limiting the | possible to avoid any problems by deployment choices, or limiting the | |||
| MTU that the sources use, this is probably not a sufficiently good | MTU that the sources use, this is probably not a sufficiently good | |||
| general solution for the equipment vendors, and other solutions must | general solution for the equipment vendors, and other solutions must | |||
| also be provided. | also be provided. | |||
| 3.4 Fragmentation of the Inner Packet | 3.4. Fragmentation of the Inner Packet | |||
| A final possibility is fragmenting the inner packet, before | A final possibility is fragmenting the inner packet, before | |||
| encapsulation, in such a manner that the encapsulated packet fits in | encapsulation, in such a manner that the encapsulated packet fits in | |||
| the tunnel's path MTU (discovered using PMTUD). However, one | the tunnel's path MTU (discovered using PMTUD). However, one | |||
| should note that only IPv4 supports this "in-flight" fragmentation; | should note that only IPv4 supports this "in-flight" fragmentation; | |||
| further, it isn't allowed for packets where the Don't Fragment (DF) bit has | further, it isn't allowed for packets where the Don't Fragment (DF) bit has | |||
| been set. Even if one could ignore IPv6 completely, so many IPv4 | been set. Even if one could ignore IPv6 completely, so many IPv4 | |||
| host stacks send packets with DF bit set that this would seem | host stacks send packets with DF bit set that this would seem | |||
| unfeasible. | unfeasible. | |||
| Regardless of what the specifications say, there are implementations | However, there are existing implementations that violate the standard | |||
| that: | that: | |||
| o Discard too big packets with DF bit not set instead of fragmenting | o Discard too big packets with DF bit not set instead of fragmenting | |||
| them (this is rare), | them (this is rare), | |||
| o Ignore the DF bit completely, for all or specified interfaces, or | o Ignore the DF bit completely, for all or specified interfaces, or | |||
| o Clear the DF bit before encapsulation, in the egress of configured | o Clear the DF bit before encapsulation, in the egress of configured | |||
| interfaces. This is typically done for all the traffic, not just | interfaces. This is typically done for all the traffic, not just | |||
| too big packets (allowing configuring this is common). | too big packets (allowing configuring this is common). | |||
| skipping to change at page 9, line 48 ¶ | skipping to change at page 10, line 4 ¶ | |||
| would not work with IPv6, it could not be considered a generic | would not work with IPv6, it could not be considered a generic | |||
| solution. | solution. | |||
| 4. Conclusions | 4. Conclusions | |||
| Fragmentation and reassembly by the tunnel endpoints is a clear and | Fragmentation and reassembly by the tunnel endpoints is a clear and | |||
| simple solution to the problem, but the hardware reassembly when the | simple solution to the problem, but the hardware reassembly when the | |||
| packets get lost may face significant implementation challenges which | packets get lost may face significant implementation challenges which | |||
| may be insurmountable. This approach does not seem feasible | may be insurmountable. This approach does not seem feasible | |||
| especially for IPv4 with high data rates due to problems with | especially for IPv4 with high data rates due to problems with | |||
| wrapping fragment identification field [11]. Constant wrapping may | wrapping fragment identification field [12]. Constant wrapping may | |||
| occur when the data rate is in the order of MB/s for IPv4 and in the | occur when the data rate is in the order of MB/s for IPv4 and in the | |||
| order of dozens of GB/s for IPv6. However, this reassembly approach | order of dozens of GB/s for IPv6. However, this reassembly approach | |||
| is probably not a problem for passive monitoring applications. | is probably not a problem for passive monitoring applications. | |||
| PMTUD techniques, at least at the moment and especially for IPv4, | PMTUD techniques, at least at the moment and especially for IPv4, | |||
| appear to be too unreliable or unscalable to be used in the | appear to be too unreliable or unscalable to be used in the | |||
| backbones. It is an open question whether a future solution might | backbones. It is an open question whether a future solution might | |||
| work better in this aspect. | work better in this aspect. | |||
| It is clear that in some environments the operational approach to the | It is clear that in some environments the operational approach to the | |||
| skipping to change at page 11, line 20 ¶ | skipping to change at page 11, line 25 ¶ | |||
| This document describes different issues with packet sizes and in- | This document describes different issues with packet sizes and in- | |||
| the-network tunneling; this does not have security considerations on | the-network tunneling; this does not have security considerations on | |||
| its own. | its own. | |||
| However, different solutions might have characteristics which may | However, different solutions might have characteristics which may | |||
| make them more susceptible to attacks -- for example, a router-based | make them more susceptible to attacks -- for example, a router-based | |||
| fragment reassembly could easily lead to (reassembly) buffer memory | fragment reassembly could easily lead to (reassembly) buffer memory | |||
| exhaustion if the attacker would send a sufficient number of | exhaustion if the attacker would send a sufficient number of | |||
| fragments without sending all of them, so that the reassembly would | fragments without sending all of them, so that the reassembly would | |||
| be stalled until a timeout; these and other fragment attacks (e.g., | be stalled until a timeout; these and other fragment attacks (e.g., | |||
| [14]) have already been used against e.g., firewalls and host stacks, | [15]) have already been used against e.g., firewalls and host stacks, | |||
| and need to be taken into consideration in the implementations. | and need to be taken into consideration in the implementations. | |||
| It is worth considering the cryptographic expense (which is typically | It is worth considering the cryptographic expense (which is typically | |||
| more significant than the reassembly, if done in software) with | more significant than the reassembly, if done in software) with | |||
| fragmentation of the inner or outer packet. If an outer fragment | fragmentation of the inner or outer packet. If an outer fragment | |||
| goes missing, no cryptographic operations have yet been performed; if | goes missing, no cryptographic operations have yet been performed; if | |||
| an inner fragment goes missing, cryptographic operations have already | an inner fragment goes missing, cryptographic operations have already | |||
| been performed. Therefore, which of these approaches is preferable | been performed. Therefore, which of these approaches is preferable | |||
| also depends on whether cryptography or reassembly are already | also depends on whether cryptography or reassembly are already | |||
| provided in hardware; for high-speed routers, at least, one should be | provided in hardware; for high-speed routers, at least, one should be | |||
| able to assume that if it is performing relatively heavy | able to assume that if it is performing relatively heavy | |||
| cryptography, hardware support is already required. | cryptography, hardware support is already required. | |||
| The solutions using PMTUD (and consequently ICMP) will also need to | The solutions using PMTUD (and consequently ICMP) will also need to | |||
| take into account the attacks using ICMP. In particular, an attacker | take into account the attacks using ICMP. In particular, an attacker | |||
| could send ICMP Packet Too Big messages indicating a very low MTU to | could send ICMP Packet Too Big messages indicating a very low MTU to | |||
| reduce the throughput and/or as a fragmentation/reassembly denial-of- | reduce the throughput and/or as a fragmentation/reassembly denial-of- | |||
| service attack. This attack has been described in the context of TCP | service attack. This attack has been described in the context of TCP | |||
| in [15]. | in [16]. | |||
| 7. Acknowledgements | 7. Acknowledgements | |||
| While the topic is far from new, recent discussions with W. Mark | While the topic is far from new, recent discussions with W. Mark | |||
| Townsley on L2TP fragmentation issues caused the author to sit down | Townsley on L2TP fragmentation issues caused the author to sit down | |||
| and write up the issues more generally. Michael Richardson and Mika | and write up the issues more generally. Michael Richardson and Mika | |||
| Joutsenvirta provided useful feedback on the first draft. When | Joutsenvirta provided useful feedback on the first draft. When | |||
| soliciting comments from the NANOG list, Carsten Bormann, Kevin Miller, | soliciting comments from the NANOG list, Carsten Bormann, Kevin Miller, | |||
| Warren Kumari, Iljitsch van Beijnum, Alok Dube, and Stephen J. Wilcox | Warren Kumari, Iljitsch van Beijnum, Alok Dube, and Stephen J. Wilcox | |||
| provided useful feedback. Later, Carlos Pignataro provided excellent | provided useful feedback. Later, Carlos Pignataro provided excellent | |||
| input, helping in improving the document. | input, helping in improving the document. Joe Touch also provided | |||
| input on the memo. | ||||
| 8. References | 8. References | |||
| 8.1 Normative References | ||||
| [1] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms for | 8.1. Normative References | |||
| IPv6 Hosts and Routers", draft-ietf-v6ops-mech-v2-07 (work in | ||||
| progress), March 2005. | ||||
| [2] Conta, A. and S. Deering, "Generic Packet Tunneling in IPv6 | [1] Perkins, C., "IP Encapsulation within IP", RFC 2003, | |||
| Specification", RFC 2473, December 1998. | October 1996. | |||
| [3] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina, | [2] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms for | |||
| "Generic Routing Encapsulation (GRE)", RFC 2784, March 2000. | IPv6 Hosts and Routers", draft-ietf-v6ops-mech-v2-07 (work in | |||
| progress), March 2005. | ||||
| [4] Lau, J., Townsley, M., and I. Goyret, "Layer Two Tunneling | [3] Conta, A. and S. Deering, "Generic Packet Tunneling in IPv6 | |||
| Protocol - Version 3 (L2TPv3)", RFC 3931, March 2005. | Specification", RFC 2473, December 1998. | |||
| [5] Kent, S. and R. Atkinson, "Security Architecture for the | [4] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina, | |||
| Internet Protocol", RFC 2401, November 1998. | "Generic Routing Encapsulation (GRE)", RFC 2784, March 2000. | |||
| [6] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, | [5] Lau, J., Townsley, M., and I. Goyret, "Layer Two Tunneling | |||
| November 1990. | Protocol - Version 3 (L2TPv3)", RFC 3931, March 2005. | |||
| [7] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery for | [6] Kent, S. and K. Seo, "Security Architecture for the Internet | |||
| IP version 6", RFC 1981, August 1996. | Protocol", draft-ietf-ipsec-rfc2401bis-06 (work in progress), | |||
| April 2005. | ||||
| [8] Braden, R., "Requirements for Internet Hosts - Communication | [7] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, | |||
| Layers", STD 3, RFC 1122, October 1989. | November 1990. | |||
| [9] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) | [8] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery for | |||
| Specification", RFC 2460, December 1998. | IP version 6", RFC 1981, August 1996. | |||
| 8.2 Informative References | [9] Braden, R., "Requirements for Internet Hosts - Communication | |||
| Layers", STD 3, RFC 1122, October 1989. | ||||
| [10] Mamakos, L., Lidl, K., Evarts, J., Carrel, D., Simone, D., and | [10] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) | |||
| Specification", RFC 2460, December 1998. | ||||
| 8.2. Informative References | ||||
| [11] Mamakos, L., Lidl, K., Evarts, J., Carrel, D., Simone, D., and | ||||
| R. Wheeler, "A Method for Transmitting PPP Over Ethernet | R. Wheeler, "A Method for Transmitting PPP Over Ethernet | |||
| (PPPoE)", RFC 2516, February 1999. | (PPPoE)", RFC 2516, February 1999. | |||
| [11] Mathis, M., "Fragmentation Considered Very Harmful", | [12] Mathis, M., "Fragmentation Considered Very Harmful", | |||
| draft-mathis-frag-harmful-00 (work in progress), July 2004. | draft-mathis-frag-harmful-00 (work in progress), July 2004. | |||
| [12] Mathis, M., "Path MTU Discovery", draft-ietf-pmtud-method-04 | [13] Mathis, M., "Path MTU Discovery", draft-ietf-pmtud-method-04 | |||
| (work in progress), February 2005. | (work in progress), February 2005. | |||
| [13] Medina, A., Allman, M., and S. Floyd, "Measuring the Evolution | [14] Medina, A., Allman, M., and S. Floyd, "Measuring the Evolution | |||
| of Transport Protocols in the Internet", Computer | of Transport Protocols in the Internet", Computer | |||
| Communications Review, Apr 2005, <http://www.icir.org/tbit/>. | Communications Review, Apr 2005, <http://www.icir.org/tbit/>. | |||
| [14] Miller, I., "Protection Against a Variant of the Tiny Fragment | [15] Miller, I., "Protection Against a Variant of the Tiny Fragment | |||
| Attack (RFC 1858)", RFC 3128, June 2001. | Attack (RFC 1858)", RFC 3128, June 2001. | |||
| [15] Gont, F., "ICMP attacks against TCP", | [16] Gont, F., "ICMP attacks against TCP", | |||
| draft-gont-tcpm-icmp-attacks-03 (work in progress), | draft-gont-tcpm-icmp-attacks-04 (work in progress), | |||
| December 2004. | September 2005. | |||
| Author's Address | ||||
| Pekka Savola | ||||
| CSC/FUNET | ||||
| Espoo | ||||
| Finland | ||||
| Email: psavola@funet.fi | ||||
| Appendix A. MTU of the Tunnel | Appendix A. MTU of the Tunnel | |||
| Different tunneling mechanisms may treat the tunnel links as having | Different tunneling mechanisms may treat the tunnel links as having | |||
| different kinds of MTU values. Some might use the same default MTU as | different kinds of MTU values. Some might use the same default MTU as | |||
| for other interfaces; some others might use the default MTU minus the | for other interfaces; some others might use the default MTU minus the | |||
| expected IP overhead (e.g., 20, 28, or 40 bytes); some others might | expected IP overhead (e.g., 20, 28, or 40 bytes); some others might | |||
| even treat the tunnel as having "infinite MTU", e.g., 64 kilobytes. | even treat the tunnel as having "infinite MTU", e.g., 64 kilobytes. | |||
| As [1] describes, having an infinite MTU, i.e., always fragmenting | As [2] describes, having an infinite MTU, i.e., always fragmenting | |||
| the outer packet (and never the inner packet) and never performing | the outer packet (and never the inner packet) and never performing | |||
| PMTUD for the tunnel path is a very bad idea, especially in host-to- | PMTUD for the tunnel path is a very bad idea, especially in host-to- | |||
| router scenarios. (It could be argued that if the nodes are sure | router scenarios. (It could be argued that if the nodes are sure | |||
| that this is a host-to-host tunnel, a larger MTU might make sense if | that this is a host-to-host tunnel, a larger MTU might make sense if | |||
| fragmentation and reassembly are more efficient than just sending | fragmentation and reassembly are more efficient than just sending | |||
| properly sized packets -- but this seems like a stretch.) | properly sized packets -- but this seems like a stretch.) | |||
| Intellectual Property Statement | Author's Address | |||
| Pekka Savola | ||||
| CSC/FUNET | ||||
| Espoo | ||||
| Finland | ||||
| Email: psavola@funet.fi | ||||
| Full Copyright Statement | ||||
| Copyright (C) The Internet Society (2005). | ||||
| This document is subject to the rights, licenses and restrictions | ||||
| contained in BCP 78, and except as set forth therein, the authors | ||||
| retain all their rights. | ||||
| This document and the information contained herein are provided on an | ||||
| "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | ||||
| OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET | ||||
| ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, | ||||
| INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE | ||||
| INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | ||||
| WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | ||||
| Intellectual Property | ||||
| The IETF takes no position regarding the validity or scope of any | The IETF takes no position regarding the validity or scope of any | |||
| Intellectual Property Rights or other rights that might be claimed to | Intellectual Property Rights or other rights that might be claimed to | |||
| pertain to the implementation or use of the technology described in | pertain to the implementation or use of the technology described in | |||
| this document or the extent to which any license under such rights | this document or the extent to which any license under such rights | |||
| might or might not be available; nor does it represent that it has | might or might not be available; nor does it represent that it has | |||
| made any independent effort to identify any such rights. Information | made any independent effort to identify any such rights. Information | |||
| on the procedures with respect to rights in RFC documents can be | on the procedures with respect to rights in RFC documents can be | |||
| found in BCP 78 and BCP 79. | found in BCP 78 and BCP 79. | |||
| skipping to change at page 14, line 29 ¶ | skipping to change at page 15, line 7 ¶ | |||
| such proprietary rights by implementers or users of this | such proprietary rights by implementers or users of this | |||
| specification can be obtained from the IETF on-line IPR repository at | specification can be obtained from the IETF on-line IPR repository at | |||
| http://www.ietf.org/ipr. | http://www.ietf.org/ipr. | |||
| The IETF invites any interested party to bring to its attention any | The IETF invites any interested party to bring to its attention any | |||
| copyrights, patents or patent applications, or other proprietary | copyrights, patents or patent applications, or other proprietary | |||
| rights that may cover technology that may be required to implement | rights that may cover technology that may be required to implement | |||
| this standard. Please address the information to the IETF at | this standard. Please address the information to the IETF at | |||
| ietf-ipr@ietf.org. | ietf-ipr@ietf.org. | |||
| Disclaimer of Validity | ||||
| This document and the information contained herein are provided on an | ||||
| "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | ||||
| OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET | ||||
| ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, | ||||
| INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE | ||||
| INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | ||||
| WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | ||||
| Copyright Statement | ||||
| Copyright (C) The Internet Society (2005). This document is subject | ||||
| to the rights, licenses and restrictions contained in BCP 78, and | ||||
| except as set forth therein, the authors retain all their rights. | ||||
| Acknowledgment | Acknowledgment | |||
| Funding for the RFC Editor function is currently provided by the | Funding for the RFC Editor function is currently provided by the | |||
| Internet Society. | Internet Society. | |||
| End of changes. 48 change blocks. | ||||
| 107 lines changed or deleted | 114 lines changed or added | |||