idnits 2.17.1

draft-generic-6man-tunfrag-09.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 15, 2013) is 3931 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'RFC4443' is defined on line 278, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200)

  == Outdated reference: A later version (-68) exists of
     draft-templin-intarea-seal-60

  -- Obsolete informational reference (is this intentional?): RFC 1981
     (Obsoleted by RFC 8201)

     Summary: 1 error (**), 0 flaws (~~), 3 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                    F. Templin, Ed.
Internet-Draft                              Boeing Research & Technology
Intended status: Informational                             July 15, 2013
Expires: January 16, 2014


                        Fragmentation Revisited
                   draft-generic-6man-tunfrag-09.txt

Abstract

   IP fragmentation has been subject to scrutiny since the publication
   of "Fragmentation Considered Harmful" in 1987.
   This work cast fragmentation in a negative light that has persisted
   to the present day.  However, the tone of the work failed to honor
   two principles of creative thinking: never say "always" and never say
   "never".  This document discusses uses for fragmentation that apply
   both today and in the future.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 16, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Problem Statement
   3.  IPv6 Hosts Sending Large Isolated Packets
   4.  IPv6 Tunnels
   5.  IANA Considerations
   6.  Security Considerations
   7.  Acknowledgments
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Author's Address

1.  Introduction

   IP fragmentation has been subject to scrutiny since the publication
   of "Fragmentation Considered Harmful" in 1987 [FRAG].  This work cast
   fragmentation in a negative light that has persisted to the present
   day.  However, the tone of the work failed to honor two principles of
   creative thinking: never say "always" and never say "never".  This
   document discusses uses for fragmentation that apply both today and
   in the future.

2.  Problem Statement

   The de facto "Internet cell size" is effectively 1500 bytes, i.e.,
   the minimum Maximum Transmission Unit (minMTU) configured by the vast
   majority of links in the Internet.  IPv6 constrains this even further
   by specifying a minMTU of 1280 bytes and a minimum Maximum Reassembly
   Unit (minMRU) of 1500 bytes [RFC2460].  IPv4 specifies both minMTU
   and minMRU as only 576 bytes [RFC0791][RFC1122], although it is
   widely assumed that the vast majority of nodes will configure an IPv4
   minMRU of at least 1500 bytes.

   The 1280 IPv6 minMTU originated from a November 14, 1997 message from
   Steve Deering to the IPng mailing list, which stated:

      "In the ipngwg meeting in Munich, I proposed increasing the IPv6
      minimum MTU from 576 bytes to something closer to the Ethernet MTU
      of 1500 bytes, (i.e., 1500 minus room for a couple layers of
      encapsulating headers, so that min-MTU-size packets that are
      tunneled across 1500-byte-MTU paths won't be subject to
      fragmentation/reassembly on ingress/egress from the tunnels, in
      most cases).

      ...

      The number I propose for the new minimum MTU is 1280 bytes (1024 +
      256, as compared to the classic 576 value which is 512 + 64).
      That would leave generous room for encapsulating/tunnel headers
      within the Ethernet MTU of 1500, e.g., enough for two layers of
      secure tunneling including both ESP and AUTH headers."

   However, there was a fundamental flaw in this reasoning.  In
   particular, to avoid fragmentation across several nested layers of
   encapsulation, the first tunnel (T1) would have to set a 1280 MTU so
   that its tunneled packets would emerge as 1320 bytes (1280 bytes plus
   40 bytes for the encapsulating IPv6 header).  Then, the next tunnel
   (T2) would have to set a 1320 MTU so that its tunneled packets would
   emerge as 1360 bytes.  Then the next tunnel (T3) would have to set a
   1360 MTU so that its tunneled packets would emerge as 1400 bytes, and
   so on until the available path MTU is exhausted.  The question is,
   how can those nested tunnels be coordinated so carefully that there
   would never be an MTU infraction?  In a single administrative domain
   where an operator can lay hands on every tunnel ingress this may be
   possible, but in the general case it cannot be expected that the
   nested tunnel MTUs would be so well orchestrated.
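
   The nested-tunnel arithmetic above can be reproduced with a short
   sketch.  It assumes, as the text does, plain IPv6-in-IPv6
   encapsulation that adds a 40-byte header per layer over a 1500-byte
   path; the function and constant names are illustrative only and do
   not come from any specification.

```python
IPV6_HEADER_LEN = 40   # bytes added by each IPv6-in-IPv6 encapsulation
IPV6_MIN_MTU = 1280    # MTU the innermost tunnel (T1) must offer
ETHERNET_MTU = 1500    # assumed path MTU under the outermost tunnel


def nested_tunnel_sizes(inner_mtu=IPV6_MIN_MTU,
                        path_mtu=ETHERNET_MTU,
                        hlen=IPV6_HEADER_LEN):
    """Return (tunnel MTU, on-wire size) for each nesting level whose
    encapsulated packets still fit within path_mtu."""
    levels = []
    mtu = inner_mtu
    while mtu + hlen <= path_mtu:
        levels.append((mtu, mtu + hlen))
        mtu += hlen  # the next tunnel must carry this tunnel's output
    return levels


for depth, (mtu, wire) in enumerate(nested_tunnel_sizes(), start=1):
    print(f"T{depth}: MTU {mtu} -> {wire} bytes on the wire")
```

   Under these assumptions only a handful of nesting levels fit before
   the path MTU is exhausted, and each level works only if its MTU is
   set to exactly the value the previous level emits.
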
   It is therefore necessary to consider as a limiting condition a
   tunnel that configures a 1280 MTU and crosses a link (perhaps another
   tunnel) that also configures a 1280 MTU.  In that case, the tunnel
   ingress has two choices: 1) perform fragmentation that the tunnel
   egress needs to reassemble, or 2) shut down the tunnel due to failure
   to meet the IPv6 minMTU requirement.

   In addition, it is becoming increasingly evident that Path MTU
   Discovery (PMTUD) [RFC1981] does not work properly in all cases.
   This is because the Packet Too Big (PTB) messages required for PMTUD
   can be lost due to network filters that block ICMPv6 messages
   [RFC2923][WAND][SIGCOMM][RIPE].  It is therefore necessary to
   consider the case where IPv6 packets are dropped silently in the
   network due to a size restriction, but the IPv6 source host never
   receives the necessary indication from the network that the packet
   was lost.  The source host must therefore support some form of IP
   fragmentation to ensure that isolated large packets are delivered, as
   well as a packet size probing capability (see [RFC4821]) to ensure
   that large packets that are part of a coordinated stream are making
   it through to the destination.

   Due to these considerations, there are at least two use cases for
   network layer fragmentation that must be satisfied now and for the
   long term.  In the following sections, we discuss these
   considerations in more detail.

3.  IPv6 Hosts Sending Large Isolated Packets

   IPv6 hosts that send large isolated packets have no way of ensuring
   that the packets are delivered to the final destination if their size
   exceeds the path MTU.  The host must therefore perform network layer
   fragmentation to a fragment size of no larger than 1280 bytes to
   ensure that the fragmented packets are delivered to the destination
   without loss due to a size restriction.
   However, the destination node need only configure a minMRU size of
   1500 bytes per the IPv6 specification.  Therefore, the source must
   either limit its packet sizes to 1500 bytes (i.e., before
   fragmentation) or somehow have a way of determining that the
   destination configures a larger MRU.  Two uses for this host-based
   fragmentation to support large isolated packets are OSPFv3 and DNS.

4.  IPv6 Tunnels

   IPv6 tunnels are used for many purposes, including transition,
   security, mobility, routing control, etc.  While it is assumed that
   transition mechanisms will eventually give way to native IPv6, it is
   clear that the use of tunnels for other purposes will continue and
   even expand.  A long-term strategy for dealing with tunnel MTUs is
   therefore required.

   Tunnels may cross links (perhaps even other tunnels) that configure
   only the IPv6 minMTU of 1280 bytes, while the tunnel ingress must be
   able to send packets that are at least 1280 bytes in length so that
   the IPv6 minMTU is extended to the source.  However, these tunneled
   packets become (1280 + HLEN) bytes on the wire (where HLEN is the
   length of the encapsulating headers), meaning that they would be
   vulnerable to loss at a link within the tunnel that configures a
   smaller MTU.  Therefore, the only way to satisfy the IPv6 minMTU is
   through network layer fragmentation and reassembly between the tunnel
   ingress and egress, where the ingress fragments its tunneled packets
   that are larger than (1280 - HLEN) bytes.

   Unfortunately, fragmentation and reassembly are a pain point for
   in-the-network routers, especially for those that are nearer the core
   of the network.  It is therefore highly desirable for the tunnel
   ingress to discover whether this fragmentation and reassembly can be
   avoided.
   This can only be done by allowing the ingress to probe the path to
   the egress by sending whole 1500 byte probe packets to discover
   whether the probes can be delivered to the egress without
   fragmentation.  These 1500 byte probes appear as (1500 + HLEN) bytes
   on the wire; therefore, the path must support an MTU of at least this
   size in order for the probe to succeed.

   The tunnel fragmentation and reassembly strategy is therefore as
   follows:

   1.  When the tunnel ingress receives a packet that is no larger than
       (1280 - HLEN) bytes, it encapsulates the packet and sends it to
       the egress without fragmentation.  The egress will receive the
       packet since it is small enough to fit within the IPv6 minMTU of
       1280 bytes.

   2.  When the tunnel ingress receives a packet that is larger than
       1500 bytes, it encapsulates the packet and sends it to the egress
       without fragmentation.  If the packet is lost in the network due
       to a size restriction, the ingress may or may not receive a PTB
       message which it can then forward to the original source.
       Whether or not a PTB message is received, however, it is the
       responsibility of the original source to ensure that its packets
       larger than 1500 bytes are making it to the final destination by
       using a path probing technique such as specified by [RFC4821].

   3.  When the tunnel ingress receives a packet larger than (1280 -
       HLEN) but no larger than 1500 bytes, and it is not yet known
       whether packets of this size can reach the egress without
       fragmentation, the ingress encapsulates the packet and uses
       network layer fragmentation to fragment it into two pieces that
       are each significantly smaller than (1280 - HLEN) bytes.  At the
       same time, the tunnel ingress sends an unfragmented 1500 byte
       probe packet toward the egress (subject to rate limiting) which
       will appear as (1500 + HLEN) bytes on the wire.
       If the egress receives the probe, it informs the ingress that the
       probe succeeded.  If the probe succeeds, the ingress can suspend
       the fragmentation process and send packets between (1280 - HLEN)
       and 1500 bytes without using fragmentation.  This probing process
       exactly parallels [RFC4821].

   In this method, the tunnel egress must configure a slightly larger
   MRU than the minMRU specified for IPv6 in order to accommodate the
   HLEN bytes of tunnel encapsulation during reassembly.  2KB is
   recommended as the minMRU for this reason.

   These procedures allow the tunnel ingress to configure an unlimited
   MTU (the theoretical limit is 64KB for IPv4 and 4GB for IPv6).  They
   will therefore naturally lead to the Internet migrating to larger
   packet sizes with no dependence on traditional path MTU discovery.
   Operators will also soon discover that configuring larger MTUs on
   links between routers (e.g., 2KB or larger) will dampen the
   fragmentation and reassembly requirements until fragmentation and
   reassembly usage is gradually tuned out of the network.

   These procedures are not supported by the existing IPv6 fragmentation
   method; however, they are exactly those specified in the Subnetwork
   Encapsulation and Adaptation Layer (SEAL) [I-D.templin-intarea-seal].
   Widespread adoption of SEAL will therefore naturally lead to an
   Internet which no longer places MTU restrictions on tunnels and
   therefore supports natural migration to unbounded packet sizes.  The
   approach can best be summarized as: "take care of the smalls, and let
   the bigs take care of themselves".

5.  IANA Considerations

   There are no IANA considerations for this document.

6.  Security Considerations

   The security considerations for [RFC2460] apply also to this
   document.

7.  Acknowledgments

   This method was inspired through discussion on various IETF mailing
   lists in the 2012-2013 timeframe.
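
   As an illustration of the three-case tunnel ingress strategy in
   Section 4, the size-based decision can be sketched as follows.  This
   is a minimal sketch and not the SEAL procedure itself: the names
   ingress_action, Action and probe_succeeded are hypothetical, HLEN is
   passed in by the caller, and probe rate limiting and the actual
   fragmentation are left out.

```python
from enum import Enum

IPV6_MIN_MTU = 1280   # IPv6 minMTU in bytes
PROBE_SIZE = 1500     # size of the unfragmented probe, before encapsulation


class Action(Enum):
    SEND_WHOLE = "encapsulate and send without fragmentation"
    FRAGMENT_AND_PROBE = "encapsulate, fragment in two, and send a probe"


def ingress_action(packet_len, hlen, probe_succeeded=False):
    """Pick the tunnel ingress behavior for a packet of packet_len bytes
    (measured before encapsulation), given hlen bytes of encapsulation."""
    if packet_len <= IPV6_MIN_MTU - hlen:
        # Case 1: fits within the IPv6 minMTU even after encapsulation.
        return Action.SEND_WHOLE
    if packet_len > PROBE_SIZE:
        # Case 2: delivery is the original source's responsibility.
        return Action.SEND_WHOLE
    if probe_succeeded:
        # Case 3 after a successful probe: fragmentation is suspended.
        return Action.SEND_WHOLE
    # Case 3 while the path is still unverified.
    return Action.FRAGMENT_AND_PROBE
```

   With a 40-byte encapsulation header, for example, a 1240-byte packet
   is sent whole, while a 1241-byte packet is fragmented until a probe
   has succeeded.
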

8.  References

8.1.  Normative References

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

   [RFC4443]  Conta, A., Deering, S., and M. Gupta, "Internet Control
              Message Protocol (ICMPv6) for the Internet Protocol
              Version 6 (IPv6) Specification", RFC 4443, March 2006.

8.2.  Informative References

   [FRAG]     Kent, C. and J. Mogul, "Fragmentation Considered Harmful",
              October 1987.

   [I-D.templin-intarea-seal]
              Templin, F., "The Subnetwork Encapsulation and Adaptation
              Layer (SEAL)", draft-templin-intarea-seal-60 (work in
              progress), July 2013.

   [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
              for IP version 6", RFC 1981, August 1996.

   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
              RFC 2923, September 2000.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", RFC 4821, March 2007.

   [RIPE]     De Boer, M. and J. Bosma, "Discovering Path MTU Black
              Holes on the Internet using RIPE Atlas", July 2012.

   [SIGCOMM]  Luckie, M. and B. Stasiewicz, "Measuring Path MTU
              Discovery Behavior", November 2010.

   [WAND]     Luckie, M., Cho, K., and B. Owens, "Inferring and
              Debugging Path MTU Discovery Failures", October 2005.

Author's Address

   Fred L. Templin (editor)
   Boeing Research & Technology
   P.O. Box 3707
   Seattle, WA 98124
   USA

   Email: fltemplin@acm.org