Network Working Group                                         J. Heffner
Internet-Draft                                                 M. Mathis
Expires: June 4, 2007                                        B. Chandler
                                                                     PSC
                                                        December 1, 2006


               IPv4 Fragmentation Considered Very Harmful
                     draft-heffner-frag-harmful-03

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on June 4, 2007.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   IPv4 fragmentation is not sufficiently robust for general use in
   today's Internet.  The 16-bit IP identification field is not large
   enough to prevent frequent incorrectly assembled IP fragments, and
   the TCP and UDP checksums are insufficient to prevent the resulting
   corrupted datagrams from being delivered to higher protocol layers.
   This note describes some easily reproduced experiments demonstrating
   the problem, and discusses some of the operational implications of
   these observations.

1.  Introduction

   The IPv4 header was designed at a time when data rates were several
   orders of magnitude lower than those achievable today.  This document
   describes a consequent scale-related failure in the IP identification
   (ID) field, where fragments may be incorrectly assembled at a rate
   likely high enough to invalidate assumptions about data integrity
   failure rates.

   That IP fragmentation results in inefficient use of the network has
   been well documented [Kent87].  This note presents a different kind
   of problem, which can result not only in significant performance
   degradation, but also in frequent data corruption.  This is
   especially pertinent due to the recent proliferation of UDP bulk
   transport tools that sometimes fragment every datagram.

   Additionally, there is some network equipment that ignores the Don't
   Fragment (DF) bit in the IP header to work around MTU discovery
   problems [RFC2923].
   This equipment indirectly exposes properly
   implemented protocols and applications to corrupt data.

2.  Wrapping the IP ID Field

   The Internet Protocol standard specifies:

      "The choice of the Identifier for a datagram is based on the need
      to provide a way to uniquely identify the fragments of a
      particular datagram.  The protocol module assembling fragments
      judges fragments to belong to the same datagram if they have the
      same source, destination, protocol, and Identifier.  Thus, the
      sender must choose the Identifier to be unique for this source,
      destination pair and protocol for the time the datagram (or any
      fragment of it) could be alive in the Internet."  [RFC0791]

   Strict conformance to this standard limits transmissions in one
   direction between any address pair to no more than 65536 packets per
   protocol (e.g., TCP, UDP, or ICMP) per maximum packet lifetime.

   Clearly not all hosts follow this standard, because it implies an
   unreasonably low maximum data rate.  For example, a host sending
   1500-byte packets with a 30 second maximum packet lifetime could send
   at only about 26 Mbit/s before exceeding 65535 packets per packet
   lifetime.  Conversely, filling a 1 Gbit/s interface with 1500-byte
   packets requires sending 65536 packets in less than 1 second, which
   would be an unreasonably short maximum packet lifetime, since it is
   less than the round-trip time on some paths.  This requirement is
   widely ignored.

   IP receivers store fragments in a reassembly buffer until all
   fragments in a datagram arrive, or until the reassembly timeout
   expires (15 seconds is suggested in [RFC0791]).  Fragments in a
   datagram are associated with each other by their protocol number, the
   value in their ID field, and by the source, destination address pair.
   If a sender wraps the ID field in less than the reassembly timeout,
   it becomes possible for fragments from different datagrams to be
   incorrectly spliced together ("mis-associated"), and delivered to the
   upper layer protocol.

   A case of particular concern is when mis-association is
   self-propagating.  This occurs, for example, when there is reliable
   ordering of packets and the first fragment of a datagram is lost in
   the network.  The rest of the fragments are stored in the fragment
   reassembly buffer, and when the sender wraps the ID field, the first
   fragment of the new datagram will be mis-associated with the rest of
   the old datagram.  The new datagram will now be incomplete (since it
   is missing its first fragment), so the rest of it will be saved in
   the fragment reassembly buffer, forming a cycle that repeats every
   65536 datagrams.  It is possible to have a number of simultaneous
   cycles, bounded by the size of the fragment reassembly buffer.

3.  Harmful Effects of Mis-Associated Fragments

   When the mis-associated fragments are delivered, transport-layer
   checksumming should detect these datagrams as incorrect and discard
   them.  When the datagrams are discarded, it could pose a problem for
   loss-feedback congestion control algorithms since there will be a
   high number of non-congestion-related losses.

   However, transport checksums may not be designed to handle such high
   error rates, either.  The TCP/UDP checksum is only 16 bits in length.
   If these checksums follow a uniform random distribution, we expect
   mis-associated datagrams to be accepted by the checksum at a rate of
   one per 65536.  With only one mis-association cycle, one datagram in
   every 65536 is mis-associated, so we expect corrupt data to be
   delivered to the application layer once per 2^32 datagrams.  This
   number can be significantly higher with multiple cycles.

   With non-random data, the TCP/UDP checksum may be weaker still.
   It is possible to construct datasets where mis-associated fragments
   will always have the same checksum.  Such a case may be considered
   unlikely, but is worth considering.  "Real" data may be more likely
   than random data to cause checksum hot spots and increase the
   probability of false checksum match [Stone98].  Also, some
   applications or higher-level protocols may turn off checksumming to
   increase speed, though this practice has been found to be dangerous
   for other reasons when data reliability is important [Stone00].

4.  Experimental Observations

   To test the practical impact of fragmentation on UDP, we ran a series
   of experiments using a UDP bulk data transport protocol that was
   designed to be used as an alternative to TCP for transporting large
   data sets over specialized networks.  The tool, Reliable Blast UDP
   (RBUDP), part of the QUANTA networking toolkit [QUANTA], was selected
   because it has a clean interface which facilitated automated
   experiments.  The decision to use RBUDP had little to do with the
   details of the transport protocol itself.  Any UDP transport protocol
   that does not have additional means to detect corruption, and that
   could be configured to use IP fragmentation, would have the same
   results.

   In order to diagnose corruption on files transferred with the UDP
   bulk transfer tool, we used a file format that included embedded
   sequence numbers and MD5 checksums in each fragment of each datagram.
   Thus it was possible to distinguish random corruption from that
   caused by mis-associated fragments.  We used two different types of
   files.  One was constructed so that all the UDP checksums were
   constant -- we will call this the "constant" dataset.  The other was
   constructed so that UDP checksums were uniformly random -- the
   "random" dataset.  All tests were done using 400 MB files.
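
   As an illustration of why a dataset with constant checksums defeats
   the transport checksum, the following minimal Python sketch computes
   the 16-bit one's complement checksum over payload bytes only and
   then splices the tail of an old payload onto the head of a new one.
   Ignoring the pseudo-header and UDP header is a simplifying
   assumption, and the function names and payload contents are
   hypothetical; they are not the file format used in the experiments.

```python
def ones_complement_sum(data: bytes) -> int:
    """Return the 16-bit one's complement sum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def checksum(data: bytes) -> int:
    """Return the complement of the one's complement sum, as 16 bits."""
    return ~ones_complement_sum(data) & 0xFFFF


# Hypothetical payloads, each split into a first fragment and the rest.
old_first, old_rest = b"OLD-HEAD", b"\x00\x01\x00\x02" * 4
new_first, new_rest = b"NEW-HEAD", b"\x00\x02\x00\x01" * 4

# The two "rest" fragments differ but sum to the same value, because the
# one's complement sum does not depend on the order of the 16-bit words.
assert old_rest != new_rest
assert ones_complement_sum(old_rest) == ones_complement_sum(new_rest)

# Mis-association: first fragment of the new datagram, rest of the old.
intended = new_first + new_rest
spliced = new_first + old_rest

print(spliced != intended)                      # True: the data differs
print(checksum(spliced) == checksum(intended))  # True: checksum matches
```

   In this sketch the receiver would accept the spliced payload, since
   the checksum carried in the new datagram's first fragment still
   matches the mis-assembled data.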

   The UDP bulk file transport tool was used to send the datasets
   between a pair of hosts at slightly less than the available data rate
   (100 Mbit/s).  Near the beginning of each flow, a brief secondary
   flow was started to induce packet loss in the primary flow.
   Throughout the life of the primary flow, we typically observed
   mis-association rates on the order of a few hundredths of a percent.

   Tests run with the "constant" dataset resulted in corruption on all
   mis-associated fragments, that is, corruption on the order of a few
   hundredths of a percent.  In sending approximately 10 TB of "random"
   datasets, we observed 8847668 UDP checksum errors and 121 corruptions
   of the data due to mis-associated fragments.

5.  Implications

   Most TCP implementations today participate in MTU discovery
   [RFC1191], which will avoid the problems described in this note by
   avoiding IP fragmentation altogether.  However, as a work-around for
   MTU discovery problems [RFC2923], some TCP implementations and
   communications gear provide mechanisms to disable path MTU discovery
   by clearing or ignoring the DF bit.  Doing so will expose all
   protocols using IPv4, even those that participate in MTU discovery,
   to mis-association errors.

   A case particularly worth noting is that of tunnels encapsulating
   payload in IPv4.  To deal with difficulties in MTU Discovery
   [RFC4459], tunnels may rely on fragmentation between the two
   endpoints, even if the payload is marked with a DF bit [RFC4301].  In
   such a mode, the two tunnel endpoints behave as IP end hosts, with
   all tunneled traffic having the same protocol type.  Thus, the
   aggregate rate of tunneled packets may not exceed 65536 per maximum
   packet lifetime, or tunneled data becomes exposed to possible
   mis-association.  Even protocols doing MTU discovery such as TCP will
   be affected.
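
   The limit of 65536 packets per maximum packet lifetime translates
   directly into a data rate.  The short Python sketch below reproduces
   the arithmetic of the examples in Section 2, assuming that one IP ID
   is consumed per packet for a single source, destination, and
   protocol.  The function names are illustrative only.

```python
ID_SPACE = 2 ** 16  # number of distinct values of the 16-bit IP ID field


def max_wrap_free_rate_bps(packet_bytes: int, lifetime_s: float) -> float:
    """Highest data rate (bit/s) that sends at most ID_SPACE packets
    within one maximum packet lifetime."""
    return ID_SPACE * packet_bytes * 8 / lifetime_s


def wrap_time_s(rate_bps: float, packet_bytes: int) -> float:
    """Time (seconds) needed to send ID_SPACE packets at the given rate."""
    return ID_SPACE * packet_bytes * 8 / rate_bps


# 1500-byte packets with a 30 second maximum packet lifetime.
print(f"{max_wrap_free_rate_bps(1500, 30) / 1e6:.1f} Mbit/s")  # 26.2 Mbit/s

# 1500-byte packets filling a 1 Gbit/s interface.
print(f"{wrap_time_s(1e9, 1500):.2f} s")                       # 0.79 s

# 1500-byte packets at 100 Mbit/s.
print(f"{wrap_time_s(100e6, 1500):.2f} s")                     # 7.86 s
```

   Under these assumptions, a sender of 1500-byte packets at 100 Mbit/s
   wraps the ID field in roughly half of the 15 second reassembly
   timeout suggested in [RFC0791].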

   IPv6 is less vulnerable to this type of problem, since its fragment
   header contains a 32-bit identification field [RFC2460].
   Mis-association will only be a problem at packet rates 65536 times
   higher than for IPv4.

   Since mis-association of fragments will only occur when the IP ID
   field is wrapped within the fragment reassembly timeout, it may be
   possible to reduce the timeout sufficiently so that mis-association
   will not occur.  However, there are a number of difficulties with
   such an approach.  Since the sender controls the rate of packets sent
   and selection of IP ID, while the receiver controls the reassembly
   timeout, there would need to be some mutual assurance between each
   party as to participation in the scheme.  Further, it is not
   generally possible to set the timeout low enough so that a fast
   sender's fragments will not be mis-associated, yet high enough so
   that a slow sender's fragments will not be unconditionally discarded
   before it is possible to reassemble them.  So the timeout and IP ID
   selection would need to be done on a per-peer basis.  Also, it is
   likely that NAT will break any per-peer tables keyed by IP address.
   It is not within the scope of this document to recommend solutions to
   these problems.

   Another means of solving the corruption issue is to add stronger
   integrity checking, which can be done at any layer above IP.  This is
   a natural side effect of using cryptographic authentication.  If
   IPsec AH [RFC2402] is in use, the mis-associated fragments will be
   discarded at the network layer with extremely high probability.  Some
   higher layers may use longer checksums (for example, SCTP's is 32
   bits in length [RFC2960]) or cryptographic authentication (SSH
   message authentication codes [RFC4251]).  While stronger integrity
   checking may prevent data corruption, it will not solve the problem
   of a high effective loss rate.
   In the case of SSH, any stream
   corruption results in immediate termination of the connection.

6.  Security Considerations

   If a malicious entity knows that a pair of hosts are communicating
   using a fragmented stream, that knowledge may give the entity an
   opportunity to corrupt the flow.  By sending "high" fragments (those
   with offset greater than zero) with a forged source address, the
   attacker can deliberately cause corruption as described above.
   Exploiting this vulnerability requires only knowledge of the source
   and destination addresses of the flow, its protocol number, and
   fragment boundaries.  It does not require knowledge of port or
   sequence numbers.

   If the attacker has visibility of packets on the path, the attack
   profile is similar to injecting full segments.  Using this attack
   makes blind disruptions easier, and could likely be used to cause
   denial of service.  However, only streams using IPv4 fragmentation
   are vulnerable.  Because of the nature of the problems outlined in
   this draft, the use of IPv4 fragmentation for critical applications
   may not be advisable regardless of security concerns.

7.  IANA Considerations

   None.

8.  Informative References

   [Kent87]   Kent, C. and J. Mogul, "Fragmentation considered harmful",
              Proc. SIGCOMM '87 vol. 17, No. 5, October 1987.

   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
              RFC 2923, September 2000.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              November 1990.

   [Stone98]  Stone, J., Greenwald, M., Partridge, C., and J. Hughes,
              "Performance of Checksums and CRC's over Real Data", IEEE/
              ACM Transactions on Networking vol. 6, No. 5,
              October 1998.

   [Stone00]  Stone, J. and C. Partridge, "When The CRC and TCP Checksum
              Disagree", Proc. SIGCOMM 2000 vol. 30, No. 4,
              October 2000.

   [QUANTA]   He, E., Alimohideen, J., Eliason, J., Krishnaprasad, N.,
              Leigh, J., Yu, O., and T. DeFanti, "Quanta: a toolkit for
              high performance data delivery over photonic networks",
              Future Generation Computer Systems Vol. 19, No. 6,
              August 2003.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

   [RFC2960]  Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
              Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M.,
              Zhang, L., and V. Paxson, "Stream Control Transmission
              Protocol", RFC 2960, October 2000.

   [RFC2402]  Kent, S. and R. Atkinson, "IP Authentication Header",
              RFC 2402, November 1998.

   [RFC4251]  Ylonen, T. and C. Lonvick, "The Secure Shell (SSH)
              Protocol Architecture", RFC 4251, January 2006.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4459]  Savola, P., "MTU and Fragmentation Issues with In-the-
              Network Tunneling", RFC 4459, April 2006.

Appendix A.  Acknowledgements

   This work was supported by the National Science Foundation under
   Grant No. 0083285.

Authors' Addresses

   John W. Heffner
   Pittsburgh Supercomputing Center
   4400 Fifth Avenue
   Pittsburgh, PA  15213
   US

   Phone: 412-268-2329
   Email: jheffner@psc.edu


   Matt Mathis
   Pittsburgh Supercomputing Center
   4400 Fifth Avenue
   Pittsburgh, PA  15213
   US

   Phone: 412-268-3319
   Email: mathis@psc.edu


   Ben Chandler
   Pittsburgh Supercomputing Center
   4400 Fifth Avenue
   Pittsburgh, PA  15213
   US

   Phone: 412-268-9783
   Email: bchandle@psc.edu

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2006).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.