Network Working Group                                         J. Heffner
Internet-Draft                                                 M. Mathis
Expires: July 29, 2007                                       B. Chandler
                                                                     PSC
                                                        January 25, 2007


               IPv4 Reassembly Errors at High Data Rates
                     draft-heffner-frag-harmful-04

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 29, 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   IPv4 fragmentation is not sufficiently robust for use under some
   conditions in today's Internet. At high data rates, the 16-bit IP
   identification field is not large enough to prevent frequent
   incorrect assembly of IP fragments, and the TCP and UDP checksums
   are insufficient to prevent the resulting corrupted datagrams from
   being delivered to higher protocol layers. This note describes some
   easily reproduced experiments demonstrating the problem, and
   discusses some of the operational implications of these
   observations.

1. Introduction

   The IPv4 header was designed at a time when data rates were several
   orders of magnitude lower than those achievable today. This document
   describes a consequent scale-related failure in the IP identification
   (ID) field, where fragments may be incorrectly assembled at a rate
   high enough that it is likely to invalidate assumptions about data
   integrity failure rates.

   That IP fragmentation results in inefficient use of the network has
   been well documented [Kent87]. This note presents a different kind
   of problem, which can result not only in significant performance
   degradation but also in frequent data corruption. This is especially
   pertinent given the recent proliferation of UDP bulk transport tools
   that sometimes fragment every datagram.

   Additionally, some network equipment ignores the Don't Fragment (DF)
   bit in the IP header to work around MTU discovery problems
   [RFC2923]. Such equipment indirectly exposes properly implemented
   protocols and applications to corrupt data.

2. Wrapping the IP ID Field

   The Internet Protocol standard specifies:

      "The choice of the Identifier for a datagram is based on the need
      to provide a way to uniquely identify the fragments of a
      particular datagram.
      The protocol module assembling fragments judges fragments to
      belong to the same datagram if they have the same source,
      destination, protocol, and Identifier. Thus, the sender must
      choose the Identifier to be unique for this source, destination
      pair and protocol for the time the datagram (or any fragment of
      it) could be alive in the Internet." [RFC0791]

   Strict conformance to this standard limits transmissions in one
   direction between any address pair to no more than 65536 packets per
   protocol (e.g., TCP, UDP, or ICMP) per maximum packet lifetime.

   Clearly not all hosts follow this standard, because it implies an
   unreasonably low maximum data rate. For example, a host sending
   1500-byte packets with a 30-second maximum packet lifetime could send
   at only about 26 Mbit/s before exceeding 65536 packets per packet
   lifetime. Conversely, filling a 1 Gbit/s interface with 1500-byte
   packets means sending 65536 packets in less than 1 second, so
   conformance would require a maximum packet lifetime of under a
   second. Such a lifetime is unreasonably short, being less than the
   round-trip time on some paths, and in practice this requirement is
   widely ignored.

   Additionally, reusing values in the IP ID field once per 65536
   datagrams is the best case. Some implementations randomize the IP ID
   to prevent leaking information out of the kernel [Bellovin02], which
   causes reuse of IP ID values to occur probabilistically at all
   sending rates.

   IP receivers store fragments in a reassembly buffer until all
   fragments in a datagram arrive, or until the reassembly timeout
   expires (15 seconds is suggested in [RFC0791]). Fragments of a
   datagram are associated with each other by their protocol number,
   the value in their ID field, and their source and destination
   addresses.
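   The figures in this section can be checked with a few lines of
   arithmetic. The following Python sketch is illustrative only. The
   function and constant names are hypothetical, and it assumes that
   every packet is the same size and that each one consumes a fresh ID
   from a single counter shared by one source, destination, and
   protocol.

```python
"""Illustrative IP ID wrap arithmetic; a sketch, not a specification."""

IPV4_ID_SPACE = 2 ** 16  # 16-bit Identification field in the IPv4 header
IPV6_ID_SPACE = 2 ** 32  # 32-bit Identification in the IPv6 fragment header


def max_conforming_rate_bps(packet_bytes: int, lifetime_s: float,
                            id_space: int = IPV4_ID_SPACE) -> float:
    """Highest bit rate at which no ID is reused within one lifetime.

    Assumes every packet is packet_bytes long and uses a fresh ID for
    the same source, destination, and protocol.
    """
    return id_space * packet_bytes * 8 / lifetime_s


def id_wrap_time_s(rate_bps: float, packet_bytes: int,
                   id_space: int = IPV4_ID_SPACE) -> float:
    """Seconds needed to send id_space packets, after which IDs repeat."""
    packets_per_second = rate_bps / (packet_bytes * 8)
    return id_space / packets_per_second


if __name__ == "__main__":
    # 1500-byte packets with a 30-second maximum packet lifetime.
    print(f"{max_conforming_rate_bps(1500, 30) / 1e6:.1f} Mbit/s")
    # A 1 Gbit/s link filled with 1500-byte packets.
    print(f"IPv4 wrap at 1 Gbit/s: {id_wrap_time_s(1e9, 1500):.2f} s")
    # Above this rate the ID wraps within the 15 s reassembly timeout.
    print(f"{max_conforming_rate_bps(1500, 15) / 1e6:.1f} Mbit/s")
    # The same 1 Gbit/s sender with IPv6's 32-bit identification field.
    print(f"IPv6 wrap at 1 Gbit/s: "
          f"{id_wrap_time_s(1e9, 1500, IPV6_ID_SPACE):.0f} s")
```

   Under these assumptions, a 1 Gbit/s sender reuses an ID after about
   0.79 seconds. That is well inside the suggested 15-second reassembly
   timeout. A stream of 1500-byte packets needs only about 52 Mbit/s to
   wrap the ID within that timeout.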

   If a sender wraps the ID field in less than the reassembly timeout,
   fragments from different datagrams can be incorrectly spliced
   together ("mis-associated") and delivered to the upper-layer
   protocol.

   A case of particular concern is when mis-association is
   self-propagating. This occurs, for example, when the network does not
   reorder packets and the first fragment of a datagram is lost. The
   rest of the fragments are stored in the fragment reassembly buffer,
   and when the sender wraps the ID field, the first fragment of the new
   datagram will be mis-associated with the rest of the old datagram.
   The new datagram will now be incomplete (since it is missing its
   first fragment), so the rest of it will be saved in the fragment
   reassembly buffer, forming a cycle that repeats every 65536
   datagrams. It is possible to have a number of simultaneous cycles,
   bounded by the size of the fragment reassembly buffer.

3. Harmful Effects of Mis-Associated Fragments

   When mis-associated fragments are delivered, transport-layer
   checksumming should detect the resulting datagrams as incorrect and
   discard them. Discarding them may itself pose a problem for
   loss-feedback congestion control algorithms, since there will be a
   high number of losses unrelated to congestion.

   Moreover, transport checksums may not be designed to handle such high
   error rates. The TCP/UDP checksum is only 16 bits long. If these
   checksums follow a uniform random distribution, we expect
   mis-associated datagrams to be accepted by the checksum at a rate of
   one per 65536. With only one mis-association cycle, which produces
   one mis-associated datagram per 65536 datagrams, we expect corrupt
   data to be delivered to the application layer once per 2^32
   datagrams. This rate can be significantly higher with multiple
   cycles, and with non-random data the TCP/UDP checksum may be weaker
   still.
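   Part of the reason lies in the construction of the checksum. The
   TCP/UDP checksum is the ones' complement of the ones' complement sum
   of the data taken as 16-bit words. Any two trailing fragments whose
   words have the same ones' complement sum therefore yield the same
   checksum. The same words in a different order are one example. The
   Python sketch below illustrates this under some simplifications. It
   omits the pseudo-header, which is identical for both datagrams when
   they share addresses, protocol, and length. The byte values are
   invented for illustration. IP fragment offsets are multiples of 8
   octets, so a trailing fragment always begins on a 16-bit word
   boundary.

```python
"""Illustrative sketch: the Internet checksum cannot tell some tails apart."""


def ones_complement_sum16(data: bytes) -> int:
    """Ones' complement sum of big-endian 16-bit words (odd length zero-padded)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the end-around carry
    return total


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    return ~ones_complement_sum16(data) & 0xFFFF


# Hypothetical bytes carried by the first fragment (transport header plus
# leading data). Its length is even, so the tail starts on a word boundary.
first_fragment = b"first-fragment-!"
# The tail that belongs with this first fragment, and a stale tail left in
# the reassembly buffer by an older datagram: same words, different order.
correct_tail = b"\x12\x34\x56\x78\x9a\xbc"
stale_tail = b"\x9a\xbc\x12\x34\x56\x78"

correct = internet_checksum(first_fragment + correct_tail)
spliced = internet_checksum(first_fragment + stale_tail)
print(f"correct={correct:#06x} spliced={spliced:#06x} equal={correct == spliced}")
```

   The two checksums are equal. A receiver verifying the spliced
   datagram against the checksum in its header therefore cannot
   distinguish it from the correctly assembled one.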

   It is possible to construct datasets in which mis-associated
   fragments will always have the same checksum. Such a case may seem
   unlikely, but it is worth considering. "Real" data may be more likely
   than random data to cause checksum hot spots and increase the
   probability of a false checksum match [Stone98]. Also, some
   applications or higher-level protocols may turn off checksumming to
   increase speed, though this practice has been found to be dangerous
   for other reasons when data reliability is important [Stone00].

4. Experimental Observations

   To test the practical impact of fragmentation on UDP, we ran a series
   of experiments using a UDP bulk data transport protocol designed as
   an alternative to TCP for transporting large data sets over
   specialized networks. The tool, Reliable Blast UDP (RBUDP), part of
   the QUANTA networking toolkit [QUANTA], was selected because its
   clean interface facilitated automated experiments. The choice of
   RBUDP had little to do with the details of the transport protocol
   itself. Any UDP transport protocol that has no additional means of
   detecting corruption, and that can be configured to use IP
   fragmentation, would produce the same results.

   To diagnose corruption in files transferred with the UDP bulk
   transfer tool, we used a file format that embedded sequence numbers
   and MD5 checksums in each fragment of each datagram. This made it
   possible to distinguish random corruption from corruption caused by
   mis-associated fragments. We used two types of files. One was
   constructed so that all the UDP checksums were constant -- we call
   this the "constant" dataset. The other was constructed so that the
   UDP checksums were uniformly random -- the "random" dataset.
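   The diagnostic file format is described above only in outline. The
   following Python sketch illustrates how embedded sequence numbers and
   MD5 digests can separate the two failure modes. It uses a
   hypothetical record layout: an 8-byte datagram sequence number, a
   2-byte fragment index, and a 16-byte MD5 digest over those fields and
   the payload. This is not the format actually used in these
   experiments. The sketch also assumes that each record exactly fills
   one IP fragment.

```python
import hashlib
import struct

# Hypothetical record layout (not the format used in the experiments):
# 8-byte datagram sequence number, 2-byte fragment index, 16-byte MD5 digest,
# then the payload. The digest covers the two ID fields and the payload.
ID_FIELDS = struct.Struct("!QH")
RECORD_HEADER = struct.Struct("!QH16s")


def _digest(seq: int, frag_index: int, payload: bytes) -> bytes:
    return hashlib.md5(ID_FIELDS.pack(seq, frag_index) + payload).digest()


def make_record(seq: int, frag_index: int, payload: bytes) -> bytes:
    """Build one fragment-sized record of the test file."""
    header = RECORD_HEADER.pack(seq, frag_index, _digest(seq, frag_index, payload))
    return header + payload


def classify(record: bytes, expected_seq: int, expected_index: int) -> str:
    """Classify a received record by where it landed in the reassembled data."""
    seq, frag_index, digest = RECORD_HEADER.unpack_from(record)
    payload = record[RECORD_HEADER.size:]
    if _digest(seq, frag_index, payload) != digest:
        return "random corruption"        # the record's own bytes were altered
    if (seq, frag_index) != (expected_seq, expected_index):
        return "mis-associated fragment"  # intact, but from another datagram
    return "ok"


if __name__ == "__main__":
    # Fragment 1 of datagram 7 lingers in the reassembly buffer and is spliced
    # into datagram 7 + 65536 after the IP ID field wraps.
    print(classify(make_record(7, 1, b"old data"), 7 + 65536, 1))
```

   In this sketch, a record that fails its own digest check counts as
   random corruption. An intact record found at the wrong position counts
   as a mis-associated fragment.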

   All tests were done using 400 MB files, sent in 1524-byte datagrams
   so that they were fragmented on standard Fast Ethernet with a
   1500-byte MTU. The UDP bulk file transport tool was used to send the
   datasets between a pair of hosts at slightly less than the available
   data rate (100 Mbit/s). Near the beginning of each flow, a brief
   secondary flow was started to induce packet loss in the primary flow.
   Throughout the life of the primary flow, we typically observed
   mis-association rates on the order of a few hundredths of a percent.

   Tests run with the "constant" dataset resulted in corruption on all
   mis-associated fragments, that is, corruption on the order of a few
   hundredths of a percent. In sending approximately 10 TB of "random"
   datasets, we observed 8847668 UDP checksum errors and 121 corruptions
   of the data due to mis-associated fragments. That is roughly one
   undetected corruption per 73000 detected checksum errors, broadly
   consistent with the one-in-65536 estimate in Section 3.

5. Implications

   Most TCP implementations today participate in MTU discovery
   [RFC1191], which avoids the problems described in this note by
   avoiding IP fragmentation altogether. However, as a workaround for
   MTU discovery problems [RFC2923], some TCP implementations and
   communications equipment provide mechanisms to disable path MTU
   discovery by clearing or ignoring the DF bit. Doing so exposes all
   protocols using IPv4, even those that participate in MTU discovery,
   to mis-association errors.

   A case particularly worth noting is that of tunnels encapsulating
   payload in IPv4. To deal with difficulties in MTU discovery
   [RFC4459], tunnels may rely on fragmentation between the two
   endpoints, even if the payload is marked with a DF bit [RFC4301]. In
   such a mode, the two tunnel endpoints behave as IP end hosts, with
   all tunneled traffic having the same protocol type.
   Thus, the aggregate rate of tunneled packets must not exceed 65536
   per maximum packet lifetime, or tunneled data becomes exposed to
   possible mis-association. Even protocols that perform MTU discovery,
   such as TCP, will be affected.

   IPv6 is less vulnerable to this type of problem, since its fragment
   header contains a 32-bit identification field [RFC2460].
   Mis-association will only be a problem at packet rates 65536 times
   higher than for IPv4.

   Since mis-association of fragments will only occur when the IP ID
   field wraps within the fragment reassembly timeout, it may be
   possible to reduce the timeout sufficiently that mis-association
   will not occur. However, there are a number of difficulties with
   such an approach. Since the sender controls the packet rate and the
   selection of IP IDs, while the receiver controls the reassembly
   timeout, the two parties would need some mutual assurance that both
   are participating in the scheme. Further, it is not generally
   possible to set the timeout low enough that a fast sender's
   fragments will not be mis-associated, yet high enough that a slow
   sender's fragments will not be discarded before they can be
   reassembled. So the timeout and IP ID selection would need to be
   done on a per-peer basis. Also, NAT is likely to break any per-peer
   tables keyed by IP address. It is not within the scope of this
   document to recommend solutions to these problems.

   Another means of solving the corruption issue is to add stronger
   integrity checking, which can be done at any layer above IP. This is
   a natural side effect of using cryptographic authentication. If
   IPsec AH [RFC2402] is in use, mis-associated fragments will be
   discarded at the network layer with extremely high probability.
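   The following Python sketch illustrates integrity checking above IP
   by appending an HMAC-SHA256 tag to each datagram, using the standard
   hmac and hashlib modules. It is not IPsec AH, and it is not a
   mechanism proposed by this document. The key and function names are
   hypothetical, and key management is out of scope. A datagram spliced
   from parts of two different datagrams fails verification except with
   negligible probability. This holds even when the 16-bit transport
   checksum would have matched.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def seal(payload: bytes, key: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed over the payload."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def verify(datagram: bytes, key: bytes) -> Optional[bytes]:
    """Return the payload if the tag verifies, otherwise None (discard)."""
    if len(datagram) < TAG_LEN:
        return None
    payload, tag = datagram[:-TAG_LEN], datagram[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return payload if hmac.compare_digest(tag, expected) else None


if __name__ == "__main__":
    key = b"example-key-for-illustration-only"  # hypothetical shared secret
    new = seal(b"N" * 48, key)
    old = seal(b"O" * 48, key)
    # Splice the first 24 bytes of the new datagram onto the rest of the old
    # one, mimicking a mis-associated trailing fragment.
    spliced = new[:24] + old[24:]
    print(verify(new, key) is not None)  # intact datagram is accepted
    print(verify(spliced, key) is None)  # spliced datagram is discarded
```

   The sketch uses hmac.compare_digest so that the time taken to compare
   tags does not depend on where they differ.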

   Some higher layers may use longer checksums (for example, SCTP's
   checksum is 32 bits long [RFC2960]) or cryptographic authentication
   (for example, SSH message authentication codes [RFC4251]). While
   stronger integrity checking may prevent data corruption, it will not
   solve the problem of a high effective loss rate. In the case of SSH,
   any stream corruption results in immediate termination of the
   connection.

   It is difficult to describe concisely all possible situations in
   which fragments might be mis-associated. Even if an end host
   carefully follows the specification, ensuring unique IP IDs, the
   presence of NATs or tunnels may expose applications to IP ID space
   conflicts. A fragmenting application that sends at a low rate may be
   exposed when running simultaneously with a non-fragmenting
   application that sends at a high rate. There are some possible
   workarounds that receivers might implement to reduce the possibility
   of conflict, but there is no mechanism in place for a sender to know
   what the receiver is doing in this respect. As a consequence, there
   is no general mechanism for an application that uses IPv4
   fragmentation to know whether it is deterministically or
   statistically protected from mis-associated fragments.

   In general, applications that rely on IPv4 fragmentation should be
   written with these issues in mind, as well as the issues documented
   in [Kent87]. Applications that rely on IPv4 fragmentation while
   sending at high speeds, and devices that deliberately introduce
   fragmentation into otherwise unfragmented traffic (e.g., tunnels),
   should be particularly cautious and should introduce strong
   mechanisms to ensure data integrity.

6. Security Considerations

   If a malicious entity knows that a pair of hosts are communicating
   using a fragmented stream, this may present an opportunity for the
   entity to corrupt the flow.
   By sending "high" fragments (those with an offset greater than zero)
   with a forged source address, the attacker can deliberately cause
   corruption as described above. Exploiting this vulnerability
   requires only knowledge of the source and destination addresses of
   the flow, its protocol number, and fragment boundaries. It does not
   require knowledge of port or sequence numbers.

   If the attacker has visibility of packets on the path, the attack
   profile is similar to injecting full segments. For an attacker
   without such visibility, this attack makes blind disruption easier
   and might be used to cause degradation of service. We believe that
   only streams using IPv4 fragmentation are likely to be vulnerable.
   Because of the nature of the problems outlined in this draft, the use
   of IPv4 fragmentation for critical applications may not be advisable
   regardless of security concerns.

7. IANA Considerations

   None.

8. Informative References

   [Kent87]   Kent, C. and J. Mogul, "Fragmentation considered harmful",
              Proc. SIGCOMM '87, Vol. 17, No. 5, October 1987.

   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
              RFC 2923, September 2000.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              November 1990.

   [Stone98]  Stone, J., Greenwald, M., Partridge, C., and J. Hughes,
              "Performance of Checksums and CRC's over Real Data",
              IEEE/ACM Transactions on Networking, Vol. 6, No. 5,
              October 1998.

   [Stone00]  Stone, J. and C. Partridge, "When The CRC and TCP Checksum
              Disagree", Proc. SIGCOMM 2000, Vol. 30, No. 4,
              October 2000.

   [QUANTA]   He, E., Alimohideen, J., Eliason, J., Krishnaprasad, N.,
              Leigh, J., Yu, O., and T. DeFanti, "Quanta: a toolkit for
              high performance data delivery over photonic networks",
              Future Generation Computer Systems, Vol. 19, No. 6,
              August 2003.

   [Bellovin02]
              Bellovin, S., "A Technique for Counting NATted Hosts",
              November 2002.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

   [RFC2960]  Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
              Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M.,
              Zhang, L., and V. Paxson, "Stream Control Transmission
              Protocol", RFC 2960, October 2000.

   [RFC2402]  Kent, S. and R. Atkinson, "IP Authentication Header",
              RFC 2402, November 1998.

   [RFC4251]  Ylonen, T. and C. Lonvick, "The Secure Shell (SSH)
              Protocol Architecture", RFC 4251, January 2006.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4459]  Savola, P., "MTU and Fragmentation Issues with
              In-the-Network Tunneling", RFC 4459, April 2006.

Appendix A. Acknowledgements

   This work was supported by the National Science Foundation under
   Grant No. 0083285.

Authors' Addresses

   John W. Heffner
   Pittsburgh Supercomputing Center
   4400 Fifth Avenue
   Pittsburgh, PA 15213
   US

   Phone: 412-268-2329
   Email: jheffner@psc.edu


   Matt Mathis
   Pittsburgh Supercomputing Center
   4400 Fifth Avenue
   Pittsburgh, PA 15213
   US

   Phone: 412-268-3319
   Email: mathis@psc.edu


   Ben Chandler
   Pittsburgh Supercomputing Center
   4400 Fifth Avenue
   Pittsburgh, PA 15213
   US

   Phone: 412-268-9783
   Email: bchandle@psc.edu

Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).