Network Working Group                                         J. Heffner
Internet-Draft                                                 M. Mathis
Expires: October 6, 2007                                     B. Chandler
                                                                     PSC
                                                           April 4, 2007

               IPv4 Reassembly Errors at High Data Rates
                     draft-heffner-frag-harmful-05

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on October 6, 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   IPv4 fragmentation is not sufficiently robust for use under some
   conditions in today's Internet. At high data rates, the 16-bit IP
   identification field is not large enough to prevent frequent
   incorrect assembly of IP fragments, and the TCP and UDP checksums are
   insufficient to prevent the resulting corrupted datagrams from being
   delivered to higher protocol layers. This note describes some easily
   reproduced experiments demonstrating the problem, and discusses some
   of the operational implications of these observations.

1. Introduction

   The IPv4 header was designed at a time when data rates were several
   orders of magnitude lower than those achievable today. This document
   describes a consequent scale-related failure in the IP identification
   (ID) field, where fragments may be incorrectly assembled at a rate
   high enough that it is likely to invalidate assumptions about data
   integrity failure rates.

   That IP fragmentation results in inefficient use of the network has
   been well documented [Kent87]. This note presents a different kind
   of problem, which can result not only in significant performance
   degradation, but also in frequent data corruption. This is
   especially pertinent due to the recent proliferation of UDP bulk
   transport tools that sometimes fragment every datagram.

   Additionally, there is some network equipment that ignores the Don't
   Fragment (DF) bit in the IP header to work around MTU discovery
   problems [RFC2923]. This equipment indirectly exposes properly
   implemented protocols and applications to corrupt data.

2. Wrapping the IP ID Field

   The Internet Protocol standard specifies:

      "The choice of the Identifier for a datagram is based on the need
      to provide a way to uniquely identify the fragments of a
      particular datagram.
      The protocol module assembling fragments
      judges fragments to belong to the same datagram if they have the
      same source, destination, protocol, and Identifier. Thus, the
      sender must choose the Identifier to be unique for this source,
      destination pair and protocol for the time the datagram (or any
      fragment of it) could be alive in the Internet." [RFC0791]

   Strict conformance to this standard limits transmissions in one
   direction between any address pair to no more than 65536 packets per
   protocol (e.g., TCP, UDP, or ICMP) per maximum packet lifetime.

   Clearly not all hosts follow this standard, because it implies an
   unreasonably low maximum data rate. For example, a host sending
   1500-byte packets with a 30-second maximum packet lifetime could
   send at only about 26 Mbit/s before exceeding 65535 packets per
   packet lifetime. Likewise, filling a 1 Gbit/s interface with
   1500-byte packets requires sending 65536 packets in less than 1
   second, an unreasonably short maximum packet lifetime, since it is
   less than the round-trip time on some paths. This requirement is
   widely ignored.

   Additionally, it is worth noting that re-using values in the IP ID
   field once per 65536 datagrams is the best case. Some
   implementations randomize the IP ID to prevent leaking information
   out of the kernel [Bellovin02], which causes re-use of the IP ID
   field to occur probabilistically at all sending rates.

   IP receivers store fragments in a reassembly buffer until all
   fragments in a datagram arrive, or until the reassembly timeout
   expires (15 seconds is suggested in [RFC0791]). Fragments in a
   datagram are associated with each other by their protocol number, the
   value in their ID field, and by the source, destination address pair.
   If a sender wraps the ID field in less than the reassembly timeout,
   it becomes possible for fragments from different datagrams to be
   incorrectly spliced together ("mis-associated"), and delivered to the
   upper layer protocol.

   A case of particular concern is when mis-association is
   self-propagating. This occurs, for example, when there is reliable
   ordering of packets and the first fragment of a datagram is lost in
   the network. The rest of the fragments are stored in the fragment
   reassembly buffer, and when the sender wraps the ID field, the first
   fragment of the new datagram will be mis-associated with the rest of
   the old datagram. The new datagram will now be incomplete (since
   it is missing its first fragment), so the rest of it will be saved in
   the fragment reassembly buffer, forming a cycle that repeats every
   65536 datagrams. It is possible to have a number of simultaneous
   cycles, bounded by the size of the fragment reassembly buffer.

   IPv6 is considerably less vulnerable to this type of problem, since
   its fragment header contains a 32-bit identification field [RFC2460].
   Mis-association will only be a problem at packet rates 65536 times
   higher than for IPv4.

3. Effects of Mis-Associated Fragments

   When the mis-associated fragments are delivered, transport-layer
   checksumming should detect these datagrams as incorrect and discard
   them. Discarding these datagrams could create a performance problem
   for loss-feedback congestion control algorithms, particularly when a
   large congestion window is required, since it introduces a certain
   amount of non-congestive loss.

   Transport checksums, however, may not be designed to handle such high
   error rates. The TCP/UDP checksum is only 16 bits in length.
   If these checksums follow a uniform random distribution, we expect
   mis-associated datagrams to be accepted by the checksum at a rate of
   one per 65536. With only one mis-association cycle (one
   mis-associated datagram per 65536 sent), we expect corrupt data to
   be delivered to the application layer once per 2^32 datagrams. This
   rate can be significantly higher with multiple concurrent cycles.

   With non-random data, the TCP/UDP checksum may be weaker still.
   It is possible to construct datasets where mis-associated fragments
   will always have the same checksum. Such a case may be considered
   unlikely, but is worth considering. "Real" data may be more likely
   than random data to cause checksum hot spots and increase the
   probability of a false checksum match [Stone98]. Also, some
   applications or higher-level protocols may turn off checksumming to
   increase speed, though this practice has been found to be dangerous
   for other reasons when data reliability is important [Stone00].

4. Experimental Observations

   To test the practical impact of fragmentation on UDP, we ran a series
   of experiments using a UDP bulk data transport protocol that was
   designed to be used as an alternative to TCP for transporting large
   data sets over specialized networks. The tool, Reliable Blast UDP
   (RBUDP), part of the QUANTA networking toolkit [QUANTA], was selected
   because it has a clean interface which facilitated automated
   experiments. The decision to use RBUDP had little to do with the
   details of the transport protocol itself. Any UDP transport protocol
   that does not have additional means to detect corruption, and that
   could be configured to use IP fragmentation, would have the same
   results.

   In order to diagnose corruption on files transferred with the UDP
   bulk transfer tool, we used a file format that included embedded
   sequence numbers and MD5 checksums in each fragment of each datagram.
   Thus it was possible to distinguish random corruption from that
   caused by mis-associated fragments. We used two different types of
   files. One was constructed so that all the UDP checksums were
   constant -- we will call this the "constant" dataset. The other was
   constructed so that UDP checksums were uniformly random -- the
   "random" dataset. All tests were done using 400 MB files, sent in
   1524-byte datagrams so that they were fragmented on standard Fast
   Ethernet with a 1500-byte MTU.

   The UDP bulk file transport tool was used to send the datasets
   between a pair of hosts at slightly less than the available data rate
   (100 Mbps). Near the beginning of each flow, a brief secondary flow
   was started to induce packet loss in the primary flow. Throughout
   the life of the primary flow, we typically observed mis-association
   rates on the order of a few hundredths of a percent.

   Tests run with the "constant" dataset resulted in corruption on all
   mis-associated fragments, that is, corruption on the order of a few
   hundredths of a percent. In sending approximately 10 TB of "random"
   datasets, we observed 8847668 UDP checksum errors and 121 corruptions
   of the data due to mis-associated fragments.

5. Preventing Mis-Association

   The most straightforward way to avoid mis-association is to avoid
   fragmentation altogether by implementing Path MTU Discovery [RFC1191]
   [RFC4821]. However, this is not always feasible for all
   applications. Further, as a work-around for MTU discovery problems
   [RFC2923], some TCP implementations and communications gear provide
   mechanisms to disable path MTU discovery by clearing or ignoring the
   DF bit. Doing so will expose all protocols using IPv4, even those
   that participate in MTU discovery, to mis-association errors.

   If IP fragmentation is in use, it may be possible to reduce the
   timeout sufficiently so that mis-association will not occur.
   However, there are a number of difficulties with such an approach.
   Since the sender controls the rate of packets sent and selection of
   IP ID, while the receiver controls the reassembly timeout, there
   would need to be some mutual assurance between the two parties as to
   participation in the scheme. Further, it is not generally possible
   to set the timeout low enough so that a fast sender's fragments will
   not be mis-associated, yet high enough so that a slow sender's
   fragments will not be unconditionally discarded before it is possible
   to reassemble them. Therefore, the timeout and IP ID selection would
   need to be done on a per-peer basis. Also, NAT is likely to break
   any per-peer tables keyed by IP address. It is not within the
   scope of this document to recommend solutions to these problems,
   though we believe a per-peer adaptive timeout is likely to prevent
   mis-association under circumstances where it would most commonly
   occur.

   A case particularly worth noting is that of tunnels encapsulating
   payload in IPv4. To deal with difficulties in MTU Discovery
   [RFC4459], tunnels may rely on fragmentation between the two
   endpoints, even if the payload is marked with a DF bit [RFC4301]. In
   such a mode, the two tunnel endpoints behave as IP end hosts, with
   all tunneled traffic having the same protocol type. Thus, the
   aggregate rate of tunneled packets may not exceed 65536 per maximum
   packet lifetime, or tunneled data becomes exposed to possible
   mis-association. Even protocols doing MTU discovery such as TCP will
   be affected. Operators of tunnels should ensure that the receiving
   end's reassembly timeout is short enough that mis-association cannot
   occur given the tunnel's maximum rate.

6. Mitigating Mis-Association

   It is difficult to concisely describe all possible situations under
   which fragments might be mis-associated.
   Even if an end host carefully follows the specification, ensuring
   unique IP IDs, the presence of NATs or tunnels may expose
   applications to IP ID space conflicts. Further, devices in the
   network that the end hosts cannot see or control, such as tunnels,
   may cause mis-association. Even a fragmenting application that sends
   at a low rate might be exposed when running simultaneously with a
   non-fragmenting application that sends at a high rate. As described
   above, the receiver might implement measures to reduce or eliminate
   the possibility of conflict, but there is no mechanism in place for a
   sender to know what the receiver is doing in this respect. As a
   consequence, there is no general mechanism for an application that
   is using IPv4 fragmentation to know if it is deterministically or
   statistically protected from mis-associated fragments.

   Under circumstances when it is impossible or impractical to prevent
   mis-association, its effects may be mitigated by use of stronger
   integrity checking at any layer above IP. This is a natural side
   effect of using cryptographic authentication. For example, IPsec AH
   [RFC2402] will discard any corrupted datagrams, preventing their
   delivery to upper layers. A stronger transport layer checksum such as
   SCTP's, which is 32 bits in length [RFC2960], may help significantly.
   At the application layer, SSH message authentication codes [RFC4251]
   will prevent delivery of corrupted data, though since the TCP
   connection underneath is not protected, it is considered invalid and
   the session is immediately terminated. While stronger integrity
   checking may prevent data corruption, it will not prevent the
   potential performance impact described above of non-congestive loss
   on congestion control at high congestion windows.
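
   As an illustration of the paragraph above, the following Python
   sketch (not part of the experiments reported in this note) shows how
   a 16-bit ones' complement checksum can accept a payload whose 16-bit
   words have been reordered, while a stronger check applied above IP
   rejects it. The names internet_checksum, seal, and unseal, the
   framing that appends a SHA-256 digest to each payload, and the sample
   byte strings are all assumptions made for illustration. An unkeyed
   digest guards only against accidental corruption; resisting the
   deliberate injection discussed in Section 7 would require a keyed
   message authentication code instead.

```python
import hashlib
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum of the kind used by TCP and UDP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


DIGEST_LEN = 32  # SHA-256 output size in bytes


def seal(payload: bytes) -> bytes:
    """Append a SHA-256 digest to the payload (hypothetical framing)."""
    return payload + hashlib.sha256(payload).digest()


def unseal(message: bytes) -> bytes:
    """Return the payload, or raise ValueError if the digest mismatches."""
    payload, digest = message[:-DIGEST_LEN], message[-DIGEST_LEN:]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("integrity check failed")
    return payload


# Two payloads that differ only in the order of their 16-bit words,
# standing in for a datagram assembled from the wrong fragments.
original = b"\x12\x34\xab\xcd" * 4
spliced = b"\xab\xcd\x12\x34" * 4

# The ones' complement sum is order-independent, so both checksums agree.
same_checksum = internet_checksum(original) == internet_checksum(spliced)

# Replace the body but keep the digest computed over the original body.
sealed = seal(original)
tampered = spliced + sealed[-DIGEST_LEN:]
try:
    unseal(tampered)
    detected = False
except ValueError:
    detected = True
```

   In this sketch same_checksum is true, because the ones' complement
   sum does not depend on word order, while detected is also true,
   because the digest covers the exact byte sequence. The receiver
   still has to discard the datagram, so the performance cost of
   non-congestive loss remains.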

   It should also be noted that mis-association is not the only possible
   source of data corruption above the network layer [Stone00]. Most
   applications for which data integrity is critically important should
   implement strong integrity checking regardless of exposure to
   mis-association.

   In general, applications that rely on IPv4 fragmentation should be
   written with these issues in mind, as well as those issues documented
   in [Kent87]. Applications that rely on IPv4 fragmentation while
   sending at high speeds (on the order of 100 Mbps or higher), and
   devices that deliberately introduce fragmentation to otherwise
   unfragmented traffic (e.g., tunnels) should be particularly cautious,
   and introduce strong mechanisms to ensure data integrity.

7. Security Considerations

   If a malicious entity knows that a pair of hosts are communicating
   using a fragmented stream, this may present an opportunity for that
   entity to corrupt the flow. By sending "high" fragments (those with
   offset greater than zero) with a forged source address, the attacker
   can deliberately cause corruption as described above. Exploiting
   this vulnerability requires only knowledge of the source and
   destination addresses of the flow, its protocol number, and fragment
   boundaries. It does not require knowledge of port or sequence
   numbers.

   If the attacker has visibility of packets on the path, the attack
   profile is similar to injecting full segments. Using this attack
   makes blind disruptions easier, and might be used to cause
   degradation of service. We believe only streams using IPv4
   fragmentation are likely vulnerable. Because of the nature of the
   problems outlined in this draft, the use of IPv4 fragmentation for
   critical applications may not be advisable regardless of security
   concerns.

8. IANA Considerations

   None.

9. Informative References

   [Kent87]   Kent, C. and J.
              Mogul, "Fragmentation considered harmful",
              Proc. SIGCOMM '87 vol. 17, No. 5, October 1987.

   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
              RFC 2923, September 2000.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              November 1990.

   [Stone98]  Stone, J., Greenwald, M., Partridge, C., and J. Hughes,
              "Performance of Checksums and CRC's over Real Data", IEEE/
              ACM Transactions on Networking vol. 6, No. 5,
              October 1998.

   [Stone00]  Stone, J. and C. Partridge, "When The CRC and TCP Checksum
              Disagree", Proc. SIGCOMM 2000 vol. 30, No. 4,
              October 2000.

   [QUANTA]   He, E., Alimohideen, J., Eliason, J., Krishnaprasad, N.,
              Leigh, J., Yu, O., and T. DeFanti, "Quanta: a toolkit for
              high performance data delivery over photonic networks",
              Future Generation Computer Systems vol. 19, No. 6,
              August 2003.

   [Bellovin02]
              Bellovin, S., "A Technique for Counting NATted Hosts",
              November 2002.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

   [RFC2960]  Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
              Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M.,
              Zhang, L., and V. Paxson, "Stream Control Transmission
              Protocol", RFC 2960, October 2000.

   [RFC2402]  Kent, S. and R. Atkinson, "IP Authentication Header",
              RFC 2402, November 1998.

   [RFC4251]  Ylonen, T. and C. Lonvick, "The Secure Shell (SSH)
              Protocol Architecture", RFC 4251, January 2006.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4459]  Savola, P., "MTU and Fragmentation Issues with In-the-
              Network Tunneling", RFC 4459, April 2006.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", RFC 4821, March 2007.

Appendix A.
Acknowledgements

   This work was supported by the National Science Foundation under
   Grant No. 0083285.

Authors' Addresses

   John W. Heffner
   Pittsburgh Supercomputing Center
   4400 Fifth Avenue
   Pittsburgh, PA 15213
   US

   Phone: 412-268-2329
   Email: jheffner@psc.edu

   Matt Mathis
   Pittsburgh Supercomputing Center
   4400 Fifth Avenue
   Pittsburgh, PA 15213
   US

   Phone: 412-268-3319
   Email: mathis@psc.edu

   Ben Chandler
   Pittsburgh Supercomputing Center
   4400 Fifth Avenue
   Pittsburgh, PA 15213
   US

   Phone: 412-268-9783
   Email: bchandle@psc.edu

Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).