Network Working Group                                         J. Heffner
Internet-Draft                                                 M. Mathis
Expires: December 24, 2006                                   B. Chandler
                                                                     PSC
                                                           June 22, 2006

                 Fragmentation Considered Very Harmful
                     draft-heffner-frag-harmful-02

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference material or
   to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 24, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   IPv4 fragmentation is not sufficiently robust for general use in
   today's Internet.  The 16-bit IP identification field is not large
   enough to prevent frequent incorrect reassembly of IP fragments, and
   the TCP and UDP checksums are insufficient to prevent the resulting
   corrupted datagrams from being delivered to higher protocol layers.
   This note describes some easily reproduced experiments demonstrating
   the problem, and discusses some of the operational implications of
   these observations.

1.  Introduction

   The IPv4 header was designed at a time when data rates were several
   orders of magnitude lower than those achievable today.  This document
   describes a consequent scale-related failure in the IP identification
   (ID) field, where fragments may be incorrectly assembled at a rate
   likely high enough to invalidate assumptions about data integrity
   failure rates.

   That IP fragmentation results in inefficient use of the network has
   been well documented [Kent87].  This note presents a different kind
   of problem, which can result not only in significant performance
   degradation, but also in frequent data corruption.  This is
   especially pertinent due to the recent proliferation of UDP bulk
   transport tools that sometimes fragment every datagram.

   Additionally, there is some network equipment that ignores the Don't
   Fragment (DF) bit in the IP header to work around MTU discovery
   problems [RFC2923].
   This equipment indirectly exposes properly implemented protocols and
   applications to corrupt data.

2.  Wrapping the IP ID Field

   The Internet Protocol standard specifies:

      "The choice of the Identifier for a datagram is based on the need
      to provide a way to uniquely identify the fragments of a
      particular datagram.  The protocol module assembling fragments
      judges fragments to belong to the same datagram if they have the
      same source, destination, protocol, and Identifier.  Thus, the
      sender must choose the Identifier to be unique for this source,
      destination pair and protocol for the time the datagram (or any
      fragment of it) could be alive in the Internet."  [RFC0791]

   Strict conformance to this standard limits transmissions in one
   direction between any address pair to no more than 65536 packets per
   protocol (e.g. TCP, UDP or ICMP) per maximum packet lifetime.

   Clearly not all hosts will follow this standard, because it implies
   an unreasonably low maximum data rate.  For example, a host sending
   1500-byte packets with a 30-second maximum packet lifetime could send
   at only about 26 Mbits/s before exceeding 65535 packets per packet
   lifetime.  Conversely, filling a 1 Gbit/s interface with 1500-byte
   packets requires sending 65536 packets in less than 1 second, which
   would imply an unreasonably short maximum packet lifetime, less than
   the round-trip time on some paths.  This requirement is widely
   ignored.

   IP receivers store fragments in a reassembly buffer until all
   fragments in a datagram arrive, or until the reassembly timeout
   expires (15 seconds is suggested in [RFC0791]).  Fragments in a
   datagram are associated with each other by the value in their ID
   field, and by the source and destination address pair.
   If a sender wraps the ID field in less than the reassembly timeout,
   it becomes possible for fragments from different datagrams to be
   incorrectly spliced together ("mis-associated"), and delivered to the
   upper layer protocol.

   A case of particular concern is when mis-association is self-
   propagating.  This occurs, for example, when there is reliable
   ordering of packets and the first fragment of a datagram is lost in
   the network.  The rest of the fragments are stored in the fragment
   reassembly buffer, and when the sender wraps the ID field, the first
   fragment of the new datagram will be mis-associated with the rest of
   the old datagram.  The new datagram will now be incomplete (since it
   is missing its first fragment), so the rest of it will be saved in
   the fragment reassembly buffer, forming a cycle that repeats every
   65536 datagrams.  It is possible to have a number of simultaneous
   cycles, bounded by the size of the fragment reassembly buffer.

3.  Harmful Effects of Mis-Associated Fragments

   When the mis-associated fragments are delivered, transport-layer
   checksumming should detect these datagrams as incorrect and discard
   them.  Discarding these datagrams could pose a problem for
   loss-feedback congestion control algorithms, since there will be a
   high number of non-congestion-related losses.

   However, transport checksums may not be designed to handle such high
   error rates, either.  The TCP/UDP checksum is only 16 bits in length.
   If these checksums follow a uniform random distribution, we expect
   mis-associated datagrams to be accepted by the checksum at a rate of
   one per 65536.  With only one mis-association cycle (one
   mis-association per 65536 datagrams), we therefore expect corrupt
   data to be delivered to the application layer once per 2^32
   datagrams.  This rate can be significantly higher with multiple
   cycles.

   With non-random data, the TCP/UDP checksum may be weaker still.
   It is possible to construct datasets where mis-associated fragments
   will always have the same checksum.  Such a case may seem unlikely,
   but it is worth considering.  "Real" data may be more likely than
   random data to cause checksum hot spots and increase the probability
   of a false checksum match [Stone98].  Also, some applications or
   higher-level protocols may turn off checksumming to increase speed,
   though this practice has been found to be dangerous for other reasons
   when data reliability is important [Stone00].

4.  Experimental Observations

   To test the practical impact of fragmentation on UDP, we ran a series
   of experiments using a UDP bulk data transport protocol that was
   designed to be used as an alternative to TCP for transporting large
   data sets over specialized networks.  The tool, Reliable Blast UDP
   (RBUDP), part of the QUANTA networking toolkit [QUANTA], was selected
   because it has a clean interface which facilitated automated
   experiments.  The decision to use RBUDP had little to do with the
   details of the transport protocol itself.  Any UDP transport protocol
   that does not have additional means to detect corruption, and that
   could be configured to use IP fragmentation, would produce the same
   results.

   In order to diagnose corruption on files transferred with the UDP
   bulk transfer tool, we used a file format that included embedded
   sequence numbers and MD5 checksums in each fragment of each datagram.
   Thus it was possible to distinguish random corruption from that
   caused by mis-associated fragments.  We used two different types of
   files.  One was constructed so that all the UDP checksums were
   constant -- we will call this the "constant" dataset.  The other was
   constructed so that UDP checksums were uniformly random -- the
   "random" dataset.  All tests were done using 400 MB files.
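
   The following is a minimal sketch, not the construction used for the
   test files, of why data with matching per-fragment sums defeats the
   checksum.  The Internet checksum is a ones'-complement sum of 16-bit
   words, so two fragments holding the same words in a different order
   contribute identically to it.  The names internet_checksum, FRAG,
   old, and new are hypothetical; the sketch checksums only the payload
   (a real UDP checksum also covers the UDP header and a pseudo-header)
   and assumes an unrealistically small 8-byte fragment.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Assumption for illustration: each datagram splits into two 8-byte
# fragments.  Real fragment offsets are multiples of 8 bytes, so 16-bit
# word alignment is preserved across fragment boundaries.
FRAG = 8

# The second fragments differ byte-for-byte but hold the same 16-bit
# words, so their ones'-complement sums are equal.
old = b"OLDHEAD!" + struct.pack("!4H", 1, 2, 3, 4)
new = b"NEWHEAD!" + struct.pack("!4H", 4, 3, 2, 1)

# Mis-association: first fragment of the new datagram spliced onto the
# remaining fragment of the old one.
spliced = new[:FRAG] + old[FRAG:]

print("payload corrupted:", spliced != new)
print("checksum still matches:",
      internet_checksum(spliced) == internet_checksum(new))
```

   In this sketch the spliced payload differs from the datagram that was
   sent, yet it carries the same checksum, so a receiver relying only on
   that checksum would accept it.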

   The UDP bulk file transport tool was used to send the datasets
   between a pair of hosts at slightly less than the available data rate
   (100 Mbps).  Near the beginning of each flow, a brief secondary flow
   was started to induce packet loss in the primary flow.  Throughout
   the life of the primary flow, we typically observed mis-association
   rates on the order of a few hundredths of a percent.

   Tests run with the "constant" dataset resulted in corruption on all
   mis-associated fragments, that is, corruption on the order of a few
   hundredths of a percent.  In sending approximately 10 TB of "random"
   datasets, we observed 8847668 UDP checksum errors and 121 corruptions
   of the data due to mis-associated fragments.

5.  Implications

   Most TCP implementations today participate in MTU discovery
   [RFC1191], which will avoid the problems described in this note by
   avoiding IP fragmentation altogether.  However, as a work-around for
   MTU discovery problems [RFC2923], some TCP implementations and
   communications gear provide mechanisms to disable path MTU discovery
   by clearing or ignoring the DF bit.  Doing so will expose all
   protocols using IPv4, even those which participate in MTU discovery,
   to mis-association errors.

   IPv6 is less vulnerable to this type of problem, since its fragment
   header contains a 32-bit identification field [RFC2460].  Mis-
   association will only be a problem at packet rates 65536 times higher
   than for IPv4.

   Since mis-association of fragments will only occur when the IP ID
   field is wrapped within the fragment reassembly timeout, it may be
   possible to reduce the timeout sufficiently so that mis-association
   will not occur.  However, there are a number of difficulties with
   such an approach.
   Since the sender controls the rate of packets sent and the selection
   of IP ID, while the receiver controls the reassembly timeout, there
   would need to be some mutual assurance between the two parties as to
   participation in the scheme.  Further, it is not generally possible
   to set the timeout low enough so that a fast sender's fragments will
   not be mis-associated, yet high enough so that a slow sender's
   fragments will not be unconditionally discarded before it is possible
   to reassemble them.  So the timeout and IP ID selection would need to
   be done on a per-peer basis.  Also, it is likely that NAT will break
   any per-peer tables keyed by IP address.  It is not within the scope
   of this document to recommend solutions to these problems.

   Another means of solving the corruption issue is to add stronger
   integrity checking, which can be done at any layer above IP.  This is
   a natural side effect of using cryptographic authentication.  If
   IPsec AH [RFC2402] is in use, the mis-associated fragments will be
   discarded at the network layer with extremely high probability.  Some
   higher layers may use longer checksums (for example, SCTP's is 32
   bits in length [RFC2960]) or cryptographic authentication (SSH
   message authentication codes [RFC4251]).  While stronger integrity
   checking may prevent data corruption, it will not solve the problem
   of a high effective loss rate.  In the case of SSH, any stream
   corruption results in immediate termination of the connection.

6.  Security Considerations

   If a malicious entity knows that a pair of hosts are communicating
   using a fragmented stream, that entity may have an opportunity to
   corrupt the flow.  By sending "high" fragments (those with offset
   greater than zero) with a forged source address, the attacker can
   deliberately cause corruption as described above.
   Exploiting this vulnerability requires only knowledge of the source
   and destination addresses of the flow, its protocol number, and
   fragment boundaries.  It does not require knowledge of port or
   sequence numbers.

   If the attacker has visibility of packets on the path, the attack
   profile is similar to injecting full segments.  This attack makes
   blind disruption easier, and could certainly be used effectively to
   cause denial of service.  However, only streams using IPv4
   fragmentation are vulnerable.  Because of the nature of the problems
   outlined in this draft, the use of IPv4 fragmentation for critical
   applications may not be advisable regardless of security concerns.

7.  IANA Considerations

   None.

8.  Informative References

   [Kent87]   Kent, C. and J. Mogul, "Fragmentation considered harmful",
              Proc. SIGCOMM '87 vol. 17, No. 5, October 1987.

   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
              RFC 2923, September 2000.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              November 1990.

   [Stone98]  Stone, J., Greenwald, M., Partridge, C., and J. Hughes,
              "Performance of Checksums and CRC's over Real Data", IEEE/
              ACM Transactions on Networking vol. 6, No. 5,
              October 1998.

   [Stone00]  Stone, J. and C. Partridge, "When The CRC and TCP Checksum
              Disagree", Proc. SIGCOMM 2000 vol. 30, No. 4,
              October 2000.

   [QUANTA]   He, E., Alimohideen, J., Eliason, J., Krishnaprasad, N.,
              Leigh, J., Yu, O., and T. DeFanti, "Quanta: a toolkit for
              high performance data delivery over photonic networks",
              Future Generation Computer Systems Vol. 19, No. 6,
              August 2003.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

   [RFC2960]  Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
              Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M.,
              Zhang, L., and V. Paxson, "Stream Control Transmission
              Protocol", RFC 2960, October 2000.

   [RFC2402]  Kent, S. and R. Atkinson, "IP Authentication Header",
              RFC 2402, November 1998.

   [RFC4251]  Ylonen, T. and C. Lonvick, "The Secure Shell (SSH)
              Protocol Architecture", RFC 4251, January 2006.

Appendix A.  Acknowledgements

   This work was supported by the National Science Foundation under
   Grant No. 0083285.

Authors' Addresses

   John W. Heffner
   Pittsburgh Supercomputing Center
   4400 Fifth Avenue
   Pittsburgh, PA  15213
   US

   Phone: 412-268-2329
   Email: jheffner@psc.edu

   Matt Mathis
   Pittsburgh Supercomputing Center
   4400 Fifth Avenue
   Pittsburgh, PA  15213
   US

   Phone: 412-268-3319
   Email: mathis@psc.edu

   Ben Chandler
   Pittsburgh Supercomputing Center
   4400 Fifth Avenue
   Pittsburgh, PA  15213
   US

   Phone: 412-268-9783
   Email: bchandle@psc.edu

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2006).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.