idnits 2.17.1

draft-heffner-frag-harmful-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.
  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.
  -- Found old boilerplate from RFC 3978, Section 5.5 on line 356.
  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 333.
  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 340.
  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 346.
  ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead
     of the newer IETF Trust Copyright according to RFC 4748.
  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of
     the newer disclaimer which includes the IETF Trust according to RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)
  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)
  -- The document date (April 21, 2006) is 6579 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Kent87'
  ** Downref: Normative reference to an Informational RFC: RFC 2923
  -- Possible downref: Non-RFC (?) normative reference: ref. 'Stone98'
  -- Possible downref: Non-RFC (?) normative reference: ref. 'Stone00'
  -- Possible downref: Non-RFC (?) normative reference: ref. 'QUANTA'
  ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200)
  ** Obsolete normative reference: RFC 2960 (Obsoleted by RFC 4960)
  ** Obsolete normative reference: RFC 2402 (Obsoleted by RFC 4302, RFC 4305)

     Summary: 9 errors (**), 0 flaws (~~), 2 warnings (==), 11 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    Network Working Group                                         J. Heffner
3    Internet-Draft                                                 M. Mathis
4    Expires: October 23, 2006                                    B. Chandler
5                                                                         PSC
6                                                              April 21, 2006

8                     Fragmentation Considered Very Harmful
9                         draft-heffner-frag-harmful-01

11   Status of this Memo

13      By submitting this Internet-Draft, each author represents that any
14      applicable patent or other IPR claims of which he or she is aware
15      have been or will be disclosed, and any of which he or she becomes
16      aware will be disclosed, in accordance with Section 6 of BCP 79.

18      Internet-Drafts are working documents of the Internet Engineering
19      Task Force (IETF), its areas, and its working groups.  Note that
20      other groups may also distribute working documents as Internet-
21      Drafts.

23      Internet-Drafts are draft documents valid for a maximum of six months
24      and may be updated, replaced, or obsoleted by other documents at any
25      time.  It is inappropriate to use Internet-Drafts as reference
26      material or to cite them other than as "work in progress."

28      The list of current Internet-Drafts can be accessed at
29      http://www.ietf.org/ietf/1id-abstracts.txt.

31      The list of Internet-Draft Shadow Directories can be accessed at
32      http://www.ietf.org/shadow.html.

34      This Internet-Draft will expire on October 23, 2006.

36   Copyright Notice

38      Copyright (C) The Internet Society (2006).

40   Abstract

42      IPv4 fragmentation is not sufficiently robust for general use in
43      today's Internet.  The 16-bit IP identification field is not large
44      enough to prevent frequent incorrect assembly of IP fragments, and
45      the TCP and UDP checksums are insufficient to prevent the resulting
46      corrupted datagrams from being delivered to higher protocol layers.
47      This note describes some easily reproduced experiments demonstrating
48      the problem, and discusses some of the operational implications of
49      these observations.

51   1.  Introduction

53      The IPv4 header was designed at a time when data rates were several
54      orders of magnitude lower than those achievable today.  This document
55      describes a consequent scale-related failure in the IP identification
56      (ID) field, where fragments may be incorrectly assembled at a rate
57      likely high enough to invalidate assumptions about data integrity
58      failure rates.

60      That IP fragmentation results in inefficient use of the network has
61      been well documented [Kent87].  This note presents a different kind
62      of problem, which can result not only in significant performance
63      degradation, but also frequent data corruption.  This is especially
64      pertinent due to the recent proliferation of UDP bulk transport tools
65      that sometimes fragment every datagram.  Additionally, there is some
66      network equipment that ignores the Don't Fragment (DF) bit in the IP
67      header to work around MTU discovery problems [RFC2923].  This
68      equipment indirectly exposes properly implemented protocols and
69      applications to corrupt data.

71   2.  Wrapping the IP ID Field

73      The Internet Protocol standard specifies:

75         "The choice of the Identifier for a datagram is based on the need
76         to provide a way to uniquely identify the fragments of a
77         particular datagram.  The protocol module assembling fragments
78         judges fragments to belong to the same datagram if they have the
79         same source, destination, protocol, and Identifier.  Thus, the
80         sender must choose the Identifier to be unique for this source,
81         destination pair and protocol for the time the datagram (or any
82         fragment of it) could be alive in the Internet."  [RFC0791]

84      Strict conformance to this standard limits transmissions in one
85      direction between any address pair to no more than 65536 packets per
86      protocol (e.g.  TCP, UDP or ICMP) per maximum packet lifetime.

88      Clearly not all hosts will follow this standard, because it implies
89      an unreasonably low maximum data rate.  For example, a host sending
90      1500 byte packets with a 30 second maximum packet lifetime could send
91      at only about 26 Mbit/s before exceeding 65536 packets per packet
92      lifetime.  Or, filling a 1 Gbit/s interface with 1500 byte packets
93      requires sending 65536 packets in less than 1 second, an unreasonably
94      short maximum packet lifetime, being less than the round-trip time on
95      some paths.  This requirement is widely ignored.

97      IP receivers store fragments in a reassembly buffer until all
98      fragments in a datagram arrive, or until the reassembly timeout
99      expires (15 seconds is suggested in [RFC0791]).  Fragments in a
100     datagram are associated with each other by the value in their ID
101     field, and by the source, destination address pair.  If a sender
102     wraps the ID field in less than the reassembly timeout, it becomes
103     possible for fragments from different datagrams to be incorrectly
104     spliced together ("mis-associated"), and delivered to the upper layer
105     protocol.

107     A case of particular concern is when mis-association is self-
108     propagating.  This occurs, for example, when there is reliable
109     ordering of packets and the first fragment of a datagram is lost in
110     the network.  The rest of the fragments are stored in the fragment
111     reassembly buffer, and when the sender wraps the ID field, the first
112     fragment of the new datagram will be mis-associated with the rest of
113     the old datagram.  The new datagram will now be incomplete (since
114     it is missing its first fragment), so the rest of it will be saved in
115     the fragment reassembly buffer, forming a cycle that repeats every
116     65536 datagrams.  It is possible to have a number of simultaneous
117     cycles, bounded by the size of the fragment reassembly buffer.

119  3.  Harmful Effects of Mis-Associated Fragments

121     When the mis-associated fragments are delivered, transport-layer
122     checksumming should detect these datagrams as incorrect and discard
123     them.  When the datagrams are discarded, it could pose a problem for
124     loss-feedback congestion control algorithms since there will be a
125     high number of non-congestion-related losses.

127     However, transport checksums may not be designed to handle such high
128     error rates, either.  The TCP/UDP checksum is only 16 bits in length.
129     If these checksums follow a uniform random distribution, we expect
130     mis-associated datagrams to be accepted by the checksum at a rate of
131     one per 65536.  With only one mis-association cycle, we expect
132     corrupt data delivered to the application layer once per 2^32
133     datagrams.  This number can be significantly higher with multiple
134     cycles.

136     With non-random data, the TCP/UDP checksum may be even weaker.
137     It is possible to construct datasets where mis-associated fragments
138     will always have the same checksum.  Such a case may be considered
139     unlikely, but is worth considering.  "Real" data may be more likely
140     than random data to cause checksum hot spots and increase the
141     probability of a false checksum match [Stone98].  Also, some
142     applications may turn off checksumming to increase speed, though this
143     practice has been found to be dangerous for other reasons [Stone00].

145  4.  Experimental Observations

147     To test the practical impact of fragmentation on UDP, we ran a series
148     of experiments using a UDP bulk data transport protocol that was
149     designed to be used as an alternative to TCP for transporting large
150     data sets over specialized networks.  The tool, Reliable Blast UDP
151     (RBUDP), part of the QUANTA networking toolkit [QUANTA], was selected
152     because it has a clean interface which facilitated automated
153     experiments.  The decision to use RBUDP had little to do with the
154     details of the transport protocol itself.  Any UDP transport protocol
155     that does not have additional means to detect corruption, and that
156     could be configured to use IP fragmentation, would have the same
157     results.
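
The rate limits quoted in Section 2 and the 2^32 figure in Section 3 can be re-derived with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes one IP ID is consumed per packet and that the checksum behaves as a uniformly random 16-bit value, as Section 3 supposes.

```python
ID_SPACE = 2 ** 16  # distinct values of the 16-bit IPv4 ID field


def max_rate_bits_per_s(packet_bytes, lifetime_s):
    """Highest rate that sends at most ID_SPACE packets per packet lifetime."""
    return ID_SPACE * packet_bytes * 8 / lifetime_s


def id_wrap_time_s(packet_bytes, link_bits_per_s):
    """Time needed to send ID_SPACE packets at the given link rate."""
    return ID_SPACE * packet_bytes * 8 / link_bits_per_s


def datagrams_per_accepted_corruption(checksum_bits=16):
    """Datagrams sent per corrupt datagram that passes the checksum.

    Assumes a single mis-association cycle (one mis-associated datagram
    per ID_SPACE datagrams) and a uniformly random checksum.
    """
    return ID_SPACE * 2 ** checksum_bits


# 1500-byte packets, 30-second maximum packet lifetime: about 26.2 Mbit/s.
print(max_rate_bits_per_s(1500, 30) / 1e6)
# 1500-byte packets on a 1 Gbit/s interface: the ID wraps in about 0.79 s.
print(id_wrap_time_s(1500, 1e9))
# One cycle, 16-bit checksum: one accepted corruption per 2**32 datagrams.
print(datagrams_per_accepted_corruption() == 2 ** 32)
```

The 0.79 second wrap time is well below the 15 second reassembly timeout suggested in [RFC0791], which is the condition under which mis-association becomes possible.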

159     In order to diagnose corruption on files transferred with the UDP
160     bulk transfer tool, we used a file format that included embedded
161     sequence numbers and MD5 checksums in each fragment of each datagram.
162     Thus it was possible to distinguish random corruption from that
163     caused by mis-associated fragments.  We used two different types of
164     files.  One was constructed so that all the UDP checksums were
165     constant -- we will call this the "constant" dataset.  The other was
166     constructed so that UDP checksums were uniformly random -- the
167     "random" dataset.  All tests were done using 400 MB files.

169     The UDP bulk file transport tool was used to send the datasets
170     between a pair of hosts at slightly less than the available data rate
171     (100 Mbps).  Near the beginning of each flow, a brief secondary flow
172     was started to induce packet loss in the primary flow.  Throughout
173     the life of the primary flow, we typically observed mis-association
174     rates on the order of a few hundredths of a percent.

176     Tests run with the "constant" dataset resulted in corruption on all
177     mis-associated fragments, that is, corruption on the order of a few
178     hundredths of a percent.  In sending approximately 10 TB of "random"
179     datasets, we observed 8847668 UDP checksum errors and 121 corruptions
180     of the data due to mis-associated fragments.

182  5.  Implications

184     Most TCP implementations today participate in MTU discovery
185     [RFC1191], which will avoid the problems described in this note by
186     avoiding IP fragmentation altogether.  However, as a work-around for
187     MTU discovery problems [RFC2923], some TCP implementations and
188     communications gear provide mechanisms to disable path MTU discovery
189     by clearing or ignoring the DF bit.  Doing so will expose all
190     protocols using IPv4, even those which participate in MTU discovery,
191     to mis-association errors.
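
The two datasets of Section 4 can be illustrated with a small sketch. It is not the tool or file format used in the experiments: the payloads are hypothetical, the checksum routine is a plain 16-bit ones-complement sum in the style of the TCP/UDP checksum, and the UDP pseudo-header is left out for brevity. The first part shows how a spliced datagram can carry the same checksum as the correct one. The second part estimates how many corruptions a uniformly random checksum would let through, under the assumption that every reported checksum error came from a mis-associated datagram.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones-complement checksum over data (pseudo-header omitted)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical fragments: an even-length first fragment of datagram B,
# its own second fragment, and a stale second fragment of datagram A
# holding the same 16-bit words in a different order.
head_b = b"HEAD-OF-B:"
tail_b = b"AAAABBBB"
tail_a = b"BBBBAAAA"

correct = head_b + tail_b
spliced = head_b + tail_a  # the mis-associated reassembly

same_checksum = internet_checksum(correct) == internet_checksum(spliced)
print(spliced != correct, same_checksum)  # True True

# Counts reported in Section 4 for the "random" dataset.
checksum_errors = 8847668
corruptions = 121
# Assumption: detected plus undetected cases are all mis-associations.
mis_associated = checksum_errors + corruptions
expected_undetected = mis_associated / 2 ** 16
print(round(expected_undetected), corruptions)  # 135 121
```

Under these assumptions the estimate of roughly 135 undetected corruptions is of the same order as the 121 that were observed, which is what a 16-bit checksum with uniformly distributed values would suggest.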

193     IPv6 is less vulnerable to this type of problem, since its fragment
194     header contains a 32-bit identification field [RFC2460].  Mis-
195     association will only be a problem at packet rates 65536 times higher
196     than for IPv4.

198     Since mis-association of fragments will only occur when the IP ID
199     field is wrapped within the fragment reassembly timeout, it may be
200     possible to reduce the timeout sufficiently so that mis-association
201     will not occur.  However, there are a number of difficulties with
202     such an approach.  Since the sender controls the rate of packets sent
203     and selection of IP ID, while the receiver controls the reassembly
204     timeout, there would need to be some mutual assurance between each
205     party as to participation in the scheme.  Further, it is not
206     generally possible to set the timeout low enough so that a fast
207     sender's fragments will not be mis-associated, yet high enough so
208     that a slow sender's fragments will not be unconditionally discarded
209     before it is possible to reassemble them.  So the timeout and IP ID
210     selection would need to be done on a per-peer basis.  Also, it is
211     likely that NAT will break any per-peer tables keyed by IP address.
212     It is not within the scope of this document to recommend solutions
213     to these problems.

215     Another means of solving the corruption issue is to add stronger
216     integrity checking, which can be done at any layer above IP.  This is
217     a natural side effect of using cryptographic authentication.  If
218     IPsec AH [RFC2402] is in use, the mis-associated fragments will be
219     discarded at the network layer with extremely high probability.  Some
220     higher layers may use longer checksums (for example, SCTP's is 32
221     bits in length [RFC2960]) or cryptographic authentication (SSH
222     message authentication codes [RFC4251]).  While stronger integrity
223     checking may prevent data corruption, it will not solve the problem
224     of a high effective loss rate.  In the case of SSH, any stream
225     corruption results in immediate termination of the connection.

227  6.  Security Considerations

229     If a malicious entity knows that a pair of hosts are communicating
230     using a fragmented stream, it may present an opportunity for this
231     entity to corrupt the flow.  By sending "high" fragments (those with
232     offset greater than zero) with a forged source address, the attacker
233     can deliberately cause corruption as described above.  Exploiting
234     this vulnerability requires only knowledge of the source and
235     destination addresses of the flow, and fragment boundaries.  It does
236     not require knowledge of port or sequence numbers.

238     If the attacker has visibility of packets on the path, the attack
239     profile is similar to injecting full segments.  Using this attack
240     makes blind disruptions easier, and could certainly be used
241     effectively to cause denial of service.  However, only streams using
242     IPv4 fragmentation are vulnerable.  Because of the nature of the
243     problems outlined in this draft, the use of IPv4 fragmentation for
244     critical applications may not be advisable regardless of security
245     concerns.

247  7.  References

249     [Kent87]   Kent, C. and J. Mogul, "Fragmentation considered harmful",
250                Proc. SIGCOMM '87 vol. 17, No. 5, October 1987.

252     [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
253                RFC 2923, September 2000.

255     [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
256                September 1981.

258     [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
259                November 1990.

261     [Stone98]  Stone, J., Greenwald, M., Partridge, C., and J. Hughes,
262                "Performance of Checksums and CRC's over Real Data", IEEE/
263                ACM Transactions on Networking vol. 6, No. 5,
264                October 1998.

266     [Stone00]  Stone, J. and C. Partridge, "When The CRC and TCP Checksum
267                Disagree", Proc. SIGCOMM 2000 vol. 30, No. 4,
268                October 2000.

270     [QUANTA]   He, E., Alimohideen, J., Eliason, J., Krishnaprasad, N.,
271                Leigh, J., Yu, O., and T. DeFanti, "Quanta: a toolkit for
272                high performance data delivery over photonic networks",
273                Future Generation Computer Systems Vol. 19, No. 6,
274                August 2003.

276     [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
277                (IPv6) Specification", RFC 2460, December 1998.

279     [RFC2960]  Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
280                Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M.,
281                Zhang, L., and V. Paxson, "Stream Control Transmission
282                Protocol", RFC 2960, October 2000.

284     [RFC2402]  Kent, S. and R. Atkinson, "IP Authentication Header",
285                RFC 2402, November 1998.

287     [RFC4251]  Ylonen, T. and C. Lonvick, "The Secure Shell (SSH)
288                Protocol Architecture", RFC 4251, January 2006.

290  Appendix A.  Acknowledgements

292     This work was supported by the National Science Foundation under
293     Grant No. 0083285.

295  Authors' Addresses

297     John W. Heffner
298     Pittsburgh Supercomputing Center
299     4400 Fifth Avenue
300     Pittsburgh, PA  15213
301     US

303     Phone: 412-268-2329
304     Email: jheffner@psc.edu

306     Matt Mathis
307     Pittsburgh Supercomputing Center
308     4400 Fifth Avenue
309     Pittsburgh, PA  15213
310     US

312     Phone: 412-268-3319
313     Email: mathis@psc.edu

315     Ben Chandler
316     Pittsburgh Supercomputing Center
317     4400 Fifth Avenue
318     Pittsburgh, PA  15213
319     US

321     Phone: 412-268-9783
322     Email: bchandle@psc.edu

324  Intellectual Property Statement

326     The IETF takes no position regarding the validity or scope of any
327     Intellectual Property Rights or other rights that might be claimed to
328     pertain to the implementation or use of the technology described in
329     this document or the extent to which any license under such rights
330     might or might not be available; nor does it represent that it has
331     made any independent effort to identify any such rights.  Information
332     on the procedures with respect to rights in RFC documents can be
333     found in BCP 78 and BCP 79.

335     Copies of IPR disclosures made to the IETF Secretariat and any
336     assurances of licenses to be made available, or the result of an
337     attempt made to obtain a general license or permission for the use of
338     such proprietary rights by implementers or users of this
339     specification can be obtained from the IETF on-line IPR repository at
340     http://www.ietf.org/ipr.

342     The IETF invites any interested party to bring to its attention any
343     copyrights, patents or patent applications, or other proprietary
344     rights that may cover technology that may be required to implement
345     this standard.  Please address the information to the IETF at
346     ietf-ipr@ietf.org.

348  Disclaimer of Validity

350     This document and the information contained herein are provided on an
351     "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
352     OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
353     ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
354     INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
355     INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
356     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

358  Copyright Statement

360     Copyright (C) The Internet Society (2006).  This document is subject
361     to the rights, licenses and restrictions contained in BCP 78, and
362     except as set forth therein, the authors retain all their rights.

364  Acknowledgment

366     Funding for the RFC Editor function is currently provided by the
367     Internet Society.