idnits 2.17.1 

draft-heffner-frag-harmful-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 379.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 356.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 363.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 369.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (December 1, 2006) is 6356 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Obsolete informational reference (is this intentional?): RFC 2460
     (Obsoleted by RFC 8200)

  -- Obsolete informational reference (is this intentional?): RFC 2960
     (Obsoleted by RFC 4960)

  -- Obsolete informational reference (is this intentional?): RFC 2402
     (Obsoleted by RFC 4302, RFC 4305)


     Summary: 3 errors (**), 0 flaws (~~), 2 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         J. Heffner
3	Internet-Draft                                                 M. Mathis
4	Expires: June 4, 2007                                        B. Chandler
5	                                                                     PSC
6	                                                        December 1, 2006

8	               IPv4 Fragmentation Considered Very Harmful
9	                     draft-heffner-frag-harmful-03

11	Status of this Memo

13	   By submitting this Internet-Draft, each author represents that any
14	   applicable patent or other IPR claims of which he or she is aware
15	   have been or will be disclosed, and any of which he or she becomes
16	   aware will be disclosed, in accordance with Section 6 of BCP 79.

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups.  Note that
20	   other groups may also distribute working documents as Internet-
21	   Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time.  It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than as "work in progress."

28	   The list of current Internet-Drafts can be accessed at
29	   http://www.ietf.org/ietf/1id-abstracts.txt.

31	   The list of Internet-Draft Shadow Directories can be accessed at
32	   http://www.ietf.org/shadow.html.

34	   This Internet-Draft will expire on June 4, 2007.

36	Copyright Notice

38	   Copyright (C) The Internet Society (2006).

40	Abstract

42	   IPv4 fragmentation is not sufficiently robust for general use in
43	   today's Internet.  The 16-bit IP identification field is not large
44	   enough to prevent frequent incorrectly assembled IP fragments, and
45	   the TCP and UDP checksums are insufficient to prevent the resulting
46	   corrupted datagrams from being delivered to higher protocol layers.
47	   This note describes some easily reproduced experiments demonstrating
48	   the problem, and discusses some of the operational implications of
49	   these observations.

51	1.  Introduction

53	   The IPv4 header was designed at a time when data rates were several
54	   orders of magnitude lower than those achievable today.  This document
55	   describes a consequent scale-related failure in the IP identification
56	   (ID) field, where fragments may be incorrectly assembled at a rate
57	   high enough that it is likely to invalidate assumptions about data
58	   integrity failure rates.

60	   That IP fragmentation results in inefficient use of the network has
61	   been well documented [Kent87].  This note presents a different kind
62	   of problem, which can result not only in significant performance
63	   degradation, but also frequent data corruption.  This is especially
64	   pertinent due to the recent proliferation of UDP bulk transport tools
65	   that sometimes fragment every datagram.

67	   Additionally, there is some network equipment that ignores the Don't
68	   Fragment (DF) bit in the IP header to work around MTU discovery
69	   problems [RFC2923].  This equipment indirectly exposes properly
70	   implemented protocols and applications to corrupt data.

72	2.  Wrapping the IP ID Field

74	   The Internet Protocol standard specifies:

76	      "The choice of the Identifier for a datagram is based on the need
77	      to provide a way to uniquely identify the fragments of a
78	      particular datagram.  The protocol module assembling fragments
79	      judges fragments to belong to the same datagram if they have the
80	      same source, destination, protocol, and Identifier.  Thus, the
81	      sender must choose the Identifier to be unique for this source,
82	      destination pair and protocol for the time the datagram (or any
83	      fragment of it) could be alive in the Internet."  [RFC0791]

85	   Strict conformance to this standard limits transmissions in one
86	   direction between any address pair to no more than 65536 packets per
87	   protocol (e.g., TCP, UDP, or ICMP) per maximum packet lifetime.

89	   Clearly not all hosts follow this standard, because it implies an
90	   unreasonably low maximum data rate.  For example, a host sending
91	   1500-byte packets with a 30 second maximum packet lifetime could send
92	   at only about 26 Mbit/s before exceeding 65536 packets per packet
93	   lifetime.  Conversely, filling a 1 Gbit/s interface with 1500-byte
94	   packets means sending 65536 packets in less than 1 second, which
95	   would require a maximum packet lifetime shorter than the round-trip
96	   time on some paths.  This requirement is widely ignored.
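
   The two figures above follow directly from the size of the ID space.
   The sketch below reproduces the arithmetic; the function names are
   illustrative only and are not taken from any implementation.

```python
ID_SPACE = 2 ** 16  # distinct values of the 16-bit IPv4 ID field


def max_rate_bps(packet_bytes, lifetime_s, id_space=ID_SPACE):
    """Highest rate (bits/s) that sends at most id_space packets per lifetime."""
    return id_space * packet_bytes * 8 / lifetime_s


def wrap_time_s(packet_bytes, link_bps, id_space=ID_SPACE):
    """Seconds needed to send id_space packets back to back at link_bps."""
    return id_space * packet_bytes * 8 / link_bps


rate = max_rate_bps(packet_bytes=1500, lifetime_s=30)
wrap = wrap_time_s(packet_bytes=1500, link_bps=1e9)
print(f"{rate / 1e6:.1f} Mbit/s")  # 26.2 Mbit/s
print(f"{wrap:.3f} s")             # 0.786 s
```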

98	   IP receivers store fragments in a reassembly buffer until all
99	   fragments in a datagram arrive, or until the reassembly timeout
100	   expires (15 seconds is suggested in [RFC0791]).  Fragments in a
101	   datagram are associated with each other by their protocol number, the
102	   value in their ID field, and by the source, destination address pair.
103	   If a sender wraps the ID field in less than the reassembly timeout,
104	   it becomes possible for fragments from different datagrams to be
105	   incorrectly spliced together ("mis-associated"), and delivered to the
106	   upper layer protocol.

108	   A case of particular concern is when mis-association is self-
109	   propagating.  This occurs, for example, when there is reliable
110	   ordering of packets and the first fragment of a datagram is lost in
111	   the network.  The rest of the fragments are stored in the fragment
112	   reassembly buffer, and when the sender wraps the ID field, the first
113	   fragment of the new datagram will be mis-associated with the rest of
114	   the old datagram.  The new datagram will now be incomplete (since
115	   it is missing its first fragment), so the rest of it will be saved in
116	   the fragment reassembly buffer, forming a cycle that repeats every
117	   65536 datagrams.  It is possible to have a number of simultaneous
118	   cycles, bounded by the size of the fragment reassembly buffer.
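
   The cycle can be illustrated with a toy model of a reassembly
   buffer.  This is a minimal sketch under several assumptions that are
   not part of this document: a 4-bit ID space so the wrap is easy to
   see, exactly two fragments per datagram, sequentially assigned IDs,
   in-order delivery, and a reassembly timeout long enough that it never
   fires.

```python
ID_BITS = 4                # toy value chosen for illustration; IPv4 uses 16
ID_SPACE = 1 << ID_BITS
FRAGS_PER_DATAGRAM = 2     # assumed: every datagram splits into two fragments


def simulate(num_datagrams, lost):
    """Return (origin_of_frag0, origin_of_frag1) for each mis-associated datagram.

    lost is a set of (seq, frag_index) pairs dropped by the network.
    """
    buffer = {}            # reassembly key -> {frag_index: originating seq}
    mis_associated = []
    for seq in range(num_datagrams):
        # Fragments are matched on source, destination, protocol and ID only.
        key = ("src", "dst", "udp", seq % ID_SPACE)
        for frag in range(FRAGS_PER_DATAGRAM):
            if (seq, frag) in lost:
                continue
            slots = buffer.setdefault(key, {})
            slots[frag] = seq
            if len(slots) == FRAGS_PER_DATAGRAM:
                # All offsets present: the receiver delivers the datagram.
                del buffer[key]
                origins = tuple(slots[i] for i in range(FRAGS_PER_DATAGRAM))
                if len(set(origins)) > 1:
                    mis_associated.append(origins)
    return mis_associated


# Lose only the first fragment of datagram 0, then send four ID wraps.
bad = simulate(num_datagrams=4 * ID_SPACE, lost={(0, 0)})
print(bad)  # [(16, 0), (32, 16), (48, 32)]
```

   A single lost fragment produces one mis-associated datagram per wrap
   of the ID field, and each one leaves behind the orphaned fragment
   that feeds the next.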

120	3.  Harmful Effects of Mis-Associated Fragments

122	   When the mis-associated fragments are delivered, transport-layer
123	   checksumming should detect these datagrams as incorrect and discard
124	   them.  Discarding these datagrams could pose a problem for
125	   loss-feedback congestion control algorithms, since there will be a
126	   high number of non-congestion-related losses.

128	   However, transport checksums may not be designed to handle such high
129	   error rates, either.  The TCP/UDP checksum is only 16 bits in length.
130	   If these checksums follow a uniform random distribution, we expect
131	   mis-associated datagrams to be accepted by the checksum at a rate of
132	   one per 65536.  With only one mis-association cycle, we expect
133	   corrupt data delivered to the application layer once per 2^32
134	   datagrams.  This rate can be significantly higher with multiple
135	   cycles.
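
   The 2^32 figure is the product of two independent 1-in-65536 events.
   The sketch below makes that explicit.  It assumes, beyond what is
   stated above, that each additional simultaneous cycle contributes one
   more mis-association per wrap of the ID field; the function name is
   illustrative.

```python
from fractions import Fraction

ID_SPACE = 2 ** 16        # datagrams per wrap of the IP ID field
CHECKSUM_SPACE = 2 ** 16  # possible values of the TCP/UDP checksum


def datagrams_per_corruption(cycles=1):
    """Expected datagrams sent per corrupt datagram reaching the application."""
    mis_association_rate = Fraction(cycles, ID_SPACE)  # assumed linear in cycles
    false_accept_rate = Fraction(1, CHECKSUM_SPACE)    # uniform random checksums
    return 1 / (mis_association_rate * false_accept_rate)


print(datagrams_per_corruption(1) == 2 ** 32)   # True
print(datagrams_per_corruption(16) == 2 ** 28)  # True
```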

137	   With non-random data, the TCP/UDP checksum may be weaker still.
138	   It is possible to construct datasets where mis-associated fragments
139	   will always have the same checksum.  Such a case may seem unlikely,
140	   but it is worth considering.  "Real" data may be more likely
141	   than random data to cause checksum hot spots and increase the
142	   probability of false checksum match [Stone98].  Also, some
143	   applications or higher-level protocols may turn off checksumming to
144	   increase speed, though this practice has been found to be dangerous
145	   for other reasons when data reliability is important [Stone00].
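
   One way such a dataset can arise is sketched below.  The Internet
   checksum is a one's-complement sum of 16-bit words, so it does not
   change when a fragment is replaced by another of the same length
   whose words add up to the same value.  The example is a
   simplification: it checksums the payload only, omitting the
   pseudo-header and UDP header (which would be identical in both
   cases), and the payload bytes are arbitrary illustrative values.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole 16-bit word
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Second fragment of an old datagram, still held in the reassembly buffer.
old_tail = bytes([0x00, 0x01, 0x00, 0x02])
# New datagram whose ID has wrapped to the same value.
new_head = bytes([0xAA, 0xBB, 0xCC, 0xDD])
new_tail = bytes([0x00, 0x02, 0x00, 0x01])    # same words as old_tail, reordered

intended = new_head + new_tail
spliced = new_head + old_tail                 # mis-associated reassembly

print(intended != spliced)                                         # True
print(internet_checksum(intended) == internet_checksum(spliced))  # True
```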

147	4.  Experimental Observations

149	   To test the practical impact of fragmentation on UDP, we ran a series
150	   of experiments using a UDP bulk data transport protocol that was
151	   designed to be used as an alternative to TCP for transporting large
152	   data sets over specialized networks.  The tool, Reliable Blast UDP
153	   (RBUDP), part of the QUANTA networking toolkit [QUANTA], was selected
154	   because it has a clean interface which facilitated automated
155	   experiments.  The decision to use RBUDP had little to do with the
156	   details of the transport protocol itself.  Any UDP transport protocol
157	   that does not have additional means to detect corruption, and that
158	   could be configured to use IP fragmentation, would have the same
159	   results.

161	   In order to diagnose corruption on files transferred with the UDP
162	   bulk transfer tool, we used a file format that included embedded
163	   sequence numbers and MD5 checksums in each fragment of each datagram.
164	   Thus it was possible to distinguish random corruption from that
165	   caused by mis-associated fragments.  We used two different types of
166	   files.  One was constructed so that all the UDP checksums were
167	   constant -- we will call this the "constant" dataset.  The other was
168	   constructed so that UDP checksums were uniformly random -- the
169	   "random" dataset.  All tests were done using 400 MB files.

171	   The UDP bulk file transport tool was used to send the datasets
172	   between a pair of hosts at slightly less than the available data rate
173	   (100 Mbps).  Near the beginning of each flow, a brief secondary flow
174	   was started to induce packet loss in the primary flow.  Throughout
175	   the life of the primary flow, we typically observed mis-association
176	   rates on the order of a few hundredths of a percent.

178	   Tests run with the "constant" dataset resulted in corruption on all
179	   mis-associated fragments, that is, corruption on the order of a few
180	   hundredths of a percent.  In sending approximately 10 TB of "random"
181	   datasets, we observed 8847668 UDP checksum errors and 121 corruptions
182	   of the data due to mis-associated fragments.
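
   As a rough consistency check, the observed counts can be compared
   with the 1-in-65536 false-accept rate expected for uniformly random
   checksums.  The sketch below assumes that every reported UDP checksum
   error came from a mis-associated datagram and that corruptions are
   approximately Poisson distributed; neither assumption is established
   by the measurements themselves.

```python
import math

CHECKSUM_SPACE = 2 ** 16
checksum_errors = 8_847_668   # mis-associated datagrams caught by the checksum
observed = 121                # mis-associated datagrams that passed it

p_accept = 1 / CHECKSUM_SPACE
# Caught datagrams are a (1 - p) share of all mis-associations, passed ones a p share.
expected = checksum_errors * p_accept / (1 - p_accept)
z = (observed - expected) / math.sqrt(expected)  # Poisson standard deviations

print(f"expected about {expected:.1f}, observed {observed}, z = {z:.2f}")
```

   Under these assumptions the expected count is about 135, so the
   observed 121 lies a little more than one standard deviation below it.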

184	5.  Implications

186	   Most TCP implementations today participate in MTU discovery
187	   [RFC1191], which will avoid the problems described in this note by
188	   avoiding IP fragmentation altogether.  However, as a work-around for
189	   MTU discovery problems [RFC2923], some TCP implementations and
190	   communications gear provide mechanisms to disable path MTU discovery
191	   by clearing or ignoring the DF bit.  Doing so will expose all
192	   protocols using IPv4, even those that participate in MTU discovery,
193	   to mis-association errors.

195	   A case particularly worth noting is that of tunnels encapsulating
196	   payload in IPv4.  To deal with difficulties in MTU Discovery
197	   [RFC4459], tunnels may rely on fragmentation between the two
198	   endpoints, even if the payload is marked with a DF bit [RFC4301].  In
199	   such a mode, the two tunnel endpoints behave as IP end hosts, with
200	   all tunneled traffic having the same protocol type.  Thus, the
201	   aggregate rate of tunneled packets may not exceed 65536 per maximum
202	   packet lifetime, or tunneled data becomes exposed to possible mis-
203	   association.  Even protocols doing MTU discovery such as TCP will be
204	   affected.

206	   IPv6 is less vulnerable to this type of problem, since its fragment
207	   header contains a 32-bit identification field [RFC2460].  Mis-
208	   association will only be a problem at packet rates 65536 times higher
209	   than for IPv4.

211	   Since mis-association of fragments will only occur when the IP ID
212	   field is wrapped within the fragment reassembly timeout, it may be
213	   possible to reduce the timeout sufficiently so that mis-association
214	   will not occur.  However, there are a number of difficulties with
215	   such an approach.  Since the sender controls the rate of packets sent
216	   and selection of IP ID, while the receiver controls the reassembly
217	   timeout, there would need to be some mutual assurance between each
218	   party as to participation in the scheme.  Further, it is not
219	   generally possible to set the timeout low enough so that a fast
220	   sender's fragments will not be mis-associated, yet high enough so
221	   that a slow sender's fragments will not be unconditionally discarded
222	   before it is possible to reassemble them.  So the timeout and IP ID
223	   selection would need to be done on a per-peer basis.  Also, NAT is
224	   likely to break any per-peer tables keyed by IP address.  It is
225	   not within the scope of this document to recommend solutions to these
226	   problems.

228	   Another means of solving the corruption issue is to add stronger
229	   integrity checking, which can be done at any layer above IP.  This is
230	   a natural side effect of using cryptographic authentication.  If
231	   IPsec AH [RFC2402] is in use, the mis-associated fragments will be
232	   discarded at the network layer with extremely high probability.  Some
233	   higher layers may use longer checksums (for example, SCTP's is 32
234	   bits in length [RFC2960]) or cryptographic authentication (SSH
235	   message authentication codes [RFC4251]).  While stronger integrity
236	   checking may prevent data corruption, it will not solve the problem
237	   of a high effective loss rate.  In the case of SSH, any stream
238	   corruption results in immediate termination of the connection.

240	6.  Security Considerations

242	   If a malicious entity knows that a pair of hosts is communicating
243	   using a fragmented stream, this may present an opportunity for that
244	   entity to corrupt the flow.  By sending "high" fragments (those with
245	   offset greater than zero) with a forged source address, the attacker
246	   can deliberately cause corruption as described above.  Exploiting
247	   this vulnerability requires only knowledge of the source and
248	   destination addresses of the flow, its protocol number, and fragment
249	   boundaries.  It does not require knowledge of port or sequence
250	   numbers.

252	   If the attacker has visibility of packets on the path, the attack
253	   profile is similar to injecting full segments.  Using this attack
254	   makes blind disruptions easier, and could likely be used to cause
255	   denial of service.  However, only streams using IPv4 fragmentation
256	   are vulnerable.  Because of the nature of the problems outlined in
257	   this draft, the use of IPv4 fragmentation for critical applications
258	   may not be advisable regardless of security concerns.

260	7.  IANA Considerations

262	   None.

264	8.  Informative References

266	   [Kent87]   Kent, C. and J. Mogul, "Fragmentation considered harmful",
267	              Proc. SIGCOMM '87 vol. 17, No. 5, October 1987.

269	   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
270	              RFC 2923, September 2000.

272	   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
273	              September 1981.

275	   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
276	              November 1990.

278	   [Stone98]  Stone, J., Greenwald, M., Partridge, C., and J. Hughes,
279	              "Performance of Checksums and CRC's over Real Data", IEEE/
280	              ACM Transactions on Networking vol. 6, No. 5,
281	              October 1998.

283	   [Stone00]  Stone, J. and C. Partridge, "When The CRC and TCP Checksum
284	              Disagree", Proc. SIGCOMM 2000 vol. 30, No. 4,
285	              October 2000.

287	   [QUANTA]   He, E., Alimohideen, J., Eliason, J., Krishnaprasad, N.,
288	              Leigh, J., Yu, O., and T. DeFanti, "Quanta: a toolkit for
289	              high performance data delivery over photonic networks",
290	              Future Generation Computer Systems Vol. 19, No. 6,
291	              August 2003.

293	   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
294	              (IPv6) Specification", RFC 2460, December 1998.

296	   [RFC2960]  Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
297	              Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M.,
298	              Zhang, L., and V. Paxson, "Stream Control Transmission
299	              Protocol", RFC 2960, October 2000.

301	   [RFC2402]  Kent, S. and R. Atkinson, "IP Authentication Header",
302	              RFC 2402, November 1998.

304	   [RFC4251]  Ylonen, T. and C. Lonvick, "The Secure Shell (SSH)
305	              Protocol Architecture", RFC 4251, January 2006.

307	   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
308	              Internet Protocol", RFC 4301, December 2005.

310	   [RFC4459]  Savola, P., "MTU and Fragmentation Issues with In-the-
311	              Network Tunneling", RFC 4459, April 2006.

313	Appendix A.  Acknowledgements

315	   This work was supported by the National Science Foundation under
316	   Grant No. 0083285.

318	Authors' Addresses

320	   John W. Heffner
321	   Pittsburgh Supercomputing Center
322	   4400 Fifth Avenue
323	   Pittsburgh, PA  15213
324	   US

326	   Phone: 412-268-2329
327	   Email: jheffner@psc.edu

329	   Matt Mathis
330	   Pittsburgh Supercomputing Center
331	   4400 Fifth Avenue
332	   Pittsburgh, PA  15213
333	   US

335	   Phone: 412-268-3319
336	   Email: mathis@psc.edu

338	   Ben Chandler
339	   Pittsburgh Supercomputing Center
340	   4400 Fifth Avenue
341	   Pittsburgh, PA  15213
342	   US

344	   Phone: 412-268-9783
345	   Email: bchandle@psc.edu

347	Intellectual Property Statement

349	   The IETF takes no position regarding the validity or scope of any
350	   Intellectual Property Rights or other rights that might be claimed to
351	   pertain to the implementation or use of the technology described in
352	   this document or the extent to which any license under such rights
353	   might or might not be available; nor does it represent that it has
354	   made any independent effort to identify any such rights.  Information
355	   on the procedures with respect to rights in RFC documents can be
356	   found in BCP 78 and BCP 79.

358	   Copies of IPR disclosures made to the IETF Secretariat and any
359	   assurances of licenses to be made available, or the result of an
360	   attempt made to obtain a general license or permission for the use of
361	   such proprietary rights by implementers or users of this
362	   specification can be obtained from the IETF on-line IPR repository at
363	   http://www.ietf.org/ipr.

365	   The IETF invites any interested party to bring to its attention any
366	   copyrights, patents or patent applications, or other proprietary
367	   rights that may cover technology that may be required to implement
368	   this standard.  Please address the information to the IETF at
369	   ietf-ipr@ietf.org.

371	Disclaimer of Validity

373	   This document and the information contained herein are provided on an
374	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
375	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
376	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
377	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
378	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
379	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

381	Copyright Statement

383	   Copyright (C) The Internet Society (2006).  This document is subject
384	   to the rights, licenses and restrictions contained in BCP 78, and
385	   except as set forth therein, the authors retain all their rights.

387	Acknowledgment

389	   Funding for the RFC Editor function is currently provided by the
390	   Internet Society.