idnits 2.17.1 

draft-heffner-frag-harmful-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 356.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 333.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 340.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 346.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (April 21, 2006) is 6579 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Kent87'

  ** Downref: Normative reference to an Informational RFC: RFC 2923

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Stone98'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Stone00'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'QUANTA'

  ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200)

  ** Obsolete normative reference: RFC 2960 (Obsoleted by RFC 4960)

  ** Obsolete normative reference: RFC 2402 (Obsoleted by RFC 4302, RFC 4305)


     Summary: 9 errors (**), 0 flaws (~~), 2 warnings (==), 11 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         J. Heffner
3	Internet-Draft                                                 M. Mathis
4	Expires: October 23, 2006                                    B. Chandler
5	                                                                     PSC
6	                                                          April 21, 2006

8	                 Fragmentation Considered Very Harmful
9	                     draft-heffner-frag-harmful-01

11	Status of this Memo

13	   By submitting this Internet-Draft, each author represents that any
14	   applicable patent or other IPR claims of which he or she is aware
15	   have been or will be disclosed, and any of which he or she becomes
16	   aware will be disclosed, in accordance with Section 6 of BCP 79.

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups.  Note that
20	   other groups may also distribute working documents as Internet-
21	   Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time.  It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than as "work in progress."

28	   The list of current Internet-Drafts can be accessed at
29	   http://www.ietf.org/ietf/1id-abstracts.txt.

31	   The list of Internet-Draft Shadow Directories can be accessed at
32	   http://www.ietf.org/shadow.html.

34	   This Internet-Draft will expire on October 23, 2006.

36	Copyright Notice

38	   Copyright (C) The Internet Society (2006).

40	Abstract

42	   IPv4 fragmentation is not sufficiently robust for general use in
43	   today's Internet.  The 16-bit IP identification field is not large
44	   enough to prevent frequent incorrect assembly of IP fragments, and
45	   the TCP and UDP checksums are insufficient to prevent the resulting
46	   corrupted datagrams from being delivered to higher protocol layers.
47	   This note describes some easily reproduced experiments demonstrating
48	   the problem, and discusses some of the operational implications of
49	   these observations.

51	1.  Introduction

53	   The IPv4 header was designed at a time when data rates were several
54	   orders of magnitude lower than those achievable today.  This document
55	   describes a consequent scale-related failure in the IP identification
56	   (ID) field, where fragments may be incorrectly assembled at a rate
57	   high enough that it is likely to invalidate assumptions about data
58	   integrity failure rates.

60	   That IP fragmentation results in inefficient use of the network has
61	   been well documented [Kent87].  This note presents a different kind
62	   of problem, which can result not only in significant performance
63	   degradation, but also in frequent data corruption.  This is especially
64	   pertinent due to the recent proliferation of UDP bulk transport tools
65	   that sometimes fragment every datagram.  Additionally, some network
66	   equipment ignores the Don't Fragment (DF) bit in the IP
67	   header to work around MTU discovery problems [RFC2923].  This
68	   equipment indirectly exposes properly implemented protocols and
69	   applications to corrupt data.

71	2.  Wrapping the IP ID Field

73	   The Internet Protocol standard specifies:

75	      "The choice of the Identifier for a datagram is based on the need
76	      to provide a way to uniquely identify the fragments of a
77	      particular datagram.  The protocol module assembling fragments
78	      judges fragments to belong to the same datagram if they have the
79	      same source, destination, protocol, and Identifier.  Thus, the
80	      sender must choose the Identifier to be unique for this source,
81	      destination pair and protocol for the time the datagram (or any
82	      fragment of it) could be alive in the Internet."  [RFC0791]

84	   Strict conformance to this standard limits transmissions in one
85	   direction between any address pair to no more than 65536 packets per
86	   protocol (e.g., TCP, UDP, or ICMP) per maximum packet lifetime.

88	   Clearly not all hosts will follow this standard, because it implies
89	   an unreasonably low maximum data rate.  For example, a host sending
90	   1500-byte packets with a 30-second maximum packet lifetime could send
91	   at only about 26 Mbit/s before exceeding 65536 packets per packet
92	   lifetime.  Similarly, filling a 1 Gbit/s interface with 1500-byte
93	   packets requires sending 65536 packets in under 1 second, implying
94	   a maximum packet lifetime shorter than the round-trip time on some
95	   paths, which is unreasonable.  This requirement is widely ignored.
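   The arithmetic in the preceding paragraph can be checked with a
   short Python sketch.  It assumes, as the text does, 1500-byte
   packets that each consume one IP ID.  The function names are
   illustrative only.

```python
# Arithmetic behind the RFC 791 uniqueness limit on the 16-bit IPv4 ID.
ID_SPACE = 2 ** 16  # distinct Identification values per (src, dst, protocol)

def max_conforming_rate_bps(packet_bytes: int, lifetime_s: float) -> float:
    """Highest bit rate that never reuses an ID within one packet lifetime."""
    return ID_SPACE * packet_bytes * 8 / lifetime_s

def time_to_use_all_ids_s(link_bps: float, packet_bytes: int) -> float:
    """Seconds needed to send ID_SPACE packets back to back at link_bps."""
    return ID_SPACE * packet_bytes * 8 / link_bps

# 1500-byte packets, 30-second maximum packet lifetime.
print(f"{max_conforming_rate_bps(1500, 30) / 1e6:.1f} Mbit/s")  # 26.2 Mbit/s
# 1500-byte packets at 1 Gbit/s: time to consume all 65536 IDs.
print(f"{time_to_use_all_ids_s(1e9, 1500):.2f} s")  # 0.79 s
```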

97	   IP receivers store fragments in a reassembly buffer until all
98	   fragments in a datagram arrive, or until the reassembly timeout
99	   expires (15 seconds is suggested in [RFC0791]).  Fragments in a
100	   datagram are associated with each other by the value in their ID
101	   field, protocol, and source and destination addresses.  If a sender
102	   wraps the ID field in less than the reassembly timeout, it becomes
103	   possible for fragments from different datagrams to be incorrectly
104	   spliced together ("mis-associated"), and delivered to the upper layer
105	   protocol.
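   As a rough illustration, the sketch below estimates the bit rate
   above which a sender's 16-bit ID space wraps within the 15-second
   reassembly timeout suggested in [RFC0791].  This threshold is a
   derived figure, not one reported in this note.  The sketch assumes
   a fixed datagram size and that every datagram consumes exactly one
   ID regardless of how many fragments it is split into.  The names
   are hypothetical.  Wrapping within the timeout is a precondition
   for mis-association, not a guarantee of it.

```python
ID_SPACE = 2 ** 16           # IPv4 Identification values
REASSEMBLY_TIMEOUT_S = 15.0  # initial timer value suggested in RFC 791

def id_wrap_time_s(datagrams_per_s: float) -> float:
    """Seconds before a sender consuming one ID per datagram reuses an ID."""
    return ID_SPACE / datagrams_per_s

def wrap_threshold_bps(datagram_bytes: int,
                       timeout_s: float = REASSEMBLY_TIMEOUT_S) -> float:
    """Bit rate above which the ID space wraps inside the reassembly timeout."""
    return ID_SPACE * datagram_bytes * 8 / timeout_s

# 1500-byte datagrams: threshold rate for wrapping within 15 s.
print(f"{wrap_threshold_bps(1500) / 1e6:.1f} Mbit/s")  # 52.4 Mbit/s
# 1500-byte datagrams at 100 Mbit/s: time until the ID wraps.
print(f"{id_wrap_time_s(100e6 / (1500 * 8)):.1f} s")  # 7.9 s
```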

107	   A case of particular concern is when mis-association is
108	   self-propagating.  This occurs, for example, when packets are
109	   delivered in order and the first fragment of a datagram is lost in
110	   the network.  The remaining fragments are stored in the reassembly
111	   buffer, and when the sender wraps the ID field, the first fragment
112	   of the new datagram with that ID will be mis-associated with the rest
113	   of the old datagram.  The new datagram will now be incomplete (since
114	   it is missing its first fragment), so the rest of it will be saved in
115	   the fragment reassembly buffer, forming a cycle that repeats every
116	   65536 datagrams.  It is possible to have a number of simultaneous
117	   cycles, bounded by the size of the fragment reassembly buffer.
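   The self-propagating cycle can be reproduced with a toy reassembly
   model.  This is a minimal sketch, not any real IP stack.  It assumes:

   o  a single source, destination, and protocol;
   o  two fragments per datagram;
   o  in-order delivery and sequential IDs;
   o  a reassembly timeout that never expires.

   The ID space is shrunk to 4 so the cycle is easy to see.  With IPv4
   it would repeat every 65536 datagrams.  The function name simulate
   is hypothetical.

```python
from collections import defaultdict

def simulate(num_datagrams: int, id_space: int, lost_first: set[int],
             frags_per_datagram: int = 2) -> list[tuple[int, ...]]:
    """Toy reassembly model; returns one tuple per delivered datagram.

    Each tuple holds, for fragment 0, 1, ..., the sender's datagram number
    that the fragment actually came from.
    """
    # Buffer keyed only by ID: source, destination, and protocol are assumed
    # fixed, so the ID alone stands in for the full RFC 791 reassembly key.
    buffer: dict[int, dict[int, int]] = defaultdict(dict)  # ID -> {frag: datagram}
    delivered: list[tuple[int, ...]] = []
    for n in range(num_datagrams):       # datagrams sent and received in order
        ident = n % id_space             # sequential IDs, wrapping at id_space
        for frag in range(frags_per_datagram):
            if frag == 0 and n in lost_first:
                continue                 # first fragment lost in the network
            pieces = buffer[ident]
            pieces[frag] = n             # a duplicate offset would be overwritten
            if len(pieces) == frags_per_datagram:
                delivered.append(tuple(pieces[i] for i in range(frags_per_datagram)))
                del buffer[ident]        # complete: hand up and free the slot
    return delivered

# IDs wrap every 4 datagrams; only datagram 1 loses its first fragment.
result = simulate(num_datagrams=12, id_space=4, lost_first={1})
bad = [d for d in result if len(set(d)) > 1]
print(bad)  # [(5, 1), (9, 5)]
```

   Losing a single first fragment causes the first fragment of
   datagram 5 to be joined with the remainder of datagram 1.  The
   same then happens to datagram 9 with the remainder of datagram 5,
   and so on for as long as the flow continues.  How a real stack
   treats a duplicate fragment offset varies; the model simply
   overwrites it.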

119	3.  Harmful Effects of Mis-Associated Fragments

121	   When mis-associated fragments are reassembled and delivered, the
122	   transport-layer checksum should detect the resulting datagrams as
123	   incorrect and discard them.  These discards could pose a problem for
124	   loss-feedback congestion control algorithms, since they appear as a
125	   high number of losses unrelated to congestion.

127	   However, transport checksums may not be designed to handle such high
128	   error rates.  The TCP/UDP checksum is only 16 bits in length.  If
129	   these checksums follow a uniform random distribution, we expect one
130	   mis-associated datagram in 65536 to be accepted by the checksum.
131	   With only one mis-association cycle (one mis-associated datagram per
132	   65536 datagrams sent), we expect corrupt data to be delivered to the
133	   application layer once per 2^32 datagrams.  The corruption rate can
134	   be significantly higher with multiple cycles.
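   The expected rates above come from multiplying the mis-association
   rate by the chance that a 16-bit checksum matches.  The sketch
   below uses exact fractions and assumes, as the text does, uniformly
   distributed checksums.  It also assumes that k concurrent cycles
   multiply the mis-association rate by k.

```python
from fractions import Fraction

# Chance that a corrupted datagram still matches a 16-bit checksum,
# assuming checksum values are uniformly distributed.
CHECKSUM_PASS = Fraction(1, 2 ** 16)

def undetected_rate(misassociated_fraction: Fraction) -> Fraction:
    """Fraction of all datagrams delivered to the application corrupted."""
    return misassociated_fraction * CHECKSUM_PASS

one_cycle = Fraction(1, 2 ** 16)  # one cycle: one mis-associated datagram per 65536
print(undetected_rate(one_cycle) == Fraction(1, 2 ** 32))  # True
print(undetected_rate(5 * one_cycle))  # 5/4294967296, for 5 concurrent cycles
```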

136	   With non-random data, the TCP/UDP checksum may be weaker still.  It
137	   is possible to construct datasets in which mis-associated fragments
138	   always produce a correct checksum.  Such a case may seem unlikely,
139	   but it is worth considering.  "Real" data may be more likely than
140	   random data to cause checksum hot spots and increase the probability
141	   of a false checksum match [Stone98].  Also, some
142	   applications may turn off checksumming to increase speed, though this
143	   practice has been found to be dangerous for other reasons [Stone00].
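   One way to construct such a dataset is sketched below.  Each
   fragment-sized piece of payload is padded with a 16-bit word chosen
   so that its one's-complement sum equals a fixed constant.  Any mix
   of such fragments then produces the same checksum.  This is an
   illustration, not necessarily how the "constant" dataset in this
   note was built.  The sketch makes these assumptions:

   o  the helper names are hypothetical;
   o  fragment boundaries fall on even byte offsets (IPv4 fragment
      offsets are multiples of 8 bytes);
   o  only the payload is summed, because the UDP header and
      pseudo-header are taken to be identical for every datagram in
      the flow.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of big-endian words (odd length zero-padded)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total

def internet_checksum(data: bytes) -> int:
    """TCP/UDP-style checksum: complement of the one's-complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF

def fix_fragment_sum(prefix: bytes, target: int) -> bytes:
    """Append one 16-bit word so the piece's one's-complement sum equals target.

    prefix must have even length; target must be in 1..0xFFFE.
    """
    if len(prefix) % 2:
        raise ValueError("prefix must end on a 16-bit boundary")
    s = ones_complement_sum(prefix)
    w = (target - s) % 0xFFFF  # one's-complement addition is addition mod 65535
    return prefix + w.to_bytes(2, "big")

T = 0x1234  # arbitrary constant per-fragment sum
a1 = fix_fragment_sum(bytes(range(0, 14)), T)    # datagram A, fragment 1
a2 = fix_fragment_sum(bytes(range(100, 114)), T)  # datagram A, fragment 2
b2 = fix_fragment_sum(bytes(range(200, 214)), T)  # datagram B, fragment 2

print(internet_checksum(a1 + b2) == internet_checksum(a1 + a2))  # True
```

   The comparison is the mis-association case.  Fragment a1
   reassembled with another datagram's fragment b2 yields the same
   checksum as the genuine datagram, so the corruption is not
   detected.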

145	4.  Experimental Observations

147	   To test the practical impact of fragmentation on UDP, we ran a series
148	   of experiments using a UDP bulk data transport protocol that was
149	   designed to be used as an alternative to TCP for transporting large
150	   data sets over specialized networks.  The tool, Reliable Blast UDP
151	   (RBUDP), part of the QUANTA networking toolkit [QUANTA], was selected
152	   because it has a clean interface which facilitated automated
153	   experiments.  The decision to use RBUDP had little to do with the
154	   details of the transport protocol itself.  Any UDP transport protocol
155	   that does not have additional means to detect corruption, and that
156	   could be configured to use IP fragmentation, would have the same
157	   results.

159	   To diagnose corruption in files transferred with the UDP
160	   bulk transfer tool, we used a file format that included embedded
161	   sequence numbers and MD5 checksums in each fragment of each datagram.
162	   Thus it was possible to distinguish random corruption from that
163	   caused by mis-associated fragments.  We used two different types of
164	   files.  One was constructed so that all the UDP checksums were
165	   constant -- we will call this the "constant" dataset.  The other was
166	   constructed so that UDP checksums were uniformly random -- the
167	   "random" dataset.  All tests were done using 400 MB files.
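   The diagnostic approach described above can be sketched as
   follows.  The record layout is a hypothetical example, not the
   actual file format used in these experiments.  Each record holds
   an 8-byte datagram sequence number, a 2-byte fragment index, the
   payload, and an MD5 digest of everything before it.  The checker
   assumes each record exactly fills one fragment, so records can be
   split apart at known boundaries.

```python
import hashlib
import struct

HEADER = struct.Struct("!QH")  # hypothetical: datagram sequence number, fragment index
DIGEST_LEN = 16                # MD5 digest size in bytes

def make_chunk(seq: int, frag_index: int, payload: bytes) -> bytes:
    """Build one fragment-sized record: header + payload + MD5(header + payload)."""
    body = HEADER.pack(seq, frag_index) + payload
    return body + hashlib.md5(body).digest()

def classify(chunks: list[bytes]) -> str:
    """Classify a reassembled datagram split back into its fragment-sized records."""
    ids = []
    for chunk in chunks:
        body, digest = chunk[:-DIGEST_LEN], chunk[-DIGEST_LEN:]
        if hashlib.md5(body).digest() != digest:
            return "random corruption"  # bytes changed inside a record
        ids.append(HEADER.unpack_from(body))
    seqs = {seq for seq, _ in ids}
    if len(seqs) > 1 or [i for _, i in ids] != list(range(len(ids))):
        return "mis-associated"  # intact records from different datagrams
    return "ok"

good = [make_chunk(7, 0, b"x" * 32), make_chunk(7, 1, b"y" * 32)]
spliced = [make_chunk(7, 0, b"x" * 32), make_chunk(65543, 1, b"y" * 32)]
damaged = [good[0], good[1][:5] + b"\xff" + good[1][6:]]
print(classify(good), classify(spliced), classify(damaged))
# ok mis-associated random corruption
```

   A record whose digest verifies but whose sequence number differs
   from its neighbours points to mis-association.  A record whose
   digest fails points to some other corruption.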

169	   The UDP bulk file transport tool was used to send the datasets
170	   between a pair of hosts at slightly less than the available data rate
171	   (100 Mbps).  Near the beginning of each flow, a brief secondary flow
172	   was started to induce packet loss in the primary flow.  Throughout
173	   the life of the primary flow, we typically observed mis-association
174	   rates on the order of a few hundredths of a percent.

176	   Tests run with the "constant" dataset resulted in corruption on all
177	   mis-associated fragments, that is, corruption on the order of a few
178	   hundredths of a percent.  In sending approximately 10 TB of "random"
179	   datasets, we observed 8847668 UDP checksum errors and 121 corruptions
180	   of the data due to mis-associated fragments.
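   The "random" dataset results can be compared with the
   one-in-65536 estimate from Section 3.  The sketch assumes that
   every UDP checksum error came from a mis-associated datagram.  The
   total number of mis-associations is then the checksum errors plus
   the undetected corruptions.  The Poisson comparison at the end is
   only a rough check.

```python
import math

CHECKSUM_SPACE = 2 ** 16  # possible 16-bit UDP checksum values

checksum_errors = 8_847_668  # reported UDP checksum errors
undetected = 121             # reported corruptions that passed the checksum

# Assumption: every checksum error was a mis-associated datagram.
total_misassociated = checksum_errors + undetected
expected_undetected = total_misassociated / CHECKSUM_SPACE

print(f"expected ~{expected_undetected:.0f}, observed {undetected}")
# expected ~135, observed 121
print(f"observed 1 in {total_misassociated / undetected:,.0f}")
# observed 1 in 73,122

# Rough Poisson comparison: distance from the expectation in standard deviations.
z = (expected_undetected - undetected) / math.sqrt(expected_undetected)
print(f"{z:.1f} standard deviations")  # 1.2 standard deviations
```

   Under these assumptions the observed count is somewhat below the
   uniform-checksum estimate, but of the same order.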

182	5.  Implications

184	   Most TCP implementations today participate in path MTU discovery
185	   [RFC1191], which prevents the problems described in this note by
186	   avoiding IP fragmentation altogether.  However, as a work-around for
187	   MTU discovery problems [RFC2923], some TCP implementations and
188	   communications gear provide mechanisms to disable path MTU discovery
189	   by clearing or ignoring the DF bit.  Doing so will expose all
190	   protocols using IPv4, even those which participate in MTU discovery,
191	   to mis-association errors.
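   An application that sends UDP directly can avoid relying on
   fragmentation by asking the operating system to set DF on every
   datagram.  The sketch below uses the Linux IP_MTU_DISCOVER socket
   option with the value IP_PMTUDISC_DO.  The numeric fallbacks 10
   and 2 are the Linux header values.  The sketch assumes Linux and
   does nothing on other systems.  It cannot help when a middlebox
   clears or ignores DF, which is the situation this section warns
   about.

```python
import socket
import sys

# Linux values from <linux/in.h>; used only if the socket module lacks the names.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)  # always set DF

def udp_socket_with_df() -> tuple[socket.socket, bool]:
    """Open an IPv4 UDP socket and, on Linux, ask the kernel to set DF.

    Returns the socket and whether the option was applied.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    if not sys.platform.startswith("linux"):
        return sock, False  # option name and value differ on other systems
    try:
        sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    except OSError:
        return sock, False
    return sock, True
```

   With the option applied, Linux should reject a send larger than
   the known path MTU (with EMSGSIZE) rather than fragment it locally.
   The application must therefore size its datagrams itself.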

193	   IPv6 is less vulnerable to this type of problem, since its fragment
194	   header contains a 32-bit identification field [RFC2460].  Mis-
195	   association will only be a problem at packet rates 65536 times higher
196	   than for IPv4.
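   The effect of the larger field can be illustrated by comparing wrap
   times at one link rate.  The sketch assumes 1500-byte datagrams at
   100 Gbit/s, each consuming one identification value.  It compares
   the results with the 15-second IPv4 timeout suggested in [RFC0791]
   and the 60-second reassembly limit in [RFC2460].  The function name
   is hypothetical.

```python
def wrap_time_s(id_bits: int, link_bps: float, datagram_bytes: int) -> float:
    """Seconds for a sender to cycle through every fragment ID at full link rate."""
    return 2 ** id_bits * datagram_bytes * 8 / link_bps

for label, bits, timeout in (("IPv4", 16, 15.0), ("IPv6", 32, 60.0)):
    t = wrap_time_s(bits, 100e9, 1500)
    print(f"{label}: wraps in {t:.4g} s (reassembly timeout {timeout:g} s)")
# IPv4: wraps in 0.007864 s (reassembly timeout 15 s)
# IPv6: wraps in 515.4 s (reassembly timeout 60 s)
```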

198	   Since mis-association of fragments will only occur when the IP ID
199	   field is wrapped within the fragment reassembly timeout, it may be
200	   possible to reduce the timeout sufficiently so that mis-association
201	   will not occur.  However, there are a number of difficulties with
202	   such an approach.  Since the sender controls the rate of packets sent
203	   and selection of IP ID, while the receiver controls the reassembly
204	   timeout, each party would need some assurance that the other is
205	   participating in the scheme.  Further, it is not generally possible
206	   to set the timeout low enough that a fast sender's fragments will
207	   not be mis-associated, yet high enough that a slow sender's
208	   fragments will not be unconditionally discarded before it is
209	   possible to reassemble them.  The timeout and IP ID selection would
210	   therefore need to be managed on a per-peer basis.  Also, NAT is
211	   likely to break any per-peer tables keyed by IP address.  It is not
212	   within the scope of this document to recommend solutions to these
213	   problems.

215	   Another means of solving the corruption issue is to add stronger
216	   integrity checking, which can be done at any layer above IP.  This is
217	   a natural side effect of using cryptographic authentication.  If
218	   IPsec AH [RFC2402] is in use, the mis-associated fragments will be
219	   discarded at the network layer with extremely high probability.  Some
220	   higher layers may use longer checksums (for example, SCTP's is 32
221	   bits in length [RFC2960]) or cryptographic authentication (SSH
222	   message authentication codes [RFC4251]).  While stronger integrity
223	   checking may prevent data corruption, it will not solve the problem
224	   of a high effective loss rate.  In the case of SSH, any stream
225	   corruption results in immediate termination of the connection.
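   The difference between a 16-bit sum and stronger checks can be
   shown with a datagram whose second fragment is replaced by another
   datagram's fragment.  The substitute contains the same 16-bit words
   in a different order.  In this Python sketch, zlib.crc32 stands in
   for a generic 32-bit check; it is not the checksum specified for
   SCTP.  HMAC-SHA-256 with a placeholder key stands in for
   cryptographic authentication; it is not the specific algorithm
   used by AH or SSH.

```python
import hashlib
import hmac
import zlib

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum (TCP/UDP style) over big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF

KEY = b"hypothetical shared key"  # placeholder, not a real key-management scheme

def mac(data: bytes) -> bytes:
    return hmac.new(KEY, data, hashlib.sha256).digest()

a1 = b"fragment-one...."  # first fragment of the genuine datagram
a2 = b"fragment-two...."  # its genuine second fragment
b2 = b"agfrment-two...."  # another datagram's fragment: same words, reordered

genuine, spliced = a1 + a2, a1 + b2
print(internet_checksum(genuine) == internet_checksum(spliced))  # True
print(zlib.crc32(genuine) == zlib.crc32(spliced))                # False
print(mac(genuine) == mac(spliced))                              # False
```

   The one's-complement sum ignores the order of aligned 16-bit words,
   so the spliced datagram passes it.  The CRC detects the change
   because the two inputs differ only within a 32-bit burst.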

227	6.  Security Considerations

229	   If a malicious entity knows that a pair of hosts is communicating
230	   using a fragmented stream, that knowledge may give it an opportunity
231	   to corrupt the flow.  By sending "high" fragments (those with offset
232	   greater than zero) with a forged source address, the attacker can
233	   deliberately cause corruption as described above.  Exploiting
234	   this vulnerability requires only knowledge of the source and
235	   destination addresses of the flow, and fragment boundaries.  It does
236	   not require knowledge of port or sequence numbers.

238	   If the attacker can observe packets on the path, this attack is
239	   similar to injecting full segments.  For a blind attacker, though, it
240	   makes disruption easier, and it could certainly be used effectively
241	   to cause denial of service.  However, only streams using
242	   IPv4 fragmentation are vulnerable.  Because of the nature of the
243	   problems outlined in this draft, the use of IPv4 fragmentation for
244	   critical applications may not be advisable regardless of security
245	   concerns.

247	7.  References

249	   [Kent87]   Kent, C. and J. Mogul, "Fragmentation considered harmful",
250	              Proc. SIGCOMM '87 vol. 17, No. 5, October 1987.

252	   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
253	              RFC 2923, September 2000.

255	   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
256	              September 1981.

258	   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
259	              November 1990.

261	   [Stone98]  Stone, J., Greenwald, M., Partridge, C., and J. Hughes,
262	              "Performance of Checksums and CRC's over Real Data", IEEE/
263	              ACM Transactions on Networking vol. 6, No. 5,
264	              October 1998.

266	   [Stone00]  Stone, J. and C. Partridge, "When The CRC and TCP Checksum
267	              Disagree", Proc. SIGCOMM 2000 vol. 30, No. 4,
268	              October 2000.

270	   [QUANTA]   He, E., Alimohideen, J., Eliason, J., Krishnaprasad, N.,
271	              Leigh, J., Yu, O., and T. DeFanti, "Quanta: a toolkit for
272	              high performance data delivery over photonic networks",
273	              Future Generation Computer Systems Vol. 19, No. 6,
274	              August 2003.

276	   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
277	              (IPv6) Specification", RFC 2460, December 1998.

279	   [RFC2960]  Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
280	              Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M.,
281	              Zhang, L., and V. Paxson, "Stream Control Transmission
282	              Protocol", RFC 2960, October 2000.

284	   [RFC2402]  Kent, S. and R. Atkinson, "IP Authentication Header",
285	              RFC 2402, November 1998.

287	   [RFC4251]  Ylonen, T. and C. Lonvick, "The Secure Shell (SSH)
288	              Protocol Architecture", RFC 4251, January 2006.

290	Appendix A.  Acknowledgements

292	   This work was supported by the National Science Foundation under
293	   Grant No. 0083285.

295	Authors' Addresses

297	   John W. Heffner
298	   Pittsburgh Supercomputing Center
299	   4400 Fifth Avenue
300	   Pittsburgh, PA  15213
301	   US

303	   Phone: 412-268-2329
304	   Email: jheffner@psc.edu

306	   Matt Mathis
307	   Pittsburgh Supercomputing Center
308	   4400 Fifth Avenue
309	   Pittsburgh, PA  15213
310	   US

312	   Phone: 412-268-3319
313	   Email: mathis@psc.edu

315	   Ben Chandler
316	   Pittsburgh Supercomputing Center
317	   4400 Fifth Avenue
318	   Pittsburgh, PA  15213
319	   US

321	   Phone: 412-268-9783
322	   Email: bchandle@psc.edu

324	Intellectual Property Statement

326	   The IETF takes no position regarding the validity or scope of any
327	   Intellectual Property Rights or other rights that might be claimed to
328	   pertain to the implementation or use of the technology described in
329	   this document or the extent to which any license under such rights
330	   might or might not be available; nor does it represent that it has
331	   made any independent effort to identify any such rights.  Information
332	   on the procedures with respect to rights in RFC documents can be
333	   found in BCP 78 and BCP 79.

335	   Copies of IPR disclosures made to the IETF Secretariat and any
336	   assurances of licenses to be made available, or the result of an
337	   attempt made to obtain a general license or permission for the use of
338	   such proprietary rights by implementers or users of this
339	   specification can be obtained from the IETF on-line IPR repository at
340	   http://www.ietf.org/ipr.

342	   The IETF invites any interested party to bring to its attention any
343	   copyrights, patents or patent applications, or other proprietary
344	   rights that may cover technology that may be required to implement
345	   this standard.  Please address the information to the IETF at
346	   ietf-ipr@ietf.org.

348	Disclaimer of Validity

350	   This document and the information contained herein are provided on an
351	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
352	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
353	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
354	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
355	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
356	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

358	Copyright Statement

360	   Copyright (C) The Internet Society (2006).  This document is subject
361	   to the rights, licenses and restrictions contained in BCP 78, and
362	   except as set forth therein, the authors retain all their rights.

364	Acknowledgment

366	   Funding for the RFC Editor function is currently provided by the
367	   Internet Society.