idnits 2.17.1 

draft-heffner-frag-harmful-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 363.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 340.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 347.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 353.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 22, 2006) is 6518 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Obsolete informational reference (is this intentional?): RFC 2460
     (Obsoleted by RFC 8200)

  -- Obsolete informational reference (is this intentional?): RFC 2960
     (Obsoleted by RFC 4960)

  -- Obsolete informational reference (is this intentional?): RFC 2402
     (Obsoleted by RFC 4302, RFC 4305)


     Summary: 3 errors (**), 0 flaws (~~), 2 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         J. Heffner
3	Internet-Draft                                                 M. Mathis
4	Expires: December 24, 2006                                   B. Chandler
5	                                                                     PSC
6	                                                           June 22, 2006

8	                 Fragmentation Considered Very Harmful
9	                     draft-heffner-frag-harmful-02

11	Status of this Memo

13	   By submitting this Internet-Draft, each author represents that any
14	   applicable patent or other IPR claims of which he or she is aware
15	   have been or will be disclosed, and any of which he or she becomes
16	   aware will be disclosed, in accordance with Section 6 of BCP 79.

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups.  Note that
20	   other groups may also distribute working documents as Internet-
21	   Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time.  It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than as "work in progress."

28	   The list of current Internet-Drafts can be accessed at
29	   http://www.ietf.org/ietf/1id-abstracts.txt.

31	   The list of Internet-Draft Shadow Directories can be accessed at
32	   http://www.ietf.org/shadow.html.

34	   This Internet-Draft will expire on December 24, 2006.

36	Copyright Notice

38	   Copyright (C) The Internet Society (2006).

40	Abstract

42	   IPv4 fragmentation is not sufficiently robust for general use in
43	   today's Internet.  The 16-bit IP identification field is not large
44	   enough to prevent frequent incorrectly assembled IP fragments, and
45	   the TCP and UDP checksums are insufficient to prevent the resulting
46	   corrupted datagrams from being delivered to higher protocol layers.
47	   This note describes some easily reproduced experiments demonstrating
48	   the problem, and discusses some of the operational implications of
49	   these observations.

51	1.  Introduction

53	   The IPv4 header was designed at a time when data rates were several
54	   orders of magnitude lower than those achievable today.  This document
55	   describes a consequent scale-related failure in the IP identification
56	   (ID) field, where fragments may be incorrectly assembled at a rate
57	   likely high enough to invalidate assumptions about data integrity
58	   failure rates.

60	   That IP fragmentation results in inefficient use of the network has
61	   been well documented [Kent87].  This note presents a different kind
62	   of problem, which can result not only in significant performance
63	   degradation, but also frequent data corruption.  This is especially
64	   pertinent due to the recent proliferation of UDP bulk transport tools
65	   that sometimes fragment every datagram.

67	   Additionally, there is some network equipment that ignores the Don't
68	   Fragment (DF) bit in the IP header to work around MTU discovery
69	   problems [RFC2923].  This equipment indirectly exposes properly
70	   implemented protocols and applications to corrupt data.

72	2.  Wrapping the IP ID Field

74	   The Internet Protocol standard specifies:

76	      "The choice of the Identifier for a datagram is based on the need
77	      to provide a way to uniquely identify the fragments of a
78	      particular datagram.  The protocol module assembling fragments
79	      judges fragments to belong to the same datagram if they have the
80	      same source, destination, protocol, and Identifier.  Thus, the
81	      sender must choose the Identifier to be unique for this source,
82	      destination pair and protocol for the time the datagram (or any
83	      fragment of it) could be alive in the Internet."  [RFC0791]

85	   Strict conformance to this standard limits transmissions in one
86	   direction between any address pair to no more than 65536 packets per
87	   protocol (e.g., TCP, UDP, or ICMP) per maximum packet lifetime.

89	   Clearly not all hosts will follow this standard, because it implies
90	   an unreasonably low maximum data rate.  For example, a host sending
91	   1500 byte packets with a 30 second maximum packet lifetime could send
92	   at only about 26 Mbits/s before exceeding 65535 packets per packet
93	   lifetime.  Or, filling a 1 Gbit/s interface with 1500 byte packets
94	   requires sending 65536 packets in less than 1 second, an unreasonably
95	   short maximum packet lifetime, being less than the round-trip time on
96	   some paths.  This requirement is widely ignored.
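
   The arithmetic behind these two figures can be reproduced with a
   short sketch.  The function names are illustrative only and are not
   taken from any implementation; the packet size, lifetime, and link
   rate are the values used in the paragraph above.

```python
ID_SPACE = 2 ** 16  # distinct values of the 16-bit IPv4 ID field


def max_rate_bps(packet_bytes: int, lifetime_s: float) -> float:
    """Highest data rate that sends at most ID_SPACE packets per lifetime."""
    return ID_SPACE * packet_bytes * 8 / lifetime_s


def wrap_time_s(link_bps: float, packet_bytes: int) -> float:
    """Time needed to use every ID value once when the link is kept full."""
    return ID_SPACE * packet_bytes * 8 / link_bps


if __name__ == "__main__":
    print(f"{max_rate_bps(1500, 30) / 1e6:.1f} Mbit/s")  # 26.2 Mbit/s
    print(f"{wrap_time_s(1e9, 1500):.3f} s")             # 0.786 s
```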

98	   IP receivers store fragments in a reassembly buffer until all
99	   fragments in a datagram arrive, or until the reassembly timeout
100	   expires (15 seconds is suggested in [RFC0791]).  Fragments in a
101	   datagram are associated with each other by the value in their ID
102	   field, protocol, and source, destination address pair.  If a sender
103	   wraps the ID field in less than the reassembly timeout, it becomes
104	   possible for fragments from different datagrams to be incorrectly
105	   spliced together ("mis-associated"), and delivered to the upper layer
106	   protocol.

108	   A case of particular concern is when mis-association is self-
109	   propagating.  This occurs, for example, when there is reliable
110	   ordering of packets and the first fragment of a datagram is lost in
111	   the network.  The rest of the fragments are stored in the fragment
112	   reassembly buffer, and when the sender wraps the ID field, the first
113	   fragment of the new datagram will be mis-associated with the rest of
114	   the old datagram.  The new datagram will now be incomplete (since
115	   it is missing its first fragment), so the rest of it will be saved in
116	   the fragment reassembly buffer, forming a cycle that repeats every
117	   65536 datagrams.  It is possible to have a number of simultaneous
118	   cycles, bounded by the size of the fragment reassembly buffer.
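
   The cycle can be illustrated with a toy simulation.  This is a
   sketch under simplifying assumptions that are not part of the
   experiments described later: every datagram has exactly two
   fragments, delivery is in order, a single source, destination, and
   protocol is in use, and the reassembly timeout never expires before
   the ID wraps.  The function and variable names are hypothetical.

```python
ID_SPACE = 2 ** 16  # size of the 16-bit IPv4 ID space


def simulate(num_datagrams, lost_first_fragments):
    """Return the datagram numbers whose first fragment was spliced
    onto the leftover second fragment of an older datagram."""
    buffer = {}   # IP ID -> {fragment index: datagram number}
    spliced = []
    for n in range(num_datagrams):
        ip_id = n % ID_SPACE
        for frag in (0, 1):
            if frag == 0 and n in lost_first_fragments:
                continue  # this first fragment is dropped in the network
            slot = buffer.setdefault(ip_id, {})
            slot[frag] = n
            if len(slot) == 2:  # both fragments present: "reassemble"
                if slot[0] != slot[1]:
                    spliced.append(n)  # fragments from different datagrams
                del buffer[ip_id]
    return spliced


if __name__ == "__main__":
    # One lost first fragment starts a cycle that repeats every wrap.
    print(simulate(4 * ID_SPACE, {0}))     # [65536, 131072, 196608]
    # Two losses give two simultaneous cycles.
    print(simulate(2 * ID_SPACE, {0, 5}))  # [65536, 65541]
```

   In this model a single loss at datagram 0 causes one mis-association
   per wrap of the ID field, indefinitely, because each spliced
   datagram leaves its own second fragment behind in the buffer.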

120	3.  Harmful Effects of Mis-Associated Fragments

122	   When the mis-associated fragments are delivered, transport-layer
123	   checksumming should detect these datagrams as incorrect and discard
124	   them.  Discarding these datagrams could itself pose a problem for
125	   loss-feedback congestion control algorithms, since there will be a
126	   high number of non-congestion-related losses.

128	   However, transport checksums may not be designed to handle such high
129	   error rates, either.  The TCP/UDP checksum is only 16 bits in length.
130	   If these checksums follow a uniform random distribution, we expect
131	   mis-associated datagrams to be accepted by the checksum at a rate of
132	   one per 65536.  With only one mis-association cycle, we expect
133	   corrupt data delivered to the application layer once per 2^32
134	   datagrams.  This rate can be significantly higher with multiple
135	   cycles.

137	   With non-random data, the TCP/UDP checksum may be weaker still.
138	   It is possible to construct datasets where mis-associated fragments
139	   will always have the same checksum.  Such a case may be considered
140	   unlikely, but is worth considering.  "Real" data may be more likely
141	   than random data to cause checksum hot spots and increase the
142	   probability of false checksum match [Stone98].  Also, some
143	   applications or higher-level protocols may turn off checksumming to
144	   increase speed, though this practice has been found to be dangerous
145	   for other reasons when data reliability is important [Stone00].
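
   A minimal sketch of how such a dataset can arise: the Internet
   checksum is a 16-bit ones' complement sum, so it does not change
   when the 16-bit words of the data are reordered.  The payloads
   below are invented for illustration; the pseudo-header and UDP
   header are omitted because they would be identical in both cases.

```python
def internet_checksum(data: bytes) -> int:
    """Ones' complement of the 16-bit ones' complement sum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


head = b"HEAD-of-datagram"                    # stands in for the first fragment
tail_a = b"\x00\x01\x00\x02\x00\x03\x00\x04"  # the correct second fragment
tail_b = b"\x00\x04\x00\x03\x00\x02\x00\x01"  # same words, different order

assert tail_a != tail_b
assert internet_checksum(head + tail_a) == internet_checksum(head + tail_b)
```

   Because the checksum of the spliced datagram matches, a receiver
   would accept head + tail_b even though the data differs from what
   was sent.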

147	4.  Experimental Observations

149	   To test the practical impact of fragmentation on UDP, we ran a series
150	   of experiments using a UDP bulk data transport protocol that was
151	   designed to be used as an alternative to TCP for transporting large
152	   data sets over specialized networks.  The tool, Reliable Blast UDP
153	   (RBUDP), part of the QUANTA networking toolkit [QUANTA], was selected
154	   because it has a clean interface which facilitated automated
155	   experiments.  The decision to use RBUDP had little to do with the
156	   details of the transport protocol itself.  Any UDP transport protocol
157	   that does not have additional means to detect corruption, and that
158	   could be configured to use IP fragmentation, would produce the same
159	   results.

161	   In order to diagnose corruption on files transferred with the UDP
162	   bulk transfer tool, we used a file format that included embedded
163	   sequence numbers and MD5 checksums in each fragment of each datagram.
164	   Thus it was possible to distinguish random corruption from that
165	   caused by mis-associated fragments.  We used two different types of
166	   files.  One was constructed so that all the UDP checksums were
167	   constant -- we will call this the "constant" dataset.  The other was
168	   constructed so that UDP checksums were uniformly random -- the
169	   "random" dataset.  All tests were done using 400 MB files.
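
   The exact file layout is not given here.  The following sketch
   shows one hypothetical layout with the stated properties: each
   fragment-sized block carries a sequence number and an MD5 digest,
   so a receiver can tell mis-associated fragments (every block valid,
   sequence numbers disagree) from random corruption (a digest
   fails).  The block size, header fields, and function names are
   assumptions made for illustration.

```python
import hashlib
import struct

BLOCK_PAYLOAD = 64             # hypothetical payload bytes per block
HEADER = struct.Struct("!II")  # datagram sequence number, block index
DIGEST_LEN = 16                # length of an MD5 digest in bytes
BLOCK_SIZE = HEADER.size + BLOCK_PAYLOAD + DIGEST_LEN


def make_block(seq: int, index: int, payload: bytes) -> bytes:
    """Build one block: header, payload, then MD5 over both."""
    if len(payload) != BLOCK_PAYLOAD:
        raise ValueError("payload must be exactly BLOCK_PAYLOAD bytes")
    body = HEADER.pack(seq, index) + payload
    return body + hashlib.md5(body).digest()


def classify(datagram: bytes, blocks_per_datagram: int) -> str:
    """Classify a received datagram made of fixed-size blocks."""
    seqs = set()
    for i in range(blocks_per_datagram):
        block = datagram[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        body, digest = block[:-DIGEST_LEN], block[-DIGEST_LEN:]
        if len(block) != BLOCK_SIZE or hashlib.md5(body).digest() != digest:
            return "random corruption"
        seqs.add(HEADER.unpack_from(body)[0])
    return "intact" if len(seqs) == 1 else "mis-associated"
```

   In a real test the blocks would have to be sized so that block
   boundaries coincide with IP fragment boundaries, which this sketch
   does not attempt to model.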

171	   The UDP bulk file transport tool was used to send the datasets
172	   between a pair of hosts at slightly less than the available data rate
173	   (100 Mbits/s).  Near the beginning of each flow, a brief secondary flow
174	   was started to induce packet loss in the primary flow.  Throughout
175	   the life of the primary flow, we typically observed mis-association
176	   rates on the order of a few hundredths of a percent.

178	   Tests run with the "constant" dataset resulted in corruption on all
179	   mis-associated fragments, that is, corruption on the order of a few
180	   hundredths of a percent.  In sending approximately 10 TB of "random"
181	   datasets, we observed 8847668 UDP checksum errors and 121 corruptions
182	   of the data due to mis-associated fragments.
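
   These counts can be compared against the one-in-65536 acceptance
   rate predicted in Section 3.  The sketch below assumes that all of
   the reported checksum errors were caused by mis-associated
   fragments and that acceptances follow a Poisson distribution;
   neither assumption is stated in the results above.

```python
import math

checksum_errors = 8_847_668  # datagrams rejected by the UDP checksum
observed = 121               # corrupt datagrams accepted and delivered

# Every mis-associated datagram is either rejected or accepted.
mis_associated = checksum_errors + observed
expected = mis_associated / 2 ** 16

# Distance from the expectation in Poisson standard deviations.
z = (expected - observed) / math.sqrt(expected)

print(f"expected about {expected:.0f}, observed {observed}")  # about 135
print(f"difference: {z:.1f} standard deviations")             # 1.2
```

   Under those assumptions the observed count is consistent with the
   uniform-checksum estimate.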

184	5.  Implications

186	   Most TCP implementations today participate in MTU discovery
187	   [RFC1191], which will avoid the problems described in this note by
188	   avoiding IP fragmentation altogether.  However, as a work-around for
189	   MTU discovery problems [RFC2923], some TCP implementations and
190	   communications gear provide mechanisms to disable path MTU discovery
191	   by clearing or ignoring the DF bit.  Doing so will expose all
192	   protocols using IPv4, even those which participate in MTU discovery,
193	   to mis-association errors.

195	   IPv6 is less vulnerable to this type of problem, since its fragment
196	   header contains a 32-bit identification field [RFC2460].  Mis-
197	   association will only be a problem at packet rates 65536 times higher
198	   than for IPv4.
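
   As a rough illustration of the factor of 65536, the sketch below
   compares how long a sender takes to wrap a 16-bit and a 32-bit
   identification field.  It assumes, for simplicity, that every
   packet consumes one ID value and that the sender fills a 1 Gbit/s
   link with 1500 byte packets; the function name is illustrative.

```python
def id_wrap_seconds(id_bits: int, packets_per_second: float) -> float:
    """Time for a sender to use every value of an id_bits-wide ID field."""
    return 2 ** id_bits / packets_per_second


pps = 1e9 / (1500 * 8)           # packets per second on a full link
ipv4 = id_wrap_seconds(16, pps)  # 16-bit IPv4 ID field
ipv6 = id_wrap_seconds(32, pps)  # 32-bit IPv6 fragment identification

print(f"IPv4: {ipv4:.3f} s")             # IPv4: 0.786 s
print(f"IPv6: {ipv6 / 3600:.1f} hours")  # IPv6: 14.3 hours
print(f"ratio: {ipv6 / ipv4:.0f}")       # ratio: 65536
```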

200	   Since mis-association of fragments will only occur when the IP ID
201	   field is wrapped within the fragment reassembly timeout, it may be
202	   possible to reduce the timeout sufficiently so that mis-association
203	   will not occur.  However, there are a number of difficulties with
204	   such an approach.  Since the sender controls the rate of packets sent
205	   and selection of IP ID, while the receiver controls the reassembly
206	   timeout, there would need to be some mutual assurance between each
207	   party as to participation in the scheme.  Further, it is not
208	   generally possible to set the timeout low enough so that a fast
209	   sender's fragments will not be mis-associated, yet high enough so
210	   that a slow sender's fragments will not be unconditionally discarded
211	   before it is possible to reassemble them.  So the timeout and IP ID
212	   selection would need to be done on a per-peer basis.  Also, NAT is
213	   likely to break any per-peer tables keyed by IP address.  It is
214	   not within the scope of this document to recommend solutions to these
215	   problems.

217	   Another means of solving the corruption issue is to add stronger
218	   integrity checking, which can be done at any layer above IP.  This is
219	   a natural side effect of using cryptographic authentication.  If
220	   IPsec AH [RFC2402] is in use, the mis-associated fragments will be
221	   discarded at the network layer with extremely high probability.  Some
222	   higher layers may use longer checksums (for example, SCTP's is 32
223	   bits in length [RFC2960]) or cryptographic authentication (SSH
224	   message authentication codes [RFC4251]).  While stronger integrity
225	   checking may prevent data corruption, it will not solve the problem
226	   of a high effective loss rate.  In the case of SSH, any stream
227	   corruption results in immediate termination of the connection.
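
   As a sketch of integrity checking added above IP, an application
   could append a message authentication code to each datagram and
   drop any datagram that fails verification.  The key, tag layout,
   and function names below are hypothetical and are not taken from
   IPsec, SCTP, or SSH.

```python
import hashlib
import hmac
from typing import Optional

KEY = b"hypothetical shared key"  # assumed to be agreed out of band
TAG_LEN = 32                      # bytes in an HMAC-SHA256 tag


def protect(payload: bytes) -> bytes:
    """Append an HMAC-SHA256 tag to the payload."""
    return payload + hmac.new(KEY, payload, hashlib.sha256).digest()


def verify(datagram: bytes) -> Optional[bytes]:
    """Return the payload if the tag matches, otherwise None."""
    payload, tag = datagram[:-TAG_LEN], datagram[-TAG_LEN:]
    expected = hmac.new(KEY, payload, hashlib.sha256).digest()
    return payload if hmac.compare_digest(tag, expected) else None


a = protect(b"A" * 64)
b = protect(b"B" * 64)
spliced = a[:48] + b[48:]  # head of one datagram, tail of another

assert verify(a) == b"A" * 64
assert verify(spliced) is None
```

   The spliced datagram is rejected, but it is still lost, so this
   approach addresses corruption and not the effective loss rate.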

229	6.  Security Considerations

231	   A malicious entity that knows that a pair of hosts is communicating
232	   using a fragmented stream may have an opportunity to corrupt the
233	   flow.  By sending "high" fragments (those with
234	   offset greater than zero) with a forged source address, the attacker
235	   can deliberately cause corruption as described above.  Exploiting
236	   this vulnerability requires only knowledge of the source and
237	   destination addresses of the flow, its protocol number, and fragment
238	   boundaries.  It does not require knowledge of port or sequence
239	   numbers.

241	   If the attacker has visibility of packets on the path, the attack
242	   profile is similar to injecting full segments.  Using this attack
243	   makes blind disruptions easier, and could certainly be used
244	   effectively to cause denial of service.  However, only streams using
245	   IPv4 fragmentation are vulnerable.  Because of the nature of the
246	   problems outlined in this draft, the use of IPv4 fragmentation for
247	   critical applications may not be advisable regardless of security
248	   concerns.

250	7.  IANA Considerations

252	   None.

254	8.  Informative References

256	   [Kent87]   Kent, C. and J. Mogul, "Fragmentation considered harmful",
257	              Proc. SIGCOMM '87 vol. 17, No. 5, October 1987.

259	   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
260	              RFC 2923, September 2000.

262	   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
263	              September 1981.

265	   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
266	              November 1990.

268	   [Stone98]  Stone, J., Greenwald, M., Partridge, C., and J. Hughes,
269	              "Performance of Checksums and CRC's over Real Data", IEEE/
270	              ACM Transactions on Networking vol. 6, No. 5,
271	              October 1998.

273	   [Stone00]  Stone, J. and C. Partridge, "When The CRC and TCP Checksum
274	              Disagree", Proc. SIGCOMM 2000 vol. 30, No. 4,
275	              October 2000.

277	   [QUANTA]   He, E., Alimohideen, J., Eliason, J., Krishnaprasad, N.,
278	              Leigh, J., Yu, O., and T. DeFanti, "Quanta: a toolkit for
279	              high performance data delivery over photonic networks",
280	              Future Generation Computer Systems Vol. 19, No. 6,
281	              August 2003.

283	   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
284	              (IPv6) Specification", RFC 2460, December 1998.

286	   [RFC2960]  Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
287	              Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M.,
288	              Zhang, L., and V. Paxson, "Stream Control Transmission
289	              Protocol", RFC 2960, October 2000.

291	   [RFC2402]  Kent, S. and R. Atkinson, "IP Authentication Header",
292	              RFC 2402, November 1998.

294	   [RFC4251]  Ylonen, T. and C. Lonvick, "The Secure Shell (SSH)
295	              Protocol Architecture", RFC 4251, January 2006.

297	Appendix A.  Acknowledgements

299	   This work was supported by the National Science Foundation under
300	   Grant No. 0083285.

302	Authors' Addresses

304	   John W. Heffner
305	   Pittsburgh Supercomputing Center
306	   4400 Fifth Avenue
307	   Pittsburgh, PA  15213
308	   US

310	   Phone: 412-268-2329
311	   Email: jheffner@psc.edu

313	   Matt Mathis
314	   Pittsburgh Supercomputing Center
315	   4400 Fifth Avenue
316	   Pittsburgh, PA  15213
317	   US

319	   Phone: 412-268-3319
320	   Email: mathis@psc.edu

322	   Ben Chandler
323	   Pittsburgh Supercomputing Center
324	   4400 Fifth Avenue
325	   Pittsburgh, PA  15213
326	   US

328	   Phone: 412-268-9783
329	   Email: bchandle@psc.edu

331	Intellectual Property Statement

333	   The IETF takes no position regarding the validity or scope of any
334	   Intellectual Property Rights or other rights that might be claimed to
335	   pertain to the implementation or use of the technology described in
336	   this document or the extent to which any license under such rights
337	   might or might not be available; nor does it represent that it has
338	   made any independent effort to identify any such rights.  Information
339	   on the procedures with respect to rights in RFC documents can be
340	   found in BCP 78 and BCP 79.

342	   Copies of IPR disclosures made to the IETF Secretariat and any
343	   assurances of licenses to be made available, or the result of an
344	   attempt made to obtain a general license or permission for the use of
345	   such proprietary rights by implementers or users of this
346	   specification can be obtained from the IETF on-line IPR repository at
347	   http://www.ietf.org/ipr.

349	   The IETF invites any interested party to bring to its attention any
350	   copyrights, patents or patent applications, or other proprietary
351	   rights that may cover technology that may be required to implement
352	   this standard.  Please address the information to the IETF at
353	   ietf-ipr@ietf.org.

355	Disclaimer of Validity

357	   This document and the information contained herein are provided on an
358	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
359	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
360	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
361	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
362	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
363	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

365	Copyright Statement

367	   Copyright (C) The Internet Society (2006).  This document is subject
368	   to the rights, licenses and restrictions contained in BCP 78, and
369	   except as set forth therein, the authors retain all their rights.

371	Acknowledgment

373	   Funding for the RFC Editor function is currently provided by the
374	   Internet Society.