idnits 2.17.1 

draft-heffner-frag-harmful-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 416.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 427.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 434.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 440.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (April 4, 2007) is 6231 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Obsolete informational reference (is this intentional?): RFC 2460
     (Obsoleted by RFC 8200)

  -- Obsolete informational reference (is this intentional?): RFC 2960
     (Obsoleted by RFC 4960)

  -- Obsolete informational reference (is this intentional?): RFC 2402
     (Obsoleted by RFC 4302, RFC 4305)


     Summary: 1 error (**), 0 flaws (~~), 2 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         J. Heffner
3	Internet-Draft                                                 M. Mathis
4	Expires: October 6, 2007                                     B. Chandler
5	                                                                     PSC
6	                                                           April 4, 2007

8	               IPv4 Reassembly Errors at High Data Rates
9	                     draft-heffner-frag-harmful-05

11	Status of this Memo

13	   By submitting this Internet-Draft, each author represents that any
14	   applicable patent or other IPR claims of which he or she is aware
15	   have been or will be disclosed, and any of which he or she becomes
16	   aware will be disclosed, in accordance with Section 6 of BCP 79.

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups.  Note that
20	   other groups may also distribute working documents as Internet-
21	   Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time.  It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than as "work in progress."

28	   The list of current Internet-Drafts can be accessed at
29	   http://www.ietf.org/ietf/1id-abstracts.txt.

31	   The list of Internet-Draft Shadow Directories can be accessed at
32	   http://www.ietf.org/shadow.html.

34	   This Internet-Draft will expire on October 6, 2007.

36	Copyright Notice

38	   Copyright (C) The IETF Trust (2007).

40	Abstract

42	   IPv4 fragmentation is not sufficiently robust for use under some
43	   conditions in today's Internet.  At high data rates, the 16-bit IP
44	   identification field is not large enough to prevent frequent
45	   incorrectly assembled IP fragments, and the TCP and UDP checksums are
46	   insufficient to prevent the resulting corrupted datagrams from being
47	   delivered to higher protocol layers.  This note describes some easily
48	   reproduced experiments demonstrating the problem, and discusses some
49	   of the operational implications of these observations.

51	1.  Introduction

53	   The IPv4 header was designed at a time when data rates were several
54	   orders of magnitude lower than those achievable today.  This document
55	   describes a consequent scale-related failure in the IP identification
56	   (ID) field, where fragments may be incorrectly assembled at a rate
57	   likely high enough to invalidate assumptions about data integrity
58	   failure rates.

60	   That IP fragmentation results in inefficient use of the network has
61	   been well documented [Kent87].  This note presents a different kind
62	   of problem, which can result not only in significant performance
63	   degradation, but also frequent data corruption.  This is especially
64	   pertinent due to the recent proliferation of UDP bulk transport tools
65	   that sometimes fragment every datagram.

67	   Additionally, there is some network equipment that ignores the Don't
68	   Fragment (DF) bit in the IP header to work around MTU discovery
69	   problems [RFC2923].  This equipment indirectly exposes properly
70	   implemented protocols and applications to corrupt data.

72	2.  Wrapping the IP ID Field

74	   The Internet Protocol standard specifies:

76	      "The choice of the Identifier for a datagram is based on the need
77	      to provide a way to uniquely identify the fragments of a
78	      particular datagram.  The protocol module assembling fragments
79	      judges fragments to belong to the same datagram if they have the
80	      same source, destination, protocol, and Identifier.  Thus, the
81	      sender must choose the Identifier to be unique for this source,
82	      destination pair and protocol for the time the datagram (or any
83	      fragment of it) could be alive in the Internet."  [RFC0791]

85	   Strict conformance to this standard limits transmissions in one
86	   direction between any address pair to no more than 65536 packets per
87	   protocol (e.g., TCP, UDP, or ICMP) per maximum packet lifetime.

89	   Clearly not all hosts follow this standard, because it implies an
90	   unreasonably low maximum data rate.  For example, a host sending 1500
91	   byte packets with a 30 second maximum packet lifetime could send at
92	   only about 26 Mbits/s before exceeding 65535 packets per packet
93	   lifetime.  Or, filling a 1 Gbit/s interface with 1500 byte packets
94	   requires sending 65536 packets in less than 1 second, an unreasonably
95	   short maximum packet lifetime, being less than the round-trip time on
96	   some paths.  This requirement is widely ignored.
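
   The arithmetic behind these two figures can be checked with a short
   sketch.  The function names are illustrative only, and the model
   assumes that every packet consumes one ID value and that the stated
   packet size is the whole on-the-wire cost.

```python
ID_SPACE = 2 ** 16  # distinct values of the 16-bit IPv4 ID field


def max_rate_bps(packet_bytes, lifetime_s, id_space=ID_SPACE):
    """Highest rate (bits/s) that sends at most id_space packets per lifetime."""
    return id_space * packet_bytes * 8 / lifetime_s


def wrap_time_s(link_bps, packet_bytes, id_space=ID_SPACE):
    """Seconds needed to send id_space packets back to back at link_bps."""
    return id_space * packet_bytes * 8 / link_bps


# 1500-byte packets, 30 second maximum packet lifetime
print(round(max_rate_bps(1500, 30) / 1e6, 1))   # Mbit/s, about 26.2

# 1500-byte packets filling a 1 Gbit/s interface
print(round(wrap_time_s(1e9, 1500), 3))         # seconds, about 0.786
```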

98	   Additionally, it is worth noting that re-using values in the IP ID
99	   field once per 65536 datagrams is the best case.  Some
100	   implementations randomize the IP ID to prevent leaking information
101	   out of the kernel [Bellovin02], which causes re-use of the IP ID
102	   field to occur probabilistically at all sending rates.
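
   As a rough illustration of the probabilistic case, the sketch below
   estimates the chance that a datagram's ID matches at least one of n
   other datagrams still eligible for reassembly.  It assumes IDs are
   drawn independently and uniformly at random, which is a
   simplification of what real implementations do; the function name
   is hypothetical.

```python
def collision_probability(n_outstanding, id_space=2 ** 16):
    """Probability that a uniformly random ID matches at least one of
    n_outstanding other independently chosen IDs."""
    return 1.0 - (1.0 - 1.0 / id_space) ** n_outstanding


for n in (10, 1000, 65536):
    print(n, collision_probability(n))
```

   Under this model the probability is non-zero for any n greater than
   zero, which is why re-use can occur even at low sending rates.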

104	   IP receivers store fragments in a reassembly buffer until all
105	   fragments in a datagram arrive, or until the reassembly timeout
106	   expires (15 seconds is suggested in [RFC0791]).  Fragments in a
107	   datagram are associated with each other by their protocol number, the
108	   value in their ID field, and by the source, destination address pair.
109	   If a sender wraps the ID field in less than the reassembly timeout,
110	   it becomes possible for fragments from different datagrams to be
111	   incorrectly spliced together ("mis-associated"), and delivered to the
112	   upper layer protocol.

114	   A case of particular concern is when mis-association is self-
115	   propagating.  This occurs, for example, when there is reliable
116	   ordering of packets and the first fragment of a datagram is lost in
117	   the network.  The rest of the fragments are stored in the fragment
118	   reassembly buffer, and when the sender wraps the ID field, the first
119	   fragment of the new datagram will be mis-associated with the rest of
120	   the old datagram.  The new datagram will now be incomplete (since
121	   it is missing its first fragment), so the rest of it will be saved in
122	   the fragment reassembly buffer, forming a cycle that repeats every
123	   65536 datagrams.  It is possible to have a number of simultaneous
124	   cycles, bounded by the size of the fragment reassembly buffer.
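
   The cycle described above can be reproduced with a toy model.  This
   is a sketch under stated assumptions, not a model of any real
   stack: every datagram has exactly two fragments, packets arrive in
   order, the reassembly timeout never fires, fragments are keyed by
   ID alone, and the ID space is shrunk to 8 values so that the output
   stays short.

```python
def simulate(n_datagrams, id_space=8, lost_first_fragment_of=0):
    """Return (first_from, rest_from) pairs for every reassembled
    datagram whose two fragments came from different datagrams."""
    buffer = {}            # ip_id -> {fragment_index: datagram_number}
    mis_associated = []
    for n in range(n_datagrams):
        ip_id = n % id_space              # sequential IDs that wrap
        for frag in (0, 1):
            if n == lost_first_fragment_of and frag == 0:
                continue                  # the single loss that starts the cycle
            slot = buffer.setdefault(ip_id, {})
            slot[frag] = n
            if len(slot) == 2:            # both fragments present: deliver
                if slot[0] != slot[1]:
                    mis_associated.append((slot[0], slot[1]))
                del buffer[ip_id]
    return mis_associated


print(simulate(25))   # [(8, 0), (16, 8), (24, 16)]
```

   One lost fragment produces one mis-associated datagram per wrap of
   the ID space, and the cycle continues for as long as the flow does.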

126	   IPv6 is considerably less vulnerable to this type of problem, since
127	   its fragment header contains a 32-bit identification field [RFC2460].
128	   Mis-association will only be a problem at packet rates 65536 times
129	   higher than for IPv4.

131	3.  Effects of Mis-Associated Fragments

133	   When the mis-associated fragments are delivered, transport-layer
134	   checksumming should detect these datagrams as incorrect and discard
135	   them.  When the datagrams are discarded, it could create a
136	   performance problem for loss-feedback congestion control algorithms,
137	   particularly when a large congestion window is required, since it
138	   will introduce a certain amount of non-congestive loss.

140	   Transport checksums, however, may not be designed to handle such high
141	   error rates.  The TCP/UDP checksum is only 16 bits in length.  If
142	   these checksums follow a uniform random distribution, we expect mis-
143	   associated datagrams to be accepted by the checksum at a rate of one
144	   per 65536.  With only one mis-association cycle, we expect corrupt
145	   data delivered to the application layer once per 2^32 datagrams.
146	   This number can be significantly higher with multiple concurrent
147	   cycles.
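
   The 2^32 figure is the product of two rates, which the following
   sketch makes explicit.  Treating the rate as linear in the number
   of concurrent cycles is an assumption made here for illustration;
   the text above says only that the number can be significantly
   higher.

```python
from fractions import Fraction

ID_SPACE = 2 ** 16         # one mis-association per cycle per ID wrap
CHECKSUM_SPACE = 2 ** 16   # chance a uniformly random 16-bit checksum matches


def corrupt_delivery_rate(cycles=1):
    """Expected fraction of datagrams delivered corrupt (assumed linear
    in the number of concurrent mis-association cycles)."""
    mis_association_rate = Fraction(cycles, ID_SPACE)
    false_accept_rate = Fraction(1, CHECKSUM_SPACE)
    return mis_association_rate * false_accept_rate


print(corrupt_delivery_rate(1))    # 1/4294967296, i.e. once per 2^32
print(corrupt_delivery_rate(16))   # 1/268435456
```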

149	   With non-random data, the TCP/UDP checksum may be weaker still.
150	   It is possible to construct datasets where mis-associated fragments
151	   will always have the same checksum.  Such a case may be considered
152	   unlikely, but is worth considering.  "Real" data may be more likely
153	   than random data to cause checksum hot spots and increase the
154	   probability of false checksum match [Stone98].  Also, some
155	   applications or higher-level protocols may turn off checksumming to
156	   increase speed, though this practice has been found to be dangerous
157	   for other reasons when data reliability is important [Stone00].
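
   A minimal sketch of why such datasets are easy to construct: the
   Internet checksum is a one's complement sum of 16-bit words, so it
   does not depend on word order.  A stale trailing fragment holding
   the same words in a different order yields the same checksum.  The
   byte strings below are made up for illustration, and the UDP
   pseudo-header is omitted because it is identical in both cases.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum, zero-padding odd-length input."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                          # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data: bytes) -> int:
    return ~ones_complement_sum(data) & 0xFFFF


first_fragment = bytes(range(32))               # new datagram, first part
correct_tail = bytes([0x12, 0x34, 0xAB, 0xCD])  # new datagram, second part
stale_tail = bytes([0xAB, 0xCD, 0x12, 0x34])    # left over from old datagram

intended = first_fragment + correct_tail
spliced = first_fragment + stale_tail

print(intended != spliced)                                       # True
print(internet_checksum(intended) == internet_checksum(spliced))  # True
```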

159	4.  Experimental Observations

161	   To test the practical impact of fragmentation on UDP, we ran a series
162	   of experiments using a UDP bulk data transport protocol that was
163	   designed to be used as an alternative to TCP for transporting large
164	   data sets over specialized networks.  The tool, Reliable Blast UDP
165	   (RBUDP), part of the QUANTA networking toolkit [QUANTA], was selected
166	   because it has a clean interface which facilitated automated
167	   experiments.  The decision to use RBUDP had little to do with the
168	   details of the transport protocol itself.  Any UDP transport protocol
169	   that does not have additional means to detect corruption, and that
170	   could be configured to use IP fragmentation, would have the same
171	   results.

173	   In order to diagnose corruption on files transferred with the UDP
174	   bulk transfer tool, we used a file format that included embedded
175	   sequence numbers and MD5 checksums in each fragment of each datagram.
176	   Thus it was possible to distinguish random corruption from that
177	   caused by mis-associated fragments.  We used two different types of
178	   files.  One was constructed so that all the UDP checksums were
179	   constant -- we will call this the "constant" dataset.  The other was
180	   constructed so that UDP checksums were uniformly random -- the
181	   "random" dataset.  All tests were done using 400 MB files, sent in
182	   1524-byte datagrams so that they were fragmented on standard Fast
183	   Ethernet with a 1500-byte MTU.

185	   The UDP bulk file transport tool was used to send the datasets
186	   between a pair of hosts at slightly less than the available data rate
187	   (100 Mbps).  Near the beginning of each flow, a brief secondary flow
188	   was started to induce packet loss in the primary flow.  Throughout
189	   the life of the primary flow, we typically observed mis-association
190	   rates on the order of a few hundredths of a percent.

192	   Tests run with the "constant" dataset resulted in corruption on all
193	   mis-associated fragments, that is, corruption on the order of a few
194	   hundredths of a percent.  In sending approximately 10 TB of "random"
195	   datasets, we observed 8847668 UDP checksum errors and 121 corruptions
196	   of the data due to mis-associated fragments.
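
   The "random" dataset counts can be compared with the 1-in-65536
   acceptance rate expected from a uniformly distributed 16-bit
   checksum.  The sketch below assumes that every observed checksum
   error and every observed corruption stems from a mis-associated
   datagram, which ignores any unrelated bit errors.

```python
checksum_errors = 8847668   # mis-associated datagrams rejected by UDP
corruptions = 121           # mis-associated datagrams that were accepted

mis_associated = checksum_errors + corruptions
expected_accepted = mis_associated / 2 ** 16
observed_one_in = mis_associated / corruptions

print(round(expected_accepted))   # about 135 expected
print(round(observed_one_in))     # about one in 73122 observed
```

   The observed count of 121 is of the same order as the expectation,
   which is consistent with the checksum accepting roughly one mis-
   associated datagram in 65536.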

198	5.  Preventing Mis-Association

200	   The most straightforward way to avoid mis-association is to avoid
201	   fragmentation altogether by implementing Path MTU Discovery [RFC1191]
202	   [RFC4821].  However, this is not always feasible for all
203	   applications.  Further, as a work-around for MTU discovery problems
204	   [RFC2923], some TCP implementations and communications gear provide
205	   mechanisms to disable path MTU discovery by clearing or ignoring the
206	   DF bit.  Doing so will expose all protocols using IPv4, even those
207	   that participate in MTU discovery, to mis-association errors.

209	   If IP fragmentation is in use, it may be possible to reduce the
210	   timeout sufficiently so that mis-association will not occur.
211	   However, there are a number of difficulties with such an approach.
212	   Since the sender controls the rate of packets sent and selection of
213	   IP ID, while the receiver controls the reassembly timeout, there
214	   would need to be some mutual assurance between each party as to
215	   participation in the scheme.  Further, it is not generally possible
216	   to set the timeout low enough so that a fast sender's fragments will
217	   not be mis-associated, yet high enough so that a slow sender's
218	   fragments will not be unconditionally discarded before it is possible
219	   to reassemble them.  Therefore, the timeout and IP ID selection would
220	   need to be done on a per-peer basis.  Also, NAT is likely to
221	   break any per-peer tables keyed by IP address.  It is not within the
222	   scope of this document to recommend solutions to these problems,
223	   though we believe a per-peer adaptive timeout is likely to prevent
224	   mis-association under circumstances where it would most commonly
225	   occur.

227	   A case particularly worth noting is that of tunnels encapsulating
228	   payload in IPv4.  To deal with difficulties in MTU Discovery
229	   [RFC4459], tunnels may rely on fragmentation between the two
230	   endpoints, even if the payload is marked with a DF bit [RFC4301].  In
231	   such a mode, the two tunnel endpoints behave as IP end hosts, with
232	   all tunneled traffic having the same protocol type.  Thus, the
233	   aggregate rate of tunneled packets may not exceed 65536 per maximum
234	   packet lifetime, or tunneled data becomes exposed to possible mis-
235	   association.  Even protocols doing MTU discovery such as TCP will be
236	   affected.  Operators of tunnels should ensure that the receiving
237	   end's reassembly timeout is short enough that mis-association cannot
238	   occur given the tunnel's maximum rate.
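
   For a tunnel operator, the constraint above reduces to a bound on
   the reassembly timeout.  The following sketch assumes the sending
   endpoint draws sequential IDs from a single 16-bit counter shared
   by all tunneled traffic; the function names and example rates are
   hypothetical.

```python
ID_SPACE = 2 ** 16


def max_safe_timeout_s(peak_packets_per_second):
    """Longest reassembly timeout (seconds) during which the sender
    cannot wrap the ID field at the given peak packet rate."""
    return ID_SPACE / peak_packets_per_second


def timeout_is_safe(timeout_s, peak_packets_per_second):
    return timeout_s < max_safe_timeout_s(peak_packets_per_second)


print(max_safe_timeout_s(100_000))   # 0.65536
print(timeout_is_safe(15, 5_000))    # False
print(timeout_is_safe(15, 4_000))    # True
```

   The peak rate should be taken in packets, not bits, per second: a
   tunnel carrying many small packets wraps the ID field sooner than
   one carrying the same bit rate in large packets.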

240	6.  Mitigating Mis-Association

242	   It is difficult to concisely describe all possible situations under
243	   which fragments might be mis-associated.  Even if an end host
244	   carefully follows the specification, ensuring unique IP IDs, the
245	   presence of NATs or tunnels may expose applications to IP ID space
246	   conflicts.  Further, devices in the network that the end hosts cannot
247	   see or control, such as tunnels, may cause mis-association.  Even a
248	   fragmenting application that sends at a low rate might be
249	   exposed when running simultaneously with a non-fragmenting
250	   application that sends at a high rate.  As described above, the
251	   receiver might act to reduce or eliminate the possibility of
252	   conflict, but there is no mechanism in place for a sender to know
253	   what the receiver is doing in this respect.  As a consequence, there
254	   is no general mechanism for an application that is using IPv4
255	   fragmentation to know if it is deterministically or statistically
256	   protected from mis-associated fragments.

258	   Under circumstances when it is impossible or impractical to prevent
259	   mis-association, its effects may be mitigated by use of stronger
260	   integrity checking at any layer above IP.  This is a natural side
261	   effect of using cryptographic authentication.  For example, IPsec AH
262	   [RFC2402] will discard any corrupted datagrams, preventing their
263	   delivery to upper layers.  A stronger transport layer checksum such as
264	   SCTP's, which is 32 bits in length [RFC2960], may help significantly.
265	   At the application layer, SSH message authentication codes [RFC4251]
266	   will prevent delivery of corrupted data, though since the TCP
267	   connection underneath is not protected, it is considered invalid and
268	   the session is immediately terminated.  While stronger integrity
269	   checking may prevent data corruption, it will not prevent the
270	   potential performance impact described above of non-congestive loss
271	   on congestion control at high congestion windows.
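
   As one illustration of application-layer integrity checking, the
   sketch below appends a SHA-256 digest to each datagram payload and
   verifies it on receipt.  The framing and the names seal and
   open_sealed are assumptions made for this example and are not taken
   from any protocol cited here.  An unkeyed digest detects accidental
   splicing only; resisting a deliberate attacker requires a keyed
   message authentication code.

```python
import hashlib
import hmac

DIGEST_LEN = hashlib.sha256().digest_size   # 32 bytes


def seal(payload: bytes) -> bytes:
    """Return payload followed by its SHA-256 digest."""
    return payload + hashlib.sha256(payload).digest()


def open_sealed(message: bytes):
    """Return the payload if its digest matches, otherwise None."""
    if len(message) < DIGEST_LEN:
        return None
    payload, digest = message[:-DIGEST_LEN], message[-DIGEST_LEN:]
    if hmac.compare_digest(hashlib.sha256(payload).digest(), digest):
        return payload
    return None


old = seal(b"A" * 32)
new = seal(b"B" * 32)
spliced = new[:16] + old[16:]    # first part of new, remainder of old

print(open_sealed(new) == b"B" * 32)   # True
print(open_sealed(spliced))            # None
```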

273	   It should also be noted that mis-association is not the only possible
274	   source of data corruption above the network layer [Stone00].  Most
275	   applications for which data integrity is critically important should
276	   implement strong integrity checking regardless of exposure to mis-
277	   association.

279	   In general, applications that rely on IPv4 fragmentation should be
280	   written with these issues in mind, as well as those issues documented
281	   in [Kent87].  Applications that rely on IPv4 fragmentation while
282	   sending at high speeds (the order of 100 Mbps or higher), and devices
283	   that deliberately introduce fragmentation to otherwise unfragmented
284	   traffic (e.g., tunnels) should be particularly cautious, and
285	   introduce strong mechanisms to ensure data integrity.

287	7.  Security Considerations

289	   If a malicious entity knows that a pair of hosts are communicating
290	   using a fragmented stream, it may present an opportunity for this
291	   entity to corrupt the flow.  By sending "high" fragments (those with
292	   offset greater than zero) with a forged source address, the attacker
293	   can deliberately cause corruption as described above.  Exploiting
294	   this vulnerability requires only knowledge of the source and
295	   destination addresses of the flow, its protocol number, and fragment
296	   boundaries.  It does not require knowledge of port or sequence
297	   numbers.

299	   If the attacker has visibility of packets on the path, the attack
300	   profile is similar to injecting full segments.  Using this attack
301	   makes blind disruptions easier, and might be used to cause
302	   degradation of service.  We believe only streams using IPv4
303	   fragmentation are likely vulnerable.  Because of the nature of the
304	   problems outlined in this draft, the use of IPv4 fragmentation for
305	   critical applications may not be advisable regardless of security
306	   concerns.

308	8.  IANA Considerations

310	   None.

312	9.  Informative References

314	   [Kent87]   Kent, C. and J. Mogul, "Fragmentation considered harmful",
315	              Proc. SIGCOMM '87 vol. 17, No. 5, October 1987.

317	   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
318	              RFC 2923, September 2000.

320	   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
321	              September 1981.

323	   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
324	              November 1990.

326	   [Stone98]  Stone, J., Greenwald, M., Partridge, C., and J. Hughes,
327	              "Performance of Checksums and CRC's over Real Data", IEEE/
328	              ACM Transactions on Networking vol. 6, No. 5,
329	              October 1998.

331	   [Stone00]  Stone, J. and C. Partridge, "When The CRC and TCP Checksum
332	              Disagree", Proc. SIGCOMM 2000 vol. 30, No. 4,
333	              October 2000.

335	   [QUANTA]   He, E., Alimohideen, J., Eliason, J., Krishnaprasad, N.,
336	              Leigh, J., Yu, O., and T. DeFanti, "Quanta: a toolkit for
337	              high performance data delivery over photonic networks",
338	              Future Generation Computer Systems Vol. 19, No. 6,
339	              August 2003.

341	   [Bellovin02]
342	              Bellovin, S., "A Technique for Counting NATted Hosts",
343	              November 2002.

345	   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
346	              (IPv6) Specification", RFC 2460, December 1998.

348	   [RFC2960]  Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
349	              Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M.,
350	              Zhang, L., and V. Paxson, "Stream Control Transmission
351	              Protocol", RFC 2960, October 2000.

353	   [RFC2402]  Kent, S. and R. Atkinson, "IP Authentication Header",
354	              RFC 2402, November 1998.

356	   [RFC4251]  Ylonen, T. and C. Lonvick, "The Secure Shell (SSH)
357	              Protocol Architecture", RFC 4251, January 2006.

359	   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
360	              Internet Protocol", RFC 4301, December 2005.

362	   [RFC4459]  Savola, P., "MTU and Fragmentation Issues with In-the-
363	              Network Tunneling", RFC 4459, April 2006.

365	   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
366	              Discovery", RFC 4821, March 2007.

368	Appendix A.  Acknowledgements

370	   This work was supported by the National Science Foundation under
371	   Grant No. 0083285.

373	Authors' Addresses

375	   John W. Heffner
376	   Pittsburgh Supercomputing Center
377	   4400 Fifth Avenue
378	   Pittsburgh, PA  15213
379	   US

381	   Phone: 412-268-2329
382	   Email: jheffner@psc.edu

384	   Matt Mathis
385	   Pittsburgh Supercomputing Center
386	   4400 Fifth Avenue
387	   Pittsburgh, PA  15213
388	   US

390	   Phone: 412-268-3319
391	   Email: mathis@psc.edu

393	   Ben Chandler
394	   Pittsburgh Supercomputing Center
395	   4400 Fifth Avenue
396	   Pittsburgh, PA  15213
397	   US

399	   Phone: 412-268-9783
400	   Email: bchandle@psc.edu

402	Full Copyright Statement

404	   Copyright (C) The IETF Trust (2007).

406	   This document is subject to the rights, licenses and restrictions
407	   contained in BCP 78, and except as set forth therein, the authors
408	   retain all their rights.

410	   This document and the information contained herein are provided on an
411	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
412	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
413	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
414	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
415	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
416	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

418	Intellectual Property

420	   The IETF takes no position regarding the validity or scope of any
421	   Intellectual Property Rights or other rights that might be claimed to
422	   pertain to the implementation or use of the technology described in
423	   this document or the extent to which any license under such rights
424	   might or might not be available; nor does it represent that it has
425	   made any independent effort to identify any such rights.  Information
426	   on the procedures with respect to rights in RFC documents can be
427	   found in BCP 78 and BCP 79.

429	   Copies of IPR disclosures made to the IETF Secretariat and any
430	   assurances of licenses to be made available, or the result of an
431	   attempt made to obtain a general license or permission for the use of
432	   such proprietary rights by implementers or users of this
433	   specification can be obtained from the IETF on-line IPR repository at
434	   http://www.ietf.org/ipr.

436	   The IETF invites any interested party to bring to its attention any
437	   copyrights, patents or patent applications, or other proprietary
438	   rights that may cover technology that may be required to implement
439	   this standard.  Please address the information to the IETF at
440	   ietf-ipr@ietf.org.

442	Acknowledgment

444	   Funding for the RFC Editor function is provided by the IETF
445	   Administrative Support Activity (IASA).