idnits 2.17.1 

draft-ietf-tcpm-anumita-tcp-stronger-checksum-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (May 25, 2010) is 5079 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Obsolete informational reference (is this intentional?): RFC 1146
     (Obsoleted by RFC 6247)

  -- Obsolete informational reference (is this intentional?): RFC 4960
     (Obsoleted by RFC 9260)


     Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Internet Engineering Task Force                                A. Biswas
3	Internet-Draft                                              NetApp, Inc.
4	Intended status: Standards Track                            May 25, 2010
5	Expires: November 26, 2010

7	   Support for Stronger Error Detection Codes in TCP for Jumbo Frames
8	            draft-ietf-tcpm-anumita-tcp-stronger-checksum-00

10	Abstract

12	   There is a class of data serving protocols and applications that
13	   cannot tolerate undetected data corruption on the wire.  Data
14	   corruption could occur at the source in software, in the network
15	   interface card, out on the link, on intermediate routers or at the
16	   destination network interface card or node.  The Ethernet CRC and the
17	   16-bit checksum in the TCP/UDP headers are used to detect data
18	   errors.  Most applications rely on these checksums to detect data
19	   corruptions and do not use any checksums or CRC checks at their
20	   level.  Research has shown that the TCP/UDP checksums are catching a
21	   significant number of errors; however, the research suggests that one
22	   packet in 10 billion will have an error that goes undetected for
23	   Ethernet MTU frames (MTU of 1500).  Under certain situations, "bad"
24	   hosts can introduce undetected errors at a much higher frequency and
25	   order.  With the use of Jumbo frames on the rise, and therefore more
26	   data bits on the wire that could be corrupted, the current 16-bit
27	   TCP/UDP checksum and the Ethernet 32-bit CRC are simply not
28	   sufficient for detecting errors.  This document specifies a proposal
29	   to use stronger checksum algorithms for TCP Jumbo Frames for IPv4 and
30	   IPv6 networks.  The Castagnoli CRC 32C algorithm used in iSCSI and
31	   SCTP is proposed as the error detection code of choice.

33	Status of this Memo

35	   This Internet-Draft is submitted in full conformance with the
36	   provisions of BCP 78 and BCP 79.

38	   Internet-Drafts are working documents of the Internet Engineering
39	   Task Force (IETF).  Note that other groups may also distribute
40	   working documents as Internet-Drafts.  The list of current Internet-
41	   Drafts is at http://datatracker.ietf.org/drafts/current/.

43	   Internet-Drafts are draft documents valid for a maximum of six months
44	   and may be updated, replaced, or obsoleted by other documents at any
45	   time.  It is inappropriate to use Internet-Drafts as reference
46	   material or to cite them other than as "work in progress."

48	   This Internet-Draft will expire on November 26, 2010.

50	Copyright Notice

52	   Copyright (c) 2010 IETF Trust and the persons identified as the
53	   document authors.  All rights reserved.

55	   This document is subject to BCP 78 and the IETF Trust's Legal
56	   Provisions Relating to IETF Documents
57	   (http://trustee.ietf.org/license-info) in effect on the date of
58	   publication of this document.  Please review these documents
59	   carefully, as they describe your rights and restrictions with respect
60	   to this document.  Code Components extracted from this document must
61	   include Simplified BSD License text as described in Section 4.e of
62	   the Trust Legal Provisions and are provided without warranty as
63	   described in the Simplified BSD License.

65	Table of Contents

67	   1.  Introduction
68	     1.1.  Conventions
69	   2.  Calculating the CRC-32C value
70	   3.  Negotiating the use of CRC 32C
71	   4.  IPv6 Considerations
72	   5.  Conclusions and Acknowledgements
73	   6.  IANA Considerations
74	   7.  Security Considerations
75	   8.  References
76	     8.1.  Normative References
77	     8.2.  Informative References
78	   Author's Address

80	1.  Introduction

82	   There is a class of data serving applications that host business and
83	   financial data.  Detecting and recovering from data corruption is
84	   paramount to the success of this class of applications.  Data
85	   corruption can occur while data is transiting from the source to a
86	   desired destination.  Data can get corrupted right at the source due
87	   to software errors, within the network interface card, out on the
88	   wire or link, in intermediate routers and at the destination network
89	   interface or node.  Link errors are detected using the Ethernet 32-
90	   bit CRC.  Node or router errors are detected using the 16-bit
91	   checksum in the transport headers of TCP and UDP.  Most applications
92	   do not have built-in error detection capability and typically rely on
93	   the checksums in the underlying networking layers.  Stone et al.
94	   [Stone] have recommended applications employ their own checksums to
95	   detect errors that go undetected by lower levels.  They have made
96	   this recommendation for the standard Ethernet MTU.  They have done so
97	   considering situations where a "bad" host can introduce undetected
98	   errors at a much higher frequency and order.  It must also be said
99	   that the physical layer already does encodings with bit error
100	   rates (BER) of 10^-12 to 10^-14 and therefore the current checksum
101	   algorithms may be sufficient.  However, stronger checksumming
102	   accounts for the cases where noisy hardware or bad cables can introduce
103	   noise at a much higher frequency and order.  It is also to be noted
104	   that increasing speed of the physical medium (to 40G and 100G) can
105	   also lead to higher BER.

107	   Another trend, very much on the rise, is the use and deployment of
108	   Jumbo Frames.  Jumbo Frames reduce per packet overheads significantly
109	   and are a cheap way of improving the performance of bulk data
110	   applications.  Combining the use of Jumbo frames with noisy physical
111	   medium increases the risk of undetected bit errors as there simply
112	   are more bits that can get corrupted.  This is rather concerning as
113	   business and financial data typically are transported over the
114	   network using file access based protocols like NFS, CIFS, HTTP over
115	   TCP.

117	   The strength of the Ethernet CRC checksum and the 16-bit Transport
118	   checksum has been found to decrease for data segments that are larger
119	   than the standard Ethernet MTU.  Koopman et al. [Koopman] have
120	   explored a number of CRC polynomials as well as the polynomial used
121	   in the Ethernet CRC calculation.  They have measured the
122	   effectiveness of these CRC polynomials for different data word
123	   lengths, where a data word is a bit stream from 64 bits to 128 Kbits.
124	   These data word lengths cover lengths equivalent to Ethernet MTUs and
125	   Jumbo frames and also frame lengths larger than Jumbo frames.  They
126	   found that the Castagnoli polynomial x^32 + x^28 + x^27 + x^26 + x^25
127	   + x^23 + x^22 + x^20 + x^19 + x^18 + x^14 + x^13 + x^11 + x^10 + x^9
128	   + x^8 + x^6 + x^0, represented as the 32-bit code 0x8F6E37A0,
129	   outperforms other CRC polynomials for Jumbo frames and larger
130	   segments.  This polynomial has been adopted by the iSCSI and SCTP
131	   standards.  Note that SCTP writes the same polynomial as
132	   0x11EDC6F41, with the x^32 term explicit, and maps messages to
133	   polynomials with the convention that bytes are taken most
134	   significant first, but within each byte, bits are taken least
135	   significant first.
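
   The two hexadecimal codes quoted above can be checked against the
   listed exponents.  The following is a minimal sketch, not part of
   the proposal; the helper names are illustrative only.  It assumes
   that 0x8F6E37A0 is the form that keeps the x^32 term as the top bit
   and leaves the x^0 term implicit, and it also derives the
   bit-reversed constant commonly used by software implementations
   that process the least significant bit first.

```python
# Exponents of the Castagnoli polynomial, copied from the text above.
EXPONENTS = [32, 28, 27, 26, 25, 23, 22, 20, 19, 18,
             14, 13, 11, 10, 9, 8, 6, 0]


def poly_to_int(exponents):
    """Encode a polynomial as an integer: the term x^k sets bit k."""
    value = 0
    for k in exponents:
        value |= 1 << k
    return value


def reflect32(value):
    """Reverse the bit order of a 32-bit value."""
    return int(f"{value:032b}"[::-1], 2)


# 33-bit form with the x^32 term explicit (the form quoted for SCTP).
full = poly_to_int(EXPONENTS)

# Form with x^0 implicit and x^32 as the top bit (assumed to be the
# notation behind the 32-bit code 0x8F6E37A0).
implicit_plus_one = full >> 1

# Form with x^32 implicit, and its bit-reversed counterpart.
normal = full & 0xFFFFFFFF
reflected = reflect32(normal)

print(hex(full), hex(implicit_plus_one), hex(normal), hex(reflected))
```

   All four values describe the same polynomial; only the notation
   differs.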

137	   Given the ubiquity of TCP, it is the layer where we can introduce
138	   stronger error detection capability without duplicating the effort in
139	   higher layers.  TCP options provide an easy path to introduce
140	   stronger checksum without hindering interoperability.  TCP options
141	   allow a TCP stack supporting a TCP option to interoperate seamlessly
142	   with a TCP stack that does not support the new TCP option (RFC 1122
143	   [RFC1122] requires the interoperability in Section 4.2.2.5).

145	   This document proposes the use of the Castagnoli polynomial, also
146	   known as CRC 32C, as the "checksum" of choice for the TCP
147	   protocol.  Other summation based checksum algorithms like Fletcher
148	   and Adler's algorithm were evaluated in RFC 3385 [RFC3385] and found
149	   to behave substantially worse than CRCs and hence are not considered
150	   in this proposal.

152	   By standardizing a stronger checksum at the TCP level, we can quickly
153	   drive the offloading of this checksum to NIC hardware, just as the
154	   16-bit TCP checksum is offloaded by most NIC vendors today.
155	   Offloading computation to hardware allows us to get rid of the in-
156	   software computation overheads of stronger checksum algorithms.

158	   Another positive effect of implementing strong TCP checksumming is
159	   that this will drive the rapid adoption of 9K Jumbo frames and make
160	   it considerably easier to consider even larger Jumbo Frames.

162	1.1.  Conventions

164	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
165	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
166	   document are to be interpreted as described in RFC 2119 [RFC2119].

168	2.  Calculating the CRC-32C value

170	   The 16-bit TCP checksum does a checksum of the TCP header and
171	   payload.  It also includes the pseudo header values of Source
172	   Address, Destination Address, Protocol and TCP Length.  The addition
173	   of the bytes of a pseudo header into a summation based checksum
174	   algorithm is simpler than the inclusion of the bytes of a pseudo
175	   header into a CRC computation.  This is because a CRC computation
176	   assumes a contiguous bit stream when translating the bit stream to a
177	   polynomial for doing the polynomial division.  The pseudo header was
178	   added to the TCP checksum computation in order to detect errors
179	   introduced in one of the IP header fields that could possibly cause
180	   the packet to be sent to an incorrect destination.  These fields also
181	   get included in the IP header checksum.  The intent was to include
182	   them in two separate checksums for better data integrity.  One can
183	   question the need for including the pseudo-header fields twice.  The
184	   pseudo-header currently gets included thrice if one considers the fact
185	   that the Ethernet CRC is computed over the entire Ethernet frame and
186	   Ethernet is ubiquitous today.  So for the purposes of this draft, all
187	   the fields used in the current TCP checksum except the pseudo-header
188	   must be included in the CRC-32c calculation.  If this draft's
189	   proposal is accepted for standardization, IETF may elect to add back
190	   the pseudo-header into the CRC-32C calculation or add only a smaller
191	   subset of the fields.  But it is to be noted that in this proposal we
192	   do have room to consider changes like this without disrupting current
193	   installations.

195	   It may also be questionable whether one needs to compute the 16-bit
196	   TCP checksum if the new TCP checksum option is present.  To avoid a
197	   chicken and egg problem, this document proposes that the 16-bit
198	   checksum field be zeroed out and included in the CRC 32C checksum as
199	   part of the TCP header bit stream.  The standardization process may
200	   choose a different approach and decide to do both the 16-bit TCP
201	   checksum and the CRC 32C checksum, in which case, a method will need
202	   to be defined as to the order of checksumming and the fields used in
203	   each of the checksum computations.

205	   This document also recommends the use of the CRC-32C when the
206	   negotiated Maximum Segment Size (MSS) value is equal to or greater
207	   than 8948 bytes (excluding frame and TCP/IP header bytes), the most
208	   common Jumbo Frame size, but does not explicitly recommend the use of
209	   CRC-32C for standard Ethernet MTU frames.

211	   The CRC-32C MAY also be used for regular Ethernet MTU frames if
212	   the application so desires for stricter data integrity checking,
213	   since CRC-32C can detect more independent bit errors than Ethernet
214	   CRC for Ethernet MTU sized packets.  The use of CRC-32C can be made
215	   settable by the application, by providing a socket option to the
216	   application.  The provision for an application to enable/disable the
217	   use of the new checksum option is left as an API detail of the
218	   particular TCP/IP socket layer implementation.

220	   The following section describes two possible approaches to
221	   negotiating the proposed 32-bit TCP checksum.  The common thread in
222	   the two approaches is the use of TCP options to negotiate the use of
223	   this checksum during the connection setup phase.  Once the connection
224	   is set up, all subsequent packets sent during the connection transfer
225	   phase MUST carry the stronger checksum except as described below.

227	   It is also possible that Path MTU discovery causes a connection to
228	   reduce the negotiated MSS value post connection establishment.  So,
229	   during connection establishment, an MSS equal to or greater than 9K
230	   might have been negotiated along with stronger TCP checksumming, and
231	   then later the MSS reduced to be equal to the discovered path MTU.
232	   If the reduced MSS value is equal to or less than an Ethernet MSS
233	   (typically 1460 without other TCP options), then the TCP end point
234	   that reduced its MTU may choose to NOT send the TCP checksum option
235	   in subsequent data packets.  The peer must then rely on the 16-bit
236	   TCP checksum for end to end data integrity which is okay since the
237	   Ethernet CRC has comparable data integrity checking capability for
238	   Ethernet sized packets.
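
   The MSS-based rules above can be summarized as a small policy
   sketch.  The function and constant names are hypothetical, and the
   application opt-in flag stands in for the socket option that the
   text leaves as an API detail.  The thresholds are the 8948-byte and
   1460-byte values mentioned in this section.

```python
JUMBO_MSS_THRESHOLD = 8948   # recommended lower bound for using CRC-32C
ETHERNET_MSS = 1460          # typical Ethernet MSS without other options


def should_offer_crc32c(mss, app_requested=False):
    """Decide whether to offer the stronger checksum at connection setup.

    It is recommended for Jumbo-sized MSS values and optional (MAY)
    for smaller ones when the application asks for it.
    """
    return app_requested or mss >= JUMBO_MSS_THRESHOLD


def may_omit_after_pmtu_reduction(current_mss):
    """After Path MTU discovery lowers the MSS, the sender may choose
    to stop sending the checksum option once the MSS is at or below
    an Ethernet MSS."""
    return current_mss <= ETHERNET_MSS
```

   The second function only reports that omission is permitted; an
   implementation is free to keep sending the option.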

240	   Now, let us discuss the method for computing the CRC 32c value:

242	   The CRC computation uses polynomial division.  The TCP header and
243	   payload are mapped to a polynomial and the CRC is calculated by
244	   dividing the bit stream by the CRC 32C polynomial.  Stone et al.
245	   in Appendix B of RFC 4960 [RFC4960] describe a convention for mapping
246	   the bytes of the bit stream into the polynomial.  The same MUST be
247	   adopted for TCP transport too.
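
   As an illustration, the computation might look like the following
   bit-at-a-time sketch.  It assumes the parameters commonly used for
   CRC-32C in iSCSI and SCTP (initial value and final XOR of all ones,
   least significant bit processed first).  The function names are
   hypothetical.  How the value field of the checksum option itself is
   treated during the computation is not specified in this document,
   so the sketch leaves that to the caller.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # 0x1EDC6F41 with its bits reversed


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C: polynomial division over GF(2), one bit at a time."""
    crc = 0xFFFFFFFF                       # initial value: all ones
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF                # final XOR: all ones


def segment_crc32c(tcp_header: bytes, payload: bytes) -> int:
    """CRC-32C over TCP header and payload, with the 16-bit checksum
    field (header bytes 16 and 17) zeroed as proposed in this section.
    The pseudo-header is not included."""
    if len(tcp_header) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    zeroed = tcp_header[:16] + b"\x00\x00" + tcp_header[18:]
    return crc32c(zeroed + payload)


# Widely published check value for CRC-32C.
assert crc32c(b"123456789") == 0xE3069283
```

   A production implementation would normally use a table-driven
   routine or a hardware instruction instead of the inner bit loop.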

249	3.  Negotiating the use of CRC 32C

251	   There are two possible approaches to negotiating the proposed CRC 32C
252	   checksum during the TCP connection setup phase.

254	   o  A new TCP option

256	   o  Using the TCP Alternate Checksum Data Option

258	   The first approach introduces a new TCP option to be negotiated by
259	   TCP endpoints during the connection setup phase.  It will be of the
260	   same format as other defined TCP options and will have Type, Length
261	   and Value fields.  A new type will be requested from IANA.  The
262	   length field will be the sum total length of the new TCP checksum
263	   option which is 6 bytes.  The value field will hold the 32-bit CRC
264	   32C checksum.

266	   If either one of the peers does not add this option to its TCP
267	   options list in its SYN segment, the CRC-32C checksum must not be
268	   used by the other peer.  Most TCP implementations are written to
269	   process the TCP options they recognize and ignore unknown options on
270	   SYN segments so an endpoint that supports the new TCP option can
271	   interoperate with an endpoint that does not support the proposed TCP
272	   option.

274	   Since we have seen that the 16-bit TCP checksum is insufficient for
275	   detecting multiple independent errors for Jumbo frames, this proposal
276	   says that a peer supporting this option MUST send the new TCP
277	   checksum option if its link MTU is equal to or greater than 9K.
278	   However, if the remote peer does not recognize the new option, the
279	   initiating peer MUST NOT use this TCP extension for the connection
280	   transfer phase.  If the remote peer recognizes the option and also
281	   has a Maximum Segment Size equal to the peer's advertised MSS or a
282	   minimum MSS of 9K, it MUST respond with the TCP checksum option.
283	   Every subsequent packet from both peers must include this option in
284	   the TCP header.  The extra overhead for adding this option is minimal
285	   for Jumbo frame sized segments and the higher data integrity pays for
286	   itself.

288	   Note that all TCP control packets sent after successfully negotiating
289	   this TCP option may carry this TCP option also, although this draft
290	   does not mandate it.

292	   TCP CRC Checksum Option.

294	   +----------+------------+----------------------------+
295	   | Kind = X | Length = 6 | Value = 4 bytes of CRC 32C |
296	   +----------+------------+----------------------------+

298	   .

300	                                 Figure 1
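
   The layout in Figure 1 could be encoded and parsed along these
   lines.  This is a sketch under stated assumptions: the Kind value
   "X" has not been assigned, so the experimental kind 253 is used as
   a placeholder, and the CRC is assumed to be carried in network byte
   order, which this document does not state.  The function names are
   hypothetical.

```python
import struct

PLACEHOLDER_KIND = 253   # assumption: stands in for the unassigned "X"
OPTION_LENGTH = 6        # kind (1) + length (1) + CRC 32C value (4)


def build_crc_option(crc: int, kind: int = PLACEHOLDER_KIND) -> bytes:
    """Encode Kind, Length and the 32-bit CRC value."""
    return struct.pack("!BBI", kind, OPTION_LENGTH, crc & 0xFFFFFFFF)


def find_crc_option(options: bytes, kind: int = PLACEHOLDER_KIND):
    """Walk a TCP options area and return the CRC value, or None.

    Unknown options are skipped using their length byte, which is how
    an endpoint can ignore options it does not recognize."""
    i = 0
    while i < len(options):
        k = options[i]
        if k == 0:                 # End of Option List
            break
        if k == 1:                 # No-Operation (single byte)
            i += 1
            continue
        if i + 1 >= len(options):  # truncated option
            break
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break                  # malformed length
        if k == kind and length == OPTION_LENGTH:
            return struct.unpack("!I", options[i + 2:i + 6])[0]
        i += length
    return None
```

   A receiver would compare the returned value with the CRC it
   computes over the received segment.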

302	   The second approach utilizes a pair of existing TCP options called
303	   the "TCP Alternate Checksum Options" specified in RFC 1146 [RFC1146].
304	   The current checksum types specified by that option are TCP checksum,
305	   8-bit Fletcher's algorithm and 16-bit Fletcher's algorithm.  A new
306	   checksum type can be added to this list for CRC-32C checksums.  The
307	   negotiation rules for selecting the checksum type would follow the
308	   rules described in RFC 1146.  That is, if both SYN segments carry the
309	   Alternate Checksum Request option, and both specify the same
310	   algorithm, that algorithm must be used for the remainder of the
311	   connection.  Otherwise, the standard TCP checksum must be used for
312	   the entire connection.
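
   The selection rule just described can be sketched as follows.  The
   value 3 for CRC-32C is a hypothetical assignment; this document
   only notes that it may possibly be the next undefined value.  None
   models a SYN segment that carries no Alternate Checksum Request
   option.

```python
from typing import Optional

TCP_CHECKSUM = 0          # standard 16-bit TCP checksum
FLETCHER_8 = 1            # 8-bit Fletcher's algorithm
FLETCHER_16 = 2           # 16-bit Fletcher's algorithm
CRC32C_HYPOTHETICAL = 3   # assumption: not yet assigned


def negotiate_checksum(local_request: Optional[int],
                       remote_request: Optional[int]) -> int:
    """Return the algorithm to use for the rest of the connection.

    Both SYN segments must carry the request option and name the same
    algorithm; otherwise the standard TCP checksum is used."""
    if local_request is not None and local_request == remote_request:
        return local_request
    return TCP_CHECKSUM
```

   For example, a peer that requests CRC-32C while the other side
   sends no request option falls back to the standard TCP checksum.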

314	   Once the CRC 32C checksum algorithm is negotiated, the TCP Alternate
315	   Checksum Data Option is sent whose data will equal 4 bytes for the
316	   CRC-32C checksum.

318	   TCP Alternate Checksum Request Option
319	   +-----------+------------+-----------------+
320	   | Kind = 14 | Length = 3 | Value = CRC-32C |
321	   +-----------+------------+-----------------+

323	   Here the value for CRC32C would need to be defined, and may possibly
324	   be the next undefined value '3', following the definitions for 8-bit
325	   and 16-bit Fletcher's algorithms.

327	   TCP Alternate Checksum Data Option
328	   +-----------+------------+--------------------------------+
329	   | Kind = 15 | Length = 6 | Value = CRC-32C computed value |
330	   +-----------+------------+--------------------------------+

332	   The TCP Alternate Checksum Data Option must be sent only during the
333	   connection transfer and tear down phase.  Again, the 16-bit TCP
334	   checksum field must be zeroed out before computing the 32-bit CRC 32C
335	   code.

337	   One or more padding bytes may be used when sending any of the above
338	   options to align to a 4 or 8 byte boundary for faster parsing on both
339	   32-bit and 64-bit machines.
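
   A short sketch of building the two RFC 1146 options and padding
   them to a boundary follows.  It assumes padding with No-Operation
   bytes (kind 1) and network byte order for the CRC value; neither
   choice is mandated by this document.  The algorithm number passed
   to the request option would be whatever value is eventually
   defined for CRC-32C.

```python
import struct

ALT_CHECKSUM_REQUEST = 14   # sent on SYN segments
ALT_CHECKSUM_DATA = 15      # sent during transfer and tear down
NOP = 1                     # No-Operation option, used here as padding


def request_option(algorithm: int) -> bytes:
    """Kind = 14, Length = 3, Value = checksum algorithm number."""
    return bytes([ALT_CHECKSUM_REQUEST, 3, algorithm])


def data_option(crc: int) -> bytes:
    """Kind = 15, Length = 6, Value = computed CRC-32C."""
    return struct.pack("!BBI", ALT_CHECKSUM_DATA, 6, crc & 0xFFFFFFFF)


def pad_options(options: bytes, boundary: int = 4) -> bytes:
    """Append NOP bytes until the length is a multiple of boundary."""
    padding = -len(options) % boundary
    return options + bytes([NOP]) * padding
```

   With a 4-byte boundary the request option gains one padding byte
   and the data option gains two.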

341	   At this stage of draft development, the author is evaluating and
342	   seeking inputs for both approaches.

344	4.  IPv6 Considerations

346	   The TCP extension for CRC 32C can be applied equally to IPv4 and
347	   IPv6.  The pseudo header for IPv6 includes 128 bit source and
348	   destination addresses.  This pseudo header, the TCP header and
349	   payload MUST be included in the CRC 32C checksum of a TCP/IPv6
350	   segment as there is no IPv6 header checksum.
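
   The byte stream fed to the CRC for a TCP/IPv6 segment could be
   assembled as sketched below.  The pseudo-header layout is the
   standard IPv6 one (source address, destination address, 32-bit
   upper-layer length, three zero bytes, next header).  Placing the
   pseudo-header ahead of the TCP header is an assumption, since this
   document does not fix the order, and the function names are
   hypothetical.

```python
import ipaddress
import struct

TCP_NEXT_HEADER = 6   # protocol number for TCP


def ipv6_pseudo_header(src: str, dst: str, tcp_length: int) -> bytes:
    """Build the 40-byte IPv6 pseudo-header for a TCP segment."""
    src_bytes = ipaddress.IPv6Address(src).packed    # 16 bytes
    dst_bytes = ipaddress.IPv6Address(dst).packed    # 16 bytes
    # 32-bit length, three zero pad bytes, 8-bit next header value.
    tail = struct.pack("!I3xB", tcp_length, TCP_NEXT_HEADER)
    return src_bytes + dst_bytes + tail


def ipv6_crc_input(src: str, dst: str,
                   tcp_header: bytes, payload: bytes) -> bytes:
    """Concatenate pseudo-header, TCP header (checksum field zeroed)
    and payload into the stream over which CRC 32C is computed."""
    if len(tcp_header) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    zeroed = tcp_header[:16] + b"\x00\x00" + tcp_header[18:]
    length = len(tcp_header) + len(payload)
    return ipv6_pseudo_header(src, dst, length) + zeroed + payload
```

   The resulting bytes would then be passed to a CRC-32C routine.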

352	5.  Conclusions and Acknowledgements

354	   This document proposes the use of stronger error detection codes for
355	   TCP connections sending Jumbo Frames.  It does not provide a solution
356	   for UDP based applications.  I would also like to thank Tom Kessler
357	   (kessler@netapp.com) for his review comments.  He specifically
358	   pointed out his concerns about the safety of TCP checksum + Ethernet
359	   CRC at 40G and 100G speeds with even 9K jumbo frames.  He also
360	   provided information on the Intel instruction set that can be used to
361	   speed up CRC-32c computation.  Special thanks to Janet Takami
362	   (jtakami@netapp.com) for her comments as well as for pointing out
363	   that there is no IPv6 header checksum and so the pseudo header must
364	   be included in the CRC 32c checksum.

366	6.  IANA Considerations

368	   This memo includes a request to IANA for a new Type Number for the
369	   new TCP Checksum Option if we do not go with the TCP Alternate
370	   Checksum Option.  If we go with the TCP Alternate Checksum option,
371	   then a new checksum type will need to be defined for CRC 32C,
372	   probably after the defined values for Fletcher's 8-bit and 16-bit
373	   algorithm types.

375	7.  Security Considerations

377	   The CRC 32C codes can detect unintentional changes to data such as
378	   those caused by noise.  If an attacker changes the data, it can also
379	   change the error-detection code to match the changed data.  Hence,
380	   these codes are not intended for security purposes.

382	8.  References

384	8.1.  Normative References

386	   [RFC1122]  IETF, "Requirements for Internet Hosts -- Communication
387	              Layers", October 1989.

389	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
390	              Requirement Levels", BCP 14, RFC 2119, March 1997.

392	8.2.  Informative References

394	   [Koopman]  Koopman, P., "32-Bit Cyclic Redundancy Codes for Internet
395	              Applications", 2002.

397	   [Stone]    Stone, J., Partridge, C., "When the CRC and TCP Checksum
398	              Disagree"

400	   [RFC1146]  Zweig, J., Partridge, C., "TCP Alternate Checksum Options",
401	              March 1990.

403	   [RFC3385]  Sheinwald, D., et al., "Internet Protocol Small Computer
404	              System Interface (iSCSI) Cyclic Redundancy Check (CRC)/
405	              Checksum Considerations", September 2002.

407	   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol",
408	              September 2007.

410	Author's Address

412	   Anumita Biswas
413	   NetApp, Inc.
414	   495, E. Java Dr
415	   Sunnyvale, CA  95054
416	   USA

418	   Phone: +14088223204
419	   Email: anumita.biswas@netapp.com