Internet Engineering Task Force                                A. Biswas
Internet-Draft                                              NetApp, Inc.
Intended status: Standards Track                            May 25, 2010
Expires: November 26, 2010


   Support for Stronger Error Detection Codes in TCP for Jumbo Frames
            draft-ietf-tcpm-anumita-tcp-stronger-checksum-00

Abstract

   There is a class of data serving protocols and applications that
   cannot tolerate undetected data corruption on the wire.
   Data corruption could occur at the source in software, in the network
   interface card, out on the link, on intermediate routers, or at the
   destination network interface card or node.  The Ethernet CRC and the
   16-bit checksum in the TCP/UDP headers are used to detect data
   errors.  Most applications rely on these checksums to detect data
   corruption and do not use any checksums or CRC checks at their own
   level.  Research has shown that the TCP/UDP checksums catch a
   significant number of errors; however, the same research suggests
   that one packet in 10 billion will have an error that goes undetected
   for Ethernet MTU frames (MTU of 1500).  Under certain situations,
   "bad" hosts can introduce undetected errors at a much higher
   frequency and order.  With the use of Jumbo frames on the rise, and
   therefore more data bits on the wire that could be corrupted, the
   current 16-bit TCP/UDP checksum and the Ethernet 32-bit CRC are
   simply not sufficient for detecting errors.  This document specifies
   a proposal to use stronger checksum algorithms for TCP Jumbo Frames
   for IPv4 and IPv6 networks.  The Castagnoli CRC-32C algorithm used in
   iSCSI and SCTP is proposed as the error detection code of choice.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 26, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Conventions
   2.  Calculating the CRC-32C value
   3.  Negotiating the use of CRC 32C
   4.  IPv6 Considerations
   5.  Conclusions and Acknowledgements
   6.  IANA Considerations
   7.  Security Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Author's Address

1.  Introduction

   There is a class of data serving applications that host business and
   financial data.  Detecting and recovering from data corruption is
   paramount to the success of this class of applications.  Data
   corruption can occur while data is transiting from the source to a
   desired destination.
   Data can get corrupted right at the source due to software errors,
   within the network interface card, out on the wire or link, in
   intermediate routers, and at the destination network interface or
   node.  Link errors are detected using the Ethernet 32-bit CRC.  Node
   or router errors are detected using the 16-bit checksum in the
   transport headers of TCP and UDP.  Most applications do not have
   built-in error detection capability and typically rely on the
   checksums in the underlying networking layers.  Stone et al. [Stone]
   have recommended that applications employ their own checksums to
   detect errors that go undetected by lower levels.  They made this
   recommendation for the standard Ethernet MTU, considering situations
   where a "bad" host can introduce undetected errors at a much higher
   frequency and order.  It must also be said that the physical layer
   already uses encodings with bit error rates (BER) of 10^-12 to
   10^-14, and therefore the current checksum algorithms may be
   sufficient.  However, stronger checksumming accounts for the cases
   where noisy hardware or bad cables introduce noise at a much higher
   frequency and order.  It is also to be noted that the increasing
   speed of the physical medium (to 40G and 100G) can also lead to
   higher BER.

   Another dynamic, very much on the rise, is the use and deployment of
   Jumbo Frames.  Jumbo Frames reduce per-packet overheads significantly
   and are a cheap way of improving the performance of bulk data
   applications.  Combining the use of Jumbo Frames with a noisy
   physical medium increases the risk of undetected bit errors, as there
   simply are more bits that can get corrupted.  This is rather
   concerning, as business and financial data are typically transported
   over the network using file-access-based protocols like NFS, CIFS,
   and HTTP over TCP.

   The strength of the Ethernet CRC and the 16-bit transport checksum
   has been found to decrease for data segments that are larger than the
   standard Ethernet MTU.  Koopman et al. [Koopman] have explored a
   number of CRC polynomials as well as the polynomial used in the
   Ethernet CRC calculation.  They have measured the effectiveness of
   these CRC polynomials for different data word lengths, where a data
   word is a bit stream from 64 bits to 128 Kbits.  These data word
   lengths cover lengths equivalent to Ethernet MTUs and Jumbo frames
   and also frame lengths larger than Jumbo frames.  They found that the
   Castagnoli polynomial x^32 + x^28 + x^27 + x^26 + x^25 + x^23 + x^22
   + x^20 + x^19 + x^18 + x^14 + x^13 + x^11 + x^10 + x^9 + x^8 + x^6 +
   x^0, represented in their notation as the 32-bit code 0x8F6E37A0,
   bests other CRC polynomials for Jumbo frames and larger segments.
   This polynomial has been adopted by the iSCSI and SCTP standards.  It
   is to be noted that SCTP writes this polynomial as the code
   0x11EDC6F41, which spells out every term from x^32 down to x^0, and
   that SCTP adopts a transport-level bit-ordering convention for
   mapping messages to polynomials: bytes are taken most significant
   first, but within each byte, bits are taken least significant first.

   Given the ubiquity of TCP, it is the layer where we can introduce
   stronger error detection capability without duplicating the effort in
   higher layers.  TCP options provide an easy path to introduce a
   stronger checksum without hindering interoperability.  TCP options
   allow a TCP stack supporting a TCP option to interoperate seamlessly
   with a TCP stack that does not support the new TCP option (RFC 1122
   [RFC1122] requires this interoperability in Section 4.2.2.5).

   This document proposes the use of the Castagnoli polynomial, also
   known as CRC-32C, as the "checksum" of choice for the TCP protocol.
   Other summation-based checksum algorithms like Fletcher's and Adler's
   algorithms were evaluated in RFC 3385 [RFC3385] and found to behave
   substantially worse than CRCs, and hence are not considered in this
   proposal.

   By standardizing a stronger checksum at the TCP level, we can quickly
   drive the offloading of this checksum to NIC hardware, just as the
   16-bit TCP checksum is offloaded by most NIC vendors today.
   Offloading computation to hardware allows us to get rid of the in-
   software computation overheads of stronger checksum algorithms.

   Another positive effect of implementing strong TCP checksumming is
   that this will drive the rapid adoption of 9K Jumbo frames and make
   it considerably easier to consider even larger Jumbo Frames.

1.1.  Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Calculating the CRC-32C value

   The 16-bit TCP checksum covers the TCP header and payload.  It also
   includes the pseudo header values of Source Address, Destination
   Address, Protocol, and TCP Length.  Adding the bytes of a pseudo
   header into a summation-based checksum algorithm is simpler than
   including them in a CRC computation.  This is because a CRC
   computation assumes a contiguous bit stream when translating the bit
   stream to a polynomial for the polynomial division.  The pseudo
   header was added to the TCP checksum computation in order to detect
   errors introduced in one of the IP header fields that could possibly
   cause the packet to be sent to an incorrect destination.  These
   fields also get included in the IP header checksum.  The intent was
   to include them in two separate checksums for better data integrity.
   One can question the need for including the pseudo-header fields
   twice.  The pseudo-header fields currently get included thrice if one
   considers the fact that the Ethernet CRC is computed over the entire
   Ethernet frame and Ethernet is ubiquitous today.  So for the purposes
   of this draft, all the fields used in the current TCP checksum except
   the pseudo-header must be included in the CRC-32C calculation.  If
   this draft's proposal is accepted for standardization, the IETF may
   elect to add the pseudo-header back into the CRC-32C calculation or
   add only a smaller subset of the fields.  It is to be noted that this
   proposal leaves room to consider changes like this without disrupting
   current installations.

   It may also be questionable whether one needs to compute the 16-bit
   TCP checksum if the new TCP checksum option is present.  To avoid a
   chicken-and-egg problem, this document proposes that the 16-bit
   checksum field be zeroed out and included in the CRC-32C checksum as
   part of the TCP header bit stream.  The standardization process may
   choose a different approach and decide to do both the 16-bit TCP
   checksum and the CRC-32C checksum, in which case a method will need
   to be defined as to the order of checksumming and the fields used in
   each of the checksum computations.

   This document also recommends the use of CRC-32C when the negotiated
   Maximum Segment Size (MSS) value is equal to or greater than 8948
   bytes (excluding frame and TCP/IP header bytes), the most common
   Jumbo Frame size, but does not explicitly recommend the use of CRC-
   32C for standard Ethernet MTU frames.

   CRC-32C MAY also be used for regular Ethernet MTU frames if the
   application desires stricter data integrity checking, since CRC-32C
   can detect more independent bit errors than the Ethernet CRC for
   Ethernet MTU sized packets.
   The use of CRC-32C can be made configurable by providing the
   application with a socket option.  The provision for an application
   to enable or disable the use of the new checksum option is left as an
   API detail of the particular TCP/IP socket layer implementation.

   The following section describes two possible approaches to
   negotiating the proposed 32-bit TCP checksum.  The common thread in
   the two approaches is the use of TCP options to negotiate the use of
   this checksum during the connection setup phase.  Once the connection
   is set up, all subsequent packets sent during the connection transfer
   phase MUST carry the stronger checksum except as described below.

   It is also possible that Path MTU discovery causes a connection to
   reduce the negotiated MSS value after connection establishment.  So,
   during connection establishment, an MSS equal to or greater than 9K
   might have been negotiated along with stronger TCP checksumming, and
   then later the MSS might be reduced to match the discovered path MTU.
   If the reduced MSS value is equal to or less than an Ethernet MSS
   (typically 1460 without other TCP options), then the TCP end point
   that reduced its MTU may choose not to send the TCP checksum option
   in subsequent data packets.  The peer must then rely on the 16-bit
   TCP checksum for end-to-end data integrity, which is acceptable since
   the Ethernet CRC has comparable data integrity checking capability
   for Ethernet-sized packets.

   Now, let us discuss the method for computing the CRC-32C value:

   The CRC computation uses polynomial division.  The TCP header and
   payload are mapped to a polynomial, and the CRC is calculated by
   dividing the bit stream by the CRC-32C polynomial.  Stone et al. in
   Appendix B of RFC 4960 [RFC4960] describe a convention for mapping
   the bytes of the bit stream into the polynomial.  The same convention
   MUST be adopted for TCP transport too.

3.  Negotiating the use of CRC 32C

   There are two possible approaches to negotiating the proposed CRC-32C
   checksum during the TCP connection setup phase.

   o  A new TCP option

   o  Using the TCP Alternate Checksum Data Option

   The first approach introduces a new TCP option to be negotiated by
   TCP endpoints during the connection setup phase.  It will be of the
   same format as other defined TCP options and will have Type, Length,
   and Value fields.  A new type will be requested from IANA.  The
   length field will hold the total length of the new TCP checksum
   option, which is 6 bytes.  The value field will hold the 32-bit CRC-
   32C checksum.

   If either one of the peers does not add this option to its TCP
   options list in its SYN segment, the CRC-32C checksum must not be
   used by the other peer.  Most TCP implementations are written to
   process the TCP options they recognize and ignore unknown options on
   SYN segments, so an endpoint that supports the new TCP option can
   interoperate with an endpoint that does not support the proposed TCP
   option.

   Since we have seen that the 16-bit TCP checksum is insufficient for
   detecting multiple independent errors for Jumbo frames, this proposal
   requires that a peer supporting this option MUST send the new TCP
   checksum option if its link MTU is equal to or greater than 9K.
   However, if the remote peer does not recognize the new option, the
   initiating peer MUST NOT use this TCP extension for the connection
   transfer phase.  If the remote peer recognizes the option and also
   has a Maximum Segment Size equal to the peer's advertised MSS or a
   minimum MSS of 9K, it MUST respond with the TCP checksum option.
   Every subsequent packet from both peers must include this option in
   the TCP header.  The extra overhead for adding this option is minimal
   for Jumbo frame sized segments, and the higher data integrity pays
   for itself.
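
   As an illustration only, the computation and the 6-byte option
   described above could be sketched as follows.  The sketch uses the
   common table-driven, bit-reflected form of CRC-32C (initial value and
   final XOR of 0xFFFFFFFF), which is the form used by iSCSI and SCTP.
   The names crc32c, segment_crc32c, build_crc_option, and
   HYPOTHETICAL_KIND are assumptions made for this example and are not
   defined by this draft.  The option kind is a placeholder because no
   value has been assigned, and the byte order of the value field is
   also an assumption; the normative mapping is the one in Appendix B of
   RFC 4960.

```python
import struct

# Bit-reversed form of the Castagnoli polynomial 0x1EDC6F41.
CRC32C_POLY_REFLECTED = 0x82F63B78

# Placeholder only: this draft has not been assigned an option kind.
HYPOTHETICAL_KIND = 253


def _make_table():
    """Precompute the 256-entry lookup table for byte-at-a-time CRC-32C."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Return the CRC-32C of data as an unsigned 32-bit integer."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def segment_crc32c(tcp_header: bytes, payload: bytes) -> int:
    """CRC-32C over TCP header and payload, 16-bit checksum field zeroed.

    The pseudo-header is left out, as Section 2 proposes for IPv4.
    """
    if len(tcp_header) < 20:
        raise ValueError("TCP header must be at least 20 bytes")
    header = bytearray(tcp_header)
    header[16:18] = b"\x00\x00"  # checksum field is bytes 16-17 of the header
    return crc32c(bytes(header) + payload)


def build_crc_option(value: int, kind: int = HYPOTHETICAL_KIND) -> bytes:
    """Encode Kind, Length = 6, and the 4-byte value (byte order assumed)."""
    return struct.pack("!BBI", kind, 6, value)
```

   The sketch does not settle how the option's own value field is
   treated while the CRC is computed over the header that contains it;
   the draft leaves that open, so a real implementation would need a
   rule for it (for example, zeroing the field first).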

   Note that all TCP control packets sent after successfully negotiating
   this TCP option may also carry this TCP option, although this draft
   does not mandate it.

   TCP CRC Checksum Option.

      +----------+------------+----------------------------+
      | Kind = X | Length = 6 | Value = 4 bytes of CRC 32C |
      +----------+------------+----------------------------+

                                 Figure 1

   The second approach utilizes a pair of existing TCP options called
   the "TCP Alternate Checksum Options" specified in RFC 1146 [RFC1146].
   The current checksum types specified by that option are TCP checksum,
   8-bit Fletcher's algorithm, and 16-bit Fletcher's algorithm.  A new
   checksum type can be added to this list for CRC-32C checksums.  The
   negotiation rules for selecting the checksum type would follow the
   rules described in RFC 1146.  That is, if both SYN segments carry the
   Alternate Checksum Request option, and both specify the same
   algorithm, that algorithm must be used for the remainder of the
   connection.  Otherwise, the standard TCP checksum must be used for
   the entire connection.

   Once the CRC-32C checksum algorithm is negotiated, the TCP Alternate
   Checksum Data Option is sent, carrying the 4 bytes of the CRC-32C
   checksum as its data.

   TCP Alternate Checksum Request Option

      +-----------+------------+-----------------+
      | Kind = 14 | Length = 3 | Value = CRC-32C |
      +-----------+------------+-----------------+

   Here the value for CRC-32C would need to be defined, and may possibly
   be the next undefined value '3', following the definitions for 8-bit
   and 16-bit Fletcher's algorithms.
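
   The RFC 1146 negotiation rule quoted above, together with the
   encoding of the request option, could be sketched as follows.  This
   is a minimal illustration under stated assumptions, not a definitive
   implementation: the checksum type value 3 for CRC-32C is only the
   possibility raised in this section and has not been assigned, and the
   function names are hypothetical.

```python
import struct
from typing import Optional

ALT_CHECKSUM_REQUEST_KIND = 14  # RFC 1146 Alternate Checksum Request
ALT_CHECKSUM_DATA_KIND = 15     # RFC 1146 Alternate Checksum Data
TCP_CHECKSUM = 0                # standard 16-bit TCP checksum

# Assumption: unassigned value, used here only for illustration.
HYPOTHETICAL_CRC32C_TYPE = 3


def build_request_option(checksum_type: int) -> bytes:
    """Encode Kind = 14, Length = 3, and the requested checksum type."""
    return struct.pack("!BBB", ALT_CHECKSUM_REQUEST_KIND, 3, checksum_type)


def build_data_option(crc_value: int) -> bytes:
    """Encode Kind = 15, Length = 6, and a 4-byte value (byte order assumed)."""
    return struct.pack("!BBI", ALT_CHECKSUM_DATA_KIND, 6, crc_value)


def negotiate(local_request: Optional[int],
              peer_request: Optional[int]) -> int:
    """Pick the checksum type for the connection.

    None means the SYN carried no Alternate Checksum Request option.
    The alternate algorithm is used only when both SYNs request the
    same one; otherwise the standard TCP checksum applies.
    """
    if local_request is not None and local_request == peer_request:
        return local_request
    return TCP_CHECKSUM
```

   For example, negotiate(3, 3) selects the hypothetical CRC-32C type,
   while negotiate(3, None) and negotiate(3, 2) both fall back to the
   standard TCP checksum.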

   TCP Alternate Checksum Data Option

      +-----------+------------+--------------------------------+
      | Kind = 15 | Length = 6 | Value = CRC-32C computed value |
      +-----------+------------+--------------------------------+

   The TCP Alternate Checksum Data Option must be sent only during the
   connection transfer and teardown phases.  Again, the 16-bit TCP
   checksum field must be zeroed out before computing the 32-bit CRC-32C
   code.

   One or more padding bytes may be used when sending any of the above
   options to align to a 4- or 8-byte boundary for faster parsing on
   both 32-bit and 64-bit machines.

   At this stage of draft development, the author is evaluating and
   seeking input on both approaches.

4.  IPv6 Considerations

   The TCP extension for CRC-32C can be applied equally to IPv4 and
   IPv6.  The pseudo header for IPv6 includes 128-bit source and
   destination addresses.  This pseudo header, the TCP header, and the
   payload MUST be included in the CRC-32C checksum of a TCP/IPv6
   segment, as there is no IPv6 header checksum.

5.  Conclusions and Acknowledgements

   This document proposes the use of stronger error detection codes for
   TCP connections sending Jumbo Frames.  It does not provide a solution
   for UDP-based applications.  I would also like to thank Tom Kessler
   (kessler@netapp.com) for his review comments.  He specifically
   pointed out his concerns about the safety of TCP checksum + Ethernet
   CRC at 40G and 100G speeds with even 9K jumbo frames.  He also
   provided information on the Intel instruction set that can be used to
   speed up CRC-32C computation.  Special thanks to Janet Takami
   (jtakami@netapp.com) for her comments as well as for pointing out
   that there is no IPv6 header checksum and so the pseudo header must
   be included in the CRC-32C checksum.

6.  IANA Considerations

   This memo includes a request to IANA for a new Type Number for the
   new TCP Checksum Option if we do not go with the TCP Alternate
   Checksum Option.  If we go with the TCP Alternate Checksum Option,
   then a new checksum type will need to be defined for CRC-32C,
   probably after the defined values for Fletcher's 8-bit and 16-bit
   algorithm types.

7.  Security Considerations

   The CRC-32C codes can detect unintentional changes to data such as
   those caused by noise.  If an attacker changes the data, the attacker
   can also change the error-detection code to match the changed data.
   Hence, these codes are not intended for security purposes.

8.  References

8.1.  Normative References

   [RFC1122]  Braden, R., "Requirements for Internet Hosts --
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2.  Informative References

   [Koopman]  Koopman, P., "32-Bit Cyclic Redundancy Codes for Internet
              Applications", 2002.

   [Stone]    Stone, J. and C. Partridge, "When the CRC and TCP Checksum
              Disagree".

   [RFC1146]  Zweig, J. and C. Partridge, "TCP Alternate Checksum
              Options", RFC 1146, March 1990.

   [RFC3385]  Sheinwald, D., et al., "Internet Protocol Small Computer
              System Interface (iSCSI) Cyclic Redundancy Check (CRC)/
              Checksum Considerations", RFC 3385, September 2002.

   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol",
              RFC 4960, September 2007.

Author's Address

   Anumita Biswas
   NetApp, Inc.
   495, E. Java Dr
   Sunnyvale, CA  95054
   USA

   Phone: +14088223204
   Email: anumita.biswas@netapp.com