idnits 2.17.1

draft-herbert-remotecsumoffload-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (February 29, 2016) is 2978 days in the past.  Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'LCO' is mentioned on line 166, but not defined

  == Unused Reference: 'LOC' is defined on line 447, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 793 (Obsoleted by RFC 9293)

  ** Obsolete normative reference: RFC 2460 (Obsoleted by RFC 8200)

  == Outdated reference: A later version (-05) exists of
     draft-ietf-nvo3-gue-02

  == Outdated reference: A later version (-16) exists of
     draft-ietf-nvo3-geneve-01

  == Outdated reference: A later version (-28) exists of
     draft-ietf-sfc-nsh-02

     Summary: 2 errors (**), 0 flaws (~~), 6 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2  INTERNET-DRAFT                                                        T.
Herbert
3  Intended Status: Informational                                  Facebook

5                                                         February 29, 2016

7                Remote checksum offload for encapsulation
8                   draft-herbert-remotecsumoffload-02

10  Abstract

12     This document describes remote checksum offload for encapsulation,
13     which is a mechanism that provides checksum offload of encapsulated
14     packets using rudimentary offload capabilities found in most Network
15     Interface Card (NIC) devices. The outer header checksum (e.g. that in
16     UDP or GRE) is enabled in packets and, with some additional meta
17     information, a receiver is able to deduce the checksum to be set for
18     an inner encapsulated packet. Effectively this offloads the
19     computation of the inner checksum. Enabling the outer checksum in
20     encapsulation has the additional advantage that it covers more of the
21     packet than the inner checksum, including the encapsulation headers.

23  Status of this Memo

25     This Internet-Draft is submitted to IETF in full conformance with the
26     provisions of BCP 78 and BCP 79.

28     Internet-Drafts are working documents of the Internet Engineering
29     Task Force (IETF), its areas, and its working groups. Note that
30     other groups may also distribute working documents as
31     Internet-Drafts.

33     Internet-Drafts are draft documents valid for a maximum of six months
34     and may be updated, replaced, or obsoleted by other documents at any
35     time. It is inappropriate to use Internet-Drafts as reference
36     material or to cite them other than as "work in progress."

38     The list of current Internet-Drafts can be accessed at
39     http://www.ietf.org/1id-abstracts.html

41     The list of Internet-Draft Shadow Directories can be accessed at
42     http://www.ietf.org/shadow.html

44  Copyright and License Notice
45     Copyright (c) 2016 IETF Trust and the persons identified as the
46     document authors. All rights reserved.

48     This document is subject to BCP 78 and the IETF Trust's Legal
49     Provisions Relating to IETF Documents
50     (http://trustee.ietf.org/license-info) in effect on the date of
51     publication of this document. Please review these documents
52     carefully, as they describe your rights and restrictions with respect
53     to this document. Code Components extracted from this document must
54     include Simplified BSD License text as described in Section 4.e of
55     the Trust Legal Provisions and are provided without warranty as
56     described in the Simplified BSD License.

58  Table of Contents

60     1 Introduction
61     2 Checksum offload background
62       2.1 The Internet checksum
63       2.2 Transmit checksum offload
64         2.2.1 Generic transmit offload
65         2.2.2 Local checksum offload
66         2.2.3 Protocol specific transmit offload
67       2.3 Receive checksum offload
68         2.3.1 CHECKSUM_COMPLETE
69         2.3.2 CHECKSUM_UNNECESSARY
70     3.0 Remote checksum offload
71       3.1 Option format
72       3.2 Transmit operation
73       3.3 Receiver operation
74       3.4 Interaction with TCP segmentation offload
75     4 Security Considerations
76     5 IANA Considerations
77     6 References
78       6.1 Normative References
79       6.2 Informative References
80     Authors' Addresses

82  1 Introduction

84     Checksum offload is a capability of NICs where the checksum
85     calculation for a transport layer packet (TCP, UDP, etc.) is
86     performed by a device on behalf of the host stack. Checksum offload
87     is applicable to both transmit and receive, where on transmit the
88     device writes the computed checksum into the packet, and on receive
89     the device provides the computed checksum of the packet or an
90     indication that specific transport checksums were validated. This
91     feature saves CPU cycles in the host and has become ubiquitous in
92     modern NICs.

94     A host may both source transport packets and encapsulate them for
95     transit over an underlying network. In this case checksum offload is
96     still desirable, but now must be done on an encapsulated packet. Many
97     deployed NICs are only capable of providing checksum offload for
98     simple TCP or UDP packets. Such NICs typically use protocol specific
99     mechanisms where they must parse headers in order to perform checksum
100     calculations. Updating these NICs to perform checksum offload for
101     encapsulation requires new parsing logic which is likely infeasible
102     or cost prohibitive.

104     In this specification we describe an alternative that uses
105     rudimentary NIC offload features to support offloading checksum
106     calculation of encapsulated packets. In this design, the outer
107     checksum is enabled on transmit, and meta information indicating the
108     location of the checksum field being offloaded and its starting point
109     for computation are sent with a packet. On receipt, after the outer
110     checksum is verified, the receiver sets the offloaded checksum field
111     per the computed packet checksum and the meta data.

113  2 Checksum offload background

115     In this section we provide some background into checksum offload
116     operation.

118  2.1 The Internet checksum

120     The Internet checksum [RFC0791] is used by several Internet protocols
121     including IP [RFC1122], TCP [RFC0793], UDP [RFC0768] and GRE
122     [RFC2784]. Efficient checksum calculation is critical to good
123     performance [RFC1071], and the mathematical properties are useful in
124     incrementally updating checksums [RFC1624]. An early approach to
125     implementing checksum offload in hardware is described in [RFC1936].

127     TCP and UDP checksums cover a pseudo header which is composed of the
128     source and destination addresses of the corresponding IP packet,
129     upper layer packet length, and protocol. The checksum pseudo header
130     is defined in [RFC0768] and [RFC0793] for IPv4, and in [RFC2460] for
131     IPv6.

133  2.2 Transmit checksum offload

135     In transmit checksum offload, a host network stack defers the
136     calculation and setting of a transport checksum in the packet to the
137     device. A device may provide checksum offload only for specific
138     protocols, or may provide a generic interface. In either case,
139     support for only one offloaded checksum per packet is typical.

141     When using transmit checksum offload, a host stack must initialize
142     the checksum field in the packet. This is done by setting it to zero
143     (GRE) or to the bitwise not of the pseudo header checksum (UDP or
144     TCP). The device proceeds by computing the packet checksum from the
145     start of the transport header through to the end of the packet. The
146     bitwise not of the resulting value is written in the checksum field
147     of the transport packet.

149  2.2.1 Generic transmit offload

151     A device can provide a generic interface for transmit checksum
152     offload. Checksum offload is enabled by setting two fields in the
153     transmit descriptor for a packet: start offset and checksum offset.
154     The start offset indicates the byte in the packet where the checksum
155     calculation should start.
        The checksum offset indicates the offset in
156     the packet where the checksum value is to be written.

158     The generic interface is protocol agnostic, but it only supports one
159     offloaded checksum per packet. While it is conceivable that a NIC
160     could provide offload for more checksums by defining more than one
161     checksum start/offset pair in the transmit descriptor, a more general
162     and efficient solution is Local Checksum Offload.

164  2.2.2 Local checksum offload

166     Local Checksum Offload [LCO] (or LCO) is a technique for efficiently
167     computing the outer checksum of an encapsulated datagram when the
168     inner checksum is due to be offloaded. The one's complement sum of a
169     correctly checksummed TCP or UDP packet is equal to the complement of
170     the sum of the pseudo header, since everything else gets 'cancelled
171     out' by the checksum field. This property holds since the sum was
172     complemented before being written to the checksum field. More
173     generally, this holds in any case where the Internet one's complement
174     checksum is used, and thus any checksum that generic transmit offload
175     supports. That is, if we have set up transmit checksum offload with a
176     start/offset pair, we know that after the device has filled in that
177     checksum the one's complement sum from checksum start to the end of
178     the packet will be equal to the complement of whatever value is set
179     in the checksum field beforehand. This property allows computing the
180     outer checksum without considering the payload per the algorithm:

182     1) Compute the checksum from the outer packet's checksum start
183        offset to the inner packet's checksum start offset.

185     2) Add the bit-wise not of the value initially set in the inner
186        packet's checksum field (see section 2.2).

188     3) The result is the checksum from the outer packet's start offset
189        to the end of the packet. Taking into account the pseudo header
190        for the outer checksum allows the outer checksum field to be
191        set without offload processing.
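
The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names (fold, csum, inv), the pseudo header constants, the packet layout, and the placeholder header bytes are all hypothetical and are not taken from any implementation. All offsets are assumed to be even so that 16-bit words line up. The sketch computes the outer checksum on the host from the headers alone, lets a simulated device fill in the inner checksum afterwards, and leaves a packet in which both checksums verify.

```python
import struct


def fold(total: int) -> int:
    """Fold carries back in until the one's complement sum fits in 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum(data, init: int = 0) -> int:
    """One's complement sum of 16-bit big-endian words (not inverted)."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"  # pad an odd trailing byte with zero
    total = init
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    return fold(total)


def inv(value: int) -> int:
    """Bitwise not, limited to 16 bits."""
    return ~value & 0xFFFF


# Hypothetical folded pseudo header sums; a real stack derives them from
# the IP addresses, protocol number and length.
OUTER_PSEUDO = 0x1A2B
INNER_PSEUDO = 0x3C4D

# Assumed layout: outer UDP header, 4-byte encapsulation header,
# inner IP header, inner UDP header, payload. Header bytes are placeholders.
outer_udp = struct.pack("!HHHH", 40000, 6080, 63, 0)  # checksum field zero
encap_hdr = bytes(4)
inner_ip = bytes(range(20))
# The inner checksum field is initialized as for transmit offload.
inner_udp = struct.pack("!HHHH", 1111, 2222, 31, INNER_PSEUDO)
payload = b"remote checksum offload"
pkt = bytearray(outer_udp + encap_hdr + inner_ip + inner_udp + payload)

OUTER_CSUM_OFFSET = 6
INNER_START = len(outer_udp) + len(encap_hdr) + len(inner_ip)
INNER_CSUM_OFFSET = INNER_START + 6

# Step 1: sum only the bytes in front of the inner checksum start
# (seeded with the outer pseudo header sum).
headers = csum(pkt[:INNER_START], OUTER_PSEUDO)
# Step 2: add the bitwise not of the value set in the inner field.
total = fold(headers + inv(INNER_PSEUDO))
# Step 3: the inverted result is the outer checksum; no payload was read.
struct.pack_into("!H", pkt, OUTER_CSUM_OFFSET, inv(total))

# Later, the simulated device performs generic offload of the inner checksum.
inner = inv(csum(pkt[INNER_START:]))
struct.pack_into("!H", pkt, INNER_CSUM_OFFSET, inner)

# Both checksums now verify: each sum, seeded with its pseudo header,
# comes out as 0xFFFF.
outer_ok = csum(pkt, OUTER_PSEUDO) == 0xFFFF
inner_ok = csum(pkt[INNER_START:], INNER_PSEUDO) == 0xFFFF
```

Note that the host only touches the 32 header bytes in front of the inner checksum start; the payload is summed once, by the device.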

193     Step 1) requires that some checksum calculation is performed on the
194     host stack; however, this is only done over some portion of packet
195     headers which is typically much smaller than the payload of the
196     packet.

198     LCO can be used for nested encapsulations; in this case, the outer
199     encapsulation layer will sum over both its own header and the
200     'middle' header. Thus, if the device has the capability to offload
201     an inner checksum in encapsulation, any number of outer checksums can
202     be efficiently calculated using this technique.

204  2.2.3 Protocol specific transmit offload

206     Some devices support transmit checksum offload for very specific
207     protocols. For instance, many legacy devices can only perform
208     checksum offload for UDP/IP and TCP/IP packets. These devices parse
209     transmitted packets in order to determine the checksum start and
210     checksum offset. They may also ignore the value in the checksum field
211     by setting it to zero for checksum computation and computing the
212     checksum of the pseudo header themselves.

214     Protocol specific transmit offload is limited to the protocols a
215     device supports. To support checksum offload of an encapsulated
216     packet, a device must be able to parse the encapsulation layer in
217     order to locate the inner packet.

219  2.3 Receive checksum offload

221     Upon receiving a packet, a device may perform a checksum calculation
222     over the packet or part of the packet depending on the protocol. A
223     result of this calculation is returned in the meta data of the
224     receive descriptor for the packet. The host stack can apply the
225     result in verifying checksums as it processes the packet. The intent
226     is that the offload will obviate the need for the networking stack to
227     perform its own checksum calculation over the packet.

229     There are two basic methods of receive checksum offload:
230     CHECKSUM_COMPLETE and CHECKSUM_UNNECESSARY.

232  2.3.1 CHECKSUM_COMPLETE

234     A device may calculate the checksum of a whole packet (layer 2
235     payload) and return the resultant value to the host stack. The host
236     stack can subsequently use this value to validate checksums in the
237     packet. As the packet is parsed through various layers, the
238     calculated checksum is updated to correspond to each layer (subtract
239     out checksum for preceding bytes for a given header).

241     CHECKSUM_COMPLETE is protocol agnostic and does not require any
242     protocol awareness in the device. It works for any encapsulation and
243     supports an arbitrary number of checksums in the packet.

245  2.3.2 CHECKSUM_UNNECESSARY

247     A device may explicitly validate a checksum in a packet and return a
248     flag in the receive descriptor that a transport checksum has been
249     verified (host performing checksum computation is unnecessary). Some
250     devices may be capable of validating more than one checksum in the
251     packet, in which case the device returns a count of the number
252     verified. Typically, only a positive signal is returned; if the
253     device was unable to validate a checksum it does not return any
254     information and the host will generally perform its own checksum
255     computation. If a device returns a count of validations, this must
256     refer to consecutive checksums that are present and validated in a
257     packet (checksums cannot be skipped).

259     CHECKSUM_UNNECESSARY is protocol specific; for instance, in the case
260     of UDP or TCP a device needs to consider the pseudo header in
261     checksum validation. To support checksum offload of an encapsulated
262     packet, a device must be able to parse the encapsulation layer in
263     order to locate the inner packet.

265  3.0 Remote checksum offload

267     This section describes the remote checksum offload mechanism. This is
268     primarily useful with UDP based encapsulation where the UDP checksum
269     is enabled (not set to zero on transmit).
        The same technique could be
270     applied to GRE encapsulation where the GRE checksum is enabled.

272  3.1 Option format
273     Remote checksum offload requires the sending of optional data with an
274     encapsulated packet. This data is a pair of checksum start and
275     checksum offset values. More than one offloaded checksum could be
276     supported if multiple pairs are sent.

278     The logical data format for remote checksum offload is:

280      0                   1                   2                   3
281      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
282     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
283     |       Checksum start          |      Checksum offset          |
284     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

286     o Checksum start: starting offset for checksum computation
287       relative to the start of the encapsulated packet. This is
288       typically the offset of a transport header (e.g. UDP or TCP).

290     o Checksum offset: Offset relative to the start of the
291       encapsulated packet where the derived checksum value is to be
292       written. This typically is the offset of the checksum field in
293       the transport header (e.g. UDP or TCP).

295     Support for remote checksum offload with specific encapsulation
296     protocols is outside the scope of this document; however, any
297     encapsulation format that supports some reasonable form of optional
298     meta data should be amenable. In Generic UDP Encapsulation [GUE] this
299     would entail defining an optional field; in Geneve [GENEVE] a TLV
300     would be defined; for NSH [NSH] the meta data can either be in a
301     service header or within a TLV. In any scenario, what the offsets in
302     the meta data are relative to must be unambiguous.

304  3.2 Transmit operation

306     The typical actions to set remote checksum offload on transmit are:

308     1) Transport layer creates a packet and indicates in internal
309        packet meta data that checksum is to be offloaded to the NIC
310        (normal transport layer processing for checksum offload).
           The
311        checksum field is populated with the bitwise not of the
312        checksum of the pseudo header or zero as appropriate.

314     2) Encapsulation layer adds its headers to the packet including
315        the offload meta data. The start offset and checksum offset are
316        set accordingly.

318     3) Encapsulation layer arranges for checksum offload of the outer
319        header checksum (e.g. UDP).

321     4) Packet is sent to the NIC. The NIC will perform transmit
322        checksum offload and set the checksum field in the outer
323        header. The inner header and rest of the packet are transmitted
324        without modification.

326  3.3 Receiver operation

328     The typical actions a host receiver does to support remote checksum
329     offload are:

331     1) Receive packet and validate outer checksum following normal
332        processing (e.g. validate non-zero UDP checksum).

334     2) Deduce full checksum for the IP packet. This is directly
335        provided if device returns the packet checksum in
336        CHECKSUM_COMPLETE. If the device returned CHECKSUM_UNNECESSARY,
337        then the complete checksum can be trivially derived as either
338        zero (GRE) or the bitwise not of the sum of the outer pseudo
339        header (UDP).

340     3) From the packet checksum, subtract the checksum computed from
341        the start of the packet (outer IP header) to the offset in the
342        packet indicated by checksum start in the meta data. The result
343        is the deduced checksum used to set the checksum field of the
344        encapsulated transport packet.

346        In pseudo code:

348          csum: initialized to checksum computed from start (outer IP
349             header) to the end of the packet
350          start_of_packet: address of start of packet
351          encap_payload_offset: relative to start_of_packet
352          csum_start: value from meta data
353          checksum(start, len): function to compute checksum from start
354             address for len bytes

356          csum -= checksum(start_of_packet, encap_payload_offset +
357             csum_start)

359     4) Write the bitwise not of the resultant checksum value into the
360        packet at the offset provided by checksum offset in the meta data.

362        In pseudo code:

364          csum_offset: offset of checksum field

366          *(start_of_packet + encap_payload_offset +
367             csum_offset) = ~csum

369     5) Checksum is verified at the transport layer using normal
370        processing. This should not require any checksum computation
371        over the packet since the complete checksum has already been
372        provided.

374  3.4 Interaction with TCP segmentation offload

376     Remote checksum offload may be useful with TCP Segmentation Offload
377     (TSO) in order to avoid host checksum calculations at the receiver.
378     This can be implemented on a transmitter as follows:

380     1) Host stack prepares a large segment for transmission including
381        adding of encapsulation headers and the remote checksum option
382        which refers to the encapsulated transport checksum in the
383        large segment.

385     2) TSO is performed by the device taking encapsulation into
386        account. The outer checksum is computed and written for each
387        packet. The inner checksum is not computed, and the
388        encapsulation header (including checksum meta data) is
389        replicated for each packet.

391     3) At the receiver remote checksum offload processing occurs as
392        normal for each packet.

394  4 Security Considerations

396     Remote checksum offload should not impact protocol security.

398  5 IANA Considerations

400     There are no IANA considerations in this specification.
        The remote
401     checksum offload meta data may require an option number or type in
402     specific encapsulation formats that support it.

404  6 References

406  6.1 Normative References

408     [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, September
409               1981.

411     [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
412               Communication Layers", STD 3, RFC 1122, October 1989.

414     [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC
415               793, September 1981.

417     [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768,
418               August 1980.

420     [RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina,
421               "Generic Routing Encapsulation (GRE)", RFC 2784, March
422               2000.

424     [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6
425               (IPv6) Specification", RFC 2460, December 1998.

427  6.2 Informative References

429     [RFC1071] Braden, R., Borman, D., and C. Partridge, "Computing the
430               Internet checksum", RFC 1071, September 1988.

432     [RFC1624] Rijsinghani, A., Ed., "Computation of the Internet Checksum
433               via Incremental Update", RFC 1624, May 1994.

435     [RFC1936] Touch, J. and B. Parham, "Implementing the Internet
436               Checksum in Hardware", RFC 1936, April 1996.

438     [GUE]     Herbert, T., Yong, L., and Zia, O., "Generic UDP
439               Encapsulation", draft-ietf-nvo3-gue-02

441     [GENEVE]  Gross, J. and Ganga, I., "Geneve: Generic Network
442               Virtualization Encapsulation", draft-ietf-nvo3-geneve-01,
443               January 1, 2016
444     [NSH]     Quinn, P. and Elzur, U., "Network Service Header",
445               draft-ietf-sfc-nsh-02.txt, January 19, 2016

447     [LOC]     Cree, E., "Checksum Offloads in the Linux Networking Stack",
448               Linux documentation:
449               Documentation/networking/checksum-offloads.txt

451  Authors' Addresses

453     Tom Herbert
454     Facebook
455     1 Hacker Way
456     Menlo Park, CA
457     US

459     EMail: tom@herbertland.com
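
As an illustration of the receiver operation in section 3.3, the following Python sketch walks through steps 1) to 5) on a packet built the way section 3.2 describes. It is a sketch under stated assumptions, not a definitive implementation: the helper names, the pseudo header constants, the placeholder header bytes, and the choice to carry the checksum start/offset pair as the entire 4-byte encapsulation header are all hypothetical. Offsets are assumed to be even, and the value written in step 4) is the bitwise not of the deduced sum, mirroring the transmit rule in section 2.2.

```python
import struct


def fold(total: int) -> int:
    """Fold carries back in until the one's complement sum fits in 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum(data, init: int = 0) -> int:
    """One's complement sum of 16-bit big-endian words (not inverted)."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"  # pad an odd trailing byte with zero
    total = init
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    return fold(total)


def inv(value: int) -> int:
    """Bitwise not, limited to 16 bits."""
    return ~value & 0xFFFF


# Hypothetical folded pseudo header sums (normally derived from the
# IP addresses, protocol number and length).
OUTER_PSEUDO = 0x1A2B
INNER_PSEUDO = 0x3C4D

# --- Sender (section 3.2); header bytes are placeholders ---
outer_ip = bytes(range(20))
outer_udp = struct.pack("!HHHH", 40000, 6080, 63, 0)
encap_hdr = struct.pack("!HH", 20, 26)  # checksum start, checksum offset
inner_ip = bytes(range(20, 40))
inner_udp = struct.pack("!HHHH", 1111, 2222, 31, INNER_PSEUDO)
pkt = bytearray(outer_ip + outer_udp + encap_hdr + inner_ip + inner_udp
                + b"remote checksum offload")
OUTER_UDP = 20             # offset of the outer UDP header
ENCAP_PAYLOAD_OFFSET = 32  # offset of the encapsulated packet
# The NIC offloads only the outer checksum; the inner field is untouched.
struct.pack_into("!H", pkt, OUTER_UDP + 6,
                 inv(csum(pkt[OUTER_UDP:], OUTER_PSEUDO)))

# --- Receiver (section 3.3) ---
# 1) Validate the outer checksum.
outer_ok = csum(pkt[OUTER_UDP:], OUTER_PSEUDO) == 0xFFFF
# 2) Full packet checksum, as CHECKSUM_COMPLETE would report it.
complete = csum(pkt)
# 3) Subtract the sum of everything in front of checksum start.
csum_start, csum_offset = struct.unpack_from("!HH", pkt, OUTER_UDP + 8)
head = csum(pkt[:ENCAP_PAYLOAD_OFFSET + csum_start])
deduced = fold(complete + inv(head))  # one's complement subtraction
# 4) Write the inverted result into the inner checksum field.
struct.pack_into("!H", pkt, ENCAP_PAYLOAD_OFFSET + csum_offset, inv(deduced))
# 5) Normal transport layer verification now succeeds.
transport_ok = csum(pkt[ENCAP_PAYLOAD_OFFSET + csum_start:],
                    INNER_PSEUDO) == 0xFFFF
```

The receiver sums only the 52 bytes of headers in front of checksum start; the payload sum comes entirely from the value the device reported.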