INTERNET-DRAFT                                                T. Herbert
Intended Status: Experimental                                     Google
Expires: May 2015                                      November 12, 2014

               Remote checksum offload for encapsulation
                   draft-herbert-remotecsumoffload-01

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Abstract

   This specification describes remote checksum offload, which is a
   mechanism that provides checksum offload of transport checksums in
   encapsulated packets using rudimentary offload capabilities found in
   most Network Interface Card (NIC) devices. The outer header checksum
   (e.g. that in UDP or GRE) is enabled in packets and, with some
   additional meta information, a receiver is able to deduce the
   checksum to be set in an encapsulated packet. Effectively this
   offloads the computation of the inner checksum. Enabling the outer
   checksum in encapsulation has the additional advantage that it covers
   more of the packet than the inner checksum, including the
   encapsulation headers.

Table of Contents

   1  Introduction
   2  Checksum offload background
     2.1  The Internet checksum
     2.2  Transmit checksum offload
       2.2.1  Generic transmit offload
       2.2.2  Protocol specific transmit offload
     2.3  Receive checksum offload
       2.3.1  Checksum-complete
       2.3.2  Checksum-unnecessary
       2.3.3  Checksum-unnecessary conversion
   3  Remote checksum offload
     3.1  Meta data format
     3.2  Transmitter operation
     3.3  Receiver operation
     3.4  Interaction with TCP segmentation offload
   4  Remote checksum offload for Generic UDP Encapsulation
   5  Security Considerations
   6  IANA Considerations
   7  References
     7.1  Normative References
     7.2  Informative References
   Authors' Addresses

1  Introduction

   Checksum offload is a capability of NICs where the checksum
   calculation for a transport layer packet (TCP, UDP, etc.) is
   performed by a device on behalf of the host stack. Checksum offload
   is applicable to both transmit and receive, where on transmit the
   device writes the computed checksum into the packet, and on receive
   the device provides the computed checksum of the packet or an
   indication that specific transport checksums were validated.
   This feature saves CPU cycles in the host and has become ubiquitous
   in modern NICs.

   A host may both source transport packets and encapsulate them for
   transit over an underlying network. In this case, checksum offload is
   still desirable, but now must be done on an encapsulated packet. Many
   deployed NICs are only capable of providing checksum offload for
   simple TCP or UDP packets. Such NICs typically use protocol specific
   mechanisms where they must parse headers in order to perform checksum
   calculations. Updating these NICs to perform checksum offload for
   encapsulation requires new parsing logic, which is likely infeasible
   or cost prohibitive.

   In this specification we describe an alternative that uses
   rudimentary NIC offload features to support offloading checksum
   calculation of encapsulated packets. In this design, the outer
   checksum is enabled on transmit, and meta information indicating the
   location of the checksum field being offloaded and its starting point
   for computation are sent with a packet. On receipt, after the outer
   checksum is verified, the receiver sets the offloaded checksum field
   per the computed packet checksum and the meta data.

2  Checksum offload background

   In this section we provide some background on checksum offload
   operation.

2.1  The Internet checksum

   The Internet checksum [RFC0791] is used by several Internet protocols
   including IP [RFC1122], TCP [RFC0793], UDP [RFC0768] and GRE
   [RFC2784]. Efficient checksum calculation is critical to good
   performance [RFC1071], and the mathematical properties are useful in
   incrementally updating checksums [RFC1624]. An early approach to
   implementing checksum offload in hardware is described in [RFC1936].

   TCP and UDP checksums cover a pseudo header which is composed of the
   source and destination addresses of the corresponding IP packet,
   layer 4 packet length, and protocol.
   The checksum pseudo header is defined in [RFC0768] and [RFC0793] for
   IPv4, and in [RFC2460] for IPv6.

2.2  Transmit checksum offload

   In transmit checksum offload, a host networking stack defers the
   calculation and setting of a transport checksum in the packet to the
   device. A device may provide checksum offload only for specific
   protocols, or may provide a generic interface. In either case, only
   one offloaded checksum per packet is typical.

   When using transmit checksum offload, a host stack must initialize
   the checksum field in the packet. This is done by setting the field
   to zero (GRE) or to the bitwise "not" of the pseudo header checksum
   (UDP or TCP). The device proceeds by computing the packet checksum
   from the start of the transport header through to the end of the
   packet. The resulting value is written in the checksum field of the
   transport packet.

2.2.1  Generic transmit offload

   A device can provide a generic interface for transmit checksum
   offload. Checksum offload is enabled by setting two fields in the
   transmit descriptor for a packet: start offset and checksum offset.
   The start offset indicates the byte in the packet where the checksum
   calculation should start. The checksum offset indicates the offset in
   the packet where the checksum value is to be written.

   The generic interface is protocol agnostic; however, it supports only
   one offloaded checksum per packet. It is conceivable that a NIC could
   provide offload for more checksums by defining more than one
   (checksum start, checksum offset) pair in the transmit descriptor.

2.2.2  Protocol specific transmit offload

   Some devices support transmit checksum offload for very specific
   protocols. For instance, many legacy devices can only perform
   checksum offload for UDP/IP and TCP/IP packets. These devices parse
   transmitted packets in order to determine the checksum start and
   checksum offset.
   They may also ignore the value in the checksum field by setting it to
   zero for checksum computation and computing the pseudo header
   checksum themselves.

   Protocol specific transmit offload is limited to the protocols a
   device supports. To support checksum offload of an encapsulated
   packet, a device must be able to parse the encapsulation layer in
   order to locate the inner packet.

2.3  Receive checksum offload

   Upon receiving a packet, a device may perform a checksum calculation
   over the packet or part of the packet depending on the protocol. The
   result of this calculation is returned in the meta data of the
   receive descriptor for the packet. The host stack can apply the
   result in verifying checksums as it processes the packet. The intent
   is that the offload will obviate the need for the networking stack to
   perform its own checksum calculation for the packet.

   There are two basic methods of receive checksum offload:
   checksum-complete and checksum-unnecessary.

2.3.1  Checksum-complete

   A device may calculate the checksum of a whole packet (layer 2
   payload) and return the resultant value to the host stack. The host
   stack can subsequently use this value to validate checksums in the
   packet. As the packet is parsed through various layers, the
   calculated checksum is updated to correspond to each layer (the
   checksum of the bytes preceding a given header is subtracted out).

   Checksum-complete is protocol agnostic and does not require any
   protocol awareness in the device. It works for any encapsulation and
   supports an arbitrary number of checksums in the packet.

2.3.2  Checksum-unnecessary

   A device may explicitly validate a checksum in a packet and return a
   flag in the receive descriptor that a transport checksum has been
   verified (checksum computation by the host is unnecessary).
   Some devices may be capable of validating more than one checksum in
   the packet, in which case the device returns a count of the number
   verified. Typically, only a positive signal is returned; if the
   device was unable to validate a checksum, it does not return any
   information and the host will generally perform its own checksum
   computation. If a device returns a count of validations, this must
   refer to consecutive checksums that are present and validated in a
   packet (checksums cannot be skipped).

   Checksum-unnecessary is protocol specific; for instance, in the case
   of UDP or TCP a device needs to consider the pseudo header in
   checksum validation. To support checksum offload of an encapsulated
   packet, a device must be able to parse the encapsulation layer in
   order to locate the inner packet.

2.3.3  Checksum-unnecessary conversion

   If a device returns checksum-unnecessary for a non-zero checksum, the
   checksum-complete value can easily be derived as the bitwise "not" of
   the pseudo header checksum. This is useful in the case that the
   device has verified the outermost checksum of the packet, and there
   are checksums in an encapsulated packet to be verified.

3  Remote checksum offload

   This section describes the remote checksum offload mechanism. This is
   primarily useful with UDP based encapsulation where the UDP checksum
   is enabled (not set to zero on transmit). The same technique could be
   applied to GRE encapsulation where the GRE checksum is enabled.

3.1  Meta data format

   Remote checksum offload requires the sending of meta data with an
   encapsulated packet. This data is a pair of checksum start and
   checksum offset values. More than one offloaded checksum could be
   supported if multiple pairs are sent.

   Remote checksum offload will typically be implemented as a remote
   checksum option in the encapsulation headers.
   Any encapsulation format that allows optional data for extensibility
   should be able to support remote checksum offload. The format of the
   remote checksum offload option is diagrammed below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Checksum start         |        Checksum offset        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   o Checksum start: starting offset for checksum computation
     relative to the start of the encapsulated payload. This is
     typically the offset of a transport header (e.g. UDP or TCP).

   o Checksum offset: Offset where the derived checksum value is to
     be written relative to the start of encapsulated payload. This
     typically is the offset of the checksum field in the transport
     header (e.g. UDP or TCP).

3.2  Transmitter operation

   The typical actions to set remote checksum offload on transmit are:

   1) Transport layer creates a packet and indicates in internal packet
      meta data that checksum is to be offloaded to the NIC (normal
      transport layer processing for checksum offload). The checksum
      field is populated with the bitwise "not" of the checksum of the
      pseudo header or zero as appropriate.

   2) Encapsulation layer adds its headers to the packet including the
      remote checksum offload option. The start offset and checksum
      offset are set accordingly.

   3) Encapsulation layer arranges for checksum offload of the outer
      header checksum (e.g. UDP). This supersedes the settings to
      offload the inner packet's transport checksum.

   4) Packet is sent to the NIC. The NIC will perform transmit checksum
      offload and set the checksum field in the outer header. The inner
      header and rest of the packet are transmitted without
      modification.

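   As an illustration only, steps 1 and 2 of the transmitter list can be
   sketched in Python. Everything named here is hypothetical and not
   part of this specification: the helpers ones_sum,
   seed_inner_checksum and remcsum_option, an inner IPv4/TCP packet
   without IP options, and network byte order for the two 16-bit option
   fields. The sketch also assumes that "the bitwise not of the checksum
   of the pseudo header" means the uninverted ones-complement sum of the
   pseudo header. Steps 3 and 4 are performed by the NIC and are not
   modelled.

```python
import struct

TCP_CSUM_FIELD = 16  # offset of the checksum field within a TCP header


def ones_sum(data: bytes) -> int:
    """Folded 16-bit ones-complement sum of data (not inverted)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def seed_inner_checksum(inner: bytearray, l4_offset: int, proto: int = 6) -> None:
    """Step 1: store the IPv4 pseudo header sum in the inner checksum field.

    Assumption: the stored value is the uninverted pseudo header sum,
    i.e. the bitwise "not" of the complemented pseudo header checksum.
    """
    addresses = bytes(inner[12:20])  # inner IPv4 source and destination
    l4_len = len(inner) - l4_offset  # transport header plus data
    pseudo = addresses + struct.pack("!BBH", 0, proto, l4_len)
    struct.pack_into("!H", inner, l4_offset + TCP_CSUM_FIELD, ones_sum(pseudo))


def remcsum_option(l4_offset: int, field_offset: int = TCP_CSUM_FIELD) -> bytes:
    """Step 2: pack checksum start and checksum offset, 16 bits each.

    Both values are relative to the start of the encapsulated payload.
    """
    return struct.pack("!HH", l4_offset, l4_offset + field_offset)
```

   With a 20-byte inner IPv4 header, remcsum_option(20) encodes a
   checksum start of 20 and a checksum offset of 36, which is where the
   TCP checksum field sits relative to the encapsulated payload.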
3.3  Receiver operation

   The typical actions a host receiver does to support remote checksum
   offload are:

   1) Receive packet and validate outer checksum following normal
      processing (e.g. validate non-zero UDP checksum).

   2) Deduce full checksum for the IP packet. This is directly provided
      if the device returns the packet checksum in checksum-complete;
      otherwise it can be derived by checksum-unnecessary conversion.

   3) From the packet checksum, subtract the checksum computed from the
      start of the packet (outer IP header) to the offset in the packet
      indicated by checksum start in the remote checksum offload option.
      The result is the deduced checksum to set in the checksum field of
      the encapsulated transport packet.

   4) Write the resultant checksum value into the packet at the offset
      provided by checksum offset in the remote checksum offload option.

   5) Adjust the packet checksum to account for changing the checksum
      field within the packet.

   6) Checksum is verified at the transport layer using normal
      processing. This should not require any checksum computation over
      the packet since the complete checksum has already been provided.

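   Steps 3 to 5 of the receiver list can also be tried out with a small
   runnable Python sketch. It is an illustration under stated
   assumptions, not a definitive implementation: the helper names
   (ones_sum, csum_add, csum_sub, remcsum_receive) are hypothetical,
   packet_csum is taken to be the uninverted ones-complement sum of the
   whole packet, the value written to the checksum field is therefore
   the bitwise complement of the derived sum, and all offsets are
   assumed to be even so that 16-bit words stay aligned.

```python
import struct


def ones_sum(data: bytes) -> int:
    """Folded 16-bit ones-complement sum of data (not inverted)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum_add(a: int, b: int) -> int:
    """Ones-complement addition of two 16-bit values."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def csum_sub(a: int, b: int) -> int:
    """Ones-complement subtraction: add the complement of b."""
    return csum_add(a, ~b & 0xFFFF)


def remcsum_receive(packet: bytearray, packet_csum: int,
                    offset_encap_payload: int,
                    csum_start: int, csum_offset: int) -> int:
    """Apply steps 3 to 5 in place; return the adjusted packet checksum."""
    start = offset_encap_payload + csum_start   # assumed even
    field = offset_encap_payload + csum_offset  # assumed even

    # Step 3: remove everything before checksum start from the packet sum.
    region_sum = csum_sub(packet_csum, ones_sum(bytes(packet[:start])))
    new = ~region_sum & 0xFFFF  # checksum fields carry the complement

    # Step 4: write the derived checksum into the inner transport header.
    old = struct.unpack_from("!H", packet, field)[0]
    struct.pack_into("!H", packet, field, new)

    # Step 5: account for the changed field in the packet checksum.
    return csum_add(packet_csum, csum_sub(new, old))


# Usage with made-up bytes: 40 bytes of outer headers, a 20-byte inner
# IP header, a 20-byte TCP header and 24 bytes of data.
pseudo = bytes([10, 0, 0, 1, 10, 0, 0, 2, 0, 6, 0, 44])
tcp = bytearray(range(50, 70))
tcp[16:18] = ones_sum(pseudo).to_bytes(2, "big")  # seeded by the sender
packet = bytearray(bytes(range(40)) + bytes(range(100, 120))
                   + bytes(tcp) + b"remote checksum offload!")
adjusted = remcsum_receive(packet, ones_sum(bytes(packet)), 40, 20, 36)
# Step 6: pseudo header plus transport region must sum to 0xFFFF.
print(ones_sum(pseudo + bytes(packet[60:])) == 0xFFFF)
```

   The final line performs the transport layer check of step 6 the slow
   way, by summing the pseudo header and the transport region, purely to
   show that the value written in step 4 is the one a conventional
   checksum computation would have produced.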
   Steps 3, 4, and 5 in pseudo code:

      packet_csum: checksum computed by receiver covering the start
         of the packet (outer IP header) to the end of the packet

      start_of_packet: memory address of start of packet

      offset_encap_payload: offset of encapsulation payload relative
         to start_of_packet

      csum_start, csum_offset: values from remote checksum offload
         option

      checksum(start, len): function to compute checksum from start
         address for len bytes

      // Compute packet checksum starting from checksum start value
      // (1's complement arithmetic)
      csum = packet_csum - checksum(start_of_packet,
                                    offset_encap_payload + csum_start)

      // Set derived checksum in the checksum field
      old = *(start_of_packet + offset_encap_payload + csum_offset)
      *(start_of_packet + offset_encap_payload + csum_offset) = csum

      // Adjust packet checksum (1's complement arithmetic)
      packet_csum += (csum - old)

3.4  Interaction with TCP segmentation offload

   Remote checksum offload may be useful with TCP Segmentation Offload
   (TSO) in order to avoid host checksum calculation at the receiver.
   This can be implemented on a transmitter as follows:

   1) Host stack prepares a large segment for transmission including
      encapsulation headers and the remote checksum option which refers
      to the encapsulated transport checksum in the large segment.

   2) TSO is performed by the device taking encapsulation into account.
      The outer checksum is computed and written for each packet. The
      inner checksum is not computed, and the encapsulation header
      (including checksum meta data) is replicated for each packet.

   3) At the receiver, remote checksum offload processing occurs as
      normal for each packet.

4  Remote checksum offload for Generic UDP Encapsulation

   Remote checksum offload in Generic UDP Encapsulation [GUE] is
   supported with the addition of a remote checksum option.
   The GUE header format below illustrates the remote checksum option as
   a private field.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source port          |       Destination port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0x0|C|   Hlen  |  Proto/ctype  |            Flags            |P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                       Fields (optional)                       ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R|                   Private flags(optional)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Checksum start         |        Checksum offset        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                   Private fields (optional)                   ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Pertinent fields are described below:

   o Hlen: GUE header length. The offset of the encapsulated payload
     is Hlen * 4 + 4.

   o P bit: Set to one to indicate presence of private options.

   o R bit: Private flag bit that indicates presence of the remote
     checksum option. The remote checksum option is four bytes in
     length.

   o Checksum start: Offset of start of checksum computation for
     remote checksum offload. This is relative to the encapsulated
     payload whose offset is provided by Hlen.

   o Checksum offset: Offset to write the checksum which is computed
     by the receiver. This is relative to the encapsulated payload
     whose offset is provided by Hlen.

5  Security Considerations

   Remote checksum offload should not impact protocol security.

6  IANA Considerations

   There are no IANA considerations in this specification.
   The remote checksum offload meta data may require an option number or
   type in specific encapsulation formats that support it.

7  References

7.1  Normative References

   [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, September
             1981.

   [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
             Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC
             793, September 1981.

   [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768,
             August 1980.

   [RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina,
             "Generic Routing Encapsulation (GRE)", RFC 2784, March
             2000.

   [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6
             (IPv6) Specification", RFC 2460, December 1998.

7.2  Informative References

   [RFC1071] Braden, R., Borman, D., and C. Partridge, "Computing the
             Internet checksum", RFC 1071, September 1988.

   [RFC1624] Rijsinghani, A., Ed., "Computation of the Internet Checksum
             via Incremental Update", RFC 1624, May 1994.

   [RFC1936] Touch, J. and B. Parham, "Implementing the Internet
             Checksum in Hardware", RFC 1936, April 1996.

   [GUE]     Generic UDP Encapsulation draft-herbert-gue-02

Authors' Addresses

   Tom Herbert
   Google
   1600 Amphitheatre Parkway
   Mountain View, CA
   EMail: therbert@google.com