INTERNET-DRAFT                                                T. Herbert
Intended Status: Informational                                    Google
Expires: February 2015                                   August 27, 2014


               Remote checksum offload for encapsulation
                   draft-herbert-remotecsumoffload-00


Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html


Copyright and License Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Abstract

   This specification describes remote checksum offload for
   encapsulation, which is a mechanism that provides checksum offload of
   encapsulated packets using rudimentary offload capabilities found in
   most Network Interface Card (NIC) devices. The outer header
   checksum (e.g.
   that in UDP or GRE) is enabled in packets and, with some additional
   meta information, a receiver is able to deduce the checksum to be set
   for an inner encapsulated packet. Effectively this offloads the
   computation of the inner checksum. Enabling the outer checksum in
   encapsulation has the additional advantage that it covers more of the
   packet than the inner checksum, including the encapsulation headers.


Table of Contents

   1 Introduction
   2 Checksum offload background
     2.1 The Internet checksum
     2.2 Transmit checksum offload
       2.2.1 Generic transmit offload
       2.2.2 Protocol specific transmit offload
     2.3 Receive checksum offload
       2.3.1 CHECKSUM_COMPLETE
       2.3.2 CHECKSUM_UNNECESSARY
   3 Remote checksum offload
     3.1 Meta data format
     3.2 Transmit operation
     3.3 Receiver operation
     3.4 Interaction with TCP segmentation offload
   4 Security Considerations
   5 IANA Considerations
   6 References
     6.1 Normative References
     6.2 Informative References
   Authors' Addresses


1 Introduction

   Checksum offload is a capability of NICs where the checksum
   calculation for a transport layer packet (TCP, UDP, etc.)
   is performed by a device on behalf of the host stack. Checksum
   offload is applicable to both transmit and receive: on transmit the
   device writes the computed checksum into the packet, and on receive
   the device provides the computed checksum of the packet or an
   indication that specific transport checksums were validated. This
   feature saves CPU cycles in the host and has become ubiquitous in
   modern NICs.

   A host may both source transport packets and encapsulate them for
   transit over an underlying network. In this case, checksum offload is
   still desirable, but must now be performed on an encapsulated packet.
   Many deployed NICs are only capable of providing checksum offload for
   simple TCP or UDP packets. Such NICs typically use protocol specific
   mechanisms where they must parse headers in order to perform checksum
   calculations. Updating these NICs to perform checksum offload for
   encapsulation requires new parsing logic, which is likely infeasible
   or cost prohibitive.

   In this specification we describe an alternative that uses
   rudimentary NIC offload features to support offloading checksum
   calculation of encapsulated packets. In this design, the outer
   checksum is enabled on transmit, and meta information indicating the
   location of the checksum field being offloaded and the starting point
   for its computation is sent with the packet. On receipt, after the
   outer checksum is verified, the receiver sets the offloaded checksum
   field using the computed packet checksum and the meta data.


2 Checksum offload background

   In this section we provide some background on checksum offload
   operation.

2.1 The Internet checksum

   The Internet checksum [RFC0791] is used by several Internet protocols
   including IP [RFC1122], TCP [RFC0793], UDP [RFC0768] and GRE
   [RFC2784].
   Efficient checksum calculation is critical to good performance
   [RFC1071], and the mathematical properties of the checksum are useful
   for incrementally updating checksums [RFC1624]. An early approach to
   implementing checksum offload in hardware is described in [RFC1936].

   TCP and UDP checksums cover a pseudo header which is composed of the
   source and destination addresses of the corresponding IP packet, the
   upper layer packet length, and the protocol. The checksum pseudo
   header is defined in [RFC0768] and [RFC0793] for IPv4, and in
   [RFC2460] for IPv6.

2.2 Transmit checksum offload

   In transmit checksum offload, a host networking stack defers the
   calculation and setting of a transport checksum in the packet to the
   device. A device may provide checksum offload only for specific
   protocols, or may provide a generic interface. In either case,
   typically only one checksum per packet can be offloaded.

   When using transmit checksum offload, a host stack must initialize
   the checksum field in the packet, either to zero (GRE) or to the
   bitwise not of the checksum of the pseudo header (UDP or TCP). The
   device then computes the packet checksum from the start of the
   transport header through to the end of the packet, and writes the
   bitwise not of the resulting value into the checksum field of the
   transport packet.

2.2.1 Generic transmit offload

   A device can provide a generic interface for transmit checksum
   offload. Checksum offload is enabled by setting two fields in the
   transmit descriptor for a packet: start offset and checksum offset.
   The start offset indicates the byte in the packet where the checksum
   calculation should start. The checksum offset indicates the offset in
   the packet where the checksum value is to be written.

   The generic interface is protocol agnostic; however, it supports only
   one offloaded checksum per packet.
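   The generic interface can be modelled in a few lines. The C sketch
   below is illustrative only: csum_partial() and nic_tx_csum() are
   hypothetical names rather than a real driver or device API, the
   packet is assumed to be a flat byte buffer, and the checksum field is
   assumed to be stored in network byte order. The host is assumed to
   have seeded the field already (zero for GRE, or the one's complement
   sum of the pseudo header, i.e. the bitwise not of its checksum, for
   UDP or TCP).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Folded 16-bit one's complement sum (RFC 1071) of len bytes, added to
 * an initial sum. Bytes are read as big-endian 16-bit words; an odd
 * trailing byte is padded with zero. */
static uint16_t csum_partial(const uint8_t *buf, size_t len,
                             uint16_t sum)
{
    uint64_t acc = sum;

    for (size_t i = 0; i + 1 < len; i += 2)
        acc += (uint32_t)buf[i] << 8 | buf[i + 1];
    if (len & 1)
        acc += (uint32_t)buf[len - 1] << 8;
    while (acc >> 16)                    /* end-around carry */
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)acc;
}

/* Hypothetical model of a NIC's generic transmit checksum offload.
 * The device sums from csum_start to the end of the packet (including
 * whatever seed the host left in the checksum field), then writes the
 * bitwise not of that sum at csum_offset in network byte order.
 * Returns the value written. */
static uint16_t nic_tx_csum(uint8_t *pkt, size_t len,
                            size_t csum_start, size_t csum_offset)
{
    uint16_t csum;

    assert(csum_start <= len && csum_offset + 2 <= len);
    csum = (uint16_t)~csum_partial(pkt + csum_start,
                                   len - csum_start, 0);
    pkt[csum_offset] = (uint8_t)(csum >> 8);
    pkt[csum_offset + 1] = (uint8_t)(csum & 0xff);
    return csum;
}
```

   With the field seeded to zero, summing the same range again after
   nic_tx_csum() yields 0xffff (negative zero in one's complement),
   which is the condition a receiver checks; with a pseudo header seed,
   the receiver adds the pseudo header sum before making that check.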

   It is conceivable that a NIC could provide offload for more checksums
   by defining more than one (checksum start, checksum offset) pair in
   the transmit descriptor.

2.2.2 Protocol specific transmit offload

   Some devices support transmit checksum offload only for specific
   protocols. For instance, many legacy devices can only perform
   checksum offload for UDP/IP and TCP/IP packets. These devices parse
   transmitted packets in order to determine the checksum start and
   checksum offset. They may also ignore the value in the checksum
   field, treating it as zero for the checksum computation and computing
   the checksum of the pseudo header themselves.

   Protocol specific transmit offload is limited to the protocols a
   device supports. To support checksum offload of an encapsulated
   packet, a device must be able to parse the encapsulation layer in
   order to locate the inner packet.

2.3 Receive checksum offload

   Upon receiving a packet, a device may perform a checksum calculation
   over the packet or part of the packet depending on the protocol. The
   result of this calculation is returned in the meta data of the
   receive descriptor for the packet. The host stack can apply the
   result in verifying checksums as it processes the packet. The intent
   is that the offload will obviate the need for the networking stack to
   perform its own checksum calculation over the packet.

   There are two basic methods of receive checksum offload:
   CHECKSUM_COMPLETE and CHECKSUM_UNNECESSARY.

2.3.1 CHECKSUM_COMPLETE

   A device may calculate the checksum of a whole packet (layer 2
   payload) and return the resultant value to the host stack. The host
   stack can subsequently use this value to validate checksums in the
   packet. As the packet is parsed through various layers, the
   calculated checksum is updated to correspond to each layer by
   subtracting out the checksum of the bytes that precede it (that is,
   of each header as it is removed).
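   The per-layer update works because one's complement sums can be
   subtracted. A minimal C sketch, assuming the device reports the
   folded 16-bit one's complement sum of the whole layer 2 payload;
   csum_partial(), csum_sub() and csum_pull() are hypothetical helpers
   (the first two are named after Linux kernel helpers but are not
   their implementations), and headers are assumed to have even lengths
   so that 16-bit word alignment is preserved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Folded 16-bit one's complement sum (RFC 1071) of len bytes, added to
 * an initial sum. Bytes are read as big-endian 16-bit words; an odd
 * trailing byte is padded with zero. */
static uint16_t csum_partial(const uint8_t *buf, size_t len,
                             uint16_t sum)
{
    uint64_t acc = sum;

    for (size_t i = 0; i + 1 < len; i += 2)
        acc += (uint32_t)buf[i] << 8 | buf[i + 1];
    if (len & 1)
        acc += (uint32_t)buf[len - 1] << 8;
    while (acc >> 16)                    /* end-around carry */
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)acc;
}

/* One's complement subtraction: a - b is computed as a + ~b with an
 * end-around carry. */
static uint16_t csum_sub(uint16_t a, uint16_t b)
{
    uint32_t acc = (uint32_t)a + (uint16_t)~b;

    return (uint16_t)((acc & 0xffff) + (acc >> 16));
}

/* Model of the host stack pulling a header of hdr_len bytes off the
 * front of the packet: the CHECKSUM_COMPLETE value that covered the
 * header is updated so that it covers only the bytes after it. */
static uint16_t csum_pull(uint16_t complete, const uint8_t *hdr,
                          size_t hdr_len)
{
    assert(hdr_len % 2 == 0);   /* keeps 16-bit word alignment */
    return csum_sub(complete, csum_partial(hdr, hdr_len, 0));
}
```

   After pulling a header this way, the result equals csum_partial()
   computed directly over the remaining bytes, so the host does not
   need to re-read the payload to validate the next layer's checksum.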

   CHECKSUM_COMPLETE is protocol agnostic and does not require any
   protocol awareness in the device. It works for any encapsulation and
   supports an arbitrary number of checksums in the packet.

2.3.2 CHECKSUM_UNNECESSARY

   A device may explicitly validate a checksum in a packet and return a
   flag in the receive descriptor indicating that a transport checksum
   has been verified (so checksum computation by the host is
   unnecessary). Some devices may be capable of validating more than one
   checksum in the packet, in which case the device returns a count of
   the number verified. Typically, only a positive signal is returned:
   if the device was unable to validate a checksum, it does not return
   any information and the host will generally perform its own checksum
   computation. If a device returns a count of validations, this must
   refer to consecutive checksums that are present and validated in a
   packet (checksums cannot be skipped).

   CHECKSUM_UNNECESSARY is protocol specific; for instance, in the case
   of UDP or TCP a device needs to consider the pseudo header in
   checksum validation. To support checksum offload of an encapsulated
   packet, a device must be able to parse the encapsulation layer in
   order to locate the inner packet.


3 Remote checksum offload

   This section describes the remote checksum offload mechanism. This is
   primarily useful with UDP based encapsulation where the UDP checksum
   is enabled (not set to zero on transmit). The same technique could be
   applied to GRE encapsulation where the GRE checksum is enabled.

3.1 Meta data format

   Remote checksum offload requires the sending of meta data with an
   encapsulated packet. This data is a pair of checksum start and
   checksum offset values. More than one offloaded checksum could be
   supported if multiple pairs are sent.
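   As a sketch of how a receiver might apply one (checksum start,
   checksum offset) pair, the C code below combines receiver steps 3
   and 4 of Section 3.3. It assumes: the packet is a flat byte buffer;
   complete is the folded one's complement sum of the whole packet from
   its first byte (as with CHECKSUM_COMPLETE); both meta data fields are
   16-bit values in network byte order (this draft does not state a
   byte order); and checksum start and the checksum field fall on even
   packet offsets. All function and type names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Folded 16-bit one's complement sum (RFC 1071), big-endian words. */
static uint16_t csum_partial(const uint8_t *buf, size_t len,
                             uint16_t sum)
{
    uint64_t acc = sum;

    for (size_t i = 0; i + 1 < len; i += 2)
        acc += (uint32_t)buf[i] << 8 | buf[i + 1];
    if (len & 1)
        acc += (uint32_t)buf[len - 1] << 8;
    while (acc >> 16)                    /* end-around carry */
        acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)acc;
}

/* One's complement addition and subtraction with end-around carry. */
static uint16_t csum_add(uint16_t a, uint16_t b)
{
    uint32_t acc = (uint32_t)a + b;

    return (uint16_t)((acc & 0xffff) + (acc >> 16));
}

static uint16_t csum_sub(uint16_t a, uint16_t b)
{
    return csum_add(a, (uint16_t)~b);
}

/* One remote checksum offload pair; both offsets are relative to the
 * start of the encapsulation header. */
struct remcsum_meta {
    uint16_t start;   /* where inner checksum computation begins */
    uint16_t offset;  /* where the derived checksum is written */
};

/* Parse a pair, assuming network byte order. */
static struct remcsum_meta remcsum_meta_parse(const uint8_t *p)
{
    struct remcsum_meta m;

    m.start = (uint16_t)(p[0] << 8 | p[1]);
    m.offset = (uint16_t)(p[2] << 8 | p[3]);
    return m;
}

/* Derive the inner checksum, write it into the packet, and return the
 * complete sum adjusted for the rewritten field, since the original
 * complete sum no longer matches the modified packet. */
static uint16_t remcsum_rx(uint8_t *pkt, size_t encap_off,
                           uint16_t complete, struct remcsum_meta m)
{
    uint8_t *field = pkt + encap_off + m.offset;
    uint16_t old = (uint16_t)(field[0] << 8 | field[1]);
    uint16_t sum, csum;

    assert((encap_off + m.start) % 2 == 0 &&
           (encap_off + m.offset) % 2 == 0);
    /* Step 3: sum from checksum start to the end of the packet. */
    sum = csum_sub(complete,
                   csum_partial(pkt, encap_off + m.start, 0));
    /* Step 4: store the bitwise not, as transmit offload would. */
    csum = (uint16_t)~sum;
    field[0] = (uint8_t)(csum >> 8);
    field[1] = (uint8_t)(csum & 0xff);
    return csum_add(csum_sub(complete, old), csum);
}
```

   In a 16-byte test packet whose 4-byte encapsulation header is just
   the pair (4, 6), and whose inner checksum field holds 0x1234 as a
   stand-in for the pseudo header sum seeded by the transmitter,
   remcsum_rx() writes 0x06d1. Adding 0x1234 to the sum of the inner
   transport packet then gives 0xffff, so normal transport layer
   verification succeeds. Returning the adjusted sum is a design choice
   of this sketch that keeps the stored packet checksum consistent with
   the modified packet.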

   The meta data format for remote checksum offload is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Checksum start         |        Checksum offset        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   o Checksum start: starting offset for checksum computation relative
     to the start of the encapsulation header. This is typically the
     offset of a transport header (e.g. UDP or TCP).

   o Checksum offset: offset relative to the start of the encapsulation
     header where the derived checksum value is to be written. This is
     typically the offset of the checksum field in the transport header
     (e.g. UDP or TCP).

   Support for remote checksum offload with specific encapsulation
   protocols is outside the scope of this document; however, any
   encapsulation format that supports some reasonable form of optional
   meta data should be amenable to it. In Generic UDP Encapsulation
   [GUE] this would entail defining an optional field; in Geneve
   [GENEVE] a TLV would be defined; for NSH [NSH] the meta data can be
   either in a service header or within a TLV. In any case, the point of
   reference for the offsets in the meta data must be unambiguous (for
   instance, when used in NSH the offsets may be relative to the NSH
   header itself).

3.2 Transmit operation

   The typical actions to set up remote checksum offload on transmit
   are:

   1) Transport layer creates a packet and indicates in internal packet
      meta data that the checksum is to be offloaded to the NIC (normal
      transport layer processing for checksum offload). The checksum
      field is populated with the bitwise not of the checksum of the
      pseudo header, or zero, as appropriate.

   2) Encapsulation layer adds its headers to the packet, including the
      offload meta data. The start offset and checksum offset are set
      accordingly.

   3) Encapsulation layer arranges for checksum offload of the outer
      header checksum (e.g. UDP).

   4) Packet is sent to the NIC. The NIC performs transmit checksum
      offload and sets the checksum field in the outer header. The inner
      header and rest of the packet are transmitted without
      modification.

3.3 Receiver operation

   The typical actions a host receiver performs to support remote
   checksum offload are:

   1) Receive the packet and validate the outer checksum following
      normal processing (e.g. validate a non-zero UDP checksum).

   2) Deduce the full checksum for the IP packet. This is directly
      provided if the device returns the packet checksum in
      CHECKSUM_COMPLETE. If the device returned CHECKSUM_UNNECESSARY,
      then the complete checksum can be trivially derived as either zero
      (GRE) or the bitwise not of the outer pseudo header (UDP).

   3) From the packet checksum, subtract the checksum computed from the
      start of the packet (outer IP header) to the offset in the packet
      indicated by checksum start in the meta data. The result covers
      the encapsulated transport packet from checksum start to the end
      of the packet, and is the basis for the checksum value deduced in
      step 4.

      In pseudo code:

         csum: initialized to the one's complement sum computed from
               the start of the packet (outer IP header) to the end of
               the packet
         start_of_packet: address of the start of the packet
         offset_of_encap_hdr: relative to start_of_packet
         csum_start: value from meta data
         checksum(start, len): function to compute the one's
               complement sum from address start for len bytes
         csum_sub(a, b): one's complement subtraction of b from a

         csum = csum_sub(csum,
                         checksum(start_of_packet,
                                  offset_of_encap_hdr + csum_start))

   4) Fold the resulting value to 16 bits, take its bitwise not (as is
      done in transmit checksum offload), and write it into the packet
      at the offset provided by checksum offset in the meta data.

      In pseudo code:

         csum_offset: value from meta data (offset of checksum field)
         fold(csum): function to fold a sum to 16 bits with end-around
               carry

         *(start_of_packet + offset_of_encap_hdr + csum_offset) =
               ~fold(csum)

   5) Checksum is verified at the transport layer using normal
      processing.
      This should not require any checksum computation over the packet,
      since the complete checksum has already been provided (if it is
      retained, it must be adjusted for the value written in step 4).

3.4 Interaction with TCP segmentation offload

   Remote checksum offload may be useful with TCP Segmentation Offload
   (TSO) in order to avoid host checksum calculations at the receiver.
   This can be implemented on a transmitter as follows:

   1) Host stack prepares a large segment for transmission, including
      adding the encapsulation headers and the remote checksum option,
      which refers to the encapsulated transport checksum in the large
      segment.

   2) TSO is performed by the device, taking encapsulation into
      account. The outer checksum is computed and written for each
      packet. The inner checksum is not computed, and the encapsulation
      header (including the checksum meta data) is replicated for each
      packet.

   3) At the receiver, remote checksum offload processing occurs as
      normal for each packet.


4 Security Considerations

   Remote checksum offload should not impact protocol security.


5 IANA Considerations

   There are no IANA considerations in this specification. The remote
   checksum offload meta data may require an option number or type in
   specific encapsulation formats that support it.


6 References

6.1 Normative References

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC1122]  Braden, R., Ed., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768,
              August 1980.

   [RFC2784]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
              Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
              March 2000.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

6.2 Informative References

   [RFC1071]  Braden, R., Borman, D., and C. Partridge, "Computing the
              Internet checksum", RFC 1071, September 1988.

   [RFC1624]  Rijsinghani, A., Ed., "Computation of the Internet
              Checksum via Incremental Update", RFC 1624, May 1994.

   [RFC1936]  Touch, J. and B. Parham, "Implementing the Internet
              Checksum in Hardware", RFC 1936, April 1996.

   [GUE]      "Generic UDP Encapsulation", draft-herbert-gue-01.

   [GENEVE]   "Geneve: Generic Network Virtualization Encapsulation",
              draft-gross-geneve-01.

   [NSH]      "Network Service Header", draft-quinn-sfc-nsh-03.


Authors' Addresses

   Tom Herbert
   Google
   1600 Amphitheatre Parkway
   Mountain View, CA

   EMail: therbert@google.com