idnits 2.17.1 draft-ietf-nvo3-gue-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 326: '...and decapsulator MUST agree on the mea...' RFC 2119 keyword, line 331: '... GUE packet with private data, it MUST...' RFC 2119 keyword, line 333: '...capsulator the packet MUST be dropped....' RFC 2119 keyword, line 335: '...ntics the packet MUST also be dropped....' RFC 2119 keyword, line 445: '...o a decapsulator MUST NOT be ignored. ...' (18 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (June 24, 2015) is 3223 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC478' is mentioned on line 486, but not defined == Missing Reference: 'RFC6935' is mentioned on line 520, but not defined == Missing Reference: 'GUECSUM' is mentioned on line 533, but not defined == Missing Reference: 'RFC768' is mentioned on line 568, but not defined == Missing Reference: 'RFC1122' is mentioned on line 556, but not defined == Missing Reference: 'RFC2460' is mentioned on line 568, but not defined ** Obsolete undefined reference: RFC 2460 (Obsoleted by RFC 8200) == Missing Reference: 'RFC5405' is mentioned on line 604, but not defined ** Obsolete undefined reference: RFC 5405 (Obsoleted by RFC 8085) == Missing Reference: 'RFC2473' is mentioned on line 742, but not defined == Unused Reference: 'RFC2434' is defined on line 828, but no explicit reference was found in the text == Unused Reference: 'RFC5925' is defined on line 886, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2434 (Obsoleted by RFC 5226) ** Downref: Normative reference to an Informational RFC: RFC 2983 ** Downref: Normative reference to an Informational RFC: RFC 4459 -- Obsolete informational reference (is this intentional?): RFC 6830 (Obsoleted by RFC 9300, RFC 9301) == Outdated reference: A later version (-19) exists of draft-ietf-tsvwg-gre-in-udp-encap-06 == Outdated reference: A later version (-03) exists of draft-hy-gue-4-secure-transport-02 == Outdated reference: A later version (-02) exists of draft-herbert-remotecsumoffload-00 Summary: 6 errors (**), 0 flaws (~~), 14 warnings (==), 2 comments (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 2 Network Virtualization Overlays (nvo3) T. Herbert 3 Internet-Draft Facebook 4 Intended status: Standard track L. Yong 5 Expires December 14, 2015 Huawei USA 6 O. Zia 7 Microsoft 8 June 24, 2015 10 Generic UDP Encapsulation 11 draft-ietf-nvo3-gue-01 13 Status of this Memo 15 This Internet-Draft is submitted in full conformance with the 16 provisions of BCP 78 and BCP 79. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that 20 other groups may also distribute working documents as Internet- 21 Drafts. 23 Internet-Drafts are draft documents valid for a maximum of six months 24 and may be updated, replaced, or obsoleted by other documents at any 25 time. It is inappropriate to use Internet-Drafts as reference 26 material or to cite them other than as "work in progress." 28 The list of current Internet-Drafts can be accessed at 29 http://www.ietf.org/ietf/1id-abstracts.txt 31 The list of Internet-Draft Shadow Directories can be accessed at 32 http://www.ietf.org/shadow.html 34 This Internet-Draft will expire on September 7, 2015. 36 Copyright Notice 38 Copyright (c) 2014 IETF Trust and the persons identified as the 39 document authors. All rights reserved. 41 This document is subject to BCP 78 and the IETF Trust's Legal 42 Provisions Relating to IETF Documents 43 (http://trustee.ietf.org/license-info) in effect on the date of 44 publication of this document. Please review these documents 45 carefully, as they describe your rights and restrictions with respect 46 to this document.
Code Components extracted from this document must 54 include Simplified BSD License text as described in Section 4.e of 55 the Trust Legal Provisions and are provided without warranty as 56 described in the Simplified BSD License. 58 Abstract 60 This specification describes Generic UDP Encapsulation (GUE), which 61 is a scheme for using UDP to encapsulate packets of arbitrary IP 62 protocols for transport across layer 3 networks. By encapsulating 63 packets in UDP, specialized capabilities in networking hardware for 64 efficient handling of UDP packets can be leveraged. GUE specifies 65 basic encapsulation methods upon which higher level constructs, such as 66 tunnels and overlay networks for network virtualization, can be 67 constructed. GUE is extensible by allowing optional data fields as 68 part of the encapsulation, and is generic in that it can encapsulate 69 packets of various IP protocols. 71 Table of Contents 73 1. Introduction 74 2. Packet formats 75 2.1. GUE version 76 2.2. GUE header 77 2.3. Proto/ctype field 78 2.4. Flags and optional fields 79 2.5. Private data 80 3. Message types 81 3.1. Control messages 82 3.2. Data messages 83 4. Operation 84 4.1. Network tunnel encapsulation 85 4.2. Transport layer encapsulation 86 4.3. Encapsulator operation 87 4.4. Decapsulator operation 88 4.5.
Router and switch operation 89 4.6. Middlebox interactions 90 4.7. NAT 91 4.8. Checksum Handling 92 4.8.1. Checksum requirements 93 4.8.2. GUE header checksum 94 4.8.3. UDP Checksum with IPv4 95 4.8.4. UDP Checksum with IPv6 96 4.9. MTU and fragmentation 97 4.10. Congestion control 98 4.11. Multicast 99 5. Inner flow identifier properties 100 5.1. Flow classification 101 5.2. Inner flow identifier properties 102 6. Motivation for GUE 103 7. Security Considerations 104 8. IANA Consideration 105 9. Acknowledgements 106 10. References 107 10.1. Normative References 108 10.2. Informative References 109 Appendix A: NIC processing for GUE 110 A.1. Receive multi-queue 111 A.2. Checksum offload 112 A.2.1. Transmit checksum offload 113 A.2.2. Receive checksum offload 114 A.3. Transmit Segmentation Offload 115 A.4. Large Receive Offload
116 Appendix B: Privileged ports 117 Appendix C: Inner flow identifier as a route selector 118 Appendix D: Hardware protocol implementation considerations 119 Authors' Addresses 121 1. Introduction 123 This specification describes Generic UDP Encapsulation (GUE) which is 124 a general method for encapsulating packets of arbitrary IP protocols 125 within User Datagram Protocol (UDP) [RFC0768] packets. Encapsulating 126 packets in UDP facilitates efficient transport across networks. 127 Networking devices widely provide protocol specific processing and 128 optimizations for UDP (as well as TCP) packets. Packets for atypical 129 IP protocols (those not usually parsed by networking hardware) can be 130 encapsulated in UDP packets to maximize deliverability and to 131 leverage flow specific mechanisms for routing and packet steering. 133 GUE provides an extensible header format for including optional data 134 in the encapsulation header. This data potentially covers items such 135 as virtual networking identifier, security data for validating or 136 authenticating the GUE header, congestion control data, etc. GUE also 137 allows private optional data in the encapsulation header. This 138 feature can be used by a site or implementation to define local 139 custom optional data, and allows experimentation with options that may 140 eventually become standard. 142 2. Packet formats 144 A GUE packet consists of a UDP packet whose payload is a GUE 145 header followed by a payload which is either an encapsulated packet 146 of some IP protocol or a control message (like an OAM message).
A GUE 147 packet has the general format:
149    +-------------------------------+
150    |                               |
151    |         UDP/IP header         |
152    |                               |
153    |-------------------------------|
154    |                               |
155    |          GUE Header           |
156    |                               |
157    |-------------------------------|
158    |                               |
159    |      Encapsulated packet      |
160    |      or control message       |
161    |                               |
162    +-------------------------------+
164 The GUE header is variable length as determined by the presence of 165 optional fields. 167 2.1. GUE version 169 The first two bits of the GUE header contain the GUE protocol version 170 number. The rest of the fields after the GUE version number are 171 defined based on the version number. The remainder of this 172 specification describes version 0x0 of GUE. 174 2.2. GUE header 176 The header format for version 0x0 of GUE in UDP is:
178     0                   1                   2                   3
179     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
180    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
181    |          Source port          |       Destination port        |
182    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
183    |            Length             |           Checksum            |
184    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
185    |0x0|C|  Hlen   |  Proto/ctype  |            Flags            |E|
186    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
187    |                                                               |
188    ~                       Fields (optional)                       ~
189    |                                                               |
190    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
191    |                  Extension flags (optional)                   |
192    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
193    |                                                               |
194    ~                  Extension fields (optional)                  ~
195    |                                                               |
196    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
197    |                                                               |
198    ~                    Private data (optional)                    ~
199    |                                                               |
200    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
202 The contents of the UDP header are: 204 o Source port (inner flow identifier): This should be set to a 205 value that represents the encapsulated flow. The properties of 206 the inner flow identifier are described below. 208 o Destination port: The GUE assigned port number, 6080.
210 o Length: Canonical length of the UDP packet (length of UDP header 211 and payload). 213 o Checksum: Standard UDP checksum (see section 4). 215 The GUE header consists of: 217 o Ver: GUE protocol version (0x0). 219 o C: Control flag. When set, it indicates a control message; when not set, 220 it indicates a data message. 222 o Hlen: Length in 32-bit words of the GUE header, including 223 optional fields but not the first four bytes of the header. 224 Computed as (header_len - 4) / 4. All GUE headers are a multiple 225 of four bytes in length. Maximum header length is 128 bytes. 227 o Proto/ctype: When the C bit is set this field contains a control 228 message type for the payload. When the C bit is not set, the field 229 holds the IP protocol number for the encapsulated packet in the 230 payload. The control message or encapsulated packet begins at 231 the offset provided by Hlen. 233 o Flags: Header flags that may be allocated for various purposes 234 and may indicate presence of optional fields. Undefined header 235 flag bits must be set to zero on transmission. 237 o E: Extension flag. Indicates the presence of the extension flags option 238 in the optional fields. 240 o Fields: Optional fields whose presence is indicated by 241 corresponding flags. 243 o Extension flags: An optional field indicated by the E bit. This 244 field provides an additional set of thirty-two bits for flags. 246 o Extension fields: Optional fields whose presence is indicated by 247 corresponding extension flags. 249 o Private data: Optional private data. If private data is present 250 it immediately follows the last field present in the header. 251 The length of this data is determined by subtracting the 252 starting offset from the header length. 254 2.3. Proto/ctype field 256 When the C bit is not set, the proto/ctype field must be set to a 257 valid IP protocol number.
The IP protocol number serves as an 258 indication of the type of next protocol header which is contained in 259 the GUE payload at the offset indicated in Hlen. Intermediate devices 260 may parse the GUE payload per the IP protocol number in the 261 proto/ctype field, and header flags cannot affect the interpretation 262 of the proto/ctype field. 264 IP protocol number 59 ("No next header") may be set to indicate that 265 the GUE payload does not begin with the header of an IP protocol. 266 This would be the case, for instance, if the GUE payload were a 267 fragment when performing GUE level fragmentation. The interpretation 268 of the payload is performed through other means (such as flags and 269 optional fields), and intermediate devices must not parse the 270 packet based on the IP protocol number in this case. 272 When the C bit is set, the proto/ctype field must be set to a valid 273 control message type. A value of zero indicates that the GUE payload 274 requires further interpretation to deduce the control type. This 275 might be the case when the payload is a fragment of a control 276 message, where only the reassembled packet can be interpreted as a 277 control message. 279 2.4. Flags and optional fields 281 Flags and associated optional fields are the primary mechanism of 282 extensibility in GUE. There are sixteen flag bits in the primary GUE 283 header with one being reserved to indicate that an optional extension 284 flags field is present. The extension flags field contains an 285 additional thirty-two flag bits. 287 A flag may indicate presence of optional fields. The size of an 288 optional field indicated by a flag must be fixed. 290 Flags may be paired together to allow different lengths for an 291 optional field. For example, if two flag bits are paired, a field may 292 have one of three different lengths.
Regardless of how flag bits may 293 be paired, the lengths and offsets of optional fields corresponding 294 to a set of flags must be well defined. 296 Optional fields are placed in order of the flags. New flags should be 297 allocated from high to low order bit contiguously without holes. 298 Flags allow random access, for instance to inspect the field 299 corresponding to the Nth flag bit, an implementation only considers 300 the previous N-1 flags to determine the offset. Flags after the Nth 301 flag are not pertinent in calculating the offset of the Nth flag. 303 Flags (or paired flags) are idempotent such that new flags should not 304 cause reinterpretation of old flags. Also, new flags should not alter 305 interpretation of other elements in the GUE header nor how the 306 message is parsed (for instance, in a data message the proto/ctype 307 field always holds an IP protocol number as an invariant). 309 2.5. Private data 311 An implementation may use private data for its own use. The private 312 data immediately follows the last field in the GUE header and is not 313 a fixed length. This data is considered part of the GUE header and 314 must be accounted for in header length (Hlen). The length of the 315 private data must be a multiple of four and is determined by 316 subtracting the offset of private data in the GUE header from the 317 header length. Specifically: 319 Private_length = (Hlen * 4) - Length(flags) 321 Where "Length(flags)" returns the sum of lengths of all the optional 322 fields present in the GUE header. When there is no private data 323 present, length of the private data is zero. 325 The semantics and interpretation of private data are implementation 326 specific. An encapsulator and decapsulator MUST agree on the meaning 327 of private data before using it. The private data may be structured 328 as necessary, for instance it might contain its own set of flags and 329 optional fields. 
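The Private_length computation above can be sketched in Python as follows. This is only an illustration under stated assumptions, not part of the specification: the function names and the field_sizes table (a mapping from flag bit mask to the fixed size in bytes of the corresponding optional field) are hypothetical, since this section does not assign any flag bits. Extension flags are not handled beyond treating the E bit like any other table entry.

```python
import struct

GUE_VERSION = 0x0


def parse_gue_base(header: bytes):
    """Split the first 32-bit word of a GUE version 0x0 header."""
    if len(header) < 4:
        raise ValueError("GUE header needs at least 4 bytes")
    first, proto_ctype, flags = struct.unpack("!BBH", header[:4])
    version = first >> 6           # first two bits
    control = bool(first & 0x20)   # C bit
    hlen = first & 0x1F            # five bits, counted in 32-bit words
    if version != GUE_VERSION:
        raise ValueError("unsupported GUE version")
    return control, hlen, proto_ctype, flags


def private_data_length(hlen: int, flags: int, field_sizes: dict) -> int:
    """Private_length = (Hlen * 4) - Length(flags).

    field_sizes is a hypothetical table: flag mask -> field size in bytes.
    """
    known = 0
    fields_len = 0
    for mask, size in field_sizes.items():
        known |= mask
        if flags & mask:
            fields_len += size
    if flags & ~known:
        # Set flags that are unknown to the receiver cannot be skipped.
        raise ValueError("unknown flag set")
    private_len = hlen * 4 - fields_len
    if private_len < 0 or private_len % 4:
        raise ValueError("bad header length")
    return private_len
```

For example, with a hypothetical 4-byte field assigned to flag 0x8000, a header whose first word is 0x02 0x11 0x80 0x00 has Hlen = 2, so 8 bytes follow the first word and 4 of them are private data.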
331 If a decapsulator receives a GUE packet with private data, it MUST 332 validate the private data appropriately. If a decapsulator does not 333 expect private data from an encapsulator the packet MUST be dropped. 334 If a decapsulator cannot validate the contents of private data per 335 the provided semantics the packet MUST also be dropped. An 336 implementation may place security data in GUE private data which must 337 be verified for packet acceptance. 339 3. Message types 341 3.1. Control messages 343 Control messages are indicated in the GUE header when the C bit is 344 set. The payload is interpreted as a control message with type 345 specified in the proto/ctype field. The format and contents of the 346 control message are indicated by the type and can be variable length. 348 Other than interpreting the proto/ctype field as a control message 349 type, the meaning and semantics of the rest of the elements in the 350 GUE header are the same as that of data messages. Forwarding and 351 routing of control messages should be the same as that of a data 352 message with the same outer IP and UDP header and GUE flags; this 353 ensures that control messages can be created that follow the same 354 path as data messages. 356 Control messages can be defined for OAM type messages. For instance, 357 an echo request and corresponding echo reply message may be defined 358 to test for liveness. 360 3.2. Data messages 362 Data messages are indicated in the GUE header when the C bit is not set. The 363 payload of a data message is interpreted as an encapsulated packet of 364 an IP protocol indicated in the proto/ctype field. The packet 365 immediately follows the GUE header. 367 Data messages are a primary means of encapsulation and can be used to 368 create tunnels for overlay networks. 370 4. Operation 372 The figure below illustrates the use of GUE encapsulation between two 373 servers. Server 1 is sending packets to server 2.
An encapsulator 374 performs encapsulation of packets from server 1. These encapsulated 375 packets traverse the network as UDP packets. At the decapsulator, 376 packets are decapsulated and sent on to server 2. Packet flow in the 377 reverse direction need not be symmetric; GUE encapsulation is not 378 required in the reverse path.
380    +---------------+                       +---------------+
381    |               |                       |               |
382    |   Server 1    |                       |   Server 2    |
383    |               |                       |               |
384    +---------------+                       +---------------+
385            |                                       ^
386            V                                       |
387    +---------------+   +---------------+   +---------------+
388    |               |   |               |   |               |
389    | Encapsulator  |-->|    Layer 3    |-->| Decapsulator  |
390    |               |   |    Network    |   |               |
391    +---------------+   +---------------+   +---------------+
393 The encapsulator and decapsulator may be co-resident with the 394 corresponding servers, or may be on separate nodes in the network. 396 4.1. Network tunnel encapsulation 398 Network tunneling can be achieved by encapsulating layer 2 or layer 3 399 packets. In this case the encapsulator and decapsulator nodes are the 400 tunnel endpoints. These could be routers that provide network tunnels 401 on behalf of communicating servers. 403 4.2. Transport layer encapsulation 405 When encapsulating layer 4 packets, the encapsulator and decapsulator 406 should be co-resident with the servers. In this case, the 407 encapsulation headers are inserted between the IP header and the 408 transport packet. The addresses in the IP header refer to both the 409 endpoints of the encapsulation and the endpoints for terminating the 410 transport protocol. 412 4.3. Encapsulator operation 414 Encapsulators create GUE data messages, set the source port to the 415 inner flow identifier, set flags and optional fields in the GUE 416 header, and forward packets to a decapsulator. 418 An encapsulator may be an end host originating the packets of a flow, 419 or may be a network device performing encapsulation on behalf of 420 servers (routers implementing tunnels for instance).
In either case, 421 the intended target (decapsulator) is indicated by the outer 422 destination IP address. 424 If an encapsulator is tunneling packets, that is encapsulating 425 packets of layer 2 or layer 3 protocols (e.g. EtherIP, IPIP, ESP 426 tunnel mode), it should follow standard conventions for tunneling of 427 one IP protocol over another. Diffserv interaction with tunnels is 428 described in [RFC2983], ECN propagation for tunnels is described in 429 [RFC6040]. 431 4.4. Decapsulator operation 433 A decapsulator performs decapsulation of GUE packets. A decapsulator 434 is addressed by the outer destination IP address of a GUE packet. 435 The decapsulator validates packets, including fields of the GUE 436 header. If a packet is acceptable, the UDP and GUE headers are 437 removed and the packet is resubmitted for IP protocol processing or 438 control message processing if it is a control message. 440 If a decapsulator receives a GUE packet with an unsupported version, 441 unknown flag, bad header length (too small for included optional 442 fields), unknown control message type, or an otherwise malformed 443 header, it must drop the packet and may log the event. No error 444 message is returned back to the encapsulator. Note that set flags in 445 GUE that are unknown to a decapsulator MUST NOT be ignored. If a GUE 446 packet is received by a decapsulator with unknown flags, the packet 447 MUST be dropped. 449 4.5. Router and switch operation 451 Routers and switches should forward GUE packets as standard UDP/IP 452 packets. The outer five-tuple should contain sufficient information 453 to perform flow classification corresponding to the flow of the inner 454 packet. A switch should not normally need to parse a GUE header, and 455 none of the flags or optional fields in the GUE header should affect 456 routing. 458 A router should not modify a GUE header when forwarding a packet. 
It 459 may encapsulate a GUE packet in another GUE packet, for instance to 460 implement a network tunnel. In this case the router takes the role of 461 an encapsulator, and the corresponding decapsulator is the logical 462 endpoint of the tunnel. 464 4.6. Middlebox interactions 466 A middle box may interpret some flags and optional fields of the GUE 467 header for classification purposes, but is not required to understand 468 all flags and fields in GUE packets. A middle box should not drop a 469 GUE packet because there are flags unknown to it. The header length 470 in the GUE header allows a middlebox to inspect the payload packet 471 without needing to parse the flags or optional fields. 473 A middlebox may infer bidirectional connection semantics to a UDP 474 flow. For instance a stateful firewall may create a five-tuple rule 475 to match flows on egress, and a corresponding five-tuple rule for 476 matching ingress packets where the roles of source and destination 477 are reversed for the IP addresses and UDP port numbers. To operate in 478 this environment, a GUE tunnel must assume connected semantics 479 defined by the UDP five tuple and the use of GUE encapsulation must 480 be symmetric between both endpoints. The source port set in the UDP 481 header must be the destination port the peer would set for replies. 483 4.7. NAT 485 IP address and port translation can be performed on the UDP/IP 486 headers adhering to the requirements for NAT with UDP [RFC478]. In 487 the case of stateful NAT, connection semantics must be applied to a 488 GUE tunnel as described above. 490 When using transport mode encapsulation and traversing a NAT, the IP 491 addresses may be changed such that the pseudo header checksum used 492 for checksum calculation is modified and the checksum will be found 493 invalid at the receiver. 
To compensate for this, a GUE option can be 494 added which contains the checksum over the source and destination 495 addresses when the packet is transmitted. Upon receiving this option, 496 the delta of the pseudo header checksum is computed by subtracting 497 the checksum over the source and destination addresses from the 498 checksum value in the option. The resultant value is then added into 499 the checksum calculation when validating the inner transport checksum. 501 4.8. Checksum Handling 503 This section describes the requirements around the UDP checksum and 504 GUE header checksum. Checksums are an important consideration in 505 that they can provide end to end validation and protect against 506 packet mis-delivery. The latter is made possible by the inclusion of a 507 pseudo header that covers the IP addresses and UDP ports of the 508 encapsulating headers. 510 4.8.1. Checksum requirements 511 The potential for mis-delivery of packets due to corruption of IP, 512 UDP, or GUE headers must be considered. One of the following 513 requirements must be met: 515 o UDP checksums are enabled (for IPv4 or IPv6). 517 o The GUE header checksum is used. 519 o Zero UDP checksums are used in accordance with applicable 520 requirements in [GREUDP], [RFC6935], and [RFC6936]. 522 4.8.2. GUE header checksum 524 The GUE header checksum provides a UDP-lite [RFC3828] type of 525 checksum capability as an optional field of the GUE header. The GUE 526 header checksum minimally covers the GUE header and a GUE pseudo 527 header. The GUE pseudo header includes the corresponding IP 528 addresses as well as the UDP ports of the encapsulating headers. 529 This checksum should provide adequate protection against address 530 corruption in IPv6 when the UDP checksum is zero. Additionally, the 531 GUE checksum provides protection of the GUE header when the UDP 532 checksum is set to zero with either IPv4 or IPv6. The GUE header 533 checksum is defined in [GUECSUM]. 535 4.8.3.
UDP Checksum with IPv4 537 For UDP in IPv4, the UDP checksum MUST be processed as specified in 538 [RFC768] and [RFC1122] for both transmit and receive. An 539 encapsulator MAY set the UDP checksum to zero for performance or 540 implementation considerations. The IPv4 header includes a checksum 541 that protects against mis-delivery of the packet due to corruption 542 of IP addresses. The UDP checksum potentially provides protection 543 against corruption of the UDP header, GUE header, and GUE payload. 544 Enabling or disabling the use of checksums is a deployment 545 consideration that should take into account the risk and effects of 546 packet corruption, and whether the packets in the network are 547 already adequately protected by other, possibly stronger mechanisms 548 such as the Ethernet CRC. If an encapsulator sets a zero UDP 549 checksum for IPv4 it SHOULD use the GUE header checksum as described 550 in section 4.8.2. 552 When a decapsulator receives a packet, the UDP checksum field MUST 553 be processed. If the UDP checksum is non-zero, the decapsulator MUST 554 verify the checksum before accepting the packet. By default a 555 decapsulator SHOULD accept UDP packets with a zero checksum. A node 556 MAY be configured to disallow zero checksums per [RFC1122]; this may 557 be done selectively, for instance disallowing zero checksums from 558 certain hosts that are known to be sending over paths subject to 559 packet corruption. If verification of a non-zero checksum fails, a 560 decapsulator lacks the capability to verify a non-zero checksum, or 561 a packet with a zero-checksum was received and the decapsulator is 562 configured to disallow, the packet MUST be dropped and an event MAY 563 be logged. 565 4.8.4. UDP Checksum with IPv6 567 For UDP in IPv6, the UDP checksum MUST be processed as specified in 568 [RFC768] and [RFC2460] for both transmit and receive. 
Unlike IPv4, 569 there is no header checksum in IPv6 that protects against mis- 570 delivery due to address corruption. Therefore, when GUE is used over 571 IPv6, either the UDP checksum must be enabled or the GUE header 572 checksum must be used. An encapsulator MAY set a zero UDP checksum 573 for performance or implementation reasons, in which case the GUE 574 header checksum MUST be used or applicable requirements for using 575 zero UDP checksums in [GREUDP] MUST be met. If the UDP checksum is 576 enabled, then the GUE header checksum should not be used since it is 577 mostly redundant. 579 When a decapsulator receives a packet, the UDP checksum field MUST 580 be processed. If the UDP checksum is non-zero, the decapsulator MUST 581 verify the checksum before accepting the packet. By default a 582 decapsulator MUST only accept UDP packets with a zero checksum if 583 the GUE header checksum is used and is verified. If verification of 584 a non-zero checksum fails, a decapsulator lacks the capability to 585 verify a non-zero checksum, or a packet with a zero-checksum and no 586 GUE header checksum was received, the packet MUST be dropped and an 587 event MAY be logged. 589 4.9. MTU and fragmentation 591 Standard conventions for handling of MTU (Maximum Transmission Unit) 592 and fragmentation in conjunction with networking tunnels 593 (encapsulation of layer 2 or layer 3 packets) should be followed. 594 Details are described in MTU and Fragmentation Issues with In-the- 595 Network Tunneling [RFC4459] 597 If a packet is fragmented before encapsulation in GUE, all the 598 related fragments must be encapsulated using the same source port 599 (inner flow identifier). An operator may set MTU to account for 600 encapsulation overhead and reduce the likelihood of fragmentation. 602 4.10. Congestion control 604 Per requirements of [RFC5405], if the IP traffic encapsulated with 605 GUE implements proper congestion control no additional mechanisms 606 should be required. 

   In the case that the encapsulated traffic does not implement any or
   sufficient control, or it is not known whether a transmitter will
   consistently implement proper congestion control, then congestion
   control at the encapsulation layer must be provided. Note this case
   applies to a significant use case in network virtualization in which
   guests run third party networking stacks that cannot be implicitly
   trusted to implement conformant congestion control.

   Out of band mechanisms such as rate limiting, Managed Circuit
   Breaker, or traffic isolation may be used to provide rudimentary
   congestion control. For finer grained congestion control that allows
   alternate congestion control algorithms, reaction time within an
   RTT, and interaction with ECN, in-band mechanisms may be warranted.

   DCCP may be used to provide congestion control for encapsulated
   flows. In this case, the protocol stack for an IP tunnel may be
   IP-GUE-DCCP-IP. Alternatively, GUE can be extended to include
   congestion control (related data carried in GUE optional fields).
   Congestion control mechanisms for GUE will be elaborated in other
   specifications.

4.11. Multicast

   GUE packets may be multicast to decapsulators using a multicast
   destination address in the encapsulating IP headers. Each receiving
   host will decapsulate the packet independently following normal
   decapsulator operations. The receiving decapsulators should agree on
   the same set of GUE parameters and properties.

   GUE allows encapsulation of unicast, broadcast, or multicast
   traffic. Entropy for the inner flow identifier (UDP source port) may
   be generated from the header of encapsulated unicast or
   broadcast/multicast packets at an encapsulator.
   The mapping mechanism between the encapsulated multicast traffic and
   the multicast capability in the IP network is transparent and
   independent of the encapsulation and is otherwise outside the scope
   of this document.

5. Inner flow identifier properties

5.1. Flow classification

   A major objective of using GUE is that a network device can perform
   flow classification corresponding to the flow of the inner
   encapsulated packet based on the contents in the outer headers.

   Hardware devices commonly perform hash computations on packet
   headers to classify packets into flows or flow buckets. Flow
   classification is done to support load balancing (statistical
   multiplexing) of flows across a set of networking resources.
   Examples of such load balancing techniques are Equal Cost Multipath
   routing (ECMP), port selection in Link Aggregation, and NIC device
   Receive Side Scaling (RSS). Hashes are usually either a three-tuple
   hash of IP protocol, source address, and destination address; or a
   five-tuple hash consisting of IP protocol, source address,
   destination address, source port, and destination port. Typically,
   networking hardware will compute five-tuple hashes for TCP and UDP,
   but only three-tuple hashes for other IP protocols. Since the
   five-tuple hash provides more granularity, load balancing can be
   finer grained with better distribution. When a packet is
   encapsulated with GUE, the source port in the outer UDP packet is
   set to reflect the flow of the inner packet. When a device computes
   a five-tuple hash on the outer UDP/IP header of a GUE packet, the
   resultant value classifies the packet per its inner flow.

   To support flow classification, the source port of the UDP header in
   GUE is set to a value that maps to the inner flow. This is referred
   to as the inner flow identifier.
   The inner flow identifier is set by the encapsulator; it can be
   computed on the fly based on packet contents or retrieved from state
   maintained for the inner flow.

   Examples of deriving an inner flow identifier are:

      o If the encapsulated packet is a layer 4 packet, TCP/IPv4 for
        instance, the inner flow identifier could be based on the
        canonical five-tuple hash of the inner packet.

      o If the encapsulated packet is an AH transport mode packet with
        TCP as next header, the inner flow identifier could be a hash
        over a three-tuple: TCP protocol and TCP ports of the
        encapsulated packet.

      o If a node is encrypting a packet using ESP tunnel mode and GUE
        encapsulation, the inner flow identifier could be based on the
        contents of the clear-text packet. For instance, a canonical
        five-tuple hash for a TCP/IP packet could be used.

5.2. Inner flow identifier properties

   The inner flow identifier is the value set in the UDP source
   port of a GUE packet. The inner flow identifier should adhere to
   the following properties:

      o The value set in the source port should be within the ephemeral
        port range. IANA suggests this range to be 49152 to 65535, where
        the high order two bits of the port are set to one. This
        provides fourteen bits of entropy for the inner flow identifier.

      o The inner flow identifier should have a uniform distribution
        across encapsulated flows.

      o An encapsulator may occasionally change the inner flow
        identifier used for an inner flow per its discretion (for
        security, route selection, etc.). Changing the value should
        happen no more than once every thirty seconds.

      o Decapsulators, or any networking devices, should not attempt any
        interpretation of the inner flow identifier, nor should they
        attempt to reproduce any hash calculation.
        They may use the value to match further received packets for
        steering decisions, but cannot assume that the hash uniquely or
        permanently identifies a flow.

      o Input to the inner flow identifier is not restricted to ports
        and addresses; input could include flow label from an IPv6
        packet, SPI from an ESP packet, or other flow related state in
        the encapsulator that is not necessarily conveyed in the packet.

      o The assignment function for inner flow identifiers should be
        randomly seeded to mitigate denial of service attacks. The seed
        may be changed periodically.

6. Motivation for GUE

   This section presents the motivation for GUE with respect to other
   encapsulation methods.

   A number of different encapsulation techniques have been proposed for
   the encapsulation of one protocol over another. EtherIP [RFC3378]
   provides layer 2 tunneling of Ethernet frames over IP. GRE [RFC2784],
   MPLS [RFC4023], and L2TP [RFC2661] provide methods for tunneling
   layer 2 and layer 3 packets over IP. NVGRE [NVGRE] and VXLAN
   [RFC7348] are proposals for encapsulation of layer 2 packets for
   network virtualization. IPIP [RFC2003] and Generic packet tunneling
   in IPv6 [RFC2473] provide methods for tunneling IP packets over IP.

   Several proposals exist for encapsulating packets over UDP including
   ESP over UDP [RFC3948], TCP directly over UDP [TCPUDP], VXLAN, LISP
   [RFC6830] which encapsulates layer 3 packets, and Generic UDP
   Encapsulation for IP Tunneling (GRE over UDP) [GREUDP]. Generic UDP
   tunneling [GUT] is a proposal similar to GUE in that it aims to
   tunnel packets of IP protocols over UDP.

   GUE has the following discriminating features:

      o UDP encapsulation leverages specialized network device
        processing for efficient transport. The semantics for using the
        UDP source port as an identifier for an inner flow are defined.

      o GUE permits encapsulation of arbitrary IP protocols, which
        includes layer 2, 3, and 4 protocols. This potentially allows
        nearly all traffic within a data center to be normalized to be
        either TCP or UDP on the wire.

      o Multiple protocols can be multiplexed over a single UDP port
        number. This is in contrast to techniques to encapsulate
        protocols over UDP using a protocol specific port number (such
        as ESP/UDP, GRE/UDP, SCTP/UDP). GUE provides a uniform and
        extensible mechanism for encapsulating all IP protocols in UDP
        with minimal overhead (four bytes of additional header).

      o GUE is extensible. New flags and optional fields can be defined.

      o The GUE header includes a header length field. This allows a
        network node to inspect an encapsulated packet without needing
        to parse the full encapsulation header.

      o Private data in the encapsulation header allows local
        customization and experimentation while being compatible with
        processing in network nodes (routers and middleboxes).

      o GUE includes both data messages (encapsulation of packets) and
        control messages (such as OAM).

7. Security Considerations

   Encapsulation of IP protocols within GUE should not increase security
   risk, nor provide additional security in itself. As suggested in
   section 5 the source port of UDP packets in GUE should be randomly
   seeded to mitigate some possible denial of service attacks.

   Security for Generic UDP Encapsulation, including security for the
   GUE header and payload, is described in detail in [GUESEC].

8. IANA Considerations

   A user UDP port number for GUE has been assigned:

      Service Name: gue
      Transport Protocol(s): UDP
      Assignee: Tom Herbert
      Contact: Tom Herbert
      Description: Generic UDP Encapsulation
      Reference: draft-herbert-gue
      Port Number: 6080
      Service Code: N/A
      Known Unauthorized Uses: N/A
      Assignment Notes: N/A

   IANA is requested to create a "GUE flag-fields" registry to allocate
   flags and optional fields for the primary GUE header flags and
   extension flags. This shall be a registry of bit assignments for
   flags, length of optional fields for corresponding flags, and
   descriptive strings. There are sixteen bits for primary GUE header
   flags (bit number 0-15) where bit 15 is reserved as the extension
   flag in this document. There are thirty-two bits for extension flags.

9. Acknowledgements

   The authors would like to thank David Liu, Erik Nordmark, and Fred
   Templin for valuable input on this draft.

10. References

10.1. Normative References

   [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, DOI
             10.17487/RFC0768, August 1980.

   [RFC2434] Narten, T. and H. Alvestrand, "Guidelines for Writing an
             IANA Considerations Section in RFCs", RFC 2434, DOI
             10.17487/RFC2434, October 1998.

   [RFC2983] Black, D., "Differentiated Services and Tunnels",
             RFC 2983, DOI 10.17487/RFC2983, October 2000.

   [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion
             Notification", RFC 6040, DOI 10.17487/RFC6040, November
             2010.

   [RFC6936] Fairhurst, G. and M. Westerlund, "Applicability Statement
             for the Use of IPv6 UDP Datagrams with Zero Checksums",
             RFC 6936, DOI 10.17487/RFC6936, April 2013.

   [RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the-
             Network Tunneling", RFC 4459, DOI 10.17487/RFC4459, April
             2006.

10.2. Informative References

   [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, DOI
             10.17487/RFC2003, October 1996.

   [RFC3948] Huttunen, A., Swander, B., Volpe, V., DiBurro, L., and M.
             Stenberg, "UDP Encapsulation of IPsec ESP Packets",
             RFC 3948, DOI 10.17487/RFC3948, January 2005.

   [RFC6830] Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The
             Locator/ID Separation Protocol (LISP)", RFC 6830, DOI
             10.17487/RFC6830, January 2013.

   [RFC3378] Housley, R. and S. Hollenbeck, "EtherIP: Tunneling
             Ethernet Frames in IP Datagrams", RFC 3378, DOI
             10.17487/RFC3378, September 2002.

   [RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
             Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
             DOI 10.17487/RFC2784, March 2000.

   [RFC4023] Worster, T., Rekhter, Y., and E. Rosen, Ed.,
             "Encapsulating MPLS in IP or Generic Routing Encapsulation
             (GRE)", RFC 4023, DOI 10.17487/RFC4023, March 2005.

   [RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn,
             G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"",
             RFC 2661, DOI 10.17487/RFC2661, August 1999.

   [RFC5925] Touch, J., Mankin, A., and R. Bonica, "The TCP
             Authentication Option", RFC 5925, DOI 10.17487/RFC5925,
             June 2010.

   [RFC3828] Larzon, L-A., Degermark, M., Pink, S., Jonsson, L-E., Ed.,
             and G. Fairhurst, Ed., "The Lightweight User Datagram
             Protocol (UDP-Lite)", RFC 3828, July 2004.

   [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
             L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
             eXtensible Local Area Network (VXLAN): A Framework for
             Overlaying Virtualized Layer 2 Networks over Layer 3
             Networks", RFC 7348, August 2014.

   [NVGRE]   Garg, P., and Wang, Y., "NVGRE: Network Virtualization
             using Generic Routing Encapsulation",
             draft-sridharan-virtualization-nvgre-08.

   [TCPUDP]  Cheshire, S., Graessley, J., and McGuire, R.,
             "Encapsulation of TCP and other Transport Protocols over
             UDP", draft-cheshire-tcp-over-udp-00.

   [GREUDP]  Crabbe, E., Yong, L., Xu, X., and Herbert, T.,
             "Generic UDP Encapsulation for IP Tunneling",
             draft-ietf-tsvwg-gre-in-udp-encap-06.

   [GUESEC]  Yong, L., Herbert, T., "Generic UDP Encapsulation (GUE)
             for Secure Transport", draft-hy-gue-4-secure-transport-02,
             work in progress.

   [GUT]     Manner, J., Varia, N., and Briscoe, B., "Generic UDP
             Tunnelling (GUT)", draft-manner-tsvwg-gut-02.txt.

   [REMCSUM] Herbert, T., "Remote Checksum Offload",
             draft-herbert-remotecsumoffload-00.

Appendix A: NIC processing for GUE

   This appendix provides some guidelines for Network Interface Cards
   (NICs) to implement common offloads and accelerations to support GUE.
   Note that most of this discussion is generally applicable to other
   methods of UDP based encapsulation.

A.1. Receive multi-queue

   Contemporary NICs support multiple receive descriptor queues
   (multi-queue). Multi-queue enables load balancing of network
   processing for a NIC across multiple CPUs. On packet reception, a
   NIC must select the appropriate queue for host processing. Receive
   Side Scaling is a common method which uses the flow hash for a
   packet to index an indirection table where each entry stores a queue
   number. Flow Director and Accelerated Receive Flow Steering (aRFS)
   allow a host to program the queue that is used for a given flow
   which is identified either by an explicit five-tuple or by the
   flow's hash.

   GUE encapsulation should be compatible with multi-queue NICs that
   support five-tuple hash calculation for UDP/IP packets as input to
   RSS.
   The inner flow identifier (source port) ensures classification of
   the encapsulated flow even in the case that the outer source and
   destination addresses are the same for all flows (e.g. all flows are
   going over a single tunnel).

   By default, UDP RSS support is often disabled in NICs to avoid out of
   order reception that can occur when UDP packets are fragmented. As
   discussed above, fragmentation of GUE packets should be mitigated by
   fragmenting packets before entering a tunnel, path MTU discovery in
   higher layer protocols, or an operator adjusting MTUs. Other UDP
   traffic may not implement such procedures to avoid fragmentation, so
   enabling UDP RSS support in the NIC should be a considered tradeoff
   during configuration.

A.2. Checksum offload

   Many NICs provide capabilities to calculate the standard ones'
   complement payload checksum for packets in transmit or receive. When
   using GUE encapsulation there are at least two checksums that may be
   of interest: the encapsulated packet's transport checksum, and the
   UDP checksum in the outer header.

A.2.1. Transmit checksum offload

   NICs may provide a protocol agnostic method to offload transmit
   checksum (NETIF_F_HW_CSUM in Linux parlance) that can be used with
   GUE. In this method the host provides checksum related parameters in
   a transmit descriptor for a packet. These parameters include the
   starting offset of data to checksum, the length of data to checksum,
   and the offset in the packet where the computed checksum is to be
   written. The host initializes the checksum field to the pseudo
   header checksum.

   In the case of GUE, the checksum for an encapsulated transport layer
   packet, a TCP packet for instance, can be offloaded by setting the
   appropriate checksum parameters.
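
   As an illustration only, the protocol agnostic transmit offload
   described above can be emulated in software. The sketch below is not
   a NIC or driver interface: the function names, the zero-filled
   placeholder headers, the addresses, and the parameter names
   csum_start, csum_len, and csum_offset are assumptions made for this
   example. The host seeds the inner TCP checksum field with the pseudo
   header sum, and the emulated NIC sums the indicated range and writes
   the complemented result at the indicated offset.

```python
import struct

TCP_CSUM_OFFSET = 16  # byte offset of the checksum field in a TCP header


def ones_complement_sum(data: bytes) -> int:
    """Folded 16-bit ones' complement sum of data (not inverted)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header_sum(src: bytes, dst: bytes, proto: int, length: int) -> int:
    """IPv4 pseudo header sum used by the host to seed the checksum field."""
    return ones_complement_sum(src + dst + struct.pack("!BBH", 0, proto, length))


def nic_tx_checksum(packet: bytearray, csum_start: int, csum_len: int,
                    csum_offset: int) -> None:
    """Emulated NIC step: sum the range, write the complement in place."""
    total = ones_complement_sum(bytes(packet[csum_start:csum_start + csum_len]))
    struct.pack_into("!H", packet, csum_start + csum_offset, ~total & 0xFFFF)


# Hypothetical packet: outer IPv4 (20) + UDP (8) + GUE (4) + inner IPv4 (20)
# are zero-filled placeholders; only the inner TCP segment is meaningful here.
SRC, DST = bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])
outer = bytes(20 + 8 + 4 + 20)
payload = b"hello GUE"
tcp_header = bytearray(20)
struct.pack_into("!HH", tcp_header, 0, 12345, 80)  # source and destination ports
tcp_header[12] = 5 << 4                            # data offset: 5 words
tcp_len = len(tcp_header) + len(payload)

# Host side: seed the checksum field with the pseudo header sum (protocol 6).
struct.pack_into("!H", tcp_header, TCP_CSUM_OFFSET,
                 pseudo_header_sum(SRC, DST, 6, tcp_len))

packet = bytearray(outer + tcp_header + payload)
csum_start = len(outer)
nic_tx_checksum(packet, csum_start, tcp_len, TCP_CSUM_OFFSET)
```

   A receiver that sums the pseudo header together with the TCP segment
   of this packet obtains 0xFFFF, the usual indication of a valid ones'
   complement checksum. The outer headers are not touched, which is why
   the same descriptor parameters work regardless of the encapsulation.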

   NICs typically can offload only one transmit checksum per packet, so
   simultaneously offloading both an inner transport packet's checksum
   and the outer UDP checksum is likely not possible. In this case
   setting UDP checksum to zero (per above discussion) and offloading
   the inner transport packet checksum might be acceptable.

   If an encapsulator is co-resident with a host, then checksum offload
   may be performed using remote checksum offload [REMCSUM]. Remote
   checksum offload relies on NIC offload of the simple UDP/IP checksum
   which is commonly supported even in legacy devices. In remote
   checksum offload the outer UDP checksum is set and the GUE header
   includes an option indicating the start and offset of the inner
   "offloaded" checksum. The inner checksum is initialized to the pseudo
   header checksum. When a decapsulator receives a GUE packet with the
   remote checksum offload option, it completes the offload operation by
   determining the packet checksum from the indicated start point to the
   end of the packet, and then adds this into the checksum field at the
   offset given in the option. Computing the checksum from the start to
   end of packet is efficient if checksum-complete is provided on the
   receiver.

A.2.2. Receive checksum offload

   GUE is compatible with NICs that perform a protocol agnostic receive
   checksum (CHECKSUM_COMPLETE in Linux parlance). In this technique, a
   NIC computes a ones complement checksum over all (or some predefined
   portion) of a packet. The computed value is provided to the host
   stack in the packet's receive descriptor. The host driver can use
   this checksum to "patch up" and validate any inner packet transport
   checksum, as well as the outer UDP checksum if it is non-zero.

   Many legacy NICs don't provide checksum-complete but instead provide
   an indication that a checksum has been verified (CHECKSUM_UNNECESSARY
   in Linux).
   Usually, such validation is only done for simple TCP/IP or UDP/IP
   packets. If a NIC indicates that a UDP checksum is valid, the
   checksum-complete value for the UDP packet is the "not" of the pseudo
   header checksum. In this way, checksum-unnecessary can be converted
   to checksum-complete. So if the NIC provides checksum-unnecessary for
   the outer UDP header in an encapsulation, checksum conversion can be
   done so that the checksum-complete value is derived and can be used
   by the stack to validate any checksums in the encapsulated packet.

A.3. Transmit Segmentation Offload

   Transmit Segmentation Offload (TSO) is a NIC feature where a host
   provides a large (>MTU size) TCP packet to the NIC, which in turn
   splits the packet into separate segments and transmits each one. This
   is useful to reduce CPU load on the host.

   The process of TSO can be generalized as:

      - Split the TCP payload into segments which allow packets with
        size less than or equal to MTU.

      - For each created segment:

        1. Replicate the TCP header and all preceding headers of the
           original packet.

        2. Set payload length fields in any headers to reflect the
           length of the segment.

        3. Set TCP sequence number to correctly reflect the offset of
           the TCP data in the stream.

        4. Recompute and set any checksums that either cover the payload
           of the packet or cover a header that was changed by setting a
           payload length.

   Following this general process, TSO can be extended to support TCP
   encapsulation in GUE. For each segment the Ethernet, outer IP, UDP
   header, GUE header, inner IP header if tunneling, and TCP headers are
   replicated. Any packet length header fields need to be set properly
   (including the length in the outer UDP header), and checksums need to
   be set correctly (including the outer UDP checksum if being used).
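
   The general TSO process can be sketched in a few lines. This is a
   simplified model under stated assumptions, not a description of any
   NIC: the GueTcpPacket class, its field names, and the tso_segment
   function are hypothetical, only the outer UDP length is modeled among
   the length fields, and checksum recomputation (step 4) is omitted.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class GueTcpPacket:
    """Hypothetical, simplified view of a TCP packet encapsulated in GUE."""
    copied_headers: bytes  # header bytes replicated unchanged into each segment
    udp_length: int        # length field of the outer UDP header
    tcp_seq: int           # TCP sequence number of the first payload byte
    payload: bytes         # TCP payload


def tso_segment(pkt: GueTcpPacket, mss: int, udp_overhead: int) -> List[GueTcpPacket]:
    """Split pkt into segments carrying at most mss payload bytes each.

    udp_overhead is the number of bytes counted by the outer UDP length
    besides the TCP payload (UDP, GUE, inner IP, and TCP headers).
    """
    segments = []
    for offset in range(0, len(pkt.payload), mss):
        chunk = pkt.payload[offset:offset + mss]
        segments.append(replace(
            pkt,                                        # step 1: replicate headers
            udp_length=udp_overhead + len(chunk),       # step 2: fix length field
            tcp_seq=(pkt.tcp_seq + offset) % 2**32,     # step 3: advance sequence
            payload=chunk,
        ))
    return segments


# Example: UDP (8) + GUE (4) + inner IPv4 (20) + TCP (20) header bytes.
UDP_OVERHEAD = 8 + 4 + 20 + 20
big = GueTcpPacket(
    copied_headers=bytes(14 + 20 + 8 + 4 + 20 + 20),  # zero-filled placeholder
    udp_length=UDP_OVERHEAD + 3000,
    tcp_seq=1000,
    payload=bytes(3000),
)
segs = tso_segment(big, mss=1400, udp_overhead=UDP_OVERHEAD)
```

   With a 3000 byte payload and a 1400 byte segment size, the sketch
   yields three segments of 1400, 1400, and 200 payload bytes. The GUE
   header travels inside the replicated bytes and is never parsed.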

   To facilitate TSO with GUE it is recommended that optional fields
   should not contain values that must be updated on a per segment
   basis; for example, the GUE fields should not include checksums,
   lengths, or sequence numbers that refer to the payload. If the GUE
   header does not contain such fields then the TSO engine only needs to
   copy the bits in the GUE header when creating each segment and does
   not need to parse the GUE header.

A.4. Large Receive Offload

   Large Receive Offload (LRO) is a NIC feature where packets of a TCP
   connection are reassembled, or coalesced, in the NIC and delivered to
   the host as one large packet. This feature can reduce CPU utilization
   in the host.

   LRO requires significant protocol awareness to be implemented
   correctly and is difficult to generalize. Packets in the same flow
   need to be unambiguously identified. In the presence of tunnels or
   network virtualization, this may require more than a five-tuple match
   (for instance packets for flows in two different virtual networks may
   have identical five-tuples). Additionally, a NIC needs to perform
   validation over packets that are being coalesced, and needs to
   fabricate a single meaningful header from all the coalesced packets.

   The conservative approach to supporting LRO for GUE would be to
   assign packets to the same flow only if they have identical
   five-tuples and were encapsulated the same way. That is, the outer
   IP addresses, the outer UDP ports, GUE protocol, GUE flags and
   fields, and inner five-tuple are all identical.

Appendix B: Privileged ports

   Using the source port to contain an inner flow identifier value
   disallows the security method of a receiver enforcing that the source
   port be a privileged port. Privileged ports are defined by some
   operating systems to restrict source port binding.
   Unix, for instance, considers port numbers less than 1024 to be
   privileged.

   Enforcing that packets are sent from a privileged port is widely
   considered an inadequate security mechanism and has been mostly
   deprecated. To approximate this behavior, an implementation could
   restrict a user from sending a packet destined to the GUE port
   without proper credentials.

Appendix C: Inner flow identifier as a route selector

   An encapsulator generating an inner flow identifier may modulate the
   value to perform a type of multipath source routing. Assuming that
   networking switches perform ECMP based on the flow hash, a sender can
   affect the path by altering the inner flow identifier. For instance,
   a host may store a flow hash in its PCB (protocol control block) for
   an inner flow, and may alter the value upon detecting that packets
   are traversing a lossy path. Changing the inner flow identifier for a
   flow should be subject to hysteresis (at most once every thirty
   seconds) to limit the number of out of order packets.

Appendix D: Hardware protocol implementation considerations

   A low level protocol, such as GUE, is likely to be of interest for
   support in high speed network devices. Variable length header (VLH)
   protocols like GUE are often considered difficult to efficiently
   implement in hardware. In order to retain the important
   characteristics of an extensible and robust protocol, hardware
   vendors may practice "constrained flexibility". In this model, only
   certain combinations of protocol header parameterizations are
   implemented in hardware fast path. Each such parameterization is
   fixed length so that the particular instance can be optimized as a
   fixed length protocol. In the case of GUE this constitutes specific
   combinations of GUE flags, fields, and next protocol.
   The selected combinations would naturally be the most common cases
   which form the "fast path", and other combinations are assumed to
   take the "slow path".

   In time, needs and requirements of the protocol may change which may
   manifest themselves as new parameterizations to be supported in the
   fast path. To allow this extensibility, a device practicing
   constrained flexibility should allow the fast path parameterizations
   to be programmable.

Authors' Addresses

   Tom Herbert
   Facebook
   1 Hacker Way
   Menlo Park, CA 94052
   US

   Email: tom@herbertland.com

   Lucy Yong
   Huawei USA
   5340 Legacy Dr.
   Plano, TX 75024
   US

   Email: lucy.yong@huawei.com

   Osama Zia
   Microsoft
   1 Microsoft Way
   Redmond, WA 98029
   US

   Email: osamaz@microsoft.com