Internet Area WG                                              T. Herbert
Internet-Draft                                                Quantonium
Intended status: Standards Track                                 L. Yong
Expires October 28, 2017                                      Huawei USA
                                                                  O. Zia
                                                               Microsoft
                                                          April 26, 2017

                       Generic UDP Encapsulation
                       draft-ietf-intarea-gue-02

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on October 28, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Abstract

   This specification describes Generic UDP Encapsulation (GUE), which
   is a scheme for using UDP to encapsulate packets of different IP
   protocols for transport across layer 3 networks. By encapsulating
   packets in UDP, specialized capabilities in networking hardware for
   efficient handling of UDP packets can be leveraged. GUE specifies
   basic encapsulation methods upon which higher level constructs, such
   as tunnels and overlay networks for network virtualization, can be
   constructed. GUE is extensible by allowing optional data fields as
   part of the encapsulation, and is generic in that it can encapsulate
   packets of various IP protocols.

Table of Contents

   1. Introduction
     1.1. Terminology and acronyms
     1.2. Requirements Language
   2. Base packet format
     2.1. GUE version
   3. Version 0
     3.1. Header format
     3.2. Proto/ctype field
       3.2.1 Proto field
       3.2.2 Ctype field
     3.3. Flags and extension fields
       3.3.1. Requirements
       3.3.2. Example GUE header with extension fields
     3.4. Private data
     3.5. Message types
       3.5.1. Control messages
       3.5.2. Data messages
     3.6. Hiding the transport layer protocol number
   4. Version 1
     4.1. Direct encapsulation of IPv4
     4.2. Direct encapsulation of IPv6
   5. Operation
     5.1. Network tunnel encapsulation
     5.2. Transport layer encapsulation
     5.3. Encapsulator operation
     5.4. Decapsulator operation
       5.4.1. Processing a received data message
       5.4.2. Processing a received control message
     5.5. Router and switch operation
     5.6. Middlebox interactions
       5.6.1. Inferring connection semantics
       5.6.2. NAT
     5.7. Checksum Handling
       5.7.1. Requirements
       5.7.2. UDP Checksum with IPv4
       5.7.3. UDP Checksum with IPv6
     5.8. MTU and fragmentation
     5.9. Congestion control
     5.10. Multicast
     5.11. Flow entropy for ECMP
       5.11.1. Flow classification
       5.11.2. Flow entropy properties
     5.12 Negotiation of acceptable flags and extension fields
   6. Motivation for GUE
     6.1. Benefits of GUE
     6.2 Comparison of GUE to other encapsulations
   7. Security Considerations
   8. IANA Consideration
     8.1. UDP source port
     8.2. GUE version number
     8.3. Control types
     8.4. Flag-fields
   9. Acknowledgements
   10. References
     10.1. Normative References
     10.2. Informative References
   Appendix A: NIC processing for GUE
     A.1. Receive multi-queue
     A.2. Checksum offload
       A.2.1. Transmit checksum offload
       A.2.2. Receive checksum offload
     A.3. Transmit Segmentation Offload
     A.4. Large Receive Offload
   Appendix B: Implementation considerations
     B.1. Privileged ports
     B.2. Setting flow entropy as a route selector
     B.3. Hardware protocol implementation considerations
   Authors' Addresses

1. Introduction

   This specification describes Generic UDP Encapsulation (GUE), which
   is a general method for encapsulating packets of arbitrary IP
   protocols within User Datagram Protocol (UDP) [RFC0768] packets.
   Encapsulating packets in UDP facilitates efficient transport across
   networks. Networking devices widely provide protocol specific
   processing and optimizations for UDP (as well as TCP) packets.
   Packets for atypical IP protocols (those not usually parsed by
   networking hardware) can be encapsulated in UDP packets to maximize
   deliverability and to leverage flow specific mechanisms for routing
   and packet steering.

   GUE provides an extensible header format for including optional data
   in the encapsulation header. This data potentially covers items such
   as the virtual networking identifier, security data for validating or
   authenticating the GUE header, congestion control data, etc. GUE also
   allows private optional data in the encapsulation header.
   This feature can be used by a site or implementation to define local
   custom optional data, and allows experimentation with options that
   may eventually become standard.

   This document does not define any specific GUE extensions.
   [GUEEXTENS] specifies a set of core extensions and [GUE4NVO3] defines
   an extension for using GUE with network virtualization.

   The motivation for the GUE protocol is described in section 6.

1.1. Terminology and acronyms

   GUE             Generic UDP Encapsulation

   GUE Header      A variable length protocol header that is composed
                   of a primary four byte header and zero or more four
                   byte words for optional header data

   GUE packet      A UDP/IP packet that contains a GUE header and GUE
                   payload within the UDP payload

   Encapsulator    A network node that encapsulates a packet in GUE

   Decapsulator    A network node that decapsulates and processes
                   packets encapsulated in GUE

   Data message    An encapsulated packet in the GUE payload that is
                   addressed to the protocol stack for an associated
                   protocol

   Control message A formatted message in the GUE payload that is
                   implicitly addressed to the decapsulator to monitor
                   or control the state or behavior of a tunnel

   Flags           A set of bit flags in the primary GUE header

   Extension field
                   An optional field in a GUE header whose presence is
                   indicated by corresponding flag(s)

   C-bit           A single bit flag in the primary GUE header that
                   indicates whether the GUE packet contains a control
                   message or data message

   Hlen            A field in the primary GUE header that gives the
                   length of the GUE header

   Proto/ctype     A field in the GUE header that holds either the IP
                   protocol number for a data message or a type for a
                   control message

   Private data    Optional data in the GUE header that can be used for
                   private purposes

   Outer IP header Refers to the outermost IP header or packet when
                   encapsulating a packet over IP

   Inner IP header
                   Refers to an encapsulated IP header when an IP
                   packet is encapsulated

   Outer packet    Refers to an encapsulating packet

   Inner packet    Refers to a packet that is encapsulated

1.2. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2. Base packet format

   A GUE packet consists of a UDP packet whose payload is a GUE header
   followed by a payload which is either an encapsulated packet of some
   IP protocol or a control message such as an OAM (Operations,
   Administration, and Management) message. A GUE packet has the general
   format:

   +-------------------------------+
   |                               |
   |         UDP/IP header         |
   |                               |
   |-------------------------------|
   |                               |
   |          GUE Header           |
   |                               |
   |-------------------------------|
   |                               |
   |      Encapsulated packet      |
   |      or control message       |
   |                               |
   +-------------------------------+

   The GUE header is variable length as determined by the presence of
   optional extension fields.

2.1. GUE version

   The first two bits of the GUE header contain the GUE protocol version
   number. The rest of the fields after the GUE version number are
   defined based on the version number. Versions 0 and 1 are described
   in this specification; versions 2 and 3 are reserved.

3. Version 0

   Version 0 of GUE defines a generic extensible format to encapsulate
   packets by Internet protocol number.

3.1. Header format

   The header format for version 0 of GUE in UDP is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\
   |        Source port            |      Destination port         | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP
   |           Length              |          Checksum             | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/
   | 0 |C|   Hlen  |  Proto/ctype  |             Flags             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                  Extension Fields (optional)                  ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                    Private data (optional)                    ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The contents of the UDP header are:

   o Source port: If connection semantics (section 5.6.1) are applied
      to an encapsulation, this is set to the local source port for
      the connection. When connection semantics are not applied, this
      is set to a flow entropy value for use with ECMP (Equal-Cost
      Multi-Path [RFC2992]); the properties of flow entropy are
      described in section 5.11.

   o Destination port: If connection semantics (section 5.6.1) are
      applied to an encapsulation, this is set to the destination port
      for the tuple. If connection semantics are not applied, this is
      set to the GUE assigned port number, 6080.

   o Length: Canonical length of the UDP packet (length of UDP header
      and payload).

   o Checksum: Standard UDP checksum (handling is described in
      section 5.7).

   The GUE header consists of:

   o Ver: GUE protocol version (0).

   o C: C-bit: When set indicates a control message, not set
      indicates a data message.

   o Hlen: Length in 32-bit words of the GUE header, including
      optional extension fields but not the first four bytes of the
      header.
      Computed as (header_len - 4) / 4 where header_len is the
      total header length in bytes. All GUE headers are a multiple of
      four bytes in length. Maximum header length is 128 bytes.

   o Proto/ctype: When the C-bit is set, this field contains a
      control message type for the payload (section 3.2.2). When C-bit
      is not set, the field holds the Internet protocol number for the
      encapsulated packet in the payload (section 3.2.1). The control
      message or encapsulated packet begins at the offset provided by
      Hlen.

   o Flags: Header flags that may be allocated for various purposes
      and may indicate presence of extension fields. Undefined header
      flag bits MUST be set to zero on transmission.

   o Extension Fields: Optional fields whose presence is indicated by
      corresponding flags.

   o Private data: Optional private data block (see section 3.4). If
      the private block is present, it immediately follows the last
      extension field present in the header. The private block is
      considered to be part of the GUE header. The length of this data
      is determined by subtracting the starting offset from the header
      length.

3.2. Proto/ctype field

   The proto/ctype field contains either an Internet protocol number
   (when the C-bit is not set) or a GUE control message type (when the
   C-bit is set).

3.2.1 Proto field

   When the C-bit is not set, the proto/ctype field MUST contain an IANA
   Internet Protocol Number. The protocol number is interpreted relative
   to the IP protocol that encapsulates the UDP packet (i.e. protocol of
   the outer IP header). The protocol number serves as an indication of
   the type of the next protocol header which is contained in the GUE
   payload at the offset indicated in Hlen. Intermediate devices MAY
   parse the GUE payload per the number in the proto/ctype field, and
   header flags cannot affect the interpretation of the proto/ctype
   field.

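   The field layout of section 3.1 can be illustrated with a short
   parsing sketch. It is not part of the specification: the names
   GuePrimaryHeader and parse_primary_header are hypothetical, and the
   sketch only handles the four byte primary header of version 0
   (extension fields and private data are left uninterpreted).

```python
import struct
from typing import NamedTuple


class GuePrimaryHeader(NamedTuple):
    """The four byte primary GUE header (version 0)."""
    version: int      # 2 bits
    control: bool     # C-bit: True for a control message
    hlen: int         # 5 bits: 32-bit words following the first 4 bytes
    proto_ctype: int  # 8 bits: IP protocol number or control type
    flags: int        # 16 bits

    @property
    def header_len(self) -> int:
        # Inverse of Hlen = (header_len - 4) / 4
        return 4 + self.hlen * 4


def parse_primary_header(udp_payload: bytes) -> GuePrimaryHeader:
    """Decode the primary header found at the start of a UDP payload."""
    if len(udp_payload) < 4:
        raise ValueError("UDP payload shorter than the primary GUE header")
    first, proto_ctype, flags = struct.unpack("!BBH", udp_payload[:4])
    version = first >> 6
    if version != 0:
        raise ValueError(f"not a GUE version 0 header (version {version})")
    hdr = GuePrimaryHeader(version, bool(first & 0x20), first & 0x1F,
                           proto_ctype, flags)
    if hdr.header_len > len(udp_payload):
        raise ValueError("Hlen points past the end of the UDP payload")
    return hdr
```

   For a data message carrying TCP (protocol number 6) with no
   extension fields, the primary header bytes are 00 06 00 00 and
   header_len is 4, so the encapsulated packet starts at offset 4. The
   largest Hlen value, 31, gives the 128 byte maximum stated above.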
   When the outer IP protocol is IPv4, the proto field MUST be set to a
   valid IP protocol number usable with IPv4; it MUST NOT be set to a
   number for IPv6 extension headers or to ICMPv6 (number 58). An
   exception is that the destination options extension header using the
   PadN option MAY be used with IPv4 as described in section 3.6. The
   "no next header" protocol number (59) also MAY be used with IPv4 as
   described below.

   When the outer IP protocol is IPv6, the proto field can be set to any
   defined protocol number except that it MUST NOT be set to Hop-by-hop
   options (number 0). If a received GUE packet in IPv6 contains a
   protocol number that is an extension header (e.g. Destination
   Options) then the extension header is processed after the GUE header
   is processed as though the GUE header is an extension header.

   IP protocol number 59 ("No next header") can be set to indicate that
   the GUE payload does not begin with the header of an IP protocol.
   This would be the case, for instance, if the GUE payload were a
   fragment when performing GUE level fragmentation. The interpretation
   of the payload is performed through other means (such as flags and
   extension fields), and intermediate devices MUST NOT parse packets
   based on the IP protocol number in this case.

3.2.2 Ctype field

   When the C-bit is set, the proto/ctype field MUST be set to a valid
   control message type. A value of zero indicates that the GUE payload
   requires further interpretation to deduce the control type. This
   might be the case when the payload is a fragment of a control
   message, where only the reassembled packet can be interpreted as a
   control message.

   Control messages will be defined in an IANA registry. Control message
   types 1 through 127 may be defined in standards. Types 128 through
   255 are reserved to be user defined for experimentation or private
   control messages.

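   The proto and ctype rules above can be sketched as two small
   checks. This is an illustration only: proto_allowed and ctype_class
   are hypothetical names, and the set of IPv6-only protocol numbers
   is deliberately not exhaustive (it lists Hop-by-hop options,
   Routing, Fragment and ICMPv6). Destination Options (60) is let
   through for IPv4 here; whether it has the PadN form of section 3.6
   would have to be checked separately.

```python
IPPROTO_HOPOPTS = 0    # IPv6 Hop-by-hop options
IPPROTO_ROUTING = 43   # IPv6 Routing header
IPPROTO_FRAGMENT = 44  # IPv6 Fragment header
IPPROTO_ICMPV6 = 58
IPPROTO_NONE = 59      # "No next header"
IPPROTO_DSTOPTS = 60   # IPv6 Destination Options

# Assumption: an illustrative, non-exhaustive set of numbers that
# only make sense inside IPv6.
IPV6_ONLY_PROTOS = frozenset(
    {IPPROTO_HOPOPTS, IPPROTO_ROUTING, IPPROTO_FRAGMENT, IPPROTO_ICMPV6}
)


def proto_allowed(proto: int, outer_ip_version: int) -> bool:
    """Check a data message proto value against the outer IP version."""
    if not 0 <= proto <= 255:
        raise ValueError("protocol number must fit in eight bits")
    if outer_ip_version == 4:
        return proto not in IPV6_ONLY_PROTOS
    if outer_ip_version == 6:
        return proto != IPPROTO_HOPOPTS
    raise ValueError("outer IP version must be 4 or 6")


def ctype_class(ctype: int) -> str:
    """Classify a control message type by the ranges of section 3.2.2."""
    if ctype == 0:
        return "needs-further-interpretation"
    if 1 <= ctype <= 127:
        return "standards"
    if 128 <= ctype <= 255:
        return "experimental-or-private"
    raise ValueError("control type must fit in eight bits")
```

   With these definitions, proto_allowed(58, 4) is False while
   proto_allowed(58, 6) is True, and protocol number 59 is accepted
   for both outer IP versions.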
   This document does not specify any standard control message types
   other than type 0.

3.3. Flags and extension fields

   Flags and associated extension fields are the primary mechanism of
   extensibility in GUE. As mentioned in section 3.1, GUE header flags
   indicate the presence of optional extension fields in the GUE header.
   [GUEEXTENS] defines a basic set of GUE extensions.

3.3.1. Requirements

   There are sixteen flag bits in the GUE header. Some flags indicate
   the presence of extension fields. The size of an extension field
   indicated by a flag MUST be fixed.

   Flags can be paired together to allow different lengths for an
   extension field. For example, if two flag bits are paired, a field
   can have one of three different lengths: a bit value of 00 indicates
   that no field is present, while 01, 10, and 11 indicate three
   possible lengths for the field. Regardless of how flag bits are
   paired, the lengths and offsets of optional fields corresponding to a
   set of flags MUST be well defined.

   Extension fields are placed in order of the flags. New flags are to
   be allocated from high to low order bit contiguously without holes.
   Flags allow random access: for instance, to inspect the field
   corresponding to the Nth flag bit, an implementation only considers
   the previous N-1 flags to determine the offset. Flags after the Nth
   flag are not pertinent in calculating the offset of the Nth flag.
   Random access of flags and fields permits processing of optional
   extensions in an order that is independent of their position in the
   packet. The processing order of extensions defined in [GUEEXTENS]
   demonstrates this property.

   Flags (or paired flags) are idempotent such that new flags MUST NOT
   cause reinterpretation of old flags.
   Also, new flags MUST NOT alter interpretation of other elements in
   the GUE header nor how the message is parsed (for instance, in a data
   message the proto/ctype field always holds an IP protocol number as
   an invariant).

   The set of available flags can be extended in the future by defining
   a "flag extensions bit" that refers to a field containing a new set
   of flags.

3.3.2. Example GUE header with extension fields

   An example GUE header for a data message encapsulating an IPv4 packet
   and containing the VNID and Security extension fields (both defined
   in [GUEEXTENS]) is shown below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0 |0|    3    |      94       |1|0 0 1|           0           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             VNID                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                           Security                            +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   In the above example, the first flag bit is set, which indicates that
   the VNID extension, a 32-bit field, is present. The second through
   fourth bits of the flags are paired flags that indicate the presence
   of a security field with seven possible sizes. In this example 001
   indicates a sixty-four bit security field.

3.4. Private data

   An implementation MAY use private data for its own purposes. The
   private data immediately follows the last field in the GUE header and
   does not have a fixed length. This data is considered part of the GUE
   header and MUST be accounted for in header length (Hlen). The length
   of the private data MUST be a multiple of four and is determined by
   subtracting the offset of private data in the GUE header from the
   header length.
   Specifically:

      Private_length = (Hlen * 4) - Length(flags)

   where "Length(flags)" returns the sum of lengths of all the extension
   fields present in the GUE header. When there is no private data
   present, the length of the private data is zero.

   The semantics and interpretation of private data are implementation
   specific. The private data may be structured as necessary; for
   instance, it might contain its own set of flags and extension fields.

   An encapsulator and decapsulator MUST agree on the meaning of private
   data before using it. The mechanism to achieve this agreement is
   outside the scope of this document but could include implementation-
   defined behavior, coordinated configuration, in-band communication
   using GUE control messages, or out-of-band messages.

   If a decapsulator receives a GUE packet with private data, it MUST
   validate the private data appropriately. If a decapsulator does not
   expect private data from an encapsulator, the packet MUST be dropped.
   If a decapsulator cannot validate the contents of private data per
   the provided semantics, the packet MUST also be dropped. An
   implementation MAY place security data in GUE private data which if
   present MUST be verified for packet acceptance.

3.5. Message types

3.5.1. Control messages

   Control messages carry formatted data that are implicitly addressed
   to the decapsulator to monitor or control the state or behavior of a
   tunnel (OAM). For instance, an echo request and corresponding echo
   reply message can be defined to test for liveness.

   Control messages are indicated in the GUE header when the C-bit is
   set. The payload is interpreted as a control message with type
   specified in the proto/ctype field. The format and contents of the
   control message are indicated by the type and can be variable length.

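   Locating the payload of either message type requires the extension
   field lengths of section 3.3 and the Private_length formula of
   section 3.4. The sketch below applies both to split a version 0
   message into its parts. It is illustrative only: split_message and
   extension_length are hypothetical names, and the flag table models
   just what the example in section 3.3.2 states (a 4 byte VNID field
   for the first flag bit, an 8 byte security field for the paired
   value 001). The other security sizes are defined in [GUEEXTENS] and
   are not reproduced here.

```python
import struct

VNID_FLAG = 0x8000     # first (highest order) flag bit
SECURITY_SHIFT = 12    # second through fourth flag bits
SECURITY_MASK = 0x7
# Assumption: only the 001 -> 8 bytes entry is taken from the example
# header; the remaining sizes are left out of this sketch.
SECURITY_LEN = {0: 0, 1: 8}
KNOWN_FLAGS = VNID_FLAG | (SECURITY_MASK << SECURITY_SHIFT)


def extension_length(flags: int) -> int:
    """Length(flags): sum of the lengths of the extension fields."""
    if flags & ~KNOWN_FLAGS:
        raise ValueError("flag bits not modelled in this sketch")
    security = (flags >> SECURITY_SHIFT) & SECURITY_MASK
    if security not in SECURITY_LEN:
        raise ValueError("security field size not modelled in this sketch")
    return (4 if flags & VNID_FLAG else 0) + SECURITY_LEN[security]


def split_message(udp_payload: bytes):
    """Return (is_control, proto_ctype, extensions, private, payload)."""
    if len(udp_payload) < 4:
        raise ValueError("UDP payload shorter than the primary GUE header")
    first, proto_ctype, flags = struct.unpack("!BBH", udp_payload[:4])
    if first >> 6 != 0:
        raise ValueError("not a GUE version 0 message")
    hlen_bytes = (first & 0x1F) * 4
    ext_len = extension_length(flags)
    private_len = hlen_bytes - ext_len  # (Hlen * 4) - Length(flags)
    header_end = 4 + hlen_bytes
    if private_len < 0 or len(udp_payload) < header_end:
        raise ValueError("malformed GUE header")
    ext_end = 4 + ext_len
    return (bool(first & 0x20), proto_ctype, udp_payload[4:ext_end],
            udp_payload[ext_end:header_end], udp_payload[header_end:])
```

   For the example header of section 3.3.2 (Hlen 3, flags 0x9000), the
   extension fields occupy 12 bytes, Private_length is zero, and the
   payload begins at offset 16. The same routine serves control
   messages, since only the meaning of the proto/ctype value differs.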
   Other than interpreting the proto/ctype field as a control message
   type, the meaning and semantics of the rest of the elements in the
   GUE header are the same as that of data messages. Forwarding and
   routing of control messages should be the same as that of a data
   message with the same outer IP and UDP header and GUE flags; this
   ensures that control messages can be created that follow the same
   path as data messages.

3.5.2. Data messages

   Data messages carry encapsulated packets that are addressed to the
   protocol stack for the associated protocol. Data messages are a
   primary means of encapsulation and can be used to create tunnels for
   overlay networks.

   Data messages are indicated in the GUE header when the C-bit is not
   set. The payload of a data message is interpreted as an encapsulated
   packet of an Internet protocol indicated in the proto/ctype field.
   The packet immediately follows the GUE header.

3.6. Hiding the transport layer protocol number

   The GUE header indicates the Internet protocol of the encapsulated
   packet. A protocol number is either contained in the Proto/ctype
   field of the primary GUE header or in the Payload Type field of a GUE
   Transform extension field (used to encrypt the payload with DTLS,
   [GUEEXTENS]). If the transport protocol number needs to be hidden
   from the network, then a trivial destination options extension header
   can be used.

   The PadN destination option [RFC2460] can be used to encode the
   transport protocol as a next header of an extension header (and
   maintain alignment of encapsulated transport headers). The
   Proto/ctype field or Payload Type field of the GUE Transform field is
   set to 60 to indicate that the first encapsulated header is a
   destination options extension header.

550 The format of the extension header is below: 552 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 553 | Next Header | 2 | 1 | 0 | 554 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 556 For IPv4, it is permitted in GUE to used this precise destination 557 option to contain the obfuscated protocol number. In this case next 558 header MUST refer to a valid IP protocol for IPv4. No other extension 559 headers or destination options are permitted with IPv4. 561 4. Version 1 563 Version 1 of GUE allows direct encapsulation of IPv4 and IPv6 in UDP. 564 In this version there is no GUE header; a UDP packet carries an IP 565 packet. The first two bits of the UDP payload for GUE are the GUE 566 version and coincide with the first two bits of the version number in 567 the IP header. The first two version bits of IPv4 and IPv6 are 01, so 568 we use GUE version 1 for direct IP encapsulation which makes two bits 569 of GUE version to also be 01. 571 This technique is effectively a means to compress out the GUE header 572 when encapsulating IPv4 or IPv6 packets and there are no flags or 573 extension fields present. This method is compatible to use on the 574 same port number as packets with the GUE header (GUE version 0 575 packets). This technique saves encapsulation overhead on costly links 576 for the common use of IP encapsulation, and also obviates the need to 577 allocate a separate port number for IP-over-UDP encapsulation. 579 4.1. 
Direct encapsulation of IPv4 581 The format for encapsulating IPv4 directly in UDP is: 583 0 1 2 3 584 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 585 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\ 586 | Source port | Destination port | | 587 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP 588 | Length | Checksum | | 589 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/ 590 |0|1|0|0| IHL |Type of Service| Total Length | 591 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 592 | Identification |Flags| Fragment Offset | 593 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 594 | Time to Live | Protocol | Header Checksum | 595 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 596 | Source IPv4 Address | 597 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 598 | Destination IPv4 Address | 599 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 601 Note that 0100 value IP version field express the GUE version as 1 602 (bits 01) and IP version as 4 (bits 0100). 604 4.2. 
Direct encapsulation of IPv6

605     The format for encapsulating IPv6 directly in UDP is shown
606     below:

608      0                   1                   2                   3
609      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
610     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\
611     |        Source port            |      Destination port         | |
612     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP
613     |           Length              |          Checksum             | |
614     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/
615     |0|1|1|0| Traffic Class |           Flow Label                  |
616     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
617     |         Payload Length        |     NextHdr   |   Hop Limit   |
618     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
619     |                                                               |
620     +                                                               +
621     |                                                               |
622     +                     Source IPv6 Address                       +
623     |                                                               |
624     +                                                               +
625     |                                                               |
626     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
627     |                                                               |
628     +                                                               +
629     |                                                               |
630     +                   Destination IPv6 Address                    +
631     |                                                               |
632     +                                                               +
633     |                                                               |
634     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

636     Note that the value 0110 in the IP version field expresses the GUE
637     version as 1 (bits 01) and the IP version as 6 (bits 0110).

639  5. Operation

641     The figure below illustrates the use of GUE encapsulation between two
642     hosts. Host 1 is sending packets to Host 2. An encapsulator performs
643     encapsulation of packets from Host 1. These encapsulated packets
644     traverse the network as UDP packets. At the decapsulator, packets are
645     decapsulated and sent on to Host 2. Packet flow in the reverse
646     direction need not be symmetric; GUE encapsulation is not required in
647     the reverse path.
649     +---------------+                       +---------------+
650     |               |                       |               |
651     |    Host 1     |                       |     Host 2    |
652     |               |                       |               |
653     +---------------+                       +---------------+
654             |                                       ^
655             V                                       |
656     +---------------+   +---------------+   +---------------+
657     |               |   |               |   |               |
658     |  Encapsulator |-->|    Layer 3    |-->| Decapsulator  |
659     |               |   |    Network    |   |               |
660     +---------------+   +---------------+   +---------------+

662     The encapsulator and decapsulator may be co-resident with the
663     corresponding hosts, or may be on separate nodes in the network.

665  5.1. Network tunnel encapsulation

667     Network tunneling can be achieved by encapsulating layer 2 or layer 3
668     packets. In this case the encapsulator and decapsulator nodes are the
669     tunnel endpoints. These could be routers that provide network tunnels
670     on behalf of communicating hosts.

672  5.2. Transport layer encapsulation

674     When encapsulating layer 4 packets, the encapsulator and decapsulator
675     should be co-resident with the hosts. In this case, the encapsulation
676     headers are inserted between the IP header and the transport packet.
677     The addresses in the IP header refer to both the endpoints of the
678     encapsulation and the endpoints for terminating the transport
679     protocol. Note that the transport layer ports in the encapsulated
680     packet are independent of the UDP ports in the outer packet.

682     Details about performing transport layer encapsulation are discussed
683     in [TOU].

685  5.3. Encapsulator operation

687     Encapsulators create GUE data messages, set the fields of the UDP
688     header, set flags and optional extension fields in the GUE header,
689     and forward packets to a decapsulator.

691     An encapsulator can be an end host originating the packets of a flow,
692     or can be a network device performing encapsulation on behalf of
693     hosts (routers implementing tunnels for instance). In either case,
694     the intended target (decapsulator) is indicated by the outer
695     destination IP address and destination port in the UDP header.
697     If an encapsulator is tunneling packets -- that is, encapsulating
698     packets of layer 2 or layer 3 protocols (e.g., EtherIP, IPIP, ESP
699     tunnel mode) -- it SHOULD follow standard conventions for tunneling
700     of one protocol over another. For instance, if an IP packet is being
701     encapsulated in GUE, then diffserv interaction [RFC2983] and ECN
702     propagation for tunnels [RFC6040] SHOULD be followed.

704  5.4. Decapsulator operation

706     A decapsulator performs decapsulation of GUE packets. A decapsulator
707     is addressed by the outer destination IP address of a GUE packet.
708     The decapsulator validates packets, including fields of the GUE
709     header.

711     If a decapsulator receives a GUE packet with an unsupported version,
712     unknown flag, bad header length (too small for included extension
713     fields), unknown control message type, bad protocol number, an
714     unsupported payload type, or an otherwise malformed header, it MUST
715     drop the packet. Such events MAY be logged subject to configuration
716     and rate limiting of logging messages. No error message is returned
717     back to the encapsulator. Note that set flags in a GUE header that
718     are unknown to a decapsulator MUST NOT be ignored. If a GUE packet is
719     received by a decapsulator with unknown flags, the packet MUST be
720     dropped.

722  5.4.1. Processing a received data message

724     If a valid data message is received, the UDP and GUE headers are
725     removed from the packet. The outer IP header remains intact and the
726     next protocol in the IP header is set to the protocol from the proto
727     field in the GUE header. The resulting packet is then resubmitted into
728     the protocol stack to process that packet as though it was received
729     with the protocol in the GUE header.

731     As an example, consider that a data message is received where GUE
732     encapsulates an IP packet.
In this case the proto field in the GUE header
733     is set to 94 for IPIP:

735     +-------------------------------------+
736     |   IP header (next proto = 17,UDP)   |
737     |-------------------------------------|
738     |                 UDP                 |
739     |-------------------------------------|
740     |        GUE (proto = 94,IPIP)        |
741     |-------------------------------------|
742     |        IP header and packet         |
743     +-------------------------------------+

744     The receiver removes the UDP and GUE headers and sets the next
745     protocol field in the IP packet to IPIP, which is derived from the
746     GUE proto field. The resultant packet would have the format:

748     +-------------------------------------+
749     |  IP header (next proto = 94,IPIP)   |
750     |-------------------------------------|
751     |        IP header and packet         |
752     +-------------------------------------+

754     This packet is then resubmitted into the protocol stack to be
755     processed as an IPIP packet.

757  5.4.2. Processing a received control message

759     If a valid control message is received, the packet MUST be processed
760     as a control message. The specific processing to be performed depends
761     on the ctype in the GUE header.

763  5.5. Router and switch operation

765     Routers and switches SHOULD forward GUE packets as standard UDP/IP
766     packets. The outer five-tuple should contain sufficient information
767     to perform flow classification corresponding to the flow of the inner
768     packet. A router does not normally need to parse a GUE header, and
769     none of the flags or extension fields in the GUE header are expected
770     to affect routing. In cases where the outer five-tuple does not
771     provide sufficient entropy for flow classification, for instance when
772     UDP ports are fixed to provide connection semantics (section 5.6.1),
773     the encapsulated packet MAY be parsed to determine flow entropy.

775     A router MUST NOT modify a GUE header when forwarding a packet. It
776     MAY encapsulate a GUE packet in another GUE packet, for instance to
777     implement a network tunnel (i.e.
by encapsulating an IP packet with a
778     GUE payload in another IP packet as a GUE payload). In this case, the
779     router takes the role of an encapsulator, and the corresponding
780     decapsulator is the logical endpoint of the tunnel. When
781     encapsulating a GUE packet within another GUE packet, there are no
782     provisions to automatically copy GUE flags or fields to the outer GUE
783     header. Each layer of encapsulation is considered independent.

785  5.6. Middlebox interactions

787     A middlebox MAY interpret some flags and extension fields of the GUE
788     header for classification purposes, but is not required to understand
789     any of the flags or extension fields in GUE packets. A middlebox
790     MUST NOT drop a GUE packet merely because there are flags unknown to
791     it. The header length in the GUE header allows a middlebox to inspect
792     the payload packet without needing to parse the flags or extension
793     fields.

795  5.6.1. Inferring connection semantics

797     A middlebox might infer bidirectional connection semantics for a UDP
798     flow. For instance, a stateful firewall might create a five-tuple
799     rule to match flows on egress, and a corresponding five-tuple rule
800     for matching ingress packets where the roles of source and
801     destination are reversed for the IP addresses and UDP port numbers.
802     To operate in this environment, a GUE tunnel should be configured to
803     assume connected semantics defined by the UDP five-tuple, and the use
804     of GUE encapsulation needs to be symmetric between both endpoints.
805     The source port set in the UDP header MUST be the destination port
806     the peer would set for replies. In this case the UDP source port for
807     a tunnel would be a fixed value and not set to be flow entropy as
808     described in section 5.11.

810     The selection of whether to make the UDP source port fixed or set to
811     a flow entropy value for each packet sent SHOULD be configurable for
812     a tunnel.
The default MUST be to set the flow entropy value in the
813     UDP source port.

815  5.6.2. NAT

817     IP address and port translation can be performed on the UDP/IP
818     headers adhering to the requirements for NAT with UDP [RFC4787]. In
819     the case of stateful NAT, connection semantics MUST be applied to a
820     GUE tunnel as described in section 5.6.1. GUE endpoints MAY also
821     invoke STUN [RFC5389] or ICE [RFC5245] to manage NAT port mappings
822     for encapsulations.

824  5.7. Checksum Handling

826     The potential for mis-delivery of packets due to corruption of IP,
827     UDP, or GUE headers needs to be considered. Historically, the UDP
828     checksum would be considered sufficient as a check against corruption
829     of either the UDP header and payload or the IP addresses.
830     Encapsulation protocols, such as GUE, can be originated or terminated
831     on devices incapable of computing the UDP checksum for a packet. This
832     section discusses the requirements around checksums and alternatives
833     that might be used when an endpoint does not support UDP checksum.

835  5.7.1. Requirements

837     One of the following requirements MUST be met:

839        o UDP checksums are enabled (for IPv4 or IPv6).

841        o The GUE header checksum is used (defined in [GUEEXTENS]).

843        o Zero UDP checksums are used. This is always permissible with
844          IPv4; in IPv6, they can only be used in accordance with
845          applicable requirements in [RFC8086], [RFC6935], and [RFC6936].

847  5.7.2. UDP Checksum with IPv4

849     For UDP in IPv4, the UDP checksum MUST be processed as specified in
850     [RFC0768] and [RFC1122] for both transmit and receive. An
851     encapsulator MAY set the UDP checksum to zero for performance or
852     implementation considerations. The IPv4 header includes a checksum
853     that protects against mis-delivery of the packet due to corruption
854     of IP addresses. The UDP checksum potentially provides protection
855     against corruption of the UDP header, GUE header, and GUE payload.
856     Enabling or disabling the use of checksums is a deployment
857     consideration that should take into account the risk and effects of
858     packet corruption, and whether the packets in the network are
859     already adequately protected by other, possibly stronger mechanisms
860     such as the Ethernet CRC. If an encapsulator sets a zero UDP
861     checksum for IPv4, it SHOULD use the GUE header checksum as
862     described in [GUEEXTENS] assuming there are no other mechanisms used
863     to protect the GUE packet.

865     When a decapsulator receives a packet, the UDP checksum field MUST
866     be processed. If the UDP checksum is non-zero, the decapsulator MUST
867     verify the checksum before accepting the packet. By default, a
868     decapsulator SHOULD accept UDP packets with a zero checksum. A node
869     MAY be configured to disallow zero checksums per [RFC1122].
870     Configuration of zero checksums can be selective. For instance, zero
871     checksums might be disallowed from certain hosts that are known to
872     be sending over paths subject to packet corruption. If verification
873     of a non-zero checksum fails, a decapsulator lacks the capability to
874     verify a non-zero checksum, or a packet with a zero checksum was
875     received and the decapsulator is configured to disallow zero
876     checksums, the packet MUST be dropped.

878  5.7.3. UDP Checksum with IPv6

880     In IPv6, there is no checksum in the IPv6 header that protects
881     against mis-delivery due to address corruption. Therefore, when GUE
882     is used over IPv6, either the UDP checksum or the GUE header
883     checksum SHOULD be used unless there are alternative mechanisms in
884     use that protect against misdelivery. The UDP checksum and GUE
885     header checksum SHOULD NOT be used at the same time since that would
886     be mostly redundant.

888     If neither the UDP checksum nor the GUE header checksum is used, then
889     the requirements for using zero IPv6 UDP checksums in [RFC6935] and
890     [RFC6936] MUST be met.
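
The transmit-side rules in sections 5.7.1 through 5.7.3 can be read as a small configuration check. The following Python sketch is illustrative only and is not part of the specification: the class, field, and function names are hypothetical, and whether "other protection" exists or the [RFC6935]/[RFC6936] conditions hold is assumed to be supplied by the operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxChecksumConfig:
    """Hypothetical per-tunnel transmit settings (illustrative names)."""
    ip_version: int                      # 4 or 6 (outer IP header)
    udp_checksum: bool                   # non-zero UDP checksum is generated
    gue_header_checksum: bool            # GUE header checksum extension is used
    other_protection: bool = False       # another mechanism guards against misdelivery
    zero_csum_v6_reqs_met: bool = False  # operator asserts RFC 6935/6936 conditions


def check_tx_checksum_config(cfg):
    """Return (ok, notes); ok is False only when a MUST would be violated."""
    if cfg.ip_version not in (4, 6):
        raise ValueError("ip_version must be 4 or 6")
    ok, notes = True, []
    uses_any = cfg.udp_checksum or cfg.gue_header_checksum

    if cfg.ip_version == 4:
        # A zero UDP checksum is always permissible with IPv4, but the GUE
        # header checksum SHOULD then be used absent other protection.
        if not uses_any and not cfg.other_protection:
            notes.append("IPv4: SHOULD use the GUE header checksum")
        return ok, notes

    # IPv6: there is no IP header checksum to catch address corruption.
    if cfg.udp_checksum and cfg.gue_header_checksum:
        notes.append("IPv6: UDP and GUE header checksums SHOULD NOT both be used")
    if not uses_any:
        if not cfg.other_protection:
            notes.append("IPv6: SHOULD use the UDP or the GUE header checksum")
        if not cfg.zero_csum_v6_reqs_met:
            ok = False
            notes.append("IPv6: zero UDP checksum requires RFC 6935/6936 conditions")
    return ok, notes
```

For example, an IPv6 tunnel with neither checksum enabled and without the zero-checksum conditions met is rejected, while IPv6 with only the GUE header checksum passes with no notes.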
892     When a decapsulator receives a packet, the UDP checksum field MUST
893     be processed. If the UDP checksum is non-zero, the decapsulator MUST
894     verify the checksum before accepting the packet. By default, a
895     decapsulator MUST only accept UDP packets with a zero checksum if
896     the GUE header checksum is used and is verified. If verification of
897     a non-zero checksum fails, a decapsulator lacks the capability to
898     verify a non-zero checksum, or a packet with a zero checksum and no
899     GUE header checksum was received, the packet MUST be dropped.

901  5.8. MTU and fragmentation

903     Standard conventions for handling of MTU (Maximum Transmission Unit)
904     and fragmentation in conjunction with networking tunnels
905     (encapsulation of layer 2 or layer 3 packets) SHOULD be followed.
906     Details are described in MTU and Fragmentation Issues with
907     In-the-Network Tunneling [RFC4459].

909     If a packet is fragmented before encapsulation in GUE, all the
910     related fragments MUST be encapsulated using the same UDP source
911     port. An operator SHOULD set MTU to account for encapsulation
912     overhead and reduce the likelihood of fragmentation.

914     As an alternative to IP fragmentation, the GUE fragmentation
915     extension can be used. GUE fragmentation is described in [GUEEXTENS].

917  5.9. Congestion control

919     Per requirements of [RFC5405], if the IP traffic encapsulated with
920     GUE implements proper congestion control, no additional mechanisms
921     should be required.

923     In the case that the encapsulated traffic does not implement any or
924     sufficient congestion control, or it is not known whether a
925     transmitter will consistently implement proper congestion control,
926     congestion control at the encapsulation layer MUST be provided per
927     [RFC5405]. Note that this case applies to a significant use case in
928     network virtualization in which guests run third party networking
929     stacks that cannot be implicitly trusted to implement conformant
930     congestion control.
932     Out of band mechanisms such as rate limiting, Managed Circuit
933     Breaker [CIRCBRK], or traffic isolation MAY be used to provide
934     rudimentary congestion control. For finer-grained congestion control
935     that allows alternate congestion control algorithms, reaction time
936     within an RTT, and interaction with ECN, in-band mechanisms might be
937     warranted.

939  5.10. Multicast

941     GUE packets can be multicast to decapsulators using a multicast
942     destination address in the encapsulating IP headers. Each receiving
943     host will decapsulate the packet independently following normal
944     decapsulator operations. The receiving decapsulators need to agree
945     on the same set of GUE parameters and properties; how such an
946     agreement is reached is outside the scope of this document.

948     GUE allows encapsulation of unicast, broadcast, or multicast
949     traffic. Flow entropy (the value in the UDP source port) can be
950     generated from the header of encapsulated unicast or
951     broadcast/multicast packets at an encapsulator. The mapping
952     mechanism between the encapsulated multicast traffic and the
953     multicast capability in the IP network is transparent and
954     independent of the encapsulation and is otherwise outside the scope
955     of this document.

957  5.11. Flow entropy for ECMP

959  5.11.1. Flow classification

961     A major objective of using GUE is that a network device can perform
962     flow classification corresponding to the flow of the inner
963     encapsulated packet based on the contents in the outer headers.

965     Hardware devices commonly perform hash computations on packet
966     headers to classify packets into flows or flow buckets. Flow
967     classification is done to support load balancing of flows across a
968     set of networking resources. Examples of such load balancing
969     techniques are Equal Cost Multipath routing (ECMP), port selection
970     in Link Aggregation, and NIC device Receive Side Scaling (RSS).
971     Hashes are usually either a three-tuple hash of IP protocol, source
972     address, and destination address; or a five-tuple hash consisting of
973     IP protocol, source address, destination address, source port, and
974     destination port. Typically, networking hardware will compute
975     five-tuple hashes for TCP and UDP, but only three-tuple hashes for
976     other IP protocols. Since the five-tuple hash provides more
977     granularity, load balancing can be finer-grained with better
978     distribution. When a packet is encapsulated with GUE and connection
979     semantics are not applied, the source port in the outer UDP packet is
980     set to a flow entropy value that corresponds to the flow of the inner
981     packet. When a device computes a five-tuple hash on the outer UDP/IP
982     header of a GUE packet, the resultant value classifies the packet per
983     its inner flow.

985     Examples of deriving flow entropy for encapsulation are:

987        o If the encapsulated packet is a layer 4 packet, TCP/IPv4 for
988          instance, the flow entropy could be based on the canonical
989          five-tuple hash of the inner packet.

991        o If the encapsulated packet is an AH transport mode packet with
992          TCP as next header, the flow entropy could be a hash over a
993          three-tuple: TCP protocol and TCP ports of the encapsulated
994          packet.

996        o If a node is encrypting a packet using ESP tunnel mode and GUE
997          encapsulation, the flow entropy could be based on the contents
998          of the clear-text packet. For instance, a canonical five-tuple
999          hash for a TCP/IP packet could be used.

1001     [RFC6438] discusses methods to compute and set flow entropy value for
1002     IPv6 flow labels. Such methods can also be used to create flow
1003     entropy values for GUE.

1005  5.11.2. Flow entropy properties

1007     The flow entropy is the value set in the UDP source port of a GUE
1008     packet.
Flow entropy in the UDP source port SHOULD adhere to the
1009     following properties:

1011        o The value set in the source port is within the ephemeral port
1012          range (49152 to 65535 [RFC6335]). Since the high order two bits
1013          of the port are set to one, this provides fourteen bits of
1014          entropy for the value.

1016        o The flow entropy has a uniform distribution across encapsulated
1017          flows.

1019        o An encapsulator MAY occasionally change the flow entropy used
1020          for an inner flow per its discretion (for security, route
1021          selection, etc.). To avoid thrashing or flapping the value, the
1022          flow entropy used for a flow SHOULD NOT change more than once
1023          every thirty seconds (or a configurable value).

1025        o Decapsulators, or any networking devices, SHOULD NOT attempt to
1026          interpret flow entropy as anything more than an opaque value.
1027          Neither should they attempt to reproduce the hash calculation
1028          used by an encapsulator in creating a flow entropy value. They
1029          MAY use the value to match further received packets for steering
1030          decisions, but MUST NOT assume that the hash uniquely or
1031          permanently identifies a flow.

1033        o Input to the flow entropy calculation is not restricted to ports
1034          and addresses; input could include flow label from an IPv6
1035          packet, SPI from an ESP packet, or other flow related state in
1036          the encapsulator that is not necessarily conveyed in the packet.

1038        o The assignment function for flow entropy SHOULD be randomly
1039          seeded to mitigate denial of service attacks. The seed SHOULD be
1040          changed periodically.

1042  5.12. Negotiation of acceptable flags and extension fields

1044     An encapsulator and decapsulator need to achieve agreement about the
1045     GUE parameters that will be used in communications. Parameters include
1046     GUE version, flags and extension fields that can be used, security
1047     algorithms and keys, supported protocols and control messages, etc.
1048     This document proposes different general methods to accomplish this;
1049     however, the details of implementing these are considered out of
1050     scope.

1052     General methods for this are:

1054        o Configuration. The parameters used for a tunnel are configured
1055          at each endpoint.

1057        o Negotiation. A tunnel negotiation can be performed. This could
1058          be accomplished in-band of GUE using control messages or private
1059          data.

1061        o Via a control plane. Parameters for communicating with a tunnel
1062          endpoint can be set in a control plane protocol (such as that
1063          needed for nvo3).

1065        o Via security negotiation. Use of security typically implies a
1066          key exchange between endpoints. Other GUE parameters may be
1067          conveyed as part of that process.

1069  6. Motivation for GUE

1071     This section presents the motivation for GUE with respect to other
1072     encapsulation methods.

1074  6.1. Benefits of GUE

1076        * GUE is a generic encapsulation protocol. GUE can encapsulate
1077          protocols that are represented by an IP protocol number. This
1078          includes layer 2, layer 3, and layer 4 protocols.

1080        * GUE is an extensible encapsulation protocol. Standardized
1081          optional data such as security, virtual networking identifiers,
1082          and fragmentation is being defined.

1084        * For extensibility, GUE uses flag fields as opposed to TLVs as
1085          some other encapsulation protocols do. Flag fields are strictly
1086          ordered, allow random access, and are efficient in use of header
1087          space.

1089        * GUE allows private data to be sent as part of the encapsulation.
1090          This permits experimentation or customization in deployment.

1092        * GUE allows sending of control messages such as OAM using the
1093          same GUE header format (for routing purposes) as normal data
1094          messages.

1096        * GUE maximizes deliverability of non-UDP and non-TCP protocols.

1098        * GUE provides a means for exposing per flow entropy for ECMP for
1099          atypical protocols such as SCTP, DCCP, ESP, etc.
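
The per flow entropy mentioned in the last item is the UDP source port value described in section 5.11.2. As a minimal sketch, assuming a keyed hash over the inner five-tuple is an acceptable assignment function, it could be derived as follows. The function name, the choice of BLAKE2b, and the input encoding are assumptions made for illustration; the document does not mandate any particular hash.

```python
import hashlib
import ipaddress
import os
import struct

# Random seed for the assignment function; section 5.11.2 says it SHOULD be
# randomly seeded and changed periodically (rotation is not shown here).
_SEED = os.urandom(16)


def flow_entropy_port(proto, src, dst, sport, dport, seed=_SEED):
    """Map an inner five-tuple to a UDP source port in 49152..65535.

    Hypothetical helper: a keyed BLAKE2b digest stands in for whatever
    hash an implementation actually uses.
    """
    data = (
        struct.pack("!B", proto)
        + ipaddress.ip_address(src).packed
        + ipaddress.ip_address(dst).packed
        + struct.pack("!HH", sport, dport)
    )
    digest = hashlib.blake2b(data, key=seed, digest_size=2).digest()
    value = int.from_bytes(digest, "big")
    # Force the two high-order bits to one (ephemeral range), leaving
    # fourteen bits of entropy.
    return 0xC000 | (value & 0x3FFF)
```

With a fixed seed, packets of the same inner flow map to the same source port, so a five-tuple hash over the outer headers keeps them on one path; changing the seed remaps flows, which is why the text limits how often the value changes.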
1101  6.2. Comparison of GUE to other encapsulations

1103     A number of different encapsulation techniques have been proposed for
1104     the encapsulation of one protocol over another. EtherIP [RFC3378]
1105     provides layer 2 tunneling of Ethernet frames over IP. GRE [RFC2784],
1106     MPLS [RFC4023], and L2TP [RFC2661] provide methods for tunneling
1107     layer 2 and layer 3 packets over IP. NVGRE [RFC7637] and VXLAN
1108     [RFC7348] are proposals for encapsulation of layer 2 packets for
1109     network virtualization. IPIP [RFC2003] and Generic packet tunneling
1110     in IPv6 [RFC2473] provide methods for tunneling IP packets over IP.

1112     Several proposals exist for encapsulating packets over UDP including
1113     ESP over UDP [RFC3948], TCP directly over UDP [TCPUDP], VXLAN
1114     [RFC7348], LISP [RFC6830] which encapsulates layer 3 packets,
1115     MPLS/UDP [RFC7510], and GRE-in-UDP encapsulation (GRE
1116     over UDP) [RFC8086]. Generic UDP tunneling [GUT] is a proposal similar
1117     to GUE in that it aims to tunnel packets of IP protocols over UDP.

1119     GUE has the following discriminating features:

1121        o UDP encapsulation leverages specialized network device
1122          processing for efficient transport. The semantics for using the
1123          UDP source port for flow entropy as input to ECMP are defined in
1124          section 5.11.

1126        o GUE permits encapsulation of arbitrary IP protocols, which
1127          includes layer 2, 3, and 4 protocols.

1129        o Multiple protocols can be multiplexed over a single UDP port
1130          number. This is in contrast to techniques to encapsulate
1131          protocols over UDP using a protocol specific port number (such
1132          as ESP/UDP, GRE/UDP, SCTP/UDP). GUE provides a uniform and
1133          extensible mechanism for encapsulating all IP protocols in UDP
1134          with minimal overhead (four bytes of additional header).

1136        o GUE is extensible. New flags and extension fields can be
1137          defined.

1139        o The GUE header includes a header length field.
This allows a
1140          network node to inspect an encapsulated packet without needing
1141          to parse the full encapsulation header.

1143        o Private data in the encapsulation header allows local
1144          customization and experimentation while being compatible with
1145          processing in network nodes (routers and middleboxes).

1147        o GUE includes both data messages (encapsulation of packets) and
1148          control messages (such as OAM).

1150        o The flags-field model facilitates efficient implementation of
1151          extensibility in hardware.

1153          For instance, a TCAM can be used to parse a known set of N flags
1154          where the number of entries in the TCAM is 2^N.

1156          For comparison, the number of TCAM entries needed to parse a set
1157          of N arbitrarily ordered TLVs is approximately e*N!.

1159  7. Security Considerations

1161     There are two important considerations of security with respect to
1162     GUE.

1164        o Authentication and integrity of the GUE header.

1166        o Authentication, integrity, and confidentiality of the GUE
1167          payload.

1169     GUE security is provided by extensions for security defined in
1170     [GUEEXTENS]. These extensions include methods to authenticate the GUE
1171     header and encrypt the GUE payload.

1173     The GUE header can be authenticated using a security extension for an
1174     HMAC. Securing the GUE payload can be accomplished by use of the GUE
1175     Payload Transform. This extension can be used to perform DTLS in the
1176     payload of a GUE packet to encrypt the payload.

1178     A hash function for computing flow entropy (section 5.11) SHOULD be
1179     randomly seeded to mitigate some possible denial of service attacks.

1181  8. IANA Considerations

1183  8.1.
UDP source port

1185     A user UDP port number has been assigned for GUE:

1187           Service Name: gue
1188           Transport Protocol(s): UDP
1189           Assignee: Tom Herbert
1190           Contact: Tom Herbert
1191           Description: Generic UDP Encapsulation
1192           Reference: draft-herbert-gue
1193           Port Number: 6080
1194           Service Code: N/A
1195           Known Unauthorized Uses: N/A
1196           Assignment Notes: N/A

1198  8.2. GUE version number

1200     IANA is requested to set up a registry for the GUE version number.
1201     The GUE version number is 2 bits containing four possible values.
1202     This document defines version 0 and 1. New values are assigned in
1203     accordance with RFC Required policy [RFC5226].

1205        +----------------+-------------+---------------+
1206        | Version number | Description | Reference     |
1207        +----------------+-------------+---------------+
1208        | 0              | Version 0   | This document |
1209        |                |             |               |
1210        | 1              | Version 1   | This document |
1211        |                |             |               |
1212        | 2..3           | Unassigned  |               |
1213        +----------------+-------------+---------------+

1215  8.3. Control types

1217     IANA is requested to set up a registry for the GUE control types.
1218     Control types are 8 bit values. New values for control types 1-127
1219     are assigned in accordance with RFC Required policy [RFC5226].

1221        +----------------+------------------+---------------+
1222        | Control type   | Description      | Reference     |
1223        +----------------+------------------+---------------+
1224        | 0              | Need further     | This document |
1225        |                | interpretation   |               |
1226        |                |                  |               |
1227        | 1..127         | Unassigned       |               |
1228        |                |                  |               |
1229        | 128..255       | User defined     | This document |
1230        +----------------+------------------+---------------+

1232  8.4. Flag-fields

1234     IANA is requested to create a "GUE flag-fields" registry to allocate
1235     flags and extension fields used with GUE. This shall be a registry of
1236     bit assignments for flags, length of extension fields for
1237     corresponding flags, and descriptive strings.
There are sixteen bits
1238     for primary GUE header flags (bit number 0-15). New values are
1239     assigned in accordance with RFC Required policy [RFC5226].

1241        +-------------+--------------+-------------+--------------------+
1242        | Flags bits  | Field size   | Description | Reference          |
1243        +-------------+--------------+-------------+--------------------+
1244        | Bit 0       | 4 bytes      | VNID        | [GUE4NVO3]         |
1245        |             |              |             |                    |
1246        | Bit 1..3    | 001->8 bytes | Security    | [GUEEXTENS]        |
1247        |             | 010->16 bytes|             |                    |
1248        |             | 011->32 bytes|             |                    |
1249        |             |              |             |                    |
1250        | Bit 4       | 8 bytes      | Fragmen-    | [GUEEXTENS]        |
1251        |             |              | tation      |                    |
1252        |             |              |             |                    |
1253        | Bit 5       | 4 bytes      | Payload     | [GUEEXTENS]        |
1254        |             |              | transform   |                    |
1255        |             |              |             |                    |
1256        | Bit 6       | 4 bytes      | Remote      | [GUEEXTENS]        |
1257        |             |              | checksum    |                    |
1258        |             |              | offload     |                    |
1259        |             |              |             |                    |
1260        | Bit 7       | 4 bytes      | Checksum    | [GUEEXTENS]        |
1261        |             |              |             |                    |
1262        | Bit 8..15   |              | Unassigned  |                    |
1263        +-------------+--------------+-------------+--------------------+

1265     New flags are to be allocated from high to low order bit contiguously
1266     without holes.

1268  9. Acknowledgements

1270     The authors would like to thank David Liu, Erik Nordmark, Fred
1271     Templin, Adrian Farrel, Bob Briscoe, and Murray Kucherawy for
1272     valuable input on this draft.

1274  10. References

1276  10.1. Normative References

1278     [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, DOI
1279               10.17487/RFC0768, August 1980.

1282     [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
1283               Communication Layers", STD 3, RFC 1122, DOI
1284               10.17487/RFC1122, October 1989.

1287     [RFC2434] Narten, T. and H. Alvestrand, "Guidelines for Writing an
1288               IANA Considerations Section in RFCs", RFC 2434, DOI
1289               10.17487/RFC2434, October 1998.

1292     [RFC2983] Black, D., "Differentiated Services and Tunnels", RFC
1293               2983, DOI 10.17487/RFC2983, October 2000.
1296     [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion
1297               Notification", RFC 6040, DOI 10.17487/RFC6040, November
1298               2010.

1300     [RFC6935] Eubanks, M., Chimento, P., and M. Westerlund, "IPv6 and
1301               UDP Checksums for Tunneled Packets", RFC 6935, DOI
1302               10.17487/RFC6935, April 2013.

1305     [RFC6936] Fairhurst, G. and M. Westerlund, "Applicability Statement
1306               for the Use of IPv6 UDP Datagrams with Zero Checksums",
1307               RFC 6936, DOI 10.17487/RFC6936, April 2013.

1310     [RFC4459] Savola, P., "MTU and Fragmentation Issues with
1311               In-the-Network Tunneling", RFC 4459, DOI 10.17487/RFC4459,
1312               April 2006.

1314  10.2. Informative References

1316     [RFC3828] Larzon, L-A., Degermark, M., Pink, S., Jonsson, L-E., Ed.,
1317               and G. Fairhurst, Ed., "The Lightweight User Datagram
1318               Protocol (UDP-Lite)", RFC 3828, July 2004.

1321     [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
1322               L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
1323               eXtensible Local Area Network (VXLAN): A Framework for
1324               Overlaying Virtualized Layer 2 Networks over Layer 3
1325               Networks", RFC 7348, August 2014.

1328     [RFC7605] Touch, J., "Recommendations on Using Assigned Transport
1329               Port Numbers", BCP 165, RFC 7605, DOI 10.17487/RFC7605,
1330               August 2015.

1332     [RFC7637] Garg, P., Ed., and Y. Wang, Ed., "NVGRE: Network
1333               Virtualization Using Generic Routing Encapsulation", RFC
1334               7637, DOI 10.17487/RFC7637, September 2015.

1337     [RFC8086] Yong, L., Ed., Crabbe, E., Xu, X., and T. Herbert,
1338               "GRE-in-UDP Encapsulation", RFC 8086, DOI 10.17487/RFC8086,
1339               March 2017.

1341     [RFC7510] Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black,
1342               "Encapsulating MPLS in UDP", RFC 7510, DOI
1343               10.17487/RFC7510, April 2015.

1346     [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram
1347               Congestion Control Protocol (DCCP)", RFC 4340, DOI
1348               10.17487/RFC4340, March 2006.

1351     [RFC4787] Audet, F., Ed., and C.
Jennings, "Network Address 1352 Translation (NAT) Behavioral Requirements for Unicast 1353 UDP", BCP 127, RFC 4787, DOI 10.17487/RFC4787, January 1354 2007. 1356 [RFC5389] Rosenberg, J., Mahy, R., Matthews, P., and D. Wing, 1357 "Session Traversal Utilities for NAT (STUN)", RFC 5389, 1358 DOI 10.17487/RFC5389, October 2008. 1361 [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment 1362 (ICE): A Protocol for Network Address Translator (NAT) 1363 Traversal for Offer/Answer Protocols", RFC 5245, DOI 1364 10.17487/RFC5245, April 2010. 1367 [RFC5405] Eggert, L. and G. Fairhurst, "Unicast UDP Usage Guidelines 1368 for Application Designers", BCP 145, RFC 5405, DOI 1369 10.17487/RFC5405, November 2008. 1372 [RFC6438] Carpenter, B. and S. Amante, "Using the IPv6 Flow Label 1373 for Equal Cost Multipath Routing and Link Aggregation in 1374 Tunnels", RFC 6438, DOI 10.17487/RFC6438, November 2011. 1377 [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, DOI 1378 10.17487/RFC2003, October 1996. 1381 [RFC3948] Huttunen, A., Swander, B., Volpe, V., DiBurro, L., and M. 1382 Stenberg, "UDP Encapsulation of IPsec ESP Packets", RFC 1383 3948, DOI 10.17487/RFC3948, January 2005. 1386 [RFC6830] Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The 1387 Locator/ID Separation Protocol (LISP)", RFC 6830, DOI 1388 10.17487/RFC6830, January 2013. 1391 [RFC3378] Housley, R. and S. Hollenbeck, "EtherIP: Tunneling 1392 Ethernet Frames in IP Datagrams", RFC 3378, DOI 1393 10.17487/RFC3378, September 2002. 1396 [RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. 1397 Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, 1398 DOI 10.17487/RFC2784, March 2000. 1401 [RFC4023] Worster, T., Rekhter, Y., and E. Rosen, Ed., 1402 "Encapsulating MPLS in IP or Generic Routing Encapsulation 1403 (GRE)", RFC 4023, DOI 10.17487/RFC4023, March 2005. 1406 [RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, 1407 G., and B.
Palter, "Layer Two Tunneling Protocol "L2TP"", 1408 RFC 2661, DOI 10.17487/RFC2661, August 1999. 1411 [GUEEXTENS] Herbert, T., Yong, L., and Templin, F., "Extensions for 1412 Generic UDP Encapsulation" draft-herbert-gue-extensions-00 1414 [GUE4NVO3] Yong, L., Herbert, T., Zia, O., "Generic UDP 1415 Encapsulation (GUE) for Network Virtualization Overlay" 1416 draft-hy-nvo3-gue-4-nvo-03 1418 [GUESEC] Yong, L., Herbert, T., "Generic UDP Encapsulation (GUE) for 1419 Secure Transport" draft-hy-gue-4-secure-transport-03 1421 [TCPUDP] Cheshire, S., Graessley, J., and McGuire, R., 1422 "Encapsulation of TCP and other Transport Protocols over 1423 UDP" draft-cheshire-tcp-over-udp-00 1425 [TOU] Herbert, T., "Transport layer protocols over UDP" draft- 1426 herbert-transports-over-udp-00 1428 [GUT] Manner, J., Varia, N., and Briscoe, B., "Generic UDP 1429 Tunnelling (GUT)" draft-manner-tsvwg-gut-02.txt 1431 [CIRCBRK] Fairhurst, G., "Network Transport Circuit Breakers", 1432 draft-ietf-tsvwg-circuit-breaker-15 1434 [LCO] Cree, E., https://www.kernel.org/doc/Documentation/ 1435 networking/checksum-offloads.txt 1437 Appendix A: NIC processing for GUE 1439 This appendix provides some guidelines for Network Interface Cards 1440 (NICs) to implement common offloads and accelerations to support GUE. 1441 Note that most of this discussion is generally applicable to other 1442 methods of UDP based encapsulation. 1444 A.1. Receive multi-queue 1446 Contemporary NICs support multiple receive descriptor queues (multi- 1447 queue). Multi-queue enables load balancing of network processing for 1448 a NIC across multiple CPUs. On packet reception, a NIC selects the 1449 appropriate queue for host processing. Receive Side Scaling (RSS) is a 1450 common method that uses the flow hash of a packet to index an 1451 indirection table in which each entry stores a queue number.
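As a rough illustration of the indirection table lookup just described, the following Python sketch maps an outer UDP/IP five-tuple to a receive queue. The table size, the queue count, and the use of CRC32 in place of the hash a real NIC would compute are all assumptions made for illustration; they are not taken from this document.

```python
import struct
import zlib

# Assumed values: 4 receive queues and a 128-entry indirection table.
NUM_QUEUES = 4
INDIRECTION_TABLE = [i % NUM_QUEUES for i in range(128)]


def flow_hash(src_ip, dst_ip, src_port, dst_port, proto):
    """Hash the outer five-tuple (IPv4 addresses given as 32-bit integers).

    CRC32 is only a stand-in for whatever hash the NIC implements.
    """
    key = struct.pack("!IIHHB", src_ip, dst_ip, src_port, dst_port, proto)
    return zlib.crc32(key)


def select_queue(src_ip, dst_ip, src_port, dst_port, proto=17):
    """Use the flow hash to index the indirection table; return a queue number."""
    h = flow_hash(src_ip, dst_ip, src_port, dst_port, proto)
    return INDIRECTION_TABLE[h % len(INDIRECTION_TABLE)]
```

Because the UDP source port is an input to the hash, two encapsulated flows carried between the same pair of tunnel endpoints can still be steered to different queues.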
Flow 1452 Director and Accelerated Receive Flow Steering (aRFS) allow a host to 1453 program the queue that is used for a given flow, which is identified 1454 either by an explicit five-tuple or by the flow's hash. 1456 GUE encapsulation is compatible with multi-queue NICs that support 1457 five-tuple hash calculation for UDP/IP packets as input to RSS. The 1458 flow entropy in the UDP source port ensures classification of the 1459 encapsulated flow even in the case that the outer source and 1460 destination addresses are the same for all flows (e.g., all flows are 1461 going over a single tunnel). 1463 By default, UDP RSS support is often disabled in NICs to avoid out- 1464 of-order reception that can occur when UDP packets are fragmented. As 1465 discussed above, fragmentation of GUE packets is mostly avoided by 1466 fragmenting packets before entering a tunnel, GUE fragmentation, path 1467 MTU discovery in higher layer protocols, or operators adjusting MTUs. 1468 Other UDP traffic might not implement such procedures to avoid 1469 fragmentation, so whether to enable UDP RSS support in the NIC is a 1470 tradeoff to be considered during configuration. 1472 A.2. Checksum offload 1474 Many NICs provide capabilities to calculate the standard ones' complement 1475 payload checksum for packets on transmit or receive. When using GUE 1476 encapsulation, there are at least two checksums that are of interest: 1477 the encapsulated packet's transport checksum, and the UDP checksum in 1478 the outer header. 1480 A.2.1. Transmit checksum offload 1482 NICs can provide a protocol agnostic method to offload transmit 1483 checksum (NETIF_F_HW_CSUM in Linux parlance) that can be used with 1484 GUE. In this method, the host provides checksum related parameters in 1485 a transmit descriptor for a packet. These parameters include the 1486 starting offset of data to checksum, the length of data to checksum, 1487 and the offset in the packet where the computed checksum is to be 1488 written.
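The descriptor parameters listed here can be sketched in Python as a software emulation of what the device does. The names csum_start, csum_len, and csum_pos are hypothetical labels for the three parameters, and the sketch assumes the host has already seeded the checksum field with the pseudo header checksum before handing the packet to the device.

```python
def ones_complement_sum(data, initial=0):
    """Return the 16-bit ones' complement sum of data, folded to 16 bits."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def offload_tx_checksum(packet, csum_start, csum_len, csum_pos):
    """Emulate protocol agnostic transmit checksum offload.

    csum_start: offset of the first byte to checksum (hypothetical name)
    csum_len:   number of bytes to checksum (hypothetical name)
    csum_pos:   packet offset where the result is written (hypothetical name)
    The checksum field is assumed to already hold the pseudo header sum,
    so it is included in the sum like any other data.
    """
    buf = bytearray(packet)
    covered = bytes(buf[csum_start:csum_start + csum_len])
    csum = ~ones_complement_sum(covered) & 0xFFFF
    buf[csum_pos:csum_pos + 2] = csum.to_bytes(2, "big")
    return bytes(buf)
```

The device never needs to know which protocol it is checksumming, which is why the same mechanism can serve an inner TCP checksum that sits behind outer IP, UDP, and GUE headers.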
The host initializes the checksum field to the pseudo header 1489 checksum. 1491 In the case of GUE, the checksum for an encapsulated transport layer 1492 packet, a TCP packet for instance, can be offloaded by setting the 1493 appropriate checksum parameters. 1495 NICs typically can offload only one transmit checksum per packet, so 1496 simultaneously offloading both an inner transport packet's checksum 1497 and the outer UDP checksum is likely not possible. 1499 If an encapsulator is co-resident with a host, then checksum offload 1500 may be performed using remote checksum offload (described in 1501 [GUEEXTENS]). Remote checksum offload relies on NIC offload of the 1502 simple UDP/IP checksum, which is commonly supported even in legacy 1503 devices. In remote checksum offload, the outer UDP checksum is set 1504 and the GUE header includes an option indicating the start and offset 1505 of the inner "offloaded" checksum. The inner checksum is initialized 1506 to the pseudo header checksum. When a decapsulator receives a GUE 1507 packet with the remote checksum offload option, it completes the 1508 offload operation by determining the packet checksum from the 1509 indicated start point to the end of the packet, and then adds this 1510 into the checksum field at the offset given in the option. Computing 1511 the checksum from the start to end of packet is efficient if 1512 checksum-complete is provided on the receiver. 1514 Another alternative when an encapsulator is co-resident with a host 1515 is to perform Local Checksum Offload [LCO]. In this method, the inner 1516 transport layer checksum is offloaded and the outer UDP checksum can 1517 be deduced based on the fact that the portion of the packet covered 1518 by the inner transport checksum will sum to zero (or at least the 1519 bitwise "not" of the inner pseudo header checksum). 1521 A.2.2.
Receive checksum offload 1523 GUE is compatible with NICs that perform a protocol agnostic receive 1524 checksum (CHECKSUM_COMPLETE in Linux parlance). In this technique, a 1525 NIC computes a ones' complement checksum over all (or some predefined 1526 portion) of a packet. The computed value is provided to the host 1527 stack in the packet's receive descriptor. The host driver can use 1528 this checksum to "patch up" and validate any inner packet transport 1529 checksum, as well as the outer UDP checksum if it is non-zero. 1531 Many legacy NICs don't provide checksum-complete but instead provide 1532 an indication that a checksum has been verified (CHECKSUM_UNNECESSARY 1533 in Linux). Usually, such validation is only done for simple TCP/IP or 1534 UDP/IP packets. If a NIC indicates that a UDP checksum is valid, the 1535 checksum-complete value for the UDP packet is the "not" of the pseudo 1536 header checksum. In this way, checksum-unnecessary can be converted 1537 to checksum-complete. So, if the NIC provides checksum-unnecessary 1538 for the outer UDP header in an encapsulation, checksum conversion can 1539 be done so that the checksum-complete value is derived and can be 1540 used by the stack to validate checksums in the encapsulated packet. 1542 A.3. Transmit Segmentation Offload 1544 Transmit Segmentation Offload (TSO) is a NIC feature where a host 1545 provides a large (>MTU size) TCP packet to the NIC, which in turn 1546 splits the packet into separate segments and transmits each one. This 1547 is useful to reduce CPU load on the host. 1549 The process of TSO can be generalized as: 1551 - Split the TCP payload into segments such that each resulting packet has a 1552 size less than or equal to the MTU. 1554 - For each created segment: 1556 1. Replicate the TCP header and all preceding headers of the 1557 original packet. 1559 2. Set payload length fields in any headers to reflect the 1560 length of the segment. 1562 3.
Set the TCP sequence number to correctly reflect the offset of 1563 the TCP data in the stream. 1565 4. Recompute and set any checksums that either cover the payload 1566 of the packet or cover a header that was changed by setting a 1567 payload length. 1569 Following this general process, TSO can be extended to support TCP 1570 encapsulation in GUE. For each segment, the Ethernet, outer IP, UDP, 1571 GUE, inner IP (if tunneling), and TCP headers 1572 are replicated. Any packet length header fields need to be set 1573 properly (including the length in the outer UDP header), and 1574 checksums need to be set correctly (including the outer UDP checksum 1575 if being used). 1577 To facilitate TSO with GUE, it is recommended that extension fields 1578 do not contain values that need to be updated on a per segment basis. 1579 For example, extension fields should not include checksums, lengths, 1580 or sequence numbers that refer to the payload. If the GUE header does 1581 not contain such fields, then the TSO engine only needs to copy the 1582 bits in the GUE header when creating each segment and does not need 1583 to parse the GUE header. 1585 A.4. Large Receive Offload 1587 Large Receive Offload (LRO) is a NIC feature where packets of a TCP 1588 connection are reassembled, or coalesced, in the NIC and delivered to 1589 the host as one large packet. This feature can reduce CPU utilization 1590 in the host. 1592 LRO requires significant protocol awareness to be implemented 1593 correctly and is difficult to generalize. Packets in the same flow 1594 need to be unambiguously identified. In the presence of tunnels or 1595 network virtualization, this may require more than a five-tuple match 1596 (for instance, packets for flows in two different virtual networks may 1597 have identical five-tuples).
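To illustrate why a five-tuple match alone can be ambiguous, the following Python sketch keys a coalescing table on the outer five-tuple, the GUE header bytes, and the inner five-tuple together. The FiveTuple type, the LroTable class, and the treatment of the GUE header as an opaque byte string are assumptions made for illustration; the sketch also omits the sequence number and checksum validation that a real LRO engine would need.

```python
from collections import namedtuple

# Hypothetical representation of a five-tuple; not defined by this document.
FiveTuple = namedtuple("FiveTuple", "src dst sport dport proto")


def lro_flow_key(outer, gue_header, inner):
    """Build a flow key from the outer tuple, raw GUE header, and inner tuple."""
    return (outer, bytes(gue_header), inner)


class LroTable:
    """Coalesce payloads only for packets whose full flow keys match."""

    def __init__(self):
        self.flows = {}

    def add(self, outer, gue_header, inner, payload):
        key = lro_flow_key(outer, gue_header, inner)
        self.flows.setdefault(key, bytearray()).extend(payload)
        return key
```

With this key, two packets that carry the same inner five-tuple but different GUE header contents (for example, different virtual network identifiers) land in separate entries and are never merged.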
Additionally, a NIC needs to perform 1598 validation over packets that are being coalesced, and needs to 1599 fabricate a single meaningful header from all the coalesced packets. 1601 The conservative approach to supporting LRO for GUE would be to 1602 assign packets to the same flow only if they have identical five- 1603 tuples and were encapsulated the same way. That is, the outer IP 1604 addresses, the outer UDP ports, GUE protocol, GUE flags and fields, 1605 and inner five-tuple are all identical. 1607 Appendix B: Implementation considerations 1609 This appendix is informational and does not constitute a normative 1610 part of this document. 1612 B.1. Privileged ports 1614 Using the source port to contain a flow entropy value disallows the 1615 security method of a receiver enforcing that the source port be a 1616 privileged port. Privileged ports are defined by some operating 1617 systems to restrict source port binding. Unix, for instance, 1618 considers port numbers less than 1024 to be privileged. 1620 Enforcing that packets are sent from a privileged port is widely 1621 considered an inadequate security mechanism and has been mostly 1622 deprecated. To approximate this behavior, an implementation could 1623 restrict a user from sending a packet destined to the GUE port 1624 without proper credentials. 1626 B.2. Setting flow entropy as a route selector 1628 An encapsulator generating flow entropy in the UDP source port could 1629 modulate the value to perform a type of multipath source routing. 1630 Assuming that networking switches perform ECMP based on the flow 1631 hash, a sender can affect the path by altering the flow entropy. For 1632 instance, a host can store a flow hash in its PCB for an inner flow, 1633 and might alter the value upon detecting that packets are traversing 1634 a lossy path. Changing the flow entropy for a flow SHOULD be subject 1635 to hysteresis (at most once every thirty seconds) to limit the number 1636 of out of order packets.
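The hysteresis rule above can be sketched in Python as per-flow state that refuses to change the entropy value more often than once every thirty seconds. The FlowEntropy class, the maybe_change method, and the choice to draw values from the 49152-65535 port range are assumptions made for illustration; how an implementation detects a lossy path is outside the scope of the sketch and is passed in as a flag.

```python
import random


class FlowEntropy:
    """Per-flow entropy value, as might be kept in a PCB (hypothetical class)."""

    MIN_INTERVAL = 30.0  # seconds between changes, per the text above

    def __init__(self, now):
        self.value = self._new_value()
        self.last_change = now

    @staticmethod
    def _new_value():
        # Assumed range for a UDP source port carrying flow entropy.
        return random.randint(49152, 65535)

    def maybe_change(self, now, path_is_lossy):
        """Pick a new value if the path is lossy and the interval has elapsed.

        Returns True if the entropy value was changed.
        """
        if not path_is_lossy or now - self.last_change < self.MIN_INTERVAL:
            return False
        old = self.value
        while self.value == old:
            self.value = self._new_value()
        self.last_change = now
        return True
```

Time is passed in explicitly so that the caller can use whatever clock it already has; a monotonic clock is the natural choice so that wall clock adjustments do not defeat the interval check.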
1638 B.3. Hardware protocol implementation considerations 1640 Low level data path protocols, such as GUE, are often supported in 1641 high speed network device hardware. Variable length header (VLH) 1642 protocols like GUE are often considered difficult to efficiently 1643 implement in hardware. In order to retain the important 1644 characteristics of an extensible and robust protocol, hardware 1645 vendors may practice "constrained flexibility". In this model, only 1646 certain combinations or protocol header parameterizations are 1647 implemented in the hardware fast path. Each such parameterization is 1648 fixed length so that the particular instance can be optimized as a 1649 fixed length protocol. In the case of GUE, this constitutes specific 1650 combinations of GUE flags, fields, and next protocol. The selected 1651 combinations would naturally be the most common cases, which form the 1652 "fast path", and other combinations are assumed to take the "slow 1653 path". 1655 In time, needs and requirements of the protocol may change, which may 1656 manifest themselves as new parameterizations to be supported in the 1657 fast path. To allow this extensibility, a device practicing 1658 constrained flexibility should allow the fast path parameterizations 1659 to be programmable. 1661 Authors' Addresses 1663 Tom Herbert 1664 Quantonium 1665 4701 Patrick Henry 1666 Santa Clara, CA 95054 1667 US 1669 Email: tom@herbertland.com 1671 Lucy Yong 1672 Huawei USA 1673 5340 Legacy Dr. 1675 Plano, TX 75024 1676 US 1678 Email: lucy.yong@huawei.com 1680 Osama Zia 1681 Microsoft 1682 1 Microsoft Way 1683 Redmond, WA 98029 1684 US 1686 Email: osamaz@microsoft.com