idnits 2.17.1 draft-ietf-intarea-gue-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (December 30, 2017) is 2308 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC2119' is mentioned on line 230, but not defined == Missing Reference: 'RFC2992' is mentioned on line 299, but not defined == Missing Reference: 'GUEXTENS' is mentioned on line 1249, but not defined == Missing Reference: 'RFC2460' is mentioned on line 547, but not defined ** Obsolete undefined reference: RFC 2460 (Obsoleted by RFC 8200) == Missing Reference: 'RFC5245' is mentioned on line 828, but not defined ** Obsolete undefined reference: RFC 5245 (Obsoleted by RFC 8445, RFC 8839) == Missing Reference: 'RFC768' is mentioned on line 857, but not defined == Missing Reference: 'RFC6335' is mentioned on line 1019, but not defined == Missing Reference: 'RFC2473' is mentioned on line 1117, but not defined == Missing Reference: 'RFC5226' is mentioned on line 1247, but not defined ** Obsolete undefined reference: RFC 5226 (Obsoleted by RFC 
8126) == Unused Reference: 'RFC2434' is defined on line 1270, but no explicit reference was found in the text == Unused Reference: 'RFC3828' is defined on line 1299, but no explicit reference was found in the text == Unused Reference: 'RFC7605' is defined on line 1311, but no explicit reference was found in the text == Unused Reference: 'RFC4340' is defined on line 1329, but no explicit reference was found in the text == Unused Reference: 'RFC5285' is defined on line 1344, but no explicit reference was found in the text == Unused Reference: 'GUE4NVO3' is defined on line 1401, but no explicit reference was found in the text == Unused Reference: 'GUESEC' is defined on line 1405, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2434 (Obsoleted by RFC 5226) ** Downref: Normative reference to an Informational RFC: RFC 2983 ** Downref: Normative reference to an Informational RFC: RFC 4459 -- Obsolete informational reference (is this intentional?): RFC 5389 (Obsoleted by RFC 8489) -- Obsolete informational reference (is this intentional?): RFC 5245 (ref. 'RFC5285') (Obsoleted by RFC 8445, RFC 8839) -- Obsolete informational reference (is this intentional?): RFC 5405 (Obsoleted by RFC 8085) -- Obsolete informational reference (is this intentional?): RFC 6830 (Obsoleted by RFC 9300, RFC 9301) == Outdated reference: A later version (-01) exists of draft-herbert-gue-extensions-00 == Outdated reference: A later version (-04) exists of draft-hy-nvo3-gue-4-nvo-03 == Outdated reference: A later version (-01) exists of draft-herbert-transports-over-udp-00 == Outdated reference: A later version (-16) exists of draft-ietf-nvo3-geneve-05 Summary: 6 errors (**), 0 flaws (~~), 21 warnings (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Area WG T. 
Herbert 3 Internet-Draft Quantonium 4 Intended status: Standards Track L. Yong 5 Expires June 30, 2018 Huawei USA 6 O. Zia 7 Microsoft 8 December 30, 2017 10 Generic UDP Encapsulation 11 draft-ietf-intarea-gue-05 13 Status of this Memo 15 This Internet-Draft is submitted in full conformance with the 16 provisions of BCP 78 and BCP 79. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that 20 other groups may also distribute working documents as 21 Internet-Drafts. 23 Internet-Drafts are draft documents valid for a maximum of six months 24 and may be updated, replaced, or obsoleted by other documents at any 25 time. It is inappropriate to use Internet-Drafts as reference 26 material or to cite them other than as "work in progress." 28 The list of current Internet-Drafts can be accessed at 29 http://www.ietf.org/ietf/1id-abstracts.txt 31 The list of Internet-Draft Shadow Directories can be accessed at 32 http://www.ietf.org/shadow.html 34 This Internet-Draft will expire on June 30, 2018. 36 Copyright Notice 38 Copyright (c) 2017 IETF Trust and the persons identified as the 39 document authors. All rights reserved. 41 This document is subject to BCP 78 and the IETF Trust's Legal 42 Provisions Relating to IETF Documents 43 (http://trustee.ietf.org/license-info) in effect on the date of 44 publication of this document. Please review these documents 45 carefully, as they describe your rights and restrictions with respect 46 to this document. 
Code Components extracted from this document must 54 include Simplified BSD License text as described in Section 4.e of 55 the Trust Legal Provisions and are provided without warranty as 56 described in the Simplified BSD License. 58 Abstract 60 This specification describes Generic UDP Encapsulation (GUE), which 61 is a scheme for using UDP to encapsulate packets of different IP 62 protocols for transport across layer 3 networks. By encapsulating 63 packets in UDP, specialized capabilities in networking hardware for 64 efficient handling of UDP packets can be leveraged. GUE specifies 65 basic encapsulation methods upon which higher level constructs, such 66 as tunnels and overlay networks for network virtualization, can be 67 constructed. GUE is extensible by allowing optional data fields as 68 part of the encapsulation, and is generic in that it can encapsulate 69 packets of various IP protocols. 71 Table of Contents 73 1. Introduction 74 1.1. Terminology and acronyms 75 1.2. Requirements Language 76 2. Base packet format 77 2.1. GUE variant 78 3. Variant 0 79 3.1. Header format 80 3.2. Proto/ctype field 81 3.2.1 Proto field 82 3.2.2 Ctype field 83 3.3. Flags and extension fields 84 3.3.1. Requirements 85 3.3.2. Example GUE header with extension fields 86 3.4. Private data 87 3.5. Message types 88 3.5.1. Control messages 89 3.5.2. Data messages 90 3.6. Hiding the transport layer protocol number 91 4. Variant 1 92 4.1. Direct encapsulation of IPv4 93 4.2. Direct encapsulation of IPv6 94 5. Operation 95 5.1. Network tunnel encapsulation 96 5.2. Transport layer encapsulation 97 5.3. Encapsulator operation 98 5.4. Decapsulator operation 99 5.4.1. Processing a received data message 100 5.4.2. Processing a received control message 101 5.5. Router and switch operation 102 5.6. Middlebox interactions 103 5.6.1. Inferring connection semantics 104 5.6.2. NAT 105 5.7. Checksum Handling 106 5.7.1. Requirements 107 5.7.2. UDP Checksum with IPv4 108 5.7.3. UDP Checksum with IPv6 109 5.8. MTU and fragmentation 110 5.9. Congestion control 111 5.10. Multicast 112 5.11. Flow entropy for ECMP 113 5.11.1. Flow classification 114 5.11.2. Flow entropy properties 115 5.12 Negotiation of acceptable flags and extension fields 116 6. Motivation for GUE 117 6.1. Benefits of GUE 118 6.2 Comparison of GUE to other encapsulations 119 7. Security Considerations 120 8. IANA Considerations 121 8.1. UDP source port 122 8.2. GUE variant number 123 8.3. Control types 124 8.4. Flag-fields 125 9. Acknowledgements 126 10. References 127 10.1. Normative References 128 10.2. Informative References 129 Appendix A: NIC processing for GUE 130 A.1. Receive multi-queue 131 A.2. Checksum offload 132 A.2.1. Transmit checksum offload 133 A.2.2. Receive checksum offload 134 A.3. Transmit Segmentation Offload 135 A.4. Large Receive Offload 136 Appendix B: Implementation considerations 137 B.1. Privileged ports 138 B.2. Setting flow entropy as a route selector 139 B.3. Hardware protocol implementation considerations 140 Authors' Addresses 142 1. Introduction 144 This specification describes Generic UDP Encapsulation (GUE), which is 145 a general method for encapsulating packets of arbitrary IP protocols 146 within User Datagram Protocol (UDP) [RFC0768] packets. 
Encapsulating 147 packets in UDP facilitates efficient transport across networks. 148 Networking devices widely provide protocol-specific processing and 149 optimizations for UDP (as well as TCP) packets. Packets for atypical 150 IP protocols (those not usually parsed by networking hardware) can be 151 encapsulated in UDP packets to maximize deliverability and to 152 leverage flow-specific mechanisms for routing and packet steering. 154 GUE provides an extensible header format for including optional data 155 in the encapsulation header. This data potentially covers items such 156 as the virtual networking identifier, security data for validating or 157 authenticating the GUE header, congestion control data, etc. GUE also 158 allows private optional data in the encapsulation header. This 159 feature can be used by a site or implementation to define local 160 custom optional data, and allows experimentation with options that may 161 eventually become standard. 163 This document does not define any specific GUE extensions. [GUEEXTEN] 164 specifies a set of core extensions. 166 The motivation for the GUE protocol is described in section 6. 168 1.1. 
Terminology and acronyms 170 GUE: Generic UDP Encapsulation 172 GUE Header: A variable-length protocol header that is composed 173 of a primary four-byte header and zero or more 174 four-byte words for optional header data 176 GUE packet: A UDP/IP packet that contains a GUE header and GUE 177 payload within the UDP payload 179 GUE variant: A version of the GUE protocol or an alternate form 180 of a version 182 Encapsulator: A network node that encapsulates packets in GUE 184 Decapsulator: A network node that decapsulates and processes 185 packets encapsulated in GUE 187 Data message: An encapsulated packet in the GUE payload that is 188 addressed to the protocol stack for an associated 189 protocol 191 Control message: A formatted message in the GUE payload that is 192 implicitly addressed to the decapsulator to monitor 193 or control the state or behavior of a tunnel 195 Flags: A set of bit flags in the primary GUE header 197 Extension field: 198 An optional field in a GUE header whose presence is 199 indicated by corresponding flag(s) 201 C-bit: A single-bit flag in the primary GUE header that 202 indicates whether the GUE packet contains a control 203 message or data message 205 Hlen: A field in the primary GUE header that gives the 206 length of the GUE header 208 Proto/ctype: A field in the GUE header that holds either the IP 209 protocol number for a data message or a type for a 210 control message 212 Private data: Optional data in the GUE header that can be used for 213 private purposes 215 Outer IP header: Refers to the outermost IP header or packet when 216 encapsulating a packet over IP 218 Inner IP header: Refers to an encapsulated IP header when an IP 219 packet is encapsulated 221 Outer packet: Refers to an encapsulating packet 223 Inner packet: Refers to a packet that is encapsulated 225 1.2. 
Requirements Language 227 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 228 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 229 document are to be interpreted as described in [RFC2119]. 231 2. Base packet format 233 A GUE packet is comprised of a UDP packet whose payload is a GUE 234 header followed by a payload which is either an encapsulated packet 235 of some IP protocol or a control message such as an OAM (Operations, 236 Administration, and Management) message. A GUE packet has the general 237 format: 239 +-------------------------------+ 240 | | 241 | UDP/IP header | 242 | | 243 |-------------------------------| 244 | | 245 | GUE Header | 246 | | 247 |-------------------------------| 248 | | 249 | Encapsulated packet | 250 | or control message | 251 | | 252 +-------------------------------+ 254 The GUE header is variable length as determined by the presence of 255 optional extension fields. 257 2.1. GUE variant 259 The first two bits of the GUE header contain the GUE protocol variant 260 number. The variant number can indicate the version of the GUE 261 protocol as well as alternate forms of a version. 263 Variants 0 and 1 are described in this specification; variants 2 and 264 3 are reserved. 266 3. Variant 0 268 Variant 0 indicates version 0 of GUE. This variant defines a generic 269 extensible format to encapsulate packets by Internet protocol number. 271 3.1. 
Header format 273 The header format for variant 0 of GUE in UDP is: 275 0 1 2 3 276 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 277 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\ 278 | Source port | Destination port | | 279 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP 280 | Length | Checksum | | 281 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/ 282 | 0 |C| Hlen | Proto/ctype | Flags | 283 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 284 | | 285 ~ Extension Fields (optional) ~ 286 | | 287 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 288 | | 289 ~ Private data (optional) ~ 290 | | 291 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 293 The contents of the UDP header are: 295 o Source port: If connection semantics (section 5.6.1) are applied 296 to an encapsulation, this is set to the local source port for 297 the connection. When connection semantics are not applied, this 298 is set to a flow entropy value for use with ECMP (Equal-Cost 299 Multi-Path [RFC2992]); the properties of flow entropy are 300 described in section 5.11. 302 o Destination port: If connection semantics (section 5.6.1) are 303 applied to an encapsulation, this is set to the destination port 304 for the tuple. If connection semantics are not applied, this is 305 set to the GUE assigned port number, 6080. 307 o Length: Canonical length of the UDP packet (length of UDP header 308 and payload). 310 o Checksum: Standard UDP checksum (handling is described in 311 section 5.7). 313 The GUE header consists of: 315 o Variant: 0 indicates GUE protocol version 0 with a header. 317 o C: C-bit: When set indicates a control message, not set 318 indicates a data message. 320 o Hlen: Length in 32-bit words of the GUE header, including 321 optional extension fields but not the first four bytes of the 322 header. 
Computed as (header_len - 4) / 4, where header_len is 323 the total header length in bytes. All GUE headers are a multiple 324 of four bytes in length. The maximum header length is 128 bytes. 326 o Proto/ctype: When the C-bit is set, this field contains a 327 control message type for the payload (section 3.2.2). When the 328 C-bit is not set, the field holds the Internet protocol number 329 for the encapsulated packet in the payload (section 3.2.1). The 330 control message or encapsulated packet begins at the offset 331 provided by Hlen. 333 o Flags: Header flags that may be allocated for various purposes 334 and may indicate presence of extension fields. Undefined header 335 flag bits MUST be set to zero on transmission. 337 o Extension Fields: Optional fields whose presence is indicated by 338 corresponding flags. 340 o Private data: Optional private data block (see section 3.4). If 341 the private block is present, it immediately follows the last 342 extension field present in the header. The private block is 343 considered to be part of the GUE header. The length of this data 344 is determined by subtracting the starting offset from the header 345 length. 347 3.2. Proto/ctype field 349 The proto/ctype field contains either an Internet protocol number 350 (when the C-bit is not set) or a GUE control message type (when the 351 C-bit is set). 353 3.2.1 Proto field 355 When the C-bit is not set, the proto/ctype field MUST contain an IANA 356 Internet Protocol Number. The protocol number is interpreted relative 357 to the IP protocol that encapsulates the UDP packet (i.e. protocol of 358 the outer IP header). The protocol number serves as an indication of 359 the type of the next protocol header, which is contained in the GUE 360 payload at the offset indicated in Hlen. Intermediate devices MAY 361 parse the GUE payload per the number in the proto/ctype field, and 362 header flags cannot affect the interpretation of the proto/ctype 363 field. 
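As a rough illustration of the field layout in section 3.1, the sketch below unpacks the primary four-byte header of a variant 0 packet and locates the payload from Hlen. It assumes a 2-bit variant, a 1-bit C flag, a 5-bit Hlen (inferred here from the 128-byte maximum header length), an 8-bit proto/ctype and sixteen flag bits. The function name parse_gue_v0 and the dictionary keys are hypothetical and are not defined by this specification.

```python
import struct


def parse_gue_v0(udp_payload: bytes) -> dict:
    """Split a variant 0 GUE packet into header fields and payload.

    Hypothetical helper; field widths are assumptions taken from the
    header diagram (2-bit variant, C-bit, 5-bit Hlen, 8-bit proto/ctype,
    16 flag bits).
    """
    if len(udp_payload) < 4:
        raise ValueError("shorter than the primary GUE header")
    first, proto_ctype, flags = struct.unpack("!BBH", udp_payload[:4])
    variant = first >> 6                 # first two bits of the header
    if variant != 0:
        raise ValueError(f"not a variant 0 packet (variant {variant})")
    hlen = first & 0x1F                  # 32-bit words after the first four bytes
    header_len = 4 + hlen * 4            # inverse of (header_len - 4) / 4
    if len(udp_payload) < header_len:
        raise ValueError("packet truncated inside the GUE header")
    return {
        "control": bool(first & 0x20),   # C-bit
        "hlen": hlen,
        "header_len": header_len,
        "proto_ctype": proto_ctype,
        "flags": flags,
        "optional": udp_payload[4:header_len],  # extension fields + private data
        "payload": udp_payload[header_len:],
    }
```

Under these assumptions the largest Hlen value, 31, yields a header of 4 + 31 * 4 = 128 bytes, which agrees with the stated maximum.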
365 When the outer IP protocol is IPv4, the proto field MUST be set to a 366 valid IP protocol number usable with IPv4; it MUST NOT be set to a 367 number for IPv6 extension headers or ICMPv6 (number 58). An 368 exception is that the destination options extension header using the 369 PadN option MAY be used with IPv4 as described in section 3.6. The 370 "no next header" protocol number (59) also MAY be used with IPv4 as 371 described below. 373 When the outer IP protocol is IPv6, the proto field can be set to any 374 defined protocol number except that it MUST NOT be set to Hop-by-hop 375 options (number 0). If a received GUE packet in IPv6 contains a 376 protocol number that is an extension header (e.g. Destination 377 Options), then the extension header is processed after the GUE header 378 is processed, as though the GUE header were an extension header. 380 IP protocol number 59 ("No next header") can be set to indicate that 381 the GUE payload does not begin with the header of an IP protocol. 382 This would be the case, for instance, if the GUE payload were a 383 fragment when performing GUE-level fragmentation. The interpretation 384 of the payload is performed through other means (such as flags and 385 extension fields), and intermediate devices MUST NOT parse packets 386 based on the IP protocol number in this case. 388 3.2.2 Ctype field 390 When the C-bit is set, the proto/ctype field MUST be set to a valid 391 control message type. A value of zero indicates that the GUE payload 392 requires further interpretation to deduce the control type. This 393 might be the case when the payload is a fragment of a control 394 message, where only the reassembled packet can be interpreted as a 395 control message. 397 Control messages will be defined in an IANA registry. Control message 398 types 1 through 127 may be defined in standards. Types 128 through 399 255 are reserved to be user-defined for experimentation or private 400 control messages. 
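The proto and ctype rules above might be checked along the following lines. This is only a sketch: the names proto_allowed and ctype_class are hypothetical, and the set of IPv6-only protocol numbers is a small illustrative subset (Hop-by-hop 0, Routing 43, Fragment 44, ICMPv6 58) rather than a complete list, which a real implementation would take from the IANA registry. The check looks only at the number, so it cannot confirm that a destination options header (60) over IPv4 really has the PadN form required by section 3.6.

```python
HOP_BY_HOP = 0
NO_NEXT_HEADER = 59
DEST_OPTS = 60

# Assumption: illustrative subset of numbers that are not usable with IPv4.
IPV6_ONLY = frozenset({HOP_BY_HOP, 43, 44, 58})


def proto_allowed(outer_ip_version: int, proto: int) -> bool:
    """Check a data message proto value against the outer IP version."""
    if not 0 <= proto <= 255:
        return False
    if outer_ip_version == 4:
        # 59 and 60 are the explicit exceptions and are not in IPV6_ONLY.
        return proto not in IPV6_ONLY
    if outer_ip_version == 6:
        return proto != HOP_BY_HOP
    raise ValueError("outer IP version must be 4 or 6")


def ctype_class(ctype: int) -> str:
    """Map a control message type to the range it falls in."""
    if not 0 <= ctype <= 255:
        raise ValueError("ctype is an 8-bit value")
    if ctype == 0:
        return "needs-interpretation"   # e.g. a fragment of a control message
    if ctype <= 127:
        return "standard"
    return "private"                    # experimentation or private use
```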
402 This document does not specify any standard control message types 403 other than type 0. 405 3.3. Flags and extension fields 407 Flags and associated extension fields are the primary mechanism of 408 extensibility in GUE. As mentioned in section 3.1, GUE header flags 409 indicate the presence of optional extension fields in the GUE header. 410 [GUEEXTEN] defines a basic set of GUE extensions. 412 3.3.1. Requirements 414 There are sixteen flag bits in the GUE header. Flags may indicate 415 presence of an extension field. The size of an extension field 416 indicated by a flag MUST be fixed. 418 Flags can be paired together to allow different lengths for an 419 extension field. For example, if two flag bits are paired, a field 420 can have three different lengths: a bit value of 00 421 indicates no field present; 01, 10, and 11 indicate three possible 422 lengths for the field. Regardless of how flag bits are paired, the 423 lengths and offsets of optional fields corresponding to a set of 424 flags MUST be well defined. 426 Extension fields are placed in order of the flags. New flags are to 427 be allocated from high to low order bit contiguously without holes. 428 Flags allow random access: for instance, to inspect the field 429 corresponding to the Nth flag bit, an implementation only considers 430 the previous N-1 flags to determine the offset. Flags after the Nth 431 flag are not pertinent in calculating the offset of the Nth flag. 432 Random access of flags and fields permits processing of optional 433 extensions in an order that is independent of their position in the 434 packet. The processing order of extensions defined in [GUEEXTEN] 435 demonstrates this property. 437 Flags (or paired flags) are idempotent such that new flags MUST NOT 438 cause reinterpretation of old flags. 
Also, new flags MUST NOT alter 439 interpretation of other elements in the GUE header nor how the 440 message is parsed (for instance, in a data message the proto/ctype 441 field always holds an IP protocol number as an invariant). 443 The set of available flags can be extended in the future by defining 444 a "flag extensions bit" that refers to a field containing a new set 445 of flags. 447 3.3.2. Example GUE header with extension fields 449 An example GUE header for a data message encapsulating an IPv4 packet 450 and containing the Group Identifier and Security extension fields 451 (both defined in [GUEEXTEN]) is shown below: 453 0 1 2 3 454 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 455 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 456 | 0 |0| 3 | 94 |1|0 0 1| 0 | 457 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 458 | Group Identifier | 459 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 460 | | 461 + Security + 462 | | 463 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 464 In the above example, the first flag bit is set, which indicates that 465 the Group Identifier extension, a 32-bit field, is present. 466 The second through fourth bits of the flags are paired flags that 467 indicate the presence of a Security field with seven possible sizes. 468 In this example 001 indicates a 64-bit security field. Hlen is 3 because the two extension fields together occupy twelve bytes. 470 3.4. Private data 472 An implementation MAY use private data for its own purposes. The private 473 data immediately follows the last field in the GUE header and is not 474 a fixed length. This data is considered part of the GUE header and 475 MUST be accounted for in header length (Hlen). The length of the 476 private data MUST be a multiple of four bytes and is determined by 477 subtracting the offset of private data in the GUE header from the 478 header length. 
Specifically: 480 Private_length = (Hlen * 4) - Length(flags) 482 where "Length(flags)" returns the sum of lengths of all the extension 483 fields present in the GUE header. When there is no private data 484 present, the length of the private data is zero. 486 The semantics and interpretation of private data are implementation 487 specific. The private data may be structured as necessary, for 488 instance it might contain its own set of flags and extension fields. 490 An encapsulator and decapsulator MUST agree on the meaning of private 491 data before using it. The mechanism to achieve this agreement is 492 outside the scope of this document but could include implementation- 493 defined behavior, coordinated configuration, in-band communication 494 using GUE control messages, or out-of-band messages. 496 If a decapsulator receives a GUE packet with private data, it MUST 497 validate the private data appropriately. If a decapsulator does not 498 expect private data from an encapsulator, the packet MUST be dropped. 499 If a decapsulator cannot validate the contents of private data per 500 the provided semantics, the packet MUST also be dropped. An 501 implementation MAY place security data in GUE private data which if 502 present MUST be verified for packet acceptance. 504 3.5. Message types 506 3.5.1. Control messages 508 Control messages carry formatted data that are implicitly addressed 509 to the decapsulator to monitor or control the state or behavior of a 510 tunnel (OAM). For instance, an echo request and corresponding echo 511 reply message can be defined to test for liveness. 513 Control messages are indicated in the GUE header when the C-bit is 514 set. The payload is interpreted as a control message with type 515 specified in the proto/ctype field. The format and contents of the 516 control message are indicated by the type and can be variable length. 
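Returning briefly to section 3.4, the private data length formula and the drop rules given there could be sketched as follows. The names DropPacket and private_data_length are hypothetical. The extension_bytes argument stands in for Length(flags) and is supplied by the caller, because the flag definitions live outside this document. Treating a negative result as a reason to drop is an assumption of this sketch, not a stated requirement.

```python
class DropPacket(Exception):
    """Hypothetical signal that the packet is to be dropped."""


def private_data_length(hlen: int, extension_bytes: int,
                        private_expected: bool) -> int:
    """Apply Private_length = (Hlen * 4) - Length(flags).

    extension_bytes plays the role of Length(flags): the summed size of
    the extension fields indicated by the flags in this header.
    """
    private_len = hlen * 4 - extension_bytes
    if private_len < 0:
        # Assumption: flags claim more extension data than Hlen covers.
        raise DropPacket("extension fields overrun the GUE header")
    if private_len % 4:
        raise DropPacket("private data is not a multiple of four bytes")
    if private_len and not private_expected:
        # Section 3.4: unexpected private data means the packet is dropped.
        raise DropPacket("private data not expected from this encapsulator")
    return private_len
```

For the example header in section 3.3.2 (Hlen of 3, twelve bytes of extension fields) the result is zero, meaning no private data is present.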
518 Other than interpreting the proto/ctype field as a control message 519 type, the meaning and semantics of the rest of the elements in the 520 GUE header are the same as those of data messages. Forwarding and 521 routing of control messages should be the same as that of a data 522 message with the same outer IP and UDP header and GUE flags; this 523 ensures that control messages can be created that follow the same 524 path as data messages. 526 3.5.2. Data messages 528 Data messages carry encapsulated packets that are addressed to the 529 protocol stack for the associated protocol. Data messages are a 530 primary means of encapsulation and can be used to create tunnels for 531 overlay networks. 533 Data messages are indicated in the GUE header when the C-bit is not set. 534 The payload of a data message is interpreted as an encapsulated 535 packet of an Internet protocol indicated in the proto/ctype field. 536 The packet immediately follows the GUE header. 538 3.6. Hiding the transport layer protocol number 540 The GUE header indicates the Internet protocol of the encapsulated 541 packet. A protocol number is either contained in the Proto/ctype 542 field of the primary GUE header or in the Payload Type field of a GUE 543 Transform extension field (used to encrypt the payload with DTLS, 544 [GUEEXTEN]). If the transport protocol number needs to be hidden from 545 the network, then a trivial destination options extension header can be used. 547 The PadN destination option [RFC2460] can be used to encode the 548 transport protocol as a next header of an extension header (and 549 maintain alignment of encapsulated transport headers). The 550 Proto/ctype field or Payload Type field of the GUE Transform field is 551 set to 60 to indicate that the first encapsulated header is a 552 destination options extension header. 
554 The format of the extension header is below: 556 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 557 | Next Header | 2 | 1 | 0 | 558 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 560 For IPv4, it is permitted in GUE to use this precise destination 561 option to contain the obfuscated protocol number. In this case next 562 header MUST refer to a valid IP protocol for IPv4. No other extension 563 headers or destination options are permitted with IPv4. 565 4. Variant 1 567 Variant 1 of GUE allows direct encapsulation of IPv4 and IPv6 in UDP. 568 In this variant there is no GUE header; a UDP packet carries an IP 569 packet. The first two bits of the UDP payload for GUE are the GUE 570 variant and coincide with the first two bits of the version number in 571 the IP header. The first two version bits of IPv4 and IPv6 are 01, so 572 GUE variant 1 is used for direct IP encapsulation, which makes the two 573 GUE variant bits 01 as well. 575 This technique is effectively a means to compress out the version 0 576 GUE header when IPv4 or IPv6 packets are encapsulated and no 577 flags or extension fields are present. This method can be used 578 on the same port number as packets with the GUE header (GUE variant 0 579 packets). This technique saves encapsulation overhead on costly links 580 for the common use of IP encapsulation, and also obviates the need to 581 allocate a separate port number for IP-over-UDP encapsulation. 583 4.1. 
Direct encapsulation of IPv4 585 The format for encapsulating IPv4 directly in UDP is: 587 0 1 2 3 588 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 589 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\ 590 | Source port | Destination port | | 591 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP 592 | Length | Checksum | | 593 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/ 594 |0|1|0|0| IHL |Type of Service| Total Length | 595 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 596 | Identification |Flags| Fragment Offset | 597 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 598 | Time to Live | Protocol | Header Checksum | 599 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 600 | Source IPv4 Address | 601 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 602 | Destination IPv4 Address | 603 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 605 Note that the 0100 value in the first four bits of the UDP 606 payload expresses the GUE variant as 1 (bits 01) and IP version as 4 607 (bits 0100). 609 4.2. 
Direct encapsulation of IPv6

611    The format for encapsulating IPv6 directly in UDP is shown
612    below:

614     0                   1                   2                   3
615     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
616    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\
617    |        Source port            |      Destination port         | |
618    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP
619    |           Length              |          Checksum             | |
620    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/
621    |0|1|1|0| Traffic Class |              Flow Label               |
622    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
623    |         Payload Length        |     NextHdr   |   Hop Limit   |
624    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
625    |                                                               |
626    +                                                               +
627    |                                                               |
628    +                      Source IPv6 Address                      +
629    |                                                               |
630    +                                                               +
631    |                                                               |
632    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
633    |                                                               |
634    +                                                               +
635    |                                                               |
636    +                   Destination IPv6 Address                    +
637    |                                                               |
638    +                                                               +
639    |                                                               |
640    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

642    Note that the 0110 value in the first four bits of the UDP
643    payload expresses the GUE variant as 1 (bits 01) and IP version as 6
644    (bits 0110).

646 5. Operation

648    The figure below illustrates the use of GUE encapsulation between two
649    hosts. Host 1 is sending packets to Host 2. An encapsulator performs
650    encapsulation of packets from Host 1. These encapsulated packets
651    traverse the network as UDP packets. At the decapsulator, packets are
652    decapsulated and sent on to Host 2. Packet flow in the reverse
653    direction need not be symmetric; GUE encapsulation is not required in
654    the reverse path.
656    +---------------+                       +---------------+
657    |               |                       |               |
658    |    Host 1     |                       |    Host 2     |
659    |               |                       |               |
660    +---------------+                       +---------------+
661            |                                       ^
662            V                                       |
663    +---------------+   +---------------+   +---------------+
664    |               |   |               |   |               |
665    | Encapsulator  |-->|    Layer 3    |-->| Decapsulator  |
666    |               |   |    Network    |   |               |
667    +---------------+   +---------------+   +---------------+

669    The encapsulator and decapsulator may be co-resident with the
670    corresponding hosts, or may be on separate nodes in the network.

672 5.1. Network tunnel encapsulation

674    Network tunneling can be achieved by encapsulating layer 2 or layer 3
675    packets. In this case the encapsulator and decapsulator nodes are the
676    tunnel endpoints. These could be routers that provide network tunnels
677    on behalf of communicating hosts.

679 5.2. Transport layer encapsulation

681    When encapsulating layer 4 packets, the encapsulator and decapsulator
682    should be co-resident with the hosts. In this case, the encapsulation
683    headers are inserted between the IP header and the transport packet.
684    The addresses in the IP header refer to both the endpoints of the
685    encapsulation and the endpoints for terminating the transport
686    protocol. Note that the transport layer ports in the encapsulated
687    packet are independent of the UDP ports in the outer packet.

689    Details about performing transport layer encapsulation are discussed
690    in [TOU].

692 5.3. Encapsulator operation

694    Encapsulators create GUE data messages, set the fields of the UDP
695    header, set flags and optional extension fields in the GUE header,
696    and forward packets to a decapsulator.

698    An encapsulator can be an end host originating the packets of a flow,
699    or can be a network device performing encapsulation on behalf of
700    hosts (routers implementing tunnels for instance). In either case,
701    the intended target (decapsulator) is indicated by the outer
702    destination IP address and destination port in the UDP header.
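
As an illustration of the encapsulator's role for the simplest case, the following Python sketch builds the UDP header and payload for GUE variant 1 (direct IP encapsulation, section 4), where no GUE header is present. It is a minimal sketch under stated assumptions, not a reference implementation: the function names are hypothetical, the outer IP header is left to the sending stack, and the UDP checksum is set to zero, which is only acceptable under the conditions of section 5.7.

```python
import struct

GUE_PORT = 6080  # UDP port number assigned to GUE (section 8.1)


def encapsulate_variant1(inner_packet: bytes, src_port: int,
                         dst_port: int = GUE_PORT) -> bytes:
    """Return UDP header + inner IP packet (GUE variant 1, no GUE header).

    Hypothetical helper: the outer IP header is not built here.
    """
    if not inner_packet:
        raise ValueError("empty inner packet")
    version = inner_packet[0] >> 4          # IP version nibble
    if version not in (4, 6):
        raise ValueError("variant 1 carries only IPv4 or IPv6 packets")
    length = 8 + len(inner_packet)          # UDP length covers header + payload
    if length > 0xFFFF:
        raise ValueError("packet too large for a UDP datagram")
    checksum = 0                            # assumption: zero checksum (see 5.7)
    header = struct.pack("!HHHH", src_port, dst_port, length, checksum)
    return header + inner_packet


def gue_variant(udp_payload: bytes) -> int:
    """Return the GUE variant: the first two bits of the UDP payload."""
    return udp_payload[0] >> 6
```

Because IPv4 (0100) and IPv6 (0110) both begin with bits 01, `gue_variant` returns 1 for any payload produced by `encapsulate_variant1`, which is how a receiver can distinguish these packets from variant 0 packets on the same port.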
704    If an encapsulator is tunneling packets -- that is, encapsulating
705    packets of layer 2 or layer 3 protocols (e.g. EtherIP, IPIP, ESP
706    tunnel mode) -- it SHOULD follow standard conventions for tunneling
707    of one protocol over another. For instance, if an IP packet is being
708    encapsulated in GUE then diffserv interaction [RFC2983] and ECN
709    propagation for tunnels [RFC6040] SHOULD be followed.

711 5.4. Decapsulator operation

713    A decapsulator performs decapsulation of GUE packets. A decapsulator
714    is addressed by the outer destination IP address of a GUE packet.
715    The decapsulator validates packets, including fields of the GUE
716    header.

718    If a decapsulator receives a GUE packet with an unsupported variant,
719    unknown flag, bad header length (too small for included extension
720    fields), unknown control message type, bad protocol number, an
721    unsupported payload type, or an otherwise malformed header, it MUST
722    drop the packet. Such events MAY be logged subject to configuration
723    and rate limiting of logging messages. No error message is returned
724    back to the encapsulator. Note that set flags in a GUE header that
725    are unknown to a decapsulator MUST NOT be ignored. If a GUE packet is
726    received by a decapsulator with unknown flags, the packet MUST be
727    dropped.

729 5.4.1. Processing a received data message

731    If a valid data message is received, the UDP header and GUE header
732    are removed from the packet. The outer IP header remains intact and
733    the next protocol in the IP header is set to the protocol from the
734    proto field in the GUE header. The resulting packet is then
735    resubmitted into the protocol stack to process that packet as though
736    it were received with the protocol in the GUE header.

738    As an example, consider that a data message is received where GUE
739    encapsulates an IP packet.
In this case the proto field in the GUE header
740    is set to 94 for IPIP:

742    +-------------------------------------+
743    |   IP header (next proto = 17,UDP)   |
744    |-------------------------------------|
745    |                 UDP                 |
746    |-------------------------------------|
747    |        GUE (proto = 94,IPIP)        |
748    |-------------------------------------|
749    |        IP header and packet         |
750    +-------------------------------------+
751    The receiver removes the UDP and GUE headers and sets the next
752    protocol field in the IP packet to IPIP, which is derived from the
753    GUE proto field. The resultant packet would have the format:

755    +-------------------------------------+
756    |  IP header (next proto = 94,IPIP)   |
757    |-------------------------------------|
758    |        IP header and packet         |
759    +-------------------------------------+

761    This packet is then resubmitted into the protocol stack to be
762    processed as an IPIP packet.

764 5.4.2. Processing a received control message

766    If a valid control message is received, the packet MUST be processed
767    as a control message. The specific processing to be performed depends
768    on the value in the ctype field of the GUE header.

770 5.5. Router and switch operation

772    Routers and switches SHOULD forward GUE packets as standard UDP/IP
773    packets. The outer five-tuple should contain sufficient information
774    to perform flow classification corresponding to the flow of the inner
775    packet. A router does not normally need to parse a GUE header, and
776    none of the flags or extension fields in the GUE header are expected
777    to affect routing. In cases where the outer five-tuple does not
778    provide sufficient entropy for flow classification, for instance when
779    UDP ports are fixed to provide connection semantics (section 5.6.1),
780    the encapsulated packet MAY be parsed to determine flow entropy.

782    A router MUST NOT modify a GUE header when forwarding a packet.
It
783    MAY encapsulate a GUE packet in another GUE packet, for instance to
784    implement a network tunnel (i.e. by encapsulating an IP packet with a
785    GUE payload in another IP packet as a GUE payload). In this case, the
786    router takes the role of an encapsulator, and the corresponding
787    decapsulator is the logical endpoint of the tunnel. When
788    encapsulating a GUE packet within another GUE packet, there are no
789    provisions to automatically copy GUE flags or fields to the outer GUE
790    header. Each layer of encapsulation is considered independent.

792 5.6. Middlebox interactions

794    A middlebox MAY interpret some flags and extension fields of the GUE
795    header for classification purposes, but is not required to understand
796    any of the flags or extension fields in GUE packets. A middlebox
797    MUST NOT drop a GUE packet merely because there are flags unknown to
798    it. The header length in the GUE header allows a middlebox to inspect
799    the payload packet without needing to parse the flags or extension
800    fields.

802 5.6.1. Inferring connection semantics

804    A middlebox might infer bidirectional connection semantics for a UDP
805    flow. For instance, a stateful firewall might create a five-tuple
806    rule to match flows on egress, and a corresponding five-tuple rule
807    for matching ingress packets where the roles of source and
808    destination are reversed for the IP addresses and UDP port numbers.
809    To operate in this environment, a GUE tunnel should be configured to
810    assume connected semantics defined by the UDP five-tuple, and the use
811    of GUE encapsulation needs to be symmetric between both endpoints.
812    The source port set in the UDP header MUST be the destination port
813    the peer would set for replies. In this case the UDP source port for
814    a tunnel would be a fixed value and not set to be flow entropy as
815    described in section 5.11.
817    The selection of whether to make the UDP source port fixed or set to
818    a flow entropy value for each packet sent SHOULD be configurable for
819    a tunnel. The default MUST be to set the flow entropy value in the
820    UDP source port.

822 5.6.2. NAT

824    IP address and port translation can be performed on the UDP/IP
825    headers adhering to the requirements for NAT with UDP [RFC4787]. In
826    the case of stateful NAT, connection semantics MUST be applied to a
827    GUE tunnel as described in section 5.6.1. GUE endpoints MAY also
828    invoke STUN [RFC5389] or ICE [RFC5245] to manage NAT port mappings
829    for encapsulations.

831 5.7. Checksum Handling

833    The potential for mis-delivery of packets due to corruption of IP,
834    UDP, or GUE headers needs to be considered. Historically, the UDP
835    checksum would be considered sufficient as a check against corruption
836    of either the UDP header and payload or the IP addresses.
837    Encapsulation protocols, such as GUE, can be originated or terminated
838    on devices incapable of computing the UDP checksum for a packet. This
839    section discusses the requirements around checksums and alternatives
840    that might be used when an endpoint does not support UDP checksums.

842 5.7.1. Requirements

844    One of the following requirements MUST be met:

846    o UDP checksums are enabled (for IPv4 or IPv6).

848    o The GUE header checksum is used (defined in [GUEEXTEN]).

850    o Zero UDP checksums are used. This is always permissible with IPv4;
851      in IPv6, they can only be used in accordance with applicable
852      requirements in [RFC8086], [RFC6935], and [RFC6936].

854 5.7.2. UDP Checksum with IPv4

856    For UDP in IPv4, the UDP checksum MUST be processed as specified in
857    [RFC768] and [RFC1122] for both transmit and receive. An
858    encapsulator MAY set the UDP checksum to zero for performance or
859    implementation considerations. The IPv4 header includes a checksum
860    that protects against mis-delivery of the packet due to corruption
861    of IP addresses.
The UDP checksum potentially provides protection 862 against corruption of the UDP header, GUE header, and GUE payload. 863 Enabling or disabling the use of checksums is a deployment 864 consideration that should take into account the risk and effects of 865 packet corruption, and whether the packets in the network are 866 already adequately protected by other, possibly stronger mechanisms 867 such as the Ethernet CRC. If an encapsulator sets a zero UDP 868 checksum for IPv4, it SHOULD use the GUE header checksum as 869 described in [GUEEXTEN] assuming there are no other mechanisms used 870 to protect the GUE packet. 872 When a decapsulator receives a packet, the UDP checksum field MUST 873 be processed. If the UDP checksum is non-zero, the decapsulator MUST 874 verify the checksum before accepting the packet. By default, a 875 decapsulator SHOULD accept UDP packets with a zero checksum. A node 876 MAY be configured to disallow zero checksums per [RFC1122]. 877 Configuration of zero checksums can be selective. For instance, zero 878 checksums might be disallowed from certain hosts that are known to 879 be traversing paths subject to packet corruption. If verification of 880 a non-zero checksum fails, a decapsulator lacks the capability to 881 verify a non-zero checksum, or a packet with a zero-checksum was 882 received and the decapsulator is configured to disallow, the packet 883 MUST be dropped. 885 5.7.3. UDP Checksum with IPv6 887 In IPv6, there is no checksum in the IPv6 header that protects 888 against mis-delivery due to address corruption. Therefore, when GUE 889 is used over IPv6, either the UDP checksum or the GUE header 890 checksum SHOULD be used unless there are alternative mechanisms in 891 use that protect against misdelivery. The UDP checksum and GUE 892 header checksum SHOULD NOT be used at the same time since that would 893 be mostly redundant. 
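
The receive-side rules for IPv4 in section 5.7.2 reduce to a small decision, sketched below in Python. This is an illustration only: the function and parameter names are hypothetical, and the actual checksum computation is assumed to happen elsewhere, with its outcome passed in as `verified` (`None` meaning the decapsulator cannot verify checksums at all).

```python
from typing import Optional


def accept_ipv4_udp_checksum(checksum_field: int,
                             verified: Optional[bool],
                             allow_zero: bool = True) -> bool:
    """Return True to accept the packet, False to drop it (IPv4 only).

    checksum_field: value of the UDP checksum field as received.
    verified: True/False result of verifying a non-zero checksum, or
              None if the decapsulator lacks the capability to verify.
    allow_zero: local configuration; zero checksums are accepted by
                default but a node may be configured to disallow them.
    """
    if checksum_field == 0:
        # Zero checksum: accept unless configured to disallow.
        return allow_zero
    # Non-zero checksum: it must be verified successfully; a failed
    # verification or an inability to verify both lead to a drop.
    return verified is True
```

In a deployment where zero checksums are disallowed only for certain peers, `allow_zero` would be looked up per source before calling this function.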
895    If neither the UDP checksum nor the GUE header checksum is used, then
896    the requirements for using zero IPv6 UDP checksums in [RFC6935] and
897    [RFC6936] MUST be met.

899    When a decapsulator receives a packet, the UDP checksum field MUST
900    be processed. If the UDP checksum is non-zero, the decapsulator MUST
901    verify the checksum before accepting the packet. By default a
902    decapsulator MUST only accept UDP packets with a zero checksum if
903    the GUE header checksum is used and is verified. If verification of
904    a non-zero checksum fails, a decapsulator lacks the capability to
905    verify a non-zero checksum, or a packet with a zero-checksum and no
906    GUE header checksum was received, the packet MUST be dropped.

908 5.8. MTU and fragmentation

910    Standard conventions for handling of MTU (Maximum Transmission Unit)
911    and fragmentation in conjunction with networking tunnels
912    (encapsulation of layer 2 or layer 3 packets) SHOULD be followed.
913    Details are described in MTU and Fragmentation Issues with In-the-
914    Network Tunneling [RFC4459].

916    If a packet is fragmented before encapsulation in GUE, all the
917    related fragments MUST be encapsulated using the same UDP source
918    port. An operator SHOULD set MTU to account for encapsulation
919    overhead and reduce the likelihood of fragmentation.

921    As an alternative to IP fragmentation, the GUE fragmentation extension
922    can be used. GUE fragmentation is described in [GUEEXTEN].

924 5.9. Congestion control

926    Per requirements of [RFC5405], if the IP traffic encapsulated with
927    GUE implements proper congestion control no additional mechanisms
928    should be required.

930    In the case that the encapsulated traffic does not implement any or
931    sufficient congestion control, or it is not known whether a
932    transmitter will consistently implement proper congestion control,
933    then congestion control at the encapsulation layer MUST be provided per [RFC5405].
934 Note that this case applies to a significant use case in network 935 virtualization in which guests run third party networking stacks 936 that cannot be implicitly trusted to implement conformant congestion 937 control. 939 Out of band mechanisms such as rate limiting, Managed Circuit 940 Breaker [RFC8084], or traffic isolation MAY be used to provide 941 rudimentary congestion control. For finer-grained congestion control 942 that allows alternate congestion control algorithms, reaction time 943 within an RTT, and interaction with ECN, in-band mechanisms might be 944 warranted. 946 5.10. Multicast 948 GUE packets can be multicast to decapsulators using a multicast 949 destination address in the encapsulating IP headers. Each receiving 950 host will decapsulate the packet independently following normal 951 decapsulator operations. The receiving decapsulators need to agree 952 on the same set of GUE parameters and properties; how such an 953 agreement is reached is outside the scope of this document. 955 GUE allows encapsulation of unicast, broadcast, or multicast 956 traffic. Flow entropy (the value in the UDP source port) can be 957 generated from the header of encapsulated unicast or 958 broadcast/multicast packets at an encapsulator. The mapping 959 mechanism between the encapsulated multicast traffic and the 960 multicast capability in the IP network is transparent and 961 independent of the encapsulation and is otherwise outside the scope 962 of this document. 964 5.11. Flow entropy for ECMP 966 5.11.1. Flow classification 968 A major objective of using GUE is that a network device can perform 969 flow classification corresponding to the flow of the inner 970 encapsulated packet based on the contents in the outer headers. 972 Hardware devices commonly perform hash computations on packet 973 headers to classify packets into flows or flow buckets. Flow 974 classification is done to support load balancing of flows across a 975 set of networking resources. 
Examples of such load balancing 976 techniques are Equal Cost Multipath routing (ECMP), port selection 977 in Link Aggregation, and NIC device Receive Side Scaling (RSS). 978 Hashes are usually either a three-tuple hash of IP protocol, source 979 address, and destination address; or a five-tuple hash consisting of 980 IP protocol, source address, destination address, source port, and 981 destination port. Typically, networking hardware will compute five- 982 tuple hashes for TCP and UDP, but only three-tuple hashes for other 983 IP protocols. Since the five-tuple hash provides more granularity, 984 load balancing can be finer-grained with better distribution. When a 985 packet is encapsulated with GUE and connection semantics are not 986 applied, the source port in the outer UDP packet is set to a flow 987 entropy value that corresponds to the flow of the inner packet. When 988 a device computes a five-tuple hash on the outer UDP/IP header of a 989 GUE packet, the resultant value classifies the packet per its inner 990 flow. 992 Examples of deriving flow entropy for encapsulation are: 994 o If the encapsulated packet is a layer 4 packet, TCP/IPv4 for 995 instance, the flow entropy could be based on the canonical five- 996 tuple hash of the inner packet. 998 o If the encapsulated packet is an AH transport mode packet with 999 TCP as next header, the flow entropy could be a hash over a 1000 three-tuple: TCP protocol and TCP ports of the encapsulated 1001 packet. 1003 o If a node is encrypting a packet using ESP tunnel mode and GUE 1004 encapsulation, the flow entropy could be based on the contents 1005 of the clear-text packet. For instance, a canonical five-tuple 1006 hash for a TCP/IP packet could be used. 1008 [RFC6438] discusses methods to compute and set flow entropy value for 1009 IPv6 flow labels. Such methods can also be used to create flow 1010 entropy values for GUE. 1012 5.11.2. 
Flow entropy properties

1014    The flow entropy is the value set in the UDP source port of a GUE
1015    packet. Flow entropy in the UDP source port SHOULD adhere to the
1016    following properties:

1018    o The value set in the source port is within the ephemeral port
1019      range (49152 to 65535 [RFC6335]). Since the high order two bits
1020      of the port are set to one, this provides fourteen bits of
1021      entropy for the value.

1023    o The flow entropy has a uniform distribution across encapsulated
1024      flows.

1026    o An encapsulator MAY occasionally change the flow entropy used
1027      for an inner flow per its discretion (for security, route
1028      selection, etc.). To avoid thrashing or flapping the value, the
1029      flow entropy used for a flow SHOULD NOT change more than once
1030      every thirty seconds (or a configurable value).

1032    o Decapsulators, or any networking devices, SHOULD NOT attempt to
1033      interpret flow entropy as anything more than an opaque value.
1034      Neither should they attempt to reproduce the hash calculation
1035      used by an encapsulator in creating a flow entropy value. They
1036      MAY use the value to match further received packets for steering
1037      decisions, but MUST NOT assume that the hash uniquely or
1038      permanently identifies a flow.

1040    o Input to the flow entropy calculation is not restricted to ports
1041      and addresses; input could include flow label from an IPv6
1042      packet, SPI from an ESP packet, or other flow related state in
1043      the encapsulator that is not necessarily conveyed in the packet.

1045    o The assignment function for flow entropy SHOULD be randomly
1046      seeded to mitigate denial of service attacks. The seed SHOULD be
1047      changed periodically.

1049 5.12. Negotiation of acceptable flags and extension fields

1051    An encapsulator and decapsulator need to achieve agreement about GUE
1052    parameters that will be used in communications.
Parameters include
1053    supported GUE variants, flags and extension fields that can be used,
1054    security algorithms and keys, supported protocols and control
1055    messages, etc. This document proposes different general methods to
1056    accomplish this; however, the details of implementing these are
1057    considered out of scope.

1059    General methods for this are:

1061    o Configuration. The parameters used for a tunnel are configured
1062      at each endpoint.

1064    o Negotiation. A tunnel negotiation can be performed. This could
1065      be accomplished in-band of GUE using control messages or private
1066      data.

1068    o Via a control plane. Parameters for communicating with a tunnel
1069      endpoint can be set in a control plane protocol (such as that
1070      needed for network virtualization).

1072    o Via security negotiation. Use of security typically implies a
1073      key exchange between endpoints. Other GUE parameters may be
1074      conveyed as part of that process.

1076 6. Motivation for GUE

1078    This section presents the motivation for GUE with respect to other
1079    encapsulation methods.

1081 6.1. Benefits of GUE

1083    * GUE is a generic encapsulation protocol. GUE can encapsulate
1084      protocols that are represented by an IP protocol number. This
1085      includes layer 2, layer 3, and layer 4 protocols.

1087    * GUE is an extensible encapsulation protocol. Standardized
1088      optional data such as security, virtual networking identifiers,
1089      and fragmentation are being defined.

1091    * For extensibility, GUE uses flag fields as opposed to TLVs as
1092      some other encapsulation protocols do. Flag fields are strictly
1093      ordered, allow random access, and are efficient in use of header
1094      space.

1096    * GUE allows private data to be sent as part of the encapsulation.
1097      This permits experimentation or customization in deployment.

1099    * GUE allows sending of control messages such as OAM using the
1100      same GUE header format (for routing purposes) as normal data
1101      messages.
1103    * GUE maximizes deliverability of non-UDP and non-TCP protocols.

1105    * GUE provides a means for exposing per flow entropy for ECMP for
1106      atypical protocols such as SCTP, DCCP, ESP, etc.

1108 6.2. Comparison of GUE to other encapsulations

1110    A number of different encapsulation techniques have been proposed for
1111    the encapsulation of one protocol over another. EtherIP [RFC3378]
1112    provides layer 2 tunneling of Ethernet frames over IP. GRE [RFC2784],
1113    MPLS [RFC4023], and L2TP [RFC2661] provide methods for tunneling
1114    layer 2 and layer 3 packets over IP. NVGRE [RFC7637] and VXLAN
1115    [RFC7348] are proposals for encapsulation of layer 2 packets for
1116    network virtualization. IPIP [RFC2003] and Generic packet tunneling
1117    in IPv6 [RFC2473] provide methods for tunneling IP packets over IP.

1119    Several proposals exist for encapsulating packets over UDP including
1120    ESP over UDP [RFC3948], TCP directly over UDP [TCPUDP], VXLAN
1121    [RFC7348], LISP [RFC6830] which encapsulates layer 3 packets,
1122    MPLS/UDP [RFC7510], GENEVE [GENEVE], and Generic UDP Encapsulation
1123    for IP Tunneling (GRE over UDP) [RFC8086]. Generic UDP tunneling [GUT]
1124    is a proposal similar to GUE in that it aims to tunnel packets of IP
1125    protocols over UDP.

1127    GUE has the following discriminating features:

1129    o UDP encapsulation leverages specialized network device
1130      processing for efficient transport. The semantics for using the
1131      UDP source port for flow entropy as input to ECMP are defined in
1132      section 5.11.

1134    o GUE permits encapsulation of arbitrary IP protocols, which
1135      includes layer 2, 3, and 4 protocols.

1137    o Multiple protocols can be multiplexed over a single UDP port
1138      number. This is in contrast to techniques to encapsulate
1139      protocols over UDP using a protocol specific port number (such
1140      as ESP/UDP, GRE/UDP, SCTP/UDP).
GUE provides a uniform and
1141      extensible mechanism for encapsulating all IP protocols in UDP
1142      with minimal overhead (four bytes of additional header).

1144    o GUE is extensible. New flags and extension fields can be
1145      defined.

1147    o The GUE header includes a header length field. This allows a
1148      network node to inspect an encapsulated packet without needing
1149      to parse the full encapsulation header.

1151    o Private data in the encapsulation header allows local
1152      customization and experimentation while being compatible with
1153      processing in network nodes (routers and middleboxes).

1155    o GUE includes both data messages (encapsulation of packets) and
1156      control messages (such as OAM).

1158    o The flags-field model facilitates efficient implementation of
1159      extensibility in hardware. For instance, a TCAM can be used to
1160      parse a known set of N flags where the number of entries in the
1161      TCAM is 2^N. By comparison, the number of TCAM entries needed to
1162      parse a set of N arbitrarily ordered TLVs is approximately e*N!.

1164 7. Security Considerations

1166    There are two important security considerations with respect to
1167    GUE.

1169    o Authentication and integrity of the GUE header.

1171    o Authentication, integrity, and confidentiality of the GUE
1172      payload.

1174    GUE security is provided by extensions for security defined in
1175    [GUEEXTEN]. These extensions include methods to authenticate the GUE
1176    header and encrypt the GUE payload.

1178    The GUE header can be authenticated using a security extension for an
1179    HMAC. Securing the GUE payload can be accomplished by use of the GUE
1180    Payload Transform. This extension can be used to perform DTLS in the
1181    payload of a GUE packet to encrypt the payload.

1183    A hash function for computing flow entropy (section 5.11) SHOULD be
1184    randomly seeded to mitigate some possible denial of service attacks.

1186 8. IANA Considerations

1188 8.1.
UDP source port

1190    A user UDP port number has been assigned for GUE:

1192       Service Name: gue
1193       Transport Protocol(s): UDP
1194       Assignee: Tom Herbert
1195       Contact: Tom Herbert
1196       Description: Generic UDP Encapsulation
1197       Reference: draft-herbert-gue
1198       Port Number: 6080
1199       Service Code: N/A
1200       Known Unauthorized Uses: N/A
1201       Assignment Notes: N/A

1203 8.2. GUE variant number

1205    IANA is requested to set up a registry for the GUE variant number.
1206    The GUE variant number is 2 bits containing four possible values.
1207    This document defines variants 0 and 1. New values are assigned in
1208    accordance with RFC Required policy [RFC5226].

1210       +----------------+----------------+---------------+
1211       | Variant number | Description    | Reference     |
1212       +----------------+----------------+---------------+
1213       | 0              | GUE Version 0  | This document |
1214       |                | with header    |               |
1215       |                |                |               |
1216       | 1              | GUE Version 0  | This document |
1217       |                | with direct IP |               |
1218       |                | encapsulation  |               |
1219       |                |                |               |
1220       | 2..3           | Unassigned     |               |
1221       +----------------+----------------+---------------+

1223 8.3. Control types

1225    IANA is requested to set up a registry for the GUE control types.
1226    Control types are 8 bit values. New values for control types 1-127
1227    are assigned in accordance with RFC Required policy [RFC5226].

1229       +----------------+------------------+---------------+
1230       | Control type   | Description      | Reference     |
1231       +----------------+------------------+---------------+
1232       | 0              | Need further     | This document |
1233       |                | interpretation   |               |
1234       |                |                  |               |
1235       | 1..127         | Unassigned       |               |
1236       |                |                  |               |
1237       | 128..255       | User defined     | This document |
1238       +----------------+------------------+---------------+

1240 8.4. Flag-fields

1242    IANA is requested to create a "GUE flag-fields" registry to allocate
1243    flags and extension fields used with GUE.
This shall be a registry of 1244 bit assignments for flags, length of extension fields for 1245 corresponding flags, and descriptive strings. There are sixteen bits 1246 for primary GUE header flags (bit number 0-15). New values are 1247 assigned in accordance with RFC Required policy [RFC5226]. New flags 1248 should be allocated from high to low order bit contiguously without 1249 holes. [GUEXTENS] requests an initial set of flag assignments. 1251 9. Acknowledgements 1253 The authors would like to thank David Liu, Erik Nordmark, Fred 1254 Templin, Adrian Farrel, Bob Briscoe, and Murray Kucherawy for 1255 valuable input on this draft. 1257 10. References 1259 10.1. Normative References 1261 [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, DOI 1262 10.17487/RFC0768, August 1980, . 1265 [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - 1266 Communication Layers", STD 3, RFC 1122, DOI 1267 10.17487/RFC1122, October 1989, . 1270 [RFC2434] Narten, T. and H. Alvestrand, "Guidelines for Writing an 1271 IANA Considerations Section in RFCs", RFC 2434, DOI 1272 10.17487/RFC2434, October 1998, . 1275 [RFC2983] Black, D., "Differentiated Services and Tunnels", RFC 1276 2983, DOI 10.17487/RFC2983, October 2000, . 1279 [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion 1280 Notification", RFC 6040, DOI 10.17487/RFC6040, November 1281 2010, . 1283 [RFC6935] Eubanks, M., Chimento, P., and M. Westerlund, "IPv6 and 1284 UDP Checksums for Tunneled Packets", RFC 6935, DOI 1285 10.17487/RFC6935, April 2013, . 1288 [RFC6936] Fairhurst, G. and M. Westerlund, "Applicability Statement 1289 for the Use of IPv6 UDP Datagrams with Zero Checksums", 1290 RFC 6936, DOI 10.17487/RFC6936, April 2013, 1291 . 1293 [RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the- 1294 Network Tunneling", RFC 4459, DOI 10.17487/RFC4459, April 1295 2006, . 1297 10.2. 
Informative References 1299 [RFC3828] Larzon, L-A., Degermark, M., Pink, S., Jonsson, L-E., Ed., 1300 and G. Fairhurst, Ed., "The Lightweight User Datagram 1301 Protocol (UDP-Lite)", RFC 3828, July 2004, 1302 . 1304 [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, 1305 L., Sridhar, T., Bursell, M., and C. Wright, "Virtual 1306 eXtensible Local Area Network (VXLAN): A Framework for 1307 Overlaying Virtualized Layer 2 Networks over Layer 3 1308 Networks", RFC 7348, August 2014, . 1311 [RFC7605] Touch, J., "Recommendations on Using Assigned Transport 1312 Port Numbers", BCP 165, RFC 7605, DOI 10.17487/RFC7605, 1313 August 2015, . 1315 [RFC7637] Garg, P., Ed., and Y. Wang, Ed., "NVGRE: Network 1316 Virtualization Using Generic Routing Encapsulation", RFC 1317 7637, DOI 10.17487/RFC7637, September 2015, 1318 . 1320 [RFC8086] Yong, L., Ed., Crabbe, E., Xu, X., and T. Herbert, "GRE- 1321 in-UDP Encapsulation", RFC 8086, DOI 10.17487/RFC8086, 1322 March 2017, . 1324 [RFC7510] Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black, 1325 "Encapsulating MPLS in UDP", RFC 7510, DOI 1326 10.17487/RFC7510, April 2015, . 1329 [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram 1330 Congestion Control Protocol (DCCP)", RFC 4340, DOI 1331 10.17487/RFC4340, March 2006, . 1334 [RFC4787] Audet, F., Ed., and C. Jennings, "Network Address 1335 Translation (NAT) Behavioral Requirements for Unicast 1336 UDP", BCP 127, RFC 4787, DOI 10.17487/RFC4787, January 1337 2007, . 1339 [RFC5389] Rosenberg, J., Mahy, R., Matthews, P., and D. Wing, 1340 "Session Traversal Utilities for NAT (STUN)", RFC 5389, 1341 DOI 10.17487/RFC5389, October 2008, . 1344 [RFC5285] Rosenberg, J., "Interactive Connectivity Establishment 1345 (ICE): A Protocol for Network Address Translator (NAT) 1346 Traversal for Offer/Answer Protocols", RFC 5245, DOI 1347 10.17487/RFC5245, April 2010, . 1350 [RFC5405] Eggert, L. and G. 
Fairhurst, "Unicast UDP Usage Guidelines 1351 for Application Designers", BCP 145, RFC 5405, DOI 1352 10.17487/RFC5405, November 2008, . 1355 [RFC6438] Carpenter, B. and S. Amante, "Using the IPv6 Flow Label 1356 for Equal Cost Multipath Routing and Link Aggregation in 1357 Tunnels", RFC 6438, DOI 10.17487/RFC6438, November 2011, 1358 . 1360 [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, DOI 1361 10.17487/RFC2003, October 1996, . 1364 [RFC3948] Huttunen, A., Swander, B., Volpe, V., DiBurro, L., and M. 1365 Stenberg, "UDP Encapsulation of IPsec ESP Packets", RFC 1366 3948, DOI 10.17487/RFC3948, January 2005, . 1369 [RFC6830] Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The 1370 Locator/ID Separation Protocol (LISP)", RFC 6830, DOI 1371 10.17487/RFC6830, January 2013, . 1374 [RFC3378] Housley, R. and S. Hollenbeck, "EtherIP: Tunneling 1375 Ethernet Frames in IP Datagrams", RFC 3378, DOI 1376 10.17487/RFC3378, September 2002, . 1379 [RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. 1380 Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, 1381 DOI 10.17487/RFC2784, March 2000, . 1384 [RFC4023] Worster, T., Rekhter, Y., and E. Rosen, Ed., 1385 "Encapsulating MPLS in IP or Generic Routing Encapsulation 1386 (GRE)", RFC 4023, DOI 10.17487/RFC4023, March 2005, 1387 . 1389 [RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, 1390 G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", 1391 RFC 2661, DOI 10.17487/RFC2661, August 1999, 1392 . 1394 [RFC8084] Fairhurst, G., "Network Transport Circuit Breakers", BCP 1395 208, RFC 8084, DOI 10.17487/RFC8084, March 2017, 1396 . 
1398 [GUEEXTEN] Herbert, T., Yong, L., and Templin, F., "Extensions for 1399 Generic UDP Encapsulation" draft-herbert-gue-extensions-00 1401 [GUE4NVO3] Yong, L., Herbert, T., Zia, O., "Generic UDP Encapsulation 1402 (GUE) for Network Virtualization Overlay" draft-hy-nvo3- 1403 gue-4-nvo-03 1405 [GUESEC] Yong, L., Herbert, T., "Generic UDP Encapsulation (GUE) 1406 for Secure Transport" draft-hy-gue-4-secure-transport-03 1408 [TCPUDP] Cheshire, S., Graessley, J., and McGuire, R., 1409 "Encapsulation of TCP and other Transport Protocols over 1410 UDP" draft-cheshire-tcp-over-udp-00 1412 [TOU] Herbert, T., "Transport layer protocols over UDP" draft- 1413 herbert-transports-over-udp-00 1415 [GENEVE] Gross, J., Ed., Ganga, I., Ed., and Sridhar, T., "Geneve: 1416 Generic Network Virtualization Encapsulation", draft-ietf- 1417 nvo3-geneve-05 1419 [GUT] Manner, J., Varia, N., and Briscoe, B., "Generic UDP 1420 Tunnelling (GUT)", draft-manner-tsvwg-gut-02.txt 1422 [LCO] Cree, E., https://www.kernel.org/doc/Documentation/ 1423 networking/checksum-offloads.txt 1425 Appendix A: NIC processing for GUE 1427 This appendix provides some guidelines for Network Interface Cards 1428 (NICs) to implement common offloads and accelerations to support GUE. 1429 Note that most of this discussion is generally applicable to other 1430 methods of UDP based encapsulation. 1432 A.1. Receive multi-queue 1434 Contemporary NICs support multiple receive descriptor queues (multi- 1435 queue). Multi-queue enables load balancing of network processing for 1436 a NIC across multiple CPUs. On packet reception, a NIC selects the 1437 appropriate queue for host processing. Receive Side Scaling (RSS) is a 1438 common method which uses the flow hash for a packet to index an 1439 indirection table where each entry stores a queue number.
Flow 1440 Director and Accelerated Receive Flow Steering (aRFS) allow a host to 1441 program the queue that is used for a given flow, which is identified 1442 either by an explicit five-tuple or by the flow's hash. 1444 GUE encapsulation is compatible with multi-queue NICs that support 1445 five-tuple hash calculation for UDP/IP packets as input to RSS. The 1446 flow entropy in the UDP source port ensures classification of the 1447 encapsulated flow even in the case that the outer source and 1448 destination addresses are the same for all flows (e.g. all flows are 1449 going over a single tunnel). 1451 By default, UDP RSS support is often disabled in NICs to avoid out- 1452 of-order reception that can occur when UDP packets are fragmented. As 1453 discussed above, fragmentation of GUE packets can mostly be avoided by 1454 fragmenting packets before entering a tunnel, GUE fragmentation, path 1455 MTU discovery in higher layer protocols, or operators adjusting MTUs. 1456 Other UDP traffic might not implement such procedures to avoid 1457 fragmentation, so enabling UDP RSS support in the NIC is a 1458 tradeoff to weigh during configuration. 1460 A.2. Checksum offload 1462 Many NICs provide capabilities to calculate the standard ones complement 1463 payload checksum for packets in transmit or receive. When using GUE 1464 encapsulation, there are at least two checksums that are of interest: 1465 the encapsulated packet's transport checksum, and the UDP checksum in 1466 the outer header. 1468 A.2.1. Transmit checksum offload 1470 NICs can provide a protocol agnostic method to offload transmit 1471 checksum (NETIF_F_HW_CSUM in Linux parlance) that can be used with 1472 GUE. In this method, the host provides checksum related parameters in 1473 a transmit descriptor for a packet. These parameters include the 1474 starting offset of data to checksum, the length of data to checksum, 1475 and the offset in the packet where the computed checksum is to be 1476 written.
The host initializes the checksum field to the pseudo header 1477 checksum. 1479 In the case of GUE, the checksum for an encapsulated transport layer 1480 packet, a TCP packet for instance, can be offloaded by setting the 1481 appropriate checksum parameters. 1483 NICs typically can offload only one transmit checksum per packet, so 1484 simultaneously offloading both an inner transport packet's checksum 1485 and the outer UDP checksum is likely not possible. 1487 If an encapsulator is co-resident with a host, then checksum offload 1488 may be performed using remote checksum offload (described in 1489 [GUEEXTEN]). Remote checksum offload relies on NIC offload of the 1490 simple UDP/IP checksum which is commonly supported even in legacy 1491 devices. In remote checksum offload, the outer UDP checksum is set 1492 and the GUE header includes an option indicating the start and offset 1493 of the inner "offloaded" checksum. The inner checksum is initialized 1494 to the pseudo header checksum. When a decapsulator receives a GUE 1495 packet with the remote checksum offload option, it completes the 1496 offload operation by determining the packet checksum from the 1497 indicated start point to the end of the packet, and then adds this 1498 into the checksum field at the offset given in the option. Computing 1499 the checksum from the start to end of packet is efficient if 1500 checksum-complete is provided on the receiver. 1502 Another alternative when an encapsulator is co-resident with a host 1503 is to perform Local Checksum Offload [LCO]. In this method, the inner 1504 transport layer checksum is offloaded and the outer UDP checksum can 1505 be deduced based on the fact that the portion of the packet covered 1506 by the inner transport checksum will sum to zero (or at least the 1507 bitwise "not" of the inner pseudo header checksum). 1509 A.2.2.
Receive checksum offload 1511 GUE is compatible with NICs that perform a protocol agnostic receive 1512 checksum (CHECKSUM_COMPLETE in Linux parlance). In this technique, a 1513 NIC computes a ones complement checksum over all (or some predefined 1514 portion) of a packet. The computed value is provided to the host 1515 stack in the packet's receive descriptor. The host driver can use 1516 this checksum to "patch up" and validate any inner packet transport 1517 checksum, as well as the outer UDP checksum if it is non-zero. 1519 Many legacy NICs don't provide checksum-complete but instead provide 1520 an indication that a checksum has been verified (CHECKSUM_UNNECESSARY 1521 in Linux). Usually, such validation is only done for simple TCP/IP or 1522 UDP/IP packets. If a NIC indicates that a UDP checksum is valid, the 1523 checksum-complete value for the UDP packet is the "not" of the pseudo 1524 header checksum. In this way, checksum-unnecessary can be converted 1525 to checksum-complete. So, if the NIC provides checksum-unnecessary 1526 for the outer UDP header in an encapsulation, checksum conversion can 1527 be done so that the checksum-complete value is derived and can be 1528 used by the stack to validate checksums in the encapsulated packet. 1530 A.3. Transmit Segmentation Offload 1532 Transmit Segmentation Offload (TSO) is a NIC feature where a host 1533 provides a large (>MTU size) TCP packet to the NIC, which in turn 1534 splits the packet into separate segments and transmits each one. This 1535 is useful to reduce CPU load on the host. 1537 The process of TSO can be generalized as: 1539 - Split the TCP payload into segments which allow packets with 1540 size less than or equal to MTU. 1542 - For each created segment: 1544 1. Replicate the TCP header and all preceding headers of the 1545 original packet. 1547 2. Set payload length fields in any headers to reflect the 1548 length of the segment. 1550 3. 
Set TCP sequence number to correctly reflect the offset of 1551 the TCP data in the stream. 1553 4. Recompute and set any checksums that either cover the payload 1554 of the packet or cover a header that was changed by setting a 1555 payload length. 1557 Following this general process, TSO can be extended to support TCP 1558 encapsulation in GUE. For each segment, the Ethernet header, outer IP header, UDP 1559 header, GUE header, inner IP header (if tunneling), and TCP header 1560 are replicated. Any packet length header fields need to be set 1561 properly (including the length in the outer UDP header), and 1562 checksums need to be set correctly (including the outer UDP checksum 1563 if being used). 1565 To facilitate TSO with GUE, it is recommended that extension fields 1566 do not contain values that need to be updated on a per segment basis. 1567 For example, extension fields should not include checksums, lengths, 1568 or sequence numbers that refer to the payload. If the GUE header does 1569 not contain such fields then the TSO engine only needs to copy the 1570 bits in the GUE header when creating each segment and does not need 1571 to parse the GUE header. 1573 A.4. Large Receive Offload 1575 Large Receive Offload (LRO) is a NIC feature where packets of a TCP 1576 connection are reassembled, or coalesced, in the NIC and delivered to 1577 the host as one large packet. This feature can reduce CPU utilization 1578 in the host. 1580 LRO requires significant protocol awareness to be implemented 1581 correctly and is difficult to generalize. Packets in the same flow 1582 need to be unambiguously identified. In the presence of tunnels or 1583 network virtualization, this may require more than a five-tuple match 1584 (for instance, packets for flows in two different virtual networks may 1585 have identical five-tuples).
Additionally, a NIC needs to perform 1586 validation over packets that are being coalesced, and needs to 1587 fabricate a single meaningful header from all the coalesced packets. 1589 The conservative approach to supporting LRO for GUE would be to 1590 assign packets to the same flow only if they have an identical five- 1591 tuple and were encapsulated the same way. That is, the outer IP 1592 addresses, the outer UDP ports, GUE protocol, GUE flags and fields, 1593 and inner five-tuple are all identical. 1595 Appendix B: Implementation considerations 1597 This appendix is informational and does not constitute a normative 1598 part of this document. 1600 B.1. Privileged ports 1602 Using the source port to contain a flow entropy value disallows the 1603 security method of a receiver enforcing that the source port be a 1604 privileged port. Privileged ports are defined by some operating 1605 systems to restrict source port binding. Unix, for instance, 1606 considers port numbers less than 1024 to be privileged. 1608 Enforcing that packets are sent from a privileged port is widely 1609 considered an inadequate security mechanism and has been mostly 1610 deprecated. To approximate this behavior, an implementation could 1611 restrict a user from sending a packet destined to the GUE port 1612 without proper credentials. 1614 B.2. Setting flow entropy as a route selector 1616 An encapsulator generating flow entropy in the UDP source port could 1617 modulate the value to perform a type of multipath source routing. 1618 Assuming that networking switches perform ECMP based on the flow 1619 hash, a sender can affect the path by altering the flow entropy. For 1620 instance, a host can store a flow hash in its protocol control block (PCB) for an inner flow, 1621 and might alter the value upon detecting that packets are traversing 1622 a lossy path. Changing the flow entropy for a flow SHOULD be subject 1623 to hysteresis (at most once every thirty seconds) to limit the number 1624 of out of order packets.
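The hysteresis rule in B.2 can be illustrated with a minimal sketch. It is not part of the GUE specification. The class name FlowEntropy, the salt counter, the method names, and the choice of the ephemeral port range 49152-65535 for the UDP source port are all assumptions made for illustration; only the thirty second bound comes from the text above.

```python
import time

# Bound taken from B.2: change flow entropy at most once every thirty seconds.
MIN_CHANGE_INTERVAL = 30.0

# Assumption: the source port is drawn from the ephemeral port range.
PORT_MIN, PORT_MAX = 49152, 65535


class FlowEntropy:
    """Hypothetical per-flow state kept by an encapsulator (e.g. in a PCB)."""

    def __init__(self, flow_hash):
        self.flow_hash = flow_hash  # hash of the inner flow
        self.salt = 0               # bumped to steer the flow onto another path
        self.last_change = None     # time of the last entropy change, if any

    def source_port(self):
        """Map the current entropy value into the assumed source port range."""
        span = PORT_MAX - PORT_MIN + 1
        return PORT_MIN + ((self.flow_hash + self.salt) % span)

    def report_lossy_path(self, now=None):
        """Request a new entropy value; return True if it was changed.

        The request is ignored when the previous change happened less than
        MIN_CHANGE_INTERVAL seconds ago, which limits reordering.
        """
        now = time.monotonic() if now is None else now
        if self.last_change is not None and now - self.last_change < MIN_CHANGE_INTERVAL:
            return False
        self.salt += 1
        self.last_change = now
        return True
```

Incrementing a salt is only one way to derive a different source port; whether the new value actually selects a different path depends on how the switches along the route compute their ECMP hash.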
1626 B.3. Hardware protocol implementation considerations 1628 Low level data path protocols, such as GUE, are often supported in 1629 high speed network device hardware. Variable length header (VLH) 1630 protocols like GUE are often considered difficult to efficiently 1631 implement in hardware. In order to retain the important 1632 characteristics of an extensible and robust protocol, hardware 1633 vendors may practice "constrained flexibility". In this model, only 1634 certain combinations or protocol header parameterizations are 1635 implemented in hardware fast path. Each such parameterization is 1636 fixed length so that the particular instance can be optimized as a 1637 fixed length protocol. In the case of GUE this constitutes specific 1638 combinations of GUE flags, fields, and next protocol. The selected 1639 combinations would naturally be the most common cases which form the 1640 "fast path", and other combinations are assumed to take the "slow 1641 path". 1643 In time, needs and requirements of the protocol may change, which may 1644 manifest as new parameterizations to be supported in the 1645 fast path. To allow this extensibility, a device practicing 1646 constrained flexibility should allow the fast path parameterizations 1647 to be programmable. 1649 Authors' Addresses 1651 Tom Herbert 1652 Quantonium 1653 4701 Patrick Henry 1654 Santa Clara, CA 95054 1655 US 1657 Email: tom@herbertland.com 1659 Lucy Yong 1660 Huawei USA 1661 5340 Legacy Dr. 1662 Plano, TX 75024 1663 US 1665 Email: lucy.yong@huawei.com 1667 Osama Zia 1668 Microsoft 1669 1 Microsoft Way 1670 Redmond, WA 98029 1671 US 1673 Email: osamaz@microsoft.com