idnits 2.17.1

draft-ietf-intarea-gue-08.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 4, 2019) is 1666 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC2460' is mentioned on line 866, but not defined

  ** Obsolete undefined reference: RFC 2460 (Obsoleted by RFC 8200)

  == Missing Reference: 'RFC8200' is mentioned on line 1000, but not defined

  == Missing Reference: 'RFC768' is mentioned on line 1000, but not defined

  == Missing Reference: 'RFC2914' is mentioned on line 1014, but not defined

  == Missing Reference: 'MUTLIQ' is mentioned on line 1536, but not defined

  == Unused Reference: 'RFC8084' is defined on line 1424, but no explicit
     reference was found in the text

  == Unused Reference: 'MULTIQ' is defined on line 1510, but no explicit
     reference was found in the text

  ** Downref: Normative reference to an Informational RFC: RFC 2983

  ** Downref: Normative reference to an Informational RFC: RFC 4459

  ** Obsolete normative reference: RFC 5226 (Obsoleted by RFC 8126)

  -- Obsolete informational reference (is this intentional?): RFC 5389
     (Obsoleted by RFC 8489)

  -- Obsolete informational reference (is this intentional?): RFC 5245
     (Obsoleted by RFC 8445, RFC 8839)

  -- Obsolete informational reference (is this intentional?): RFC 6830
     (Obsoleted by RFC 9300, RFC 9301)

  == Outdated reference: A later version (-13) exists of
     draft-ietf-intarea-tunnels-10

  == Outdated reference: A later version (-16) exists of
     draft-ietf-nvo3-geneve-10

     Summary: 4 errors (**), 0 flaws (~~), 10 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    Internet Area WG                                              T. Herbert
3    Internet-Draft                                                Quantonium
4    Intended status: Standard track                                  L. Yong
5    Expires April 6, 2020                                        Independent
6                                                                      O. Zia
7                                                                   Microsoft
8                                                             October 4, 2019

10                          Generic UDP Encapsulation
11                          draft-ietf-intarea-gue-08

13   Status of this Memo

15      This Internet-Draft is submitted in full conformance with the
16      provisions of BCP 78 and BCP 79.

18      Internet-Drafts are working documents of the Internet Engineering
19      Task Force (IETF), its areas, and its working groups. Note that
20      other groups may also distribute working documents as Internet-
21      Drafts.

23      Internet-Drafts are draft documents valid for a maximum of six months
24      and may be updated, replaced, or obsoleted by other documents at any
25      time. It is inappropriate to use Internet-Drafts as reference
26      material or to cite them other than as "work in progress."

28      The list of current Internet-Drafts can be accessed at
29      http://www.ietf.org/ietf/1id-abstracts.txt

31      The list of Internet-Draft Shadow Directories can be accessed at
32      http://www.ietf.org/shadow.html

34      This Internet-Draft will expire on April 6, 2020.

36   Copyright Notice

38      Copyright (c) 2019 IETF Trust and the persons identified as the
39      document authors. All rights reserved.

48      This document is subject to BCP 78 and the IETF Trust's Legal
49      Provisions Relating to IETF Documents
50      (http://trustee.ietf.org/license-info) in effect on the date of
51      publication of this document. Please review these documents
52      carefully, as they describe your rights and restrictions with respect
53      to this document. Code Components extracted from this document must
54      include Simplified BSD License text as described in Section 4.e of
55      the Trust Legal Provisions and are provided without warranty as
56      described in the Simplified BSD License.

58   Abstract

60      This specification describes Generic UDP Encapsulation (GUE), which
61      is a scheme for using UDP to encapsulate packets of different IP
62      protocols for transport across layer 3 networks. By encapsulating
63      packets in UDP, specialized capabilities in networking hardware for
64      efficient handling of UDP packets can be leveraged. GUE specifies
65      basic encapsulation methods upon which higher level constructs, such
66      as tunnels and overlay networks for network virtualization, can be
67      constructed. GUE is extensible by allowing optional data fields as
68      part of the encapsulation, and is generic in that it can encapsulate
69      packets of various IP protocols.

71   Table of Contents

73      1. Introduction
74        1.1. Applicability
75        1.2. Terminology and acronyms
76        1.3. Requirements Language
77      2. Base packet format
78        2.1. GUE variant
79      3. Variant 0
80        3.1. Header format
81        3.2. Proto/ctype field
82          3.2.1. Proto field
83          3.2.2. Ctype field
84        3.3. Flags and extension fields
85          3.3.1. Requirements
86          3.3.2. Example GUE header with extension fields
87        3.4. Surplus space
88        3.5. Message types
89          3.5.1. Control messages
90          3.5.2. Data messages
91      4. Variant 1
92        4.1. Direct encapsulation of IPv4
93        4.2. Direct encapsulation of IPv6
94      5. Operation
95        5.1. Network tunnel encapsulation
96        5.2. Transport layer encapsulation
97        5.3. Encapsulator operation
98        5.4. Decapsulator operation
99          5.4.1. Processing a received data message
100         5.4.2. Processing a received control message
101       5.5. Middlebox inspection
102       5.6. Router and switch operation
103         5.6.1. Connection semantics
104         5.6.2. NAT
105       5.7. MTU and fragmentation
106       5.8. UDP Checksum Handling
107         5.8.1. UDP Checksum with IPv4
108         5.8.2. UDP Checksum with IPv6
109       5.9. Congestion Considerations
110         5.9.1. GUE tunnels
111         5.9.2. Transport layer encapsulation
112       5.10. Multicast
113       5.11. Flow entropy for ECMP
114         5.11.1. Flow classification
115         5.11.2. Flow entropy properties
116       5.12. Negotiation of acceptable flags and extension fields
117     6. Motivation for GUE
118       6.1. Benefits of GUE
119       6.2. Comparison of GUE to other encapsulations
120     7. Security Considerations
121     8. IANA Considerations
122       8.1. UDP source port
123       8.2. GUE variant number
124       8.3. Control types
125     9. Acknowledgements
126     10. References
127       10.1. Normative References
128       10.2. Informative References
129     Appendix A: NIC processing for GUE
130       A.1. Receive multi-queue
131       A.2. Checksum offload
132         A.2.1. Transmit checksum offload
133         A.2.2. Receive checksum offload
134       A.3. Transmit Segmentation Offload
135       A.4. Large Receive Offload
136     Appendix B: Implementation considerations
137       B.1. Privileged ports
138       B.2. Setting flow entropy as a route selector
139       B.3. Hardware protocol implementation considerations
140     Authors' Addresses

142  1. Introduction

144     This specification describes Generic UDP Encapsulation (GUE), which
145     is a general method for encapsulating packets of arbitrary IP
146     protocols within User Datagram Protocol (UDP) [RFC0768] packets.
147     Encapsulating packets in UDP facilitates efficient transport across
148     networks. Networking devices widely provide protocol specific
149     processing and optimizations for UDP (as well as TCP) packets.
150     Packets for atypical IP protocols (those not usually parsed by
151     networking hardware) can be encapsulated in UDP packets to maximize
152     deliverability and to leverage flow specific mechanisms for routing
153     and packet steering.

154     GUE provides an extensible header format for including optional data
155     in the encapsulation header. This data potentially covers items such
156     as a virtual networking identifier, security data for validating or
157     authenticating the GUE header, congestion control data, etc.

159     This document does not define any specific GUE extensions. [GUEEXTEN]
160     specifies a set of initial extensions.

162  1.1. Applicability

164     GUE is a network encapsulation protocol that encapsulates packets for
165     various IP protocols. Potential use cases include network tunneling,
166     multi-tenant network virtualization, tunneling for mobility, and
167     transport layer encapsulation. GUE is intended for deploying overlay
168     networks in public or private data center environments, as well as
169     providing a general tunneling mechanism usable in the Internet.

171     GUE is a UDP based encapsulation protocol transported over existing
172     IPv4 and IPv6 networks. Hence, as a UDP based protocol, GUE adheres
173     to the UDP usage guidelines as specified in [RFC8085]. Applicability
174     of these guidelines is dependent on the underlay IP network and the
175     nature of the GUE payload protocol (for example, TCP/IP or
176     IP/Ethernet). GUE may also be used to create IP tunnels, so the
177     guidelines in [IPTUN] are applicable.

179     [RFC8085] outlines two applicability scenarios for UDP applications:
180     (1) general Internet and (2) a traffic-managed controlled environment
181     (TMCE). The requirements of [RFC8085] pertaining to deployment of a
182     UDP encapsulation protocol in these environments are applicable.
183     Section 5 provides the specifics for satisfying requirements of
184     [RFC8085]. It is the responsibility of the operator deploying GUE to
185     ensure that the necessary operational requirements are met for the
186     environment in which GUE is being deployed.

188     GUE has much of the same applicability and benefits as GRE-in-UDP
189     [RFC8086] that are afforded by UDP encapsulation protocols. GUE
190     offers the possibility of good performance for load-balancing
191     encapsulated IP traffic in transit networks using existing Equal-Cost
192     Multipath (ECMP) mechanisms that use a hash of the five-tuple of
193     source IP address, destination IP address, UDP/TCP source port,
194     UDP/TCP destination port, and protocol number. Encapsulating packets
195     in UDP enables use of the UDP source port to provide entropy to ECMP
196     hashing. A material difference between GUE and GRE-in-UDP is that the
197     payload of GUE is always an IP protocol whereas the payload in GRE-
198     in-UDP may be a non-IP protocol; this distinction is pertinent in the
199     discussion of congestion considerations (section 5.9) since IP
200     protocols are generally assumed to be congestion controlled.
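
The use of the UDP source port as ECMP entropy can be illustrated with a short sketch. It is not part of this specification: the function name, the choice of SHA-256 as the hash, and the mapping into the dynamic port range 49152-65535 are all assumptions made for illustration. The only property it relies on is the one stated above, that packets of the same inner flow should present the same outer five-tuple to ECMP hashing.

```python
import hashlib
import ipaddress
import struct

# Assumption: entropy values are mapped into the dynamic port range.
PORT_MIN, PORT_MAX = 49152, 65535


def entropy_source_port(src_ip, dst_ip, proto, src_port, dst_port):
    """Map an inner flow's five-tuple to a stable outer UDP source port.

    Hypothetical helper: every packet of one inner flow yields the same
    port, so transit ECMP hashing keeps the flow on a single path.
    """
    key = (
        ipaddress.ip_address(src_ip).packed
        + ipaddress.ip_address(dst_ip).packed
        + struct.pack("!BHH", proto, src_port, dst_port)
    )
    # Any well-mixed hash would do; SHA-256 is used only because it is
    # deterministic across runs and available in the standard library.
    value = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return PORT_MIN + value % (PORT_MAX - PORT_MIN + 1)
```

An encapsulator that does not apply connection semantics could place the returned value in the outer UDP source port while keeping the destination port at the GUE port number.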

202     In addition, GUE enables extending the use of atypical IP protocols
203     (those other than TCP and UDP) across networks that might otherwise
204     filter packets carrying those protocols. GUE may also be used with
205     connection oriented UDP semantics in order to facilitate traversal
206     through stateful firewalls and stateful NAT.

208     Additional motivation for the GUE protocol is provided in section 6.

210  1.2. Terminology and acronyms

212     GUE              Generic UDP Encapsulation

214     GUE Header       A variable length protocol header that is composed
215                      of a primary four byte header and zero or more four
216                      byte words of optional header data

218     GUE packet       A UDP/IP packet that contains a GUE header and GUE
219                      payload within the UDP payload

221     GUE variant      A version of the GUE protocol or an alternate form
222                      of a version

224     Encapsulator     A network node that encapsulates packets in GUE

226     Decapsulator     A network node that decapsulates and processes
227                      packets encapsulated in GUE

229     Data message     An encapsulated packet in a GUE payload that is
230                      addressed to the protocol stack for an associated
231                      protocol

233     Control message  A formatted message in the GUE payload that is
234                      implicitly addressed to the decapsulator to monitor
235                      or control the state or behavior of a tunnel

237     Flags            A set of bit flags in the primary GUE header

238     Extension field  An optional field in a GUE header whose presence is
239                      indicated by corresponding flag(s)

241     C-bit            A single bit flag in the primary GUE header that
242                      indicates whether the GUE packet contains a control
243                      message or data message

245     Hlen             A field in the primary GUE header that gives the
246                      length of the GUE header

248     Proto/ctype      A field in the GUE header that holds either the IP
249                      protocol number for a data message or a type for a
250                      control message

252     Outer IP header  Refers to the outermost IP header or packet when
253                      encapsulating a packet over IP

255     Inner IP header  Refers to an encapsulated IP header when an IP
256                      packet is encapsulated

258     Outer packet     Refers to an encapsulating packet

260     Inner packet     Refers to a packet that is encapsulated

262     TMCE             A traffic-managed controlled environment, i.e., an
263                      IP network that is traffic-engineered and/or
264                      otherwise managed

266  1.3. Requirements Language

268     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
269     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
270     document are to be interpreted as described in [RFC2119].

272  2. Base packet format

274     A GUE packet consists of a UDP packet whose payload is a GUE
275     header followed by a payload which is either an encapsulated packet
276     of some IP protocol or a control message such as an OAM (Operations,
277     Administration, and Management) message. A GUE packet has the general
278     format:

280     +-------------------------------+
281     |                               |
282     |         UDP/IP header         |
283     |                               |
284     |-------------------------------|
285     |                               |
286     |          GUE Header           |
287     |                               |
288     |-------------------------------|
289     |                               |
290     |      Encapsulated packet      |
291     |      or control message       |
292     |                               |
293     +-------------------------------+

295     The GUE header is variable length as determined by the presence of
296     optional extension fields.

298  2.1. GUE variant

300     The first two bits of the GUE header contain the GUE protocol variant
301     number. The variant number can indicate the version of the GUE
302     protocol as well as alternate forms of a version.

304     Variants 0 and 1 are described in this specification; variants 2 and
305     3 are reserved.

307  3. Variant 0

309     Variant 0 indicates version 0 of GUE. This variant defines a generic
310     extensible format to encapsulate packets by Internet protocol number.

312  3.1. Header format

314     The header format for variant 0 of GUE in UDP is:

316      0                   1                   2                   3
317      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
318     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\
319     |          Source port          |       Destination port        | |
320     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP
321     |            Length             |           Checksum            | |
322     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/
323     | 0 |C|   Hlen  |  Proto/ctype  |             Flags             |\
324     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
325     |                                                               | GUE
326     ~                  Extension Fields (optional)                  ~ |
327     |                                                               | |
328     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/

330     The contents of the UDP header are:

332        o Source port: If connection semantics (section 5.6.1) are applied
333          to an encapsulation, this is set to the local source port for
334          the connection. When connection semantics are not applied, the
335          source port is either set to a flow entropy value, as described
336          in section 5.11, or is set to the GUE assigned port number,
337          6080.

339        o Destination port: If connection semantics (section 5.6.1) are
340          applied to an encapsulation, this is set to the destination port
341          for the tuple. If connection semantics are not applied then the
342          destination port is set to the GUE assigned port number, 6080.

344        o Length: Canonical length of the UDP packet (length of UDP header
345          and payload).

347        o Checksum: Standard UDP checksum (handling is described in
348          section 5.8).

350     The GUE header consists of:

352        o Variant: 0 indicates GUE protocol version 0 with a header.

354        o C: C-bit. When set, indicates a control message; when not set,
355          indicates a data message.

357        o Hlen: Length in 32-bit words of the GUE header, including
358          optional extension fields but not the first four bytes of the
359          header. Computed as (header_len - 4) / 4, where header_len is
360          the total header length in bytes. All GUE headers are a multiple
361          of four bytes in length. Maximum header length is 128 bytes.

363        o Proto/ctype: When the C-bit is set, this field contains a
364          control message type for the payload (section 3.2.2). When the
365          C-bit is not set, the field holds the Internet protocol number
366          for the encapsulated packet in the payload (section 3.2.1). The
367          control message or encapsulated packet begins at the offset
368          provided by Hlen.

370        o Flags: Header flags that may be allocated for various purposes
371          and may indicate the presence of extension fields. Undefined
372          header flag bits MUST be set to zero on transmission.

374        o Extension Fields: Optional fields whose presence is indicated by
375          corresponding flags.

377  3.2. Proto/ctype field

379     The proto/ctype field contains either an Internet protocol number
380     (when the C-bit is not set) or a GUE control message type (when the
381     C-bit is set).

383  3.2.1. Proto field

385     When the C-bit is not set, the proto/ctype field MUST contain an IANA
386     Internet Protocol Number [IANA-PN]. The protocol number is
387     interpreted relative to the IP protocol that encapsulates the UDP
388     packet (i.e., the protocol of the outer IP header). The protocol
389     number serves as an indication of the type of the next protocol
390     header, which is contained in the GUE payload at the offset indicated
391     in Hlen.

392     IP protocol number 59 ("No next header") can be set to indicate that
393     the GUE payload does not begin with the header of an IP protocol.
394     This would be the case, for instance, if the GUE payload were a
395     fragment when performing GUE level fragmentation. The interpretation
396     of the payload is performed through other means such as flags and
397     extension fields, and nodes MUST NOT parse packets based on the IP
398     protocol number in this case.

400  3.2.2. Ctype field

402     When the C-bit is set, the proto/ctype field MUST be set to a valid
403     control message type. A value of zero indicates that the GUE payload
404     requires further interpretation to deduce the control type. This
405     might be the case when the payload is a fragment of a control
406     message, where only the reassembled packet can be interpreted as a
407     control message.

409     Control messages will be defined in an IANA registry. Control message
410     types 1 through 127 may be defined in standards. Types 128 through
411     255 are reserved to be user defined for experimentation.

413     This document does not specify any standard control message types
414     other than type 0. Type 0 indicates that the GUE payload is a control
415     message, or part of a control message (as might be the case in GUE
416     fragmentation) that cannot be correctly parsed or interpreted without
417     additional context.

419  3.3. Flags and extension fields

421     Flags and associated extension fields are the primary mechanism of
422     extensibility in GUE. As mentioned in section 3.1, GUE header flags
423     indicate the presence of optional extension fields in the GUE header.
424     [GUEEXTEN] defines an initial set of GUE extensions.

426  3.3.1. Requirements

428     There are sixteen flag bits in the GUE header. Flags may indicate
429     presence of extension fields. The size of an extension field
430     indicated by a flag MUST be fixed in the specification of the flag.

432     Flags can be grouped together to allow different lengths for an
433     extension field. For example, if two flag bits are grouped, a field
434     can have three different lengths: a bit value of 00 indicates that
435     no field is present, and 01, 10, and 11 indicate three possible
436     lengths for the field. Regardless of how flag bits are grouped, the
437     lengths and offsets of extension fields corresponding to a set of
438     flags MUST be well defined and deterministic.

440     Extension fields are placed in order of the flags. New flags are to
441     be allocated from high to low order bit contiguously without holes.
442     Flags allow random access. For instance, to inspect the field
443     corresponding to the Nth flag bit, an implementation only considers
444     the previous N-1 flags to determine the offset. Flags after the Nth
445     flag are not pertinent in calculating the offset of the field for the
446     Nth flag. Random access of flags and fields permits processing of
447     optional extensions in an order that is independent of their position
448     in the packet.

450     Flags (or grouped flags) are idempotent such that new flags MUST NOT
451     cause reinterpretation of old flags. Also, new flags MUST NOT alter
452     interpretation of other elements in the GUE header nor how the
453     message is parsed (for instance, in a data message the proto/ctype
454     field always holds an IP protocol number as an invariant).

456     The set of available flags can be extended in the future by defining
457     a "flag extensions bit" that refers to a field containing a new set
458     of flags.

460  3.3.2. Example GUE header with extension fields

462     An example GUE header for a data message encapsulating an IPv4 packet
463     and containing the Group Identifier and Security extension fields
464     (both defined in [GUEEXTEN]) is shown below:

466      0                   1                   2                   3
467      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
468     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
469     | 0 |0|    3    |       4       |1|0 0 1|           0           |
470     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
471     |                       Group Identifier                        |
472     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
473     |                                                               |
474     +                           Security                            +
475     |                                                               |
476     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

478     In the above example, the first flag bit is set, which indicates that
479     the Group Identifier extension, a 32-bit field, is present.
480     The second through fourth bits of the flags are grouped flags that
481     indicate the presence of a Security field with seven possible sizes.
482     In this example, 001 indicates a 64-bit Security field.

484  3.4. Surplus space

486     The length of a GUE header, as indicated in the GUE Hlen field, may
487     exceed the space consumed by optional extensions in a packet. The
488     space between the end of the last optional field and the end of the
489     header is termed the "surplus space".

491     Surplus space is reserved per this specification and uses may be
492     defined in future specifications. If a node receives a GUE packet
493     with non-zero length of surplus space then it MUST NOT attempt to
494     interpret the data in the surplus space. For purposes of transforms
495     across the header, such as an optional integrity check over the
496     header, the surplus space is considered to be part of the GUE header
497     and is included in the computation.

499  3.5. Message types

501     There are two message types in GUE variant 0: control messages and
502     data messages.

504  3.5.1. Control messages

506     Control messages carry formatted data that are implicitly addressed
507     to the decapsulator to monitor or control the state or behavior of a
508     tunnel (OAM). For instance, an echo request and corresponding echo
509     reply message can be defined to test for liveness.

511     Control messages are indicated in the GUE header when the C-bit is
512     set. The payload is interpreted as a control message with type
513     specified in the proto/ctype field. The format and contents of the
514     control message are indicated by the type and can be variable length.

516     Other than interpreting the proto/ctype field as a control message
517     type, the meaning and semantics of the rest of the elements in the
518     GUE header are the same as that of data messages. Forwarding and
519     routing of control messages should be the same as that of a data
520     message with the same outer IP and UDP header; this ensures that
521     control messages can be created that follow the same path through the
522     network as data messages.

524  3.5.2. Data messages

526     Data messages carry encapsulated packets that are addressed to the
527     protocol stack for the associated protocol. Data messages are a
528     primary means of encapsulation and can be used to create tunnels for
529     overlay networks.

531     Data messages are indicated in the GUE header when the C-bit is not
532     set. The payload of a data message is interpreted as an encapsulated
533     packet of an Internet protocol indicated in the proto/ctype field.
534     The encapsulated packet immediately follows the GUE header.

536  4. Variant 1

538     Variant 1 of GUE allows direct encapsulation of IPv4 and IPv6 in UDP.
539     In this variant there is no GUE header; the UDP packet carries an IP
540     packet directly. The first two bits of the UDP payload are the GUE
541     variant field and coincide with the first two bits of the version
542     number in the IP header. The first two version bits of both IPv4 and
543     IPv6 are 01, so GUE variant 1 is used for direct IP encapsulation,
544     which makes the two GUE variant bits 01 as well.

546     This technique is effectively a means to compress out the GUE variant
547     0 header when IPv4 or IPv6 packets are encapsulated and there are no
548     flags or extension fields. This method can be used on the same port
549     number as packets with the GUE header (GUE variant 0 packets). This
550     technique saves encapsulation overhead on costly links for the common
551     use of IP encapsulation, and also obviates the need to allocate a
552     separate UDP port number for IP-over-UDP encapsulation.

554  4.1. Direct encapsulation of IPv4

556     The format for encapsulating IPv4 directly in UDP is:

558      0                   1                   2                   3
559      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
560     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\
561     |          Source port          |       Destination port        | |
562     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP
563     |            Length             |           Checksum            | |
564     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/
565     |0|1|0|0|  IHL  |Type of Service|         Total Length          |
566     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
567     |        Identification         |Flags|     Fragment Offset     |
568     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
569     | Time to Live  |   Protocol    |        Header Checksum        |
570     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
571     |                      Source IPv4 Address                      |
572     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
573     |                   Destination IPv4 Address                    |
574     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

576     The UDP fields are set in a similar manner as described in section
577     3.1.

579     Note that the 0100 value in the first four bits of the UDP payload
580     expresses both the GUE variant as 1 (bits 01) and IP version as 4
581     (bits 0100).

583  4.2. Direct encapsulation of IPv6

585     The format for encapsulating IPv6 directly in UDP is shown
586     below:

588      0                   1                   2                   3
589      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
590     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\
591     |          Source port          |       Destination port        | |
592     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP
593     |            Length             |           Checksum            | |
594     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/
595     |0|1|1|0| Traffic Class |              Flow Label               |
596     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
597     |        Payload Length         |    NextHdr    |   Hop Limit   |
598     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
599     |                                                               |
600     +                                                               +
601     |                                                               |
602     +                      Source IPv6 Address                      +
603     |                                                               |
604     +                                                               +
605     |                                                               |
606     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
607     |                                                               |
608     +                                                               +
609     |                                                               |
610     +                   Destination IPv6 Address                    +
611     |                                                               |
612     +                                                               +
613     |                                                               |
614     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

616     The UDP fields are set in a similar manner as described in section
617     3.1.

619     Note that the 0110 value in the first four bits of the UDP
620     payload expresses both the GUE variant as 1 (bits 01) and IP version
621     as 6 (bits 0110).

623  5. Operation

625     The figure below illustrates the use of GUE encapsulation between two
626     hosts. Host 1 is sending packets to Host 2. An encapsulator performs
627     encapsulation of packets from Host 1. These encapsulated packets
628     traverse the network as UDP packets. At the decapsulator, packets are
629     decapsulated and sent on to Host 2. Packet flow in the reverse
630     direction need not be symmetric; for example, the reverse path might
631     not use GUE or any other form of encapsulation.

633     +---------------+                       +---------------+
634     |               |                       |               |
635     |    Host 1     |                       |    Host 2     |
636     |               |                       |               |
637     +---------------+                       +---------------+
638             |                                       ^
639             V                                       |
640     +---------------+   +---------------+   +---------------+
641     |               |   |               |   |               |
642     | Encapsulator  |-->|    Layer 3    |-->| Decapsulator  |
643     |               |   |    Network    |   |               |
644     +---------------+   +---------------+   +---------------+

646     The encapsulator and decapsulator may be co-resident with the
647     corresponding hosts, or may be on separate nodes in the network.

649  5.1. Network tunnel encapsulation

651     Network tunneling can be achieved by encapsulating layer 2 or layer 3
652     packets. In this case, the encapsulator and decapsulator nodes are
653     the tunnel endpoints. These could be routers that provide network
654     tunnels on behalf of communicating hosts.

656  5.2. Transport layer encapsulation

658     When encapsulating layer 4 packets, the encapsulator and decapsulator
659     should be co-resident with the hosts. In this case, the encapsulation
660     headers are inserted between the IP header and the transport packet.
661     The addresses in the IP header refer to both the endpoints of the
662     encapsulation and the endpoints for terminating the encapsulated
663     transport protocol. Note that the transport layer ports in the
664     encapsulated packet are independent of the UDP ports in the outer
665     packet.

667  5.3. Encapsulator operation

669     Encapsulators create GUE data messages, set the fields of the UDP
670     header, set flags and optional extension fields in the GUE header,
671     and forward packets to a decapsulator.

673     An encapsulator can be an end host originating the packets of a flow,
674     or can be a network device performing encapsulation on behalf of
675     hosts (routers implementing tunnels for instance). In either case,
676     the intended target (decapsulator) is indicated by the outer
677     destination IP address and destination port in the UDP header.
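
As an illustration of the header construction an encapsulator performs, the following sketch packs the primary four bytes of a variant 0 header as laid out in section 3.1 and appends already-encoded extension fields. It is a sketch only, not a normative procedure: the function name and its parameters are hypothetical, and the caller is assumed to supply flags and extension bytes that are consistent with each other (their encoding is defined in [GUEEXTEN], not here). Surplus space is not modeled.

```python
import struct

GUE_PORT = 6080  # GUE assigned UDP port number (section 3.1)


def build_gue_header(proto_ctype, flags=0, extensions=b"", control=False):
    """Build a GUE variant 0 header (hypothetical helper).

    proto_ctype: IP protocol number (data) or control type (control).
    flags:       16-bit flags value; undefined bits must be zero.
    extensions:  concatenated extension fields, already encoded.
    """
    if len(extensions) % 4 != 0:
        raise ValueError("GUE headers are a multiple of four bytes")
    hlen = len(extensions) // 4  # (header_len - 4) / 4
    if hlen > 31:
        raise ValueError("maximum GUE header length is 128 bytes")
    if not 0 <= proto_ctype <= 0xFF or not 0 <= flags <= 0xFFFF:
        raise ValueError("proto/ctype is 8 bits and flags are 16 bits")
    # First byte: variant (2 bits, 0) | C-bit (1 bit) | Hlen (5 bits).
    first = (0 << 6) | (int(control) << 5) | hlen
    return struct.pack("!BBH", first, proto_ctype, flags) + extensions
```

With the bit pattern from the example in section 3.3.2 (flags 1001 followed by zeros, a 4-byte Group Identifier and an 8-byte Security field), `build_gue_header(4, 0x9000, bytes(12))` yields a 16-byte header whose first four bytes are 03 04 90 00.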
679 If an encapsulator is tunneling packets, that is encapsulating 680 packets of layer 2 or layer 3 protocols (e.g. EtherIP, IPIP, ESP 681 tunnel mode), it SHOULD follow standard conventions for tunneling one 682 protocol over another. For instance, if an IP packet is being 683 encapsulated in GUE then diffserv interaction [RFC2983] and ECN 684 propagation for tunnels [RFC6040] SHOULD be followed. 686 5.4. Decapsulator operation 688 A decapsulator performs decapsulation of GUE packets. A decapsulator 689 is addressed by the outer destination IP address and UDP destination 690 port of a GUE packet. The decapsulator validates packets, including 691 fields of the GUE header. 693 If a decapsulator receives a GUE packet with an unsupported variant, 694 unknown flag, bad header length (too small for included extension 695 fields), unknown control message type, bad protocol number, an 696 unsupported payload type, or an otherwise malformed header, it MUST 697 drop the packet. Such events MAY be logged subject to configuration 698 and rate limiting of logging messages. Note that set flags in a GUE 699 header that are unknown to a decapsulator MUST NOT be ignored. If a 700 GUE packet is received by a decapsulator with unknown flags, the 701 packet MUST be dropped. 703 5.4.1. Processing a received data message 705 If a valid data message is received, the UDP header and GUE header 706 are (logically) removed from the packet. The outer IP header remains 707 intact and the next protocol in the IP header is set to the protocol 708 from the proto field in the GUE header. The resulting packet is then 709 resubmitted into the protocol stack to process the packet as though 710 it was received with the protocol indicated in the GUE header. 712 As an example, consider that a data message is received where GUE 713 encapsulates an IPv4 packet using GUE variant 0. 
In this case proto 714 field in the GUE header is set to 4 for IPv4 encapsulation: 716 +-------------------------------------+ 717 | IP header (next proto = 17,UDP) | 718 |-------------------------------------| 719 | UDP | 720 |-------------------------------------| 721 | GUE (proto = 4,IPv4 encapsulation) | 722 |-------------------------------------| 723 | IPv4 header and packet | 724 +-------------------------------------+ 726 The receiver removes the UDP and GUE headers and sets the next 727 protocol field in the IP packet to 4, which is derived from the GUE 728 proto field. The resultant packet would have the format: 730 +-------------------------------------+ 731 | IP header (next proto = 4,IPv4) | 732 |-------------------------------------| 733 | IPv4 header and packet | 734 +-------------------------------------+ 736 This packet is then resubmitted into the protocol stack to be 737 processed as an IPv4 encapsulated packet. 739 5.4.2. Processing a received control message 741 If a valid control message is received, the packet MUST be processed 742 as a control message. The specific processing to be performed depends 743 on the value in the ctype field of the GUE header. 745 5.5. Middlebox inspection 747 A middlebox MAY inspect a GUE header. A middlebox MUST NOT modify a 748 GUE header or UDP payload. 750 To inspect a GUE header, a middlebox needs to identify GUE packets. 751 The obvious method is to match the destination UDP port number to be 752 the GUE port number (i.e. 6080). Per [RFC7605], transport port 753 numbers only have meaning at the endpoints of communications, so 754 inferring the type of a UDP payload based on port number may be 755 incorrect. Middleboxes MUST NOT take any action that would have 756 harmful side effects if a UDP packet were misinterpreted as being a 757 GUE packet. 
In particular, a middlebox MUST NOT modify a UDP payload 758 based on inferring the payload type from the port number, since 759 doing so could cause silent data corruption. 761 A middlebox MAY interpret some flags and extension fields of the GUE 762 header for classification purposes, but is not required to understand 763 any of the flags or extension fields in GUE packets. A middlebox MUST 764 NOT drop a GUE packet merely because there are flags unknown to it. 765 Similarly, a middlebox MUST NOT arbitrarily filter packets based on 766 GUE flags or extension fields that are present or not present. The 767 header length in the GUE header allows a middlebox to inspect the 768 payload packet without needing to parse the flags or extension 769 fields. 771 5.6. Router and switch operation 773 Routers and switches SHOULD forward GUE packets as standard UDP/IP 774 packets. The outer five-tuple should contain sufficient information 775 to perform flow classification corresponding to the flow of the inner 776 packet. A router does not normally need to parse a GUE header, and 777 none of the flags or extension fields in the GUE header are expected 778 to affect routing. In cases where the outer five-tuple does not 779 provide sufficient entropy for flow classification, for instance when UDP 780 ports are fixed to provide connection semantics (section 5.6.1), 781 the encapsulated packet MAY be parsed to determine flow entropy. 783 A router MUST NOT modify a GUE header or payload when forwarding a 784 packet. It MAY encapsulate a GUE packet in another GUE packet, for 785 instance to implement a network tunnel (i.e., by encapsulating an IP 786 packet with a GUE payload in another IP packet as a GUE payload). In 787 this case, the router takes the role of an encapsulator, and the 788 corresponding decapsulator is the logical endpoint of the tunnel.
789 When encapsulating a GUE packet within another GUE packet, there are 790 no provisions to automatically copy flags or fields to the outer GUE 791 header. Each layer of encapsulation is considered independent. 793 5.6.1. Connection semantics 795 A middlebox might infer bidirectional connection semantics for a UDP 796 flow. For instance, a stateful firewall might create a five-tuple 797 rule to match flows on egress, and a corresponding five-tuple rule 798 for matching ingress packets where the roles of source and 799 destination are reversed for the IP addresses and UDP port numbers. 800 To operate in this environment, a GUE tunnel should be configured to 801 assume connected semantics defined by the UDP five tuple and the use 802 of GUE encapsulation needs to be symmetric between both endpoints. 803 The source port set in the UDP header MUST be the destination port 804 the peer would set for replies. In this case, the UDP source port for 805 a tunnel would be a fixed value and not set to be flow entropy. 807 The selection of whether to make the UDP source port fixed or set to 808 a flow entropy value for each packet sent SHOULD be configurable for 809 a tunnel. The default MUST be to set the flow entropy value in the 810 UDP source port. 812 5.6.2. NAT 814 IP address and port translation can be performed on the UDP/IP 815 headers adhering to the requirements for NAT (Network Address 816 Translation) with UDP [RFC4787]. In the case of stateful NAT, 817 connection semantics MUST be applied to a GUE tunnel as described in 818 section 5.6.1. GUE endpoints MAY also invoke STUN [RFC5389] or ICE 819 [RFC5245] to manage NAT port mappings for encapsulations. 821 5.7. MTU and fragmentation 823 Standard conventions for handling of MTU (Maximum Transmission Unit) 824 and fragmentation in conjunction with networking tunnels 825 (encapsulation of layer 2 or layer 3 packets) SHOULD be followed. 
826 Details are described in MTU and Fragmentation Issues with In-the- 827 Network Tunneling [RFC4459]. 829 If a packet is fragmented before encapsulation in GUE, all the 830 related fragments MUST be encapsulated using the same UDP source 831 port. An operator SHOULD set MTU to account for encapsulation 832 overhead and reduce the likelihood of fragmentation. 834 Alternative to IP fragmentation, the GUE fragmentation extension can 835 be used. GUE fragmentation is described in [GUEEXTEN]. 837 5.8. UDP Checksum Handling 839 5.8.1. UDP Checksum with IPv4 841 For UDP in IPv4, when a non-zero UDP checksum is used, the UDP 842 checksum MUST be processed as specified in [RFC0768] and [RFC1122] 843 for both transmit and receive. The IPv4 header includes a checksum 844 that protects against misdelivery of the packet due to corruption of 845 IP addresses. The UDP checksum potentially provides protection 846 against corruption of the UDP header, GUE header, and GUE payload. 847 Disabling the use of checksums is a deployment consideration that 848 should take into account the risk and effects of packet corruption. 850 When a decapsulator receives a packet, the UDP checksum field MUST be 851 processed. If the UDP checksum is non-zero, the decapsulator MUST 852 verify the checksum before accepting the packet. By default, a 853 decapsulator SHOULD accept UDP packets with a zero checksum. A node 854 MAY be configured to disallow zero checksums per [RFC1122]; this may 855 be done selectively, for instance by disallowing zero checksums from 856 certain hosts that are known to be sending over paths subject to 857 packet corruption. If verification of a non-zero checksum fails, a 858 decapsulator lacks the capability to verify a non-zero checksum, or a 859 packet with a zero checksum was received and the decapsulator is 860 configured to disallow, the packet MUST be dropped and an event MAY 861 be logged. 863 5.8.2. 
UDP Checksum with IPv6 865 For UDP in IPv6, the UDP checksum MUST be processed as specified in 866 [RFC0768] and [RFC2460] for both transmit and receive. 868 When UDP is used over IPv6, the UDP checksum is relied upon to 869 protect both the IPv6 and UDP headers from corruption. As such, by 870 default a GUE encapsulator MUST use UDP checksums. 872 [GUEEXTEN] specifies a GUE checksum option that includes a pseudo 873 header containing the IP addresses. An encapsulator MAY use zero-UDP 874 checksums if it uses the GUE checksum. A non-zero UDP checksum and 875 the GUE checksum SHOULD NOT be used simultaneously in a packet since 876 that would be redundant. 878 When deployed in a TMCE, a GUE encapsulator MAY be configured to use 879 UDP zero-checksum mode and no GUE checksum if the traffic-managed 880 controlled environment or a set of closely cooperating traffic- 881 managed controlled environments (such as by network operators who 882 have agreed to work together in order to jointly provide specific 883 services) meet at least one of the following conditions: 885 a. It is known (perhaps through knowledge of equipment types and 886 lower-layer checks) that packet corruption is exceptionally 887 unlikely and where the operator is willing to take the risk of 888 undetected packet corruption. 890 b. It is judged through observational measurements (perhaps of 891 historic or current traffic flows that use a non-zero checksum) 892 that the level of packet corruption is tolerably low and where 893 the operator is willing to take the risk of undetected packet 894 corruption. 896 c. Carrying applications that are tolerant of misdelivered or 897 corrupted packets (perhaps through higher-layer checksum, 898 validation, and retransmission or transmission redundancy) 899 where the operator is willing to rely on the applications using 900 GUE to survive any corrupt packets. 
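As an informal illustration, the IPv6 default and the exceptions described above can be summarized as a small decision function. This is a sketch only: the names ChecksumMode and select_ipv6_checksum_mode and their parameters are hypothetical and are not defined by GUE. Sending a zero UDP checksum whenever the GUE checksum is enabled is one permitted policy (the text says MAY), not a requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecksumMode:
    """Hypothetical result type: which checksums the encapsulator emits."""
    udp_checksum: bool
    gue_checksum: bool


def select_ipv6_checksum_mode(use_gue_checksum: bool = False,
                              in_tmce: bool = False,
                              zero_checksum_configured: bool = False,
                              tmce_condition_met: bool = False) -> ChecksumMode:
    """Pick a checksum mode for GUE over IPv6 (illustrative policy only).

    tmce_condition_met stands for "at least one of conditions a-c holds";
    how an operator establishes that is outside this sketch.
    """
    if use_gue_checksum:
        # The GUE checksum covers a pseudo header with the IP addresses,
        # so a non-zero UDP checksum would be redundant.
        return ChecksumMode(udp_checksum=False, gue_checksum=True)
    if in_tmce and zero_checksum_configured and tmce_condition_met:
        # Explicitly configured zero-checksum mode inside a TMCE.
        return ChecksumMode(udp_checksum=False, gue_checksum=False)
    # Default: the UDP checksum protects both the IPv6 and UDP headers.
    return ChecksumMode(udp_checksum=True, gue_checksum=False)
```

With no arguments the function returns the default (UDP checksum on), so zero-checksum operation has to be requested explicitly.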
902 The following requirements apply to encapsulators deployed in a TMCE 903 that use UDP zero-checksum mode: 905 a. Use of the UDP checksum with IPv6 MUST be the default 906 configuration for all communications. 908 b. The GUE implementation MUST comply with all requirements 909 specified in Section 4 of [RFC6936] and with requirement 1 910 specified in Section 5 of [RFC6936]. 912 c. A decapsulator SHOULD only allow the use of UDP zero-checksum 913 mode for IPv6 on a single received UDP Destination Port, 914 regardless of the encapsulator. The motivation for this 915 requirement is possible corruption of the UDP Destination Port, 916 which may cause packet delivery to the wrong UDP port. If that 917 other UDP port requires the UDP checksum, the misdelivered 918 packet will be discarded. 920 d. It is RECOMMENDED that the UDP zero-checksum mode for IPv6 be 921 enabled only for certain selected source addresses. The 922 decapsulator MUST check that the source and destination IPv6 923 addresses in a received packet are permitted by configuration 924 to use UDP zero-checksum mode and discard any packet for which 925 this check fails. 927 e. The tunnel encapsulator SHOULD use different IPv6 addresses for 928 each GUE communication (tunnel or transport flow) that uses UDP 929 zero-checksum mode, regardless of the decapsulator, in order to 930 strengthen the decapsulator's check of the IPv6 source address 931 (i.e., the same IPv6 source address SHOULD NOT be used with 932 more than one IPv6 destination address, independent of whether 933 that destination address is a unicast or multicast address). 934 When this is not possible, it is RECOMMENDED to use each source 935 IPv6 address for as few GUE communications that use UDP zero- 936 checksum mode as is feasible. 938 f.
When any middlebox exists on the path of GUE communication, it 939 is RECOMMENDED to use the default mode, i.e., use UDP checksum, 940 to reduce the chance that the encapsulated packets will be 941 dropped. 943 g. Any middlebox that allows the UDP zero-checksum mode for IPv6 944 MUST comply with requirements 1 and 8-10 in Section 5 of 945 [RFC6936]. 947 h. Measures SHOULD be taken to prevent IPv6 traffic with zero UDP 948 checksums from "escaping" to the general Internet; see Section 949 5.9 for examples of such measures. 951 i. IPv6 traffic with zero UDP checksums MUST be actively monitored 952 for errors by the network operator. For example, the operator 953 may monitor Ethernet-layer packet error rates. 955 j. If a packet with a non-zero checksum is received, the checksum 956 MUST be verified before accepting the packet. This is 957 regardless of whether the tunnel encapsulator and decapsulator 958 have been configured with UDP zero-checksum mode. 960 The above requirements do not change either the requirements 961 specified in [RFC8200] as modified by [RFC6935] or the requirements 962 specified in [RFC6936]. 964 The requirement to check the source IPv6 address in addition to the 965 destination IPv6 address and the strong recommendation against reuse 966 of source IPv6 addresses among GUE communications collectively 967 provide some mitigation for the absence of UDP checksum coverage of 968 the IPv6 header. A traffic-managed controlled environment that 969 satisfies at least one of three conditions listed at the beginning of 970 this section provides additional assurance. 972 GUE packets are suitable for transmission over lower layers in the 973 traffic-managed controlled environments that are allowed by the 974 exceptions stated above, and the rate of corruption of the inner IP 975 packet on such networks is not expected to increase by comparison to 976 traffic that is not encapsulated in UDP. 
For these reasons, when UDP 977 zero-checksum mode is used with IPv6, GUE does not provide an 978 additional integrity check unless the GUE checksum [GUEEXTEN] is used; 979 this design is in accordance with requirements 2, 3, and 5 specified 980 in Section 5 of [RFC6936]. 982 Generic UDP Encapsulation does not accumulate incorrect transport- 983 layer state as a consequence of GUE header corruption. A corrupt GUE 984 packet may result in either packet discard or packet forwarding 985 without accumulation of GUE state. Active monitoring of GUE traffic 986 for errors is REQUIRED, as the occurrence of errors will result in 987 some accumulation of error information outside the protocol for 988 operational and management purposes. This design is in accordance 989 with requirement 4 specified in Section 5 of [RFC6936]. 991 The remaining requirements specified in Section 5 of [RFC6936] are 992 not applicable to GUE. Requirements 6 and 7 do not apply because GUE 993 does not include a control feedback mechanism. Requirements 8-10 are 994 middlebox requirements that do not apply to GUE tunnel endpoints. 995 (See Section 5.5 for further middlebox discussion.) 997 In summary, a TMCE GUE tunnel is allowed to use UDP zero-checksum 998 mode for IPv6 when the conditions and requirements stated above are 999 met. Otherwise, the UDP checksum needs to be used for IPv6 as 1000 specified in [RFC0768] and [RFC8200]. Use of the GUE checksum is 1001 RECOMMENDED when the UDP checksum is not used. 1003 5.9. Congestion Considerations 1005 This section describes congestion considerations for GUE tunnels 1006 (Layer 2 and Layer 3 encapsulation) and transport layer encapsulation 1007 (Layer 4 protocol over GUE). 1009 5.9.1.
GUE tunnels 1011 Section 3.1.9 of [RFC8085] discusses the congestion considerations 1012 for design and use of UDP tunnels; this is important because other 1013 flows could share the path with one or more UDP tunnels, 1014 necessitating congestion control [RFC2914] to avoid destructive 1015 interference. 1017 Congestion has potential impacts both on the rest of the network 1018 containing a UDP tunnel and on the traffic flows using the UDP 1019 tunnels. These impacts depend upon what sort of traffic is carried 1020 over the tunnel, as well as the path of the tunnel. The GUE protocol 1021 does not provide any congestion control and GUE UDP packets are 1022 regular UDP packets. Therefore, a GUE tunnel MUST NOT be deployed to 1023 carry non-congestion-controlled traffic over the Internet [RFC8085]. 1025 Within a TMCE network, GUE tunnels are appropriate for carrying 1026 traffic that is not known to be congestion controlled. For example, a 1027 GUE tunnel may be used to carry Multiprotocol Label Switching (MPLS) 1028 traffic such as pseudowires or VPNs where specific bandwidth 1029 guarantees are provided to each pseudowire or VPN. In such cases, 1030 operators of TMCE networks avoid congestion by careful provisioning 1031 of their networks, rate-limiting of user data traffic, and traffic 1032 engineering according to path capacity. 1034 When a GUE tunnel carries traffic that is not known to be congestion 1035 controlled in a TMCE network, the tunnel MUST be deployed entirely 1036 within that network, and measures SHOULD be taken to prevent the GUE 1037 traffic from "escaping" the network to the general Internet. 
Examples 1038 of such measures are: 1040 o physical or logical isolation of the links carrying GUE from the 1041 general Internet, 1043 o deployment of packet filters that block the UDP ports assigned 1044 for GUE, and 1046 o imposition of restrictions on GUE traffic by software tools used 1047 to set up GUE tunnels between specific end systems (as might be 1048 used within a single data center) or by tunnel ingress nodes for 1049 tunnels that don't terminate at end systems. 1051 5.9.2 Transport layer encapsulation 1053 If GUE encapsulates a transport layer protocol, such as TCP, it is 1054 expected that the transport layer or application layer properly 1055 implements congestion control or avoidance. In the case that UDP is 1056 encapsulated, the application is expected to provide congestion 1057 control as specified in [RFC8085]. 1059 5.10. Multicast 1061 GUE packets can be multicast to decapsulators using a multicast 1062 destination address in the outer IP header. Each receiving host will 1063 decapsulate the packet independently following normal decapsulator 1064 operations. The receiving decapsulators need to agree on the same set 1065 of GUE parameters and properties; how such an agreement is reached is 1066 outside the scope of this document. 1068 GUE allows encapsulation of unicast, broadcast, or multicast traffic. 1069 Flow entropy (the value in the UDP source port) can be generated from 1070 the header of encapsulated unicast or broadcast/multicast packets at 1071 an encapsulator. The mapping mechanism between the encapsulated 1072 multicast traffic and the multicast capability in the IP network is 1073 transparent and independent of the encapsulation and is otherwise 1074 outside the scope of this document. 1076 5.11. Flow entropy for ECMP 1078 A major objective of using GUE is that a network device can perform 1079 flow classification corresponding to the flow of the inner 1080 encapsulated packet based on the contents of the outer headers. 
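As a rough sketch of how an encapsulator might produce such an outer header value, the Python fragment below hashes an inner five-tuple with a random seed and maps the result into the ephemeral port range, consistent with the properties listed in section 5.11.2. The function name, the use of keyed BLAKE2b, and the string encoding of the tuple are assumptions chosen for illustration; GUE does not mandate any particular hash.

```python
import hashlib
import os

EPHEMERAL_BASE = 0xC000  # 49152, start of the ephemeral port range
ENTROPY_MASK = 0x3FFF    # low fourteen bits carry the flow entropy

# Random seed, chosen once per process here; a real encapsulator
# would also be expected to change it periodically.
SEED = os.urandom(16)


def flow_entropy_port(src_ip: str, dst_ip: str, proto: int,
                      src_port: int, dst_port: int,
                      seed: bytes = SEED) -> int:
    """Map an inner five-tuple to a UDP source port in 49152..65535."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    # Keyed hash so that outsiders cannot predict the port for a flow.
    digest = hashlib.blake2b(key, digest_size=2, key=seed).digest()
    entropy = int.from_bytes(digest, "big") & ENTROPY_MASK
    # Setting the two high-order bits places the value in the range.
    return EPHEMERAL_BASE | entropy
```

The tuple is hashed exactly as given, so the two directions of a flow will generally map to different values. Receivers are expected to treat the result as opaque in any case.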
1082 5.11.1. Flow classification 1084 When a packet is encapsulated with GUE and connection semantics are 1085 not applied, the source port in the outer UDP packet is set to a flow 1086 entropy value that corresponds to the flow of the inner packet. When 1087 a device computes a five-tuple hash on the outer UDP/IP header of a 1088 GUE packet, the resultant value classifies the packet per its inner 1089 flow. 1091 Examples of deriving flow entropy for encapsulation are: 1093 o If the encapsulated packet is a layer 4 packet, TCP/IPv4 for 1094 instance, the flow entropy could be based on the canonical five- 1095 tuple hash of the inner packet. 1097 o If the encapsulated packet is an AH transport mode packet with 1098 TCP as next header, the flow entropy could be a hash over a 1099 three-tuple: TCP protocol and TCP ports of the encapsulated 1100 packet. 1102 o If a node is encrypting a packet using ESP tunnel mode and GUE 1103 encapsulation, the flow entropy could be based on the contents 1104 of the clear-text packet. For instance, a canonical five-tuple 1105 hash for a TCP/IP packet could be used. 1107 [RFC6438] discusses methods to compute and set flow entropy value for 1108 IPv6 flow labels, such methods can also be used to create flow 1109 entropy values for GUE. 1111 5.11.2. Flow entropy properties 1113 The flow entropy is the value set in the UDP source port of a GUE 1114 packet. Flow entropy in the UDP source port SHOULD adhere to the 1115 following properties: 1117 o The value set in the source port is within the ephemeral port 1118 range (49152 to 65535 [RFC6335]). Since the high order two bits 1119 of the port are set to one, this provides fourteen bits of 1120 entropy for the value. 1122 o The flow entropy has a uniform distribution across encapsulated 1123 flows. 1125 o An encapsulator MAY occasionally change the flow entropy used 1126 for an inner flow per its discretion (for security, route 1127 selection, etc). 
To avoid thrashing or flapping the value, the 1128 flow entropy used for a flow SHOULD NOT change more than once 1129 every thirty seconds (or a configurable interval). 1131 o Decapsulators, or any networking devices, SHOULD NOT attempt to 1132 interpret flow entropy as anything more than an opaque value. 1133 Neither should they attempt to reproduce the hash calculation 1134 used by an encapsulator in creating a flow entropy value. They 1135 MAY use the value to match subsequently received packets for steering 1136 decisions, but MUST NOT assume that the hash uniquely or 1137 permanently identifies a flow. 1139 o Input to the flow entropy calculation is not restricted to ports 1140 and addresses; input could include the flow label from an IPv6 1141 packet, the SPI from an ESP packet, or other flow-related state in 1142 the encapsulator that is not necessarily conveyed in the packet. 1144 o The assignment function for flow entropy SHOULD be randomly 1145 seeded to mitigate denial of service attacks. The seed SHOULD be 1146 changed periodically. 1148 5.12. Negotiation of acceptable flags and extension fields 1150 An encapsulator and decapsulator need to achieve agreement about GUE 1151 parameters that will be used in communications. Parameters include 1152 supported GUE variants, flags and extension fields that can be used, 1153 security algorithms and keys, supported protocols and control 1154 messages, etc. This document outlines several general methods to 1155 accomplish this; however, the details of implementing them are 1156 considered out of scope. 1158 General methods for this are: 1160 o Configuration. The parameters used for a tunnel are configured 1161 at each endpoint. 1163 o Negotiation. A tunnel negotiation can be performed. This could 1164 be accomplished in-band of GUE using control messages. 1166 o Via a control plane.
Parameters for communicating with a tunnel 1167 endpoint can be set in a control plane protocol (such as that 1168 needed for network virtualization). 1170 o Via security negotiation. Use of security typically implies a 1171 key exchange between endpoints. Other GUE parameters may be 1172 conveyed as part of that process. 1174 6. Motivation for GUE 1176 This section provides the motivation for GUE with respect to other 1177 encapsulation methods. 1179 6.1. Benefits of GUE 1181 * GUE is a generic encapsulation protocol. GUE can encapsulate 1182 protocols that are represented by an IP protocol number. This 1183 includes layer 2, layer 3, and layer 4 protocols. 1185 * GUE is an extensible encapsulation protocol. Standardized 1186 optional data such as security, virtual networking identifiers, 1187 fragmentation are defined. 1189 * For extensibility, GUE uses flag fields as opposed to TLVs as 1190 some other encapsulation protocols do. Flag fields are strictly 1191 ordered, allow random access, and are efficient in use of header 1192 space. 1194 * GUE allows sending of control messages such as OAM using the 1195 same GUE header format (for routing purposes) as normal data 1196 messages. 1198 * GUE maximizes deliverability of non-UDP and non-TCP protocols. 1200 * GUE provides a means for exposing per flow entropy for ECMP for 1201 IP atypical protocols such as SCTP, DCCP, ESP, etc. 1203 6.2. Comparison of GUE to other encapsulations 1205 A number of different encapsulation techniques have been proposed for 1206 the encapsulation of one protocol over another. EtherIP [RFC3378] 1207 provides layer 2 tunneling of Ethernet frames over IP. GRE [RFC2784], 1208 MPLS [RFC4023], and L2TP [RFC2661] provide methods for tunneling 1209 layer 2 and layer 3 packets over IP. NVGRE [RFC7637] and VXLAN 1210 [RFC7348] are proposals for encapsulation of layer 2 packets for 1211 network virtualization. 
IPIP [RFC2003] and Generic packet tunneling 1212 in IPv6 [RFC2473] provide methods for tunneling IP packets over IP. 1214 Several proposals exist for encapsulating packets over UDP including 1215 ESP over UDP [RFC3948], TCP directly over UDP [TCPUDP], VXLAN 1216 [RFC7348], LISP [RFC6830] which encapsulates layer 3 packets, 1217 MPLS/UDP [RFC7510], GENEVE [GENEVE], and GRE-in-UDP Encapsulation 1218 [RFC8086]. 1220 GUE has the following discriminating features: 1222 o UDP encapsulation leverages specialized network device 1223 processing for efficient transport. The semantics for using the 1224 UDP source port for flow entropy as input to ECMP are defined in 1225 section 5.11. 1227 o GUE permits encapsulation of arbitrary IP protocols, which 1228 includes layer 2, 3, and 4 protocols. 1230 o Multiple protocols can be multiplexed over a single UDP port 1231 number. This is in contrast to techniques to encapsulate 1232 protocols over UDP using a protocol specific port number (such 1233 as ESP/UDP, GRE/UDP, SCTP/UDP). GUE provides a uniform and 1234 extensible mechanism for encapsulating all IP protocols in UDP 1235 with minimal overhead (four bytes of additional header). 1237 o GUE is extensible. New flags and extension fields can be 1238 defined. 1240 o The GUE header includes a header length field. This allows a 1241 network node to inspect an encapsulated packet without needing 1242 to parse the full encapsulation header. 1244 o GUE includes both data messages (encapsulation of packets) and 1245 control messages (such as OAM). 1247 o The flags-field model facilitates efficient implementation of 1248 extensibility in hardware. For instance, a TCAM can be used to 1249 parse a known set of N flags where the number of entries in the 1250 TCAM is 2^N. By comparison, the number of TCAM entries needed to 1251 parse a set of N arbitrarily ordered TLVs is approximately e*N!. 1253 o GUE includes a variant that encapsulates IPv4 and IPv6 packets 1254 directly within UDP. 
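To illustrate the last point, the following sketch shows how a receiver might tell the variants apart from the first byte of the UDP payload. It relies on the bit patterns described for direct IP encapsulation: the variant bits 01 overlap the IP version nibble, so 0100 is IPv4 and 0110 is IPv6. The function name and return values are hypothetical; unsupported or malformed values raise an error so that the caller can drop the packet, as section 5.4 requires of decapsulators.

```python
from typing import Optional, Tuple


def classify_gue_payload(udp_payload: bytes) -> Tuple[int, Optional[int]]:
    """Return (variant, ip_version) for the start of a GUE UDP payload.

    ip_version is None for variant 0, where a GUE header follows and the
    payload type is carried in that header instead.
    """
    if not udp_payload:
        raise ValueError("empty UDP payload")
    first = udp_payload[0]
    variant = first >> 6  # the two high-order bits select the variant
    if variant == 0:
        return (0, None)
    if variant == 1:
        ip_version = first >> 4  # same bits read as the IP version nibble
        if ip_version in (4, 6):
            return (1, ip_version)
        raise ValueError("variant 1 payload is neither IPv4 nor IPv6")
    raise ValueError(f"unsupported GUE variant {variant}")
```

For example, a payload starting with byte 0x60 is classified as variant 1 carrying IPv6, and one starting with 0x45 as variant 1 carrying IPv4.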
1256 7. Security Considerations 1258 There are two important security considerations with respect to 1259 GUE: 1261 o Authentication and integrity of the GUE header. 1263 o Authentication, integrity, and confidentiality of the GUE 1264 payload. 1266 GUE security is provided by extensions for security defined in 1267 [GUEEXTEN]. These extensions include methods to authenticate the GUE 1268 header and encrypt the GUE payload. 1270 The GUE header can be authenticated using a security extension for an 1271 HMAC (Hash-based Message Authentication Code). Securing the GUE payload 1272 can be accomplished by use of the GUE Payload Transform extension. 1273 This extension allows the use of DTLS (Datagram Transport Layer 1274 Security) to encrypt and authenticate the GUE payload. 1276 A hash function for computing flow entropy (section 5.11) SHOULD be 1277 randomly seeded to mitigate some possible denial of service attacks. 1279 8. IANA Considerations 1281 8.1. UDP source port 1283 A user UDP port number has been assigned for GUE: 1285 Service Name: gue 1286 Transport Protocol(s): UDP 1287 Assignee: Tom Herbert 1288 Contact: Tom Herbert 1289 Description: Generic UDP Encapsulation 1290 Reference: draft-herbert-gue 1291 Port Number: 6080 1292 Service Code: N/A 1293 Known Unauthorized Uses: N/A 1294 Assignment Notes: N/A 1296 8.2. GUE variant number 1298 IANA is requested to set up a registry for the GUE variant number. 1299 The GUE variant number is a two-bit field with four possible values. 1300 This document defines variants 0 and 1. New values are assigned in 1301 accordance with RFC Required policy [RFC5226].
1303 +----------------+----------------+---------------+ 1304 | Variant number | Description | Reference | 1305 +----------------+----------------+---------------+ 1306 | 0 | GUE Version 0 | This document | 1307 | | with header | | 1308 | | | | 1309 | 1 | GUE Version 0 | This document | 1310 | | with direct IP | | 1311 | | encapsulation | | 1312 | | | | 1313 | 2..3 | Unassigned | | 1314 +----------------+----------------+---------------+ 1316 8.3. Control types 1318 IANA is requested to set up a registry for the GUE control types. 1319 Control types are 8 bit values. New values for control types 1-127 1320 are assigned in accordance with RFC Required policy [RFC5226]. 1322 +----------------+------------------+---------------+ 1323 | Control type | Description | Reference | 1324 +----------------+------------------+---------------+ 1325 | 0 | Control payload | This document | 1326 | | needs more | | 1327 | | context for | | 1328 | | interpretation | | 1329 | | | | 1330 | 1..127 | Unassigned | | 1331 | | | | 1332 | 128..255 | Experimental | This document | 1333 +----------------+------------------+---------------+ 1335 9. Acknowledgements 1337 The authors would like to thank David Liu, Erik Nordmark, Fred 1338 Templin, Adrian Farrel, Bob Briscoe, Murray Kucherawy, Mirja 1339 Kuhlewind, and David Black for valuable input on this draft. Special 1340 thanks to Fred Templin who is serving as document shepherd. 1342 10. References 1344 10.1. Normative References 1346 [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, DOI 1347 10.17487/RFC0768, August 1980, . 1350 [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage 1351 Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, 1352 March 2017, . 1354 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1355 Requirement Levels", BCP 14, RFC 2119, DOI 1356 10.17487/RFC2119, March 1997, . 

1359   [RFC2983] Black, D., "Differentiated Services and Tunnels", RFC
1360              2983, DOI 10.17487/RFC2983, October 2000, .

1363   [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion
1364              Notification", RFC 6040, DOI 10.17487/RFC6040, November
1365              2010, .

1367   [RFC6935] Eubanks, M., Chimento, P., and M. Westerlund, "IPv6 and
1368              UDP Checksums for Tunneled Packets", RFC 6935, DOI
1369              10.17487/RFC6935, April 2013, .

1372   [RFC6936] Fairhurst, G. and M. Westerlund, "Applicability Statement
1373              for the Use of IPv6 UDP Datagrams with Zero Checksums",
1374              RFC 6936, DOI 10.17487/RFC6936, April 2013,
1375              .

1377   [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
1378              Communication Layers", STD 3, RFC 1122, DOI
1379              10.17487/RFC1122, October 1989, .

1382   [RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the-
1383              Network Tunneling", RFC 4459, DOI 10.17487/RFC4459, April
1384              2006, .

1386   [RFC6335] Cotton, M., Eggert, L., Touch, J., Westerlund, M., and S.
1387              Cheshire, "Internet Assigned Numbers Authority (IANA)
1388              Procedures for the Management of the Service Name and
1389              Transport Protocol Port Number Registry", BCP 165, RFC
1390              6335, DOI 10.17487/RFC6335, August 2011, .

1393   [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an
1394              IANA Considerations Section in RFCs", RFC 5226, DOI
1395              10.17487/RFC5226, May 2008, .

1398 10.2. Informative References

1400   [RFC8086] Yong, L., Ed., Crabbe, E., Xu, X., and T. Herbert, "GRE-
1401              in-UDP Encapsulation", RFC 8086, DOI 10.17487/RFC8086,
1402              March 2017, .

1404   [RFC7605] Touch, J., "Recommendations on Using Assigned Transport
1405              Port Numbers", BCP 165, RFC 7605, DOI 10.17487/RFC7605,
1406              August 2015, .

1408   [RFC4787] Audet, F., Ed., and C. Jennings, "Network Address
1409              Translation (NAT) Behavioral Requirements for Unicast
1410              UDP", BCP 127, RFC 4787, DOI 10.17487/RFC4787, January
1411              2007, .

1413   [RFC5389] Rosenberg, J., Mahy, R., Matthews, P., and D. Wing,
1414              "Session Traversal Utilities for NAT (STUN)", RFC 5389,
1415              DOI 10.17487/RFC5389, October 2008, .

1418   [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment
1419              (ICE): A Protocol for Network Address Translator (NAT)
1420              Traversal for Offer/Answer Protocols", RFC 5245, DOI
1421              10.17487/RFC5245, April 2010, .

1424   [RFC8084] Fairhurst, G., "Network Transport Circuit Breakers", BCP
1425              208, RFC 8084, DOI 10.17487/RFC8084, March 2017,
1426              .

1428   [RFC6438] Carpenter, B. and S. Amante, "Using the IPv6 Flow Label
1429              for Equal Cost Multipath Routing and Link Aggregation in
1430              Tunnels", RFC 6438, DOI 10.17487/RFC6438, November 2011,
1431              .

1433   [RFC3378] Housley, R. and S. Hollenbeck, "EtherIP: Tunneling
1434              Ethernet Frames in IP Datagrams", RFC 3378, DOI
1435              10.17487/RFC3378, September 2002, .

1438   [RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
1439              Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
1440              DOI 10.17487/RFC2784, March 2000, .

1443   [RFC4023] Worster, T., Rekhter, Y., and E. Rosen, Ed.,
1444              "Encapsulating MPLS in IP or Generic Routing Encapsulation
1445              (GRE)", RFC 4023, DOI 10.17487/RFC4023, March 2005,
1446              .

1448   [RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn,
1449              G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"",
1450              RFC 2661, DOI 10.17487/RFC2661, August 1999,
1451              .

1453   [RFC7637] Garg, P., Ed., and Y. Wang, Ed., "NVGRE: Network
1454              Virtualization Using Generic Routing Encapsulation", RFC
1455              7637, DOI 10.17487/RFC7637, September 2015,
1456              .

1458   [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
1459              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
1460              eXtensible Local Area Network (VXLAN): A Framework for
1461              Overlaying Virtualized Layer 2 Networks over Layer 3
1462              Networks", RFC 7348, August 2014, .

1465   [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, DOI
1466              10.17487/RFC2003, October 1996, .

1469   [RFC2473] Conta, A. and S. Deering, "Generic Packet Tunneling in
1470              IPv6 Specification", RFC 2473, DOI 10.17487/RFC2473,
1471              December 1998, .

1473   [RFC3948] Huttunen, A., Swander, B., Volpe, V., DiBurro, L., and M.
1474              Stenberg, "UDP Encapsulation of IPsec ESP Packets", RFC
1475              3948, DOI 10.17487/RFC3948, January 2005, .

1478   [RFC6830] Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The
1479              Locator/ID Separation Protocol (LISP)", RFC 6830, DOI
1480              10.17487/RFC6830, January 2013, .

1483   [RFC7510] Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black,
1484              "Encapsulating MPLS in UDP", RFC 7510, DOI
1485              10.17487/RFC7510, April 2015, .

1488   [GUEEXTEN] Herbert, T., Yong, L., and Templin, F., "Extensions for
1489              Generic UDP Encapsulation", draft-ietf-intarea-gue-
1490              extensions-06

1492   [IPTUN] Touch, J. and Townsley, M., "IP Tunnels in the Internet
1493              Architecture", draft-ietf-intarea-tunnels-10

1495   [IANA-PN] IANA, "Protocol Numbers",
1496              .

1498   [TCPUDP] Cheshire, S., Graessley, J., and McGuire, R.,
1499              "Encapsulation of TCP and other Transport Protocols over
1500              UDP", draft-cheshire-tcp-over-udp-00

1502   [GENEVE] Gross, J., Ed., Ganga, I., Ed., and Sridhar, T., "Geneve:
1503              Generic Network Virtualization Encapsulation", draft-ietf-
1504              nvo3-geneve-10

1506   [UDPENCAP] Herbert, T., "UDP Encapsulation in Linux",
1507

1510   [MULTIQ] Herbert, T. and de Bruijn, W., "Scaling in the Linux
1511              Networking Stack",

1514   [CSUMOFF] Cree, E., "Checksum Offloads in the Linux Networking
1515              Stack",

1518   [SEGOFF] Duyck, A., "Segmentation Offloads in the Linux Networking
1519              Stack",

1522 Appendix A: NIC processing for GUE

1524   This appendix is informational and does not constitute a normative
1525   part of this document.

1527   This appendix provides some guidelines for Network Interface Cards
1528   (NICs) to implement common offloads and accelerations to support GUE.
1529   Note that most of this discussion is generally applicable to other
1530   methods of UDP-based encapsulation.
       An overview of UDP-based
1531   encapsulation and acceleration is in [UDPENCAP].

1533 A.1. Receive multi-queue

1535   Contemporary NICs support multiple receive descriptor queues (multi-
1536   queue) [MULTIQ]. Multi-queue enables load balancing of network
1537   processing for a NIC across multiple CPUs. On packet reception, a NIC
1538   selects an appropriate queue for host processing. Receive Side
1539   Scaling (RSS) is a common method which uses the flow hash for a
1540   packet to index an indirection table where each entry stores a queue
1541   number. Flow Director and Accelerated Receive Flow Steering (aRFS)
1542   allow a host to program the queue that is used for a given flow which
1543   is identified either by an explicit five-tuple or by the flow's hash.

1545   GUE encapsulation is compatible with multi-queue NICs that support
1546   five-tuple hash calculation for UDP/IP packets as input to RSS. The
1547   flow entropy in the UDP source port ensures classification of the
1548   encapsulated flow even in the case that the outer source and
1549   destination addresses are the same for all flows (e.g., all flows are
1550   going over a single tunnel).

1552   By default, UDP RSS support is often disabled in NICs to avoid out-
1553   of-order reception that can occur when UDP packets are fragmented. As
1554   discussed in Section 5.7, fragmentation of GUE packets is mostly
1555   avoided by fragmenting packets before entering a tunnel, GUE
1556   fragmentation, path MTU discovery in higher layer protocols, or
1557   operators adjusting MTUs. Other UDP traffic might not implement such
1558   procedures to avoid fragmentation, so enabling UDP RSS support in the
1559   NIC is a tradeoff to be considered during configuration.

1561 A.2. Checksum offload

1563   Many NICs provide capabilities to calculate the standard ones'
1564   complement checksum for packets in transmit or receive [CSUMOFF].
1565   When using GUE encapsulation, there are at least two checksums that
1566   are of interest: the encapsulated packet's transport checksum, and
1567   the UDP checksum in the outer header.

1569 A.2.1. Transmit checksum offload

1571   NICs can provide a protocol-agnostic method to offload the transmit
1572   checksum (NETIF_F_HW_CSUM in Linux parlance) that can be used with
1573   GUE. In this method, the host provides checksum-related parameters in
1574   a transmit descriptor for a packet. These parameters include the
1575   starting offset of data to checksum, the length of data to checksum,
1576   and the offset in the packet where the computed checksum is to be
1577   written. The host initializes the checksum field to a pseudo header
1578   checksum.

1580   In the case of GUE, the checksum for an encapsulated transport layer
1581   packet, a TCP packet for instance, can be offloaded by setting the
1582   appropriate checksum parameters.

1584   NICs typically can offload only one transmit checksum per packet, so
1585   simultaneously offloading both an inner transport packet's checksum
1586   and the outer UDP checksum is likely not possible.

1588   If an encapsulator is co-resident with a host, then checksum offload
1589   may be performed using remote checksum offload (RCO) [GUEEXTEN].
1590   Remote checksum offload relies on NIC offload of the simple UDP/IP
1591   checksum which is commonly supported even in legacy devices. In
1592   remote checksum offload, the outer UDP checksum is set and the GUE
1593   header includes an option indicating the start and offset of the
1594   inner "offloaded" checksum. The inner checksum is initialized to the
1595   pseudo header checksum. When a decapsulator receives a GUE packet
1596   with the remote checksum offload option, it completes the offload
1597   operation by determining the packet checksum from the indicated start
1598   point to the end of the packet, and then adds this into the checksum
1599   field at the offset given in the option.
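
The completion step performed by the decapsulator can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure of any particular implementation: the function names `ones_complement_sum` and `complete_remote_checksum` are hypothetical, `start` and `offset` are treated as byte offsets from the beginning of the buffer handed to the function (the actual option encoding is defined in [GUEEXTEN] and is not reproduced here), and the checksum field is assumed to already hold the uncomplemented pseudo header sum written by the sender.

```python
def ones_complement_sum(data: bytes, initial: int = 0) -> int:
    """Return the folded 16-bit ones' complement sum of data (not inverted)."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length input with a zero byte
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:           # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def complete_remote_checksum(packet: bytes, start: int, offset: int) -> bytes:
    """Finish an offloaded inner checksum (hypothetical helper).

    Assumes the two bytes at `offset` were seeded by the sender with the
    pseudo header sum, so summing from `start` to the end of the packet
    folds that seed into the result.
    """
    total = ones_complement_sum(packet[start:])
    checksum = ~total & 0xFFFF   # final checksum is the complement of the sum
    return packet[:offset] + checksum.to_bytes(2, "big") + packet[offset + 2:]
```

A receiver holding the sum of the whole packet could obtain the sum from `start` onward by subtracting the sum of the bytes before `start`, rather than walking the payload again; the sketch recomputes it directly for clarity.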
       Computing the checksum from
1600   the start point to the end of the packet is efficient if
1601   checksum-complete is provided on the receiver.

1603   Another alternative when an encapsulator is co-resident with a host
1604   is to perform Local Checksum Offload (LCO) [CSUMOFF]. In this method,
1605   the inner transport layer checksum is offloaded and the outer UDP
1606   checksum can be deduced based on the fact that the portion of the
1607   packet covered by the inner transport checksum will sum to zero or at
1608   least to the bitwise "not" of the inner pseudo header checksum.

1610 A.2.2. Receive checksum offload

1612   GUE is compatible with NICs that perform a protocol-agnostic receive
1613   checksum (CHECKSUM_COMPLETE in Linux parlance). In this technique, a
1614   NIC computes a ones' complement checksum over all (or some predefined
1615   portion) of a packet. The computed value is provided to the host
1616   stack in the packet's receive descriptor. The host driver can use
1617   this checksum to "patch up" and validate any inner packet transport
1618   checksums, as well as the outer UDP checksum if it is non-zero.

1620   Many legacy NICs don't provide checksum-complete but instead provide
1621   an indication that a checksum has been verified (CHECKSUM_UNNECESSARY
1622   in Linux). Usually, such validation is only done for simple TCP/IP or
1623   UDP/IP packets. If a NIC indicates that a UDP checksum is valid, the
1624   checksum-complete value for the UDP packet is the bitwise "not" of
1625   the pseudo header checksum. In this way, checksum-unnecessary can be
1626   converted to checksum-complete. So, if the NIC provides
1627   checksum-unnecessary for the outer UDP header in an encapsulation,
1628   checksum conversion can be done so that the checksum-complete value
1629   is derived and can be used by the stack to validate checksums in the
1630   encapsulated packet.

1632 A.3. Transmit Segmentation Offload

1634   Transmit Segmentation Offload (TSO) [SEGOFF] is a NIC feature where a
1635   host provides a large (>MTU size) TCP packet to the NIC, which in
1636   turn splits the packet into separate segments and transmits each one.
1637   This is useful to reduce CPU load on the host.

1639   The process of TSO can be generalized as:

1641      - Split the TCP payload into segments of size less than or equal
1642        to the MTU.

1644      - For each created segment:

1646        1. Replicate the TCP header and all preceding headers of the
1647           original packet.

1649        2. Set payload length fields in any headers to reflect the
1650           length of the segment.

1652        3. Set TCP sequence number to correctly reflect the offset of
1653           the TCP data in the stream.

1655        4. Recompute and set any checksums that either cover the payload
1656           of the packet or cover a header that was changed by setting a
1657           payload length.

1659   Following this general process, TSO can be extended to support TCP
1660   encapsulation in GUE. For each segment the Ethernet, outer IP, UDP
1661   header, GUE header, inner IP header (if tunneling), and TCP headers
1662   are replicated. Any packet length header fields need to be set
1663   properly (including the length in the outer UDP header), and
1664   checksums need to be set correctly (including the outer UDP checksum
1665   if being used).

1667   To facilitate TSO with GUE, it is recommended that extension fields
1668   do not contain values that need to be updated on a per segment basis.
1669   For example, extension fields should not include checksums, lengths,
1670   or sequence numbers that refer to the payload. If the GUE header does
1671   not contain such fields then the TSO engine only needs to copy the
1672   bits in the GUE header when creating each segment and does not need
1673   to parse the GUE header.

1675 A.4. Large Receive Offload

1677   Large Receive Offload (LRO) [SEGOFF] is a NIC feature where received
1678   packets of a TCP connection are reassembled, or coalesced, in the NIC
1679   and delivered to the host as one large packet. This feature can
1680   reduce CPU utilization in the host.

1682   LRO requires significant protocol awareness to be implemented
1683   correctly and is difficult to generalize. Packets in the same flow
1684   need to be unambiguously identified. In the presence of tunnels or
1685   network virtualization, this may require more than a five-tuple match
1686   (for instance packets for flows in two different virtual networks may
1687   have identical five-tuples). Additionally, a NIC needs to perform
1688   validation over packets that are being coalesced, and needs to
1689   fabricate a single meaningful header from all the coalesced packets.

1691   The conservative approach to supporting LRO for GUE would be to
1692   assign packets to the same flow only if they have identical five-
1693   tuples and were encapsulated the same way. That is, the outer IP
1694   addresses, the outer UDP ports, GUE protocol, GUE flags and fields,
1695   and inner five-tuple are all identical.

1697 Appendix B: Implementation considerations

1699   This appendix is informational and does not constitute a normative
1700   part of this document.

1702 B.1. Privileged ports

1704   Using the source port to contain a flow entropy value disallows the
1705   security method of a receiver enforcing that the source port be a
1706   privileged port. Privileged ports are defined by some operating
1707   systems to restrict source port binding. Unix, for instance,
1708   considers port numbers less than 1024 to be privileged.

1710   Enforcing that packets are sent from a privileged port is widely
1711   considered an inadequate security mechanism and has been mostly
1712   deprecated.
       To approximate this behavior, an implementation could
1713   restrict a user from sending a packet destined to the GUE port
1714   without proper credentials.

1716 B.2. Setting flow entropy as a route selector

1718   An encapsulator generating flow entropy in the UDP source port could
1719   modulate the value to perform a type of multipath source routing.
1720   Assuming that networking switches perform ECMP based on the flow
1721   hash, a sender can affect the path by altering the flow entropy. For
1722   instance, a host can store a flow hash in its protocol control block
1723   (PCB) for an inner flow, and might alter the value upon detecting
1724   that packets are traversing a lossy path. Changing the flow entropy
1725   for a flow SHOULD be subject to hysteresis (at most once every thirty
1726   seconds) to limit the number of out-of-order packets.

1728 B.3. Hardware protocol implementation considerations

1730   Low level data path protocols, such as GUE, are often supported in
1731   high speed network device hardware. Variable length header (VLH)
1732   protocols like GUE are sometimes considered difficult to efficiently
1733   implement in hardware. In order to retain the important
1734   characteristics of an extensible and robust protocol, hardware
1735   vendors may practice "constrained flexibility". In this model, only
1736   certain combinations or protocol header parameterizations are
1737   implemented in the hardware fast path. Each such parameterization is
1738   fixed length so that the particular instance can be optimized as a
1739   fixed length protocol. In the case of GUE, this constitutes specific
1740   combinations of GUE flags, fields, and next protocol. The selected
1741   combinations would naturally be the most common cases which form the
1742   "fast path", and other combinations are assumed to take the "slow
1743   path".
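
The "constrained flexibility" model can be illustrated with a small Python sketch of a fast-path classifier. It is a sketch under assumptions rather than a description of any real device: the table contents, the handler names, and the function `classify_gue` are hypothetical, and the parsing assumes the four-byte GUE base header consists of a 2-bit version, the C bit, a 5-bit Hlen, an 8-bit Proto/ctype, and a 16-bit Flags field.

```python
import struct

# Hypothetical fast-path table: each key is one fixed-length
# parameterization (hlen, proto, flags); everything else is slow path.
FAST_PATH = {
    (0, 4, 0x0000): "ipv4_no_extensions",    # IPv4 payload, no extension fields
    (0, 41, 0x0000): "ipv6_no_extensions",   # IPv6 payload, no extension fields
}


def classify_gue(header: bytes) -> str:
    """Return a fast-path handler name, or "slow_path" if none matches."""
    first, proto, flags = struct.unpack("!BBH", header[:4])
    version = first >> 6          # top two bits
    control = (first >> 5) & 1    # C bit
    hlen = first & 0x1F           # low five bits
    if version != 0 or control:
        # In this sketch only version 0 data messages use the fast path.
        return "slow_path"
    return FAST_PATH.get((hlen, proto, flags), "slow_path")
```

Because the table is ordinary data, entries can be added or removed at run time without changing the parsing logic, which is the property that makes each supported parameterization behave like a fixed length protocol while leaving the protocol itself extensible.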

1745   In time, the needs and requirements of a protocol may change, which
1746   may manifest as new parameterizations to be supported in
1747   the fast path. To allow this extensibility, a device practicing
1748   constrained flexibility should allow fast path parameterizations to
1749   be programmable.

1751 Authors' Addresses

1753   Tom Herbert
1754   Quantonium
1755   4701 Patrick Henry
1756   Santa Clara, CA 95054
1757   US

1759   Email: tom@herbertland.com

1761   Lucy Yong
1762   Independent
1763   Austin, TX
1764   US

1766   Email: lucy_yong@yahoo.com

1768   Osama Zia
1769   Microsoft
1770   1 Microsoft Way
1771   Redmond, WA 98029
1772   US

1774   Email: osamaz@microsoft.com