Internet Area WG                                              T. Herbert
Internet-Draft                                                Quantonium
Intended status: Standard track                                  L. Yong
Expires April 28, 2020                                       Independent
                                                                  O. Zia
                                                               Microsoft
                                                        October 26, 2019

                       Generic UDP Encapsulation
                       draft-ietf-intarea-gue-09

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on April 28, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Abstract

   This specification describes Generic UDP Encapsulation (GUE), which
   is a scheme for using UDP to encapsulate packets of different IP
   protocols for transport across layer 3 networks. By encapsulating
   packets in UDP, specialized capabilities in networking hardware for
   efficient handling of UDP packets can be leveraged. GUE specifies
   basic encapsulation methods upon which higher level constructs, such
   as tunnels and overlay networks for network virtualization, can be
   constructed. GUE is extensible by allowing optional data fields as
   part of the encapsulation, and is generic in that it can encapsulate
   packets of various IP protocols.

Table of Contents

   1. Introduction
     1.1. Applicability
     1.2. Terminology and acronyms
     1.3. Requirements Language
   2. Base packet format
     2.1. GUE variant
   3. Variant 0
     3.1. Header format
     3.2. Proto/ctype field
       3.2.1. Proto field
       3.2.2. Ctype field
     3.3. Flags and extension fields
       3.3.1. Requirements
       3.3.2. Example GUE header with extension fields
     3.4. Surplus space
     3.5. Message types
       3.5.1. Control messages
       3.5.2. Data messages
   4. Variant 1
     4.1. Direct encapsulation of IPv4
     4.2. Direct encapsulation of IPv6
   5. Operation
     5.1. Network tunnel encapsulation
     5.2. Transport layer encapsulation
     5.3. Encapsulator operation
     5.4. Decapsulator operation
       5.4.1. Processing a received data message
       5.4.2. Processing a received control message
     5.5. Middlebox inspection
     5.6. Router and switch operation
       5.6.1. Connection semantics
       5.6.2. NAT
     5.7. MTU and fragmentation
     5.8. UDP Checksum Handling
       5.8.1. UDP Checksum with IPv4
       5.8.2. UDP Checksum with IPv6
     5.9. Congestion Considerations
       5.9.1. GUE tunnels
       5.9.2. Transport layer encapsulation
     5.10. Multicast
     5.11. Flow entropy for ECMP
       5.11.1. Flow classification
       5.11.2. Flow entropy properties
     5.12. Negotiation of acceptable flags and extension fields
   6. Motivation for GUE
     6.1. Benefits of GUE
     6.2. Comparison of GUE to other encapsulations
   7. Security Considerations
   8. IANA Considerations
     8.1. UDP source port
     8.2. GUE variant number
     8.3. Control types
     8.4. Control Type Experimental Identifiers
   9. Acknowledgements
   10. References
     10.1. Normative References
     10.2. Informative References
   Appendix A: NIC processing for GUE
     A.1. Receive multi-queue
     A.2. Checksum offload
       A.2.1. Transmit checksum offload
       A.2.2. Receive checksum offload
     A.3. Transmit Segmentation Offload
     A.4. Large Receive Offload
   Appendix B: Implementation considerations
     B.1. Privileged ports
     B.2. Setting flow entropy as a route selector
     B.3. Hardware protocol implementation considerations
   Authors' Addresses

1. Introduction

   This specification describes Generic UDP Encapsulation (GUE), which
   is a general method for encapsulating packets of arbitrary IP
   protocols within User Datagram Protocol (UDP) [RFC0768] packets.
   Encapsulating packets in UDP facilitates efficient transport across
   networks. Networking devices widely provide protocol specific
   processing and optimizations for UDP (as well as TCP) packets.
   Packets for atypical IP protocols (those not usually parsed by
   networking hardware) can be encapsulated in UDP packets to maximize
   deliverability and to leverage flow specific mechanisms for routing
   and packet steering.

   GUE provides an extensible header format for including optional data
   in the encapsulation header. This data potentially covers items such
   as a virtual networking identifier, security data for validating or
   authenticating the GUE header, congestion control data, etc.

   This document does not define any specific GUE extensions. [GUEEXTEN]
   specifies a set of initial extensions.

1.1. Applicability

   GUE is a network encapsulation protocol that encapsulates packets for
   various IP protocols. Potential use cases include network tunneling,
   multi-tenant network virtualization, tunneling for mobility, and
   transport layer encapsulation.
   GUE is intended for deploying overlay networks in public or private
   data center environments, as well as providing a general tunneling
   mechanism usable in the Internet.

   GUE is a UDP based encapsulation protocol transported over existing
   IPv4 and IPv6 networks. Hence, as a UDP based protocol, GUE adheres
   to the UDP usage guidelines as specified in [RFC8085]. Applicability
   of these guidelines is dependent on the underlay IP network and the
   nature of the GUE payload protocol (for example TCP/IP or
   IP/Ethernet). GUE may also be used to create IP tunnels, hence the
   guidelines in [IPTUN] are applicable.

   [RFC8085] outlines two applicability scenarios for UDP applications:
   (1) general Internet and (2) a traffic-managed controlled environment
   (TMCE). The requirements of [RFC8085] pertaining to deployment of a
   UDP encapsulation protocol in these environments are applicable.
   Section 5 provides the specifics for satisfying requirements of
   [RFC8085]. It is the responsibility of the operator deploying GUE to
   ensure that the necessary operational requirements are met for the
   environment in which GUE is being deployed.

   GUE has much of the same applicability and benefits as GRE-in-UDP
   [RFC8086] that are afforded by UDP encapsulation protocols. GUE
   offers the possibility of good performance for load-balancing
   encapsulated IP traffic in transit networks using existing Equal-Cost
   Multipath (ECMP) mechanisms that use a hash of the five-tuple of
   source IP address, destination IP address, UDP/TCP source port,
   UDP/TCP destination port, and protocol number. Encapsulating packets
   in UDP enables use of the UDP source port to provide entropy to ECMP
   hashing.
   A material difference between GUE and GRE-in-UDP is that the
   payload of GUE is always an IP protocol whereas the payload in
   GRE-in-UDP may be a non-IP protocol; this distinction is pertinent in
   the discussion of congestion considerations (section 5.9) since IP
   protocols are generally assumed to be congestion controlled.

   In addition, GUE enables extending the use of atypical IP protocols
   (those other than TCP and UDP) across networks that might otherwise
   filter packets carrying those protocols. GUE may also be used with
   connection oriented UDP semantics in order to facilitate traversal
   through stateful firewalls and stateful NAT.

   Additional motivation for the GUE protocol is provided in section 6.

1.2. Terminology and acronyms

   GUE               Generic UDP Encapsulation

   GUE Header        A variable length protocol header that is composed
                     of a primary four byte header and zero or more four
                     byte words of optional header data

   GUE packet        A UDP/IP packet that contains a GUE header and GUE
                     payload within the UDP payload

   GUE variant       A version of the GUE protocol or an alternate form
                     of a version

   Encapsulator      A network node that encapsulates packets in GUE

   Decapsulator      A network node that decapsulates and processes
                     packets encapsulated in GUE

   Data message      An encapsulated packet in a GUE payload that is
                     addressed to the protocol stack for an associated
                     protocol

   Control message   A formatted message in the GUE payload that is
                     implicitly addressed to the decapsulator to monitor
                     or control the state or behavior of a tunnel

   Flags             A set of bit flags in the primary GUE header

   Extension field   An optional field in a GUE header whose presence is
                     indicated by corresponding flag(s)

   C-bit             A single bit flag in the primary GUE header that
                     indicates whether the GUE packet contains a control
                     message or data message

   Hlen              A field in the primary GUE header that gives the
                     length of the GUE header

   Proto/ctype       A field in the GUE header that holds either the IP
                     protocol number for a data message or a type for a
                     control message

   Outer IP header   Refers to the outermost IP header or packet when
                     encapsulating a packet over IP

   Inner IP header   Refers to an encapsulated IP header when an IP
                     packet is encapsulated

   Outer packet      Refers to an encapsulating packet

   Inner packet      Refers to a packet that is encapsulated

   TMCE              A traffic-managed controlled environment, i.e., an
                     IP network that is traffic-engineered and/or
                     otherwise managed

1.3. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2. Base packet format

   A GUE packet consists of a UDP packet whose payload is a GUE header
   followed by a payload which is either an encapsulated packet of some
   IP protocol or a control message such as an OAM (Operations,
   Administration, and Management) message. A GUE packet has the general
   format:

   +-------------------------------+
   |                               |
   |        UDP/IP header          |
   |                               |
   |-------------------------------|
   |                               |
   |         GUE Header            |
   |                               |
   |-------------------------------|
   |                               |
   |      Encapsulated packet      |
   |      or control message       |
   |                               |
   +-------------------------------+

   The GUE header is variable length as determined by the presence of
   optional extension fields.

2.1. GUE variant

   The first two bits of the GUE header contain the GUE protocol variant
   number. The variant number can indicate the version of the GUE
   protocol as well as alternate forms of a version.

   Variants 0 and 1 are described in this specification; variants 2 and
   3 are reserved.

3. Variant 0

   Variant 0 indicates version 0 of GUE.
   This variant defines a generic extensible format to encapsulate
   packets by Internet protocol number.

3.1. Header format

   The header format for variant 0 of GUE in UDP is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\
   |        Source port            |      Destination port         | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP
   |           Length              |          Checksum             | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/
   | 0 |C|   Hlen  |  Proto/ctype  |             Flags             |\
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
   |                                                               | GUE
   ~                  Extension Fields (optional)                  ~ |
   |                                                               | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/

   The contents of the UDP header are:

      o Source port: If connection semantics (section 5.6.1) are applied
        to an encapsulation, this is set to the local source port for
        the connection. When connection semantics are not applied, the
        source port is either set to a flow entropy value, as described
        in section 5.11, or is set to the GUE assigned port number,
        6080.

      o Destination port: If connection semantics (section 5.6.1) are
        applied to an encapsulation, this is set to the destination port
        for the tuple. If connection semantics are not applied then the
        destination port is set to the GUE assigned port number, 6080.

      o Length: Canonical length of the UDP packet (length of UDP header
        and payload).

      o Checksum: Standard UDP checksum (handling is described in
        section 5.8).

   The GUE header consists of:

      o Variant: 0 indicates GUE protocol version 0 with a header.

      o C: The C-bit. When set, indicates a control message. When not
        set, indicates a data message.

      o Hlen: Length in 32-bit words of the GUE header, including
        optional extension fields but not the first four bytes of the
        header. Computed as (header_len - 4) / 4, where header_len is
        the total header length in bytes. All GUE headers are a multiple
        of four bytes in length. Maximum header length is 128 bytes.

      o Proto/ctype: When the C-bit is set, this field contains a
        control message type for the payload (section 3.2.2). When the
        C-bit is not set, the field holds the Internet protocol number
        for the encapsulated packet in the payload (section 3.2.1). The
        control message or encapsulated packet begins at the offset
        provided by Hlen.

      o Flags: Header flags that may be allocated for various purposes
        and may indicate the presence of extension fields. Undefined
        header flag bits MUST be set to zero on transmission.

      o Extension Fields: Optional fields whose presence is indicated by
        corresponding flags.

3.2. Proto/ctype field

   The proto/ctype field contains either an Internet protocol number
   (when the C-bit is not set) or a GUE control message type (when the
   C-bit is set).

3.2.1. Proto field

   When the C-bit is not set, the proto/ctype field MUST contain an IANA
   Internet Protocol Number [IANA-PN]. The protocol number is
   interpreted relative to the IP protocol that encapsulates the UDP
   packet (i.e., the protocol of the outer IP header). The protocol
   number serves as an indication of the type of the next protocol
   header which is contained in the GUE payload at the offset indicated
   in Hlen.

   IP protocol number 59 ("No next header") can be set to indicate that
   the GUE payload does not begin with the header of an IP protocol.
   This would be the case, for instance, if the GUE payload were a
   fragment when performing GUE level fragmentation. The interpretation
   of the payload is performed through other means such as flags and
   extension fields, and nodes MUST NOT parse packets based on the IP
   protocol number in this case.

3.2.2. Ctype field

   When the C-bit is set, the proto/ctype field MUST be set to a valid
   control message type. Control messages will be defined in an IANA
   registry. Type 0 and type 255 are specified in this document; types 1
   through 254 are reserved and may be defined in standards.

   Type 0 indicates that the GUE payload is a control message, or part
   of a control message, that cannot be correctly parsed or interpreted
   without additional context. This might be the case when the payload
   is a fragment of a control message, where only the reassembled packet
   can be interpreted as a control message.

   Type 255 is reserved for experimentation. When this control type is
   set, the first four bytes of the GUE payload (control message) are an
   experiment identifier (ExID). The ExID is used to differentiate
   experiments (similar to the experimental identifier defined for TCP
   options in [RFC6994]). A control message of type 255 MUST include an
   ExID.

   The format of a GUE control message with the experimental control
   message type is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\
   |        Source port            |      Destination port         | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP
   |           Length              |          Checksum             | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/
   | 0 |1|   Hlen  |      255      |             Flags             |\
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
   |                                                               | GUE
   ~                  Extension Fields (optional)                  ~ |
   |                                                               | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/
   |                             ExID                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                        Control message                        ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Note that the ExID is not part of the GUE header; it is in the
   payload.
   In particular, the ExID is not accounted for in the GUE Hlen.

   ExIDs are selected at design time, when the protocol designer first
   implements or specifies the experimental control message. An ExID is
   thirty-two bits. The value is stored in the header in
   network-standard (big-endian) byte order.

   ExIDs are registered with IANA using "first come, first served"
   (FCFS) priority. ExIDs MUST be unique.

3.3. Flags and extension fields

   Flags and associated extension fields are the primary mechanism of
   extensibility in GUE. As mentioned in section 3.1, GUE header flags
   indicate the presence of optional extension fields in the GUE header.
   [GUEEXTEN] defines an initial set of GUE extensions.

3.3.1. Requirements

   There are sixteen flag bits in the GUE header. Flags may indicate
   presence of extension fields. The size of an extension field
   indicated by a flag MUST be fixed in the specification of the flag.

   Flags can be grouped together to allow different lengths for an
   extension field. For example, if two flag bits are grouped, a field
   can have three different lengths: a bit value of 00 indicates that no
   field is present, and 01, 10, and 11 indicate three possible lengths
   for the field. Regardless of how flag bits are grouped, the lengths
   and offsets of extension fields corresponding to a set of flags MUST
   be well defined and deterministic.

   Extension fields are placed in order of the flags. New flags are to
   be allocated from high to low order bit contiguously without holes.
   Flags allow random access: for instance, to inspect the field
   corresponding to the Nth flag bit, an implementation only considers
   the previous N-1 flags to determine the offset. Flags after the Nth
   flag are not pertinent in calculating the offset of the field for the
   Nth flag. Random access of flags and fields permits processing of
   optional extensions in an order that is independent of their position
   in the packet.

   Flags (or grouped flags) are idempotent such that new flags MUST NOT
   cause reinterpretation of old flags. Also, new flags MUST NOT alter
   interpretation of other elements in the GUE header nor how the
   message is parsed (for instance, in a data message the proto/ctype
   field always holds an IP protocol number as an invariant).

   The set of available flags can be extended in the future by defining
   a "flag extensions bit" that refers to a field containing a new set
   of flags.

3.3.2. Example GUE header with extension fields

   An example GUE header for a data message encapsulating an IPv4 packet
   and containing the Group Identifier and Security extension fields
   (both defined in [GUEEXTEN]) is shown below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0 |0|    3    |       4       |1|0 0 1|           0           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Group Identifier                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                           Security                            +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   In the above example, the first flag bit is set, which indicates that
   the Group Identifier extension, a 32-bit field, is present. The
   second through fourth bits of the flags are grouped flags that
   indicate the presence of a Security field with seven possible sizes.
   In this example, 001 indicates a sixty-four bit security field.

3.4. Surplus space

   The length of a GUE header, as indicated in the GUE Hlen field, may
   exceed the space consumed by optional extensions in a packet. The
   space between the end of the last optional field and the end of the
   header is termed the "surplus space".
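
   The offset rule of section 3.3.1 and the surplus space definition
   can be illustrated with a short sketch. The following Python is not
   part of this specification: the flag table is a hypothetical layout
   that reuses only the two facts given in the example of section 3.3.2
   (a Group Identifier of 4 bytes on the first flag bit, and a Security
   selector of 001 meaning 8 bytes), and all function names are
   illustrative.

```python
# Hypothetical flag layout modeled on the example in section 3.3.2.
# Each entry: (bit shift, bit width, {selector value: field size in bytes}).
# The table representation and names are assumptions for illustration.
FLAG_LAYOUT = [
    (15, 1, {1: 4}),  # first flag bit: Group Identifier, 32 bits
    (12, 3, {1: 8}),  # grouped flags: Security, selector 001 -> 64 bits
]

# Mask of all flag bits this sketch knows about.
KNOWN_MASK = 0
for _shift, _width, _sizes in FLAG_LAYOUT:
    KNOWN_MASK |= ((1 << _width) - 1) << _shift


def extension_offsets(flags):
    """Return (offsets, used) for the extension fields selected by flags.

    offsets maps the index of each present FLAG_LAYOUT entry to its byte
    offset from the end of the primary four-byte header. Only flags that
    precede a field contribute to that field's offset.
    """
    if flags & ~KNOWN_MASK & 0xFFFF:
        raise ValueError("unknown flag set")
    offsets = {}
    pos = 0
    for index, (shift, width, sizes) in enumerate(FLAG_LAYOUT):
        selector = (flags >> shift) & ((1 << width) - 1)
        if selector == 0:
            continue  # field not present
        if selector not in sizes:
            raise ValueError("unknown flag value")
        offsets[index] = pos
        pos += sizes[selector]
    return offsets, pos


def surplus_space(hlen, flags):
    """Bytes between the last extension field and the end of the header."""
    _, used = extension_offsets(flags)
    available = hlen * 4  # Hlen counts 32-bit words after the first four bytes
    if used > available:
        raise ValueError("header too small for indicated extension fields")
    return available - used
```

   With the example header of section 3.3.2 (flags 0x9000, Hlen 3), the
   Security field starts 4 bytes after the primary header and there is
   no surplus space; the same flags with Hlen 4 would leave 4 bytes of
   surplus space.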

   Surplus space is reserved per this specification and uses may be
   defined in future specifications. If a node receives a GUE packet
   with non-zero length of surplus space then it MUST NOT attempt to
   interpret the data in the surplus space. For purposes of transforms
   across the header, such as optional integrity check over the header,
   the surplus space is considered to be part of the GUE header and
   would be included in computation.

3.5. Message types

   There are two message types in GUE variant 0: control messages and
   data messages.

3.5.1. Control messages

   Control messages carry formatted data that are implicitly addressed
   to the decapsulator to monitor or control the state or behavior of a
   tunnel (OAM). For instance, an echo request and corresponding echo
   reply message can be defined to test for liveness.

   Control messages are indicated in the GUE header when the C-bit is
   set. The payload is interpreted as a control message with type
   specified in the proto/ctype field. The format and contents of the
   control message are indicated by the type and can be variable length.

   Other than interpreting the proto/ctype field as a control message
   type, the meaning and semantics of the rest of the elements in the
   GUE header are the same as that of data messages. Forwarding and
   routing of control messages should be the same as that of a data
   message with the same outer IP and UDP header; this ensures that
   control messages can be created that follow the same path through the
   network as data messages.

3.5.2. Data messages

   Data messages carry encapsulated packets that are addressed to the
   protocol stack for the associated protocol. Data messages are a
   primary means of encapsulation and can be used to create tunnels for
   overlay networks.

   Data messages are indicated in the GUE header when the C-bit is not
   set.
   The payload of a data message is interpreted as an encapsulated
   packet of an Internet protocol indicated in the proto/ctype field.
   The encapsulated packet immediately follows the GUE header.

4. Variant 1

   Variant 1 of GUE allows direct encapsulation of IPv4 and IPv6 in UDP.
   In this variant there is no GUE header; a UDP packet carries an IP
   packet. The first two bits of the UDP payload are the GUE variant
   field and coincide with the first two bits of the version number in
   the IP header. The first two version bits of IPv4 and IPv6 are 01, so
   GUE variant 1 is used for direct IP encapsulation, which makes the
   two bits of the GUE variant also 01.

   This technique is effectively a means to compress out the GUE version
   0 header when encapsulating IPv4 or IPv6 packets and there are no
   flags or extension fields. This method can be used on the same port
   number as packets with the GUE header (GUE variant 0 packets). This
   technique saves encapsulation overhead on costly links for the common
   use of IP encapsulation, and also obviates the need to allocate a
   separate UDP port number for IP-over-UDP encapsulation.

4.1. Direct encapsulation of IPv4

   The format for encapsulating IPv4 directly in UDP is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\
   |        Source port            |      Destination port         | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP
   |           Length              |          Checksum             | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/
   |0|1|0|0|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Source IPv4 Address                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Destination IPv4 Address                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The UDP fields are set in a similar manner as described in section
   3.1.

   Note that the 0100 value in the first four bits of the UDP payload
   expresses both the GUE variant as 1 (bits 01) and IP version as 4
   (bits 0100).

4.2. Direct encapsulation of IPv6

   The format for encapsulating IPv6 directly in UDP is:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\
   |        Source port            |      Destination port         | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP
   |           Length              |          Checksum             | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/
   |0|1|1|0| Traffic Class |           Flow Label                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Payload Length        |     NextHdr   |   Hop Limit   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                      Source IPv6 Address                      +
   |                                                               |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                   Destination IPv6 Address                    +
   |                                                               |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The UDP fields are set in a similar manner as described in section
   3.1.

   Note that the 0110 value in the first four bits of the UDP payload
   expresses both the GUE variant as 1 (bits 01) and IP version as 6
   (bits 0110).

5. Operation

   The figure below illustrates the use of GUE encapsulation between two
   hosts. Host 1 is sending packets to Host 2. An encapsulator performs
   encapsulation of packets from Host 1. These encapsulated packets
   traverse the network as UDP packets. At the decapsulator, packets are
   decapsulated and sent on to Host 2. Packet flow in the reverse
   direction need not be symmetric; for example, the reverse path might
   not use GUE or any other form of encapsulation.
670     +---------------+                       +---------------+
671     |               |                       |               |
672     |    Host 1     |                       |     Host 2    |
673     |               |                       |               |
674     +---------------+                       +---------------+
675             |                                       ^
676             V                                       |
677     +---------------+   +---------------+   +---------------+
678     |               |   |               |   |               |
679     |  Encapsulator |-->|    Layer 3    |-->| Decapsulator  |
680     |               |   |    Network    |   |               |
681     +---------------+   +---------------+   +---------------+

683     The encapsulator and decapsulator may be co-resident with the
684     corresponding hosts, or may be on separate nodes in the network.

686  5.1. Network tunnel encapsulation

688     Network tunneling can be achieved by encapsulating layer 2 or layer 3
689     packets. In this case, the encapsulator and decapsulator nodes are
690     the tunnel endpoints. These could be routers that provide network
691     tunnels on behalf of communicating hosts.

693  5.2. Transport layer encapsulation

695     When encapsulating layer 4 packets, the encapsulator and decapsulator
696     should be co-resident with the hosts. In this case, the encapsulation
697     headers are inserted between the IP header and the transport packet.
698     The addresses in the IP header refer to both the endpoints of the
699     encapsulation and the endpoints for terminating the encapsulated
700     transport protocol. Note that the transport layer ports in the
701     encapsulated packet are independent of the UDP ports in the outer
702     packet.

704  5.3. Encapsulator operation

706     Encapsulators create GUE data messages, set the fields of the UDP
707     header, set flags and optional extension fields in the GUE header,
708     and forward packets to a decapsulator.

710     An encapsulator can be an end host originating the packets of a flow,
711     or can be a network device performing encapsulation on behalf of
712     hosts (routers implementing tunnels for instance). In either case,
713     the intended target (decapsulator) is indicated by the outer
714     destination IP address and destination port in the UDP header.
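
The encapsulator steps above can be sketched as follows. This is an illustrative sketch only, not part of the specification. It assumes the base GUE variant 0 header layout from section 3.1 (2-bit variant, 1-bit C flag, 5-bit Hlen, 8-bit proto/ctype, 16-bit flags), sets no flags or extension fields, and leaves the UDP checksum at zero rather than computing it. The function name build_gue_data_message is hypothetical.

```python
import struct

GUE_PORT = 6080  # UDP port assigned to GUE (section 8.1)


def build_gue_data_message(proto: int, payload: bytes, src_port: int) -> bytes:
    """Return UDP header + 4-byte GUE variant 0 header + payload.

    Assumed base header layout (section 3.1): variant (2 bits), C (1 bit),
    Hlen (5 bits), proto/ctype (8 bits), flags (16 bits). With no flags
    or extension fields, Hlen and flags are both zero.
    """
    variant, c_bit, hlen, flags = 0, 0, 0, 0
    first_byte = (variant << 6) | (c_bit << 5) | hlen
    gue = struct.pack("!BBH", first_byte, proto, flags)

    # The UDP length covers the UDP header (8 bytes), GUE header, and payload.
    udp_len = 8 + len(gue) + len(payload)
    # Checksum is left as zero in this sketch; section 5.8 governs when
    # that is permitted (it is not the default over IPv6).
    udp = struct.pack("!HHHH", src_port, GUE_PORT, udp_len, 0)
    return udp + gue + payload


# Encapsulate a placeholder 20-byte IPv4 header (proto = 4).
packet = build_gue_data_message(4, b"\x45" + b"\x00" * 19, src_port=50000)
```

The outer IP header, which carries the decapsulator's address, would be added by the sending stack and is omitted here.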
716     If an encapsulator is tunneling packets, that is encapsulating
717     packets of layer 2 or layer 3 protocols (e.g., EtherIP, IPIP, ESP
718     tunnel mode), it SHOULD follow standard conventions for tunneling one
719     protocol over another. For instance, if an IP packet is being
720     encapsulated in GUE then diffserv interaction [RFC2983] and ECN
721     propagation for tunnels [RFC6040] SHOULD be followed.

723  5.4. Decapsulator operation

725     A decapsulator performs decapsulation of GUE packets. A decapsulator
726     is addressed by the outer destination IP address and UDP destination
727     port of a GUE packet. The decapsulator validates packets, including
728     fields of the GUE header.

730     If a decapsulator receives a GUE packet with an unsupported variant,
731     unknown flag, bad header length (too small for included extension
732     fields), unknown control message type, bad protocol number, an
733     unsupported payload type, or an otherwise malformed header, it MUST
734     drop the packet. Such events MAY be logged subject to configuration
735     and rate limiting of logging messages. Note that set flags in a GUE
736     header that are unknown to a decapsulator MUST NOT be ignored. If a
737     GUE packet is received by a decapsulator with unknown flags, the
738     packet MUST be dropped.

740  5.4.1. Processing a received data message

742     If a valid data message is received, the UDP header and GUE header
743     are (logically) removed from the packet. The outer IP header remains
744     intact and the next protocol in the IP header is set to the protocol
745     from the proto field in the GUE header. The resulting packet is then
746     resubmitted into the protocol stack to process the packet as though
747     it was received with the protocol indicated in the GUE header.

749     As an example, consider that a data message is received where GUE
750     encapsulates an IPv4 packet using GUE variant 0. In this case the
751     proto field in the GUE header is set to 4 for IPv4 encapsulation:

753     +-------------------------------------+
754     |  IP header (next proto = 17,UDP)    |
755     |-------------------------------------|
756     |                UDP                  |
757     |-------------------------------------|
758     | GUE (proto = 4,IPv4 encapsulation)  |
759     |-------------------------------------|
760     |      IPv4 header and packet         |
761     +-------------------------------------+

763     The receiver removes the UDP and GUE headers and sets the next
764     protocol field in the IP packet to 4, which is derived from the GUE
765     proto field. The resultant packet would have the format:

767     +-------------------------------------+
768     |  IP header (next proto = 4,IPv4)    |
769     |-------------------------------------|
770     |      IPv4 header and packet         |
771     +-------------------------------------+

773     This packet is then resubmitted into the protocol stack to be
774     processed as an IPv4 encapsulated packet.

776  5.4.2. Processing a received control message

778     If a valid control message is received, the packet MUST be processed
779     as a control message. The specific processing to be performed depends
780     on the value in the ctype field of the GUE header.

782     If an experimental control message is received (ctype is 255) then
783     the ExID MUST be processed. The ExID is used to identify the
784     particular experimental control message.

786     If a receiver does not recognize a control message type, or an
787     experimental identifier in an experimental control message, then the
788     packet MUST be dropped and an error message MAY be logged. If a GUE
789     control message is received with control type 255 and the length of
790     the GUE payload is less than four, the size of the ExID, then the
791     packet MUST be dropped and an error message MAY be logged.

793  5.5. Middlebox inspection

795     A middlebox MAY inspect a GUE header. A middlebox MUST NOT modify a
796     GUE header or UDP payload.

798     To inspect a GUE header, a middlebox needs to identify GUE packets.
799     The obvious method is to match the destination UDP port number to
800     the GUE port number (i.e., 6080). Per [RFC7605], transport port
801     numbers only have meaning at the endpoints of communications, so
802     inferring the type of a UDP payload based on port number may be
803     incorrect. Middleboxes MUST NOT take any action that would have
804     harmful side effects if a UDP packet were misinterpreted as being a
805     GUE packet. In particular, a middlebox MUST NOT modify a UDP payload
806     based on inferring the payload type from the port number lest the
807     middlebox cause silent data corruption.

809     A middlebox MAY interpret some flags and extension fields of the GUE
810     header for classification purposes, but is not required to understand
811     any of the flags or extension fields in GUE packets. A middlebox MUST
812     NOT drop a GUE packet merely because there are flags unknown to it.
813     Similarly, a middlebox MUST NOT arbitrarily filter packets based on
814     GUE flags or extension fields that are present or not present. The
815     header length in the GUE header allows a middlebox to inspect the
816     payload packet without needing to parse the flags or extension
817     fields.

819  5.6. Router and switch operation

821     Routers and switches SHOULD forward GUE packets as standard UDP/IP
822     packets. The outer five-tuple should contain sufficient information
823     to perform flow classification corresponding to the flow of the inner
824     packet. A router does not normally need to parse a GUE header, and
825     none of the flags or extension fields in the GUE header are expected
826     to affect routing. In cases where the outer five-tuple does not
827     provide sufficient entropy for flow classification, for instance UDP
828     ports are fixed to provide connection semantics (section 5.6.1), then
829     the encapsulated packet MAY be parsed to determine flow entropy.

831     A router MUST NOT modify a GUE header or payload when forwarding a
832     packet. It MAY encapsulate a GUE packet in another GUE packet, for
833     instance to implement a network tunnel (i.e., by encapsulating an IP
834     packet with a GUE payload in another IP packet as a GUE payload). In
835     this case, the router takes the role of an encapsulator, and the
836     corresponding decapsulator is the logical endpoint of the tunnel.
837     When encapsulating a GUE packet within another GUE packet, there are
838     no provisions to automatically copy flags or fields to the outer GUE
839     header. Each layer of encapsulation is considered independent.

841  5.6.1. Connection semantics

843     A middlebox might infer bidirectional connection semantics for a UDP
844     flow. For instance, a stateful firewall might create a five-tuple
845     rule to match flows on egress, and a corresponding five-tuple rule
846     for matching ingress packets where the roles of source and
847     destination are reversed for the IP addresses and UDP port numbers.
848     To operate in this environment, a GUE tunnel should be configured to
849     assume connected semantics defined by the UDP five-tuple and the use
850     of GUE encapsulation needs to be symmetric between both endpoints.
851     The source port set in the UDP header MUST be the destination port
852     the peer would set for replies. In this case, the UDP source port for
853     a tunnel would be a fixed value and not set to be flow entropy.

855     The selection of whether to make the UDP source port fixed or set to
856     a flow entropy value for each packet sent SHOULD be configurable for
857     a tunnel. The default MUST be to set the flow entropy value in the
858     UDP source port.

860  5.6.2. NAT

862     IP address and port translation can be performed on the UDP/IP
863     headers adhering to the requirements for NAT (Network Address
864     Translation) with UDP [RFC4787]. In the case of stateful NAT,
865     connection semantics MUST be applied to a GUE tunnel as described in
866     section 5.6.1. GUE endpoints MAY also invoke STUN [RFC5389] or ICE
867     [RFC5245] to manage NAT port mappings for encapsulations.

869  5.7. MTU and fragmentation

871     Standard conventions for handling of MTU (Maximum Transmission Unit)
872     and fragmentation in conjunction with networking tunnels
873     (encapsulation of layer 2 or layer 3 packets) SHOULD be followed.
874     Details are described in MTU and Fragmentation Issues with In-the-
875     Network Tunneling [RFC4459].

877     If a packet is fragmented before encapsulation in GUE, all the
878     related fragments MUST be encapsulated using the same UDP source
879     port. An operator SHOULD set MTU to account for encapsulation
880     overhead and reduce the likelihood of fragmentation.

882     As an alternative to IP fragmentation, the GUE fragmentation
883     extension can be used. GUE fragmentation is described in [GUEEXTEN].

885  5.8. UDP Checksum Handling

887  5.8.1. UDP Checksum with IPv4

889     For UDP in IPv4, when a non-zero UDP checksum is used, the UDP
890     checksum MUST be processed as specified in [RFC0768] and [RFC1122]
891     for both transmit and receive. The IPv4 header includes a checksum
892     that protects against misdelivery of the packet due to corruption of
893     IP addresses. The UDP checksum potentially provides protection
894     against corruption of the UDP header, GUE header, and GUE payload.
895     Disabling the use of checksums is a deployment consideration that
896     should take into account the risk and effects of packet corruption.

898     When a decapsulator receives a packet, the UDP checksum field MUST be
899     processed. If the UDP checksum is non-zero, the decapsulator MUST
900     verify the checksum before accepting the packet. By default, a
901     decapsulator SHOULD accept UDP packets with a zero checksum. A node
902     MAY be configured to disallow zero checksums per [RFC1122]; this may
903     be done selectively, for instance by disallowing zero checksums from
904     certain hosts that are known to be sending over paths subject to
905     packet corruption. If verification of a non-zero checksum fails, a
906     decapsulator lacks the capability to verify a non-zero checksum, or a
907     packet with a zero checksum was received and the decapsulator is
908     configured to disallow them, the packet MUST be dropped and an event
909     MAY be logged.

911  5.8.2. UDP Checksum with IPv6

913     For UDP in IPv6, the UDP checksum MUST be processed as specified in
914     [RFC0768] and [RFC2460] for both transmit and receive.

916     When UDP is used over IPv6, the UDP checksum is relied upon to
917     protect both the IPv6 and UDP headers from corruption. As such, by
918     default a GUE encapsulator MUST use UDP checksums.

920     [GUEEXTEN] specifies a GUE checksum option that includes a pseudo
921     header containing the IP addresses. An encapsulator MAY use zero-UDP
922     checksums if it uses the GUE checksum. A non-zero UDP checksum and
923     the GUE checksum SHOULD NOT be used simultaneously in a packet since
924     that would be redundant.

926     When deployed in a TMCE, a GUE encapsulator MAY be configured to use
927     UDP zero-checksum mode and no GUE checksum if the traffic-managed
928     controlled environment or a set of closely cooperating traffic-
929     managed controlled environments (such as by network operators who
930     have agreed to work together in order to jointly provide specific
931     services) meet at least one of the following conditions:

933        a. It is known (perhaps through knowledge of equipment types and
934           lower-layer checks) that packet corruption is exceptionally
935           unlikely and where the operator is willing to take the risk of
936           undetected packet corruption.

938        b. It is judged through observational measurements (perhaps of
939           historic or current traffic flows that use a non-zero checksum)
940           that the level of packet corruption is tolerably low and where
941           the operator is willing to take the risk of undetected packet
942           corruption.

944        c. Carrying applications that are tolerant of misdelivered or
945           corrupted packets (perhaps through higher-layer checksum,
946           validation, and retransmission or transmission redundancy)
947           where the operator is willing to rely on the applications using
948           GUE to survive any corrupt packets.

950     The following requirements apply to encapsulators deployed in a TMCE
951     that use UDP zero-checksum mode:

953        a. Use of the UDP checksum with IPv6 MUST be the default
954           configuration for all communications.

956        b. The GUE implementation MUST comply with all requirements
957           specified in Section 4 of [RFC6936] and with requirement 1
958           specified in Section 5 of [RFC6936].

960        c. A decapsulator SHOULD only allow the use of UDP zero-checksum
961           mode for IPv6 on a single received UDP Destination Port,
962           regardless of the encapsulator. The motivation for this
963           requirement is possible corruption of the UDP Destination Port,
964           which may cause packet delivery to the wrong UDP port. If that
965           other UDP port requires the UDP checksum, the misdelivered
966           packet will be discarded.

968        d. It is RECOMMENDED that the UDP zero-checksum mode for IPv6 is
969           only enabled for certain selected source addresses. The
970           decapsulator MUST check that the source and destination IPv6
971           addresses in a received packet are permitted by configuration
972           to use UDP zero-checksum mode and discard any packet for which
973           this check fails.

975        e. The tunnel encapsulator SHOULD use different IPv6 addresses for
976           each GUE communication (tunnel or transport flow) that uses UDP
977           zero-checksum mode, regardless of the decapsulator, in order to
978           strengthen the decapsulator's check of the IPv6 source address
979           (i.e., the same IPv6 source address SHOULD NOT be used with
980           more than one IPv6 destination address, independent of whether
981           that destination address is a unicast or multicast address).
982           When this is not possible, it is RECOMMENDED to use each source
983           IPv6 address for as few GUE communications that use UDP zero-
984           checksum mode as is feasible.

986        f. When any middlebox exists on the path of GUE communication, it
987           is RECOMMENDED to use the default mode, i.e., use UDP checksum,
988           to reduce the chance that the encapsulated packets will be
989           dropped.

991        g. Any middlebox that allows the UDP zero-checksum mode for IPv6
992           MUST comply with requirements 1 and 8-10 in Section 5 of
993           [RFC6936].

995        h. Measures SHOULD be taken to prevent IPv6 traffic with zero UDP
996           checksums from "escaping" to the general Internet; see Section
997           5.9 for examples of such measures.

999        i. IPv6 traffic with zero UDP checksums MUST be actively monitored
1000          for errors by the network operator. For example, the operator
1001          may monitor Ethernet-layer packet error rates.

1003       j. If a packet with a non-zero checksum is received, the checksum
1004          MUST be verified before accepting the packet. This is
1005          regardless of whether the tunnel encapsulator and decapsulator
1006          have been configured with UDP zero-checksum mode.

1008    The above requirements do not change either the requirements
1009    specified in [RFC8200] as modified by [RFC6935] or the requirements
1010    specified in [RFC6936].

1012    The requirement to check the source IPv6 address in addition to the
1013    destination IPv6 address and the strong recommendation against reuse
1014    of source IPv6 addresses among GUE communications collectively
1015    provide some mitigation for the absence of UDP checksum coverage of
1016    the IPv6 header. A traffic-managed controlled environment that
1017    satisfies at least one of the three conditions listed at the
1018    beginning of this section provides additional assurance.
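
The decapsulator-side checks in requirements a, c, d, and j above can be sketched as a small acceptance function. This is a hedged illustration, not a normative algorithm: the names ZeroChecksumPolicy and accept_ipv6_gue are hypothetical, the policy is modelled as a single permitted destination port plus a set of permitted (source, destination) address pairs, and checksum verification itself is assumed to happen elsewhere and is passed in as a boolean.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class ZeroChecksumPolicy:
    """Hypothetical per-decapsulator configuration for zero-checksum mode."""
    # Requirement a: zero-checksum mode is off unless explicitly configured.
    allowed_port: Optional[int] = None
    # Requirement d: (source, destination) IPv6 address pairs allowed to use it.
    allowed_pairs: Set[Tuple[str, str]] = field(default_factory=set)


def accept_ipv6_gue(policy: ZeroChecksumPolicy, src: str, dst: str,
                    dst_port: int, udp_checksum: int,
                    checksum_valid: bool) -> bool:
    """Return True if the packet may be accepted, False if it must be dropped."""
    if udp_checksum != 0:
        # Requirement j: a non-zero checksum is always verified.
        return checksum_valid
    # Requirement c: zero checksum only on the single configured port.
    if policy.allowed_port is None or dst_port != policy.allowed_port:
        return False
    # Requirement d: both addresses must be permitted by configuration.
    return (src, dst) in policy.allowed_pairs


policy = ZeroChecksumPolicy(allowed_port=6080,
                            allowed_pairs={("2001:db8::1", "2001:db8::2")})
```

Addresses are compared as strings only to keep the sketch short; a real implementation would compare parsed addresses.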
1020    GUE packets are suitable for transmission over lower layers in the
1021    traffic-managed controlled environments that are allowed by the
1022    exceptions stated above, and the rate of corruption of the inner IP
1023    packet on such networks is not expected to increase by comparison to
1024    traffic that is not encapsulated in UDP. For these reasons, GUE does
1025    not provide an additional integrity check except when GUE checksum
1026    [GUEEXTEN] is used when UDP zero-checksum mode is used with IPv6, and
1027    this design is in accordance with requirements 2, 3, and 5 specified
1028    in Section 5 of [RFC6936].

1030    Generic UDP Encapsulation does not accumulate incorrect transport-
1031    layer state as a consequence of GUE header corruption. A corrupt GUE
1032    packet may result in either packet discard or packet forwarding
1033    without accumulation of GUE state. Active monitoring of GUE traffic
1034    for errors is REQUIRED, as the occurrence of errors will result in
1035    some accumulation of error information outside the protocol for
1036    operational and management purposes. This design is in accordance
1037    with requirement 4 specified in Section 5 of [RFC6936].

1039    The remaining requirements specified in Section 5 of [RFC6936] are
1040    not applicable to GUE. Requirements 6 and 7 do not apply because GUE
1041    does not include a control feedback mechanism. Requirements 8-10 are
1042    middlebox requirements that do not apply to GUE tunnel endpoints.

1044    (See Section 5.5 for further middlebox discussion.)

1046    In summary, a TMCE GUE tunnel is allowed to use UDP zero-checksum
1047    mode for IPv6 when the conditions and requirements stated above are
1048    met. Otherwise, the UDP checksum needs to be used for IPv6 as
1049    specified in [RFC0768] and [RFC8200]. Use of GUE checksum is
1050    RECOMMENDED when the UDP checksum is not used.

1052 5.9. Congestion Considerations

1054    This section describes congestion considerations for GUE tunnels
1055    (Layer 2 and Layer 3 encapsulation) and transport layer encapsulation
1056    (Layer 4 protocol over GUE).

1058 5.9.1. GUE tunnels

1060    Section 3.1.9 of [RFC8085] discusses the congestion considerations
1061    for design and use of UDP tunnels; this is important because other
1062    flows could share the path with one or more UDP tunnels,
1063    necessitating congestion control [RFC2914] to avoid destructive
1064    interference.

1066    Congestion has potential impacts both on the rest of the network
1067    containing a UDP tunnel and on the traffic flows using the UDP
1068    tunnels. These impacts depend upon what sort of traffic is carried
1069    over the tunnel, as well as the path of the tunnel. The GUE protocol
1070    does not provide any congestion control and GUE UDP packets are
1071    regular UDP packets. Therefore, a GUE tunnel MUST NOT be deployed to
1072    carry non-congestion-controlled traffic over the Internet [RFC8085].

1074    Within a TMCE network, GUE tunnels are appropriate for carrying
1075    traffic that is not known to be congestion controlled. For example, a
1076    GUE tunnel may be used to carry Multiprotocol Label Switching (MPLS)
1077    traffic such as pseudowires or VPNs where specific bandwidth
1078    guarantees are provided to each pseudowire or VPN. In such cases,
1079    operators of TMCE networks avoid congestion by careful provisioning
1080    of their networks, rate-limiting of user data traffic, and traffic
1081    engineering according to path capacity.

1083    When a GUE tunnel carries traffic that is not known to be congestion
1084    controlled in a TMCE network, the tunnel MUST be deployed entirely
1085    within that network, and measures SHOULD be taken to prevent the GUE
1086    traffic from "escaping" the network to the general Internet. Examples
1087    of such measures are:

1089       o physical or logical isolation of the links carrying GUE from the
1090         general Internet,

1092       o deployment of packet filters that block the UDP ports assigned
1093         for GUE, and

1095       o imposition of restrictions on GUE traffic by software tools used
1096         to set up GUE tunnels between specific end systems (as might be
1097         used within a single data center) or by tunnel ingress nodes for
1098         tunnels that don't terminate at end systems.

1100 5.9.2. Transport layer encapsulation

1102    If GUE encapsulates a transport layer protocol, such as TCP, it is
1103    expected that the transport layer or application layer properly
1104    implements congestion control or avoidance. In the case that UDP is
1105    encapsulated, the application is expected to provide congestion
1106    control as specified in [RFC8085].

1108 5.10. Multicast

1110    GUE packets can be multicast to decapsulators using a multicast
1111    destination address in the outer IP header. Each receiving host will
1112    decapsulate the packet independently following normal decapsulator
1113    operations. The receiving decapsulators need to agree on the same set
1114    of GUE parameters and properties; how such an agreement is reached is
1115    outside the scope of this document.

1117    GUE allows encapsulation of unicast, broadcast, or multicast traffic.
1118    Flow entropy (the value in the UDP source port) can be generated from
1119    the header of encapsulated unicast or broadcast/multicast packets at
1120    an encapsulator. The mapping mechanism between the encapsulated
1121    multicast traffic and the multicast capability in the IP network is
1122    transparent and independent of the encapsulation and is otherwise
1123    outside the scope of this document.

1125 5.11. Flow entropy for ECMP

1127    A major objective of using GUE is that a network device can perform
1128    flow classification corresponding to the flow of the inner
1129    encapsulated packet based on the contents of the outer headers.
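
The objective stated above can be illustrated with a short sketch covering both sides: an encapsulator deriving a UDP source port from the inner flow, and a network device choosing a path from the outer five-tuple alone. This is an assumption-laden illustration rather than a prescribed method. The function names are hypothetical, a keyed BLAKE2b hash stands in for whatever randomly seeded hash an implementation uses, CRC-32 stands in for a device's ECMP hash, and the port range of 49152 to 65535 is taken from section 5.11.2.

```python
import hashlib
import ipaddress
import struct
import zlib

EPHEMERAL_BASE = 49152  # start of the ephemeral port range (section 5.11.2)
ENTROPY_MASK = 0x3FFF   # fourteen bits of entropy


def flow_entropy_port(inner_flow: bytes, seed: bytes) -> int:
    """Encapsulator side: map an inner flow identifier to a UDP source port.

    inner_flow is any stable byte string describing the inner flow, for
    example a packed inner five-tuple. seed is a random key (at most 64
    bytes) so that outsiders cannot predict the mapping.
    """
    digest = hashlib.blake2b(inner_flow, digest_size=2, key=seed).digest()
    return EPHEMERAL_BASE + (int.from_bytes(digest, "big") & ENTROPY_MASK)


def ecmp_path(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              n_paths: int, proto: int = 17) -> int:
    """Device side: pick one of n_paths from the outer five-tuple only."""
    key = (ipaddress.ip_address(src_ip).packed
           + ipaddress.ip_address(dst_ip).packed
           + struct.pack("!BHH", proto, src_port, dst_port))
    return zlib.crc32(key) % n_paths


seed = b"example-seed"  # would be randomly generated in practice
port = flow_entropy_port(b"tcp 10.0.0.1:1234 -> 10.0.0.2:80", seed)
path = ecmp_path("192.0.2.1", "192.0.2.2", port, 6080, n_paths=4)
```

Because the device hashes only outer headers, packets of the same inner flow keep the same source port and therefore the same path, while different inner flows between the same tunnel endpoints can be spread across paths.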
1131 5.11.1. Flow classification

1133    When a packet is encapsulated with GUE and connection semantics are
1134    not applied, the source port in the outer UDP packet is set to a flow
1135    entropy value that corresponds to the flow of the inner packet. When
1136    a device computes a five-tuple hash on the outer UDP/IP header of a
1137    GUE packet, the resultant value classifies the packet per its inner
1138    flow.

1140    Examples of deriving flow entropy for encapsulation are:

1142       o If the encapsulated packet is a layer 4 packet, TCP/IPv4 for
1143         instance, the flow entropy could be based on the canonical five-
1144         tuple hash of the inner packet.

1146       o If the encapsulated packet is an AH transport mode packet with
1147         TCP as next header, the flow entropy could be a hash over a
1148         three-tuple: TCP protocol and TCP ports of the encapsulated
1149         packet.

1151       o If a node is encrypting a packet using ESP tunnel mode and GUE
1152         encapsulation, the flow entropy could be based on the contents
1153         of the clear-text packet. For instance, a canonical five-tuple
1154         hash for a TCP/IP packet could be used.

1156    [RFC6438] discusses methods to compute and set the flow entropy value
1157    for IPv6 flow labels; such methods can also be used to create flow
1158    entropy values for GUE.

1160 5.11.2. Flow entropy properties

1162    The flow entropy is the value set in the UDP source port of a GUE
1163    packet. Flow entropy in the UDP source port SHOULD adhere to the
1164    following properties:

1166       o The value set in the source port is within the ephemeral port
1167         range (49152 to 65535 [RFC6335]). Since the high order two bits
1168         of the port are set to one, this provides fourteen bits of
1169         entropy for the value.

1171       o The flow entropy has a uniform distribution across encapsulated
1172         flows.

1174       o An encapsulator MAY occasionally change the flow entropy used
1175         for an inner flow per its discretion (for security, route
1176         selection, etc.). To avoid thrashing or flapping the value, the
1177         flow entropy used for a flow SHOULD NOT change more than once
1178         every thirty seconds (or a configurable value).

1180       o Decapsulators, or any networking devices, SHOULD NOT attempt to
1181         interpret flow entropy as anything more than an opaque value.
1182         Neither should they attempt to reproduce the hash calculation
1183         used by an encapsulator in creating a flow entropy value. They
1184         MAY use the value to match further received packets for
1185         steering decisions, but MUST NOT assume that the hash uniquely
1186         or permanently identifies a flow.

1188       o Input to the flow entropy calculation is not restricted to ports
1189         and addresses; input could include the flow label from an IPv6
1190         packet, SPI from an ESP packet, or other flow related state in
1191         the encapsulator that is not necessarily conveyed in the packet.

1193       o The assignment function for flow entropy SHOULD be randomly
1194         seeded to mitigate denial of service attacks. The seed SHOULD be
1195         changed periodically.

1197 5.12. Negotiation of acceptable flags and extension fields

1199    An encapsulator and decapsulator need to achieve agreement about GUE
1200    parameters that will be used in communications. Parameters include
1201    supported GUE variants, flags and extension fields that can be used,
1202    security algorithms and keys, supported protocols and control
1203    messages, etc. This document proposes different general methods to
1204    accomplish this; however, the details of implementing these are
1205    considered out of scope.

1207    General methods for this are:

1209       o Configuration. The parameters used for a tunnel are configured
1210         at each endpoint.

1212       o Negotiation. A tunnel negotiation can be performed. This could
1213         be accomplished in-band of GUE using control messages.

1215       o Via a control plane. Parameters for communicating with a tunnel
1216         endpoint can be set in a control plane protocol (such as that
1217         needed for network virtualization).

1219       o Via security negotiation. Use of security typically implies a
1220         key exchange between endpoints. Other GUE parameters may be
1221         conveyed as part of that process.

1223 6. Motivation for GUE

1225    This section provides the motivation for GUE with respect to other
1226    encapsulation methods.

1228 6.1. Benefits of GUE

1230       * GUE is a generic encapsulation protocol. GUE can encapsulate
1231         protocols that are represented by an IP protocol number. This
1232         includes layer 2, layer 3, and layer 4 protocols.

1234       * GUE is an extensible encapsulation protocol. Standardized
1235         optional data such as security, virtual networking identifiers,
1236         and fragmentation are defined.

1238       * For extensibility, GUE uses flag fields as opposed to TLVs as
1239         some other encapsulation protocols do. Flag fields are strictly
1240         ordered, allow random access, and are efficient in use of header
1241         space.

1243       * GUE allows sending of control messages such as OAM using the
1244         same GUE header format (for routing purposes) as normal data
1245         messages.

1247       * GUE maximizes deliverability of non-UDP and non-TCP protocols.

1249       * GUE provides a means for exposing per-flow entropy for ECMP for
1250         atypical IP protocols such as SCTP, DCCP, and ESP.

1252 6.2. Comparison of GUE to other encapsulations

1254    A number of different encapsulation techniques have been proposed for
1255    the encapsulation of one protocol over another. EtherIP [RFC3378]
1256    provides layer 2 tunneling of Ethernet frames over IP. GRE [RFC2784],
1257    MPLS [RFC4023], and L2TP [RFC2661] provide methods for tunneling
1258    layer 2 and layer 3 packets over IP. NVGRE [RFC7637] and VXLAN
1259    [RFC7348] are proposals for encapsulation of layer 2 packets for
1260    network virtualization. IPIP [RFC2003] and Generic packet tunneling
1261    in IPv6 [RFC2473] provide methods for tunneling IP packets over IP.

1263    Several proposals exist for encapsulating packets over UDP including
1264    ESP over UDP [RFC3948], TCP directly over UDP [TCPUDP], VXLAN
1265    [RFC7348], LISP [RFC6830] which encapsulates layer 3 packets,
1266    MPLS/UDP [RFC7510], GENEVE [GENEVE], and GRE-in-UDP Encapsulation
1267    [RFC8086].

1269    GUE has the following discriminating features:

1271       o UDP encapsulation leverages specialized network device
1272         processing for efficient transport. The semantics for using the
1273         UDP source port for flow entropy as input to ECMP are defined in
1274         section 5.11.

1276       o GUE permits encapsulation of arbitrary IP protocols, which
1277         includes layer 2, 3, and 4 protocols.

1279       o Multiple protocols can be multiplexed over a single UDP port
1280         number. This is in contrast to techniques to encapsulate
1281         protocols over UDP using a protocol specific port number (such
1282         as ESP/UDP, GRE/UDP, SCTP/UDP). GUE provides a uniform and
1283         extensible mechanism for encapsulating all IP protocols in UDP
1284         with minimal overhead (four bytes of additional header).

1286       o GUE is extensible. New flags and extension fields can be
1287         defined.

1289       o The GUE header includes a header length field. This allows a
1290         network node to inspect an encapsulated packet without needing
1291         to parse the full encapsulation header.

1293       o GUE includes both data messages (encapsulation of packets) and
1294         control messages (such as OAM).

1296       o The flags-field model facilitates efficient implementation of
1297         extensibility in hardware. For instance, a TCAM can be used to
1298         parse a known set of N flags where the number of entries in the
1299         TCAM is 2^N. By comparison, the number of TCAM entries needed to
1300         parse a set of N arbitrarily ordered TLVs is approximately e*N!.

1302       o GUE includes a variant that encapsulates IPv4 and IPv6 packets
1303         directly within UDP.
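
The TCAM comparison in the list above can be checked with a few lines of arithmetic. The sketch below rests on one reading of the text, which is an assumption: that a flags parser needs one entry per combination of N flags, and that a TLV parser needs one entry per ordered sequence of distinct TLVs drawn from N types (including the empty sequence). Under that reading the TLV count is the sum of N!/(N-k)! for k from 0 to N, which equals floor(e*N!) for N >= 1. The function names are hypothetical.

```python
import math


def flag_entries(n: int) -> int:
    """Entries needed to match every combination of n presence flags."""
    return 2 ** n


def ordered_tlv_entries(n: int) -> int:
    """Entries needed to match every ordered selection of distinct TLVs
    from n types: sum over k of the k-permutations of n."""
    return sum(math.perm(n, k) for k in range(n + 1))


for n in (4, 8):
    print(n, flag_entries(n), ordered_tlv_entries(n),
          math.floor(math.e * math.factorial(n)))
```

For N = 4 this gives 16 entries for flags against 65 for TLVs, and for N = 8 it gives 256 against 109601, which shows how quickly the two counts diverge.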
1305 7. Security Considerations

1307    There are two important considerations of security with respect to
1308    GUE.

1310       o Authentication and integrity of the GUE header.

1312       o Authentication, integrity, and confidentiality of the GUE
1313         payload.

1315    GUE security is provided by extensions for security defined in
1316    [GUEEXTEN]. These extensions include methods to authenticate the GUE
1317    header and encrypt the GUE payload.

1319    The GUE header can be authenticated using a security extension for an
1320    HMAC (Hashed Message Authentication Code). Securing the GUE payload
1321    can be accomplished by use of the GUE Payload Transform extension.
1322    This extension allows the use of DTLS (Datagram Transport Layer
1323    Security) to encrypt and authenticate the GUE payload.

1325    A hash function for computing flow entropy (section 5.11) SHOULD be
1326    randomly seeded to mitigate some possible denial of service attacks.

1328 8. IANA Considerations

1330 8.1. UDP source port

1332    A user UDP port number has been assigned for GUE:

1334       Service Name: gue
1335       Transport Protocol(s): UDP
1336       Assignee: Tom Herbert
1337       Contact: Tom Herbert
1338       Description: Generic UDP Encapsulation
1339       Reference: draft-herbert-gue
1340       Port Number: 6080
1341       Service Code: N/A
1342       Known Unauthorized Uses: N/A
1343       Assignment Notes: N/A

1345 8.2. GUE variant number

1347    IANA is requested to set up a registry for the GUE variant number.
1348    The GUE variant number is two bits containing four possible values.
1349    This document defines variants 0 and 1. New values are assigned in
1350    accordance with RFC Required policy [RFC5226].

      +----------------+----------------+---------------+
      | Variant number | Description    | Reference     |
      +----------------+----------------+---------------+
      | 0              | GUE Version 0  | This document |
      |                | with header    |               |
      |                |                |               |
      | 1              | GUE Version 0  | This document |
      |                | with direct IP |               |
      |                | encapsulation  |               |
      |                |                |               |
      | 2..3           | Unassigned     |               |
      +----------------+----------------+---------------+

8.3. Control types

   IANA is requested to set up a registry for the GUE control types.
   Control types are 8-bit values. New values for control types 1-127
   are assigned in accordance with RFC Required policy [RFC5226].

      +----------------+------------------+---------------+
      | Control type   | Description      | Reference     |
      +----------------+------------------+---------------+
      | 0              | Control payload  | This document |
      |                | needs more       |               |
      |                | context for      |               |
      |                | interpretation   |               |
      |                |                  |               |
      | 1..254         | Unassigned       |               |
      |                |                  |               |
      | 255            | Experimental     | This document |
      +----------------+------------------+---------------+

8.4. Control Type Experimental Identifiers

   IANA is requested to create a "GUE Control Type Experimental
   Identifiers (GUE Control ExIDs)" registry. The registry records
   32-bit ExIDs, as well as a reference (description, document pointer,
   assignee name, and e-mail contact) for each entry.

   Entries are assigned on a First Come, First Served (FCFS) basis
   [RFC5226]. The registry operates FCFS on the entire ExID (in
   network-standard order).

   IANA will advise applicants of duplicate entries to select an
   alternate value, as per typical FCFS processing.

   IANA will record known duplicate uses to assist the community in both
   debugging assigned uses and correcting unauthorized duplicate uses.

   IANA should impose no requirements on making a registration other
   than indicating the desired codepoint and providing a point of
   contact. A short description or acronym for the use is desired but
   should not be required.

   Initial assignments are:

      +----------------+----------------+---------------+
      | ExID           | Description    | Reference     |
      +----------------+----------------+---------------+
      | 1..0xffffffff  | Unassigned     |               |
      +----------------+----------------+---------------+

9. Acknowledgements

   The authors would like to thank David Liu, Erik Nordmark, Fred
   Templin, Adrian Farrel, Bob Briscoe, Murray Kucherawy, Mirja
   Kuhlewind, David Black, Joe Touch, and Greg Mirsky for valuable input
   on this draft. Special thanks to Fred Templin who is serving as
   document shepherd.

10. References

10.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, DOI
             10.17487/RFC2119, March 1997.

   [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, DOI
             10.17487/RFC0768, August 1980.

   [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
             Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
             March 2017.

   [RFC2983] Black, D., "Differentiated Services and Tunnels", RFC
             2983, DOI 10.17487/RFC2983, October 2000.

   [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion
             Notification", RFC 6040, DOI 10.17487/RFC6040, November
             2010.

   [RFC6935] Eubanks, M., Chimento, P., and M. Westerlund, "IPv6 and
             UDP Checksums for Tunneled Packets", RFC 6935, DOI
             10.17487/RFC6935, April 2013.

   [RFC6936] Fairhurst, G. and M.
             Westerlund, "Applicability Statement for the Use of IPv6
             UDP Datagrams with Zero Checksums", RFC 6936, DOI
             10.17487/RFC6936, April 2013.

   [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
             Communication Layers", STD 3, RFC 1122, DOI
             10.17487/RFC1122, October 1989.

   [RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the-
             Network Tunneling", RFC 4459, DOI 10.17487/RFC4459, April
             2006.

   [RFC6335] Cotton, M., Eggert, L., Touch, J., Westerlund, M., and S.
             Cheshire, "Internet Assigned Numbers Authority (IANA)
             Procedures for the Management of the Service Name and
             Transport Protocol Port Number Registry", BCP 165, RFC
             6335, DOI 10.17487/RFC6335, August 2011.

   [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an
             IANA Considerations Section in RFCs", RFC 5226, DOI
             10.17487/RFC5226, May 2008.

10.2. Informative References

   [RFC6994] Touch, J., "Shared Use of Experimental TCP Options", RFC
             6994, DOI 10.17487/RFC6994, August 2013.

   [RFC8086] Yong, L., Ed., Crabbe, E., Xu, X., and T. Herbert, "GRE-
             in-UDP Encapsulation", RFC 8086, DOI 10.17487/RFC8086,
             March 2017.

   [RFC7605] Touch, J., "Recommendations on Using Assigned Transport
             Port Numbers", BCP 165, RFC 7605, DOI 10.17487/RFC7605,
             August 2015.

   [RFC4787] Audet, F., Ed., and C. Jennings, "Network Address
             Translation (NAT) Behavioral Requirements for Unicast
             UDP", BCP 127, RFC 4787, DOI 10.17487/RFC4787, January
             2007.

   [RFC5389] Rosenberg, J., Mahy, R., Matthews, P., and D. Wing,
             "Session Traversal Utilities for NAT (STUN)", RFC 5389,
             DOI 10.17487/RFC5389, October 2008.

   [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment
             (ICE): A Protocol for Network Address Translator (NAT)
             Traversal for Offer/Answer Protocols", RFC 5245, DOI
             10.17487/RFC5245, April 2010.

   [RFC8084] Fairhurst, G., "Network Transport Circuit Breakers", BCP
             208, RFC 8084, DOI 10.17487/RFC8084, March 2017.

   [RFC6438] Carpenter, B. and S. Amante, "Using the IPv6 Flow Label
             for Equal Cost Multipath Routing and Link Aggregation in
             Tunnels", RFC 6438, DOI 10.17487/RFC6438, November 2011.

   [RFC3378] Housley, R. and S. Hollenbeck, "EtherIP: Tunneling
             Ethernet Frames in IP Datagrams", RFC 3378, DOI
             10.17487/RFC3378, September 2002.

   [RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
             Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
             DOI 10.17487/RFC2784, March 2000.

   [RFC4023] Worster, T., Rekhter, Y., and E. Rosen, Ed.,
             "Encapsulating MPLS in IP or Generic Routing Encapsulation
             (GRE)", RFC 4023, DOI 10.17487/RFC4023, March 2005.

   [RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn,
             G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"",
             RFC 2661, DOI 10.17487/RFC2661, August 1999.

   [RFC7637] Garg, P., Ed., and Y. Wang, Ed., "NVGRE: Network
             Virtualization Using Generic Routing Encapsulation", RFC
             7637, DOI 10.17487/RFC7637, September 2015.

   [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
             L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
             eXtensible Local Area Network (VXLAN): A Framework for
             Overlaying Virtualized Layer 2 Networks over Layer 3
             Networks", RFC 7348, August 2014.

   [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, DOI
             10.17487/RFC2003, October 1996.

   [RFC2473] Conta, A. and S. Deering, "Generic Packet Tunneling in
             IPv6 Specification", RFC 2473, DOI 10.17487/RFC2473,
             December 1998.

   [RFC3948] Huttunen, A., Swander, B., Volpe, V., DiBurro, L., and M.
             Stenberg, "UDP Encapsulation of IPsec ESP Packets", RFC
             3948, DOI 10.17487/RFC3948, January 2005.

   [RFC6830] Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The
             Locator/ID Separation Protocol (LISP)", RFC 6830, DOI
             10.17487/RFC6830, January 2013.

   [RFC7510] Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black,
             "Encapsulating MPLS in UDP", RFC 7510, DOI
             10.17487/RFC7510, April 2015.

   [GUEEXTEN] Herbert, T., Yong, L., and Templin, F., "Extensions for
             Generic UDP Encapsulation", draft-ietf-intarea-gue-
             extensions-06

   [IPTUN]   Touch, J. and Townsley, M., "IP Tunnels in the Internet
             Architecture", draft-ietf-intarea-tunnels-10

   [IANA-PN] IANA, "Protocol Numbers".

   [TCPUDP]  Cheshire, S., Graessley, J., and McGuire, R.,
             "Encapsulation of TCP and other Transport Protocols over
             UDP", draft-cheshire-tcp-over-udp-00

   [GENEVE]  Gross, J., Ed., Ganga, I., Ed., and Sridhar, T., "Geneve:
             Generic Network Virtualization Encapsulation", draft-ietf-
             nvo3-geneve-10

   [UDPENCAP] Herbert, T., "UDP Encapsulation in Linux".

   [MULTIQ]  Herbert, T. and de Bruijn, W., "Scaling in the Linux
             Networking Stack".

   [CSUMOFF] Cree, E., "Checksum Offloads in the Linux Networking
             Stack".

   [SEGOFF]  Duyck, A., "Segmentation Offloads in the Linux Networking
             Stack".

Appendix A: NIC processing for GUE

   This appendix is informational and does not constitute a normative
   part of this document.

   This appendix provides some guidelines for Network Interface Cards
   (NICs) to implement common offloads and accelerations to support GUE.
   Note that most of this discussion is generally applicable to other
   methods of UDP-based encapsulation. An overview of UDP-based
   encapsulation and acceleration is in [UDPENCAP].

A.1. Receive multi-queue

   Contemporary NICs support multiple receive descriptor queues
   (multi-queue) [MULTIQ]. Multi-queue enables load balancing of network
   processing for a NIC across multiple CPUs.
   On packet reception, a NIC selects an appropriate queue for host
   processing. Receive Side Scaling (RSS) is a common method that uses
   the flow hash for a packet to index an indirection table where each
   entry stores a queue number. Flow Director and Accelerated Receive
   Flow Steering (aRFS) allow a host to program the queue that is used
   for a given flow, which is identified either by an explicit
   five-tuple or by the flow's hash.

   GUE encapsulation is compatible with multi-queue NICs that support
   five-tuple hash calculation for UDP/IP packets as input to RSS. The
   flow entropy in the UDP source port ensures classification of the
   encapsulated flow even in the case that the outer source and
   destination addresses are the same for all flows (e.g. all flows are
   going over a single tunnel).

   By default, UDP RSS support is often disabled in NICs to avoid
   out-of-order reception that can occur when UDP packets are
   fragmented. As discussed in section 5.7, fragmentation of GUE packets
   is mostly avoided by fragmenting packets before entering a tunnel,
   GUE fragmentation, path MTU discovery in higher layer protocols, or
   operators adjusting MTUs. Other UDP traffic might not implement such
   procedures to avoid fragmentation, so enabling UDP RSS support in the
   NIC is a tradeoff to be considered during configuration.

A.2. Checksum offload

   Many NICs provide capabilities to calculate the standard ones'
   complement checksum for packets in transmit or receive [CSUMOFF].
   When using GUE encapsulation, there are at least two checksums that
   are of interest: the encapsulated packet's transport checksum, and
   the UDP checksum in the outer header.

A.2.1. Transmit checksum offload

   NICs can provide a protocol-agnostic method to offload the transmit
   checksum (NETIF_F_HW_CSUM in Linux parlance) that can be used with
   GUE.
   In this method, the host provides checksum-related parameters in a
   transmit descriptor for a packet. These parameters include the
   starting offset of data to checksum, the length of data to checksum,
   and the offset in the packet where the computed checksum is to be
   written. The host initializes the checksum field to a pseudo header
   checksum.

   In the case of GUE, the checksum for an encapsulated transport layer
   packet, a TCP packet for instance, can be offloaded by setting the
   appropriate checksum parameters.

   NICs typically can offload only one transmit checksum per packet, so
   simultaneously offloading both an inner transport packet's checksum
   and the outer UDP checksum is likely not possible.

   If an encapsulator is co-resident with a host, then checksum offload
   may be performed using remote checksum offload (RCO) [GUEEXTEN].
   Remote checksum offload relies on NIC offload of the simple UDP/IP
   checksum, which is commonly supported even in legacy devices. In
   remote checksum offload, the outer UDP checksum is set and the GUE
   header includes an option indicating the start and offset of the
   inner "offloaded" checksum. The inner checksum is initialized to the
   pseudo header checksum. When a decapsulator receives a GUE packet
   with the remote checksum offload option, it completes the offload
   operation by determining the packet checksum from the indicated start
   point to the end of the packet, and then adds this into the checksum
   field at the offset given in the option. Computing the checksum from
   the start to the end of the packet is efficient if checksum-complete
   is provided on the receiver.

   Another alternative when an encapsulator is co-resident with a host
   is to perform Local Checksum Offload (LCO) [CSUMOFF].
   In this method, the inner transport layer checksum is offloaded and
   the outer UDP checksum can be deduced based on the fact that the
   portion of the packet covered by the inner transport checksum will
   sum to zero, or at least to the bitwise "not" of the inner pseudo
   header checksum.

A.2.2. Receive checksum offload

   GUE is compatible with NICs that perform a protocol-agnostic receive
   checksum (CHECKSUM_COMPLETE in Linux parlance). In this technique, a
   NIC computes a ones' complement checksum over all (or some predefined
   portion) of a packet. The computed value is provided to the host
   stack in the packet's receive descriptor. The host driver can use
   this checksum to "patch up" and validate any inner packet transport
   checksums, as well as the outer UDP checksum if it is non-zero.

   Many legacy NICs don't provide checksum-complete but instead provide
   an indication that a checksum has been verified (CHECKSUM_UNNECESSARY
   in Linux). Usually, such validation is only done for simple TCP/IP or
   UDP/IP packets. If a NIC indicates that a UDP checksum is valid, the
   checksum-complete value for the UDP packet is the bitwise "not" of
   the pseudo header checksum. In this way, checksum-unnecessary can be
   converted to checksum-complete. So, if the NIC provides
   checksum-unnecessary for the outer UDP header in an encapsulation,
   checksum conversion can be done so that the checksum-complete value
   is derived and can be used by the stack to validate checksums in the
   encapsulated packet.

A.3. Transmit Segmentation Offload

   Transmit Segmentation Offload (TSO) [SEGOFF] is a NIC feature where a
   host provides a large (>MTU size) TCP packet to the NIC, which in
   turn splits the packet into separate segments and transmits each one.
   This is useful to reduce CPU load on the host.
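
   As a rough illustration of the splitting a TSO engine performs, the
   Python sketch below divides a payload and replicates the headers for
   each segment. It is a simplification under stated assumptions:
   headers are modelled as a dictionary rather than wire formats, the
   field names (tcp_seq, udp_len, gue), the parameter inner_hdr_len, and
   the function name tso_segments are all hypothetical, and checksum
   recomputation is omitted.

```python
from typing import Dict, List, Tuple

UDP_HDR_LEN = 8  # length of a UDP header in bytes


def tso_segments(headers: Dict, payload: bytes, max_payload: int,
                 inner_hdr_len: int) -> List[Tuple[Dict, bytes]]:
    """Split payload into segments, replicating headers for each one.

    inner_hdr_len is the number of header bytes carried between the
    outer UDP header and the TCP payload (hypothetical parameter).
    """
    segments = []
    for offset in range(0, len(payload), max_payload):
        chunk = payload[offset:offset + max_payload]
        hdr = dict(headers)  # replicate all headers unchanged
        # The TCP sequence number reflects the offset of this chunk.
        hdr["tcp_seq"] = (headers["tcp_seq"] + offset) % 2**32
        # Length fields reflect the size of this segment only.
        hdr["udp_len"] = UDP_HDR_LEN + inner_hdr_len + len(chunk)
        segments.append((hdr, chunk))
    return segments


if __name__ == "__main__":
    base = {"tcp_seq": 1000, "udp_len": 0, "gue": b"\x00\x00\x00\x00"}
    for hdr, chunk in tso_segments(base, b"x" * 2500, 1000, 44):
        print(hdr["tcp_seq"], hdr["udp_len"], len(chunk))
```

   In this sketch the "gue" entry is treated as opaque bytes and copied
   without being parsed.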

   The process of TSO can be generalized as:

      - Split the TCP payload into segments of size less than or equal
        to MTU.

      - For each created segment:

        1. Replicate the TCP header and all preceding headers of the
           original packet.

        2. Set payload length fields in any headers to reflect the
           length of the segment.

        3. Set TCP sequence number to correctly reflect the offset of
           the TCP data in the stream.

        4. Recompute and set any checksums that either cover the payload
           of the packet or cover a header that was changed by setting a
           payload length.

   Following this general process, TSO can be extended to support TCP
   encapsulation in GUE. For each segment, the Ethernet, outer IP, UDP,
   GUE, inner IP (if tunneling), and TCP headers are replicated. Any
   packet length header fields need to be set properly (including the
   length in the outer UDP header), and checksums need to be set
   correctly (including the outer UDP checksum if being used).

   To facilitate TSO with GUE, it is recommended that extension fields
   do not contain values that need to be updated on a per-segment basis.
   For example, extension fields should not include checksums, lengths,
   or sequence numbers that refer to the payload. If the GUE header does
   not contain such fields, then the TSO engine only needs to copy the
   bits in the GUE header when creating each segment and does not need
   to parse the GUE header.

A.4. Large Receive Offload

   Large Receive Offload (LRO) [SEGOFF] is a NIC feature where received
   packets of a TCP connection are reassembled, or coalesced, in the NIC
   and delivered to the host as one large packet. This feature can
   reduce CPU utilization in the host.

   LRO requires significant protocol awareness to be implemented
   correctly and is difficult to generalize. Packets in the same flow
   need to be unambiguously identified.
   In the presence of tunnels or network virtualization, this may
   require more than a five-tuple match (for instance, packets for flows
   in two different virtual networks may have identical five-tuples).
   Additionally, a NIC needs to perform validation over packets that are
   being coalesced, and needs to fabricate a single meaningful header
   from all the coalesced packets.

   The conservative approach to supporting LRO for GUE would be to
   assign packets to the same flow only if they have identical
   five-tuples and were encapsulated the same way. That is, the outer IP
   addresses, the outer UDP ports, GUE protocol, GUE flags and fields,
   and inner five-tuple are all identical.

Appendix B: Implementation considerations

   This appendix is informational and does not constitute a normative
   part of this document.

B.1. Privileged ports

   Using the source port to contain a flow entropy value disallows the
   security method of a receiver enforcing that the source port be a
   privileged port. Privileged ports are defined by some operating
   systems to restrict source port binding. Unix, for instance,
   considers port numbers less than 1024 to be privileged.

   Enforcing that packets are sent from a privileged port is widely
   considered an inadequate security mechanism and has been mostly
   deprecated. To approximate this behavior, an implementation could
   restrict a user from sending a packet destined to the GUE port
   without proper credentials.

B.2. Setting flow entropy as a route selector

   An encapsulator generating flow entropy in the UDP source port could
   modulate the value to perform a type of multipath source routing.
   Assuming that networking switches perform ECMP based on the flow
   hash, a sender can affect the path by altering the flow entropy.
   For instance, a host can store a flow hash in its protocol control
   block (PCB) for an inner flow, and might alter the value upon
   detecting that packets are traversing a lossy path. Changing the flow
   entropy for a flow SHOULD be subject to hysteresis (at most once
   every thirty seconds) to limit the number of out-of-order packets.

B.3. Hardware protocol implementation considerations

   Low-level data path protocols, such as GUE, are often supported in
   high-speed network device hardware. Variable length header (VLH)
   protocols like GUE are sometimes considered difficult to implement
   efficiently in hardware. In order to retain the important
   characteristics of an extensible and robust protocol, hardware
   vendors may practice "constrained flexibility". In this model, only
   certain combinations or protocol header parameterizations are
   implemented in the hardware fast path. Each such parameterization is
   fixed length so that the particular instance can be optimized as a
   fixed length protocol. In the case of GUE, this constitutes specific
   combinations of GUE flags, fields, and next protocol. The selected
   combinations would naturally be the most common cases, which form the
   "fast path", and other combinations are assumed to take the "slow
   path".

   In time, the needs and requirements of a protocol may change, which
   may manifest as new parameterizations to be supported in the fast
   path. To allow this extensibility, a device practicing constrained
   flexibility should allow fast path parameterizations to be
   programmable.
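
   One way to picture constrained flexibility is as a programmable table
   keyed by header parameterization. The Python sketch below is
   illustrative only: the key (flags value, next protocol), the class
   name FastPathTable, and the example flag values and header lengths
   are assumptions made for the example and do not come from this
   document or from any device.

```python
from typing import Dict, Optional, Tuple


class FastPathTable:
    """Hypothetical table of parameterizations handled in the fast path."""

    def __init__(self) -> None:
        # (flags, next protocol) -> fixed header length in bytes
        self._entries: Dict[Tuple[int, int], int] = {}

    def program(self, flags: int, proto: int, header_len: int) -> None:
        """Add or update a fast path parameterization."""
        self._entries[(flags, proto)] = header_len

    def classify(self, flags: int, proto: int) -> Tuple[str, Optional[int]]:
        """Return ("fast", header_len) on a match, else ("slow", None)."""
        header_len = self._entries.get((flags, proto))
        if header_len is None:
            return ("slow", None)
        return ("fast", header_len)


if __name__ == "__main__":
    table = FastPathTable()
    table.program(0x0000, 6, 4)          # example entry: no flags set
    print(table.classify(0x0000, 6))
    print(table.classify(0x8000, 6))     # not programmed yet
    table.program(0x8000, 6, 8)          # later added to the fast path
    print(table.classify(0x8000, 6))
```

   Because every entry maps to a fixed header length, each programmed
   combination can be handled as if it were a fixed length protocol,
   and new combinations can be added without changing the lookup.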

Authors' Addresses

   Tom Herbert
   Quantonium
   4701 Patrick Henry
   Santa Clara, CA 95054
   US

   Email: tom@herbertland.com

   Lucy Yong
   Independent
   Austin, TX
   US

   Email: lucy_yong@yahoo.com

   Osama Zia
   Microsoft
   1 Microsoft Way
   Redmond, WA 98029
   US

   Email: osamaz@microsoft.com