idnits 2.17.1

draft-ietf-nvo3-gue-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 253: '... flag bits MUST be set to zero o...'
     RFC 2119 keyword, line 368: '...and decapsulator MUST agree on the mea...'
     RFC 2119 keyword, line 374: '... GUE packet with private data, it MUST...'
     RFC 2119 keyword, line 376: '...capsulator the packet MUST be dropped....'
     RFC 2119 keyword, line 378: '...ntics the packet MUST also be dropped....'
     (19 more instances...)

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (July 6, 2016) is 2851 days in the past.  Is this
     intentional?
  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC478' is mentioned on line 644, but not defined

  == Missing Reference: 'RFC6935' is mentioned on line 668, but not defined

  == Missing Reference: 'RFC768' is mentioned on line 703, but not defined

  == Missing Reference: 'RFC1122' is mentioned on line 692, but not defined

  == Missing Reference: 'RFC2460' is mentioned on line 703, but not defined

  ** Obsolete undefined reference: RFC 2460 (Obsoleted by RFC 8200)

  == Missing Reference: 'RFC5405' is mentioned on line 741, but not defined

  ** Obsolete undefined reference: RFC 5405 (Obsoleted by RFC 8085)

  == Missing Reference: 'RFC2473' is mentioned on line 877, but not defined

  -- Looks like a reference, but probably isn't: '7510' on line 882

  == Missing Reference: 'GUESEC' is mentioned on line 928, but not defined

  == Missing Reference: 'RFC5226' is mentioned on line 993, but not defined

  ** Obsolete undefined reference: RFC 5226 (Obsoleted by RFC 8126)

  == Unused Reference: 'RFC2434' is defined on line 1061, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC5925' is defined on line 1119, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC3828' is defined on line 1123, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC7605' is defined on line 1135, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC7510' is defined on line 1144, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2434 (Obsoleted by RFC 5226)

  ** Downref: Normative reference to an Informational RFC: RFC 2983

  ** Downref: Normative reference to an Informational RFC: RFC 4459

  -- Obsolete informational reference (is this intentional?): RFC 6830
     (Obsoleted by RFC 9300, RFC 9301)

  ==
Outdated reference: A later version (-01) exists of
     draft-herbert-gue-extensions-00

  == Outdated reference: A later version (-04) exists of
     draft-hy-nvo3-gue-4-nvo-03

  == Outdated reference: A later version (-19) exists of
     draft-ietf-tsvwg-gre-in-udp-encap-13

     Summary: 7 errors (**), 0 flaws (~~), 18 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2 Network Virtualization Overlays (nvo3)                        T. Herbert
3 Internet-Draft                                                  Facebook
4 Intended status: Standard track                                  L. Yong
5 Expires January 7, 2017                                       Huawei USA
6                                                                   O. Zia
7                                                                Microsoft
8                                                             July 6, 2016

10                        Generic UDP Encapsulation
11                         draft-ietf-nvo3-gue-04

13 Status of this Memo

15    This Internet-Draft is submitted in full conformance with the
16    provisions of BCP 78 and BCP 79.

18    Internet-Drafts are working documents of the Internet Engineering
19    Task Force (IETF), its areas, and its working groups. Note that
20    other groups may also distribute working documents as Internet-
21    Drafts.

23    Internet-Drafts are draft documents valid for a maximum of six months
24    and may be updated, replaced, or obsoleted by other documents at any
25    time. It is inappropriate to use Internet-Drafts as reference
26    material or to cite them other than as "work in progress."

28    The list of current Internet-Drafts can be accessed at
29    http://www.ietf.org/ietf/1id-abstracts.txt

31    The list of Internet-Draft Shadow Directories can be accessed at
32    http://www.ietf.org/shadow.html

34    This Internet-Draft will expire on January 7, 2017.

36 Copyright Notice

38    Copyright (c) 2016 IETF Trust and the persons identified as the
39    document authors. All rights reserved.

41    This document is subject to BCP 78 and the IETF Trust's Legal
42    Provisions Relating to IETF Documents
43    (http://trustee.ietf.org/license-info) in effect on the date of
44    publication of this document.
Please review these documents
45    carefully, as they describe your rights and restrictions with respect
46    to this document.

53    Code Components extracted from this document must
54    include Simplified BSD License text as described in Section 4.e of
55    the Trust Legal Provisions and are provided without warranty as
56    described in the Simplified BSD License.

58 Abstract

60    This specification describes Generic UDP Encapsulation (GUE), which
61    is a scheme for using UDP to encapsulate packets of arbitrary IP
62    protocols for transport across layer 3 networks. By encapsulating
63    packets in UDP, specialized capabilities in networking hardware for
64    efficient handling of UDP packets can be leveraged. GUE specifies
65    basic encapsulation methods upon which higher level constructs, such
66    as tunnels and overlay networks for network virtualization, can be
67    constructed. GUE is extensible by allowing optional data fields as
68    part of the encapsulation, and is generic in that it can encapsulate
69    packets of various IP protocols.

71 Table of Contents

73    1. Introduction
74    2. Base packet format
75      2.1. GUE version
76    3. Version 0
77      3.1. Header format
78      3.2. Proto/ctype field
79        3.2.1 Proto field
80        3.2.2 Ctype field
81      3.3. Flags and optional fields
82      3.4. Private data
83      3.5. Message types
84        3.5.1. Control messages
85        3.5.2. Data messages
86      3.6. Hiding the transport layer protocol number
87    4. Version 1
88      4.1. Direct encapsulation of IPv4
89      4.2. Direct encapsulation of IPv6
90    5. Operation
91      5.1. Network tunnel encapsulation
92      5.2. Transport layer encapsulation
93      5.3. Encapsulator operation
94      5.4. Decapsulator operation
95      5.5. Router and switch operation
96      5.6. Middlebox interactions
97        5.6.1. Inferring connection semantics
98        5.6.2. NAT
99      5.7. Checksum Handling
100       5.7.1. Checksum requirements
101       5.7.2. UDP Checksum with IPv4
102       5.7.3. UDP Checksum with IPv6
103     5.8. MTU and fragmentation
104     5.9. Congestion control
105     5.10. Multicast
106     5.11. Flow entropy for ECMP
107       5.11.1. Flow classification
108       5.11.2. Flow entropy properties
109   6. Motivation for GUE
110   7. Security Considerations
111   8. IANA Consideration
112     8.1. UDP source port
113     8.2. GUE version number
114     8.3. Control types
115     8.4. Flag-fields
116   9. Acknowledgements
117   10. References
118     10.1. Normative References
119     10.2. Informative References
120   Appendix A: NIC processing for GUE
121     A.1. Receive multi-queue
122     A.2. Checksum offload
123       A.2.1. Transmit checksum offload
124       A.2.2. Receive checksum offload
125     A.3. Transmit Segmentation Offload
126     A.4. Large Receive Offload
127   Appendix B: Implementation considerations
128     B.1. Privileged ports
129     B.2. Setting flow entropy as a route selector
130     B.3. Hardware protocol implementation considerations
131   Authors' Addresses

133 1. Introduction

135    This specification describes Generic UDP Encapsulation (GUE), which is
136    a general method for encapsulating packets of arbitrary IP protocols
137    within User Datagram Protocol (UDP) [RFC0768] packets. Encapsulating
138    packets in UDP facilitates efficient transport across networks.
139    Networking devices widely provide protocol specific processing and
140    optimizations for UDP (as well as TCP) packets. Packets for atypical
141    IP protocols (those not usually parsed by networking hardware) can be
142    encapsulated in UDP packets to maximize deliverability and to
143    leverage flow specific mechanisms for routing and packet steering.

145    GUE provides an extensible header format for including optional data
146    in the encapsulation header. This data potentially covers items such
147    as a virtual networking identifier, security data for validating or
148    authenticating the GUE header, congestion control data, etc. GUE also
149    allows private optional data in the encapsulation header. This
150    feature can be used by a site or implementation to define local
151    custom optional data, and allows experimentation with options that may
152    eventually become standard.

154    This document does not define any specific GUE extensions.
155    [GUEEXTENS] specifies a set of core extensions and [GUE4NVO3] defines
156    an extension for using GUE with network virtualization.

158    The motivation for the GUE protocol is described in section 6.

160 2. Base packet format

162    A GUE packet consists of a UDP packet whose payload is a GUE
163    header followed by a payload which is either an encapsulated packet
164    of some IP protocol or a control message (like an OAM message). A GUE
165    packet has the general format:

167    +-------------------------------+
168    |                               |
169    |        UDP/IP header          |
170    |                               |
171    |-------------------------------|
172    |                               |
173    |         GUE Header            |
174    |                               |
175    |-------------------------------|
176    |                               |
177    |      Encapsulated packet      |
178    |      or control message       |
179    |                               |
180    +-------------------------------+

182    The GUE header is variable length as determined by the presence of
183    optional fields.

185 2.1. GUE version

187    The first two bits of the GUE header contain the GUE protocol version
188    number.
The rest of the fields after the GUE version number are
189    defined based on the version number. Versions 0x0 and 0x1 are
190    described in this specification; versions 0x2 and 0x3 are reserved.

192 3. Version 0

194    Version 0 of GUE defines a generic extensible format to encapsulate
195    packets by Internet protocol number.

197 3.1. Header format

199    The header format for version 0x0 of GUE in UDP is:

201     0                   1                   2                   3
202     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
203    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\
204    |        Source port            |      Destination port         | |
205    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP
206    |           Length              |          Checksum             | |
207    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/
208    |0x0|C|   Hlen  |  Proto/ctype  |             Flags             |
209    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
210    |                                                               |
211    ~                       Fields (optional)                       ~
212    |                                                               |
213    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
214    |                                                               |
215    ~                    Private data (optional)                    ~
216    |                                                               |
217    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

219    The contents of the UDP header are:

221      o Source port: This should be set to a flow entropy value for use
222        with ECMP. The properties of flow entropy are described in
223        section 5.11.

225      o Destination port: The GUE assigned port number, 6080.

227      o Length: Canonical length of the UDP packet (length of UDP header
228        and payload).

230      o Checksum: Standard UDP checksum (see section 5.7).

232    The GUE header consists of:

234      o Ver: GUE protocol version (0x0).

236      o C: Control flag. When set indicates a control message, not set
237        indicates a data message.

239      o Hlen: Length in 32-bit words of the GUE header, including
240        optional fields but not the first four bytes of the header.
241        Computed as (header_len - 4) / 4. All GUE headers are a multiple
242        of four bytes in length. Maximum header length is 128 bytes.
244      o Proto/ctype: When the C bit is set this field contains a control
245        message type for the payload (section 3.2.2). When C bit is not
246        set, the field holds the Internet protocol number for the
247        encapsulated packet in the payload (section 3.2.1). The control
248        message or encapsulated packet begins at the offset provided by
249        Hlen.

251      o Flags: Header flags that may be allocated for various purposes
252        and may indicate presence of optional fields. Undefined header
253        flag bits MUST be set to zero on transmission.

255      o Fields: Optional fields whose presence is indicated by
256        corresponding flags.

258      o Private data: Optional private data (see section 3.4). If
259        private data is present it immediately follows the last field
260        present in the header. The length of this data is determined by
261        subtracting the starting offset from the header length.

263 3.2. Proto/ctype field

265    When the C bit is not set, the proto/ctype field must be set to a
266    valid Internet protocol number. The protocol number serves as an
267    indication of the type of next protocol header which is contained in
268    the GUE payload at the offset indicated in Hlen. Intermediate devices
269    may parse the GUE payload per the number in the proto/ctype field,
270    and header flags cannot affect the interpretation of the proto/ctype
271    field.

273 3.2.1 Proto field

275    When the C bit is not set the proto/ctype field contains an IANA
276    Internet Protocol Number. The protocol number is interpreted relative
277    to the IP protocol that encapsulates the UDP packet (i.e. protocol of
278    the outer IP header).

280    For an IPv4 header the protocol may be set to any number except for
281    those that refer to IPv6 extension headers or ICMPv6 options (number
282    58). An exception is that the destination options extension header
283    using the PadN option may be used with IPv4 as described in section
284    3.6. The "no next header" protocol number (59) may be used with IPv4
285    as described below.
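
As a non-normative illustration, the first four bytes of a version 0 GUE header (section 3.1) could be unpacked as sketched below. The bit widths used (2-bit version, 1-bit C, 5-bit Hlen, 8-bit Proto/ctype, 16-bit Flags) are inferred from the header diagram, the 128-byte maximum header length, and the sixteen flag bits mentioned in section 3.3. The names GueBaseHeader, parse_gue_base_header, and payload_offset are hypothetical and are not defined by this specification.

```python
import struct
from typing import NamedTuple


class GueBaseHeader(NamedTuple):
    """First four bytes of a version 0 GUE header (hypothetical helper type)."""
    version: int      # 2 bits
    control: bool     # C bit: True for a control message
    hlen: int         # 5 bits: 32-bit words that follow the first four bytes
    proto_ctype: int  # IP protocol number, or control type when C is set
    flags: int        # 16 flag bits


def payload_offset(header: GueBaseHeader) -> int:
    """Offset of the encapsulated packet or control message in the UDP payload."""
    return 4 + header.hlen * 4


def parse_gue_base_header(udp_payload: bytes) -> GueBaseHeader:
    """Unpack and sanity-check the fixed part of a version 0 GUE header."""
    if len(udp_payload) < 4:
        raise ValueError("UDP payload too short for a GUE header")
    first, proto_ctype, flags = struct.unpack("!BBH", udp_payload[:4])
    header = GueBaseHeader(
        version=first >> 6,
        control=bool(first & 0x20),
        hlen=first & 0x1F,
        proto_ctype=proto_ctype,
        flags=flags,
    )
    if header.version != 0:
        raise ValueError("not a version 0 GUE header")
    if len(udp_payload) < payload_offset(header):
        raise ValueError("Hlen exceeds the UDP payload")
    return header
```

With Hlen at its largest value of 31, payload_offset returns 4 + 31 * 4 = 128, which matches the stated maximum header length.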
287    For an IPv6 header the protocol may be set to any defined protocol
288    number except Hop-by-hop options (number 0). If a received GUE packet
289    in IPv6 contains a protocol number that is an extension header (e.g.
290    Destination Options) then the extension header is processed after the
291    GUE header as though the GUE header itself were an extension header.

293    IP protocol number 59 ("No next header") may be set to indicate that
294    the GUE payload does not begin with the header of an IP protocol.
295    This would be the case, for instance, if the GUE payload were a
296    fragment when performing GUE level fragmentation. The interpretation
297    of the payload is performed through other means (such as flags and
298    optional fields), and intermediate devices must not parse the
299    packet based on the IP protocol number in this case.

301 3.2.2 Ctype field

303    When the C bit is set, the proto/ctype field must be set to a valid
304    control message type. A value of zero indicates that the GUE payload
305    requires further interpretation to deduce the control type. This
306    might be the case when the payload is a fragment of a control
307    message, where only the reassembled packet can be interpreted as a
308    control message.

310    Control message types 1 through 127 may be defined in standards.
311    Types 128 through 255 are reserved to be user defined for
312    experimentation or private control messages.

314    This document does not specify any standard control message types,
315    other than type 0, for GUE.

317 3.3. Flags and optional fields

319    Flags and associated optional fields are the primary mechanism of
320    extensibility in GUE. There are sixteen flag bits in the GUE header.

322    A flag may indicate presence of optional fields. The size of an
323    optional field indicated by a flag must be fixed.

325    Flags may be paired together to allow different lengths for an
326    optional field.
For example, if two flag bits are paired, a field may
327    have one of three different lengths. Regardless of how flag bits may
328    be paired, the lengths and offsets of optional fields corresponding
329    to a set of flags must be well defined.

331    Optional fields are placed in order of the flags. New flags are to be
332    allocated from high to low order bit contiguously without holes.
333    Flags allow random access: for instance, to inspect the field
334    corresponding to the Nth flag bit, an implementation only considers
335    the previous N-1 flags to determine the offset. Flags after the Nth
336    flag are not pertinent in calculating the offset of the Nth flag.

338    Flags (or paired flags) are idempotent such that new flags must not
339    cause reinterpretation of old flags. Also, new flags should not alter
340    interpretation of other elements in the GUE header nor how the
341    message is parsed (for instance, in a data message the proto/ctype
342    field always holds an IP protocol number as an invariant).

344    The set of available flags may be extended in the future by defining
345    a "flag extensions bit" that refers to a field containing a new set
346    of flags.

348 3.4. Private data

350    An implementation may use private data for its own use. The private
351    data immediately follows the last field in the GUE header and is not
352    a fixed length. This data is considered part of the GUE header and
353    must be accounted for in header length (Hlen). The length of the
354    private data must be a multiple of four and is determined by
355    subtracting the offset of private data in the GUE header from the
356    header length. Specifically:

358       Private_length = (Hlen * 4) - Length(flags)

360    Where "Length(flags)" returns the sum of lengths of all the optional
361    fields present in the GUE header. When there is no private data
362    present, length of the private data is zero.

364    The semantics and interpretation of private data are implementation
365    specific.
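
The offset rule of section 3.3 and the Private_length formula of section 3.4 can be sketched as follows. This document defines no flags, so the flag masks and field sizes in HYPOTHETICAL_FIELDS are invented purely for illustration, as are the function names; offsets are counted from the start of the GUE header.

```python
# Hypothetical flag table, high-order bit first: (flag mask, field size in bytes).
# No such flags are defined by this document; the values are assumptions.
HYPOTHETICAL_FIELDS = [(0x8000, 4), (0x4000, 8), (0x2000, 4)]
BASE_HEADER_LEN = 4  # the first four bytes, which Hlen does not count


def fields_length(flags: int) -> int:
    """Length(flags): sum of the sizes of all optional fields that are present."""
    known = sum(mask for mask, _ in HYPOTHETICAL_FIELDS)
    if flags & ~known:
        # Section 5.4 requires a decapsulator to drop packets with unknown flags.
        raise ValueError("unknown flag set")
    return sum(size for mask, size in HYPOTHETICAL_FIELDS if flags & mask)


def field_offset(flags: int, flag_mask: int) -> int:
    """Offset of the field for flag_mask; only higher-order flags are consulted."""
    offset = BASE_HEADER_LEN
    for mask, size in HYPOTHETICAL_FIELDS:
        if mask == flag_mask:
            if not flags & mask:
                raise ValueError("flag not set, so the field is absent")
            return offset
        if flags & mask:
            offset += size
    raise ValueError("flag not in the table")


def private_length(hlen: int, flags: int) -> int:
    """Private_length = (Hlen * 4) - Length(flags)."""
    length = hlen * 4 - fields_length(flags)
    if length < 0:
        raise ValueError("Hlen too small for the flagged fields")
    return length
```

For example, with flags 0xA000 (the first and third hypothetical flags) and Hlen of 3, the third field starts at offset 8 and four bytes of private data remain.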
The private data may be structured as necessary, for
366    instance it might contain its own set of flags and optional fields.

368    An encapsulator and decapsulator MUST agree on the meaning of private
369    data before using it. The mechanism to achieve this agreement is
370    outside the scope of this document but could include implementation-
371    defined behavior, coordinated configuration, in-band communication
372    using GUE control messages, and out-of-band messages.

374    If a decapsulator receives a GUE packet with private data, it MUST
375    validate the private data appropriately. If a decapsulator does not
376    expect private data from an encapsulator the packet MUST be dropped.
377    If a decapsulator cannot validate the contents of private data per
378    the provided semantics the packet MUST also be dropped. An
379    implementation may place security data in GUE private data which must
380    be verified for packet acceptance.

382 3.5. Message types

384 3.5.1. Control messages

386    Control messages are indicated in the GUE header when the C bit is
387    set. The payload is interpreted as a control message with type
388    specified in the proto/ctype field. The format and contents of the
389    control message are indicated by the type and can be variable length.

391    Other than interpreting the proto/ctype field as a control message
392    type, the meaning and semantics of the rest of the elements in the
393    GUE header are the same as that of data messages. Forwarding and
394    routing of control messages should be the same as that of a data
395    message with the same outer IP and UDP header and GUE flags; this
396    ensures that control messages can be created that follow the same
397    path as data messages.

399    Control messages can be defined for OAM type messages. For instance,
400    an echo request and corresponding echo reply message may be defined
401    to test for liveness.

403 3.5.2. Data messages

405    Data messages are indicated in GUE header with C bit not set.
The
406    payload of a data message is interpreted as an encapsulated packet of
407    an Internet protocol indicated in the proto/ctype field. The packet
408    immediately follows the GUE header.

410    Data messages are a primary means of encapsulation and can be used to
411    create tunnels for overlay networks.

413 3.6. Hiding the transport layer protocol number

415    The GUE header indicates the Internet protocol of the encapsulated
416    packet. This is either contained in the Proto/ctype field of the
417    primary GUE header, or is contained in the Payload Type field of a
418    GUE Transform Field (used to encrypt the payload with DTLS). If the
419    protocol number must be obfuscated, that is the transport protocol in
420    use must be hidden from the network, then a trivial destination
421    options extension header can be used at the beginning of the payload.

423    The PadN destination option can be used to encode the transport
424    protocol as a next header of an extension header (and maintain
425    alignment of encapsulated transport headers). The Proto/ctype field
426    or Payload Type field of the GUE Transform field is set to 60 to
427    indicate that the first encapsulated header is a Destination Options
428    extension header.

430    The format of the extension header is below:

432    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
433    | Next Header | 2 | 1 | 0 |
434    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

436    For IPv4, it is permitted in GUE to use this precise destination
437    option to contain the obfuscated protocol number. In this case next
438    header must refer to a valid IP protocol for IPv4. No other extension
439    headers or destination options are permitted with IPv4.

441 4. Version 1

443    Version 1 of GUE allows direct encapsulation of IPv4 and IPv6 in UDP.
444    In this version there is no GUE header; a UDP packet carries an IP
445    packet.
The first two bits of the UDP payload for GUE are the GUE
446    version and coincide with the first two bits of the version number in
447    the IP header. The first two version bits of IPv4 and IPv6 are 01, so
448    GUE version 1 is used for direct IP encapsulation, which makes the
449    two GUE version bits 01 as well.

451    This technique is effectively a means to compress out the GUE header
452    when encapsulating IPv4 or IPv6 packets and there are no flags or
453    optional fields present. This method is compatible to use on the same
454    port number as packets with the GUE header (GUE version 0
455    packets). This technique saves encapsulation overhead on costly links
456    for the common use of IP encapsulation, and also obviates the need to
457    allocate a separate port number for IP-over-UDP encapsulation.

459 4.1. Direct encapsulation of IPv4

461    The format for encapsulating IPv4 directly in UDP is demonstrated
462    below.
463     0                   1                   2                   3
464     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
465    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\
466    |        Source port            |      Destination port         | |
467    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP
468    |           Length              |          Checksum             | |
469    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/
470    |0|1|0|0|  IHL  |Type of Service|          Total Length         |
471    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
472    |         Identification        |Flags|      Fragment Offset    |
473    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
474    |  Time to Live |    Protocol   |         Header Checksum       |
475    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
476    |                      Source IPv4 Address                      |
477    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
478    |                   Destination IPv4 Address                    |
479    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

481    Note that the value 0100 in the IP version field expresses the GUE
482    version as 1 (bits 01) and the IP version as 4 (bits 0100).

484 4.2.
Direct encapsulation of IPv6

486    The format for encapsulating IPv6 directly in UDP is demonstrated
487    below.
488     0                   1                   2                   3
489     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
490    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\
491    |        Source port            |      Destination port         | |
492    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ UDP
493    |           Length              |          Checksum             | |
494    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+/
495    |0|1|1|0| Traffic Class |           Flow Label                  |
496    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
497    |         Payload Length        |     NextHdr   |   Hop Limit   |
498    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
499    |                                                               |
500    +                                                               +
501    |                                                               |
502    +                   Outer Source IPv6 Address                   +
503    |                                                               |
504    +                                                               +
505    |                                                               |
506    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
507    |                                                               |
508    +                                                               +
509    |                                                               |
510    +                Outer Destination IPv6 Address                 +
511    |                                                               |
512    +                                                               +
513    |                                                               |
514    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

516    Note that the value 0110 in the IP version field expresses the GUE
517    version as 1 (bits 01) and the IP version as 6 (bits 0110).

519 5. Operation

521    The figure below illustrates the use of GUE encapsulation between two
522    servers. Server 1 is sending packets to server 2. An encapsulator
523    performs encapsulation of packets from server 1. These encapsulated
524    packets traverse the network as UDP packets. At the decapsulator,
525    packets are decapsulated and sent on to server 2. Packet flow in the
526    reverse direction need not be symmetric; GUE encapsulation is not
527    required in the reverse path.
529    +---------------+                       +---------------+
530    |               |                       |               |
531    |   Server 1    |                       |   Server 2    |
532    |               |                       |               |
533    +---------------+                       +---------------+
534            |                                       ^
535            V                                       |
536    +---------------+   +---------------+   +---------------+
537    |               |   |               |   |               |
538    | Encapsulator  |-->|    Layer 3    |-->| Decapsulator  |
539    |               |   |    Network    |   |               |
540    +---------------+   +---------------+   +---------------+

542    The encapsulator and decapsulator may be co-resident with the
543    corresponding servers, or may be on separate nodes in the network.

545 5.1. Network tunnel encapsulation

547    Network tunneling can be achieved by encapsulating layer 2 or layer 3
548    packets. In this case the encapsulator and decapsulator nodes are the
549    tunnel endpoints. These could be routers that provide network tunnels
550    on behalf of communicating servers.

552 5.2. Transport layer encapsulation

554    When encapsulating layer 4 packets, the encapsulator and decapsulator
555    should be co-resident with the servers. In this case, the
556    encapsulation headers are inserted between the IP header and the
557    transport packet. The addresses in the IP header refer to both the
558    endpoints of the encapsulation and the endpoints for terminating the
559    transport protocol.

561 5.3. Encapsulator operation

563    Encapsulators create GUE data messages, set the fields of the UDP
564    header, set flags and optional fields in the GUE header, and forward
565    packets to a decapsulator.

567    An encapsulator may be an end host originating the packets of a flow,
568    or may be a network device performing encapsulation on behalf of
569    hosts (routers implementing tunnels for instance). In either case,
570    the intended target (decapsulator) is indicated by the outer
571    destination IP address.

573    If an encapsulator is tunneling packets, that is encapsulating
574    packets of layer 2 or layer 3 protocols (e.g. EtherIP, IPIP, ESP
575    tunnel mode), it should follow standard conventions for tunneling of
576    one IP protocol over another.
Diffserv interaction with tunnels is
577    described in [RFC2983]; ECN propagation for tunnels is described in
578    [RFC6040].

580 5.4. Decapsulator operation

582    A decapsulator performs decapsulation of GUE packets. A decapsulator
583    is addressed by the outer destination IP address of a GUE packet.
584    The decapsulator validates packets, including fields of the GUE
585    header. If a packet is acceptable, the UDP and GUE headers are
586    removed and the packet is resubmitted for protocol processing or
587    control message processing if it is a control message.

589    If a decapsulator receives a GUE packet with an unsupported version,
590    unknown flag, bad header length (too small for included optional
591    fields), unknown control message type, bad protocol number, an
592    unsupported payload type, or an otherwise malformed header, it MUST
593    drop the packet. Such events may be logged subject to configuration
594    and rate limiting of logging messages. No error message is returned
595    back to the encapsulator. Note that set flags in GUE that are unknown
596    to a decapsulator MUST NOT be ignored. If a GUE packet is received by
597    a decapsulator with unknown flags, the packet MUST be dropped.

599 5.5. Router and switch operation

601    Routers and switches should forward GUE packets as standard UDP/IP
602    packets. The outer five-tuple should contain sufficient information
603    to perform flow classification corresponding to the flow of the inner
604    packet. A switch should not normally need to parse a GUE header, and
605    none of the flags or optional fields in the GUE header should affect
606    routing.

608    A router should not modify a GUE header when forwarding a packet. It
609    may encapsulate a GUE packet in another GUE packet, for instance to
610    implement a network tunnel (i.e. by encapsulating an IP packet with a
611    GUE payload in another IP packet as a GUE payload).
   In this case the router takes the role of an encapsulator, and the
   corresponding decapsulator is the logical endpoint of the tunnel.

5.6.  Middlebox interactions

   A middlebox may interpret some flags and optional fields of the GUE
   header for classification purposes, but is not required to understand
   any of the flags or fields in GUE packets. A middlebox must not drop
   a GUE packet because there are flags unknown to it. The header length
   in the GUE header allows a middlebox to inspect the payload packet
   without needing to parse the flags or optional fields.

5.6.1.  Inferring connection semantics

   A middlebox may infer bidirectional connection semantics for a UDP
   flow. For instance, a stateful firewall may create a five-tuple rule
   to match flows on egress, and a corresponding five-tuple rule for
   matching ingress packets where the roles of source and destination
   are reversed for the IP addresses and UDP port numbers. To operate in
   this environment, a GUE tunnel must assume connected semantics
   defined by the UDP five-tuple, and the use of GUE encapsulation must
   be symmetric between both endpoints. The source port set in the UDP
   header must be the destination port the peer would set for replies.
   In this case the UDP source port for a tunnel would be a fixed value
   and not set to be flow entropy as described in section 5.11.

   The selection of whether to make the UDP source port fixed or set to
   a flow entropy value for each packet sent should be configurable for
   a tunnel.

5.6.2.  NAT

   IP address and port translation can be performed on the UDP/IP
   headers adhering to the requirements for NAT with UDP [RFC4787]. In
   the case of stateful NAT, connection semantics must be applied to a
   GUE tunnel as described in section 5.6.1.

5.7.  Checksum Handling

   This section describes the requirements around the UDP checksum.
   Checksums are an important consideration in that they can provide
   end to end validation and protect against packet mis-delivery. The
   latter is made possible by the inclusion of a pseudo header that
   covers the IP addresses and UDP ports of the encapsulating headers.

5.7.1.  Checksum requirements

   The potential for mis-delivery of packets due to corruption of IP,
   UDP, or GUE headers must be considered. One of the following
   requirements must be met:

      o  UDP checksums are enabled (for IPv4 or IPv6).

      o  The GUE header checksum is used (defined in [GUEEXTENS]).

      o  Zero UDP checksums are used in accordance with applicable
         requirements in [GREUDP], [RFC6935], and [RFC6936].

5.7.2.  UDP Checksum with IPv4

   For UDP in IPv4, the UDP checksum MUST be processed as specified in
   [RFC0768] and [RFC1122] for both transmit and receive. An
   encapsulator MAY set the UDP checksum to zero for performance or
   implementation considerations. The IPv4 header includes a checksum
   that protects against mis-delivery of the packet due to corruption
   of IP addresses. The UDP checksum potentially provides protection
   against corruption of the UDP header, GUE header, and GUE payload.
   Enabling or disabling the use of checksums is a deployment
   consideration that should take into account the risk and effects of
   packet corruption, and whether the packets in the network are
   already adequately protected by other, possibly stronger mechanisms
   such as the Ethernet CRC. If an encapsulator sets a zero UDP
   checksum for IPv4 it SHOULD use the GUE header checksum as described
   in [GUEEXTENS].

   When a decapsulator receives a packet, the UDP checksum field MUST
   be processed. If the UDP checksum is non-zero, the decapsulator MUST
   verify the checksum before accepting the packet. By default a
   decapsulator SHOULD accept UDP packets with a zero checksum.
   A node MAY be configured to disallow zero checksums per [RFC1122];
   this may be done selectively, for instance disallowing zero checksums
   from certain hosts that are known to be sending over paths subject to
   packet corruption. If verification of a non-zero checksum fails, a
   decapsulator lacks the capability to verify a non-zero checksum, or
   a packet with a zero checksum was received and the decapsulator is
   configured to disallow zero checksums, the packet MUST be dropped.

5.7.3.  UDP Checksum with IPv6

   For UDP in IPv6, the UDP checksum MUST be processed as specified in
   [RFC0768] and [RFC2460] for both transmit and receive. Unlike IPv4,
   there is no header checksum in IPv6 that protects against
   mis-delivery due to address corruption. Therefore, when GUE is used
   over IPv6, either the UDP checksum must be enabled or the GUE header
   checksum must be used. An encapsulator MAY set a zero UDP checksum
   for performance or implementation reasons, in which case the GUE
   header checksum MUST be used or applicable requirements for using
   zero UDP checksums in [GREUDP] MUST be met. If the UDP checksum is
   enabled, then the GUE header checksum should not be used since it is
   mostly redundant.

   When a decapsulator receives a packet, the UDP checksum field MUST
   be processed. If the UDP checksum is non-zero, the decapsulator MUST
   verify the checksum before accepting the packet. By default a
   decapsulator MUST only accept UDP packets with a zero checksum if
   the GUE header checksum is used and is verified. If verification of
   a non-zero checksum fails, a decapsulator lacks the capability to
   verify a non-zero checksum, or a packet with a zero checksum and no
   GUE header checksum was received, the packet MUST be dropped.

5.8.  MTU and fragmentation

   Standard conventions for handling of MTU (Maximum Transmission Unit)
   and fragmentation in conjunction with networking tunnels
   (encapsulation of layer 2 or layer 3 packets) should be followed.
   Details are described in MTU and Fragmentation Issues with
   In-the-Network Tunneling [RFC4459].

   If a packet is fragmented before encapsulation in GUE, all the
   related fragments must be encapsulated using the same UDP source
   port. An operator may set MTU to account for encapsulation overhead
   and reduce the likelihood of fragmentation.

   As an alternative to IP fragmentation, the GUE fragmentation
   extension can be used. GUE fragmentation is described in
   [GUEEXTENS].

5.9.  Congestion control

   Per requirements of [RFC5405], if the IP traffic encapsulated with
   GUE implements proper congestion control, no additional mechanisms
   should be required.

   In the case that the encapsulated traffic does not implement any or
   sufficient control, or it is not known whether a transmitter will
   consistently implement proper congestion control, then congestion
   control at the encapsulation layer must be provided. Note this case
   applies to a significant use case in network virtualization in which
   guests run third party networking stacks that cannot be implicitly
   trusted to implement conformant congestion control.

   Out of band mechanisms such as rate limiting, Managed Circuit
   Breaker [CIRCBRK], or traffic isolation may be used to provide
   rudimentary congestion control. For finer grained congestion control
   that allows alternate congestion control algorithms, reaction time
   within an RTT, and interaction with ECN, in-band mechanisms may be
   warranted.

   DCCP [RFC4340] may be used to provide congestion control for
   encapsulated flows. In this case, the protocol stack for an IP
   tunnel may be IP-GUE-DCCP-IP.
   Alternatively, GUE can be extended to include congestion control
   (with related data carried in GUE optional fields). Congestion
   control mechanisms for GUE will be elaborated in other
   specifications.

5.10.  Multicast

   GUE packets may be multicast to decapsulators using a multicast
   destination address in the encapsulating IP headers. Each receiving
   host will decapsulate the packet independently following normal
   decapsulator operations. The receiving decapsulators should agree on
   the same set of GUE parameters and properties; how such an agreement
   is reached is outside the scope of this document.

   GUE allows encapsulation of unicast, broadcast, or multicast
   traffic. Flow entropy (the value in the UDP source port) may be
   generated from the header of encapsulated unicast or
   broadcast/multicast packets at an encapsulator. The mapping
   mechanism between the encapsulated multicast traffic and the
   multicast capability in the IP network is transparent to and
   independent of the encapsulation and is otherwise outside the scope
   of this document.

5.11.  Flow entropy for ECMP

5.11.1.  Flow classification

   A major objective of using GUE is that a network device can perform
   flow classification corresponding to the flow of the inner
   encapsulated packet based on the contents in the outer headers.

   Hardware devices commonly perform hash computations on packet
   headers to classify packets into flows or flow buckets. Flow
   classification is done to support load balancing (statistical
   multiplexing) of flows across a set of networking resources.
   Examples of such load balancing techniques are Equal Cost Multipath
   routing (ECMP), port selection in Link Aggregation, and NIC device
   Receive Side Scaling (RSS).
   Hashes are usually either a three-tuple hash of IP protocol, source
   address, and destination address; or a five-tuple hash consisting of
   IP protocol, source address, destination address, source port, and
   destination port. Typically, networking hardware will compute
   five-tuple hashes for TCP and UDP, but only three-tuple hashes for
   other IP protocols. Since the five-tuple hash provides more
   granularity, load balancing can be finer grained with better
   distribution. When a packet is encapsulated with GUE, the source
   port in the outer UDP packet is set to a flow entropy value that
   corresponds to the flow of the inner packet. When a device computes
   a five-tuple hash on the outer UDP/IP header of a GUE packet, the
   resultant value classifies the packet per its inner flow.

   Examples of deriving a flow entropy for encapsulation are:

      o  If the encapsulated packet is a layer 4 packet, TCP/IPv4 for
         instance, the flow entropy could be based on the canonical
         five-tuple hash of the inner packet.

      o  If the encapsulated packet is an AH transport mode packet with
         TCP as next header, the flow entropy could be a hash over a
         three-tuple: TCP protocol and TCP ports of the encapsulated
         packet.

      o  If a node is encrypting a packet using ESP tunnel mode and GUE
         encapsulation, the flow entropy could be based on the contents
         of the clear-text packet. For instance, a canonical five-tuple
         hash for a TCP/IP packet could be used.

5.11.2.  Flow entropy properties

   The flow entropy is the value set in the UDP source port of a
   GUE packet. Flow entropy in the UDP source port should adhere to
   the following properties:

      o  The value set in the source port should be within the ephemeral
         port range. IANA suggests this range to be 49152 to 65535,
         where the high order two bits of the port are set to one. This
         provides fourteen bits of entropy for the value.

      o  The flow entropy should have a uniform distribution across
         encapsulated flows.

      o  An encapsulator may occasionally change the flow entropy used
         for an inner flow at its discretion (for security, route
         selection, etc.). To avoid thrashing or flapping the value, the
         flow entropy used for a flow should not change more than once
         every thirty seconds (or a configurable value).

      o  Decapsulators, or any networking devices, should not attempt to
         interpret flow entropy as anything more than an opaque value.
         Neither should they attempt to reproduce the hash calculation
         used by an encapsulator in creating a flow entropy value. They
         may use the value to match further received packets for
         steering decisions, but cannot assume that the hash uniquely or
         permanently identifies a flow.

      o  Input to the flow entropy calculation is not restricted to
         ports and addresses; input could include the flow label from an
         IPv6 packet, the SPI from an ESP packet, or other flow related
         state in the encapsulator that is not necessarily conveyed in
         the packet.

      o  The assignment function for flow entropy should be randomly
         seeded to mitigate denial of service attacks. The seed may be
         changed periodically.

6.  Motivation for GUE

   This section presents the motivation for GUE with respect to other
   encapsulation methods.

   A number of different encapsulation techniques have been proposed for
   the encapsulation of one protocol over another. EtherIP [RFC3378]
   provides layer 2 tunneling of Ethernet frames over IP. GRE [RFC2784],
   MPLS [RFC4023], and L2TP [RFC2661] provide methods for tunneling
   layer 2 and layer 3 packets over IP. NVGRE [RFC7637] and VXLAN
   [RFC7348] are proposals for encapsulation of layer 2 packets for
   network virtualization. IPIP [RFC2003] and Generic packet tunneling
   in IPv6 [RFC2473] provide methods for tunneling IP packets over IP.

   Several proposals exist for encapsulating packets over UDP including
   ESP over UDP [RFC3948], TCP directly over UDP [TCPUDP], VXLAN
   [RFC7348], LISP [RFC6830] which encapsulates layer 3 packets,
   MPLS/UDP [RFC7510], and Generic UDP Encapsulation for IP Tunneling
   (GRE over UDP) [GREUDP]. Generic UDP tunneling [GUT] is a proposal
   similar to GUE in that it aims to tunnel packets of IP protocols over
   UDP.

   GUE has the following discriminating features:

      o  UDP encapsulation leverages specialized network device
         processing for efficient transport. The semantics for using the
         UDP source port for flow entropy as input to ECMP are defined
         in section 5.11.

      o  GUE permits encapsulation of arbitrary IP protocols, which
         includes layer 2, 3, and 4 protocols.

      o  Multiple protocols can be multiplexed over a single UDP port
         number. This is in contrast to techniques to encapsulate
         protocols over UDP using a protocol specific port number (such
         as ESP/UDP, GRE/UDP, SCTP/UDP). GUE provides a uniform and
         extensible mechanism for encapsulating all IP protocols in UDP
         with minimal overhead (four bytes of additional header).

      o  GUE is extensible. New flags and optional fields can be
         defined.

      o  The GUE header includes a header length field. This allows a
         network node to inspect an encapsulated packet without needing
         to parse the full encapsulation header.

      o  Private data in the encapsulation header allows local
         customization and experimentation while being compatible with
         processing in network nodes (routers and middleboxes).

      o  GUE includes both data messages (encapsulation of packets) and
         control messages (such as OAM).

7.  Security Considerations

   Security for GUE can be provided by lower layers or through GUE
   security extensions.

   IPsec in transport mode may be used to authenticate or encrypt GUE
   packets (GUE header and payload).
   Existing network security mechanisms, such as address spoofing
   detection, DDoS mitigation, and transparent encrypted tunnels can be
   applied to GUE packets.

   Advanced security extensions for Generic UDP Encapsulation, including
   security for the GUE header and payload, are described in detail in
   [GUESEC].

   Encapsulation of IP protocols within GUE should not increase security
   risk, nor provide additional security in itself. A hash function for
   computing flow entropy (section 5.11) should be randomly seeded to
   mitigate some possible denial of service attacks.

8.  IANA Considerations

8.1.  UDP source port

   A user UDP port number has been assigned for GUE:

      Service Name: gue
      Transport Protocol(s): UDP
      Assignee: Tom Herbert
      Contact: Tom Herbert
      Description: Generic UDP Encapsulation
      Reference: draft-herbert-gue
      Port Number: 6080
      Service Code: N/A
      Known Unauthorized Uses: N/A
      Assignment Notes: N/A

8.2.  GUE version number

   IANA is requested to set up a registry for the GUE version number.
   The GUE version number is 2 bits containing four possible values.
   This document defines versions 0 and 1. New values are assigned via
   Standards Action [RFC5226].

   +----------------+-------------+---------------+
   | Version number | Description | Reference     |
   +----------------+-------------+---------------+
   | 0              | Version 0   | This document |
   |                |             |               |
   | 1              | Version 1   | This document |
   |                |             |               |
   | 2..3           | Unassigned  |               |
   +----------------+-------------+---------------+

8.3.  Control types

   IANA is requested to set up a registry for the GUE control types.
   Control types are 8 bit values. New values are assigned via
   Standards Action [RFC5226].

   +----------------+------------------+---------------+
   | Control type   | Description      | Reference     |
   +----------------+------------------+---------------+
   | 0              | Need further     | This document |
   |                | interpretation   |               |
   |                |                  |               |
   | 1..127         | Unassigned       |               |
   |                |                  |               |
   | 128..255       | User defined     | This document |
   +----------------+------------------+---------------+

8.4.  Flag-fields

   IANA is requested to create a "GUE flag-fields" registry to allocate
   flags and optional fields for the GUE header flags and extension
   fields. This shall be a registry of bit assignments for flags, length
   of optional fields for corresponding flags, and descriptive strings.
   There are sixteen bits for primary GUE header flags (bit numbers
   0-15). New values are assigned via Standards Action [RFC5226].

   +-------------+--------------+-------------+--------------------+
   | Flags bits  | Field size   | Description | Reference          |
   +-------------+--------------+-------------+--------------------+
   | Bit 0       | 4 bytes      | VNID        | [GUE4NVO3]         |
   |             |              |             |                    |
   | Bit 1..2    | 01->8 bytes  | Security    | [GUEEXTENS]        |
   |             | 10->16 bytes |             |                    |
   |             | 11->32 bytes |             |                    |
   |             |              |             |                    |
   | Bit 3       | 4 bytes      | Checksum    | [GUEEXTENS]        |
   |             |              |             |                    |
   | Bit 4       | 8 bytes      | Fragmen-    | [GUEEXTENS]        |
   |             |              | tation      |                    |
   |             |              |             |                    |
   | Bit 5       | 4 bytes      | Payload     | [GUEEXTENS]        |
   |             |              | transform   |                    |
   |             |              |             |                    |
   | Bit 6       | 4 bytes      | Remote      | [GUEEXTENS]        |
   |             |              | checksum    |                    |
   |             |              | offload     |                    |
   |             |              |             |                    |
   | Bit 7..15   |              | Unassigned  |                    |
   +-------------+--------------+-------------+--------------------+

   New flags are to be allocated from high to low order bit contiguously
   without holes.

9.  Acknowledgements

   The authors would like to thank David Liu, Erik Nordmark, Fred
   Templin, and Adrian Farrel for valuable input on this draft.

10.  References

10.1.  Normative References

   [RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768,
              DOI 10.17487/RFC0768, August 1980.

   [RFC2434]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", RFC 2434,
              DOI 10.17487/RFC2434, October 1998.

   [RFC2983]  Black, D., "Differentiated Services and Tunnels",
              RFC 2983, DOI 10.17487/RFC2983, October 2000.

   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
              Notification", RFC 6040, DOI 10.17487/RFC6040,
              November 2010.

   [RFC6936]  Fairhurst, G. and M. Westerlund, "Applicability Statement
              for the Use of IPv6 UDP Datagrams with Zero Checksums",
              RFC 6936, DOI 10.17487/RFC6936, April 2013.

   [RFC4459]  Savola, P., "MTU and Fragmentation Issues with
              In-the-Network Tunneling", RFC 4459,
              DOI 10.17487/RFC4459, April 2006.

10.2.  Informative References

   [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
              DOI 10.17487/RFC2003, October 1996.

   [RFC3948]  Huttunen, A., Swander, B., Volpe, V., DiBurro, L., and M.
              Stenberg, "UDP Encapsulation of IPsec ESP Packets",
              RFC 3948, DOI 10.17487/RFC3948, January 2005.

   [RFC6830]  Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The
              Locator/ID Separation Protocol (LISP)", RFC 6830,
              DOI 10.17487/RFC6830, January 2013.

   [RFC3378]  Housley, R. and S. Hollenbeck, "EtherIP: Tunneling
              Ethernet Frames in IP Datagrams", RFC 3378,
              DOI 10.17487/RFC3378, September 2002.

   [RFC2784]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
              Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
              DOI 10.17487/RFC2784, March 2000.

   [RFC4023]  Worster, T., Rekhter, Y., and E. Rosen, Ed.,
              "Encapsulating MPLS in IP or Generic Routing Encapsulation
              (GRE)", RFC 4023, DOI 10.17487/RFC4023, March 2005.

   [RFC2661]  Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn,
              G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"",
              RFC 2661, DOI 10.17487/RFC2661, August 1999.

   [RFC5925]  Touch, J., Mankin, A., and R. Bonica, "The TCP
              Authentication Option", RFC 5925, DOI 10.17487/RFC5925,
              June 2010.

   [RFC3828]  Larzon, L-A., Degermark, M., Pink, S., Jonsson, L-E., Ed.,
              and G. Fairhurst, Ed., "The Lightweight User Datagram
              Protocol (UDP-Lite)", RFC 3828, July 2004.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
              eXtensible Local Area Network (VXLAN): A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", RFC 7348, August 2014.

   [RFC7605]  Touch, J., "Recommendations on Using Assigned Transport
              Port Numbers", BCP 165, RFC 7605, DOI 10.17487/RFC7605,
              August 2015.

   [RFC7637]  Garg, P., Ed., and Y. Wang, Ed., "NVGRE: Network
              Virtualization Using Generic Routing Encapsulation",
              RFC 7637, DOI 10.17487/RFC7637, September 2015.

   [RFC7510]  Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black,
              "Encapsulating MPLS in UDP", RFC 7510,
              DOI 10.17487/RFC7510, April 2015.

   [RFC4340]  Kohler, E., Handley, M., and S. Floyd, "Datagram
              Congestion Control Protocol (DCCP)", RFC 4340,
              DOI 10.17487/RFC4340, March 2006.

   [GUEEXTENS]  Herbert, T., Yong, L., and Templin, F., "Extensions for
              Generic UDP Encapsulation",
              draft-herbert-gue-extensions-00

   [GUE4NVO3]  Yong, L., Herbert, T., and Zia, O., "Generic UDP
              Encapsulation (GUE) for Network Virtualization Overlay",
              draft-hy-nvo3-gue-4-nvo-03

   [TCPUDP]   Cheshire, S., Graessley, J., and McGuire, R.,
              "Encapsulation of TCP and other Transport Protocols over
              UDP", draft-cheshire-tcp-over-udp-00

   [GREUDP]   Crabbe, E., Yong, L., Xu, X., and Herbert, T.,
              "Generic UDP Encapsulation for IP Tunneling",
              draft-ietf-tsvwg-gre-in-udp-encap-13

   [GUT]      Manner, J., Varia, N., and Briscoe, B., "Generic UDP
              Tunnelling (GUT)", draft-manner-tsvwg-gut-02.txt

   [CIRCBRK]  Fairhurst, G., "Network Transport Circuit Breakers",
              draft-ietf-tsvwg-circuit-breaker-15

Appendix A: NIC processing for GUE

   This appendix provides some guidelines for Network Interface Cards
   (NICs) to implement common offloads and accelerations to support GUE.
   Note that most of this discussion is generally applicable to other
   methods of UDP based encapsulation.

   This appendix is informational and does not constitute a normative
   part of this document.

A.1.  Receive multi-queue

   Contemporary NICs support multiple receive descriptor queues
   (multi-queue). Multi-queue enables load balancing of network
   processing for a NIC across multiple CPUs. On packet reception, a NIC
   must select the appropriate queue for host processing.
   Receive Side Scaling is a common method which uses the flow hash for
   a packet to index an indirection table where each entry stores a
   queue number. Flow Director and Accelerated Receive Flow Steering
   (aRFS) allow a host to program the queue that is used for a given
   flow, which is identified either by an explicit five-tuple or by the
   flow's hash.

   GUE encapsulation should be compatible with multi-queue NICs that
   support five-tuple hash calculation for UDP/IP packets as input to
   RSS. The flow entropy in the UDP source port ensures classification
   of the encapsulated flow even in the case that the outer source and
   destination addresses are the same for all flows (e.g. all flows are
   going over a single tunnel).

   By default, UDP RSS support is often disabled in NICs to avoid out of
   order reception that can occur when UDP packets are fragmented. As
   discussed above, fragmentation of GUE packets should be mitigated by
   fragmenting packets before entering a tunnel, path MTU discovery in
   higher layer protocols, or operators adjusting MTUs. Other UDP
   traffic may not implement such procedures to avoid fragmentation, so
   enabling UDP RSS support in the NIC should be a considered tradeoff
   during configuration.

A.2.  Checksum offload

   Many NICs provide capabilities to calculate the standard ones'
   complement payload checksum for packets in transmit or receive. When
   using GUE encapsulation there are at least two checksums that may be
   of interest: the encapsulated packet's transport checksum, and the
   UDP checksum in the outer header.

A.2.1.  Transmit checksum offload

   NICs may provide a protocol agnostic method to offload transmit
   checksum (NETIF_F_HW_CSUM in Linux parlance) that can be used with
   GUE. In this method the host provides checksum related parameters in
   a transmit descriptor for a packet.
   These parameters include the starting offset of data to checksum,
   the length of data to checksum, and the offset in the packet where
   the computed checksum is to be written. The host initializes the
   checksum field to the pseudo header checksum.

   In the case of GUE, the checksum for an encapsulated transport layer
   packet, a TCP packet for instance, can be offloaded by setting the
   appropriate checksum parameters.

   NICs typically can offload only one transmit checksum per packet, so
   simultaneously offloading both an inner transport packet's checksum
   and the outer UDP checksum is likely not possible. In this case
   setting the UDP checksum to zero (per the above discussion) and
   offloading the inner transport packet checksum might be acceptable.

   If an encapsulator is co-resident with a host, then checksum offload
   may be performed using remote checksum offload (described in
   [GUEEXTENS]). Remote checksum offload relies on NIC offload of the
   simple UDP/IP checksum which is commonly supported even in legacy
   devices. In remote checksum offload the outer UDP checksum is set and
   the GUE header includes an option indicating the start and offset of
   the inner "offloaded" checksum. The inner checksum is initialized to
   the pseudo header checksum. When a decapsulator receives a GUE packet
   with the remote checksum offload option, it completes the offload
   operation by determining the packet checksum from the indicated start
   point to the end of the packet, and then adds this into the checksum
   field at the offset given in the option. Computing the checksum from
   the start to end of packet is efficient if checksum-complete is
   provided on the receiver.

A.2.2.  Receive checksum offload

   GUE is compatible with NICs that perform a protocol agnostic receive
   checksum (CHECKSUM_COMPLETE in Linux parlance).
   In this technique, a NIC computes a ones' complement checksum over
   all (or some predefined portion) of a packet. The computed value is
   provided to the host stack in the packet's receive descriptor. The
   host driver can use this checksum to "patch up" and validate any
   inner packet transport checksum, as well as the outer UDP checksum if
   it is non-zero.

   Many legacy NICs don't provide checksum-complete but instead provide
   an indication that a checksum has been verified (CHECKSUM_UNNECESSARY
   in Linux). Usually, such validation is only done for simple TCP/IP or
   UDP/IP packets. If a NIC indicates that a UDP checksum is valid, the
   checksum-complete value for the UDP packet is the "not" of the pseudo
   header checksum. In this way, checksum-unnecessary can be converted
   to checksum-complete. So if the NIC provides checksum-unnecessary for
   the outer UDP header in an encapsulation, checksum conversion can be
   done so that the checksum-complete value is derived and can be used
   by the stack to validate any checksums in the encapsulated packet.

A.3.  Transmit Segmentation Offload

   Transmit Segmentation Offload (TSO) is a NIC feature where a host
   provides a large (>MTU size) TCP packet to the NIC, which in turn
   splits the packet into separate segments and transmits each one. This
   is useful to reduce CPU load on the host.

   The process of TSO can be generalized as:

      -  Split the TCP payload into segments which allow packets with
         size less than or equal to MTU.

      -  For each created segment:

         1.  Replicate the TCP header and all preceding headers of the
             original packet.

         2.  Set payload length fields in any headers to reflect the
             length of the segment.

         3.  Set the TCP sequence number to correctly reflect the offset
             of the TCP data in the stream.

         4.  Recompute and set any checksums that either cover the
             payload of the packet or cover a header which was changed
             by setting a payload length.

   Following this general process, TSO can be extended to support TCP
   encapsulation in GUE. For each segment the Ethernet header, outer IP
   header, UDP header, GUE header, inner IP header (if tunneling), and
   TCP header are replicated. Any packet length header fields need to be
   set properly (including the length in the outer UDP header), and
   checksums need to be set correctly (including the outer UDP checksum
   if being used).

   To facilitate TSO with GUE it is recommended that optional fields
   not contain values that must be updated on a per segment basis; for
   example, the GUE fields should not include checksums, lengths, or
   sequence numbers that refer to the payload. If the GUE header does
   not contain such fields then the TSO engine only needs to copy the
   bits in the GUE header when creating each segment and does not need
   to parse the GUE header.

A.4.  Large Receive Offload

   Large Receive Offload (LRO) is a NIC feature where packets of a TCP
   connection are reassembled, or coalesced, in the NIC and delivered to
   the host as one large packet. This feature can reduce CPU utilization
   in the host.

   LRO requires significant protocol awareness to be implemented
   correctly and is difficult to generalize. Packets in the same flow
   need to be unambiguously identified. In the presence of tunnels or
   network virtualization, this may require more than a five-tuple match
   (for instance packets for flows in two different virtual networks may
   have identical five-tuples). Additionally, a NIC needs to perform
   validation over packets that are being coalesced, and needs to
   fabricate a single meaningful header from all the coalesced packets.
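
   As an informal illustration only (not part of this specification),
   the following Python sketch shows why an inner five-tuple alone may
   be insufficient to identify packets eligible for coalescing. The
   names FiveTuple, GueLroKey, and may_coalesce are hypothetical, and
   the GUE flags and optional fields are treated as opaque bytes rather
   than parsed; the byte values used are placeholders, not a defined
   encoding.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FiveTuple:
    """Inner flow identity: IP protocol, addresses, and ports."""
    protocol: int
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int


@dataclass(frozen=True)
class GueLroKey:
    """Hypothetical LRO match key; field names are illustrative only."""
    outer_src_addr: str
    outer_dst_addr: str
    outer_src_port: int
    outer_dst_port: int
    gue_protocol: int
    gue_flags_and_fields: bytes  # opaque copy of GUE flags and optional fields
    inner: FiveTuple


def may_coalesce(a: GueLroKey, b: GueLroKey) -> bool:
    """Allow coalescing only if outer headers, GUE header, and inner flow match."""
    return a == b


# Two packets with the same inner five-tuple whose GUE optional field
# contents differ (placeholder bytes standing in for, say, two different
# virtual networks).
inner = FiveTuple(6, "10.0.0.1", "10.0.0.2", 40000, 80)
pkt_a = GueLroKey("192.0.2.1", "192.0.2.2", 51000, 6080, 6,
                  b"\x00\x00\x00\x01", inner)
pkt_b = replace(pkt_a, gue_flags_and_fields=b"\x00\x00\x00\x02")

print(may_coalesce(pkt_a, pkt_a))  # True
print(may_coalesce(pkt_a, pkt_b))  # False
```

   Comparing the whole key, rather than only the inner five-tuple,
   keeps packets that were encapsulated differently from being merged
   into one flow, at the cost of a wider match.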

The conservative approach to supporting LRO for GUE would be to assign
packets to the same flow only if they have an identical five-tuple and
were encapsulated the same way. That is, the outer IP addresses, the
outer UDP ports, GUE protocol, GUE flags and fields, and inner
five-tuple are all identical.

Appendix B: Implementation considerations

This appendix is informational and does not constitute a normative part
of this document.

B.1. Privileged ports

Using the source port to contain a flow entropy value precludes the
security method in which a receiver enforces that the source port be a
privileged port. Privileged ports are defined by some operating systems
to restrict source port binding. Unix, for instance, considers port
numbers less than 1024 to be privileged.

Enforcing that packets are sent from a privileged port is widely
considered an inadequate security mechanism and has been mostly
deprecated. To approximate this behavior, an implementation could
restrict a user from sending a packet destined to the GUE port without
proper credentials.

B.2. Setting flow entropy as a route selector

An encapsulator generating flow entropy in the UDP source port may
modulate the value to perform a type of multipath source routing.
Assuming that networking switches perform ECMP based on the flow hash, a
sender can affect the path by altering the flow entropy. For instance, a
host may store a flow hash in its PCB for an inner flow, and may alter
the value upon detecting that packets are traversing a lossy path.
Changing the flow entropy for a flow should be subject to hysteresis (at
most once every thirty seconds) to limit the number of out-of-order
packets.

B.3. Hardware protocol implementation considerations

A low-level protocol, such as GUE, is likely to be of interest for
support in high-speed network devices. Variable length header (VLH)
protocols like GUE are often considered difficult to implement
efficiently in hardware. In order to retain the important
characteristics of an extensible and robust protocol, hardware vendors
may practice "constrained flexibility". In this model, only certain
combinations or protocol header parameterizations are implemented in the
hardware fast path. Each such parameterization is fixed length so that
the particular instance can be optimized as a fixed length protocol. In
the case of GUE, this means specific combinations of GUE flags, fields,
and next protocol. The selected combinations would naturally be the most
common cases, which form the "fast path"; other combinations are assumed
to take the "slow path".

Over time, the needs and requirements of the protocol may change, which
may manifest as new parameterizations to be supported in the fast path.
To allow this extensibility, a device practicing constrained flexibility
should allow the fast path parameterizations to be programmable.

Authors' Addresses

   Tom Herbert
   Facebook
   1 Hacker Way
   Menlo Park, CA 94052
   US

   Email: tom@herbertland.com

   Lucy Yong
   Huawei USA
   5340 Legacy Dr.
   Plano, TX 75024
   US

   Email: lucy.yong@huawei.com

   Osama Zia
   Microsoft
   1 Microsoft Way
   Redmond, WA 98029
   US

   Email: osamaz@microsoft.com