Network Working Group                                      J. Gross, Ed.
Internet-Draft                                                    VMware
Intended status: Standards Track                           I. Ganga, Ed.
Expires: July 17, 2016                                             Intel
                                                        January 14, 2016


          Geneve: Generic Network Virtualization Encapsulation
                       draft-ietf-nvo3-geneve-01

Abstract

   Network virtualization involves the cooperation of devices with a
   wide variety of capabilities such as software and hardware tunnel
   endpoints, transit fabrics, and centralized control clusters. As a
   result of their role in tying together different elements in the
   system, the requirements on tunnels are influenced by all of these
   components. Flexibility is therefore the most important aspect of a
   tunnel protocol if it is to keep pace with the evolution of the
   system. This draft describes Geneve, a protocol designed to
   recognize and accommodate these changing capabilities and needs.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 17, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
     1.2.  Terminology
   2.  Design Requirements
     2.1.  Control Plane Independence
     2.2.  Data Plane Extensibility
       2.2.1.  Efficient Implementation
     2.3.  Use of Standard IP Fabrics
   3.  Geneve Encapsulation Details
     3.1.  Geneve Frame Format Over IPv4
     3.2.  Geneve Frame Format Over IPv6
     3.3.  UDP Header
     3.4.  Tunnel Header Fields
     3.5.  Tunnel Options
       3.5.1.  Options Processing
   4.  Implementation and Deployment Considerations
     4.1.  Encapsulation of Geneve in IP
       4.1.1.  IP Fragmentation
       4.1.2.  DSCP and ECN
       4.1.3.  Broadcast and Multicast
       4.1.4.  Unidirectional Tunnels
     4.2.  Constraints on Protocol Features
     4.3.  NIC Offloads
     4.4.  Inner VLAN Handling
   5.  Interoperability Issues
   6.  Security Considerations
   7.  IANA Considerations
   8.  Contributors
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   Networking has long featured a variety of tunneling, tagging, and
   other encapsulation mechanisms. However, the advent of network
   virtualization has caused a surge of renewed interest and a
   corresponding increase in the introduction of new protocols. The
   large number of protocols in this space, ranging all the way from
   VLANs [IEEE.802.1Q-2014] and MPLS [RFC3031] through the more recent
   VXLAN [RFC7348], NVGRE [RFC7637], and STT [I-D.davie-stt], often
   leads to questions about the need for new encapsulation formats and
   what it is about network virtualization in particular that leads to
   their proliferation.

   While many encapsulation protocols seek to simply partition the
   underlay network or bridge between two domains, network
   virtualization views the transit network as providing connectivity
   between multiple components of a distributed system. In many ways
   this system is similar to a chassis switch with the IP underlay
   network playing the role of the backplane and tunnel endpoints on the
   edge as line cards. When viewed in this light, the requirements
   placed on the tunnel protocol are significantly different in terms of
   the quantity of metadata necessary and the role of transit nodes.

   Current work such as [VL2] and the NVO3 working group
   [I-D.ietf-nvo3-dataplane-requirements] have described some of the
   properties that the data plane must have to support network
   virtualization. However, one additional defining requirement is the
   need to carry system state along with the packet data. The use of
   some metadata is certainly not a foreign concept - nearly all
   protocols used for virtualization have at least 24 bits of identifier
   space as a way to partition between tenants. This is often described
   as overcoming the limits of 12-bit VLANs, and when seen in that
   context, or any context where it is a true tenant identifier, 16
   million possible entries is a large number. However, the reality is
   that the metadata is not exclusively used to identify tenants and
   encoding other information quickly starts to crowd the space. In
   fact, when compared to the tags used to exchange metadata between
   line cards on a chassis switch, 24-bit identifiers start to look
   quite small. There are nearly endless uses for this metadata,
   ranging from storing input ports for simple security policies to
   service based context for interposing advanced middleboxes.

   Existing tunnel protocols have each attempted to solve different
   aspects of these new requirements, only to be quickly rendered out of
   date by changing control plane implementations and advancements.
   Furthermore, software and hardware components and controllers all
   have different advantages and rates of evolution - a fact that should
   be viewed as a benefit, not a liability or limitation. This draft
   describes Geneve, a protocol which seeks to avoid these problems by
   providing a framework for tunneling for network virtualization rather
   than being prescriptive about the entire system.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   In this document, these words will appear with that interpretation
   only when in ALL CAPS. Lower case uses of these words are not to be
   interpreted as carrying RFC-2119 significance.

1.2.  Terminology

   The NVO3 framework [RFC7365] defines many of the concepts commonly
   used in network virtualization. In addition, the following terms are
   specifically meaningful in this document:

   Checksum offload. An optimization implemented by many NICs which
   enables computation and verification of upper layer protocol
   checksums in hardware on transmit and receive, respectively. This
   typically includes IP and TCP/UDP checksums which would otherwise be
   computed by the protocol stack in software.

   Clos network. A technique for composing network fabrics larger than
   a single switch while maintaining non-blocking bandwidth across
   connection points. ECMP is used to divide traffic across the
   multiple links and switches that constitute the fabric. Sometimes
   termed "leaf and spine" or "fat tree" topologies.

   ECMP. Equal Cost Multipath. A routing mechanism for selecting from
   among multiple best next hop paths by hashing packet headers in order
   to better utilize network bandwidth while avoiding reordering a
   single stream.

   Geneve. Generic Network Virtualization Encapsulation. The tunnel
   protocol described in this draft.

   LRO. Large Receive Offload. The receive-side equivalent function of
   LSO, in which multiple protocol segments (primarily TCP) are
   coalesced into larger data units.

   NIC. Network Interface Card. A NIC could be part of a tunnel
   endpoint or transit device and can either process Geneve packets or
   aid in the processing of Geneve packets.

   OAM. Operations, Administration, and Management. A suite of tools
   used to monitor and troubleshoot network problems.

   Transit device. A forwarding element along the path of the tunnel
   making up part of the Underlay Network. A transit device MAY be
   capable of understanding the Geneve frame format but does not
   originate or terminate Geneve packets.

   LSO. Large Segmentation Offload. A function provided by many
   commercial NICs that allows data units larger than the MTU to be
   passed to the NIC to improve performance, the NIC being responsible
   for creating smaller segments of size less than or equal to the MTU
   with correct protocol headers. When referring specifically to
   TCP/IP, this feature is often known as TSO (TCP Segmentation
   Offload).

   Tunnel endpoint. A component performing encapsulation and
   decapsulation of packets, such as Ethernet frames or IP datagrams, in
   Geneve headers. As the ultimate consumer of any tunnel metadata,
   endpoints have the highest level of requirements for parsing and
   interpreting tunnel headers. Tunnel endpoints may consist of either
   software or hardware implementations or a combination of the two.
   Endpoints are frequently a component of an NVE but may also be found
   in middleboxes or other elements making up an NVO3 Network.

   VM. Virtual Machine.

2.  Design Requirements

   Geneve is designed to support network virtualization use cases, where
   tunnels are typically established to act as a backplane between the
   virtual switches residing in hypervisors, physical switches, or
   middleboxes or other appliances. An arbitrary IP network can be used
   as an underlay although Clos networks composed using ECMP links are a
   common choice to provide consistent bisectional bandwidth across all
   connection points.
   Figure 1 shows an example of a hypervisor, top of
   rack switch for connectivity to physical servers, and a WAN uplink
   connected using Geneve tunnels over a simplified Clos network. These
   tunnels are used to encapsulate and forward frames from the attached
   components such as VMs or physical links.

     +---------------------+           +-------+  +------+
     | +--+  +-------+---+ |           |Transit|--|Top of|==Physical
     | |VM|--|       |   | | +------+ /|Router |  | Rack |==Servers
     | +--+  |Virtual|NIC|---|Top of|/ +-------+\/+------+
     | +--+  |Switch |   | | | Rack |\ +-------+/\+------+
     | |VM|--|       |   | | +------+ \|Transit|  |Uplink|   WAN
     | +--+  +-------+---+ |           |Router |--|      |=========>
     +---------------------+           +-------+  +------+
            Hypervisor

                 ()===================================()
                         Switch-Switch Geneve Tunnels

                   Figure 1: Sample Geneve Deployment

   To support the needs of network virtualization, the tunnel protocol
   should be able to take advantage of the differing (and evolving)
   capabilities of each type of device in both the underlay and overlay
   networks. This results in the following requirements being placed on
   the data plane tunneling protocol:

   o  The data plane is generic and extensible enough to support current
      and future control planes.

   o  Tunnel components are efficiently implementable in both hardware
      and software without restricting capabilities to the lowest common
      denominator.

   o  High performance over existing IP fabrics.

   These requirements are described further in the following
   subsections.

2.1.  Control Plane Independence

   Although some protocols for network virtualization have included a
   control plane as part of the tunnel format specification (most
   notably, the original VXLAN spec prescribed a multicast
   learning-based control plane), these specifications have largely been
   treated as describing only the data format.
   The VXLAN frame format has
   actually seen a wide variety of control planes built on top of it.

   There is a clear advantage in settling on a data format: most of the
   protocols are only superficially different and there is little
   advantage in duplicating effort. However, the same cannot be said of
   control planes, which are diverse in very fundamental ways. The case
   for standardization is also less clear given the wide variety in
   requirements, goals, and deployment scenarios.

   As a result of this reality, Geneve aims to be a pure tunnel format
   specification that is capable of fulfilling the needs of many control
   planes by explicitly not selecting any one of them. This
   simultaneously promotes a shared data format and increases the
   chances that it will not be obsoleted by future control plane
   enhancements.

2.2.  Data Plane Extensibility

   Achieving the level of flexibility needed to support current and
   future control planes effectively requires an options infrastructure
   to allow new metadata types to be defined, deployed, and either
   finalized or retired. Options also allow for differentiation of
   products by encouraging independent development in each vendor's core
   specialty, leading to an overall faster pace of advancement. By far
   the most common mechanism for implementing options is
   Type-Length-Value (TLV) format.

   It should be noted that while options can be used to support
   non-wirespeed control frames, they are equally important on data
   frames as well to segregate and direct forwarding (for instance, the
   examples given before of input port based security policies and
   service interposition both require tags to be placed on data
   packets). Therefore, while it would be desirable to limit the
   extensibility to only control frames for the purposes of simplifying
   the datapath, that would not satisfy the design requirements.

2.2.1.  Efficient Implementation

   There is often a conflict between software flexibility and hardware
   performance that is difficult to resolve. For a given set of
   functionality, it is obviously desirable to maximize performance.
   However, that does not mean new features that cannot be run at that
   speed today should be disallowed. Therefore, for a protocol to be
   efficiently implementable means that a set of common capabilities can
   be reasonably handled across platforms along with a graceful
   mechanism to handle more advanced features in the appropriate
   situations.

   The use of a variable length header and options in a protocol often
   raises questions about whether it is truly efficiently implementable
   in hardware. To answer this question in the context of Geneve, it is
   important to first divide "hardware" into two categories: tunnel
   endpoints and transit devices.

   Endpoints must be able to parse the variable header, including any
   options, and take action. Since these devices are actively
   participating in the protocol, they are the most affected by Geneve.

   However, as endpoints are the ultimate consumers of the data,
   transmitters can tailor their output to the capabilities of the
   recipient. As new functionality becomes sufficiently well defined to
   add to endpoints, supporting options can be designed using ordering
   restrictions and other techniques to ease parsing.

   Transit devices MAY be able to interpret the options and participate
   in Geneve packet processing. However, as non-terminating devices,
   they do not originate or terminate the Geneve packet. The
   participation of transit devices in Geneve packet processing is
   OPTIONAL.

   Further, either tunnel endpoints or transit devices MAY use offload
   capabilities of NICs such as checksum offload to improve the
   performance of Geneve packet processing.
   The presence of a Geneve
   variable length header SHOULD NOT prevent the tunnel endpoints and
   transit devices from using such offload capabilities.

2.3.  Use of Standard IP Fabrics

   IP has clearly cemented its place as the dominant transport mechanism
   and many techniques have evolved over time to make it robust,
   efficient, and inexpensive. As a result, it is natural to use IP
   fabrics as a transit network for Geneve. Fortunately, the use of IP
   encapsulation and addressing is enough to achieve the primary goal of
   delivering packets to the correct point in the network through
   standard switching and routing.

   In addition, nearly all underlay fabrics are designed to exploit
   parallelism in traffic to spread load across multiple links without
   introducing reordering in individual flows. These equal cost
   multipathing (ECMP) techniques typically involve parsing and hashing
   the addresses and port numbers from the packet to select an outgoing
   link. However, the use of tunnels often results in poor ECMP
   performance without additional knowledge of the protocol as the
   encapsulated traffic is hidden from the fabric by design and only
   endpoint addresses are available for hashing.

   Since it is desirable for Geneve to perform well on these existing
   fabrics, it is necessary for entropy from encapsulated packets to be
   exposed in the tunnel header. The most common technique for this is
   to use the UDP source port, which is discussed further in
   Section 3.3.

3.  Geneve Encapsulation Details

   The Geneve frame format consists of a compact tunnel header
   encapsulated in UDP over either IPv4 or IPv6. A small fixed tunnel
   header provides control information plus a base level of
   functionality and interoperability with a focus on simplicity. This
   header is then followed by a set of variable options to allow for
   future innovation.
   Finally, the payload consists of a protocol data
   unit of the indicated type, such as an Ethernet frame. Section 3.1
   and Section 3.2 illustrate the Geneve frame format transported (for
   example) over Ethernet along with an Ethernet payload.

3.1.  Geneve Frame Format Over IPv4

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

   Outer Ethernet Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Outer Destination MAC Address                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Outer Destination MAC Address |   Outer Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Outer Source MAC Address                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Optional Ethertype=C-Tag 802.1Q|  Outer VLAN Tag Information   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Ethertype=0x0800        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer IPv4 Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |Protocol=17 UDP|         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Outer Source IPv4 Address                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Outer Destination IPv4 Address                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer UDP Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Source Port = xxxx      |       Dest Port = 6081        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           UDP Length          |        UDP Checksum           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Geneve Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver|  Opt Len  |O|C|    Rsvd.  |          Protocol Type        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Virtual Network Identifier (VNI)       |    Reserved   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Variable Length Options                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Inner Ethernet Header (example payload):
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Inner Destination MAC Address                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Inner Destination MAC Address |   Inner Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Inner Source MAC Address                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Optional Ethertype=C-Tag 802.1Q|  Inner VLAN Tag Information   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Payload:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Ethertype of Original Payload |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                                  Original Ethernet Payload    |
   |                                                               |
   | (Note that the original Ethernet Frame's FCS is not included) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Frame Check Sequence:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    New FCS (Frame Check Sequence) for Outer Ethernet Frame    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.2.  Geneve Frame Format Over IPv6

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

   Outer Ethernet Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Outer Destination MAC Address                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Outer Destination MAC Address |   Outer Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Outer Source MAC Address                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Optional Ethertype=C-Tag 802.1Q|  Outer VLAN Tag Information   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Ethertype=0x86DD        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer IPv6 Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version| Traffic Class |           Flow Label                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Payload Length        | NxtHdr=17 UDP |   Hop Limit   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                  Outer Source IPv6 Address                    +
   |                                                               |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                Outer Destination IPv6 Address                 +
   |                                                               |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer UDP Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Source Port = xxxx      |       Dest Port = 6081        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           UDP Length          |        UDP Checksum           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Geneve Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver|  Opt Len  |O|C|    Rsvd.  |          Protocol Type        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Virtual Network Identifier (VNI)       |    Reserved   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Variable Length Options                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Inner Ethernet Header (example payload):
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Inner Destination MAC Address                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Inner Destination MAC Address |   Inner Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Inner Source MAC Address                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Optional Ethertype=C-Tag 802.1Q|  Inner VLAN Tag Information   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Payload:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Ethertype of Original Payload |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                                  Original Ethernet Payload    |
   |                                                               |
   | (Note that the original Ethernet Frame's FCS is not included) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Frame Check Sequence:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    New FCS (Frame Check Sequence) for Outer Ethernet Frame    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.3.  UDP Header

   The use of an encapsulating UDP [RFC0768] header follows the
   connectionless semantics of Ethernet and IP in addition to providing
   entropy to routers performing ECMP. The header fields are therefore
   interpreted as follows:

   Source port: A source port selected by the originating tunnel
      endpoint. This source port SHOULD be the same for all packets
      belonging to a single encapsulated flow to prevent reordering due
      to the use of different paths.
      To encourage an even distribution
      of flows across multiple links, the source port SHOULD be
      calculated using a hash of the encapsulated packet headers using,
      for example, a traditional 5-tuple. Since the port represents a
      flow identifier rather than a true UDP connection, the entire
      16-bit range MAY be used to maximize entropy.

   Dest port: IANA has assigned port 6081 as the fixed well-known
      destination port for Geneve. Although the well-known value should
      be used by default, it is RECOMMENDED that implementations make
      this configurable. The chosen port is used for identification of
      Geneve packets and MUST NOT be reversed for different ends of a
      connection as is done with TCP.

   UDP length: The length of the UDP packet including the UDP header.

   UDP checksum: The checksum MAY be set to zero on transmit for
      packets encapsulated in both IPv4 and IPv6 [RFC6935]. When a
      packet is received with a UDP checksum of zero it MUST be accepted
      and decapsulated. If the originating tunnel endpoint optionally
      encapsulates a packet with a non-zero checksum, it MUST be a
      correctly computed UDP checksum. Upon receiving such a packet,
      the egress endpoint MUST validate the checksum. If the checksum
      is not correct, the packet MUST be dropped, otherwise the packet
      MUST be accepted for decapsulation. It is RECOMMENDED that the
      UDP checksum be computed to protect the Geneve header and options
      in situations where the network reliability is not high and the
      packet is not protected by another checksum or CRC.

3.4.  Tunnel Header Fields

   Ver (2 bits): The current version number is 0. Packets received by
      an endpoint with an unknown version MUST be dropped.
      Non-terminating devices processing Geneve packets with an unknown
      version number MUST treat them as UDP packets with an unknown
      payload.

   Opt Len (6 bits): The length of the options fields, expressed in
      four byte multiples, not including the eight byte fixed tunnel
      header. This results in a minimum total Geneve header size of 8
      bytes and a maximum of 260 bytes. The start of the payload
      headers can be found using this offset from the end of the base
      Geneve header.

   O (1 bit): OAM frame. This packet contains a control message
      instead of a data payload. Endpoints MUST NOT forward the payload
      and transit devices MUST NOT attempt to interpret or process it.
      Since these are infrequent control messages, it is RECOMMENDED
      that endpoints direct these packets to a high priority control
      queue (for example, to direct the packet to a general purpose CPU
      from a forwarding ASIC or to separate out control traffic on a
      NIC). Transit devices MUST NOT alter forwarding behavior on the
      basis of this bit, such as ECMP link selection.

   C (1 bit): Critical options present. One or more options has the
      critical bit set (see Section 3.5). If this bit is set then
      tunnel endpoints MUST parse the options list to interpret any
      critical options. On endpoints where option parsing is not
      supported the frame MUST be dropped on the basis of the 'C' bit in
      the base header. If the bit is not set tunnel endpoints MAY strip
      all options using 'Opt Len' and forward the decapsulated frame.
      Transit devices MUST NOT drop or modify packets on the basis of
      this bit.

   Rsvd. (6 bits): Reserved field which MUST be zero on transmission
      and ignored on receipt.

   Protocol Type (16 bits): The type of the protocol data unit
      appearing after the Geneve header. This follows the EtherType
      [ETYPES] convention with Ethernet itself being represented by the
      value 0x6558.

   Virtual Network Identifier (VNI) (24 bits): An identifier for a
      unique element of a virtual network.
In many situations this may 618 represent an L2 segment, however, the control plane defines the 619 forwarding semantics of decapsulated packets. The VNI MAY be used 620 as part of ECMP forwarding decisions or MAY be used as a mechanism 621 to distinguish between overlapping address spaces contained in the 622 encapsulated packet when load balancing across CPUs. 624 Reserved (8 bits): Reserved field which MUST be zero on transmission 625 and ignored on receipt. 627 Transit devices MUST maintain consistent forwarding behavior 628 irrespective of the value of 'Opt Len', including ECMP link 629 selection. These devices SHOULD be able to forward packets 630 containing options without resorting to a slow path. 632 3.5. Tunnel Options 634 0 1 2 3 635 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 636 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 637 | Option Class | Type |R|R|R| Length | 638 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 639 | Variable Option Data | 640 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 642 Geneve Option 644 The base Geneve header is followed by zero or more options in Type- 645 Length-Value format. Each option consists of a four byte option 646 header and a variable amount of option data interpreted according to 647 the type. 649 Option Class (16 bits): Namespace for the 'Type' field. IANA will 650 be requested to create a "Geneve Option Class" registry to 651 allocate identifiers for organizations, technologies, and vendors 652 that have an interest in creating types for options. Each 653 organization may allocate types independently to allow 654 experimentation and rapid innovation. It is expected that over 655 time certain options will become well known and a given 656 implementation may use option types from a variety of sources. In 657 addition, IANA will be requested to reserve specific ranges for 658 standardized and experimental options. 
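   As a non-normative illustration of the base header fields defined in
   Section 3.4, the following Python sketch decodes the eight byte fixed
   tunnel header.  It assumes the fields are packed in the order listed,
   most significant bit first, as in the Geneve header figure.  The
   names GeneveBaseHeader, parse_geneve_base_header, and payload_offset
   are hypothetical and chosen only for this example.

```python
import struct
from typing import NamedTuple


class GeneveBaseHeader(NamedTuple):
    version: int
    opt_len_bytes: int   # 'Opt Len' converted from 4-byte units to bytes
    oam: bool            # 'O' bit
    critical: bool       # 'C' bit
    protocol_type: int   # EtherType-style value, 0x6558 for Ethernet
    vni: int             # 24-bit Virtual Network Identifier


def parse_geneve_base_header(data: bytes) -> GeneveBaseHeader:
    """Decode the fixed 8-byte header; reserved bits are ignored."""
    if len(data) < 8:
        raise ValueError("truncated Geneve base header")
    # Network byte order: Ver/Opt Len, O/C/Rsvd, Protocol Type, VNI+Reserved.
    first, flags, protocol_type, vni_and_reserved = struct.unpack(
        "!BBHI", data[:8])
    version = first >> 6
    if version != 0:
        # An endpoint drops packets carrying an unknown version.
        raise ValueError("unknown Geneve version")
    return GeneveBaseHeader(
        version=version,
        opt_len_bytes=(first & 0x3F) * 4,
        oam=bool(flags & 0x80),
        critical=bool(flags & 0x40),
        protocol_type=protocol_type,
        vni=vni_and_reserved >> 8,
    )


def payload_offset(header: GeneveBaseHeader) -> int:
    """Offset of the payload headers from the start of the Geneve header."""
    return 8 + header.opt_len_bytes
```

   With 'Opt Len' at its maximum value of 63, payload_offset returns
   260, which matches the maximum total header size stated for the
   'Opt Len' field.
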
   Type (8 bits):  Type indicating the format of the data contained in
      this option.  Options are primarily designed to encourage future
      extensibility and innovation and so standardized forms of these
      options will be defined in a separate document.

      The high order bit of the option type indicates that this is a
      critical option.  If the receiving endpoint does not recognize
      this option and this bit is set then the frame MUST be dropped.
      If the critical bit is set in any option then the 'C' bit in the
      Geneve base header MUST also be set.  Transit devices MUST NOT
      drop packets on the basis of this bit.  The following figure shows
      the location of the 'C' bit in the 'Type' field:

       0 1 2 3 4 5 6 7
      +-+-+-+-+-+-+-+-+
      |C|    Type     |
      +-+-+-+-+-+-+-+-+

      The requirement to drop a packet with an unknown critical option
      applies to the entire tunnel endpoint system and not a particular
      component of the implementation.  For example, in a system
      composed of a forwarding ASIC and a general purpose CPU, this
      does not mean that the packet must be dropped in the ASIC.  An
      implementation may send the packet to the CPU using a rate-limited
      control channel for slow-path exception handling.

   R (3 bits):  Option control flags reserved for future use.  MUST be
      zero on transmission and ignored on receipt.

   Length (5 bits):  Length of the option, expressed in four byte
      multiples excluding the option header.  The total length of each
      option may be between 4 and 128 bytes.  Packets in which the total
      length of all options is not equal to the 'Opt Len' in the base
      header are invalid and MUST be silently dropped if received by an
      endpoint.

   Variable Option Data:  Option data interpreted according to 'Type'.

3.5.1.  Options Processing

   Geneve options are primarily intended to be originated and processed
   by tunnel endpoints.
   However, options MAY be processed by transit devices along the
   tunnel path as well.  Transit devices not processing Geneve options
   SHOULD process Geneve frames as any other UDP frame and maintain
   consistent forwarding behavior.

   In tunnel endpoints, the generation and interpretation of options are
   determined by the control plane, which is out of the scope of this
   document.  However, to ensure interoperability between heterogeneous
   devices two requirements are imposed on endpoint devices:

   o  Receiving endpoints MUST drop packets containing unknown options
      with the 'C' bit set in the option type.

   o  Sending endpoints MUST NOT assume that options will be processed
      sequentially by the receiver in the order they were transmitted.

   When designing a Geneve option, it is important to consider how the
   option will evolve in the future.  Once an option is defined it is
   reasonable to expect that implementations may come to depend on a
   specific behavior.  As a result, the scope of any future changes must
   be carefully described upfront.

   Unexpectedly significant interoperability issues may result from
   changing the length of an option that was defined to be a certain
   size.  A particular option is specified to have either a fixed
   length, which is constant, or a variable length, which may change
   over time or for different use cases.  This property is part of the
   definition of the option and conveyed by the 'Type'.  For fixed
   length options, some implementations may choose to ignore the length
   field in the option header and instead parse based on the well known
   length associated with the type.  In this case, redefining the length
   will impact not only parsing of the option in question but also any
   options that follow.  Therefore, options that are defined to be fixed
   length in size MUST NOT be redefined to a different length.  Instead,
   a new 'Type' should be allocated.

4.  Implementation and Deployment Considerations

4.1.  Encapsulation of Geneve in IP

   As an IP-based tunnel protocol, Geneve shares many properties and
   techniques with existing protocols.  The application of some of these
   is described in further detail, although in general most concepts
   applicable to the IP layer or to IP tunnels generally also function
   in the context of Geneve.

4.1.1.  IP Fragmentation

   To prevent fragmentation and maximize performance, the best practice
   when using Geneve is to ensure that the MTU of the physical network
   is greater than or equal to the MTU of the encapsulated network plus
   tunnel headers.  Manual or upper layer (such as TCP MSS clamping)
   configuration can be used to ensure that fragmentation never takes
   place; however, in some situations this may not be feasible.

   It is strongly RECOMMENDED that Path MTU Discovery ([RFC1191],
   [RFC1981]) be used by setting the DF bit in the IP header when Geneve
   packets are transmitted over IPv4 (this is the default with IPv6).
   The use of Path MTU Discovery on the transit network provides the
   encapsulating endpoint with soft-state about the link that it may use
   to prevent or minimize fragmentation depending on its role in the
   virtualized network.

   Note that some implementations may not be capable of supporting
   fragmentation or other less common features of the IP header, such as
   options and extension headers.

4.1.2.  DSCP and ECN

   When encapsulating IP (including over Ethernet) frames in Geneve,
   there are several options for propagating DSCP and ECN bits from the
   inner header to the tunnel on transmission and the reverse on
   reception.

   [RFC2983] lists considerations for mapping DSCP between inner and
   outer IP headers.
   Network virtualization is typically more closely aligned with the
   Pipe model described there, where the DSCP value on the tunnel header
   is set based on a policy (which may be a fixed value, one based on
   the inner traffic class, or some other mechanism for grouping
   traffic).  Aspects of the Uniform model (which treats the inner and
   outer DSCP value as a single field by copying on ingress and egress)
   may also apply, such as the ability to remark the inner header on
   tunnel egress based on transit marking.  However, the Uniform model
   is not conceptually consistent with network virtualization, which
   seeks to provide strong isolation between encapsulated traffic and
   the physical network.

   [RFC6040] describes the mechanism for exposing ECN capabilities on IP
   tunnels and propagating congestion markers to the inner packets.
   This behavior SHOULD be followed for IP packets encapsulated in
   Geneve.

4.1.3.  Broadcast and Multicast

   Geneve tunnels may either be point-to-point unicast between two
   endpoints or may utilize broadcast or multicast addressing.  It is
   not required that inner and outer addressing match in this respect.
   For example, in physical networks that do not support multicast,
   encapsulated multicast traffic may be replicated into multiple
   unicast tunnels or forwarded by policy to a unicast location
   (possibly to be replicated there).

   With physical networks that do support multicast it may be desirable
   to use this capability to take advantage of hardware replication for
   encapsulated packets.  In this case, multicast addresses may be
   allocated in the physical network corresponding to tenants,
   encapsulated multicast groups, or some other factor.  The allocation
   of these groups is a component of the control plane and therefore
   outside of the scope of this document.
   When physical multicast is in use, the 'C' bit in the Geneve header
   may be used with groups of devices with heterogeneous capabilities as
   each device can interpret only the options that are significant to it
   if they are not critical.

4.1.4.  Unidirectional Tunnels

   Generally speaking, a Geneve tunnel is a unidirectional concept.  IP
   is not a connection oriented protocol and it is possible for two
   endpoints to communicate with each other using different paths or to
   have one side not transmit anything at all.  As Geneve is an IP-based
   protocol, the tunnel layer inherits these same characteristics.

   It is possible for a tunnel to encapsulate a protocol, such as TCP,
   which is connection oriented and maintains session state at that
   layer.  In addition, implementations MAY model Geneve tunnels as
   connected, bidirectional links, such as to provide the abstraction of
   a virtual port.  In both of these cases, bidirectionality of the
   tunnel is handled at a higher layer and does not affect the operation
   of Geneve itself.

4.2.  Constraints on Protocol Features

   Geneve is intended to be flexible to a wide range of current and
   future applications.  As a result, certain constraints may be placed
   on the use of metadata or other aspects of the protocol in order to
   optimize for a particular use case.  For example, some applications
   may limit the types of options which are supported or enforce a
   maximum number or length of options.  Other applications may only
   handle certain encapsulated payload types, such as Ethernet or IP.
   This could be either globally throughout the system or, for example,
   restricted to certain classes of devices or network paths.

   These constraints may be communicated to tunnel endpoints either
   explicitly through a control plane or implicitly by the nature of the
   application.
   As Geneve is defined as a data plane protocol that is control plane
   agnostic, the exact mechanism is not defined in this document.

4.3.  NIC Offloads

   Modern NICs currently provide a variety of offloads to enable the
   efficient processing of packets.  The implementation of many of these
   offloads requires only that the encapsulated packet be easily parsed
   (for example, checksum offload).  However, optimizations such as LSO
   and LRO involve some processing of the options themselves since they
   must be replicated/merged across multiple packets.  In these
   situations, it is desirable to not require changes to the offload
   logic to handle the introduction of new options.  To enable this,
   some constraints are placed on the definitions of options to allow
   for simple processing rules:

   o  When performing LSO, a NIC MUST replicate the entire Geneve header
      and all options, including those unknown to the device, onto each
      resulting segment.  However, a given option definition may
      override this rule and specify different behavior in supporting
      devices.  Conversely, when performing LRO, a NIC MAY assume that a
      binary comparison of the options (including unknown options) is
      sufficient to ensure equality and MAY merge packets with equal
      Geneve headers.

   o  Option ordering is not significant and packets with the same
      options in a different order MAY be processed alike.

   o  NICs performing offloads MUST NOT drop packets with unknown
      options, including those marked as critical.

   There is no requirement that a given implementation of Geneve employ
   the offloads listed as examples above.  However, as these offloads
   are currently widely deployed in commercially available NICs, the
   rules described here are intended to enable efficient handling of
   current and future options across a variety of devices.

4.4.  Inner VLAN Handling

   Geneve is capable of encapsulating a wide range of protocols and
   therefore a given implementation is likely to support only a small
   subset of the possibilities.  However, as Ethernet is expected to be
   widely deployed, it is useful to describe the behavior of VLANs
   inside encapsulated Ethernet frames.

   As with any protocol, support for inner VLAN headers is OPTIONAL.  In
   many cases, the use of encapsulated VLANs may be disallowed due to
   security or implementation considerations.  However, in other cases
   trunking of VLAN frames across a Geneve tunnel can prove useful.  As
   a result, the processing of inner VLAN tags upon ingress or egress
   from a tunnel endpoint is based upon the configuration of the
   endpoint and/or control plane and not explicitly defined as part of
   the data format.

5.  Interoperability Issues

   Viewed exclusively from the data plane, Geneve does not introduce any
   interoperability issues as it appears to most devices as UDP frames.
   However, as there are already a number of tunnel protocols deployed
   in network virtualization environments, there is a practical question
   of transition and coexistence.

   Since Geneve is a superset of the functionality of the three most
   common protocols used for network virtualization (VXLAN, NVGRE, and
   STT) it should be straightforward to port an existing control plane
   to run on top of it with minimal effort.  With both the old and new
   frame formats supporting the same set of capabilities, there is no
   need for a hard transition - endpoints directly communicating with
   each other use any common protocol, which may be different even
   within a single overall system.  As transit devices are primarily
   forwarding frames on the basis of the IP header, all protocols appear
   similar and these devices do not introduce additional
   interoperability concerns.

   To assist with this transition, it is strongly suggested that
   implementations support simultaneous operation of both Geneve and
   existing tunnel protocols as it is expected to be common for a single
   node to communicate with a mixture of other nodes.  Eventually, older
   protocols may be phased out as they are no longer in use.

6.  Security Considerations

   As a protocol carried in UDP/IP packets, Geneve does not have any
   inherent security mechanisms.  As a result, an attacker with access
   to the underlay network transporting the IP frames has the ability to
   snoop or inject packets.  Legitimate but malicious tunnel endpoints
   may also spoof identifiers in the tunnel header to gain access to
   networks owned by other tenants.

   Within a particular security domain, such as a data center operated
   by a single provider, the most common and highest performing security
   mechanism is isolation of trusted components.  Tunnel traffic can be
   carried over a separate VLAN and filtered at any untrusted
   boundaries.  In addition, tunnel endpoints should only be operated in
   environments controlled by the service provider, such as the
   hypervisor itself rather than within a customer VM.

   When crossing an untrusted link, such as the public Internet, IPsec
   [RFC4301] may be used to provide authentication and/or encryption of
   the IP packets formed as part of Geneve encapsulation.  If the remote
   tunnel endpoint is not completely trusted, for example it resides on
   a customer premises, then it may also be necessary to sanitize any
   tunnel metadata to prevent tenant-hopping attacks.

   Geneve does not otherwise affect the security of the encapsulated
   packets.

7.  IANA Considerations

   IANA has allocated UDP port 6081 as the well-known destination port
   for Geneve.  Upon publication, the registry should be updated to cite
   this document.
   The original request was:

      Service Name: geneve
      Transport Protocol(s): UDP
      Assignee: Jesse Gross
      Contact: Jesse Gross
      Description: Generic Network Virtualization Encapsulation (Geneve)
      Reference: This document
      Port Number: 6081

   In addition, IANA is requested to create a "Geneve Option Class"
   registry to allocate Option Classes.  This shall be a registry of
   16-bit hexadecimal values along with descriptive strings.  The
   identifiers 0x0000-0x00FF are to be reserved for standardized options
   for allocation by IETF Review [RFC5226] and 0xFFFF for Experimental
   Use.  Otherwise, identifiers are to be assigned to any organization
   with an interest in creating Geneve options on a First Come First
   Served basis.  The registry is to be populated with the following
   initial values:

       +----------------+--------------------------------------+
       | Option Class   | Description                          |
       +----------------+--------------------------------------+
       | 0x0000..0x00FF | Unassigned - IETF Review             |
       | 0x0100         | Linux                                |
       | 0x0101         | Open vSwitch                         |
       | 0x0102         | Open Virtual Networking (OVN)        |
       | 0x0103..0xFFFE | Unassigned - First Come First Served |
       | 0xFFFF         | Experimental                         |
       +----------------+--------------------------------------+

8.  Contributors

   The following individuals were authors of an earlier version of this
   document and made significant contributions:

   T. Sridhar
   VMware, Inc.
   3401 Hillview Ave.
   Palo Alto, CA  94304
   USA

   Email: tsridhar@vmware.com

   Pankaj Garg
   Microsoft Corporation
   1 Microsoft Way
   Redmond, WA  98052
   USA

   Email: pankajg@microsoft.com

   Chris Wright
   Red Hat Inc.
   1801 Varsity Drive
   Raleigh, NC  27606
   USA

   Email: chrisw@redhat.com

   Puneet Agarwal
   Innovium, Inc.
   6001 America Center Drive
   San Jose, CA  95002
   USA

   Email: puneet@innovium.com

   Kenneth Duda
   Arista Networks
   5453 Great America Parkway
   Santa Clara, CA  95054
   USA

   Email: kduda@arista.com

   Dinesh G. Dutt
   Cumulus Networks
   140C S. Whisman Road
   Mountain View, CA  94041
   USA

   Email: ddutt@cumulusnetworks.com

   Jon Hudson
   Brocade Communications Systems, Inc.
   130 Holger Way
   San Jose, CA  95134
   USA

   Email: jon.hudson@gmail.com

   Ariel Hendel
   Broadcom Corporation
   3151 Zanker Road
   San Jose, CA  95134
   USA

   Email: ahendel@broadcom.com

9.  Acknowledgements

   The authors wish to thank Martin Casado, Bruce Davie and Dave Thaler
   for their input, feedback, and helpful suggestions.

10.  References

10.1.  Normative References

   [RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768,
              DOI 10.17487/RFC0768, August 1980.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
              DOI 10.17487/RFC5226, May 2008.

10.2.  Informative References

   [ETYPES]   The IEEE Registration Authority, "IEEE 802 Numbers", 2013.

   [I-D.davie-stt]
              Davie, B. and J. Gross, "A Stateless Transport Tunneling
              Protocol for Network Virtualization (STT)",
              draft-davie-stt-06 (work in progress), April 2014.

   [I-D.ietf-nvo3-dataplane-requirements]
              Bitar, N., Lasserre, M., Balus, F., Morin, T., Jin, L.,
              and B. Khasnabish, "NVO3 Data Plane Requirements",
              draft-ietf-nvo3-dataplane-requirements-03 (work in
              progress), April 2014.

   [IEEE.802.1Q-2014]
              IEEE, "IEEE Standard for Local and metropolitan area
              networks -- Bridges and Bridged Networks", IEEE
              Std 802.1Q, 2014.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              DOI 10.17487/RFC1191, November 1990.

   [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
              for IP version 6", RFC 1981, DOI 10.17487/RFC1981, August
              1996.

   [RFC2983]  Black, D., "Differentiated Services and Tunnels",
              RFC 2983, DOI 10.17487/RFC2983, October 2000.

   [RFC3031]  Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol
              Label Switching Architecture", RFC 3031,
              DOI 10.17487/RFC3031, January 2001.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, DOI 10.17487/RFC4301,
              December 2005.

   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
              Notification", RFC 6040, DOI 10.17487/RFC6040, November
              2010.

   [RFC6935]  Eubanks, M., Chimento, P., and M. Westerlund, "IPv6 and
              UDP Checksums for Tunneled Packets", RFC 6935,
              DOI 10.17487/RFC6935, April 2013.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
              eXtensible Local Area Network (VXLAN): A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014.

   [RFC7365]  Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
              Rekhter, "Framework for Data Center (DC) Network
              Virtualization", RFC 7365, DOI 10.17487/RFC7365, October
              2014.

   [RFC7637]  Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network
              Virtualization Using Generic Routing Encapsulation",
              RFC 7637, DOI 10.17487/RFC7637, September 2015.

   [VL2]      Greenberg et al, "VL2: A Scalable and Flexible Data
              Center Network", Proc. ACM SIGCOMM 2009, 2009.

Authors' Addresses

   Jesse Gross (editor)
   VMware, Inc.
   3401 Hillview Ave.
   Palo Alto, CA  94304
   USA

   Email: jgross@vmware.com

   Ilango Ganga (editor)
   Intel Corporation
   2200 Mission College Blvd.
   Santa Clara, CA  95054
   USA

   Email: ilango.s.ganga@intel.com