Network Working Group                                      J. Gross, Ed.
Internet-Draft
Intended status: Standards Track                           I. Ganga, Ed.
Expires: March 17, 2018                                            Intel
                                                         T. Sridhar, Ed.
                                                                  VMware
                                                      September 13, 2017


          Geneve: Generic Network Virtualization Encapsulation
                       draft-ietf-nvo3-geneve-05

Abstract

   Network virtualization involves the cooperation of devices with a
   wide variety of capabilities such as software and hardware tunnel
   endpoints, transit fabrics, and centralized control clusters. As a
   result of their role in tying together different elements in the
   system, the requirements on tunnels are influenced by all of these
   components. Flexibility is therefore the most important aspect of a
   tunnel protocol if it is to keep pace with the evolution of the
   system. This draft describes Geneve, a protocol designed to
   recognize and accommodate these changing capabilities and needs.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 17, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
     1.2.  Terminology
   2.  Design Requirements
     2.1.  Control Plane Independence
     2.2.  Data Plane Extensibility
       2.2.1.  Efficient Implementation
     2.3.  Use of Standard IP Fabrics
   3.  Geneve Encapsulation Details
     3.1.  Geneve Packet Format Over IPv4
     3.2.  Geneve Packet Format Over IPv6
     3.3.  UDP Header
     3.4.  Tunnel Header Fields
     3.5.  Tunnel Options
       3.5.1.  Options Processing
   4.  Implementation and Deployment Considerations
     4.1.  Encapsulation of Geneve in IP
       4.1.1.  IP Fragmentation
       4.1.2.  DSCP and ECN
       4.1.3.  Broadcast and Multicast
       4.1.4.  Unidirectional Tunnels
     4.2.  Constraints on Protocol Features
       4.2.1.  Constraints on Options
     4.3.  NIC Offloads
     4.4.  Inner VLAN Handling
   5.  Interoperability Issues
   6.  Security Considerations
   7.  IANA Considerations
   8.  Contributors
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   Networking has long featured a variety of tunneling, tagging, and
   other encapsulation mechanisms. However, the advent of network
   virtualization has caused a surge of renewed interest and a
   corresponding increase in the introduction of new protocols. The
   large number of protocols in this space, ranging all the way from
   VLANs [IEEE.802.1Q_2014] and MPLS [RFC3031] through the more recent
   VXLAN [RFC7348], NVGRE [RFC7637], and STT [I-D.davie-stt], often
   leads to questions about the need for new encapsulation formats and
   what it is about network virtualization in particular that leads to
   their proliferation.

   While many encapsulation protocols seek to simply partition the
   underlay network or bridge between two domains, network
   virtualization views the transit network as providing connectivity
   between multiple components of a distributed system. In many ways
   this system is similar to a chassis switch with the IP underlay
   network playing the role of the backplane and tunnel endpoints on the
   edge as line cards. When viewed in this light, the requirements
   placed on the tunnel protocol are significantly different in terms of
   the quantity of metadata necessary and the role of transit nodes.

   Current work such as VL2 [VL2] and the NVO3 working group
   [I-D.ietf-nvo3-dataplane-requirements] have described some of the
   properties that the data plane must have to support network
   virtualization. However, one additional defining requirement is the
   need to carry system state along with the packet data. The use of
   some metadata is certainly not a foreign concept - nearly all
   protocols used for virtualization have at least 24 bits of identifier
   space as a way to partition between tenants. This is often described
   as overcoming the limits of 12-bit VLANs, and when seen in that
   context, or any context where it is a true tenant identifier, 16
   million possible entries is a large number. However, the reality is
   that the metadata is not exclusively used to identify tenants and
   encoding other information quickly starts to crowd the space. In
   fact, when compared to the tags used to exchange metadata between
   line cards on a chassis switch, 24-bit identifiers start to look
   quite small. There are nearly endless uses for this metadata,
   ranging from storing input ports for simple security policies to
   service-based context for interposing advanced middleboxes.

   Existing tunnel protocols have each attempted to solve different
   aspects of these new requirements, only to be quickly rendered out of
   date by changing control plane implementations and advancements.
   Furthermore, software and hardware components and controllers all
   have different advantages and rates of evolution - a fact that should
   be viewed as a benefit, not a liability or limitation. This draft
   describes Geneve, a protocol which seeks to avoid these problems by
   providing a framework for tunneling for network virtualization rather
   than being prescriptive about the entire system.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   In this document, these words will appear with that interpretation
   only when in ALL CAPS. Lower case uses of these words are not to be
   interpreted as carrying RFC-2119 significance.

1.2.  Terminology

   The NVO3 framework [RFC7365] defines many of the concepts commonly
   used in network virtualization. In addition, the following terms are
   specifically meaningful in this document:

   Checksum offload. An optimization implemented by many NICs which
   enables computation and verification of upper layer protocol
   checksums in hardware on transmit and receive, respectively. This
   typically includes IP and TCP/UDP checksums which would otherwise be
   computed by the protocol stack in software.

   Clos network. A technique for composing network fabrics larger than
   a single switch while maintaining non-blocking bandwidth across
   connection points. ECMP is used to divide traffic across the
   multiple links and switches that constitute the fabric. Sometimes
   termed "leaf and spine" or "fat tree" topologies.

   ECMP. Equal Cost Multipath. A routing mechanism for selecting from
   among multiple best next hop paths by hashing packet headers in order
   to better utilize network bandwidth while avoiding reordering a
   single stream.

   Geneve. Generic Network Virtualization Encapsulation. The tunnel
   protocol described in this draft.

   LRO. Large Receive Offload. The receive-side equivalent function of
   LSO, in which multiple protocol segments (primarily TCP) are
   coalesced into larger data units.

   NIC. Network Interface Card. A NIC could be part of a tunnel
   endpoint or transit device and can either process Geneve packets or
   aid in the processing of Geneve packets.

   OAM. Operations, Administration, and Management. A suite of tools
   used to monitor and troubleshoot network problems.

   Transit device. A forwarding element along the path of the tunnel
   making up part of the Underlay Network. A transit device MAY be
   capable of understanding the Geneve packet format but does not
   originate or terminate Geneve packets.

   LSO. Large Segmentation Offload. A function provided by many
   commercial NICs that allows data units larger than the MTU to be
   passed to the NIC to improve performance, the NIC being responsible
   for creating smaller segments of size less than or equal to the MTU
   with correct protocol headers. When referring specifically to TCP/
   IP, this feature is often known as TSO (TCP Segmentation Offload).

   Tunnel endpoint. A component performing encapsulation and
   decapsulation of packets, such as Ethernet frames or IP datagrams, in
   Geneve headers. As the ultimate consumer of any tunnel metadata,
   endpoints have the highest level of requirements for parsing and
   interpreting tunnel headers. Tunnel endpoints may consist of either
   software or hardware implementations or a combination of the two.
   Endpoints are frequently a component of an NVE but may also be found
   in middleboxes or other elements making up an NVO3 Network.

   VM. Virtual Machine.

2.  Design Requirements

   Geneve is designed to support network virtualization use cases, where
   tunnels are typically established to act as a backplane between the
   virtual switches residing in hypervisors, physical switches, or
   middleboxes or other appliances. An arbitrary IP network can be used
   as an underlay although Clos networks composed using ECMP links are a
   common choice to provide consistent bisectional bandwidth across all
   connection points.
   Figure 1 shows an example of a hypervisor, top of
   rack switch for connectivity to physical servers, and a WAN uplink
   connected using Geneve tunnels over a simplified Clos network. These
   tunnels are used to encapsulate and forward frames from the attached
   components such as VMs or physical links.

     +---------------------+           +-------+  +------+
     | +--+  +-------+---+ |           |Transit|--|Top of|==Physical
     | |VM|--|       |   | | +------+ /|Router |  | Rack |==Servers
     | +--+  |Virtual|NIC|---|Top of|/ +-------+\/+------+
     | +--+  |Switch |   | | | Rack |\ +-------+/\+------+
     | |VM|--|       |   | | +------+ \|Transit|  |Uplink|   WAN
     | +--+  +-------+---+ |           |Router |--|      |=========>
     +---------------------+           +-------+  +------+
            Hypervisor

                 ()===================================()
                      Switch-Switch Geneve Tunnels

                   Figure 1: Sample Geneve Deployment

   To support the needs of network virtualization, the tunnel protocol
   should be able to take advantage of the differing (and evolving)
   capabilities of each type of device in both the underlay and overlay
   networks. This results in the following requirements being placed on
   the data plane tunneling protocol:

   o  The data plane is generic and extensible enough to support current
      and future control planes.

   o  Tunnel components are efficiently implementable in both hardware
      and software without restricting capabilities to the lowest common
      denominator.

   o  High performance over existing IP fabrics.

   These requirements are described further in the following
   subsections.

2.1.  Control Plane Independence

   Although some protocols for network virtualization have included a
   control plane as part of the tunnel format specification (most
   notably, the original VXLAN spec prescribed a multicast learning-
   based control plane), these specifications have largely been treated
   as describing only the data format.
   The VXLAN packet format has
   actually seen a wide variety of control planes built on top of it.

   There is a clear advantage in settling on a data format: most of the
   protocols are only superficially different and there is little
   advantage in duplicating effort. However, the same cannot be said of
   control planes, which are diverse in very fundamental ways. The case
   for standardization is also less clear given the wide variety in
   requirements, goals, and deployment scenarios.

   As a result of this reality, Geneve aims to be a pure tunnel format
   specification that is capable of fulfilling the needs of many control
   planes by explicitly not selecting any one of them. This
   simultaneously promotes a shared data format and increases the
   chances that it will not be obsoleted by future control plane
   enhancements.

2.2.  Data Plane Extensibility

   Achieving the level of flexibility needed to support current and
   future control planes effectively requires an options infrastructure
   to allow new metadata types to be defined, deployed, and either
   finalized or retired. Options also allow for differentiation of
   products by encouraging independent development in each vendor's core
   specialty, leading to an overall faster pace of advancement. By far
   the most common mechanism for implementing options is Type-Length-
   Value (TLV) format.

   It should be noted that while options can be used to support non-
   wirespeed control packets, they are equally important on data packets
   to segregate and direct forwarding (for instance, the
   examples given before of input port based security policies and
   service interposition both require tags to be placed on data
   packets). Therefore, while it would be desirable to limit the
   extensibility to only control packets for the purposes of simplifying
   the datapath, that would not satisfy the design requirements.

2.2.1.  Efficient Implementation

   There is often a conflict between software flexibility and hardware
   performance that is difficult to resolve. For a given set of
   functionality, it is obviously desirable to maximize performance.
   However, that does not mean new features that cannot be run at that
   speed today should be disallowed. Therefore, for a protocol to be
   efficiently implementable means that a set of common capabilities can
   be reasonably handled across platforms along with a graceful
   mechanism to handle more advanced features in the appropriate
   situations.

   The use of a variable length header and options in a protocol often
   raises questions about whether it is truly efficiently implementable
   in hardware. To answer this question in the context of Geneve, it is
   important to first divide "hardware" into two categories: tunnel
   endpoints and transit devices.

   Endpoints must be able to parse the variable header, including any
   options, and take action. Since these devices are actively
   participating in the protocol, they are the most affected by Geneve.

   However, as endpoints are the ultimate consumers of the data,
   transmitters can tailor their output to the capabilities of the
   recipient. As new functionality becomes sufficiently well defined to
   add to endpoints, supporting options can be designed using ordering
   restrictions and other techniques to ease parsing.

   Transit devices MAY be able to interpret the options and participate
   in Geneve packet processing. However, as non-terminating devices,
   they do not originate or terminate the Geneve packet. The
   participation of transit devices in Geneve packet processing is
   OPTIONAL.

   Further, either tunnel endpoints or transit devices MAY use offload
   capabilities of NICs such as checksum offload to improve the
   performance of Geneve packet processing.
   The presence of a Geneve
   variable length header SHOULD NOT prevent the tunnel endpoints and
   transit devices from using such offload capabilities.

2.3.  Use of Standard IP Fabrics

   IP has clearly cemented its place as the dominant transport mechanism
   and many techniques have evolved over time to make it robust,
   efficient, and inexpensive. As a result, it is natural to use IP
   fabrics as a transit network for Geneve. Fortunately, the use of IP
   encapsulation and addressing is enough to achieve the primary goal of
   delivering packets to the correct point in the network through
   standard switching and routing.

   In addition, nearly all underlay fabrics are designed to exploit
   parallelism in traffic to spread load across multiple links without
   introducing reordering in individual flows. These equal cost
   multipathing (ECMP) techniques typically involve parsing and hashing
   the addresses and port numbers from the packet to select an outgoing
   link. However, the use of tunnels often results in poor ECMP
   performance without additional knowledge of the protocol as the
   encapsulated traffic is hidden from the fabric by design and only
   endpoint addresses are available for hashing.

   Since it is desirable for Geneve to perform well on these existing
   fabrics, it is necessary for entropy from encapsulated packets to be
   exposed in the tunnel header. The most common technique for this is
   to use the UDP source port, which is discussed further in
   Section 3.3.

3.  Geneve Encapsulation Details

   The Geneve packet format consists of a compact tunnel header
   encapsulated in UDP over either IPv4 or IPv6. A small fixed tunnel
   header provides control information plus a base level of
   functionality and interoperability with a focus on simplicity. This
   header is then followed by a set of variable options to allow for
   future innovation.
   Finally, the payload consists of a protocol data
   unit of the indicated type, such as an Ethernet frame. Section 3.1
   and Section 3.2 illustrate the Geneve packet format transported (for
   example) over Ethernet along with an Ethernet payload.

3.1.  Geneve Packet Format Over IPv4

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   Outer Ethernet Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Outer Destination MAC Address                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Outer Destination MAC Address |   Outer Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Outer Source MAC Address                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Optional Ethertype=C-Tag 802.1Q|  Outer VLAN Tag Information   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Ethertype=0x0800        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer IPv4 Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |Protocol=17 UDP|         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Outer Source IPv4 Address                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Outer Destination IPv4 Address                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer UDP Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Source Port = xxxx      |       Dest Port = 6081        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           UDP Length          |        UDP Checksum           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Geneve Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver|  Opt Len  |O|C|    Rsvd.  |          Protocol Type        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Virtual Network Identifier (VNI)       |    Reserved   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Variable Length Options                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Inner Ethernet Header (example payload):
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Inner Destination MAC Address                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Inner Destination MAC Address |   Inner Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Inner Source MAC Address                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Optional Ethertype=C-Tag 802.1Q|  Inner VLAN Tag Information   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Payload:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Ethertype of Original Payload |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                   Original Ethernet Payload                   |
   |                                                               |
   | (Note that the original Ethernet Frame's FCS is not included) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Frame Check Sequence:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    New FCS (Frame Check Sequence) for Outer Ethernet Frame    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.2.  Geneve Packet Format Over IPv6

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   Outer Ethernet Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Outer Destination MAC Address                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Outer Destination MAC Address |   Outer Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Outer Source MAC Address                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Optional Ethertype=C-Tag 802.1Q|  Outer VLAN Tag Information   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Ethertype=0x86DD        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer IPv6 Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version| Traffic Class |           Flow Label                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Payload Length        | NxtHdr=17 UDP |   Hop Limit   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                   Outer Source IPv6 Address                   +
   |                                                               |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                Outer Destination IPv6 Address                 +
   |                                                               |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer UDP Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Source Port = xxxx      |       Dest Port = 6081        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           UDP Length          |        UDP Checksum           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Geneve Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver|  Opt Len  |O|C|    Rsvd.  |          Protocol Type        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Virtual Network Identifier (VNI)       |    Reserved   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Variable Length Options                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Inner Ethernet Header (example payload):
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Inner Destination MAC Address                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Inner Destination MAC Address |   Inner Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Inner Source MAC Address                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Optional Ethertype=C-Tag 802.1Q|  Inner VLAN Tag Information   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Payload:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Ethertype of Original Payload |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                   Original Ethernet Payload                   |
   |                                                               |
   | (Note that the original Ethernet Frame's FCS is not included) |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Frame Check Sequence:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    New FCS (Frame Check Sequence) for Outer Ethernet Frame    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.3.  UDP Header

   The use of an encapsulating UDP [RFC0768] header follows the
   connectionless semantics of Ethernet and IP in addition to providing
   entropy to routers performing ECMP. The header fields are therefore
   interpreted as follows:

   Source port: A source port selected by the originating tunnel
      endpoint. This source port SHOULD be the same for all packets
      belonging to a single encapsulated flow to prevent reordering due
      to the use of different paths.
      To encourage an even distribution
      of flows across multiple links, the source port SHOULD be
      calculated using a hash of the encapsulated packet headers using,
      for example, a traditional 5-tuple. Since the port represents a
      flow identifier rather than a true UDP connection, the entire
      16-bit range MAY be used to maximize entropy.

   Dest port: IANA has assigned port 6081 as the fixed well-known
      destination port for Geneve. Although the well-known value should
      be used by default, it is RECOMMENDED that implementations make
      this configurable. The chosen port is used for identification of
      Geneve packets and MUST NOT be reversed for different ends of a
      connection as is done with TCP.

   UDP length: The length of the UDP packet including the UDP header.

   UDP checksum: The checksum MAY be set to zero on transmit for
      packets encapsulated in both IPv4 and IPv6 [RFC6935]. When a
      packet is received with a UDP checksum of zero it MUST be accepted
      and decapsulated. If the originating tunnel endpoint optionally
      encapsulates a packet with a non-zero checksum, it MUST be a
      correctly computed UDP checksum. Upon receiving such a packet,
      the egress endpoint MUST validate the checksum. If the checksum
      is not correct, the packet MUST be dropped; otherwise, the packet
      MUST be accepted for decapsulation. It is RECOMMENDED that the
      UDP checksum be computed to protect the Geneve header and options
      in situations where the network reliability is not high and the
      packet is not protected by another checksum or CRC.

3.4.  Tunnel Header Fields

   Ver (2 bits): The current version number is 0. Packets received by
      an endpoint with an unknown version MUST be dropped. Non-
      terminating devices processing Geneve packets with an unknown
      version number MUST treat them as UDP packets with an unknown
      payload.
583 Opt Len (6 bits): The length of the options fields, expressed in 584 four byte multiples, not including the eight byte fixed tunnel 585 header. This results in a minimum total Geneve header size of 8 586 bytes and a maximum of 260 bytes. The start of the payload 587 headers can be found using this offset from the end of the base 588 Geneve header. 590 O (1 bit): OAM packet. This packet contains a control message 591 instead of a data payload. Endpoints MUST NOT forward the payload 592 and transit devices MUST NOT attempt to interpret or process it. 593 Since these are infrequent control messages, it is RECOMMENDED 594 that endpoints direct these packets to a high priority control 595 queue (for example, to direct the packet to a general purpose CPU 596 from a forwarding ASIC or to separate out control traffic on a 597 NIC). Transit devices MUST NOT alter forwarding behavior on the 598 basis of this bit, such as ECMP link selection. 600 C (1 bit): Critical options present. One or more options has the 601 critical bit set (see Section 3.5). If this bit is set then 602 tunnel endpoints MUST parse the options list to interpret any 603 critical options. On endpoints where option parsing is not 604 supported the packet MUST be dropped on the basis of the 'C' bit 605 in the base header. If the bit is not set tunnel endpoints MAY 606 strip all options using 'Opt Len' and forward the decapsulated 607 packet. Transit devices MUST NOT drop or modify packets on the 608 basis of this bit. 610 The critical bit allows hardware implementations the flexibility 611 to handle options processing in the hardware fastpath or in the 612 exception (slow) path without the need to process all the options. 613 For example, a critical option such as secure hash to provide 614 Geneve header integrity check must be processed by tunnel 615 endpoints and typically processed in the hardware fastpath. 617 Rsvd. 
(6 bits):  Reserved field which MUST be zero on transmission
      and ignored on receipt.

   Protocol Type (16 bits):  The type of the protocol data unit
      appearing after the Geneve header.  This follows the EtherType
      [ETYPES] convention with Ethernet itself being represented by the
      value 0x6558.

   Virtual Network Identifier (VNI) (24 bits):  An identifier for a
      unique element of a virtual network.  In many situations this may
      represent an L2 segment; however, the control plane defines the
      forwarding semantics of decapsulated packets.  The VNI MAY be used
      as part of ECMP forwarding decisions or MAY be used as a mechanism
      to distinguish between overlapping address spaces contained in the
      encapsulated packet when load balancing across CPUs.

   Reserved (8 bits):  Reserved field which MUST be zero on transmission
      and ignored on receipt.

   Transit devices MUST maintain consistent forwarding behavior
   irrespective of the value of 'Opt Len', including ECMP link
   selection.  These devices SHOULD be able to forward packets
   containing options without resorting to a slow path.

3.5.  Tunnel Options

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Option Class         |      Type     |R|R|R| Length  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Variable Option Data                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                              Geneve Option

   The base Geneve header is followed by zero or more options in
   Type-Length-Value format.  Each option consists of a four byte option
   header and a variable amount of option data interpreted according to
   the type.

   Option Class (16 bits):  Namespace for the 'Type' field.
      IANA will be requested to create a "Geneve Option Class" registry
      to allocate identifiers for organizations, technologies, and
      vendors that have an interest in creating types for options.  Each
      organization may allocate types independently to allow
      experimentation and rapid innovation.  It is expected that over
      time certain options will become well known and a given
      implementation may use option types from a variety of sources.  In
      addition, IANA will be requested to reserve specific ranges for
      standardized and experimental options.

   Type (8 bits):  Type indicating the format of the data contained in
      this option.  Options are primarily designed to encourage future
      extensibility and innovation and so standardized forms of these
      options will be defined in a separate document.

      The high order bit of the option type indicates that this is a
      critical option.  If the receiving endpoint does not recognize
      this option and this bit is set then the packet MUST be dropped.
      If the critical bit is set in any option then the 'C' bit in the
      Geneve base header MUST also be set.  Transit devices MUST NOT
      drop packets on the basis of this bit.  The following figure shows
      the location of the 'C' bit in the 'Type' field:

       0 1 2 3 4 5 6 7
      +-+-+-+-+-+-+-+-+
      |C|    Type     |
      +-+-+-+-+-+-+-+-+

      The requirement to drop a packet with an unknown critical option
      applies to the entire tunnel endpoint system and not a particular
      component of the implementation.  For example, in a system
      composed of a forwarding ASIC and a general purpose CPU, this
      does not mean that the packet must be dropped in the ASIC.  An
      implementation may send the packet to the CPU using a rate-limited
      control channel for slow-path exception handling.

   R (3 bits):  Option control flags reserved for future use.  MUST be
      zero on transmission and ignored on receipt.

   Length (5 bits):  Length of the option, expressed in four byte
      multiples excluding the option header.  The total length of each
      option may be between 4 and 128 bytes.  Packets in which the total
      length of all options is not equal to the 'Opt Len' in the base
      header are invalid and MUST be silently dropped if received by an
      endpoint.

   Variable Option Data:  Option data interpreted according to 'Type'.

3.5.1.  Options Processing

   Geneve options are primarily intended to be originated and processed
   by tunnel endpoints.  However, options MAY be processed by transit
   devices along the tunnel path as well.  Transit devices not
   processing Geneve headers SHOULD process Geneve packets as any other
   UDP packet and maintain consistent forwarding behavior.

   In tunnel endpoints, the generation and interpretation of options is
   determined by the control plane, which is out of the scope of this
   document.  However, to ensure interoperability between heterogeneous
   devices some requirements are imposed on options and the devices that
   process them:

   o  Receiving endpoints MUST drop packets containing unknown options
      with the 'C' bit set in the option type.  Conversely, transit
      devices MUST NOT drop packets as a result of encountering unknown
      options, including those with the 'C' bit set.

   o  Some options may be defined in such a way that the position in the
      option list is significant.  Therefore, options MUST NOT be
      reordered by transit devices.

   o  An option MUST NOT affect the parsing or interpretation of any
      other option.

   When designing a Geneve option, it is important to consider how the
   option will evolve in the future.  Once an option is defined it is
   reasonable to expect that implementations may come to depend on a
   specific behavior.  As a result, the scope of any future changes must
   be carefully described upfront.
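
   As a non-normative illustration of the header and option layouts in
   Sections 3.4 and 3.5 and of the endpoint rules above, the following
   Python sketch parses a Geneve header the way a receiving tunnel
   endpoint might.  It is a sketch under stated assumptions rather than
   a reference implementation: the function name parse_geneve, the
   known_options parameter, and the use of ValueError to signal a
   dropped packet are hypothetical and are not defined by this
   document.

```python
import struct

GENEVE_BASE_LEN = 8    # fixed tunnel header size in bytes
OPTION_HDR_LEN = 4     # option header size in bytes


def parse_geneve(buf, known_options=frozenset()):
    """Parse one Geneve header as a receiving endpoint (illustrative).

    known_options is a hypothetical set of (option class, type) pairs
    that this endpoint understands.  Returns (fields, options, payload)
    or raises ValueError where the text says the packet is dropped.
    """
    if len(buf) < GENEVE_BASE_LEN:
        raise ValueError("truncated base header")
    b0, b1, proto, vni_rsvd = struct.unpack("!BBHI", buf[:GENEVE_BASE_LEN])
    if b0 >> 6 != 0:
        raise ValueError("unknown version")      # endpoints drop these
    opt_len = (b0 & 0x3F) * 4                    # Opt Len: 4-byte multiples
    end = GENEVE_BASE_LEN + opt_len              # start of the payload
    if len(buf) < end:
        raise ValueError("truncated options")
    fields = {
        "oam": bool(b1 & 0x80),                  # O bit
        "critical": bool(b1 & 0x40),             # C bit
        "protocol": proto,                       # EtherType convention
        "vni": vni_rsvd >> 8,                    # low 8 bits are Reserved
    }
    options = []
    off = GENEVE_BASE_LEN
    while off < end:
        opt_class, opt_type, flags_len = struct.unpack(
            "!HBB", buf[off:off + OPTION_HDR_LEN])
        data_len = (flags_len & 0x1F) * 4        # excludes option header
        off += OPTION_HDR_LEN
        if off + data_len > end:
            raise ValueError("option lengths do not match Opt Len")
        # High order bit of Type marks a critical option.
        if opt_type & 0x80 and (opt_class, opt_type) not in known_options:
            raise ValueError("unknown critical option")
        options.append((opt_class, opt_type, buf[off:off + data_len]))
        off += data_len
    return fields, options, buf[end:]
```

   Options are returned in the order in which they appear, and unknown
   non-critical options are passed through rather than rejected.  A
   transit device would not apply the drop rules shown here, since it
   MUST NOT drop packets on the basis of unknown options.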

   Unexpectedly significant interoperability issues may result from
   changing the length of an option that was defined to be a certain
   size.  A particular option is specified to have either a fixed
   length, which is constant, or a variable length, which may change
   over time or for different use cases.  This property is part of the
   definition of the option and conveyed by the 'Type'.  For fixed
   length options, some implementations may choose to ignore the length
   field in the option header and instead parse based on the well known
   length associated with the type.  In this case, redefining the length
   will impact not only parsing of the option in question but also any
   options that follow.  Therefore, options that are defined to be fixed
   length in size MUST NOT be redefined to a different length.  Instead,
   a new 'Type' should be allocated.

4.  Implementation and Deployment Considerations

4.1.  Encapsulation of Geneve in IP

   As an IP-based tunnel protocol, Geneve shares many properties and
   techniques with existing protocols.  The application of some of these
   is described in further detail, although in general most concepts
   applicable to the IP layer or to IP tunnels generally also function
   in the context of Geneve.

4.1.1.  IP Fragmentation

   To prevent fragmentation and maximize performance, the best practice
   when using Geneve is to ensure that the MTU of the physical network
   is greater than or equal to the MTU of the encapsulated network plus
   tunnel headers.  Manual or upper layer (such as TCP MSS clamping)
   configuration can be used to ensure that fragmentation never takes
   place; however, in some situations this may not be feasible.

   It is strongly RECOMMENDED that Path MTU Discovery ([RFC1191],
   [RFC1981]) be used by setting the DF bit in the IP header when Geneve
   packets are transmitted over IPv4 (this is the default with IPv6).
   The use of Path MTU Discovery on the transit network provides the
   encapsulating endpoint with soft-state about the link that it may use
   to prevent or minimize fragmentation depending on its role in the
   virtualized network.  For example, recommendations/guidance for
   handling fragmentation in similar overlay encapsulation services like
   PWE3 are provided in section 5.3 of [RFC3985].

   Note that some implementations may not be capable of supporting
   fragmentation or other less common features of the IP header, such as
   options and extension headers.

4.1.2.  DSCP and ECN

   When encapsulating IP (including over Ethernet) packets in Geneve,
   there are several considerations for propagating DSCP and ECN bits
   from the inner header to the tunnel on transmission and the reverse
   on reception.

   [RFC2983] provides guidance for mapping DSCP between inner and outer
   IP headers.  Network virtualization is typically more closely aligned
   with the Pipe model described, where the DSCP value on the tunnel
   header is set based on a policy (which may be a fixed value, one
   based on the inner traffic class, or some other mechanism for
   grouping traffic).  Aspects of the Uniform model (which treats the
   inner and outer DSCP value as a single field by copying on ingress
   and egress) may also apply, such as the ability to remark the inner
   header on tunnel egress based on transit marking.  However, the
   Uniform model is not conceptually consistent with network
   virtualization, which seeks to provide strong isolation between
   encapsulated traffic and the physical network.

   [RFC6040] describes the mechanism for exposing ECN capabilities on IP
   tunnels and propagating congestion markers to the inner packets.
   This behavior MUST be followed for IP packets encapsulated in Geneve.

4.1.3.
Broadcast and Multicast

   Geneve tunnels may either be point-to-point unicast between two
   endpoints or may utilize broadcast or multicast addressing.  It is
   not required that inner and outer addressing match in this respect.
   For example, in physical networks that do not support multicast,
   encapsulated multicast traffic may be replicated into multiple
   unicast tunnels or forwarded by policy to a unicast location
   (possibly to be replicated there).

   With physical networks that do support multicast it may be desirable
   to use this capability to take advantage of hardware replication for
   encapsulated packets.  In this case, multicast addresses may be
   allocated in the physical network corresponding to tenants,
   encapsulated multicast groups, or some other factor.  The allocation
   of these groups is a component of the control plane and therefore
   outside of the scope of this document.  When physical multicast is in
   use, the 'C' bit in the Geneve header may be used with groups of
   devices with heterogeneous capabilities as each device can interpret
   only the options that are significant to it if they are not critical.

4.1.4.  Unidirectional Tunnels

   Generally speaking, a Geneve tunnel is a unidirectional concept.  IP
   is not a connection oriented protocol and it is possible for two
   endpoints to communicate with each other using different paths or to
   have one side not transmit anything at all.  As Geneve is an IP-based
   protocol, the tunnel layer inherits these same characteristics.

   It is possible for a tunnel to encapsulate a protocol, such as TCP,
   which is connection oriented and maintains session state at that
   layer.  In addition, implementations MAY model Geneve tunnels as
   connected, bidirectional links, such as to provide the abstraction of
   a virtual port.
   In both of these cases, bidirectionality of the tunnel is handled at
   a higher layer and does not affect the operation of Geneve itself.

4.2.  Constraints on Protocol Features

   Geneve is intended to be flexible to a wide range of current and
   future applications.  As a result, certain constraints may be placed
   on the use of metadata or other aspects of the protocol in order to
   optimize for a particular use case.  For example, some applications
   may limit the types of options which are supported or enforce a
   maximum number or length of options.  Other applications may only
   handle certain encapsulated payload types, such as Ethernet or IP.
   This could be either globally throughout the system or, for example,
   restricted to certain classes of devices or network paths.

   These constraints may be communicated to tunnel endpoints either
   explicitly through a control plane or implicitly by the nature of the
   application.  As Geneve is defined as a data plane protocol that is
   control plane agnostic, the exact mechanism is not defined in this
   document.

4.2.1.  Constraints on Options

   While Geneve options are flexible, a control plane may restrict the
   number of option TLVs, as well as the order and size of the TLVs,
   between tunnel endpoints to make it simpler for a data plane
   implementation in software or hardware to handle
   [I-D.ietf-nvo3-encap].  For example, there may be some critical
   information such as a secure hash that must be processed in a certain
   order to provide lowest latency.

   A control plane may negotiate a subset of option TLVs and a certain
   TLV ordering, and may also limit the total number of option TLVs
   present in the packet, for example, to accommodate hardware capable
   of processing fewer options [I-D.ietf-nvo3-encap].  Hence, a control
   plane needs to have the ability to describe the supported subset of
   TLVs and their order to the tunnel endpoints.
   In the absence of a control plane, alternative configuration
   mechanisms may be used for this purpose.  The exact mechanism is not
   defined in this document.

4.3.  NIC Offloads

   Modern NICs currently provide a variety of offloads to enable the
   efficient processing of packets.  The implementation of many of these
   offloads requires only that the encapsulated packet be easily parsed
   (for example, checksum offload).  However, optimizations such as LSO
   and LRO involve some processing of the options themselves since they
   must be replicated/merged across multiple packets.  In these
   situations, it is desirable to not require changes to the offload
   logic to handle the introduction of new options.  To enable this,
   some constraints are placed on the definitions of options to allow
   for simple processing rules:

   o  When performing LSO, a NIC MUST replicate the entire Geneve header
      and all options, including those unknown to the device, onto each
      resulting segment.  However, a given option definition may
      override this rule and specify different behavior in supporting
      devices.  Conversely, when performing LRO, a NIC MAY assume that a
      binary comparison of the options (including unknown options) is
      sufficient to ensure equality and MAY merge packets with equal
      Geneve headers.

   o  Options MUST NOT be reordered during the course of offload
      processing, including when merging packets for the purpose of LRO.

   o  NICs performing offloads MUST NOT drop packets with unknown
      options, including those marked as critical.

   There is no requirement that a given implementation of Geneve employ
   the offloads listed as examples above.  However, as these offloads
   are currently widely deployed in commercially available NICs, the
   rules described here are intended to enable efficient handling of
   current and future options across a variety of devices.

4.4.
Inner VLAN Handling

   Geneve is capable of encapsulating a wide range of protocols and
   therefore a given implementation is likely to support only a small
   subset of the possibilities.  However, as Ethernet is expected to be
   widely deployed, it is useful to describe the behavior of VLANs
   inside encapsulated Ethernet frames.

   As with any protocol, support for inner VLAN headers is OPTIONAL.  In
   many cases, the use of encapsulated VLANs may be disallowed due to
   security or implementation considerations.  However, in other cases
   trunking of VLAN frames across a Geneve tunnel can prove useful.  As
   a result, the processing of inner VLAN tags upon ingress or egress
   from a tunnel endpoint is based upon the configuration of the
   endpoint and/or control plane and not explicitly defined as part of
   the data format.

5.  Interoperability Issues

   Viewed exclusively from the data plane, Geneve does not introduce any
   interoperability issues as it appears to most devices as UDP packets.
   However, as there are already a number of tunnel protocols deployed
   in network virtualization environments, there is a practical question
   of transition and coexistence.

   Since Geneve is a superset of the functionality of the three most
   common protocols used for network virtualization (VXLAN, NVGRE, and
   STT) it should be straightforward to port an existing control plane
   to run on top of it with minimal effort.  With both the old and new
   packet formats supporting the same set of capabilities, there is no
   need for a hard transition - endpoints directly communicating with
   each other use any common protocol, which may be different even
   within a single overall system.  As transit devices are primarily
   forwarding packets on the basis of the IP header, all protocols
   appear similar and these devices do not introduce additional
   interoperability concerns.

   To assist with this transition, it is strongly suggested that
   implementations support simultaneous operation of both Geneve and
   existing tunnel protocols as it is expected to be common for a single
   node to communicate with a mixture of other nodes.  Eventually, older
   protocols may be phased out as they are no longer in use.

6.  Security Considerations

   Because Geneve packets are ordinary UDP/IP packets, Geneve does not
   have any inherent security mechanisms.  As a result, an attacker with
   access to the underlay network transporting the IP packets has the
   ability to snoop or inject packets.  Legitimate but malicious tunnel
   endpoints may also spoof identifiers in the tunnel header to gain
   access to networks owned by other tenants.

   Within a particular security domain, such as a data center operated
   by a single provider, the most common and highest performing security
   mechanism is isolation of trusted components.  Tunnel traffic can be
   carried over a separate VLAN and filtered at any untrusted
   boundaries.  In addition, tunnel endpoints should only be operated in
   environments controlled by the service provider, such as the
   hypervisor itself rather than within a customer VM.

   When crossing an untrusted link, such as the public Internet, IPsec
   [RFC4301] may be used to provide authentication and/or encryption of
   the IP packets formed as part of Geneve encapsulation.  If the remote
   tunnel endpoint is not completely trusted, for example it resides on
   a customer premises, then it may also be necessary to sanitize any
   tunnel metadata to prevent tenant-hopping attacks.

   Geneve does not otherwise affect the security of the encapsulated
   packets.

7.  IANA Considerations

   IANA has allocated UDP port 6081 as the well-known destination port
   for Geneve.  Upon publication, the registry should be updated to cite
   this document.
   The original request was:

      Service Name: geneve
      Transport Protocol(s): UDP
      Assignee: Jesse Gross
      Contact: Jesse Gross
      Description: Generic Network Virtualization Encapsulation (Geneve)
      Reference: This document
      Port Number: 6081

   In addition, IANA is requested to create a "Geneve Option Class"
   registry to allocate Option Classes.  This shall be a registry of
   16-bit hexadecimal values along with descriptive strings.  The
   identifiers 0x0000-0x00FF are to be reserved for standardized options
   for allocation by IETF Review [RFC5226] and 0xFFF0-0xFFFF for
   Experimental Use.  Otherwise, identifiers are to be assigned to any
   organization with an interest in creating Geneve options on a First
   Come First Served basis.  The registry is to be populated with the
   following initial values:

   +----------------+--------------------------------------+
   | Option Class   | Description                          |
   +----------------+--------------------------------------+
   | 0x0000..0x00FF | Unassigned - IETF Review             |
   | 0x0100         | Linux                                |
   | 0x0101         | Open vSwitch                         |
   | 0x0102         | Open Virtual Networking (OVN)        |
   | 0x0103         | In-band Network Telemetry (INT)      |
   | 0x0104         | VMware                               |
   | 0x0105..0xFFEF | Unassigned - First Come First Served |
   | 0xFFF0..0xFFFF | Experimental                         |
   +----------------+--------------------------------------+

8.  Contributors

   The following individuals were authors of an earlier version of this
   document and made significant contributions:

   Pankaj Garg
   Microsoft Corporation
   1 Microsoft Way
   Redmond, WA 98052
   USA

   Email: pankajg@microsoft.com

   Chris Wright
   Red Hat Inc.
   1801 Varsity Drive
   Raleigh, NC 27606
   USA

   Email: chrisw@redhat.com

   Puneet Agarwal
   Innovium, Inc.
   6001 America Center Drive
   San Jose, CA 95002
   USA

   Email: puneet@innovium.com

   Kenneth Duda
   Arista Networks
   5453 Great America Parkway
   Santa Clara, CA 95054
   USA

   Email: kduda@arista.com

   Dinesh G. Dutt
   Cumulus Networks
   140C S. Whisman Road
   Mountain View, CA 94041
   USA

   Email: ddutt@cumulusnetworks.com

   Jon Hudson
   Brocade Communications Systems, Inc.
   130 Holger Way
   San Jose, CA 95134
   USA

   Email: jon.hudson@gmail.com

   Ariel Hendel
   Broadcom Limited
   3151 Zanker Road
   San Jose, CA 95134
   USA

   Email: ariel.hendel@broadcom.com

9.  Acknowledgements

   The authors wish to thank Martin Casado, Bruce Davie and Dave Thaler
   for their input, feedback, and helpful suggestions.

10.  References

10.1.  Normative References

   [RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768,
              DOI 10.17487/RFC0768, August 1980.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", RFC 5226,
              DOI 10.17487/RFC5226, May 2008.

10.2.  Informative References

   [ETYPES]   The IEEE Registration Authority, "IEEE 802 Numbers", 2013.

   [I-D.davie-stt]
              Davie, B. and J. Gross, "A Stateless Transport Tunneling
              Protocol for Network Virtualization (STT)",
              draft-davie-stt-08 (work in progress), April 2016.

   [I-D.ietf-nvo3-dataplane-requirements]
              Bitar, N., Lasserre, M., Balus, F., Morin, T., Jin, L.,
              and B. Khasnabish, "NVO3 Data Plane Requirements",
              draft-ietf-nvo3-dataplane-requirements-03 (work in
              progress), April 2014.

   [I-D.ietf-nvo3-encap]
              Boutros, S., Ganga, I., Garg, P., Manur, R., Mizrahi, T.,
              Mozes, D., and E.
              Nordmark, "NVO3 Encapsulation Considerations",
              draft-ietf-nvo3-encap-00 (work in progress), June 2017.

   [IEEE.802.1Q_2014]
              IEEE, "IEEE Standard for Local and metropolitan area
              networks--Bridges and Bridged Networks", IEEE 802.1Q-2014,
              DOI 10.1109/ieeestd.2014.6991462, December 2014.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              DOI 10.17487/RFC1191, November 1990.

   [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
              for IP version 6", RFC 1981, DOI 10.17487/RFC1981, August
              1996.

   [RFC2983]  Black, D., "Differentiated Services and Tunnels",
              RFC 2983, DOI 10.17487/RFC2983, October 2000.

   [RFC3031]  Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol
              Label Switching Architecture", RFC 3031,
              DOI 10.17487/RFC3031, January 2001.

   [RFC3985]  Bryant, S., Ed. and P. Pate, Ed., "Pseudo Wire Emulation
              Edge-to-Edge (PWE3) Architecture", RFC 3985,
              DOI 10.17487/RFC3985, March 2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, DOI 10.17487/RFC4301,
              December 2005.

   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
              Notification", RFC 6040, DOI 10.17487/RFC6040, November
              2010.

   [RFC6935]  Eubanks, M., Chimento, P., and M. Westerlund, "IPv6 and
              UDP Checksums for Tunneled Packets", RFC 6935,
              DOI 10.17487/RFC6935, April 2013.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
              eXtensible Local Area Network (VXLAN): A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014.

   [RFC7365]  Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
              Rekhter, "Framework for Data Center (DC) Network
              Virtualization", RFC 7365, DOI 10.17487/RFC7365, October
              2014.

   [RFC7637]  Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network
              Virtualization Using Generic Routing Encapsulation",
              RFC 7637, DOI 10.17487/RFC7637, September 2015.

   [VL2]      Greenberg, A., et al., "VL2: A Scalable and Flexible Data
              Center Network", ACM SIGCOMM Computer Communication
              Review, DOI 10.1145/1594977.1592576, 2009.

Authors' Addresses

   Jesse Gross (editor)

   Email: jesse@kernel.org

   Ilango Ganga (editor)
   Intel Corporation
   2200 Mission College Blvd.
   Santa Clara, CA 95054
   USA

   Email: ilango.s.ganga@intel.com

   T. Sridhar (editor)
   VMware, Inc.
   3401 Hillview Ave.
   Palo Alto, CA 94304
   USA

   Email: tsridhar@vmware.com