Network Working Group                                      J. Gross, Ed.
Internet-Draft
Intended status: Standards Track                           I. Ganga, Ed.
Expires: September 1, 2020                                         Intel
                                                         T. Sridhar, Ed.
                                                                  VMware
                                                       February 29, 2020

          Geneve: Generic Network Virtualization Encapsulation
                        draft-ietf-nvo3-geneve-15

Abstract

   Network virtualization involves the cooperation of devices with a
   wide variety of capabilities such as software and hardware tunnel
   endpoints, transit fabrics, and centralized control clusters. As a
   result of their role in tying together different elements in the
   system, the requirements on tunnels are influenced by all of these
   components. Flexibility is therefore the most important aspect of a
   tunnel protocol if it is to keep pace with the evolution of the
   system. This document describes Geneve, an encapsulation protocol
   designed to recognize and accommodate these changing capabilities and
   needs.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 1, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Requirements Language
     1.2. Terminology
   2. Design Requirements
     2.1. Control Plane Independence
     2.2. Data Plane Extensibility
       2.2.1. Efficient Implementation
     2.3. Use of Standard IP Fabrics
   3. Geneve Encapsulation Details
     3.1. Geneve Packet Format Over IPv4
     3.2. Geneve Packet Format Over IPv6
     3.3. UDP Header
     3.4. Tunnel Header Fields
     3.5. Tunnel Options
       3.5.1. Options Processing
   4. Implementation and Deployment Considerations
     4.1. Applicability Statement
     4.2. Congestion Control Functionality
     4.3. UDP Checksum
       4.3.1. UDP Zero Checksum Handling with IPv6
     4.4. Encapsulation of Geneve in IP
       4.4.1. IP Fragmentation
       4.4.2. DSCP, ECN and TTL
       4.4.3. Broadcast and Multicast
       4.4.4. Unidirectional Tunnels
     4.5. Constraints on Protocol Features
       4.5.1. Constraints on Options
     4.6. NIC Offloads
     4.7. Inner VLAN Handling
   5. Transition Considerations
   6. Security Considerations
     6.1. Data Confidentiality
       6.1.1. Inter-Data Center Traffic
     6.2. Data Integrity
     6.3. Authentication of NVE peers
     6.4. Options Interpretation by Transit Devices
     6.5. Multicast/Broadcast
     6.6. Control Plane Communications
   7. IANA Considerations
   8. Contributors
   9. Acknowledgements
   10. References
     10.1. Normative References
     10.2. Informative References
   Authors' Addresses

1. Introduction

   Networking has long featured a variety of tunneling, tagging, and
   other encapsulation mechanisms. However, the advent of network
   virtualization has caused a surge of renewed interest and a
   corresponding increase in the introduction of new protocols.
   The large number of protocols in this space, ranging all the way
   from VLANs [IEEE.802.1Q_2018] and MPLS [RFC3031] through the more
   recent VXLAN [RFC7348] (Virtual eXtensible Local Area Network) and
   NVGRE [RFC7637] (Network Virtualization Using Generic Routing
   Encapsulation), often leads to questions about the need for new
   encapsulation formats and what it is about network virtualization in
   particular that leads to their proliferation. Note that this list of
   protocols is not exhaustive.

   While many encapsulation protocols seek to simply partition the
   underlay network or bridge between two domains, network
   virtualization views the transit network as providing connectivity
   between multiple components of a distributed system. In many ways
   this system is similar to a chassis switch with the IP underlay
   network playing the role of the backplane and tunnel endpoints on the
   edge as line cards. When viewed in this light, the requirements
   placed on the tunnel protocol are significantly different in terms of
   the quantity of metadata necessary and the role of transit nodes.

   Work such as [VL2] (A Scalable and Flexible Data Center Network) and
   the NVO3 Data Plane Requirements
   [I-D.ietf-nvo3-dataplane-requirements] has described some of the
   properties that the data plane must have to support network
   virtualization. However, one additional defining requirement is the
   need to carry metadata (e.g., system state) along with the packet
   data; example use cases of metadata are noted below. The use of some
   metadata is certainly not a foreign concept - nearly all protocols
   used for network virtualization have at least 24 bits of identifier
   space as a way to partition between tenants.
   This is often described
   as overcoming the limits of 12-bit VLANs, and when seen in that
   context, or any context where it is a true tenant identifier, 16
   million possible entries is a large number. However, the reality is
   that the metadata is not exclusively used to identify tenants, and
   encoding other information quickly starts to crowd the space. In
   fact, when compared to the tags used to exchange metadata between
   line cards on a chassis switch, 24-bit identifiers start to look
   quite small. There are nearly endless uses for this metadata,
   ranging from storing input port identifiers for simple security
   policies to sending service-based context for advanced middlebox
   applications that terminate and re-encapsulate Geneve traffic.

   Existing tunnel protocols have each attempted to solve different
   aspects of these new requirements, only to be quickly rendered out of
   date by changing control plane implementations and advancements.
   Furthermore, software and hardware components and controllers all
   have different advantages and rates of evolution - a fact that should
   be viewed as a benefit, not a liability or limitation. This draft
   describes Geneve, a protocol which seeks to avoid these problems by
   providing a framework for tunneling for network virtualization rather
   than being prescriptive about the entire system.

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

1.2. Terminology

   The NVO3 Framework [RFC7365] defines many of the concepts commonly
   used in network virtualization. In addition, the following terms are
   specifically meaningful in this document:

   Checksum offload.
   An optimization implemented by many NICs (Network
   Interface Controllers) which enables computation and verification of
   upper layer protocol checksums in hardware on transmit and receive,
   respectively. This typically includes IP and TCP/UDP checksums which
   would otherwise be computed by the protocol stack in software.

   Clos network. A technique for composing network fabrics larger than
   a single switch while maintaining non-blocking bandwidth across
   connection points. ECMP is used to divide traffic across the
   multiple links and switches that constitute the fabric. Sometimes
   termed "leaf and spine" or "fat tree" topologies.

   ECMP. Equal Cost Multipath. A routing mechanism for selecting from
   among multiple best next hop paths by hashing packet headers in order
   to better utilize network bandwidth while avoiding reordering of
   packets within a flow.

   Geneve. Generic Network Virtualization Encapsulation. The tunnel
   protocol described in this document.

   LRO. Large Receive Offload. The receive-side equivalent function of
   LSO, in which multiple protocol segments (primarily TCP) are
   coalesced into larger data units.

   LSO. Large Segmentation Offload. A function provided by many
   commercial NICs that allows data units larger than the MTU to be
   passed to the NIC to improve performance, the NIC being responsible
   for creating smaller segments of size less than or equal to the MTU
   with correct protocol headers. When referring specifically to
   TCP/IP, this feature is often known as TSO (TCP Segmentation
   Offload).

   Middlebox. The term middlebox in the context of this document refers
   to network service functions or appliances for service interposition
   that would typically implement NVE functionality, which terminate or
   re-encapsulate Geneve traffic.

   NIC. Network Interface Controller. Also called a Network Interface
   Card or Network Adapter.
   A NIC could be part of a tunnel endpoint or
   transit device and can either process Geneve packets or aid in the
   processing of Geneve packets.

   Transit device. A forwarding element (e.g., router or switch) along
   the path of the tunnel making up part of the Underlay Network. A
   transit device may be capable of understanding the Geneve packet
   format but does not originate or terminate Geneve packets.

   Tunnel endpoint. A component performing encapsulation and
   decapsulation of packets, such as Ethernet frames or IP datagrams, in
   Geneve headers. As the ultimate consumer of any tunnel metadata,
   tunnel endpoints have the highest level of requirements for parsing
   and interpreting tunnel headers. Tunnel endpoints may consist of
   either software or hardware implementations or a combination of the
   two. Tunnel endpoints are frequently a component of an NVE (Network
   Virtualization Edge) but may also be found in middleboxes or other
   elements making up an NVO3 Network.

   VM. Virtual Machine.

2. Design Requirements

   Geneve is designed to support network virtualization use cases for
   data center environments, where tunnels are typically established to
   act as a backplane between the virtual switches residing in
   hypervisors, physical switches, or middleboxes or other appliances.
   An arbitrary IP network can be used as an underlay although Clos
   networks composed using ECMP links are a common choice to provide
   consistent bisectional bandwidth across all connection points. Many
   of the concepts of network virtualization overlays over Layer 3 IP
   networks are described in the NVO3 Framework [RFC7365]. Figure 1
   shows an example of a hypervisor, top of rack switch for connectivity
   to physical servers, and a WAN uplink connected using Geneve tunnels
   over a simplified Clos network.
   These tunnels are used to
   encapsulate and forward frames from the attached components such as
   VMs or physical links.

   +---------------------+           +-------+  +------+
   | +--+  +-------+---+ |           |Transit|--|Top of|==Physical
   | |VM|--|       |   | | +------+ /|Router |  | Rack |==Servers
   | +--+  |Virtual|NIC|---|Top of|/ +-------+\/+------+
   | +--+  |Switch |   | | | Rack |\ +-------+/\+------+
   | |VM|--|       |   | | +------+ \|Transit|  |Uplink|   WAN
   | +--+  +-------+---+ |           |Router |--|      |=========>
   +---------------------+           +-------+  +------+
          Hypervisor

               ()===================================()
                       Switch-Switch Geneve Tunnels

                    Figure 1: Sample Geneve Deployment

   To support the needs of network virtualization, the tunnel protocol
   should be able to take advantage of the differing (and evolving)
   capabilities of each type of device in both the underlay and overlay
   networks. This results in the following requirements being placed on
   the data plane tunneling protocol:

   o  The data plane is generic and extensible enough to support current
      and future control planes.

   o  Tunnel components are efficiently implementable in both hardware
      and software without restricting capabilities to the lowest common
      denominator.

   o  High performance over existing IP fabrics.

   These requirements are described further in the following
   subsections.

2.1. Control Plane Independence

   Although some protocols for network virtualization have included a
   control plane as part of the tunnel format specification (most
   notably, VXLAN [RFC7348] prescribed a multicast learning-based
   control plane), these specifications have largely been treated as
   describing only the data format. The VXLAN packet format has
   actually seen a wide variety of control planes built on top of it.

   There is a clear advantage in settling on a data format: most of the
   protocols are only superficially different and there is little
   advantage in duplicating effort. However, the same cannot be said of
   control planes, which are diverse in very fundamental ways. The case
   for standardization is also less clear given the wide variety in
   requirements, goals, and deployment scenarios.

   As a result of this reality, Geneve is a pure tunnel format
   specification that is capable of fulfilling the needs of many control
   planes by explicitly not selecting any one of them. This
   simultaneously promotes a shared data format and reduces the chance
   of obsolescence by future control plane enhancements.

2.2. Data Plane Extensibility

   Achieving the level of flexibility needed to support current and
   future control planes effectively requires an options infrastructure
   to allow new metadata types to be defined, deployed, and either
   finalized or retired. Options also allow for differentiation of
   products by encouraging independent development in each vendor's core
   specialty, leading to an overall faster pace of advancement. By far
   the most common mechanism for implementing options is
   Type-Length-Value (TLV) format.

   It should be noted that while options can be used to support
   non-wirespeed control packets, they are equally important on data
   packets to segregate and direct forwarding (for instance, the
   examples given before of input port based security policies and
   terminating/re-encapsulating service interposition both require tags
   to be placed on data packets). Therefore, while it would be
   desirable to limit the extensibility to only control packets for the
   purposes of simplifying the datapath, that would not satisfy the
   design requirements.

2.2.1. Efficient Implementation

   There is often a conflict between software flexibility and hardware
   performance that is difficult to resolve. For a given set of
   functionality, it is obviously desirable to maximize performance.
   However, that does not mean new features that cannot be run at a
   desired speed today should be disallowed. Therefore, for a protocol
   to be efficiently implementable means that a set of common
   capabilities can be reasonably handled across platforms along with a
   graceful mechanism to handle more advanced features in the
   appropriate situations.

   The use of a variable length header and options in a protocol often
   raises questions about whether it is truly efficiently implementable
   in hardware. To answer this question in the context of Geneve, it is
   important to first divide "hardware" into two categories: tunnel
   endpoints and transit devices.

   Tunnel endpoints must be able to parse the variable header, including
   any options, and take action. Since these devices are actively
   participating in the protocol, they are the most affected by Geneve.
   However, as tunnel endpoints are the ultimate consumers of the data,
   transmitters can tailor their output to the capabilities of the
   recipient.

   Transit devices may be able to interpret the options; however, as
   non-terminating devices, transit devices do not originate or
   terminate the Geneve packet and hence MUST NOT modify Geneve headers
   and MUST NOT insert or delete options, which is the responsibility of
   tunnel endpoints. Options, if present in the packet, MUST only be
   generated and terminated by tunnel endpoints. The participation of
   transit devices in interpreting options is OPTIONAL.

   Further, either tunnel endpoints or transit devices MAY use offload
   capabilities of NICs such as checksum offload to improve the
   performance of Geneve packet processing.
   The presence of a Geneve
   variable length header should not prevent the tunnel endpoints and
   transit devices from using such offload capabilities.

2.3. Use of Standard IP Fabrics

   IP has clearly cemented its place as the dominant transport mechanism
   and many techniques have evolved over time to make it robust,
   efficient, and inexpensive. As a result, it is natural to use IP
   fabrics as a transit network for Geneve. Fortunately, the use of IP
   encapsulation and addressing is enough to achieve the primary goal of
   delivering packets to the correct point in the network through
   standard switching and routing.

   In addition, nearly all underlay fabrics are designed to exploit
   parallelism in traffic to spread load across multiple links without
   introducing reordering in individual flows. These equal cost
   multipathing (ECMP) techniques typically involve parsing and hashing
   the addresses and port numbers from the packet to select an outgoing
   link. However, the use of tunnels often results in poor ECMP
   performance without additional knowledge of the protocol as the
   encapsulated traffic is hidden from the fabric by design and only
   tunnel endpoint addresses are available for hashing.

   Since it is desirable for Geneve to perform well on these existing
   fabrics, it is necessary for entropy from encapsulated packets to be
   exposed in the tunnel header. The most common technique for this is
   to use the UDP source port, which is discussed further in
   Section 3.3.

3. Geneve Encapsulation Details

   The Geneve packet format consists of a compact tunnel header
   encapsulated in UDP over either IPv4 or IPv6. A small fixed tunnel
   header provides control information plus a base level of
   functionality and interoperability with a focus on simplicity. This
   header is then followed by a set of variable options to allow for
   future innovation.
   Finally, the payload consists of a protocol data
   unit of the indicated type, such as an Ethernet frame. Section 3.1
   and Section 3.2 illustrate the Geneve packet format transported (for
   example) over Ethernet along with an Ethernet payload.

3.1. Geneve Packet Format Over IPv4

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

   Outer Ethernet Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Outer Destination MAC Address                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Outer Destination MAC Address |   Outer Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Outer Source MAC Address                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Optional Ethertype=C-Tag 802.1Q|  Outer VLAN Tag Information   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Ethertype=0x0800        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer IPv4 Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |Protocol=17 UDP|         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Outer Source IPv4 Address                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Outer Destination IPv4 Address                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer UDP Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Source Port = xxxx       |       Dest Port = 6081        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          UDP Length           |         UDP Checksum          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Geneve Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver|  Opt Len  |O|C|   Rsvd.   |         Protocol Type         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Virtual Network Identifier (VNI)        |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Variable Length Options                    |
   ~                                                               ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Inner Ethernet Header (example payload):
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Inner Destination MAC Address                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Inner Destination MAC Address |   Inner Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Inner Source MAC Address                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Optional Ethertype=C-Tag 802.1Q|  Inner VLAN Tag Information   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Payload:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Ethertype of Original Payload |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                   Original Ethernet Payload                   |
   |                                                               |
   | (Note that the original Ethernet Frame's Preamble, Start Frame|
   |  Delimiter(SFD) & Frame Check Sequence(FCS) are not included  |
   |  and the Ethernet Payload need not be 4-byte aligned)         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Frame Check Sequence:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   New Frame Check Sequence (FCS) for Outer Ethernet Frame     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.2. Geneve Packet Format Over IPv6

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

   Outer Ethernet Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Outer Destination MAC Address                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Outer Destination MAC Address |   Outer Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Outer Source MAC Address                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Optional Ethertype=C-Tag 802.1Q|  Outer VLAN Tag Information   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Ethertype=0x86DD        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer IPv6 Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version| Traffic Class |           Flow Label                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Payload Length        | NxtHdr=17 UDP |   Hop Limit   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                   Outer Source IPv6 Address                   +
   |                                                               |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                Outer Destination IPv6 Address                 +
   |                                                               |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Outer UDP Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Source Port = xxxx       |       Dest Port = 6081        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          UDP Length           |         UDP Checksum          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Geneve Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver|  Opt Len  |O|C|   Rsvd.   |         Protocol Type         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Virtual Network Identifier (VNI)        |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Variable Length Options                    |
   ~                                                               ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Inner Ethernet Header (example payload):
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Inner Destination MAC Address                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Inner Destination MAC Address |   Inner Source MAC Address    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Inner Source MAC Address                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Optional Ethertype=C-Tag 802.1Q|  Inner VLAN Tag Information   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Payload:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Ethertype of Original Payload |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                   Original Ethernet Payload                   |
   |                                                               |
   | (Note that the original Ethernet Frame's Preamble, Start Frame|
   |  Delimiter(SFD) & Frame Check Sequence(FCS) are not included  |
   |  and the Ethernet Payload need not be 4-byte aligned)         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Frame Check Sequence:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   New Frame Check Sequence (FCS) for Outer Ethernet Frame     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.3. UDP Header

   The use of an encapsulating UDP [RFC0768] header follows the
   connectionless semantics of Ethernet and IP in addition to providing
   entropy to routers performing ECMP. The header fields are therefore
   interpreted as follows:

   Source port: A source port selected by the originating tunnel
      endpoint.
      This source port SHOULD be the same for all packets
      belonging to a single encapsulated flow to prevent reordering due
      to the use of different paths. To encourage an even distribution
      of flows across multiple links, the source port SHOULD be
      calculated using a hash of the encapsulated packet headers using,
      for example, a traditional 5-tuple. Since the port represents a
      flow identifier rather than a true UDP connection, the entire
      16-bit range MAY be used to maximize entropy. In addition to
      setting the source port, for IPv6, the flow label MAY also be used
      for providing entropy. For an example of using the IPv6 flow
      label for tunnel use cases, see [RFC6438].

      If Geneve traffic is shared with other UDP listeners on the same
      IP address, tunnel endpoints SHOULD implement a mechanism to
      ensure ICMP return traffic arising from network errors is directed
      to the correct listener. The definition of such a mechanism is
      beyond the scope of this document.

   Dest port: IANA has assigned port 6081 as the fixed well-known
      destination port for Geneve. Although the well-known value should
      be used by default, it is RECOMMENDED that implementations make
      this configurable. The chosen port is used for identification of
      Geneve packets and MUST NOT be reversed for different ends of a
      connection as is done with TCP. The control plane is responsible
      for any reconfiguration of the assigned port and its
      interpretation by the respective devices. The definition of the
      control plane is beyond the scope of this document.

   UDP length: The length of the UDP packet including the UDP header.

   UDP checksum: In order to protect the Geneve header, options and
      payload from potential data corruption, the UDP checksum SHOULD be
      generated as specified in [RFC0768] and [RFC1112] when Geneve is
      encapsulated in IPv4.
To protect the IP header, Geneve header, 608 options and payload from potential data corruption, the UDP 609 checksum MUST be generated by default as specified in [RFC0768] 610 and [RFC8200] when Geneve is encapsulated in IPv6, except for 611 certain conditions, which are outlined in the next paragraph. 612 Upon receiving such packets with non-zero UDP checksum, the 613 receiving tunnel endpoints MUST validate the checksum. If the 614 checksum is not correct, the packet MUST be dropped, otherwise the 615 packet MUST be accepted for decapsulation. 617 Under certain conditions, the UDP checksum MAY be set to zero on 618 transmit for packets encapsulated in both IPv4 and IPv6 [RFC8200]. 619 See Section 4.3 for additional requirements that apply when using 620 zero UDP checksum with IPv4 and IPv6. Disabling the use of UDP 621 checksums is an operational consideration that should take into 622 account the risks and effects of packet corruption. 624 3.4. Tunnel Header Fields 626 Ver (2 bits): The current version number is 0. Packets received by 627 a tunnel endpoint with an unknown version MUST be dropped. 628 Transit devices interpreting Geneve packets with an unknown 629 version number MUST treat them as UDP packets with an unknown 630 payload. 632 Opt Len (6 bits): The length of the options fields, expressed in 633 four byte multiples, not including the eight byte fixed tunnel 634 header. This results in a minimum total Geneve header size of 8 635 bytes and a maximum of 260 bytes. The start of the payload 636 headers can be found using this offset from the end of the base 637 Geneve header. 639 Transit devices MUST maintain consistent forwarding behavior 640 irrespective of the value of 'Opt Len', including ECMP link 641 selection. 643 O (1 bit): Control packet. This packet contains a control message. 644 Control messages are sent between tunnel endpoints. Tunnel 645 endpoints MUST NOT forward the payload and transit devices MUST 646 NOT attempt to interpret it. 
Since control messages are less 647 frequent, it is RECOMMENDED that tunnel endpoints direct these 648 packets to a high priority control queue (for example, to direct 649 the packet to a general purpose CPU from a forwarding ASIC or to 650 separate out control traffic on a NIC). Transit devices MUST NOT 651 alter forwarding behavior on the basis of this bit, such as ECMP 652 link selection. 654 C (1 bit): Critical options present. One or more options has the 655 critical bit set (see Section 3.5). If this bit is set then 656 tunnel endpoints MUST parse the options list to interpret any 657 critical options. On tunnel endpoints where option parsing is not 658 supported the packet MUST be dropped on the basis of the 'C' bit 659 in the base header. If the bit is not set tunnel endpoints MAY 660 strip all options using 'Opt Len' and forward the decapsulated 661 packet. Transit devices MUST NOT drop packets on the basis of 662 this bit. 664 Rsvd. (6 bits): Reserved field, which MUST be zero on transmission 665 and MUST be ignored on receipt. 667 Protocol Type (16 bits): The type of the protocol data unit 668 appearing after the Geneve header. This follows the EtherType 669 [ETYPES] convention; with Ethernet itself being represented by the 670 value 0x6558. 672 Virtual Network Identifier (VNI) (24 bits): An identifier for a 673 unique element of a virtual network. In many situations this may 674 represent an L2 segment, however, the control plane defines the 675 forwarding semantics of decapsulated packets. The VNI MAY be used 676 as part of ECMP forwarding decisions or MAY be used as a mechanism 677 to distinguish between overlapping address spaces contained in the 678 encapsulated packet when load balancing across CPUs. 680 Reserved (8 bits): Reserved field which MUST be zero on transmission 681 and ignored on receipt. 683 3.5. 
Tunnel Options 685 0 1 2 3 686 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 687 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 688 | Option Class | Type |R|R|R| Length | 689 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 690 | Variable Option Data | 691 ~ ~ 692 | | 693 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 695 Geneve Option 697 The base Geneve header is followed by zero or more options in Type- 698 Length-Value format. Each option consists of a four byte option 699 header and a variable amount of option data interpreted according to 700 the type. 702 Option Class (16 bits): Namespace for the 'Type' field. IANA will 703 be requested to create a "Geneve Option Class" registry to 704 allocate identifiers for organizations, technologies, and vendors 705 that have an interest in creating types for options. Each 706 organization may allocate types independently to allow 707 experimentation and rapid innovation. It is expected that over 708 time certain options will become well known and a given 709 implementation may use option types from a variety of sources. In 710 addition, IANA will be requested to reserve specific ranges for 711 allocation by IETF Review and for Experimental Use (see 712 Section 7). 714 Type (8 bits): Type indicating the format of the data contained in 715 this option. Options are primarily designed to encourage future 716 extensibility and innovation and so standardized forms of these 717 options will be defined in separate documents. 719 The high order bit of the option type indicates that this is a 720 critical option. If the receiving tunnel endpoint does not 721 recognize this option and this bit is set then the packet MUST be 722 dropped. If this bit is set in any option then the 'C' bit in the 723 Geneve base header MUST also be set. Transit devices MUST NOT 724 drop packets on the basis of this bit. 
      The following figure shows the location of the 'C' bit in the
      'Type' field:

       0 1 2 3 4 5 6 7
      +-+-+-+-+-+-+-+-+
      |C|    Type     |
      +-+-+-+-+-+-+-+-+

      The requirement to drop a packet with an unknown option with the
      'C' bit set applies to the entire tunnel endpoint system and not a
      particular component of the implementation. For example, in a
      system composed of a forwarding ASIC and a general purpose CPU,
      this does not mean that the packet must be dropped in the ASIC.
      An implementation may send the packet to the CPU using a
      rate-limited control channel for slow-path exception handling.

   R (3 bits): Option control flags reserved for future use. These
      bits MUST be zero on transmission and MUST be ignored on receipt.

   Length (5 bits): Length of the option, expressed in four byte
      multiples excluding the option header. The total length of each
      option may be between 4 and 128 bytes. A value of 0 in the Length
      field implies an option with only an option header and no variable
      option data. Packets in which the total length of all options is
      not equal to the 'Opt Len' in the base header are invalid and MUST
      be silently dropped if received by a tunnel endpoint that
      processes the options.

   Variable Option Data: Option data interpreted according to 'Type'.

3.5.1. Options Processing

   Geneve options are intended to be originated and processed by tunnel
   endpoints. However, options MAY be interpreted by transit devices
   along the tunnel path. Transit devices not interpreting Geneve
   headers (which may or may not include options) MUST handle Geneve
   packets as any other UDP packet and maintain consistent forwarding
   behavior.

   In tunnel endpoints, the generation and interpretation of options is
   determined by the control plane, which is beyond the scope of
   this document.
However, to ensure interoperability between 766 heterogeneous devices some requirements are imposed on options and 767 the devices that process them: 769 o Receiving tunnel endpoints MUST drop packets containing unknown 770 options with the 'C' bit set in the option type. Conversely, 771 transit devices MUST NOT drop packets as a result of encountering 772 unknown options, including those with the 'C' bit set. 774 o The contents of the options and their ordering MUST NOT be 775 modified by transit devices. 777 o If a tunnel endpoint receives a Geneve packet with 'Opt Len' 778 (total length of all options) that exceeds the options processing 779 capability of the tunnel endpoint then the tunnel endpoint MUST 780 drop such packets. An implementation may raise an exception to 781 the control plane of such an event. It is the responsibility of 782 the control plane to ensure the communicating peer tunnel 783 endpoints have the processing capability to handle the total 784 length of options. The definition of the control plane is beyond 785 the scope of this document. 787 When designing a Geneve option, it is important to consider how the 788 option will evolve in the future. Once an option is defined it is 789 reasonable to expect that implementations may come to depend on a 790 specific behavior. As a result, the scope of any future changes must 791 be carefully described upfront. 793 Architecturally, options are intended to be self-descriptive and 794 independent. This enables parallelism in option processing and 795 reduces implementation complexity. However, the control plane may 796 impose certain ordering restrictions as described in Section 4.5.1. 798 Unexpectedly significant interoperability issues may result from 799 changing the length of an option that was defined to be a certain 800 size. A particular option is specified to have either a fixed 801 length, which is constant, or a variable length, which may change 802 over time or for different use cases. 
   This property is part of the definition of the option and conveyed
   by the 'Type'. For fixed length options, some implementations may
   choose to ignore the length field in the option header and instead
   parse based on the well known length associated with the type. In
   this case, redefining the length will impact not only parsing of the
   option in question but also any options that follow. Therefore,
   options that are defined to be fixed length in size MUST NOT be
   redefined to a different length. Instead, a new 'Type' should be
   allocated. The actual definition of the option type is beyond the
   scope of this document. The option type and its interpretation
   should be defined by the entity that owns the option class.

   Options may be processed by NIC hardware utilizing offloads (e.g.,
   LSO and LRO) as described in Section 4.6. Careful consideration
   should be given to how the offload capabilities outlined in
   Section 4.6 impact an option's design.

4. Implementation and Deployment Considerations

4.1. Applicability Statement

   Geneve is a network virtualization overlay encapsulation protocol
   designed to establish tunnels between NVEs over an existing IP
   network. It is intended for use in public or private data center
   environments, for deploying multi-tenant overlay networks over an
   existing IP underlay network.

   Geneve is a UDP-based encapsulation protocol transported over
   existing IPv4 and IPv6 networks. Hence, as a UDP-based protocol,
   Geneve adheres to the UDP usage guidelines as specified in [RFC8085].
   The applicability of these guidelines is dependent on the underlay
   IP network and the nature of the Geneve payload protocol (for
   example, TCP/IP or IP/Ethernet).
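   As an illustration of the header and option formats in Sections 3.4
   and 3.5, and of why a redefined option length affects every option
   that follows it, the sketch below walks the option list of a
   received packet. It is a minimal sketch and not a reference
   implementation: the function name parse_geneve, the known_types
   parameter and the returned dictionary are assumptions made for this
   example, and the bit positions assume the fields are packed in the
   order in which Section 3.4 lists them.

```python
import struct

BASE_HEADER_LEN = 8    # fixed Geneve header, in bytes
OPTION_HEADER_LEN = 4  # per-option header, in bytes


def parse_geneve(buf, known_types=frozenset()):
    """Parse a Geneve header as a receiving tunnel endpoint might.

    known_types is a hypothetical set of (option_class, type) pairs that
    this endpoint understands. ValueError marks a packet to be dropped.
    """
    if len(buf) < BASE_HEADER_LEN:
        raise ValueError("truncated base header")
    first, flags, protocol = struct.unpack_from("!BBH", buf, 0)
    version = first >> 6
    opt_len = (first & 0x3F) * 4           # Opt Len counts 4-byte units
    if version != 0:
        raise ValueError("unknown version")
    vni = int.from_bytes(buf[4:7], "big")  # byte 7 is reserved
    end = BASE_HEADER_LEN + opt_len
    if len(buf) < end:
        raise ValueError("options truncated")

    options = []
    offset = BASE_HEADER_LEN
    while offset < end:
        opt_class, opt_type, len_byte = struct.unpack_from("!HBB", buf, offset)
        data_len = (len_byte & 0x1F) * 4   # low 5 bits; top 3 are reserved
        nxt = offset + OPTION_HEADER_LEN + data_len
        if nxt > end:
            raise ValueError("option lengths do not match Opt Len")
        # High-order bit of Type marks a critical option.
        if opt_type & 0x80 and (opt_class, opt_type) not in known_types:
            raise ValueError("unknown critical option")
        options.append((opt_class, opt_type, bytes(buf[offset + 4:nxt])))
        offset = nxt

    return {
        "control": bool(flags & 0x80),     # O bit
        "critical": bool(flags & 0x40),    # C bit
        "protocol": protocol,
        "vni": vni,
        "options": options,
        "payload_offset": end,
    }
```

   Because each option's start is found only by adding the previous
   option's length, a parser that substituted a remembered fixed length
   for the Length field would misplace every later option if that
   length were ever redefined.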
837 Geneve is intended to be deployed in a data center network 838 environment operated by a single operator or adjacent set of 839 cooperating network operators that fits with the definition of 840 controlled environments in [RFC8085]. A network in a controlled 841 environment can be managed to operate under certain conditions 842 whereas in the general Internet this cannot be done. Hence 843 requirements for a tunnel protocol operating under a controlled 844 environment can be less restrictive than the requirements of the 845 general Internet. 847 For the purpose of this document, a traffic-managed controlled 848 environment (TMCE) is defined as an IP network that is traffic- 849 engineered and/or otherwise managed (e.g., via use of traffic rate 850 limiters) to avoid congestion. The concept of TMCE is outlined in 851 [RFC8086]. Significant portions of the text in Section 4.1 through 852 Section 4.3 are based on [RFC8086] as applicable to Geneve. 854 It is the responsibility of the operator to ensure that the 855 guidelines/requirements in this section are followed as applicable to 856 their Geneve deployment(s). 858 4.2. Congestion Control Functionality 860 Geneve does not natively provide congestion control functionality and 861 relies on the payload protocol traffic for congestion control. As 862 such Geneve MUST be used with congestion controlled traffic or within 863 a network that is traffic managed to avoid congestion (TMCE). An 864 operator of a traffic managed network (TMCE) may avoid congestion by 865 careful provisioning of their networks, rate-limiting of user data 866 traffic and traffic engineering according to path capacity. 868 4.3. UDP Checksum 870 In order to provide integrity of Geneve headers, options and payload, 871 (for example to avoid misdelivery of payload to different tenant 872 systems) in case of data corruption, the outer UDP checksum SHOULD be 873 used with Geneve when transported over IPv4. 
   The UDP checksum provides a statistical guarantee that a payload was
   not corrupted in transit. These integrity checks are not strong from
   a coding or cryptographic perspective and are not designed to detect
   physical-layer errors or malicious modification of the datagram (see
   Section 3.4 of [RFC8085]). In deployments where such a risk exists,
   an operator SHOULD use additional data integrity mechanisms such as
   offered by IPsec (see Section 6.2).

   An operator MAY choose to disable UDP checksums and use zero
   checksums if Geneve packet integrity is provided by other data
   integrity mechanisms such as IPsec or additional checksums, or if
   one of the conditions a, b, or c in Section 4.3.1 is met.

   By default, UDP checksums MUST be used when Geneve is transported
   over IPv6. A tunnel endpoint MAY be configured for use with zero UDP
   checksum if additional requirements in Section 4.3.1 are met.

4.3.1. UDP Zero Checksum Handling with IPv6

   When Geneve is used over IPv6, the UDP checksum is used to protect
   IPv6 headers, UDP headers and Geneve headers, options and payload
   from potential data corruption. As such, by default Geneve MUST use
   UDP checksums when transported over IPv6. An operator MAY choose to
   configure a tunnel endpoint to operate with zero UDP checksum if
   operating in a traffic-managed controlled environment, as stated in
   Section 4.1, and one of the following conditions is met:

   a. It is known that packet corruption is exceptionally unlikely
       (perhaps based on knowledge of equipment types in their underlay
       network) and the operator is willing to take the risk of
       undetected packet corruption.

   b. It is judged through observational measurements (perhaps through
       historic or current traffic flows that use non-zero checksum)
       that the level of packet corruption is tolerably low and the
       operator is willing to take the risk of undetected corruption.

   c.
       Geneve payload is carrying applications that are tolerant of
       misdelivered or corrupted packets (perhaps through higher layer
       checksum validation and/or reliability through retransmission).

   In addition, Geneve tunnel implementations using zero UDP checksum
   MUST meet the following requirements:

   1. Use of UDP checksum over IPv6 MUST be the default configuration
       for all Geneve tunnels.

   2. If Geneve is used with zero UDP checksum over IPv6 then such
       tunnel endpoint implementation MUST meet all the requirements
       specified in Section 4 of [RFC6936] and requirement 1 as
       specified in Section 5 of [RFC6936] as that is relevant to
       Geneve.

   3. The Geneve tunnel endpoint that decapsulates the tunnel SHOULD
       check that the source and destination IPv6 addresses are valid
       for the Geneve tunnel that is configured to receive zero UDP
       checksum and discard packets for which this check fails.

   4. The Geneve tunnel endpoint that encapsulates the tunnel MAY use
       different IPv6 source addresses for each Geneve tunnel that uses
       zero UDP checksum mode in order to strengthen the decapsulator's
       check of the IPv6 source address (i.e., the same IPv6 source
       address is not to be used with more than one IPv6 destination
       address, irrespective of whether that destination address is a
       unicast or multicast address). When this is not possible, it is
       RECOMMENDED to use each source address for as few Geneve tunnels
       that use zero UDP checksum as is feasible.

       Note that (for requirements 3 and 4) the receiving tunnel
       endpoint can apply these checks only if it has out-of-band
       knowledge that the encapsulating tunnel endpoint is applying the
       indicated behavior. One possibility to obtain this out-of-band
       knowledge is through signaling by the control plane. The
       definition of the control plane is beyond the scope of this
       document.

   5.
Measures SHOULD be taken to prevent Geneve traffic over IPv6 with 952 zero UDP checksum from escaping into the general Internet. 953 Examples of such measures include employing packet filters at the 954 gateways or edge of Geneve network and/or keeping logical or 955 physical separation of the Geneve network from networks carrying 956 the general Internet traffic. 958 The above requirements do not change either the requirements 959 specified in [RFC8200] or the requirements specified in [RFC6936]. 961 The use of the source IPv6 address in addition to the destination 962 IPv6 address, plus the recommendation against reuse of source IPv6 963 addresses among Geneve tunnels collectively provide some mitigation 964 for the absence of UDP checksum coverage of the IPv6 header. A 965 traffic-managed controlled environment that satisfies at least one of 966 three conditions listed at the beginning of this section provides 967 additional assurance. 969 Editorial Note (The following paragraph to be removed by the RFC 970 Editor before publication) 972 It was discussed during TSVART early review if the level of 973 requirement for using different IPv6 source addresses for different 974 tunnel destinations would need to be "MAY" or "SHOULD". The 975 discussion concluded that it was appropriate to keep this as "MAY", 976 since it was considered not realistic for control planes having to 977 maintain a high level of state on a per tunnel destination basis. In 978 addition, the text above provides sufficient guidance to operators 979 and implementors on possible mitigations. 981 4.4. Encapsulation of Geneve in IP 983 As an IP-based tunnel protocol, Geneve shares many properties and 984 techniques with existing protocols. The application of some of these 985 are described in further detail, although in general most concepts 986 applicable to the IP layer or to IP tunnels generally also function 987 in the context of Geneve. 989 4.4.1. 
IP Fragmentation

   It is strongly RECOMMENDED that Path MTU Discovery ([RFC1191],
   [RFC8201]) be used to prevent or minimize fragmentation. The use of
   Path MTU Discovery on the transit network provides the encapsulating
   tunnel endpoint with soft-state about the link that it may use to
   prevent or minimize fragmentation depending on its role in the
   virtualized network. The NVE can maintain this state (the MTU size
   of the tunnel link(s) associated with the tunnel endpoint), so if a
   tenant system sends large packets that when encapsulated exceed the
   MTU size of the tunnel link, the tunnel endpoint can discard such
   packets and send exception messages to the tenant system(s). If the
   tunnel endpoint is associated with a routing or forwarding function
   and/or has the capability to send ICMP messages, the encapsulating
   tunnel endpoint MAY send ICMP fragmentation needed [RFC0792] or
   Packet Too Big [RFC4443] messages to the tenant system(s). When
   determining the MTU size of a tunnel link, the maximum length of
   options MUST be assumed, as options may vary on a per-packet basis.
   For example, recommendations/guidance for handling fragmentation in
   similar overlay encapsulation services like PWE3 are provided in
   Section 5.3 of [RFC3985].

   Note that some implementations may not be capable of supporting
   fragmentation or other less common features of the IP header, such as
   options and extension headers. For example, some of the issues
   associated with MTU size and fragmentation in IP tunneling and use of
   ICMP messages are outlined in Section 4.2 of
   [I-D.ietf-intarea-tunnels].

4.4.2. DSCP, ECN and TTL

   When encapsulating IP (including over Ethernet) packets in Geneve,
   there are several considerations for propagating DSCP and ECN bits
   from the inner header to the tunnel on transmission and the reverse
   on reception.
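   As one illustration of ECN propagation in both directions, the
   sketch below follows the normal-mode encapsulation rule and the
   decapsulation table of RFC 6040 as this example understands them;
   that RFC remains the authoritative source. The function names and
   the use of None to signal a drop are assumptions made for the
   example, not part of any specification.

```python
# ECN field codepoints (two low-order bits of the traffic class octet).
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def ecn_on_encap(inner_ecn):
    """Outer ECN field on transmission: copy the inner field."""
    return inner_ecn


def ecn_on_decap(inner_ecn, outer_ecn):
    """Inner ECN field to forward on reception, or None to drop."""
    if outer_ecn == CE:
        # Congestion was marked in the underlay. A Not-ECT inner packet
        # cannot carry the mark, so the packet is dropped instead.
        return None if inner_ecn == NOT_ECT else CE
    if inner_ecn in (NOT_ECT, CE):
        # Never make a non-ECN packet look ECN-capable; never clear CE.
        return inner_ecn
    if outer_ecn == ECT_1:
        # An ECT(1) outer marking takes precedence over ECT(0) inside.
        return ECT_1
    return inner_ecn
```

   Returning None rather than raising keeps the drop decision with the
   caller, which may also want to count or log such packets.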
1025 [RFC2983] provides guidance for mapping DSCP between inner and outer 1026 IP headers. Network virtualization is typically more closely aligned 1027 with the Pipe model described, where the DSCP value on the tunnel 1028 header is set based on a policy (which may be a fixed value, one 1029 based on the inner traffic class, or some other mechanism for 1030 grouping traffic). Aspects of the Uniform model (which treats the 1031 inner and outer DSCP value as a single field by copying on ingress 1032 and egress) may also apply, such as the ability to remark the inner 1033 header on tunnel egress based on transit marking. However, the 1034 Uniform model is not conceptually consistent with network 1035 virtualization, which seeks to provide strong isolation between 1036 encapsulated traffic and the physical network. 1038 [RFC6040] describes the mechanism for exposing ECN capabilities on IP 1039 tunnels and propagating congestion markers to the inner packets. 1040 This behavior MUST be followed for IP packets encapsulated in Geneve. 1042 Though Uniform or Pipe models could be used for TTL (or Hop Limit in 1043 case of IPv6) handling when tunneling IP packets, the Pipe model is 1044 more aligned with network virtualization. [RFC2003] provides 1045 guidance on handling TTL between inner IP header and outer IP 1046 tunnels; this model is more aligned with the Pipe model and is 1047 RECOMMENDED for use with Geneve for network virtualization 1048 applications. 1050 4.4.3. Broadcast and Multicast 1052 Geneve tunnels may either be point-to-point unicast between two 1053 tunnel endpoints or may utilize broadcast or multicast addressing. 1054 It is not required that inner and outer addressing match in this 1055 respect. For example, in physical networks that do not support 1056 multicast, encapsulated multicast traffic may be replicated into 1057 multiple unicast tunnels or forwarded by policy to a unicast location 1058 (possibly to be replicated there). 
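   The replication of encapsulated multicast traffic into multiple
   unicast tunnels can be sketched as follows. This is a minimal
   sketch under stated assumptions: the TunnelCopy record, the
   flood_lists mapping from VNI to remote tunnel endpoint addresses,
   and the function names are all hypothetical, and in practice the
   control plane would supply the membership information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TunnelCopy:
    """One unicast copy to be encapsulated and sent (hypothetical type)."""
    remote_endpoint: str
    vni: int
    inner_frame: bytes


def is_group_addressed(inner_frame):
    """True if the inner destination MAC is broadcast or multicast."""
    return bool(inner_frame[0] & 0x01)


def replicate_to_unicast(inner_frame, vni, flood_lists):
    """Return one unicast copy per remote endpoint known for this VNI."""
    if not is_group_addressed(inner_frame):
        raise ValueError("inner frame is unicast; no replication needed")
    return [
        TunnelCopy(endpoint, vni, inner_frame)
        for endpoint in flood_lists.get(vni, ())
    ]
```

   The inner frame stays multicast in every copy; only the outer
   addressing differs, which is why inner and outer addressing need not
   match.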
1060 With physical networks that do support multicast it may be desirable 1061 to use this capability to take advantage of hardware replication for 1062 encapsulated packets. In this case, multicast addresses may be 1063 allocated in the physical network corresponding to tenants, 1064 encapsulated multicast groups, or some other factor. The allocation 1065 of these groups is a component of the control plane and therefore is 1066 beyond the scope of this document. 1068 When physical multicast is in use, devices with heterogeneous 1069 capabilities may be present in the same group. Some options may only 1070 be interpretable by a subset of the devices in the group. Other 1071 devices can safely ignore such options unless the 'C' bit is set to 1072 mark the unknown option as critical. Requirements outlined in 1073 Section 3.4 apply for critical options. 1075 In addition, [RFC8293] provides examples of various mechanisms that 1076 can be used for multicast handling in network virtualization overlay 1077 networks. 1079 4.4.4. Unidirectional Tunnels 1081 Generally speaking, a Geneve tunnel is a unidirectional concept. IP 1082 is not a connection oriented protocol and it is possible for two 1083 tunnel endpoints to communicate with each other using different paths 1084 or to have one side not transmit anything at all. As Geneve is an 1085 IP-based protocol, the tunnel layer inherits these same 1086 characteristics. 1088 It is possible for a tunnel to encapsulate a protocol, such as TCP, 1089 which is connection oriented and maintains session state at that 1090 layer. In addition, implementations MAY model Geneve tunnels as 1091 connected, bidirectional links, such as to provide the abstraction of 1092 a virtual port. In both of these cases, bidirectionality of the 1093 tunnel is handled at a higher layer and does not affect the operation 1094 of Geneve itself. 1096 4.5. 
Constraints on Protocol Features

   Geneve is intended to be flexible enough to serve a wide range of
   current and future applications. As a result, certain constraints
   may be placed on the use of metadata or other aspects of the
   protocol in order to optimize for a particular use case. For
   example, some applications may limit the types of options which are
   supported or enforce a maximum number or length of options. Other
   applications may only handle certain encapsulated payload types,
   such as Ethernet or IP. This could be either globally throughout the
   system or, for example, restricted to certain classes of devices or
   network paths.

   These constraints may be communicated to tunnel endpoints either
   explicitly through a control plane or implicitly by the nature of the
   application. As Geneve is defined as a data plane protocol that is
   control plane agnostic, the definition of such mechanisms is beyond
   the scope of this document.

4.5.1. Constraints on Options

   While Geneve options are flexible, a control plane may restrict the
   number of option TLVs as well as the order and size of the TLVs
   between tunnel endpoints to make it simpler for a data plane
   implementation in software or hardware to handle
   [I-D.ietf-nvo3-encap]. For example, there may be some critical
   information such as a secure hash that must be processed in a certain
   order to provide lowest latency or there may be other scenarios where
   the options must be processed in a certain order due to protocol
   semantics.

   A control plane may negotiate a subset of option TLVs and a certain
   TLV ordering, and may also limit the total number of option TLVs
   present in the packet, for example, to accommodate hardware capable
   of processing fewer options [I-D.ietf-nvo3-encap]. Hence, a control
   plane needs to have the ability to describe the supported TLV subset
   and its order to the tunnel endpoints.
In the absence of a control 1132 plane, alternative configuration mechanisms may be used for this 1133 purpose. Such mechanisms are beyond the scope of this document. 1135 4.6. NIC Offloads 1137 Modern NICs currently provide a variety of offloads to enable the 1138 efficient processing of packets. The implementation of many of these 1139 offloads requires only that the encapsulated packet be easily parsed 1140 (for example, checksum offload). However, optimizations such as LSO 1141 and LRO involve some processing of the options themselves since they 1142 must be replicated/merged across multiple packets. In these 1143 situations, it is desirable to not require changes to the offload 1144 logic to handle the introduction of new options. To enable this, 1145 some constraints are placed on the definitions of options to allow 1146 for simple processing rules: 1148 o When performing LSO, a NIC MUST replicate the entire Geneve header 1149 and all options, including those unknown to the device, onto each 1150 resulting segment unless an option allows an exception. 1151 Conversely, when performing LRO, a NIC may assume that a binary 1152 comparison of the options (including unknown options) is 1153 sufficient to ensure equality and MAY merge packets with equal 1154 Geneve headers. 1156 o Options MUST NOT be reordered during the course of offload 1157 processing, including when merging packets for the purpose of LRO. 1159 o NICs performing offloads MUST NOT drop packets with unknown 1160 options, including those marked as critical, unless explicitly 1161 configured. 1163 There is no requirement that a given implementation of Geneve employ 1164 the offloads listed as examples above. However, as these offloads 1165 are currently widely deployed in commercially available NICs, the 1166 rules described here are intended to enable efficient handling of 1167 current and future options across a variety of devices. 1169 4.7. 
Inner VLAN Handling 1171 Geneve is capable of encapsulating a wide range of protocols and 1172 therefore a given implementation is likely to support only a small 1173 subset of the possibilities. However, as Ethernet is expected to be 1174 widely deployed, it is useful to describe the behavior of VLANs 1175 inside encapsulated Ethernet frames. 1177 As with any protocol, support for inner VLAN headers is OPTIONAL. In 1178 many cases, the use of encapsulated VLANs may be disallowed due to 1179 security or implementation considerations. However, in other cases 1180 trunking of VLAN frames across a Geneve tunnel can prove useful. As 1181 a result, the processing of inner VLAN tags upon ingress or egress 1182 from a tunnel endpoint is based upon the configuration of the tunnel 1183 endpoint and/or control plane and not explicitly defined as part of 1184 the data format. 1186 5. Transition Considerations 1188 Viewed exclusively from the data plane, Geneve is compatible with 1189 existing IP networks as it appears to most devices as UDP packets. 1190 However, as there are already a number of tunnel protocols deployed 1191 in network virtualization environments, there is a practical question 1192 of transition and coexistence. 1194 Since Geneve builds on the base data plane functionality provided by 1195 the most common protocols used for network virtualization (VXLAN, 1196 NVGRE) it should be straightforward to port an existing control plane 1197 to run on top of it with minimal effort. With both the old and new 1198 packet formats supporting the same set of capabilities, there is no 1199 need for a hard transition - tunnel endpoints directly communicating 1200 with each other can use any common protocol, which may be different 1201 even within a single overall system. As transit devices are 1202 primarily forwarding packets on the basis of the IP header, all 1203 protocols appear similar and these devices do not introduce 1204 additional interoperability concerns. 
1206 To assist with this transition, it is strongly suggested that 1207 implementations support simultaneous operation of both Geneve and 1208 existing tunnel protocols as it is expected to be common for a single 1209 node to communicate with a mixture of other nodes. Eventually, older 1210 protocols may be phased out as they are no longer in use. 1212 6. Security Considerations 1214 As encapsulated within a UDP/IP packet, Geneve does not have any 1215 inherent security mechanisms. As a result, an attacker with access 1216 to the underlay network transporting the IP packets has the ability 1217 to snoop, alter or inject packets. Compromised tunnel endpoints or 1218 transit devices may also spoof identifiers in the tunnel header to 1219 gain access to networks owned by other tenants. 1221 Within a particular security domain, such as a data center operated 1222 by a single service provider, the most common and highest performing 1223 security mechanism is isolation of trusted components. Tunnel 1224 traffic can be carried over a separate VLAN and filtered at any 1225 untrusted boundaries. 1227 When crossing an untrusted link, such as the general Internet, VPN 1228 technologies such as IPsec [RFC4301] should be used to provide 1229 authentication and/or encryption of the IP packets formed as part of 1230 Geneve encapsulation (See Section 6.1.1). 1232 Geneve does not otherwise affect the security of the encapsulated 1233 packets. As per the guidelines of BCP 72 [RFC3552], the following 1234 sections describe potential security risks that may be applicable to 1235 Geneve deployments and approaches to mitigate such risks. It is also 1236 noted that not all such risks are applicable to all Geneve deployment 1237 scenarios, i.e., only a subset may be applicable to certain 1238 deployments. 
   So an operator has to make an assessment based on their
   network environment and determine the risks that are applicable to
   their specific environment and use appropriate mitigation approaches
   as applicable.

6.1. Data Confidentiality

   Geneve is a network virtualization overlay encapsulation protocol
   designed to establish tunnels between NVEs over an existing IP
   network. It can be used to deploy multi-tenant overlay networks over
   an existing IP underlay network in a public or private data center.
   The overlay service is typically provided by a service provider, for
   example a cloud services provider or a private data center operator.
   This may or may not be the same provider as the underlay service
   provider. Due to the nature of multi-tenancy in such environments, a
   tenant system may expect data confidentiality to ensure its packet
   data is not tampered with (active attack) in transit or is not a
   target of unauthorized monitoring (passive attack), for example by
   other tenant systems or the underlay service provider. A compromised
   network node or a transit device within a data center may passively
   monitor Geneve packet data between NVEs or route traffic for further
   inspection. A tenant may expect the overlay service provider to
   provide data confidentiality as part of the service or a tenant may
   bring its own data confidentiality mechanisms like IPsec or TLS to
   protect the data end to end between its tenant systems. The overlay
   provider is expected to provide cryptographic protection in cases
   where the underlay provider is not the same as the overlay provider
   to ensure the payload is not exposed to the underlay.

   If an operator determines data confidentiality is necessary in their
   environment based on their risk analysis, for example as in
   multi-tenant environments, then an encryption mechanism SHOULD be
   used to encrypt the tenant data end to end between the NVEs. The
   NVEs may use existing well-established encryption mechanisms such as
   IPsec, DTLS, etc.

6.1.1. Inter-Data Center Traffic

   A tenant system in a customer premises (private data center) may want
   to connect to tenant systems on its tenant overlay network in a
   public cloud data center or a tenant may want to have its tenant
   systems located in multiple geographically separated data centers for
   high availability. Geneve data traffic between tenant systems across
   such separated networks should be protected from threats when
   traversing public networks. Any Geneve overlay data leaving the data
   center network beyond the operator's security domain SHOULD be
   secured by encryption mechanisms such as IPsec or other VPN
   technologies to protect the communications between the NVEs when they
   are geographically separated over untrusted network links.
   Specification of data protection mechanisms employed between data
   centers is beyond the scope of this document.

   The principles described in Section 4 regarding controlled
   environments still apply to the geographically separated data center
   usage outlined in this section.

6.2. Data Integrity

   Geneve encapsulation is used between NVEs to establish overlay
   tunnels over an existing IP underlay network. In a multi-tenant data
   center, a rogue or compromised tenant system may try to launch a
   passive attack such as monitoring the traffic of other tenants, or an
   active attack such as trying to inject unauthorized Geneve
   encapsulated traffic such as spoofing, replay, etc., into the
   network.
   To prevent such attacks, an NVE MUST NOT propagate Geneve
   packets beyond the NVE to tenant systems and SHOULD employ packet
   filtering mechanisms so as not to forward unauthorized traffic
   between tenant systems in different tenant networks. An NVE MUST NOT
   interpret Geneve packets from tenant systems other than as frames to
   be encapsulated.

   A compromised network node or a transit device within a data center
   may launch an active attack trying to tamper with the Geneve packet
   data between NVEs. Malicious tampering of Geneve header fields may
   cause the packet from one tenant to be forwarded to a different
   tenant network. If an operator determines the possibility of such
   threat in their environment, the operator may choose to employ data
   integrity mechanisms between NVEs. In order to prevent such risks, a
   data integrity mechanism SHOULD be used in such environments to
   protect the integrity of Geneve packets including packet headers,
   options and payload on communications between NVE pairs. A
   cryptographic data protection mechanism such as IPsec may be used to
   provide data integrity protection. A data center operator may choose
   to deploy any other data integrity mechanisms as applicable and
   supported in their underlay networks, although non-cryptographic
   mechanisms may not protect the Geneve portion of the packet from
   tampering.

6.3. Authentication of NVE peers

   A rogue network device or a compromised NVE in a data center
   environment might be able to spoof Geneve packets as if they came
   from a legitimate NVE. In order to mitigate such a risk, an operator
   SHOULD use an authentication mechanism, such as IPsec, to ensure that
   the Geneve packet originated from the intended NVE peer, in
   environments where the operator determines spoofing or rogue devices
   are potential threats.
   Other simpler source checks such as ingress
   filtering for VLAN/MAC/IP address, reverse path forwarding checks,
   etc., may be used in certain trusted environments to ensure Geneve
   packets originated from the intended NVE peer.

6.4. Options Interpretation by Transit Devices

   Options, if present in the packet, are generated and terminated by
   tunnel endpoints. As indicated in Section 2.2.1, transit devices may
   interpret the options. However, if the packet is protected by tunnel
   endpoint to tunnel endpoint encryption, for example through IPsec,
   transit devices will not have visibility into the Geneve header or
   options in the packet. In such cases transit devices MUST handle
   Geneve packets as any other IP packet and maintain consistent
   forwarding behavior. In cases where options are interpreted by
   transit devices, the operator MUST ensure that transit devices are
   trusted and not compromised. The definition of a mechanism to ensure
   this trust is beyond the scope of this document.

6.5. Multicast/Broadcast

   In typical data center networks where IP multicasting is not
   supported in the underlay network, multicasting may be supported
   using multiple unicast tunnels. The same security requirements as
   described in the above sections can be used to protect Geneve
   communications between NVE peers. If IP multicasting is supported in
   the underlay network and the operator chooses to use it for multicast
   traffic among tunnel endpoints, then the operator in such
   environments may use data protection mechanisms such as IPsec with
   multicast extensions [RFC5374] to protect multicast traffic among
   Geneve NVE groups.

6.6. Control Plane Communications

   A Network Virtualization Authority (NVA) as outlined in [RFC8014] may
   be used as a control plane for configuring and managing the Geneve
   NVEs.
   The data center operator is expected to use security
   mechanisms to protect the communications between the NVA and NVEs and
   use authentication mechanisms to detect any rogue or compromised NVEs
   within their administrative domain. Data protection mechanisms for
   control plane communication or authentication mechanisms between the
   NVA and the NVEs are beyond the scope of this document.

7. IANA Considerations

   IANA has allocated UDP port 6081 as the well-known destination port
   for Geneve. An early registration for Geneve has been made at the
   Service Name and Transport Protocol Port Number Registry [IANA-SN] as
   noted below:

      Service Name: geneve
      Transport Protocol(s): UDP
      Assignee: Jesse Gross
      Contact: Jesse Gross
      Description: Generic Network Virtualization Encapsulation (Geneve)
      Reference: This document
      Port Number: 6081

   Upon publication of this document, this registration will have its
   reference changed to cite this document [RFC-to-be] and, in line with
   [RFC6335], the Assignee and Contact of the port entry should be
   changed to IESG and IETF Chair respectively.

   In addition, IANA is requested to create a new "Geneve Option Class"
   registry to allocate Option Classes. This registry is to be placed
   under a new Network Virtualization Overlay (NVO3) protocols page (to
   be created) in IANA protocol registries [IANA-PR]. The Geneve Option
   Class registry shall consist of 16-bit hexadecimal values along with
   descriptive strings, Assignee/Contact information and References.
   The registration rules for the new registry are (as defined by
   [RFC8126]):

   +----------------+-------------------------+
   | Range          | Registration Procedures |
   +----------------+-------------------------+
   | 0x0000..0x00FF | IETF Review             |
   | 0x0100..0xFEFF | First Come First Served |
   | 0xFF00..0xFFFF | Experimental Use        |
   +----------------+-------------------------+

   Initial registrations in the new registry are as follows:

   +----------------+------------------+------------------+------------+
   | Option Class   | Description      | Assignee/Contact | References |
   +----------------+------------------+------------------+------------+
   | 0x0100         | Linux            |                  |            |
   | 0x0101         | Open vSwitch     |                  |            |
   |                | (OVS)            |                  |            |
   | 0x0102         | Open Virtual     |                  |            |
   |                | Networking (OVN) |                  |            |
   | 0x0103         | In-band Network  |                  |            |
   |                | Telemetry (INT)  |                  |            |
   | 0x0104         | VMware, Inc.     |                  |            |
   | 0x0105         | Amazon.com, Inc. |                  |            |
   | 0x0106         | Cisco Systems,   |                  |            |
   |                | Inc.             |                  |            |
   | 0x0107         | Oracle           |                  |            |
   |                | Corporation      |                  |            |
   | 0x0108..0x0110 | Amazon.com, Inc. |                  |            |
   +----------------+------------------+------------------+------------+

8. Contributors

   The following individuals were authors of an earlier version of this
   document and made significant contributions:

   Pankaj Garg
   Microsoft Corporation
   1 Microsoft Way
   Redmond, WA 98052
   USA

   Email: pankajg@microsoft.com

   Chris Wright
   Red Hat Inc.
   1801 Varsity Drive
   Raleigh, NC 27606
   USA

   Email: chrisw@redhat.com

   Kenneth Duda
   Arista Networks
   5453 Great America Parkway
   Santa Clara, CA 95054
   USA

   Email: kduda@arista.com

   Dinesh G. Dutt
   Independent

   Email: didutt@gmail.com

   Jon Hudson
   Independent

   Email: jon.hudson@gmail.com

   Ariel Hendel
   Facebook, Inc.
   1 Hacker Way
   Menlo Park, CA 94025
   USA

   Email: ahendel@fb.com

9. Acknowledgements

   The authors wish to acknowledge Puneet Agarwal, David Black, Sami
   Boutros, Scott Bradner, Martin Casado, Alissa Cooper, Roman Danyliw,
   Bruce Davie, Anoop Ghanwani, Benjamin Kaduk, Suresh Krishnan, Mirja
   Kuhlewind, Barry Leiba, Daniel Migault, Greg Mirsky, Tal Mizrahi,
   Kathleen Moriarty, Magnus Nystrom, Adam Roach, Sabrina Tanamal, Dave
   Thaler, Eric Vyncke, Magnus Westerlund and many other members of the
   NVO3 WG for their reviews, comments and suggestions.

   The authors would like to thank Sam Aldrin, Alia Atlas, Matthew
   Bocci, Benson Schliesser, and Martin Vigoureux for their guidance
   throughout the process.

10. References

10.1. Normative References

   [RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768,
              DOI 10.17487/RFC0768, August 1980.

   [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5,
              RFC 792, DOI 10.17487/RFC0792, September 1981.

   [RFC1112]  Deering, S., "Host extensions for IP multicasting", STD 5,
              RFC 1112, DOI 10.17487/RFC1112, August 1989.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              DOI 10.17487/RFC1191, November 1990.

   [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
              DOI 10.17487/RFC2003, October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4443]  Conta, A., Deering, S., and M. Gupta, Ed., "Internet
              Control Message Protocol (ICMPv6) for the Internet
              Protocol Version 6 (IPv6) Specification", STD 89,
              RFC 4443, DOI 10.17487/RFC4443, March 2006.

   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
              Notification", RFC 6040, DOI 10.17487/RFC6040, November
              2010.

   [RFC6936]  Fairhurst, G.
              and M. Westerlund, "Applicability Statement
              for the Use of IPv6 UDP Datagrams with Zero Checksums",
              RFC 6936, DOI 10.17487/RFC6936, April 2013.

   [RFC7365]  Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
              Rekhter, "Framework for Data Center (DC) Network
              Virtualization", RFC 7365, DOI 10.17487/RFC7365, October
              2014.

   [RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
              Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
              March 2017.

   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
              Writing an IANA Considerations Section in RFCs", BCP 26,
              RFC 8126, DOI 10.17487/RFC8126, June 2017.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", STD 86, RFC 8200,
              DOI 10.17487/RFC8200, July 2017.

   [RFC8201]  McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed.,
              "Path MTU Discovery for IP version 6", STD 87, RFC 8201,
              DOI 10.17487/RFC8201, July 2017.

10.2. Informative References

   [ETYPES]   The IEEE Registration Authority, "IEEE 802 Numbers".

   [I-D.ietf-intarea-tunnels]
              Touch, J. and M. Townsley, "IP Tunnels in the Internet
              Architecture", draft-ietf-intarea-tunnels-10 (work in
              progress), September 2019.

   [I-D.ietf-nvo3-dataplane-requirements]
              Bitar, N., Lasserre, M., Balus, F., Morin, T., Jin, L.,
              and B. Khasnabish, "NVO3 Data Plane Requirements",
              draft-ietf-nvo3-dataplane-requirements-03 (work in
              progress), April 2014.

   [I-D.ietf-nvo3-encap]
              Boutros, S., "NVO3 Encapsulation Considerations",
              draft-ietf-nvo3-encap-05 (work in progress), February
              2020.

   [IANA-PR]  IANA, "Protocol Registries".

   [IANA-SN]  IANA, "Service Name and Transport Protocol Port Number
              Registry".

   [IEEE.802.1Q_2018]
              IEEE, "IEEE Standard for Local and Metropolitan Area
              Networks--Bridges and Bridged Networks", IEEE 802.1Q-2018,
              DOI 10.1109/ieeestd.2018.8403927, July 2018.

   [RFC2983]  Black, D., "Differentiated Services and Tunnels",
              RFC 2983, DOI 10.17487/RFC2983, October 2000.

   [RFC3031]  Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol
              Label Switching Architecture", RFC 3031,
              DOI 10.17487/RFC3031, January 2001.

   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
              Text on Security Considerations", BCP 72, RFC 3552,
              DOI 10.17487/RFC3552, July 2003.

   [RFC3985]  Bryant, S., Ed. and P. Pate, Ed., "Pseudo Wire Emulation
              Edge-to-Edge (PWE3) Architecture", RFC 3985,
              DOI 10.17487/RFC3985, March 2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, DOI 10.17487/RFC4301,
              December 2005.

   [RFC5374]  Weis, B., Gross, G., and D. Ignjatic, "Multicast
              Extensions to the Security Architecture for the Internet
              Protocol", RFC 5374, DOI 10.17487/RFC5374, November 2008.

   [RFC6335]  Cotton, M., Eggert, L., Touch, J., Westerlund, M., and S.
              Cheshire, "Internet Assigned Numbers Authority (IANA)
              Procedures for the Management of the Service Name and
              Transport Protocol Port Number Registry", BCP 165,
              RFC 6335, DOI 10.17487/RFC6335, August 2011.

   [RFC6438]  Carpenter, B. and S. Amante, "Using the IPv6 Flow Label
              for Equal Cost Multipath Routing and Link Aggregation in
              Tunnels", RFC 6438, DOI 10.17487/RFC6438, November 2011.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C.
              Wright, "Virtual
              eXtensible Local Area Network (VXLAN): A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014.

   [RFC7637]  Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network
              Virtualization Using Generic Routing Encapsulation",
              RFC 7637, DOI 10.17487/RFC7637, September 2015.

   [RFC8014]  Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T.
              Narten, "An Architecture for Data-Center Network
              Virtualization over Layer 3 (NVO3)", RFC 8014,
              DOI 10.17487/RFC8014, December 2016.

   [RFC8086]  Yong, L., Ed., Crabbe, E., Xu, X., and T. Herbert,
              "GRE-in-UDP Encapsulation", RFC 8086,
              DOI 10.17487/RFC8086, March 2017.

   [RFC8293]  Ghanwani, A., Dunbar, L., McBride, M., Bannai, V., and R.
              Krishnan, "A Framework for Multicast in Network
              Virtualization over Layer 3", RFC 8293,
              DOI 10.17487/RFC8293, January 2018.

   [VL2]      "VL2: A Scalable and Flexible Data Center Network", ACM
              SIGCOMM Computer Communication Review,
              DOI 10.1145/1594977.1592576, 2009.

Authors' Addresses

   Jesse Gross (editor)

   Email: jesse@kernel.org

   Ilango Ganga (editor)
   Intel Corporation
   2200 Mission College Blvd.
   Santa Clara, CA 95054
   USA

   Email: ilango.s.ganga@intel.com

   T. Sridhar (editor)
   VMware, Inc.
   3401 Hillview Ave.
   Palo Alto, CA 94304
   USA

   Email: tsridhar@vmware.com