2	Network Working Group                                      J. Gross, Ed.
3	Internet-Draft
4	Intended status: Standards Track                           I. Ganga, Ed.
5	Expires: September 27, 2019                                        Intel
6	                                                         T. Sridhar, Ed.
7	                                                                  VMware
8	                                                          March 26, 2019

10	          Geneve: Generic Network Virtualization Encapsulation
11	                       draft-ietf-nvo3-geneve-13

13	Abstract

15	   Network virtualization involves the cooperation of devices with a
16	   wide variety of capabilities such as software and hardware tunnel
17	   endpoints, transit fabrics, and centralized control clusters.  As a
18	   result of their role in tying together different elements in the
19	   system, the requirements on tunnels are influenced by all of these
20	   components.  Flexibility is therefore the most important aspect of a
21	   tunnel protocol if it is to keep pace with the evolution of the
22	   system.  This document describes Geneve, an encapsulation protocol
23	   designed to recognize and accommodate these changing capabilities and
24	   needs.

26	Status of This Memo

28	   This Internet-Draft is submitted in full conformance with the
29	   provisions of BCP 78 and BCP 79.

31	   Internet-Drafts are working documents of the Internet Engineering
32	   Task Force (IETF).  Note that other groups may also distribute
33	   working documents as Internet-Drafts.  The list of current Internet-
34	   Drafts is at https://datatracker.ietf.org/drafts/current/.

36	   Internet-Drafts are draft documents valid for a maximum of six months
37	   and may be updated, replaced, or obsoleted by other documents at any
38	   time.  It is inappropriate to use Internet-Drafts as reference
39	   material or to cite them other than as "work in progress."

41	   This Internet-Draft will expire on September 27, 2019.

43	Copyright Notice

45	   Copyright (c) 2019 IETF Trust and the persons identified as the
46	   document authors.  All rights reserved.

48	   This document is subject to BCP 78 and the IETF Trust's Legal
49	   Provisions Relating to IETF Documents
50	   (https://trustee.ietf.org/license-info) in effect on the date of
51	   publication of this document.  Please review these documents
52	   carefully, as they describe your rights and restrictions with respect
53	   to this document.  Code Components extracted from this document must
54	   include Simplified BSD License text as described in Section 4.e of
55	   the Trust Legal Provisions and are provided without warranty as
56	   described in the Simplified BSD License.

58	Table of Contents

60	   1.  Introduction
61	     1.1.  Requirements Language
62	     1.2.  Terminology
63	   2.  Design Requirements
64	     2.1.  Control Plane Independence
65	     2.2.  Data Plane Extensibility
66	       2.2.1.  Efficient Implementation
67	     2.3.  Use of Standard IP Fabrics
68	   3.  Geneve Encapsulation Details
69	     3.1.  Geneve Packet Format Over IPv4
70	     3.2.  Geneve Packet Format Over IPv6
71	     3.3.  UDP Header
72	     3.4.  Tunnel Header Fields
73	     3.5.  Tunnel Options
74	       3.5.1.  Options Processing
75	   4.  Implementation and Deployment Considerations
76	     4.1.  Applicability Statement
77	     4.2.  Congestion Control Functionality
78	     4.3.  UDP Checksum
79	       4.3.1.  UDP Zero Checksum Handling with IPv6
80	     4.4.  Encapsulation of Geneve in IP
81	       4.4.1.  IP Fragmentation
82	       4.4.2.  DSCP, ECN and TTL
83	       4.4.3.  Broadcast and Multicast
84	       4.4.4.  Unidirectional Tunnels
85	     4.5.  Constraints on Protocol Features
86	       4.5.1.  Constraints on Options
87	     4.6.  NIC Offloads
88	     4.7.  Inner VLAN Handling
89	   5.  Interoperability Issues
90	   6.  Security Considerations
91	     6.1.  Data Confidentiality
92	       6.1.1.  Inter-Data Center Traffic
93	     6.2.  Data Integrity
94	     6.3.  Authentication of NVE peers
95	     6.4.  Options Interpretation by Transit Devices
96	     6.5.  Multicast/Broadcast
97	     6.6.  Control Plane Communications
98	   7.  IANA Considerations
99	   8.  Contributors
100	   9.  Acknowledgements
101	   10. References
102	     10.1.  Normative References
103	     10.2.  Informative References
104	   Authors' Addresses

106	1.  Introduction

108	   Networking has long featured a variety of tunneling, tagging, and
109	   other encapsulation mechanisms.  However, the advent of network
110	   virtualization has caused a surge of renewed interest and a
111	   corresponding increase in the introduction of new protocols.  The
112	   large number of protocols in this space, ranging all the way from
113	   VLANs [IEEE.802.1Q_2014] and MPLS [RFC3031] through the more recent
114	   VXLAN [RFC7348] (Virtual eXtensible Local Area Network) and NVGRE
115	   [RFC7637] (Network Virtualization Using Generic Routing
116	   Encapsulation), often leads to questions about the need for new
117	   encapsulation formats and what it is about network virtualization in
118	   particular that leads to their proliferation.

120	   While many encapsulation protocols seek to simply partition the
121	   underlay network or bridge between two domains, network
122	   virtualization views the transit network as providing connectivity
123	   between multiple components of a distributed system.  In many ways
124	   this system is similar to a chassis switch with the IP underlay
125	   network playing the role of the backplane and tunnel endpoints on the
126	   edge as line cards.  When viewed in this light, the requirements
127	   placed on the tunnel protocol are significantly different in terms of
128	   the quantity of metadata necessary and the role of transit nodes.

130	   Current work such as [VL2] (A Scalable and Flexible Data Center
131	   Network) and the NVO3 Data Plane Requirements
132	   [I-D.ietf-nvo3-dataplane-requirements] have described some of the
133	   properties that the data plane must have to support network
134	   virtualization.  However, one additional defining requirement is the
135	   need to carry system state along with the packet data.  The use of
136	   some metadata is certainly not a foreign concept - nearly all
137	   protocols used for virtualization have at least 24 bits of identifier
138	   space as a way to partition between tenants.  This is often described
139	   as overcoming the limits of 12-bit VLANs, and when seen in that
140	   context, or any context where it is a true tenant identifier, 16
141	   million possible entries is a large number.  However, the reality is
142	   that the metadata is not exclusively used to identify tenants and
143	   encoding other information quickly starts to crowd the space.  In
144	   fact, when compared to the tags used to exchange metadata between
145	   line cards on a chassis switch, 24-bit identifiers start to look
146	   quite small.  There are nearly endless uses for this metadata,
147	   ranging from storing input ports for simple security policies to
148	   service-based context for interposing advanced middleboxes.

150	   Existing tunnel protocols have each attempted to solve different
151	   aspects of these new requirements, only to be quickly rendered out of
152	   date by changing control plane implementations and advancements.
153	   Furthermore, software and hardware components and controllers all
154	   have different advantages and rates of evolution - a fact that should
155	   be viewed as a benefit, not a liability or limitation.  This draft
156	   describes Geneve, a protocol which seeks to avoid these problems by
157	   providing a framework for tunneling for network virtualization rather
158	   than being prescriptive about the entire system.

160	1.1.  Requirements Language

162	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
163	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
164	   "OPTIONAL" in this document are to be interpreted as described in BCP
165	   14 [RFC2119] [RFC8174] when, and only when, they appear in all
166	   capitals, as shown here.

168	1.2.  Terminology

170	   The NVO3 framework [RFC7365] defines many of the concepts commonly
171	   used in network virtualization.  In addition, the following terms are
172	   specifically meaningful in this document:

174	   Checksum offload.  An optimization implemented by many NICs (Network
175	   Interface Controllers) that enables computation and verification of
176	   upper layer protocol checksums in hardware on transmit and receive,
177	   respectively.  This typically includes IP and TCP/UDP checksums which
178	   would otherwise be computed by the protocol stack in software.

180	   Clos network.  A technique for composing network fabrics larger than
181	   a single switch while maintaining non-blocking bandwidth across
182	   connection points.  ECMP is used to divide traffic across the
183	   multiple links and switches that constitute the fabric.  Sometimes
184	   termed "leaf and spine" or "fat tree" topologies.

186	   ECMP.  Equal Cost Multipath.  A routing mechanism for selecting from
187	   among multiple best next hop paths by hashing packet headers in order
188	   to better utilize network bandwidth while avoiding reordering of
189	   packets within a flow.

191	   Geneve.  Generic Network Virtualization Encapsulation.  The tunnel
192	   protocol described in this document.

194	   LRO.  Large Receive Offload.  The receive-side equivalent function of
195	   LSO, in which multiple protocol segments (primarily TCP) are
196	   coalesced into larger data units.

198	   NIC.  Network Interface Controller.  Also called a Network Interface
199	   Card or Network Adapter.  A NIC could be part of a tunnel endpoint or
200	   transit device and can either process Geneve packets or aid in the
201	   processing of Geneve packets.

203	   Transit device.  A forwarding element (e.g., router or switch) along
204	   the path of the tunnel making up part of the Underlay Network.  A
205	   transit device MAY be capable of understanding the Geneve packet
206	   format but does not originate or terminate Geneve packets.

208	   LSO.  Large Segmentation Offload.  A function provided by many
209	   commercial NICs that allows data units larger than the MTU to be
210	   passed to the NIC to improve performance, the NIC being responsible
211	   for creating smaller segments of size less than or equal to the MTU
212	   with correct protocol headers.  When referring specifically to TCP/
213	   IP, this feature is often known as TSO (TCP Segmentation Offload).

215	   Tunnel endpoint.  A component performing encapsulation and
216	   decapsulation of packets, such as Ethernet frames or IP datagrams, in
217	   Geneve headers.  As the ultimate consumer of any tunnel metadata,
218	   tunnel endpoints have the highest level of requirements for parsing
219	   and interpreting tunnel headers.  Tunnel endpoints may consist of
220	   either software or hardware implementations or a combination of the
221	   two.  Tunnel endpoints are frequently a component of an NVE (Network
222	   Virtualization Edge) but may also be found in middleboxes or other
223	   elements making up an NVO3 Network.

225	   VM.  Virtual Machine.

227	2.  Design Requirements

229	   Geneve is designed to support network virtualization use cases, where
230	   tunnels are typically established to act as a backplane between the
231	   virtual switches residing in hypervisors, physical switches, or
232	   middleboxes or other appliances.  An arbitrary IP network can be used
233	   as an underlay although Clos networks composed using ECMP links are a
234	   common choice to provide consistent bisectional bandwidth across all
235	   connection points.  Many of the concepts of network virtualization
236	   overlays over Layer 3 IP networks are described in the NVO3
237	   framework [RFC7365].  Figure 1 shows an example of a hypervisor, top
238	   of rack switch for connectivity to physical servers, and a WAN uplink
239	   connected using Geneve tunnels over a simplified Clos network.  These
240	   tunnels are used to encapsulate and forward frames from the attached
241	   components such as VMs or physical links.

243	     +---------------------+           +-------+  +------+
244	     | +--+  +-------+---+ |           |Transit|--|Top of|==Physical
245	     | |VM|--|       |   | | +------+ /|Router |  | Rack |==Servers
246	     | +--+  |Virtual|NIC|---|Top of|/ +-------+\/+------+
247	     | +--+  |Switch |   | | | Rack |\ +-------+/\+------+
248	     | |VM|--|       |   | | +------+ \|Transit|  |Uplink|   WAN
249	     | +--+  +-------+---+ |           |Router |--|      |=========>
250	     +---------------------+           +-------+  +------+
251	            Hypervisor

253	                 ()===================================()
254	                         Switch-Switch Geneve Tunnels

256	                    Figure 1: Sample Geneve Deployment

258	   To support the needs of network virtualization, the tunnel protocol
259	   should be able to take advantage of the differing (and evolving)
260	   capabilities of each type of device in both the underlay and overlay
261	   networks.  This results in the following requirements being placed on
262	   the data plane tunneling protocol:

264	   o  The data plane is generic and extensible enough to support current
265	      and future control planes.

267	   o  Tunnel components are efficiently implementable in both hardware
268	      and software without restricting capabilities to the lowest common
269	      denominator.

271	   o  High performance over existing IP fabrics.

273	   These requirements are described further in the following
274	   subsections.

276	2.1.  Control Plane Independence

278	   Although some protocols for network virtualization have included a
279	   control plane as part of the tunnel format specification (most
280	   notably, the VXLAN spec prescribed a multicast learning-based
281	   control plane), these specifications have largely been treated as
282	   describing only the data format.  The VXLAN packet format has
283	   actually seen a wide variety of control planes built on top of it.

285	   There is a clear advantage in settling on a data format: most of the
286	   protocols are only superficially different and there is little
287	   advantage in duplicating effort.  However, the same cannot be said of
288	   control planes, which are diverse in very fundamental ways.  The case
289	   for standardization is also less clear given the wide variety in
290	   requirements, goals, and deployment scenarios.

292	   As a result of this reality, Geneve is a pure tunnel format
293	   specification that is capable of fulfilling the needs of many control
294	   planes by explicitly not selecting any one of them.  This
295	   simultaneously promotes a shared data format and reduces the chance
296	   of obsolescence by future control plane enhancements.

298	2.2.  Data Plane Extensibility

300	   Achieving the level of flexibility needed to support current and
301	   future control planes effectively requires an options infrastructure
302	   to allow new metadata types to be defined, deployed, and either
303	   finalized or retired.  Options also allow for differentiation of
304	   products by encouraging independent development in each vendor's core
305	   specialty, leading to an overall faster pace of advancement.  By far
306	   the most common mechanism for implementing options is Type-Length-
307	   Value (TLV) format.
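
   As a minimal illustration of the TLV idea only, the following Python
   sketch encodes and decodes a list of options.  The layout used here
   (a 1-byte type, a 1-byte length, then the value) and the names
   encode_tlvs and decode_tlvs are assumptions made for illustration;
   the actual Geneve option format is the one given in Section 3.5.

```python
import struct


def encode_tlvs(options):
    """Serialize (type, value) pairs as: 1-byte type, 1-byte length, value."""
    out = bytearray()
    for opt_type, value in options:
        if not 0 <= opt_type <= 0xFF:
            raise ValueError("type must fit in one byte")
        if len(value) > 0xFF:
            raise ValueError("value must be at most 255 bytes")
        out += struct.pack("!BB", opt_type, len(value)) + value
    return bytes(out)


def decode_tlvs(data):
    """Parse the layout produced by encode_tlvs into (type, value) pairs."""
    options = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated TLV header")
        opt_type, length = struct.unpack_from("!BB", data, offset)
        offset += 2
        if offset + length > len(data):
            raise ValueError("truncated TLV value")
        options.append((opt_type, data[offset:offset + length]))
        offset += length
    return options


# Hypothetical option types 1 and 2 are "known"; type 200 is not.
wire = encode_tlvs([(1, b"\x00\x2a"), (200, b"unknown"), (2, b"")])
# The length field lets a parser step over a type it does not recognize.
known = [(t, v) for t, v in decode_tlvs(wire) if t in (1, 2)]
```

   The property that matters for extensibility is visible in the last
   two lines: because every option carries its own length, new types
   can be deployed without breaking parsers that predate them.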

309	   It should be noted that while options can be used to support non-
310	   wirespeed control packets, they are equally important on data packets
311	   to segregate and direct forwarding (for instance, the
312	   examples given before of input-port-based security policies and
313	   service interposition both require tags to be placed on data
314	   packets).  Therefore, while it would be desirable to limit the
315	   extensibility to only control packets for the purposes of simplifying
316	   the datapath, that would not satisfy the design requirements.

318	2.2.1.  Efficient Implementation

320	   There is often a conflict between software flexibility and hardware
321	   performance that is difficult to resolve.  For a given set of
322	   functionality, it is obviously desirable to maximize performance.
323	   However, that does not mean new features that cannot be run at a
324	   desired speed today should be disallowed.  Therefore, for a protocol
325	   to be efficiently implementable means that a set of common
326	   capabilities can be reasonably handled across platforms along with a
327	   graceful mechanism to handle more advanced features in the
328	   appropriate situations.

330	   The use of a variable length header and options in a protocol often
331	   raises questions about whether it is truly efficiently implementable
332	   in hardware.  To answer this question in the context of Geneve, it is
333	   important to first divide "hardware" into two categories: tunnel
334	   endpoints and transit devices.

336	   Tunnel endpoints must be able to parse the variable header, including
337	   any options, and take action.  Since these devices are actively
338	   participating in the protocol, they are the most affected by Geneve.

340	   However, as tunnel endpoints are the ultimate consumers of the data,
341	   transmitters can tailor their output to the capabilities of the
342	   recipient.  As new functionality becomes sufficiently well defined to
343	   add to tunnel endpoints, supporting options can be designed using
344	   ordering restrictions and other techniques to ease parsing.

346	   Options, if present in the packet, MUST only be generated and
347	   terminated by tunnel endpoints.  Transit devices MAY be able to
348	   interpret the options; however, as non-terminating devices, transit
349	   devices do not originate or terminate the Geneve packet and hence
350	   MUST NOT modify Geneve headers and MUST NOT insert or delete options,
351	   as that is the responsibility of tunnel endpoints.  The participation
352	   of transit devices in interpreting options is OPTIONAL.

354	   Further, either tunnel endpoints or transit devices MAY use offload
355	   capabilities of NICs such as checksum offload to improve the
356	   performance of Geneve packet processing.  The presence of a Geneve
357	   variable length header SHOULD NOT prevent the tunnel endpoints and
358	   transit devices from using such offload capabilities.

360	2.3.  Use of Standard IP Fabrics

362	   IP has clearly cemented its place as the dominant transport mechanism
363	   and many techniques have evolved over time to make it robust,
364	   efficient, and inexpensive.  As a result, it is natural to use IP
365	   fabrics as a transit network for Geneve.  Fortunately, the use of IP
366	   encapsulation and addressing is enough to achieve the primary goal of
367	   delivering packets to the correct point in the network through
368	   standard switching and routing.

370	   In addition, nearly all underlay fabrics are designed to exploit
371	   parallelism in traffic to spread load across multiple links without
372	   introducing reordering in individual flows.  These equal cost
373	   multipathing (ECMP) techniques typically involve parsing and hashing
374	   the addresses and port numbers from the packet to select an outgoing
375	   link.  However, the use of tunnels often results in poor ECMP
376	   performance without additional knowledge of the protocol as the
377	   encapsulated traffic is hidden from the fabric by design and only
378	   tunnel endpoint addresses are available for hashing.

380	   Since it is desirable for Geneve to perform well on these existing
381	   fabrics, it is necessary for entropy from encapsulated packets to be
382	   exposed in the tunnel header.  The most common technique for this is
383	   to use the UDP source port, which is discussed further in
384	   Section 3.3.
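
   The effect of placing entropy in the UDP source port can be sketched
   as follows.  This is an illustration under stated assumptions, not a
   description of any particular implementation: CRC32 stands in for
   whatever hash a real endpoint or switch uses, the source port range
   49152-65535 (the dynamic/private range) is assumed, the addresses
   are documentation examples, and the function names and the four-link
   fabric are hypothetical.

```python
import zlib

GENEVE_PORT = 6081                 # destination port from the packet format
PORT_MIN, PORT_MAX = 49152, 65535  # assumed dynamic/private source port range


def flow_hash(*fields):
    """Stable hash over header fields (CRC32 is an illustrative choice)."""
    return zlib.crc32("|".join(str(f) for f in fields).encode())


def geneve_source_port(inner_flow):
    """Tunnel endpoint: fold the inner flow's hash into the source port."""
    return PORT_MIN + flow_hash(*inner_flow) % (PORT_MAX - PORT_MIN + 1)


def ecmp_link(src_ip, dst_ip, src_port, dst_port, num_links):
    """Underlay switch: hash only the outer addresses, protocol and ports."""
    return flow_hash(src_ip, dst_ip, 17, src_port, dst_port) % num_links


# 64 inner TCP flows carried between the same pair of tunnel endpoints.
inner_flows = [("10.0.0.1", "10.0.0.2", 6, 40000 + i, 443) for i in range(64)]

# With a per-flow source port, the outer headers differ between flows.
links_with_entropy = {
    ecmp_link("192.0.2.1", "192.0.2.2", geneve_source_port(f), GENEVE_PORT, 4)
    for f in inner_flows
}

# With a constant source port, every packet presents the same outer headers.
links_without_entropy = {
    ecmp_link("192.0.2.1", "192.0.2.2", PORT_MIN, GENEVE_PORT, 4)
    for _ in inner_flows
}
```

   In the second case all 64 flows necessarily select a single link,
   since the fabric sees identical outer headers.  In the first case
   the flows can be spread over several links, while every packet of a
   given inner flow still maps to the same link and so is not
   reordered.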

386	3.  Geneve Encapsulation Details

388	   The Geneve packet format consists of a compact tunnel header
389	   encapsulated in UDP over either IPv4 or IPv6.  A small fixed tunnel
390	   header provides control information plus a base level of
391	   functionality and interoperability with a focus on simplicity.  This
392	   header is then followed by a set of variable options to allow for
393	   future innovation.  Finally, the payload consists of a protocol data
394	   unit of the indicated type, such as an Ethernet frame.  Section 3.1
395	   and Section 3.2 illustrate the Geneve packet format transported (for
396	   example) over Ethernet along with an Ethernet payload.

398	3.1.  Geneve Packet Format Over IPv4

400	       0                   1                   2                   3
401	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
402	   Outer Ethernet Header:
403	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
404	      |                 Outer Destination MAC Address                 |
405	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
406	      | Outer Destination MAC Address |   Outer Source MAC Address    |
407	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
408	      |                   Outer Source MAC Address                    |
409	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
410	      |Optional Ethertype=C-Tag 802.1Q|  Outer VLAN Tag Information   |
411	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
412	      |       Ethertype=0x0800        |
413	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

415	   Outer IPv4 Header:
416	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
417	      |Version|  IHL  |Type of Service|          Total Length         |
418	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
419	      |         Identification        |Flags|      Fragment Offset    |
420	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
421	      |  Time to Live |Protocol=17 UDP|         Header Checksum       |
422	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
423	      |                     Outer Source IPv4 Address                 |
424	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
425	      |                   Outer Destination IPv4 Address              |
426	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

428	   Outer UDP Header:
429	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
430	      |       Source Port = xxxx      |       Dest Port = 6081        |
431	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
432	      |           UDP Length          |        UDP Checksum           |
433	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

435	   Geneve Header:
436	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
437	      |Ver|  Opt Len  |O|C|    Rsvd.  |          Protocol Type        |
438	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
439	      |        Virtual Network Identifier (VNI)       |    Reserved   |
440	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
441	      |                    Variable Length Options                    |
442	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

444	   Inner Ethernet Header (example payload):
445	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
446	      |                 Inner Destination MAC Address                 |
447	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
448	      | Inner Destination MAC Address |   Inner Source MAC Address    |
449	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
450	      |                   Inner Source MAC Address                    |
451	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
452	      |Optional Ethertype=C-Tag 802.1Q|  Inner VLAN Tag Information   |
453	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

455	   Payload:
456	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
457	      | Ethertype of Original Payload |                               |
458	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
459	      |                                  Original Ethernet Payload    |
460	      |                                                               |
461	      | (Note that the original Ethernet Frame's FCS is not included) |
462	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

464	   Frame Check Sequence:
465	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
466	      |   New FCS (Frame Check Sequence) for Outer Ethernet Frame     |
467	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
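
   The bit layout of the Geneve header drawn above can be exercised
   with a short Python sketch.  The field widths are read from the
   figure (Ver 2 bits, Opt Len 6 bits, O and C 1 bit each, Rsvd 6 bits,
   Protocol Type 16 bits, VNI 24 bits, Reserved 8 bits).  Two details
   are taken from Section 3.4 rather than from the figure and should be
   checked there: Opt Len is treated as a count of 4-byte words, and
   0x6558 is used as the Protocol Type for an Ethernet payload.  The
   function and parameter names are hypothetical.

```python
import struct


def pack_geneve_header(vni, protocol_type, options=b"",
                       o_flag=False, c_flag=False, version=0):
    """Build the 8-byte base header followed by already-encoded options."""
    if len(options) % 4 != 0:
        raise ValueError("options must be a multiple of 4 bytes")
    opt_len = len(options) // 4          # assumed unit: 4-byte words
    if opt_len >= 64:
        raise ValueError("Opt Len is a 6-bit field")
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI is a 24-bit field")
    if not 0 <= version < 4:
        raise ValueError("Ver is a 2-bit field")
    byte0 = (version << 6) | opt_len                 # Ver | Opt Len
    byte1 = (int(o_flag) << 7) | (int(c_flag) << 6)  # O | C | Rsvd (zero)
    word1 = vni << 8                                 # VNI | Reserved (zero)
    return struct.pack("!BBHI", byte0, byte1, protocol_type, word1) + options


def unpack_geneve_header(data):
    """Split a Geneve packet into header fields, options and payload offset."""
    if len(data) < 8:
        raise ValueError("shorter than the base header")
    byte0, byte1, protocol_type, word1 = struct.unpack_from("!BBHI", data)
    opt_bytes = (byte0 & 0x3F) * 4
    if len(data) < 8 + opt_bytes:
        raise ValueError("options extend past end of packet")
    return {
        "version": byte0 >> 6,
        "o_flag": bool(byte1 & 0x80),
        "c_flag": bool(byte1 & 0x40),
        "protocol_type": protocol_type,
        "vni": word1 >> 8,
        "options": data[8:8 + opt_bytes],
        "payload_offset": 8 + opt_bytes,
    }


# Eight bytes of placeholder options in front of a stand-in inner frame.
header = pack_geneve_header(vni=0x00ABCD, protocol_type=0x6558,
                            options=b"\x00" * 8)
parsed = unpack_geneve_header(header + b"inner frame")
```

   Because the option length is carried in the fixed part of the
   header, a receiver can locate the start of the payload
   (payload_offset above) without interpreting any individual option.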

469	3.2.  Geneve Packet Format Over IPv6

471	       0                   1                   2                   3
472	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
473	   Outer Ethernet Header:
474	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
475	      |                 Outer Destination MAC Address                 |
476	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
477	      | Outer Destination MAC Address |   Outer Source MAC Address    |
478	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
479	      |                   Outer Source MAC Address                    |
480	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
481	      |Optional Ethertype=C-Tag 802.1Q|  Outer VLAN Tag Information   |
482	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
483	      |       Ethertype=0x86DD        |
484	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

486	   Outer IPv6 Header:
487	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
488	      |Version| Traffic Class |           Flow Label                  |
489	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
490	      |         Payload Length        | NxtHdr=17 UDP |   Hop Limit   |
491	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
492	      |                                                               |
493	      +                                                               +
494	      |                                                               |
495	      +                     Outer Source IPv6 Address                 +
496	      |                                                               |
497	      +                                                               +
498	      |                                                               |
499	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
500	      |                                                               |
501	      +                                                               +
502	      |                                                               |
503	      +                  Outer Destination IPv6 Address               +
504	      |                                                               |
505	      +                                                               +
506	      |                                                               |
507	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

509	   Outer UDP Header:
510	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
511	      |       Source Port = xxxx      |       Dest Port = 6081        |
512	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
513	      |           UDP Length          |        UDP Checksum           |
514	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

516	   Geneve Header:
517	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
518	      |Ver|  Opt Len  |O|C|    Rsvd.  |          Protocol Type        |
519	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
520	      |        Virtual Network Identifier (VNI)       |    Reserved   |
521	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
522	      |                    Variable Length Options                    |
523	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

525	   Inner Ethernet Header (example payload):
526	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
527	      |                 Inner Destination MAC Address                 |
528	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
529	      | Inner Destination MAC Address |   Inner Source MAC Address    |
530	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
531	      |                   Inner Source MAC Address                    |
532	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
533	      |Optional Ethertype=C-Tag 802.1Q|  Inner VLAN Tag Information   |
534	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

536	   Payload:
537	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
538	      | Ethertype of Original Payload |                               |
539	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
540	      |                                  Original Ethernet Payload    |
541	      |                                                               |
542	      | (Note that the original Ethernet Frame's FCS is not included) |
543	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

545	   Frame Check Sequence:
546	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
547	      |   New FCS (Frame Check Sequence) for Outer Ethernet Frame     |
548	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

550	3.3.  UDP Header

552	   The use of an encapsulating UDP [RFC0768] header follows the
553	   connectionless semantics of Ethernet and IP in addition to providing
554	   entropy to routers performing ECMP.  The header fields are therefore
555	   interpreted as follows:

557	   Source port:  A source port selected by the originating tunnel
558	      endpoint.  This source port SHOULD be the same for all packets
559	      belonging to a single encapsulated flow to prevent reordering due
560	      to the use of different paths.  To encourage an even distribution
561	      of flows across multiple links, the source port SHOULD be
562	      calculated using a hash of the encapsulated packet headers using,
563	      for example, a traditional 5-tuple.  Since the port represents a
564	      flow identifier rather than a true UDP connection, the entire
565	      16-bit range MAY be used to maximize entropy.

567	   Dest port:  IANA has assigned port 6081 as the fixed well-known
568	      destination port for Geneve.  Although the well-known value should
569	      be used by default, it is RECOMMENDED that implementations make
570	      this configurable.  The chosen port is used for identification of
571	      Geneve packets and MUST NOT be reversed for different ends of a
572	      connection as is done with TCP.

574	   UDP length:  The length of the UDP packet including the UDP header.

576	   UDP checksum:  In order to protect the Geneve header, options and
577	      payload from potential data corruption, UDP checksum SHOULD be
578	      generated as specified in [RFC0768] and [RFC1122] when Geneve is
579	      encapsulated in IPv4.  To protect the IP header, Geneve header,
580	      options and payload from potential data corruption, the UDP
581	      checksum MUST be generated by default as specified in [RFC0768]
582	      and [RFC2460] when Geneve is encapsulated in IPv6.  Upon receiving
583	      such packets with non-zero UDP checksum, the receiving tunnel
584	      endpoints MUST validate the checksum.  If the checksum is not
585	      correct, the packet MUST be dropped, otherwise the packet MUST be
586	      accepted for decapsulation.

588	      Under certain conditions, the UDP checksum MAY be set to zero on
589	      transmit for packets encapsulated in both IPv4 and IPv6 [RFC6935].
590	      See Section 4.3 for additional requirements that apply for using
591	      zero UDP checksum with IPv4 and IPv6.  Disabling the use of UDP
592	      checksums is an operational consideration that should take into
593	      account the risks and effects of packet corruption.
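
The source port guidance above can be illustrated with a minimal sketch.
It assumes the encapsulated packet is IP with a transport header, so that
a traditional 5-tuple is available.  The function name and the choice of
SHA-256 are assumptions made for illustration only; this document does
not mandate a hash function, and a real datapath would likely use a
cheaper one.

```python
import hashlib
import struct


def geneve_source_port(src_ip: bytes, dst_ip: bytes, proto: int,
                       src_port: int, dst_port: int) -> int:
    """Map an inner 5-tuple to a 16-bit outer UDP source port.

    Hypothetical helper: the same flow always yields the same port,
    which keeps a flow on one ECMP path and avoids reordering.
    """
    # Serialize the inner 5-tuple in network byte order.
    key = src_ip + dst_ip + struct.pack("!BHH", proto, src_port, dst_port)
    digest = hashlib.sha256(key).digest()
    # The port is a flow identifier, so the entire 16-bit range is used.
    return int.from_bytes(digest[:2], "big")
```

Because the result depends only on the inner headers, every packet of a
flow carries the same outer source port, while distinct flows are spread
across the port space.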

595	3.4.  Tunnel Header Fields

597	   Ver (2 bits):  The current version number is 0.  Packets received by
598	      a tunnel endpoint with an unknown version MUST be dropped.
599	      Transit devices interpreting Geneve packets with an unknown
600	      version number MUST treat them as UDP packets with an unknown
601	      payload.

603	   Opt Len (6 bits):  The length of the options fields, expressed in
604	      four byte multiples, not including the eight byte fixed tunnel
605	      header.  This results in a minimum total Geneve header size of 8
606	      bytes and a maximum of 260 bytes.  The start of the payload
607	      headers can be found using this offset from the end of the base
608	      Geneve header.

610	   O (1 bit):  Control packet.  This packet contains a control message.
611	      Control messages are sent between tunnel endpoints.  Tunnel
612	      Endpoints MUST NOT forward the payload and transit devices MUST
613	      NOT attempt to interpret it.  Since these are infrequent control
614	      messages, it is RECOMMENDED that tunnel endpoints direct these
615	      packets to a high priority control queue (for example, to direct
616	      the packet to a general purpose CPU from a forwarding ASIC or to
617	      separate out control traffic on a NIC).  Transit devices MUST NOT
618	      alter forwarding behavior on the basis of this bit, such as ECMP
619	      link selection.

621	   C (1 bit):  Critical options present.  One or more options have the
622	      critical bit set (see Section 3.5).  If this bit is set then
623	      tunnel endpoints MUST parse the options list to interpret any
624	      critical options.  On tunnel endpoints where option parsing is not
625	      supported the packet MUST be dropped on the basis of the 'C' bit
626	      in the base header.  If the bit is not set tunnel endpoints MAY
627	      strip all options using 'Opt Len' and forward the decapsulated
628	      packet.  Transit devices MUST NOT drop packets on the basis of
629	      this bit.

631	      The critical bit allows hardware implementations the flexibility
632	      to handle options processing in the hardware fastpath or in the
633	      exception (slow) path without the need to process all the options.
634	      For example, a critical option such as secure hash to provide
635	      Geneve header integrity check must be processed by tunnel
636	      endpoints and typically processed in the hardware fastpath.

638	   Rsvd. (6 bits):  Reserved field, which MUST be zero on transmission
639	      and MUST be ignored on receipt.

641	   Protocol Type (16 bits):  The type of the protocol data unit
642	      appearing after the Geneve header.  This follows the EtherType
643	      [ETYPES] convention with Ethernet itself being represented by the
644	      value 0x6558.

646	   Virtual Network Identifier (VNI) (24 bits):  An identifier for a
647	      unique element of a virtual network.  In many situations this may
648	      represent an L2 segment, however, the control plane defines the
649	      forwarding semantics of decapsulated packets.  The VNI MAY be used
650	      as part of ECMP forwarding decisions or MAY be used as a mechanism
651	      to distinguish between overlapping address spaces contained in the
652	      encapsulated packet when load balancing across CPUs.

654	   Reserved (8 bits):  Reserved field which MUST be zero on transmission
655	      and ignored on receipt.

657	   Transit devices MUST maintain consistent forwarding behavior
658	   irrespective of the value of 'Opt Len', including ECMP link
659	   selection.  These devices SHOULD be able to forward packets
660	   containing options without resorting to a slow path.
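
As an illustration of the field layout above, the following sketch
decodes the eight byte base header from a byte string.  The names
GeneveHeader, parse_base_header and payload_offset are hypothetical and
chosen for this example; the bit positions follow the header diagram in
this document.

```python
import struct
from typing import NamedTuple

GENEVE_BASE_LEN = 8  # fixed tunnel header size in bytes


class GeneveHeader(NamedTuple):
    version: int
    opt_len_bytes: int   # Opt Len converted from 4-byte multiples to bytes
    control: bool        # 'O' bit
    critical: bool       # 'C' bit
    protocol_type: int
    vni: int


def parse_base_header(buf: bytes) -> GeneveHeader:
    """Decode the fixed Geneve header; raise ValueError if it must be dropped."""
    if len(buf) < GENEVE_BASE_LEN:
        raise ValueError("truncated Geneve header")
    first, flags, proto, vni_rsvd = struct.unpack("!BBHI", buf[:GENEVE_BASE_LEN])
    version = first >> 6
    if version != 0:
        # A tunnel endpoint drops packets with an unknown version.
        raise ValueError("unknown Geneve version")
    opt_len_bytes = (first & 0x3F) * 4
    # Reserved bits are ignored on receipt, so they are simply masked off.
    return GeneveHeader(version, opt_len_bytes, bool(flags & 0x80),
                        bool(flags & 0x40), proto, vni_rsvd >> 8)


def payload_offset(hdr: GeneveHeader) -> int:
    """Offset of the payload from the start of the Geneve header."""
    return GENEVE_BASE_LEN + hdr.opt_len_bytes
```

With the 6-bit Opt Len field at its maximum value of 63, payload_offset
yields 8 + 252 = 260 bytes, matching the maximum header size stated
above.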

662	3.5.  Tunnel Options
663	   0                   1                   2                   3
664	   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
665	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
666	   |          Option Class         |      Type     |R|R|R| Length  |
667	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
668	   |                      Variable Option Data                     |
669	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

671	                               Geneve Option

673	   The base Geneve header is followed by zero or more options in Type-
674	   Length-Value format.  Each option consists of a four byte option
675	   header and a variable amount of option data interpreted according to
676	   the type.

678	   Option Class (16 bits):  Namespace for the 'Type' field.  IANA will
679	      be requested to create a "Geneve Option Class" registry to
680	      allocate identifiers for organizations, technologies, and vendors
681	      that have an interest in creating types for options.  Each
682	      organization may allocate types independently to allow
683	      experimentation and rapid innovation.  It is expected that over
684	      time certain options will become well known and a given
685	      implementation may use option types from a variety of sources.  In
686	      addition, IANA will be requested to reserve specific ranges for
687	      standardized and experimental options.

689	   Type (8 bits):  Type indicating the format of the data contained in
690	      this option.  Options are primarily designed to encourage future
691	      extensibility and innovation and so standardized forms of these
692	      options will be defined in a separate document.

694	      The high order bit of the option type indicates that this is a
695	      critical option.  If the receiving tunnel endpoint does not
696	      recognize this option and this bit is set then the packet MUST be
697	      dropped.  If the 'C' bit (critical bit) is set in any option then
698	      the 'C' bit in the Geneve base header MUST also be set.  Transit
699	      devices MUST NOT drop packets on the basis of this bit.  The
700	      following figure shows the location of the 'C' bit in the 'Type'
701	      field:

703	      0 1 2 3 4 5 6 7
704	      +-+-+-+-+-+-+-+-+
705	      |C|    Type     |
706	      +-+-+-+-+-+-+-+-+

708	      The requirement to drop a packet with an unknown option with the
709	      'C' bit set applies to the entire tunnel endpoint system and not a
710	      particular component of the implementation.  For example, in a
711	      system comprised of a forwarding ASIC and a general purpose CPU,
712	      this does not mean that the packet must be dropped in the ASIC.
713	      An implementation may send the packet to the CPU using a rate-
714	      limited control channel for slow-path exception handling.

716	   R (3 bits):  Option control flags reserved for future use.  MUST be
717	      zero on transmission and ignored on receipt.

719	   Length (5 bits):  Length of the option, expressed in four byte
720	      multiples excluding the option header.  The total length of each
721	      option may be between 4 and 128 bytes.  A value of 0 in the Length
722	      field implies an option with only the option header without the
723	      variable option data.  Packets in which the total length of all
724	      options is not equal to the 'Opt Len' in the base header are
725	      invalid and MUST be silently dropped if received by a tunnel
726	      endpoint that processes the options.

728	   Variable Option Data:  Option data interpreted according to 'Type'.
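
The option format above can be walked as in the following sketch.  It
assumes the caller has already sliced out exactly 'Opt Len' * 4 bytes
following the base header; GeneveOption and parse_options are
hypothetical names used only for this example.

```python
import struct
from typing import List, NamedTuple

OPTION_HEADER_LEN = 4


class GeneveOption(NamedTuple):
    option_class: int
    type: int        # full 8-bit Type, including the 'C' bit
    critical: bool   # high order bit of Type
    data: bytes


def parse_options(opts: bytes) -> List[GeneveOption]:
    """Walk the option list; raise ValueError if it does not fit 'Opt Len'."""
    options = []
    offset = 0
    while offset < len(opts):
        if len(opts) - offset < OPTION_HEADER_LEN:
            raise ValueError("truncated option header")
        opt_class, opt_type, flags_len = struct.unpack_from("!HBB", opts, offset)
        # The low 5 bits hold the data length in 4-byte multiples; the
        # three 'R' bits above them are ignored on receipt.
        data_len = (flags_len & 0x1F) * 4
        end = offset + OPTION_HEADER_LEN + data_len
        if end > len(opts):
            # Total option length disagrees with 'Opt Len': invalid packet.
            raise ValueError("options overrun Opt Len")
        options.append(GeneveOption(opt_class, opt_type, bool(opt_type & 0x80),
                                    opts[offset + OPTION_HEADER_LEN:end]))
        offset = end
    return options
```

A caller acting as a tunnel endpoint would treat ValueError as the
signal to silently drop the packet.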

730	3.5.1.  Options Processing

732	   Geneve options are intended to be originated and processed by tunnel
733	   endpoints.  However, options MAY be interpreted by transit devices
734	   along the tunnel path.  Transit devices not interpreting Geneve
735	   headers (that may or may not include options) MUST handle Geneve
736	   packets as any other UDP packet and maintain consistent forwarding
737	   behavior.

739	   In tunnel endpoints, the generation and interpretation of options is
740	   determined by the control plane, which is out of the scope of this
741	   document.  However, to ensure interoperability between heterogeneous
742	   devices some requirements are imposed on options and the devices that
743	   process them:

745	   o  Receiving tunnel endpoints MUST drop packets containing unknown
746	      options with the 'C' bit set in the option type.  Conversely,
747	      transit devices MUST NOT drop packets as a result of encountering
748	      unknown options, including those with the 'C' bit set.

750	   o  Some options may be defined in such a way that the position in the
751	      option list is significant.  Options MUST NOT be changed by
752	      transit devices.

754	   o  An option SHOULD NOT be dependent upon any other option in the
755	      packet, i.e., options can be processed independently of one
756	      another.  Architecturally, options are intended to be self-
757	      descriptive and independent.  This enables parallelism in option
758	      processing and reduces implementation complexity.
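
The different treatment of unknown critical options by tunnel endpoints
and transit devices can be summarized in a short sketch.  The function
name, the role strings and the representation of an option as an
(Option Class, Type) pair are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

OptionKey = Tuple[int, int]  # (Option Class, Type including the 'C' bit)


def must_drop(role: str, options: Iterable[OptionKey],
              known: Set[OptionKey]) -> bool:
    """Return True if the device must drop the packet because of its options.

    role is "endpoint" for a receiving tunnel endpoint or "transit" for
    a transit device (hypothetical labels).
    """
    if role == "transit":
        # Transit devices never drop on unknown options, critical or not.
        return False
    # An endpoint drops when any unknown option has the 'C' bit set.
    return any((opt_type & 0x80) and (opt_class, opt_type) not in known
               for opt_class, opt_type in options)
```

Unknown options without the 'C' bit never cause a drop in this sketch,
which mirrors the rule that only critical options are mandatory to
understand.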

760	   When designing a Geneve option, it is important to consider how the
761	   option will evolve in the future.  Once an option is defined it is
762	   reasonable to expect that implementations may come to depend on a
763	   specific behavior.  As a result, the scope of any future changes must
764	   be carefully described upfront.

766	   Unexpectedly significant interoperability issues may result from
767	   changing the length of an option that was defined to be a certain
768	   size.  A particular option is specified to have either a fixed
769	   length, which is constant, or a variable length, which may change
770	   over time or for different use cases.  This property is part of the
771	   definition of the option and conveyed by the 'Type'.  For fixed
772	   length options, some implementations may choose to ignore the length
773	   field in the option header and instead parse based on the well known
774	   length associated with the type.  In this case, redefining the length
775	   will impact not only parsing of the option in question but also any
776	   options that follow.  Therefore, options that are defined to be fixed
777	   length in size MUST NOT be redefined to a different length.  Instead,
778	   a new 'Type' should be allocated.

780	   Options may be processed by NIC hardware utilizing offloads (e.g.,
781	   LSO and LRO) as described in Section 4.6.  Careful consideration
782	   should be given to how the offload capabilities outlined in
783	   Section 4.6 impact an option's design.

785	4.  Implementation and Deployment Considerations

787	4.1.  Applicability Statement

789	   Geneve is a network virtualization overlay encapsulation protocol
790	   designed to establish tunnels between NVEs over an existing IP
791	   network.  It is intended for use in public or private data center
792	   environments, for deploying multi-tenant overlay networks over an
793	   existing IP underlay network.

795	   Geneve is a UDP based encapsulation protocol transported over
796	   existing IPv4 and IPv6 networks.  Hence, as a UDP based protocol,
797	   Geneve adheres to the UDP usage guidelines as specified in [RFC8085].
798	   The applicability of these guidelines is dependent on the underlay
799	   IP network and the nature of the Geneve payload protocol (for
800	   example, TCP/IP or IP/Ethernet).

802	   [RFC8085] outlines two applicability scenarios for UDP applications,
803	   1) general Internet and 2) controlled environment.  The controlled
804	   environment means a single administrative domain or adjacent set of
805	   cooperating domains.  A network in a controlled environment can be
806	   managed to operate under certain conditions whereas in the general
807	   Internet this cannot be done.  Hence requirements for a tunnel
808	   protocol operating under a controlled environment can be less
809	   restrictive than the requirements of the general Internet.

811	   Geneve is intended to be deployed in a data center network
812	   environment operated by a single operator or adjacent set of
813	   cooperating network operators that fits with the definition of
814	   controlled environments in [RFC8085].

816	   For the purpose of this document, a traffic-managed controlled
817	   environment (TMCE) is defined as an IP network that is traffic-
818	   engineered and/or otherwise managed (e.g., via use of traffic rate
819	   limiters) to avoid congestion.  The concept of TMCE is outlined in
820	   [RFC8086].  Significant portions of text in Section 4.1 through
821	   Section 4.3 are based on [RFC8086] as applicable to Geneve.

823	   It is the responsibility of the operator to ensure that the
824	   guidelines/requirements in this section are followed as applicable to
825	   their Geneve deployment(s).

827	4.2.  Congestion Control Functionality

829	   Geneve does not natively provide congestion control functionality and
830	   relies on the payload protocol traffic for congestion control.  As
831	   such Geneve MUST be used with congestion controlled traffic or within
832	   a network that is traffic managed to avoid congestion (TMCE).  An
833	   operator of a traffic managed network (TMCE) may avoid congestion by
834	   careful provisioning of their networks, rate-limiting of user data
835	   traffic and traffic engineering according to path capacity.

837	4.3.  UDP Checksum

839	   In order to provide integrity of Geneve headers, options and payload,
840	   for example to avoid mis-delivery of payload to different tenant
841	   systems in case of data corruption, outer UDP checksum SHOULD be used
842	   with Geneve when transported over IPv4.  An operator MAY choose to
843	   disable UDP checksum and use zero checksum if Geneve packet integrity
844	   is provided by other data integrity mechanisms such as IPsec or
845	   additional checksums or if one of the conditions a, b, or c in
846	   Section 4.3.1 is met.

848	   By default, UDP checksum MUST be used when Geneve is transported over
849	   IPv6.  A tunnel endpoint MAY be configured for use with zero UDP
850	   checksum if additional requirements in Section 4.3.1 are met.

852	4.3.1.  UDP Zero Checksum Handling with IPv6

854	   When Geneve is used over IPv6, UDP checksum is used to protect IPv6
855	   headers, UDP headers and Geneve headers, options and payload from
856	   potential data corruption.  As such by default Geneve MUST use UDP
857	   checksum when transported over IPv6.  An operator MAY choose to
858	   operate with zero UDP checksum if operating in a traffic-managed
859	   controlled environment as stated in Section 4.1 and if one of the
860	   following conditions is met:

862	   a.  It is known that the packet corruption is exceptionally unlikely
863	       (perhaps based on knowledge of equipment types in their underlay
864	       network) and the operator is willing to take a risk of undetected
865	       packet corruption.

867	   b.  It is judged through observational measurements (perhaps through
868	       historic or current traffic flows that use non zero checksum)
869	       that the level of packet corruption is tolerably low and where
870	       the operator is willing to take the risk of undetected
871	       corruption.

873	   c.  Geneve payload is carrying applications that are tolerant of
874	       misdelivered or corrupted packets (perhaps through higher layer
875	       checksum validation and/or reliability through retransmission).

877	   In addition Geneve tunnel implementations using Zero UDP checksum
878	   MUST meet the following requirements:

880	   1.  Use of UDP checksum over IPv6 MUST be the default configuration
881	       for all Geneve tunnels.

883	   2.  If Geneve is used with zero UDP checksum over IPv6 then such
884	       tunnel endpoint implementation MUST meet all the requirements
885	       specified in Section 4 of [RFC6936] and requirement 1 as
886	       specified in Section 5 of [RFC6936].

888	   3.  The Geneve tunnel endpoint that decapsulates the tunnel SHOULD
889	       check the source and destination IPv6 addresses are valid for the
890	       Geneve tunnel that is configured to receive Zero UDP checksum and
891	       discard other packets for which such check fails.

893	   4.  The Geneve tunnel endpoint that encapsulates the tunnel MAY use
894	       different IPv6 source addresses for each Geneve tunnel that uses
895	       Zero UDP checksum mode in order to strengthen the decapsulator's
896	       check of the IPv6 source address (i.e., the same IPv6 source
897	       address is not to be used with more than one IPv6 destination
898	       address, irrespective of whether that destination address is a
899	       unicast or multicast address).  When this is not possible, it is
900	       RECOMMENDED to use each source address for as few Geneve tunnels
901	       that use zero UDP checksum as is feasible.

903	   5.  Measures SHOULD be taken to prevent Geneve traffic over IPv6 with
904	       zero UDP checksum from escaping into the general Internet.
905	       Examples of such measures include employing packet filters at
906	       gateways or at the edge of the Geneve network and/or keeping the
907	       Geneve network logically or physically separate from networks
908	       carrying general Internet traffic.

910	   The above requirements do not change either the requirements
911	   specified in [RFC2460] as modified by [RFC6935] or the requirements
912	   specified in [RFC6936].

914	   The requirement to check the source IPv6 address in addition to the
915	   destination IPv6 address, plus the recommendation against reuse of
916	   source IPv6 addresses among Geneve tunnels collectively provide some
917	   mitigation for the absence of UDP checksum coverage of the IPv6
918	   header.  A traffic-managed controlled environment that satisfies at
919	   least one of three conditions listed at the beginning of this section
920	   provides additional assurance.
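
A receive-side check combining the checksum rules of Section 3.3 with
requirement 3 above might look like the following sketch.  The function
name, its parameters and the representation of configured tunnels as a
set of (source, destination) address pairs are assumptions for
illustration; the actual checksum computation is left to the caller.

```python
from typing import Set, Tuple


def accept_on_receive(udp_checksum: int, checksum_valid: bool,
                      src: str, dst: str,
                      zero_csum_tunnels: Set[Tuple[str, str]]) -> bool:
    """Decide whether a received IPv6 Geneve packet may be decapsulated.

    checksum_valid is the caller's verification result and is only
    consulted when the checksum field is non-zero.
    """
    if udp_checksum != 0:
        # Non-zero checksums are always validated; bad packets are dropped.
        return checksum_valid
    # Zero checksum: accept only for tunnels explicitly configured to
    # receive it, matched on both source and destination address.
    return (src, dst) in zero_csum_tunnels
```

Matching on the address pair rather than on the destination alone
reflects the mitigation described above for the missing checksum
coverage of the IPv6 header.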

922	   Editorial Note (The following paragraph to be removed by the RFC
923	   Editor before publication)

925	   It was discussed during TSVART early review if the level of
926	   requirement for using different IPv6 source addresses for different
927	   tunnel destinations would need to be "MAY" or "SHOULD".  The
928	   discussion concluded that it was appropriate to keep this as "MAY",
929	   since it was considered not realistic for control planes having to
930	   maintain a high level of state on a per tunnel destination basis.  In
931	   addition, the text above provides sufficient guidance to operators
932	   and implementors on possible mitigations.

934	4.4.  Encapsulation of Geneve in IP

936	   As an IP-based tunnel protocol, Geneve shares many properties and
937	   techniques with existing protocols.  The application of some of these
938	   are described in further detail, although in general most concepts
939	   applicable to the IP layer or to IP tunnels generally also function
940	   in the context of Geneve.

942	4.4.1.  IP Fragmentation

944	   It is strongly RECOMMENDED that Path MTU Discovery ([RFC1191],
945	   [RFC8201]) be used by setting the DF bit in the IP header when Geneve
946	   packets are transmitted over IPv4 (this is the default with IPv6).
947	   The use of Path MTU Discovery on the transit network provides the
948	   encapsulating tunnel endpoint with soft-state about the link that it
949	   may use to prevent or minimize fragmentation depending on its role in
950	   the virtualized network.  The NVE control plane MAY use configuration
951	   mechanism or path discovery information to maintain the MTU size of
952	   the tunnel link(s) associated with the tunnel endpoint, so if a
953	   tenant system sends large packets that when encapsulated exceed the
954	   MTU size of the tunnel link, the tunnel endpoint can discard such
955	   packets and send exception messages to the tenant system(s).  If the
956	   tunnel endpoint is associated with a routing or forwarding function
957	   and/or has the capability to send ICMP messages, the encapsulating
958	   tunnel endpoint MAY send ICMP fragmentation needed [RFC0792] or
959	   Packet Too Big [RFC4443] messages to the tenant system(s).  For
960	   example, recommendations/guidance for handling fragmentation in
961	   similar overlay encapsulation services like PWE3 are provided in
962	   section 5.3 of [RFC3985].

964	   Note that some implementations may not be capable of supporting
965	   fragmentation or other less common features of the IP header, such as
966	   options and extension headers.  For example, some of the issues
967	   associated with MTU size and fragmentation in IP tunneling and use of
968	   ICMP messages are outlined in section 4.2 of
969	   [I-D.ietf-intarea-tunnels].
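
The ingress MTU check described in this section can be sketched as a
simple size calculation.  The sketch assumes an outer IPv4 header
without options or an outer IPv6 header without extension headers, and
that the tunnel MTU is expressed as the IP MTU of the underlay; the
function names are hypothetical.

```python
OUTER_IPV4_LEN = 20   # assumes no IPv4 options
OUTER_IPV6_LEN = 40   # assumes no IPv6 extension headers
UDP_LEN = 8
GENEVE_BASE_LEN = 8


def geneve_overhead(opt_len_bytes: int = 0, ipv6: bool = False) -> int:
    """Bytes added in front of the encapsulated payload, from outer IP on."""
    if opt_len_bytes % 4 or not 0 <= opt_len_bytes <= 252:
        raise ValueError("options must be a multiple of 4 bytes, at most 252")
    outer_ip = OUTER_IPV6_LEN if ipv6 else OUTER_IPV4_LEN
    return outer_ip + UDP_LEN + GENEVE_BASE_LEN + opt_len_bytes


def fits_tunnel_mtu(inner_len: int, tunnel_mtu: int,
                    opt_len_bytes: int = 0, ipv6: bool = False) -> bool:
    """True if the encapsulated packet fits without fragmentation.

    inner_len is the full encapsulated payload, including the inner
    Ethernet header when the payload is an Ethernet frame.
    """
    return inner_len + geneve_overhead(opt_len_bytes, ipv6) <= tunnel_mtu
```

For example, a 1514 byte inner Ethernet frame with no options adds up
to 1550 bytes over IPv4, so it does not fit a 1500 byte tunnel MTU and
the endpoint would discard it and may signal the tenant system.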

971	   Editorial Note (The following paragraph to be removed by the RFC
972	   Editor before publication)

974	   It was discussed during TSVART early review if the level of
975	   requirement for maintaining tunnel MTU at the ingress has to be "MAY"
976	   or "SHOULD".  The discussion concluded that it was appropriate to
977	   leave this as "MAY", considering the high level of state to be
978	   maintained.

980	4.4.2.  DSCP, ECN and TTL

982	   When encapsulating IP (including over Ethernet) packets in Geneve,
983	   there are several considerations for propagating DSCP and ECN bits
984	   from the inner header to the tunnel on transmission and the reverse
985	   on reception.

987	   [RFC2983] provides guidance for mapping DSCP between inner and outer
988	   IP headers.  Network virtualization is typically more closely aligned
989	   with the Pipe model described, where the DSCP value on the tunnel
990	   header is set based on a policy (which may be a fixed value, one
991	   based on the inner traffic class, or some other mechanism for
992	   grouping traffic).  Aspects of the Uniform model (which treats the
993	   inner and outer DSCP value as a single field by copying on ingress
994	   and egress) may also apply, such as the ability to remark the inner
995	   header on tunnel egress based on transit marking.  However, the
996	   Uniform model is not conceptually consistent with network
997	   virtualization, which seeks to provide strong isolation between
998	   encapsulated traffic and the physical network.

1000	   [RFC6040] describes the mechanism for exposing ECN capabilities on IP
1001	   tunnels and propagating congestion markers to the inner packets.
1002	   This behavior MUST be followed for IP packets encapsulated in Geneve.
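
As a rough illustration of the [RFC6040] behavior referenced above, the
sketch below models normal mode encapsulation, where the inner ECN
field is copied to the outer header, and the decapsulation combination
table from that document.  The table is reproduced here from
[RFC6040] and should be verified against it; the function names are
hypothetical.

```python
from typing import Optional

# ECN codepoints (two low-order bits of the former TOS octet).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def ecn_encapsulate(inner: int) -> int:
    """Normal mode: the outer ECN field is a copy of the inner field."""
    return inner


# _DECAP[inner][outer] -> resulting inner ECN field, or None to drop.
_DECAP = {
    NOT_ECT: {NOT_ECT: NOT_ECT, ECT0: NOT_ECT, ECT1: NOT_ECT, CE: None},
    ECT0:    {NOT_ECT: ECT0,    ECT0: ECT0,    ECT1: ECT1,    CE: CE},
    ECT1:    {NOT_ECT: ECT1,    ECT0: ECT1,    ECT1: ECT1,    CE: CE},
    CE:      {NOT_ECT: CE,      ECT0: CE,      ECT1: CE,      CE: CE},
}


def ecn_decapsulate(inner: int, outer: int) -> Optional[int]:
    """Combine inner and outer ECN on tunnel egress; None means drop."""
    return _DECAP[inner][outer]
```

The key case is a congestion mark applied in the underlay: an inner
ECT(0) packet arriving with an outer CE mark is forwarded with CE, while
an inner Not-ECT packet arriving with outer CE is dropped.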

1004	   Though Uniform or Pipe models could be used for TTL (or Hop Limit in
1005	   case of IPv6) handling when tunneling IP packets, Pipe model is more
1006	   aligned with network virtualization.  [RFC2003] provides guidance on
1007	   handling TTL between inner IP header and outer IP tunnels; this model
1008	   is more aligned with the Pipe model and is recommended for use with
1009	   Geneve for network virtualization applications.

1011	4.4.3.  Broadcast and Multicast

1013	   Geneve tunnels may either be point-to-point unicast between two
1014	   tunnel endpoints or may utilize broadcast or multicast addressing.
1015	   It is not required that inner and outer addressing match in this
1016	   respect.  For example, in physical networks that do not support
1017	   multicast, encapsulated multicast traffic may be replicated into
1018	   multiple unicast tunnels or forwarded by policy to a unicast location
1019	   (possibly to be replicated there).

1021	   With physical networks that do support multicast it may be desirable
1022	   to use this capability to take advantage of hardware replication for
1023	   encapsulated packets.  In this case, multicast addresses may be
1024	   allocated in the physical network corresponding to tenants,
1025	   encapsulated multicast groups, or some other factor.  The allocation
1026	   of these groups is a component of the control plane and therefore
1027	   outside of the scope of this document.  When physical multicast is in
1028	   use, the 'C' bit in the Geneve header may be used with groups of
1029	   devices with heterogeneous capabilities as each device can interpret
1030	   only the options that are significant to it if they are not critical.

1032	   In addition, [RFC8293] provides examples of various mechanisms that
1033	   can be used for multicast handling in network virtualization overlay
1034	   networks.

1036	4.4.4.  Unidirectional Tunnels

1038	   Generally speaking, a Geneve tunnel is a unidirectional concept.  IP
1039	   is not a connection oriented protocol and it is possible for two
1040	   tunnel endpoints to communicate with each other using different paths
1041	   or to have one side not transmit anything at all.  As Geneve is an
1042	   IP-based protocol, the tunnel layer inherits these same
1043	   characteristics.

1045	   It is possible for a tunnel to encapsulate a protocol, such as TCP,
1046	   which is connection oriented and maintains session state at that
1047	   layer.  In addition, implementations MAY model Geneve tunnels as
1048	   connected, bidirectional links, such as to provide the abstraction of
1049	   a virtual port.  In both of these cases, bidirectionality of the
1050	   tunnel is handled at a higher layer and does not affect the operation
1051	   of Geneve itself.

1053	4.5.  Constraints on Protocol Features

1055	   Geneve is intended to be flexible to a wide range of current and
1056	   future applications.  As a result, certain constraints may be placed
1057	   on the use of metadata or other aspects of the protocol in order to
1058	   optimize for a particular use case.  For example, some applications
1059	   may limit the types of options which are supported or enforce a
1060	   maximum number or length of options.  Other applications may only
1061	   handle certain encapsulated payload types, such as Ethernet or IP.
1062	   This could be either globally throughout the system or, for example,
1063	   restricted to certain classes of devices or network paths.

1065	   These constraints may be communicated to tunnel endpoints either
1066	   explicitly through a control plane or implicitly by the nature of the
1067	   application.  As Geneve is defined as a data plane protocol that is
1068	   control plane agnostic, the exact mechanism is not defined in this
1069	   document.

1071	4.5.1.  Constraints on Options

1073	   While Geneve options are more flexible, a control plane may restrict
1074	   the number of option TLVs as well as the order and size of the TLVs,
1075	   between tunnel endpoints, to make it simpler for a data plane
1076	   implementation in software or hardware to handle
1077	   [I-D.ietf-nvo3-encap].  For example, there may be some critical
1078	   information such as a secure hash that must be processed in a certain
1079	   order to provide lowest latency.

1081	   A control plane may negotiate a subset of option TLVs and certain TLV
1082	   ordering, and may also limit the total number of option TLVs present
1083	   in the packet, for example, to accommodate hardware capable of
1084	   processing fewer options [I-D.ietf-nvo3-encap].  Hence, a control
1085	   plane needs to have the ability to describe the supported TLVs subset
1086	   and their order to the tunnel endpoints.  In the absence of a control
1087	   plane, alternative configuration mechanisms may be used for this
1088	   purpose.  The exact mechanism is not defined in this document.

1090	4.6.  NIC Offloads

1092	   Modern NICs currently provide a variety of offloads to enable the
1093	   efficient processing of packets.  The implementation of many of these
1094	   offloads requires only that the encapsulated packet be easily parsed
1095	   (for example, checksum offload).  However, optimizations such as LSO
1096	   and LRO involve some processing of the options themselves since they
1097	   must be replicated/merged across multiple packets.  In these
1098	   situations, it is desirable to not require changes to the offload
1099	   logic to handle the introduction of new options.  To enable this,
1100	   some constraints are placed on the definitions of options to allow
1101	   for simple processing rules:

1103	   o  When performing LSO, a NIC MUST replicate the entire Geneve header
1104	      and all options, including those unknown to the device, onto each
1105	      resulting segment.  However, a given option definition may
1106	      override this rule and specify different behavior in supporting
1107	      devices.  Conversely, when performing LRO, a NIC MAY assume that a
1108	      binary comparison of the options (including unknown options) is
1109	      sufficient to ensure equality and MAY merge packets with equal
1110	      Geneve headers.

1112	   o  Options MUST NOT be reordered during the course of offload
1113	      processing, including when merging packets for the purpose of LRO.

1115	   o  NICs performing offloads MUST NOT drop packets with unknown
1116	      options, including those marked as critical, unless explicitly
1117	      configured.

1119	   There is no requirement that a given implementation of Geneve employ
1120	   the offloads listed as examples above.  However, as these offloads
1121	   are currently widely deployed in commercially available NICs, the
1122	   rules described here are intended to enable efficient handling of
1123	   current and future options across a variety of devices.
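
   The LSO and LRO rules above can be sketched in Python as follows.
   This is a simplified model rather than a description of any NIC:
   it treats everything after the Geneve header as opaque payload
   bytes and ignores the outer and inner header rewriting (lengths,
   checksums, sequence numbers) that a real offload engine performs.
   The function names are hypothetical.  The header length is derived
   from the 6-bit Opt Len field described in the header format section
   of this document.

```python
from typing import List


def geneve_header_length(packet: bytes) -> int:
    """Length of the Geneve base header plus options, from the Opt Len field."""
    if len(packet) < 8:
        raise ValueError("shorter than the 8-byte Geneve base header")
    opt_len_words = packet[0] & 0x3F          # low 6 bits of the first byte
    total = 8 + opt_len_words * 4
    if len(packet) < total:
        raise ValueError("options run past the end of the packet")
    return total


def lso_segment(packet: bytes, mss: int) -> List[bytes]:
    """Split the payload and copy the full Geneve header onto each segment.

    The header, including options unknown to this code, is copied
    byte for byte and never reordered.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    header_len = geneve_header_length(packet)
    header, payload = packet[:header_len], packet[header_len:]
    segments = [header + payload[i:i + mss] for i in range(0, len(payload), mss)]
    return segments or [header]


def lro_can_merge(a: bytes, b: bytes) -> bool:
    """Allow merging only when the Geneve headers are byte-for-byte equal."""
    return a[:geneve_header_length(a)] == b[:geneve_header_length(b)]
```

   Because the comparison and the copy are purely binary, neither
   function needs to change when new option types are introduced.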

1125	4.7.  Inner VLAN Handling

1127	   Geneve is capable of encapsulating a wide range of protocols and
1128	   therefore a given implementation is likely to support only a small
1129	   subset of the possibilities.  However, as Ethernet is expected to be
1130	   widely deployed, it is useful to describe the behavior of VLANs
1131	   inside encapsulated Ethernet frames.

1133	   As with any protocol, support for inner VLAN headers is OPTIONAL.  In
1134	   many cases, the use of encapsulated VLANs may be disallowed due to
1135	   security or implementation considerations.  However, in other cases
1136	   trunking of VLAN frames across a Geneve tunnel can prove useful.  As
1137	   a result, the processing of inner VLAN tags upon ingress or egress
1138	   from a tunnel endpoint is based upon the configuration of the tunnel
1139	   endpoint and/or control plane and not explicitly defined as part of
1140	   the data format.

1142	5.  Interoperability Issues

1144	   Viewed exclusively from the data plane, Geneve does not introduce any
1145	   interoperability issues as it appears to most devices as UDP packets.
1146	   However, as there are already a number of tunnel protocols deployed
1147	   in network virtualization environments, there is a practical question
1148	   of transition and coexistence.

1150	   Since Geneve is a superset of the functionality of the most common
1151	   protocols used for network virtualization (VXLAN, NVGRE), it should
1152	   be straightforward to port an existing control plane to run on top of
1153	   it with minimal effort.  With both the old and new packet formats
1154	   supporting the same set of capabilities, there is no need for a hard
1155	   transition - tunnel endpoints directly communicating with each other
1156	   use any common protocol, which may be different even within a single
1157	   overall system.  As transit devices are primarily forwarding packets
1158	   on the basis of the IP header, all protocols appear similar and these
1159	   devices do not introduce additional interoperability concerns.

1161	   To assist with this transition, it is strongly suggested that
1162	   implementations support simultaneous operation of both Geneve and
1163	   existing tunnel protocols as it is expected to be common for a single
1164	   node to communicate with a mixture of other nodes.  Eventually, older
1165	   protocols may be phased out as they are no longer in use.

1167	6.  Security Considerations

1169	   As encapsulated within a UDP/IP packet, Geneve does not have any
1170	   inherent security mechanisms.  As a result, an attacker with access
1171	   to the underlay network transporting the IP packets has the ability
1172	   to snoop or inject packets.  Compromised tunnel endpoints may also
1173	   spoof identifiers in the tunnel header to gain access to networks
1174	   owned by other tenants.

1176	   Within a particular security domain, such as a data center operated
1177	   by a single service provider, the most common and highest performing
1178	   security mechanism is isolation of trusted components.  Tunnel
1179	   traffic can be carried over a separate VLAN and filtered at any
1180	   untrusted boundaries.  In addition, tunnel endpoints should only be
1181	   operated in environments controlled by the service provider, such as
1182	   the hypervisor itself rather than within a customer VM.

1184	   When crossing an untrusted link, such as the public Internet, IPsec
1185	   [RFC4301] may be used to provide authentication and/or encryption of
1186	   the IP packets formed as part of Geneve encapsulation.

1188	   Geneve does not otherwise affect the security of the encapsulated
1189	   packets.  As per the guidelines of BCP 72 [RFC3552], the following
1190	   sections describe potential security risks that may be applicable to
1191	   Geneve deployments and approaches to mitigate such risks.  It is also
1192	   noted that not all such risks are applicable to all Geneve deployment
1193	   scenarios, i.e., only a subset may be applicable to certain
1194	   deployments.  So an operator has to make an assessment based on their
1195	   network environment and determine the risks that are applicable to
1196	   their specific environment and use appropriate mitigation approaches
1197	   as applicable.

1199	6.1.  Data Confidentiality

1201	   Geneve is a network virtualization overlay encapsulation protocol
1202	   designed to establish tunnels between NVEs over an existing IP
1203	   network.  It can be used to deploy multi-tenant overlay networks over
1204	   an existing IP underlay network in a public or private data center.
1205	   The overlay service is typically provided by a service provider, for
1206	   example a cloud services provider or a private data center operator,
1207	   which may or may not be the same provider as an underlay service
1208	   provider.  Due to the nature of multi-tenancy in such environments, a
1209	   tenant system may expect data confidentiality to ensure its packet
1210	   data is not tampered with (active attack) in transit or a target of
1211	   unauthorized monitoring (passive attack).  A tenant may expect the
1212	   overlay service provider to provide data confidentiality as part of
1213	   the service or a tenant may bring its own data confidentiality
1214	   mechanisms like IPsec or TLS to protect the data end to end between
1215	   its tenant systems.

1217	   If an operator determines data confidentiality is necessary in their
1218	   environment based on their risk analysis, for example as in multi-
1219	   tenant environments, then an encryption mechanism SHOULD be used to
1220	   encrypt the tenant data end to end between the NVEs.  The NVEs may
1221	   use existing well established encryption mechanisms such as IPsec,
1222	   DTLS, etc.

1224	6.1.1.  Inter-Data Center Traffic

1226	   A tenant system in a customer premises (private data center) may want
1227	   to connect to tenant systems on their tenant overlay network in a
1228	   public cloud data center or a tenant may want to have its tenant
1229	   systems located in multiple geographically separated data centers for
1230	   high availability.  Geneve data traffic between tenant systems across
1231	   such separated networks should be protected from threats when
1232	   traversing public networks.  Any Geneve overlay data leaving the data
1233	   center network beyond the operator's security domain SHOULD be
1234	   secured by encryption mechanisms such as IPsec or other VPN
1235	   mechanisms to protect the communications between the NVEs when they
1236	   are geographically separated over untrusted network links.
1237	   Specification of data protection mechanisms employed between data
1238	   centers is beyond the scope of this document.

1240	6.2.  Data Integrity

1242	   Geneve encapsulation is used between NVEs to establish overlay
1243	   tunnels over an existing IP underlay network.  In a multi-tenant data
1244	   center, a rogue or compromised tenant system may try to launch a
1245	   passive attack such as monitoring the traffic of other tenants, or an
1246	   active attack such as trying to inject unauthorized Geneve
1247	   encapsulated traffic such as spoofing, replay, etc., into the
1248	   network.  To prevent such attacks, an NVE MUST NOT propagate Geneve
1249	   packets beyond the NVE to tenant systems and SHOULD employ packet
1250	   filtering mechanisms so as not to forward unauthorized traffic
1251	   between TSs in different tenant networks.

1253	   A compromised network node or a transit device within a data center
1254	   may launch an active attack trying to tamper with the Geneve packet
1255	   data between NVEs.  Malicious tampering of Geneve header fields may
1256	   cause the packet from one tenant to be forwarded to a different
1257	   tenant network.  If an operator determines the possibility of such
1258	   threat in their environment, the operator may choose to employ data
1259	   integrity mechanisms between NVEs.  In order to prevent such risks, a
1260	   data integrity mechanism SHOULD be used in such environments to
1261	   protect the integrity of Geneve packets including packet headers,
1262	   options and payload on communications between NVE pairs.  A
1263	   cryptographic data protection mechanism such as IPsec may be used to
1264	   provide data integrity protection.  A data center operator may choose
1265	   to deploy any other data integrity mechanisms as applicable and
1266	   supported in their underlay networks.

1268	6.3.  Authentication of NVE peers

1270	   A rogue network device or a compromised NVE in a data center
1271	   environment might be able to spoof Geneve packets as if they came
1272	   from a legitimate NVE.  In order to mitigate such a risk, an operator
1273	   SHOULD use an authentication mechanism, such as IPsec, to ensure that
1274	   the Geneve packet originated from the intended NVE peer, in
1275	   environments where the operator determines spoofing or rogue devices
1276	   are potential threats.  Other simpler source checks such as ingress
1277	   filtering for VLAN/MAC/IP address, reverse path forwarding checks,
1278	   etc., may be used in certain trusted environments to ensure Geneve
1279	   packets originated from the intended NVE peer.
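
   As a rough illustration of the simpler source checks mentioned
   above, the following Python sketch filters on the outer source IP
   address only.  It is not an authentication mechanism and does not
   replace IPsec.  The function name and the allowed_peers parameter
   are hypothetical; the addresses in the usage note are documentation
   prefixes chosen for the example.

```python
import ipaddress
from typing import Iterable, Union

GENEVE_UDP_PORT = 6081  # IANA-assigned destination port for Geneve

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def passes_source_check(src_ip: str, dst_port: int,
                        allowed_peers: Iterable[Network]) -> bool:
    """Accept only Geneve-port packets whose outer source is a known peer."""
    if dst_port != GENEVE_UDP_PORT:
        return False
    addr = ipaddress.ip_address(src_ip)
    # Membership is False when the address and network versions differ.
    return any(addr in net for net in allowed_peers)
```

   For example, with allowed_peers set to the networks 192.0.2.0/24
   and 2001:db8::/32, a packet from 192.0.2.10 to port 6081 passes
   and one from 198.51.100.1 does not.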

1281	6.4.  Options Interpretation by Transit Devices

1283	   Options, if present in the packet, are generated and terminated by
1284	   tunnel endpoints.  As indicated in Section 2.2.1, transit devices may
1285	   interpret the options.  However, if the packet is protected by tunnel
1286	   endpoint to tunnel endpoint encryption, for example through IPsec,
1287	   transit devices will not have visibility into the Geneve header or
1288	   options in the packet.  In such cases transit devices MUST handle
1289	   Geneve packets as any other IP packet and maintain consistent
1290	   forwarding behavior.  In cases where options are interpreted by
1291	   transit devices, the operator MUST ensure that transit devices are
1292	   trusted and not compromised.  Implementation of a mechanism to ensure
1293	   this trust is beyond the scope of this document.

1295	6.5.  Multicast/Broadcast

1297	   In typical data center networks where IP multicasting is not
1298	   supported in the underlay network, multicasting may be supported
1299	   using multiple unicast tunnels.  The same security requirements as
1300	   described in the above sections can be used to protect Geneve
1301	   communications between NVE peers.  If IP multicasting is supported in
1302	   the underlay network and the operator chooses to use it for multicast
1303	   traffic among tunnel endpoints, then the operator in such
1304	   environments may use data protection mechanisms such as IPsec with
1305	   Multicast extensions [RFC5374] to protect multicast traffic among
1306	   Geneve NVE groups.

1308	6.6.  Control Plane Communications

1310	   A Network Virtualization Authority (NVA) as outlined in [RFC8014] may
1311	   be used as a control plane for configuring and managing the Geneve
1312	   NVEs.  The data center operator is expected to use security
1313	   mechanisms to protect the communications between the NVA and NVEs and
1314	   use authentication mechanisms to detect any rogue or compromised NVEs
1315	   within their administrative domain.  Data protection mechanisms for
1316	   control plane communication or authentication mechanisms between the
1317	   NVA and the NVEs are beyond the scope of this document.

1319	7.  IANA Considerations

1321	   IANA has allocated UDP port 6081 as the well-known destination port
1322	   for Geneve.  Upon publication, the registry should be updated to cite
1323	   this document.  The original request was:

1325	   Service Name: geneve
1326	   Transport Protocol(s): UDP
1327	   Assignee: Jesse Gross <jesse@kernel.org>
1328	   Contact: Jesse Gross <jesse@kernel.org>
1329	   Description: Generic Network Virtualization Encapsulation (Geneve)
1330	   Reference: This document
1331	   Port Number: 6081

1333	   In addition, IANA is requested to create a "Geneve Option Class"
1334	   registry to allocate Option Classes.  This shall be a registry of
1335	   16-bit hexadecimal values along with descriptive strings.  The
1336	   identifiers 0x0-0xFF are to be reserved for standardized options for
1337	   allocation by IETF Review [RFC8126] and 0xFFF0-0xFFFF for
1338	   Experimental Use.  Otherwise, identifiers are to be assigned to any
1339	   organization with an interest in creating Geneve options on a First
1340	   Come First Served basis.  The registry is to be populated with the
1341	   following initial values:

1343	         +----------------+--------------------------------------+
1344	         | Option Class   | Description                          |
1345	         +----------------+--------------------------------------+
1346	         | 0x0000..0x00FF | Unassigned - IETF Review             |
1347	         | 0x0100         | Linux                                |
1348	         | 0x0101         | Open vSwitch (OVS)                   |
1349	         | 0x0102         | Open Virtual Networking (OVN)        |
1350	         | 0x0103         | In-band Network Telemetry (INT)      |
1351	         | 0x0104         | VMware, Inc.                         |
1352	         | 0x0105         | Amazon.com, Inc.                     |
1353	         | 0x0106         | Cisco Systems, Inc.                  |
1354	         | 0x0107         | Oracle Corporation                   |
1355	         | 0x0108..0x0110 | Amazon.com, Inc.                     |
1356	         | 0x0111..0xFFEF | Unassigned - First Come First Served |
1357	         | 0xFFF0..0xFFFF | Experimental                         |
1358	         +----------------+--------------------------------------+
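
   The range rules and initial values above can be expressed as a
   small lookup.  The Python sketch below is illustrative only; the
   names INITIAL_ASSIGNMENTS, allocation_policy, and describe are
   hypothetical, and the data simply mirrors the table as requested
   here rather than the live IANA registry.

```python
# (first, last, description) rows copied from the initial registry table.
INITIAL_ASSIGNMENTS = [
    (0x0100, 0x0100, "Linux"),
    (0x0101, 0x0101, "Open vSwitch (OVS)"),
    (0x0102, 0x0102, "Open Virtual Networking (OVN)"),
    (0x0103, 0x0103, "In-band Network Telemetry (INT)"),
    (0x0104, 0x0104, "VMware, Inc."),
    (0x0105, 0x0105, "Amazon.com, Inc."),
    (0x0106, 0x0106, "Cisco Systems, Inc."),
    (0x0107, 0x0107, "Oracle Corporation"),
    (0x0108, 0x0110, "Amazon.com, Inc."),
]


def allocation_policy(option_class: int) -> str:
    """Return the allocation policy that governs a 16-bit Option Class."""
    if not 0 <= option_class <= 0xFFFF:
        raise ValueError("Option Class is a 16-bit value")
    if option_class <= 0x00FF:
        return "IETF Review"
    if option_class >= 0xFFF0:
        return "Experimental Use"
    return "First Come First Served"


def describe(option_class: int) -> str:
    """Return the table description for an Option Class value."""
    policy = allocation_policy(option_class)   # also validates the range
    for first, last, name in INITIAL_ASSIGNMENTS:
        if first <= option_class <= last:
            return name
    if policy == "Experimental Use":
        return "Experimental"
    return "Unassigned - " + policy
```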

1360	8.  Contributors

1362	   The following individuals were authors of an earlier version of this
1363	   document and made significant contributions:

1365	   Pankaj Garg
1366	   Microsoft Corporation
1367	   1 Microsoft Way
1368	   Redmond, WA  98052
1369	   USA

1371	   Email: pankajg@microsoft.com

1373	   Chris Wright
1374	   Red Hat Inc.
1375	   1801 Varsity Drive
1376	   Raleigh, NC  27606
1377	   USA

1379	   Email: chrisw@redhat.com

1381	   Kenneth Duda
1382	   Arista Networks
1383	   5453 Great America Parkway
1384	   Santa Clara, CA  95054
1385	   USA

1387	   Email: kduda@arista.com

1389	   Dinesh G. Dutt
1390	   Independent

1392	   Email: didutt@gmail.com

1394	   Jon Hudson
1395	   Independent

1397	   Email: jon.hudson@gmail.com

1399	   Ariel Hendel
1400	   Facebook, Inc.
1401	   1 Hacker Way
1402	   Menlo Park, CA  94025
1403	   USA

1405	   Email: ahendel@fb.com

1407	9.  Acknowledgements

1409	   The authors wish to thank Martin Casado, Bruce Davie and Dave Thaler
1410	   for their input, feedback, and helpful suggestions.

1412	   The authors would like to thank Magnus Nystrom for his reviews and
1413	   feedback.

1415	   Thanks to Daniel Migault, Anoop Ghanwani, Greg Mirsky, Puneet
1416	   Agarwal, and Tal Mizrahi for their reviews, comments and feedback.

1418	   The authors would like to thank David Black for his detailed reviews
1419	   and valuable inputs.

1421	   Thanks to Sami Boutros for his inputs and helpful feedback.

1423	   The authors would like to thank Matthew Bocci, Sam Aldrin, Benson
1424	   Schliesser, Martin Vigoureux, and Alia Atlas for their guidance
1425	   throughout the process.

1427	10.  References

1429	10.1.  Normative References

1431	   [RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768,
1432	              DOI 10.17487/RFC0768, August 1980,
1433	              <https://www.rfc-editor.org/info/rfc768>.

1435	   [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5,
1436	              RFC 792, DOI 10.17487/RFC0792, September 1981,
1437	              <https://www.rfc-editor.org/info/rfc792>.

1439	   [RFC1112]  Deering, S., "Host extensions for IP multicasting", STD 5,
1440	              RFC 1112, DOI 10.17487/RFC1112, August 1989,
1441	              <https://www.rfc-editor.org/info/rfc1112>.

1443	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1444	              Requirement Levels", BCP 14, RFC 2119,
1445	              DOI 10.17487/RFC2119, March 1997,
1446	              <https://www.rfc-editor.org/info/rfc2119>.

1448	   [RFC4443]  Conta, A., Deering, S., and M. Gupta, Ed., "Internet
1449	              Control Message Protocol (ICMPv6) for the Internet
1450	              Protocol Version 6 (IPv6) Specification", STD 89,
1451	              RFC 4443, DOI 10.17487/RFC4443, March 2006,
1452	              <https://www.rfc-editor.org/info/rfc4443>.

1454	   [RFC6935]  Eubanks, M., Chimento, P., and M. Westerlund, "IPv6 and
1455	              UDP Checksums for Tunneled Packets", RFC 6935,
1456	              DOI 10.17487/RFC6935, April 2013,
1457	              <https://www.rfc-editor.org/info/rfc6935>.

1459	   [RFC6936]  Fairhurst, G. and M. Westerlund, "Applicability Statement
1460	              for the Use of IPv6 UDP Datagrams with Zero Checksums",
1461	              RFC 6936, DOI 10.17487/RFC6936, April 2013,
1462	              <https://www.rfc-editor.org/info/rfc6936>.

1464	   [RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
1465	              Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
1466	              March 2017, <https://www.rfc-editor.org/info/rfc8085>.

1468	   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
1469	              Writing an IANA Considerations Section in RFCs", BCP 26,
1470	              RFC 8126, DOI 10.17487/RFC8126, June 2017,
1471	              <https://www.rfc-editor.org/info/rfc8126>.

1473	   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
1474	              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
1475	              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

1477	10.2.  Informative References

1479	   [ETYPES]   The IEEE Registration Authority, "IEEE 802 Numbers", 2013,
1480	              <http://www.iana.org/assignments/ieee-802-numbers/
1481	              ieee-802-numbers.xml>.

1483	   [I-D.ietf-intarea-tunnels]
1484	              Touch, J. and M. Townsley, "IP Tunnels in the Internet
1485	              Architecture", draft-ietf-intarea-tunnels-09 (work in
1486	              progress), July 2018.

1488	   [I-D.ietf-nvo3-dataplane-requirements]
1489	              Bitar, N., Lasserre, M., Balus, F., Morin, T., Jin, L.,
1490	              and B. Khasnabish, "NVO3 Data Plane Requirements", draft-
1491	              ietf-nvo3-dataplane-requirements-03 (work in progress),
1492	              April 2014.

1494	   [I-D.ietf-nvo3-encap]
1495	              Boutros, S., "NVO3 Encapsulation Considerations", draft-
1496	              ietf-nvo3-encap-02 (work in progress), September 2018.

1498	   [IEEE.802.1Q_2014]
1499	              IEEE, "IEEE Standard for Local and metropolitan area
1500	              networks--Bridges and Bridged Networks", IEEE 802.1Q-2014,
1501	              DOI 10.1109/ieeestd.2014.6991462, December 2014,
1502	              <http://ieeexplore.ieee.org/servlet/
1503	              opac?punumber=6991460>.

1505	   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
1506	              DOI 10.17487/RFC1191, November 1990,
1507	              <https://www.rfc-editor.org/info/rfc1191>.

1509	   [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
1510	              DOI 10.17487/RFC2003, October 1996,
1511	              <https://www.rfc-editor.org/info/rfc2003>.

1513	   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
1514	              (IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460,
1515	              December 1998, <https://www.rfc-editor.org/info/rfc2460>.

1517	   [RFC2983]  Black, D., "Differentiated Services and Tunnels",
1518	              RFC 2983, DOI 10.17487/RFC2983, October 2000,
1519	              <https://www.rfc-editor.org/info/rfc2983>.

1521	   [RFC3031]  Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol
1522	              Label Switching Architecture", RFC 3031,
1523	              DOI 10.17487/RFC3031, January 2001,
1524	              <https://www.rfc-editor.org/info/rfc3031>.

1526	   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
1527	              Text on Security Considerations", BCP 72, RFC 3552,
1528	              DOI 10.17487/RFC3552, July 2003,
1529	              <https://www.rfc-editor.org/info/rfc3552>.

1531	   [RFC3985]  Bryant, S., Ed. and P. Pate, Ed., "Pseudo Wire Emulation
1532	              Edge-to-Edge (PWE3) Architecture", RFC 3985,
1533	              DOI 10.17487/RFC3985, March 2005,
1534	              <https://www.rfc-editor.org/info/rfc3985>.

1536	   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
1537	              Internet Protocol", RFC 4301, DOI 10.17487/RFC4301,
1538	              December 2005, <https://www.rfc-editor.org/info/rfc4301>.

1540	   [RFC5374]  Weis, B., Gross, G., and D. Ignjatic, "Multicast
1541	              Extensions to the Security Architecture for the Internet
1542	              Protocol", RFC 5374, DOI 10.17487/RFC5374, November 2008,
1543	              <https://www.rfc-editor.org/info/rfc5374>.

1545	   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
1546	              Notification", RFC 6040, DOI 10.17487/RFC6040, November
1547	              2010, <https://www.rfc-editor.org/info/rfc6040>.

1549	   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
1550	              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
1551	              eXtensible Local Area Network (VXLAN): A Framework for
1552	              Overlaying Virtualized Layer 2 Networks over Layer 3
1553	              Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014,
1554	              <https://www.rfc-editor.org/info/rfc7348>.

1556	   [RFC7365]  Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
1557	              Rekhter, "Framework for Data Center (DC) Network
1558	              Virtualization", RFC 7365, DOI 10.17487/RFC7365, October
1559	              2014, <https://www.rfc-editor.org/info/rfc7365>.

1561	   [RFC7637]  Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network
1562	              Virtualization Using Generic Routing Encapsulation",
1563	              RFC 7637, DOI 10.17487/RFC7637, September 2015,
1564	              <https://www.rfc-editor.org/info/rfc7637>.

1566	   [RFC8014]  Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T.
1567	              Narten, "An Architecture for Data-Center Network
1568	              Virtualization over Layer 3 (NVO3)", RFC 8014,
1569	              DOI 10.17487/RFC8014, December 2016,
1570	              <https://www.rfc-editor.org/info/rfc8014>.

1572	   [RFC8086]  Yong, L., Ed., Crabbe, E., Xu, X., and T. Herbert, "GRE-
1573	              in-UDP Encapsulation", RFC 8086, DOI 10.17487/RFC8086,
1574	              March 2017, <https://www.rfc-editor.org/info/rfc8086>.

1576	   [RFC8201]  McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed.,
1577	              "Path MTU Discovery for IP version 6", STD 87, RFC 8201,
1578	              DOI 10.17487/RFC8201, July 2017,
1579	              <https://www.rfc-editor.org/info/rfc8201>.

1581	   [RFC8293]  Ghanwani, A., Dunbar, L., McBride, M., Bannai, V., and R.
1582	              Krishnan, "A Framework for Multicast in Network
1583	              Virtualization over Layer 3", RFC 8293,
1584	              DOI 10.17487/RFC8293, January 2018,
1585	              <https://www.rfc-editor.org/info/rfc8293>.

1587	   [VL2]      "VL2: A Scalable and Flexible Data Center Network", ACM
1588	              SIGCOMM Computer Communication Review,
1589	              DOI 10.1145/1594977.1592576, 2009,
1590	              <http://www.sigcomm.org/sites/default/files/ccr/
1591	              papers/2009/October/1594977-1592576.pdf>.

1593	Authors' Addresses

1595	   Jesse Gross (editor)

1597	   Email: jesse@kernel.org

1599	   Ilango Ganga (editor)
1600	   Intel Corporation
1601	   2200 Mission College Blvd.
1602	   Santa Clara, CA  95054
1603	   USA

1605	   Email: ilango.s.ganga@intel.com

1607	   T. Sridhar (editor)
1608	   VMware, Inc.
1609	   3401 Hillview Ave.
1610	   Palo Alto, CA  94304
1611	   USA

1613	   Email: tsridhar@vmware.com