2	Network Working Group                                      J. Gross, Ed.
3	Internet-Draft
4	Intended status: Standards Track                           I. Ganga, Ed.
5	Expires: March 15, 2020                                            Intel
6	                                                         T. Sridhar, Ed.
7	                                                                  VMware
8	                                                      September 12, 2019

10	          Geneve: Generic Network Virtualization Encapsulation
11	                       draft-ietf-nvo3-geneve-14

13	Abstract

15	   Network virtualization involves the cooperation of devices with a
16	   wide variety of capabilities such as software and hardware tunnel
17	   endpoints, transit fabrics, and centralized control clusters.  As a
18	   result of their role in tying together different elements in the
19	   system, the requirements on tunnels are influenced by all of these
20	   components.  Flexibility is therefore the most important aspect of a
21	   tunnel protocol if it is to keep pace with the evolution of the
22	   system.  This document describes Geneve, an encapsulation protocol
23	   designed to recognize and accommodate these changing capabilities and
24	   needs.

26	Status of This Memo

28	   This Internet-Draft is submitted in full conformance with the
29	   provisions of BCP 78 and BCP 79.

31	   Internet-Drafts are working documents of the Internet Engineering
32	   Task Force (IETF).  Note that other groups may also distribute
33	   working documents as Internet-Drafts.  The list of current Internet-
34	   Drafts is at https://datatracker.ietf.org/drafts/current/.

36	   Internet-Drafts are draft documents valid for a maximum of six months
37	   and may be updated, replaced, or obsoleted by other documents at any
38	   time.  It is inappropriate to use Internet-Drafts as reference
39	   material or to cite them other than as "work in progress."

41	   This Internet-Draft will expire on March 15, 2020.

43	Copyright Notice

45	   Copyright (c) 2019 IETF Trust and the persons identified as the
46	   document authors.  All rights reserved.

48	   This document is subject to BCP 78 and the IETF Trust's Legal
49	   Provisions Relating to IETF Documents
50	   (https://trustee.ietf.org/license-info) in effect on the date of
51	   publication of this document.  Please review these documents
52	   carefully, as they describe your rights and restrictions with respect
53	   to this document.  Code Components extracted from this document must
54	   include Simplified BSD License text as described in Section 4.e of
55	   the Trust Legal Provisions and are provided without warranty as
56	   described in the Simplified BSD License.

58	Table of Contents

60	   1.  Introduction
61	     1.1.  Requirements Language
62	     1.2.  Terminology
63	   2.  Design Requirements
64	     2.1.  Control Plane Independence
65	     2.2.  Data Plane Extensibility
66	       2.2.1.  Efficient Implementation
67	     2.3.  Use of Standard IP Fabrics
68	   3.  Geneve Encapsulation Details
69	     3.1.  Geneve Packet Format Over IPv4
70	     3.2.  Geneve Packet Format Over IPv6
71	     3.3.  UDP Header
72	     3.4.  Tunnel Header Fields
73	     3.5.  Tunnel Options
74	       3.5.1.  Options Processing
75	   4.  Implementation and Deployment Considerations
76	     4.1.  Applicability Statement
77	     4.2.  Congestion Control Functionality
78	     4.3.  UDP Checksum
79	       4.3.1.  UDP Zero Checksum Handling with IPv6
80	     4.4.  Encapsulation of Geneve in IP
81	       4.4.1.  IP Fragmentation
82	       4.4.2.  DSCP, ECN and TTL
83	       4.4.3.  Broadcast and Multicast
84	       4.4.4.  Unidirectional Tunnels
85	     4.5.  Constraints on Protocol Features
86	       4.5.1.  Constraints on Options
87	     4.6.  NIC Offloads
88	     4.7.  Inner VLAN Handling
89	   5.  Interoperability Issues
90	   6.  Security Considerations
91	     6.1.  Data Confidentiality
92	       6.1.1.  Inter-Data Center Traffic
93	     6.2.  Data Integrity
94	     6.3.  Authentication of NVE peers
95	     6.4.  Options Interpretation by Transit Devices
96	     6.5.  Multicast/Broadcast
97	     6.6.  Control Plane Communications
98	   7.  IANA Considerations
99	   8.  Contributors
100	   9.  Acknowledgements
101	   10. References
102	     10.1.  Normative References
103	     10.2.  Informative References
104	   Authors' Addresses

106	1.  Introduction

108	   Networking has long featured a variety of tunneling, tagging, and
109	   other encapsulation mechanisms.  However, the advent of network
110	   virtualization has caused a surge of renewed interest and a
111	   corresponding increase in the introduction of new protocols.  The
112	   large number of protocols in this space, ranging all the way from
113	   VLANs [IEEE.802.1Q_2014] and MPLS [RFC3031] through the more recent
114	   VXLAN [RFC7348] (Virtual eXtensible Local Area Network) and NVGRE
115	   [RFC7637] (Network Virtualization Using Generic Routing
116	   Encapsulation), often leads to questions about the need for new
117	   encapsulation formats and what it is about network virtualization in
118	   particular that leads to their proliferation.

120	   While many encapsulation protocols seek to simply partition the
121	   underlay network or bridge between two domains, network
122	   virtualization views the transit network as providing connectivity
123	   between multiple components of a distributed system.  In many ways
124	   this system is similar to a chassis switch with the IP underlay
125	   network playing the role of the backplane and tunnel endpoints on the
126	   edge as line cards.  When viewed in this light, the requirements
127	   placed on the tunnel protocol are significantly different in terms of
128	   the quantity of metadata necessary and the role of transit nodes.

130	   Current work such as [VL2] (A Scalable and Flexible Data Center
131	   Network) and the NVO3 Data Plane Requirements
132	   [I-D.ietf-nvo3-dataplane-requirements] have described some of the
133	   properties that the data plane must have to support network
134	   virtualization.  However, one additional defining requirement is the
135	   need to carry system state along with the packet data.  The use of
136	   some metadata is certainly not a foreign concept - nearly all
137	   protocols used for virtualization have at least 24 bits of identifier
138	   space as a way to partition between tenants.  This is often described
139	   as overcoming the limits of 12-bit VLANs, and when seen in that
140	   context, or any context where it is a true tenant identifier, 16
141	   million possible entries is a large number.  However, the reality is
142	   that the metadata is not exclusively used to identify tenants and
143	   encoding other information quickly starts to crowd the space.  In
144	   fact, when compared to the tags used to exchange metadata between
145	   line cards on a chassis switch, 24-bit identifiers start to look
146	   quite small.  There are nearly endless uses for this metadata,
147	   ranging from storing input ports for simple security policies to
148	   service based context for interposing advanced middleboxes.

150	   Existing tunnel protocols have each attempted to solve different
151	   aspects of these new requirements, only to be quickly rendered out of
152	   date by changing control plane implementations and advancements.
153	   Furthermore, software and hardware components and controllers all
154	   have different advantages and rates of evolution - a fact that should
155	   be viewed as a benefit, not a liability or limitation.  This draft
156	   describes Geneve, a protocol which seeks to avoid these problems by
157	   providing a framework for tunneling for network virtualization rather
158	   than being prescriptive about the entire system.

160	1.1.  Requirements Language

162	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
163	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
164	   "OPTIONAL" in this document are to be interpreted as described in BCP
165	   14 [RFC2119] [RFC8174] when, and only when, they appear in all
166	   capitals, as shown here.

168	1.2.  Terminology

170	   The NVO3 framework [RFC7365] defines many of the concepts commonly
171	   used in network virtualization.  In addition, the following terms are
172	   specifically meaningful in this document:

174	   Checksum offload.  An optimization implemented by many NICs (Network
175	   Interface Controllers) that enables computation and verification of
176	   upper layer protocol checksums in hardware on transmit and receive,
177	   respectively.  This typically includes IP and TCP/UDP checksums which
178	   would otherwise be computed by the protocol stack in software.

180	   Clos network.  A technique for composing network fabrics larger than
181	   a single switch while maintaining non-blocking bandwidth across
182	   connection points.  ECMP is used to divide traffic across the
183	   multiple links and switches that constitute the fabric.  Sometimes
184	   termed "leaf and spine" or "fat tree" topologies.

186	   ECMP.  Equal Cost Multipath.  A routing mechanism for selecting from
187	   among multiple best next hop paths by hashing packet headers in order
188	   to better utilize network bandwidth while avoiding reordering of
189	   packets within a flow.

191	   Geneve.  Generic Network Virtualization Encapsulation.  The tunnel
192	   protocol described in this document.

194	   LRO.  Large Receive Offload.  The receive-side equivalent function of
195	   LSO, in which multiple protocol segments (primarily TCP) are
196	   coalesced into larger data units.

198	   NIC.  Network Interface Controller.  Also called a Network Interface
199	   Card or Network Adapter.  A NIC could be part of a tunnel endpoint or
200	   transit device and can either process Geneve packets or aid in the
201	   processing of Geneve packets.

203	   Transit device.  A forwarding element (e.g., a router or switch) along
204	   the path of the tunnel making up part of the Underlay Network.  A
205	   transit device MAY be capable of understanding the Geneve packet
206	   format but does not originate or terminate Geneve packets.

208	   LSO.  Large Segmentation Offload.  A function provided by many
209	   commercial NICs that allows data units larger than the MTU to be
210	   passed to the NIC to improve performance, the NIC being responsible
211	   for creating smaller segments of size less than or equal to the MTU
212	   with correct protocol headers.  When referring specifically to TCP/
213	   IP, this feature is often known as TSO (TCP Segmentation Offload).

215	   Tunnel endpoint.  A component performing encapsulation and
216	   decapsulation of packets, such as Ethernet frames or IP datagrams, in
217	   Geneve headers.  As the ultimate consumer of any tunnel metadata,
218	   tunnel endpoints have the highest level of requirements for parsing
219	   and interpreting tunnel headers.  Tunnel endpoints may consist of
220	   either software or hardware implementations or a combination of the
221	   two.  Tunnel endpoints are frequently a component of an NVE (Network
222	   Virtualization Edge) but may also be found in middleboxes or other
223	   elements making up an NVO3 Network.

225	   VM.  Virtual Machine.

227	2.  Design Requirements

229	   Geneve is designed to support network virtualization use cases, where
230	   tunnels are typically established to act as a backplane between the
231	   virtual switches residing in hypervisors, physical switches, or
232	   middleboxes or other appliances.  An arbitrary IP network can be used
233	   as an underlay although Clos networks composed using ECMP links are a
234	   common choice to provide consistent bisectional bandwidth across all
235	   connection points.  Many of the concepts of network virtualization
236	   overlays over Layer 3 IP networks are described in the NVO3
237	   framework [RFC7365].  Figure 1 shows an example of a hypervisor, top
238	   of rack switch for connectivity to physical servers, and a WAN uplink
239	   connected using Geneve tunnels over a simplified Clos network.  These
240	   tunnels are used to encapsulate and forward frames from the attached
241	   components such as VMs or physical links.

243	     +---------------------+           +-------+  +------+
244	     | +--+  +-------+---+ |           |Transit|--|Top of|==Physical
245	     | |VM|--|       |   | | +------+ /|Router |  | Rack |==Servers
246	     | +--+  |Virtual|NIC|---|Top of|/ +-------+\/+------+
247	     | +--+  |Switch |   | | | Rack |\ +-------+/\+------+
248	     | |VM|--|       |   | | +------+ \|Transit|  |Uplink|   WAN
249	     | +--+  +-------+---+ |           |Router |--|      |=========>
250	     +---------------------+           +-------+  +------+
251	            Hypervisor

253	                 ()===================================()
254	                         Switch-Switch Geneve Tunnels

256	                    Figure 1: Sample Geneve Deployment

258	   To support the needs of network virtualization, the tunnel protocol
259	   should be able to take advantage of the differing (and evolving)
260	   capabilities of each type of device in both the underlay and overlay
261	   networks.  This results in the following requirements being placed on
262	   the data plane tunneling protocol:

264	   o  The data plane is generic and extensible enough to support current
265	      and future control planes.

267	   o  Tunnel components are efficiently implementable in both hardware
268	      and software without restricting capabilities to the lowest common
269	      denominator.

271	   o  High performance over existing IP fabrics.

273	   These requirements are described further in the following
274	   subsections.

276	2.1.  Control Plane Independence

278	   Although some protocols for network virtualization have included a
279	   control plane as part of the tunnel format specification (most
280	   notably, the VXLAN spec prescribed a multicast learning-based
281	   control plane), these specifications have largely been treated as
282	   describing only the data format.  The VXLAN packet format has
283	   actually seen a wide variety of control planes built on top of it.

285	   There is a clear advantage in settling on a data format: most of the
286	   protocols are only superficially different and there is little
287	   advantage in duplicating effort.  However, the same cannot be said of
288	   control planes, which are diverse in very fundamental ways.  The case
289	   for standardization is also less clear given the wide variety in
290	   requirements, goals, and deployment scenarios.

292	   As a result of this reality, Geneve is a pure tunnel format
293	   specification that is capable of fulfilling the needs of many control
294	   planes by explicitly not selecting any one of them.  This
295	   simultaneously promotes a shared data format and reduces the chance
296	   of obsolescence by future control plane enhancements.

298	2.2.  Data Plane Extensibility

300	   Achieving the level of flexibility needed to support current and
301	   future control planes effectively requires an options infrastructure
302	   to allow new metadata types to be defined, deployed, and either
303	   finalized or retired.  Options also allow for differentiation of
304	   products by encouraging independent development in each vendor's core
305	   specialty, leading to an overall faster pace of advancement.  By far
306	   the most common mechanism for implementing options is Type-Length-
307	   Value (TLV) format.
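
   As a rough illustration of why TLV encoding supports extensibility,
   the sketch below walks a buffer of generic TLVs.  The layout used
   here (a 1-byte type followed by a 1-byte length in bytes) and the
   name iter_tlvs are assumptions made for this example only; the
   actual Geneve option layout is defined in Section 3.5.

```python
from typing import Iterator, Tuple


def iter_tlvs(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, value) pairs from a buffer of generic TLVs.

    Illustrative layout only: 1-byte type, 1-byte length, then `length`
    bytes of value. This is NOT the Geneve option layout.
    """
    offset = 0
    while offset < len(buf):
        if offset + 2 > len(buf):
            raise ValueError("truncated TLV header")
        tlv_type = buf[offset]
        length = buf[offset + 1]
        start = offset + 2
        end = start + length
        if end > len(buf):
            raise ValueError("truncated TLV value")
        yield tlv_type, buf[start:end]
        # The length field lets a parser step over types it does not know.
        offset = end


if __name__ == "__main__":
    sample = bytes([1, 2, 0xAA, 0xBB, 7, 0])
    for tlv_type, value in iter_tlvs(sample):
        print(tlv_type, value.hex())
```

   Because every entry carries its own length, a receiver can skip an
   unrecognized type and continue parsing, which is what allows new
   metadata types to be deployed incrementally.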

309	   It should be noted that while options can be used to support non-
310	   wirespeed control packets, they are equally important on data packets
311	   to segregate and direct forwarding (for instance, the
312	   examples given before of input port based security policies and
313	   service interposition both require tags to be placed on data
314	   packets).  Therefore, while it would be desirable to limit the
315	   extensibility to only control packets for the purposes of simplifying
316	   the datapath, that would not satisfy the design requirements.

318	2.2.1.  Efficient Implementation

320	   There is often a conflict between software flexibility and hardware
321	   performance that is difficult to resolve.  For a given set of
322	   functionality, it is obviously desirable to maximize performance.
323	   However, that does not mean new features that cannot be run at a
324	   desired speed today should be disallowed.  Therefore, for a protocol
325	   to be efficiently implementable means that a set of common
326	   capabilities can be reasonably handled across platforms along with a
327	   graceful mechanism to handle more advanced features in the
328	   appropriate situations.

330	   The use of a variable length header and options in a protocol often
331	   raises questions about whether it is truly efficiently implementable
332	   in hardware.  To answer this question in the context of Geneve, it is
333	   important to first divide "hardware" into two categories: tunnel
334	   endpoints and transit devices.

336	   Tunnel endpoints must be able to parse the variable header, including
337	   any options, and take action.  Since these devices are actively
338	   participating in the protocol, they are the most affected by Geneve.

340	   However, as tunnel endpoints are the ultimate consumers of the data,
341	   transmitters can tailor their output to the capabilities of the
342	   recipient.  As new functionality becomes sufficiently well defined to
343	   add to tunnel endpoints, supporting options can be designed using
344	   ordering restrictions and other techniques to ease parsing.

346	   Options, if present in the packet, MUST only be generated and
347	   terminated by tunnel endpoints.  Transit devices MAY be able to
348	   interpret the options.  However, as non-terminating devices, transit
349	   devices do not originate or terminate the Geneve packet; hence, they
350	   MUST NOT modify Geneve headers and MUST NOT insert or delete options,
351	   as that is the responsibility of tunnel endpoints.  The participation
352	   of transit devices in interpreting options is OPTIONAL.

354	   Further, either tunnel endpoints or transit devices MAY use offload
355	   capabilities of NICs such as checksum offload to improve the
356	   performance of Geneve packet processing.  The presence of a Geneve
357	   variable length header SHOULD NOT prevent the tunnel endpoints and
358	   transit devices from using such offload capabilities.

360	2.3.  Use of Standard IP Fabrics

362	   IP has clearly cemented its place as the dominant transport mechanism
363	   and many techniques have evolved over time to make it robust,
364	   efficient, and inexpensive.  As a result, it is natural to use IP
365	   fabrics as a transit network for Geneve.  Fortunately, the use of IP
366	   encapsulation and addressing is enough to achieve the primary goal of
367	   delivering packets to the correct point in the network through
368	   standard switching and routing.

370	   In addition, nearly all underlay fabrics are designed to exploit
371	   parallelism in traffic to spread load across multiple links without
372	   introducing reordering in individual flows.  These equal cost
373	   multipathing (ECMP) techniques typically involve parsing and hashing
374	   the addresses and port numbers from the packet to select an outgoing
375	   link.  However, the use of tunnels often results in poor ECMP
376	   performance without additional knowledge of the protocol as the
377	   encapsulated traffic is hidden from the fabric by design and only
378	   tunnel endpoint addresses are available for hashing.

380	   Since it is desirable for Geneve to perform well on these existing
381	   fabrics, it is necessary for entropy from encapsulated packets to be
382	   exposed in the tunnel header.  The most common technique for this is
383	   to use the UDP source port, which is discussed further in
384	   Section 3.3.
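
   One way an encapsulating endpoint might expose flow entropy is to
   hash the inner flow identifiers into the outer UDP source port, so
   that all packets of one flow share a port while different flows
   tend to spread across ECMP links.  The sketch below is illustrative
   only: the function name, the use of CRC32 as the hash, and the
   49152-65535 port range are assumptions of this example rather than
   requirements stated in this section.

```python
import zlib

# Assumed port range (the dynamic/private range); not taken from this section.
PORT_MIN = 49152
PORT_MAX = 65535


def entropy_source_port(src_ip: str, dst_ip: str, proto: int,
                        src_port: int, dst_port: int) -> int:
    """Map an inner flow 5-tuple to a stable outer UDP source port."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    # CRC32 is only a stand-in; any stable hash of the flow would do.
    flow_hash = zlib.crc32(key)
    return PORT_MIN + flow_hash % (PORT_MAX - PORT_MIN + 1)


if __name__ == "__main__":
    # Two inner TCP flows that differ only in their source port.
    print(entropy_source_port("10.0.0.1", "10.0.0.2", 6, 33000, 443))
    print(entropy_source_port("10.0.0.1", "10.0.0.2", 6, 33001, 443))
```

   The port depends only on the inner flow, so packets within a flow
   follow one path through the fabric and are not reordered by ECMP.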

386	3.  Geneve Encapsulation Details

388	   The Geneve packet format consists of a compact tunnel header
389	   encapsulated in UDP over either IPv4 or IPv6.  A small fixed tunnel
390	   header provides control information plus a base level of
391	   functionality and interoperability with a focus on simplicity.  This
392	   header is then followed by a set of variable options to allow for
393	   future innovation.  Finally, the payload consists of a protocol data
394	   unit of the indicated type, such as an Ethernet frame.  Section 3.1
395	   and Section 3.2 illustrate the Geneve packet format transported (for
396	   example) over Ethernet along with an Ethernet payload.

398	3.1.  Geneve Packet Format Over IPv4

400	       0                   1                   2                   3
401	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
402	   Outer Ethernet Header:
403	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
404	      |                 Outer Destination MAC Address                 |
405	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
406	      | Outer Destination MAC Address |   Outer Source MAC Address    |
407	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
408	      |                   Outer Source MAC Address                    |
409	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
410	      |Optional Ethertype=C-Tag 802.1Q|  Outer VLAN Tag Information   |
411	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
412	      |       Ethertype=0x0800        |
413	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

415	   Outer IPv4 Header:
416	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
417	      |Version|  IHL  |Type of Service|          Total Length         |
418	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
419	      |         Identification        |Flags|      Fragment Offset    |
420	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
421	      |  Time to Live |Protocol=17 UDP|         Header Checksum       |
422	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
423	      |                     Outer Source IPv4 Address                 |
424	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
425	      |                   Outer Destination IPv4 Address              |
426	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

428	   Outer UDP Header:
429	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
430	      |       Source Port = xxxx      |       Dest Port = 6081        |
431	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
432	      |           UDP Length          |        UDP Checksum           |
433	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

435	   Geneve Header:
436	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
437	      |Ver|  Opt Len  |O|C|    Rsvd.  |          Protocol Type        |
438	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
439	      |        Virtual Network Identifier (VNI)       |    Reserved   |
440	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
441	      |                    Variable Length Options                    |
442	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

444	   Inner Ethernet Header (example payload):
445	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
446	      |                 Inner Destination MAC Address                 |
447	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
448	      | Inner Destination MAC Address |   Inner Source MAC Address    |
449	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
450	      |                   Inner Source MAC Address                    |
451	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
452	      |Optional Ethertype=C-Tag 802.1Q|  Inner VLAN Tag Information   |
453	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

455	   Payload:
456	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
457	      | Ethertype of Original Payload |                               |
458	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
459	      |                                  Original Ethernet Payload    |
460	      |                                                               |
461	      | (Note that the original Ethernet Frame's FCS is not included) |
462	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

464	   Frame Check Sequence:
465	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
466	      |   New FCS (Frame Check Sequence) for Outer Ethernet Frame     |
467	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
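
   The fixed 8-byte Geneve header in the diagram above can be packed
   and parsed with a few shifts and masks.  The following is a minimal
   sketch, not a reference implementation.  The bit positions are read
   from the diagram; the field semantics used in the names and
   comments (Opt Len counted in 4-byte units, O marking a control
   packet, C marking critical options, 0x6558 identifying an Ethernet
   payload) are assumed from Section 3.4, which is outside this
   excerpt.  The type and function names are hypothetical.

```python
import struct
from typing import NamedTuple

ETHERNET_PROTOCOL_TYPE = 0x6558  # assumed value for an Ethernet payload


class GeneveHeader(NamedTuple):
    version: int        # Ver, 2 bits
    opt_len_words: int  # Opt Len, 6 bits, assumed to count 4-byte units
    control: bool       # O bit
    critical: bool      # C bit
    protocol_type: int  # 16 bits
    vni: int            # Virtual Network Identifier, 24 bits


def pack_geneve(h: GeneveHeader) -> bytes:
    """Serialize the fixed header; reserved bits are sent as zero."""
    if not 0 <= h.version <= 0x3:
        raise ValueError("version must fit in 2 bits")
    if not 0 <= h.opt_len_words <= 0x3F:
        raise ValueError("opt_len_words must fit in 6 bits")
    if not 0 <= h.protocol_type <= 0xFFFF:
        raise ValueError("protocol_type must fit in 16 bits")
    if not 0 <= h.vni <= 0xFFFFFF:
        raise ValueError("vni must fit in 24 bits")
    byte0 = (h.version << 6) | h.opt_len_words
    byte1 = (int(h.control) << 7) | (int(h.critical) << 6)
    word1 = h.vni << 8  # low 8 bits are the Reserved field
    return struct.pack("!BBHI", byte0, byte1, h.protocol_type, word1)


def unpack_geneve(data: bytes) -> GeneveHeader:
    """Parse the first 8 bytes of a Geneve header."""
    if len(data) < 8:
        raise ValueError("need at least 8 bytes")
    byte0, byte1, protocol_type, word1 = struct.unpack("!BBHI", data[:8])
    return GeneveHeader(
        version=byte0 >> 6,
        opt_len_words=byte0 & 0x3F,
        control=bool(byte1 & 0x80),
        critical=bool(byte1 & 0x40),
        protocol_type=protocol_type,
        vni=word1 >> 8,
    )


def payload_offset(h: GeneveHeader) -> int:
    """Offset of the payload from the start of the Geneve header."""
    return 8 + 4 * h.opt_len_words


if __name__ == "__main__":
    hdr = GeneveHeader(0, 2, False, True, ETHERNET_PROTOCOL_TYPE, 0x123456)
    raw = pack_geneve(hdr)
    print(raw.hex(), unpack_geneve(raw), payload_offset(hdr))
```

   A device that only needs to locate the inner frame can read Opt Len
   and skip the options without interpreting them.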

469	3.2.  Geneve Packet Format Over IPv6

471	       0                   1                   2                   3
472	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
473	   Outer Ethernet Header:
474	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
475	      |                 Outer Destination MAC Address                 |
476	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
477	      | Outer Destination MAC Address |   Outer Source MAC Address    |
478	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
479	      |                   Outer Source MAC Address                    |
480	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
481	      |Optional Ethertype=C-Tag 802.1Q|  Outer VLAN Tag Information   |
482	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
483	      |       Ethertype=0x86DD        |
484	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

486	   Outer IPv6 Header:
487	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
488	      |Version| Traffic Class |           Flow Label                  |
489	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
490	      |         Payload Length        | NxtHdr=17 UDP |   Hop Limit   |
491	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
492	      |                                                               |
493	      +                                                               +
494	      |                                                               |
495	      +                     Outer Source IPv6 Address                 +
496	      |                                                               |
497	      +                                                               +
498	      |                                                               |
499	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
500	      |                                                               |
501	      +                                                               +
502	      |                                                               |
503	      +                  Outer Destination IPv6 Address               +
504	      |                                                               |
505	      +                                                               +
506	      |                                                               |
507	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

509	   Outer UDP Header:
510	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
511	      |       Source Port = xxxx      |       Dest Port = 6081        |
512	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
513	      |           UDP Length          |        UDP Checksum           |
514	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

516	   Geneve Header:
517	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
518	      |Ver|  Opt Len  |O|C|    Rsvd.  |          Protocol Type        |
519	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
520	      |        Virtual Network Identifier (VNI)       |    Reserved   |
521	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
522	      |                    Variable Length Options                    |
523	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

525	   Inner Ethernet Header (example payload):
526	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
527	      |                 Inner Destination MAC Address                 |
528	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
529	      | Inner Destination MAC Address |   Inner Source MAC Address    |
530	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
531	      |                   Inner Source MAC Address                    |
532	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
533	      |Optional Ethertype=C-Tag 802.1Q|  Inner VLAN Tag Information   |
534	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

536	   Payload:
537	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
538	      | Ethertype of Original Payload |                               |
539	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
540	      |                                  Original Ethernet Payload    |
541	      |                                                               |
542	      | (Note that the original Ethernet Frame's FCS is not included) |
543	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

545	   Frame Check Sequence:
546	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
547	      |   New FCS (Frame Check Sequence) for Outer Ethernet Frame     |
548	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

550	3.3.  UDP Header

552	   The use of an encapsulating UDP [RFC0768] header follows the
553	   connectionless semantics of Ethernet and IP in addition to providing
554	   entropy to routers performing ECMP.  The header fields are therefore
555	   interpreted as follows:

557	   Source port:  A source port selected by the originating tunnel
558	      endpoint.  This source port SHOULD be the same for all packets
559	      belonging to a single encapsulated flow to prevent reordering due
560	      to the use of different paths.  To encourage an even distribution
561	      of flows across multiple links, the source port SHOULD be
562	      calculated using a hash of the encapsulated packet headers using,
563	      for example, a traditional 5-tuple.  Since the port represents a
564	      flow identifier rather than a true UDP connection, the entire
565	      16-bit range MAY be used to maximize entropy.

567	   Dest port:  IANA has assigned port 6081 as the fixed well-known
568	      destination port for Geneve.  Although the well-known value should
569	      be used by default, it is RECOMMENDED that implementations make
570	      this configurable.  The chosen port is used for identification of
571	      Geneve packets and MUST NOT be reversed for different ends of a
572	      connection as is done with TCP.

574	   UDP length:  The length of the UDP packet including the UDP header.

576	   UDP checksum:  In order to protect the Geneve header, options and
577	      payload from potential data corruption, UDP checksum SHOULD be
578	      generated as specified in [RFC0768] and [RFC1112] when Geneve is
579	      encapsulated in IPv4.  To protect the IP header, Geneve header,
580	      options and payload from potential data corruption, the UDP
581	      checksum MUST be generated by default as specified in [RFC0768]
582	      and [RFC2460] when Geneve is encapsulated in IPv6.  Upon receiving
583	      such packets with non-zero UDP checksum, the receiving tunnel
584	      endpoints MUST validate the checksum.  If the checksum is not
585	      correct, the packet MUST be dropped, otherwise the packet MUST be
586	      accepted for decapsulation.

588	      Under certain conditions, the UDP checksum MAY be set to zero on
589	      transmit for packets encapsulated in both IPv4 and IPv6 [RFC6935].
590	      See Section 4.3 for additional requirements that apply for using
591	      zero UDP checksum with IPv4 and IPv6.  Disabling the use of UDP
592	      checksums is an operational consideration that should take into
593	      account the risks and effects of packet corruption.
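
   As a non-normative illustration of the source port guidance above,
   the following sketch derives a 16-bit outer UDP source port from an
   inner 5-tuple.  The function name, the choice of CRC-32 as the hash,
   and the folding step are assumptions made for this example only;
   this document does not mandate any particular hash.

```python
import ipaddress
import zlib


def geneve_source_port(src_ip, dst_ip, proto, src_port, dst_port):
    """Map an inner 5-tuple to a 16-bit outer UDP source port.

    Hypothetical helper: the hash (CRC-32) is a stand-in chosen only
    because it is in the standard library.
    """
    key = b"".join((
        ipaddress.ip_address(src_ip).packed,
        ipaddress.ip_address(dst_ip).packed,
        bytes([proto]),
        src_port.to_bytes(2, "big"),
        dst_port.to_bytes(2, "big"),
    ))
    h = zlib.crc32(key)
    # Fold the 32-bit hash into 16 bits; the entire 16-bit range is used.
    return (h ^ (h >> 16)) & 0xFFFF
```

   Because the result depends only on the inner flow identifiers, all
   packets of one encapsulated flow map to the same source port, which
   is the property needed to avoid reordering across ECMP paths.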

595	3.4.  Tunnel Header Fields

597	   Ver (2 bits):  The current version number is 0.  Packets received by
598	      a tunnel endpoint with an unknown version MUST be dropped.
599	      Transit devices interpreting Geneve packets with an unknown
600	      version number MUST treat them as UDP packets with an unknown
601	      payload.

603	   Opt Len (6 bits):  The length of the options fields, expressed in
604	      four byte multiples, not including the eight byte fixed tunnel
605	      header.  This results in a minimum total Geneve header size of 8
606	      bytes and a maximum of 260 bytes.  The start of the payload
607	      headers can be found using this offset from the end of the base
608	      Geneve header.

610	   O (1 bit):  Control packet.  This packet contains a control message.
611	      Control messages are sent between tunnel endpoints.  Tunnel
612	      Endpoints MUST NOT forward the payload and transit devices MUST
613	      NOT attempt to interpret it.  Since these are infrequent control
614	      messages, it is RECOMMENDED that tunnel endpoints direct these
615	      packets to a high priority control queue (for example, to direct
616	      the packet to a general purpose CPU from a forwarding ASIC or to
617	      separate out control traffic on a NIC).  Transit devices MUST NOT
618	      alter forwarding behavior on the basis of this bit, such as ECMP
619	      link selection.

621	   C (1 bit):  Critical options present.  One or more options have the
622	      critical bit set (see Section 3.5).  If this bit is set then
623	      tunnel endpoints MUST parse the options list to interpret any
624	      critical options.  On tunnel endpoints where option parsing is not
625	      supported the packet MUST be dropped on the basis of the 'C' bit
626	      in the base header.  If the bit is not set tunnel endpoints MAY
627	      strip all options using 'Opt Len' and forward the decapsulated
628	      packet.  Transit devices MUST NOT drop packets on the basis of
629	      this bit.

631	      The critical bit allows hardware implementations the flexibility
632	      to handle options processing in the hardware fastpath or in the
633	      exception (slow) path without the need to process all the options.
634	      For example, a critical option such as a secure hash providing a
635	      Geneve header integrity check must be processed by tunnel
636	      endpoints and is typically processed in the hardware fastpath.

638	   Rsvd. (6 bits):  Reserved field, which MUST be zero on transmission
639	      and MUST be ignored on receipt.

641	   Protocol Type (16 bits):  The type of the protocol data unit
642	      appearing after the Geneve header.  This follows the EtherType
643	      [ETYPES] convention with Ethernet itself being represented by the
644	      value 0x6558.

646	   Virtual Network Identifier (VNI) (24 bits):  An identifier for a
647	      unique element of a virtual network.  In many situations this may
648	      represent an L2 segment; however, the control plane defines the
649	      forwarding semantics of decapsulated packets.  The VNI MAY be used
650	      as part of ECMP forwarding decisions or MAY be used as a mechanism
651	      to distinguish between overlapping address spaces contained in the
652	      encapsulated packet when load balancing across CPUs.

654	   Reserved (8 bits):  Reserved field which MUST be zero on transmission
655	      and ignored on receipt.

657	   Transit devices MUST maintain consistent forwarding behavior
658	   irrespective of the value of 'Opt Len', including ECMP link
659	   selection.  These devices SHOULD be able to forward packets
660	   containing options without resorting to a slow path.
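
   The fixed 8-byte header layout described in this section can be
   sketched as follows.  This is an illustrative example only; the
   function and constant names are assumptions of the sketch and are
   not defined by this document.

```python
import struct

GENEVE_VERSION = 0
PROTO_ETHERNET = 0x6558  # Protocol Type value for an Ethernet payload


def pack_base_header(vni, opt_len_words=0, protocol=PROTO_ETHERNET,
                     control=False, critical=False):
    """Build the 8-byte base header; reserved fields are sent as zero."""
    if not 0 <= opt_len_words <= 0x3F:
        raise ValueError("Opt Len is a 6-bit count of 4-byte words")
    if not 0 <= vni <= 0xFFFFFF:
        raise ValueError("VNI is a 24-bit value")
    first = ((GENEVE_VERSION << 14) | (opt_len_words << 8)
             | (int(control) << 7) | (int(critical) << 6))
    return struct.pack("!HHI", first, protocol, vni << 8)


def parse_base_header(data):
    """Parse the base header as a tunnel endpoint would."""
    if len(data) < 8:
        raise ValueError("truncated Geneve header")
    first, protocol, vni_rsvd = struct.unpack("!HHI", data[:8])
    if first >> 14 != GENEVE_VERSION:
        raise ValueError("unknown version: a tunnel endpoint drops it")
    opt_len_bytes = ((first >> 8) & 0x3F) * 4
    return {
        "opt_len_bytes": opt_len_bytes,
        "control": bool(first & 0x80),    # 'O' bit
        "critical": bool(first & 0x40),   # 'C' bit
        "protocol": protocol,
        "vni": vni_rsvd >> 8,             # reserved bits are ignored
        "payload_offset": 8 + opt_len_bytes,
    }
```

   With the maximum 'Opt Len' of 63, the payload offset computed above
   is 8 + 63 * 4 = 260 bytes, matching the maximum header size stated
   for 'Opt Len'.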

662	3.5.  Tunnel Options
663	   0                   1                   2                   3
664	   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
665	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
666	   |          Option Class         |      Type     |R|R|R| Length  |
667	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
668	   |                      Variable Option Data                     |
669	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

671	                               Geneve Option

673	   The base Geneve header is followed by zero or more options in Type-
674	   Length-Value format.  Each option consists of a four byte option
675	   header and a variable amount of option data interpreted according to
676	   the type.

678	   Option Class (16 bits):  Namespace for the 'Type' field.  IANA will
679	      be requested to create a "Geneve Option Class" registry to
680	      allocate identifiers for organizations, technologies, and vendors
681	      that have an interest in creating types for options.  Each
682	      organization may allocate types independently to allow
683	      experimentation and rapid innovation.  It is expected that over
684	      time certain options will become well known and a given
685	      implementation may use option types from a variety of sources.  In
686	      addition, IANA will be requested to reserve specific ranges for
687	      standardized and experimental options.

689	   Type (8 bits):  Type indicating the format of the data contained in
690	      this option.  Options are primarily designed to encourage future
691	      extensibility and innovation and so standardized forms of these
692	      options will be defined in a separate document.

694	      The high order bit of the option type indicates that this is a
695	      critical option.  If the receiving tunnel endpoint does not
696	      recognize this option and this bit is set then the packet MUST be
697	      dropped.  If the 'C' bit (critical bit) is set in any option then
698	      the 'C' bit in the Geneve base header MUST also be set.  Transit
699	      devices MUST NOT drop packets on the basis of this bit.  The
700	      following figure shows the location of the 'C' bit in the 'Type'
701	      field:

703	       0 1 2 3 4 5 6 7
704	      +-+-+-+-+-+-+-+-+
705	      |C|    Type     |
706	      +-+-+-+-+-+-+-+-+

708	      The requirement to drop a packet with an unknown option with the
709	      'C' bit set applies to the entire tunnel endpoint system and not a
710	      particular component of the implementation.  For example, in a
711	      system comprised of a forwarding ASIC and a general purpose CPU,
712	      this does not mean that the packet must be dropped in the ASIC.
713	      An implementation may send the packet to the CPU using a rate-
714	      limited control channel for slow-path exception handling.

716	   R (3 bits):  Option control flags reserved for future use.  MUST be
717	      zero on transmission and ignored on receipt.

719	   Length (5 bits):  Length of the option, expressed in four byte
720	      multiples excluding the option header.  The total length of each
721	      option may be between 4 and 128 bytes.  A value of 0 in the Length
722	      field implies an option with only the option header without the
723	      variable option data.  Packets in which the total length of all
724	      options is not equal to the 'Opt Len' in the base header are
725	      invalid and MUST be silently dropped if received by a tunnel
726	      endpoint that processes the options.

728	   Variable Option Data:  Option data interpreted according to 'Type'.
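
   The option format above lends itself to a simple walk over the
   options area.  The following non-normative sketch assumes the caller
   has already sliced out exactly 'Opt Len' * 4 bytes following the
   base header; the function name and the dictionary keys are
   assumptions of this example.

```python
import struct


def parse_options(options):
    """Walk the options area and return a list of parsed options.

    Raises ValueError when the option lengths do not add up to the
    size of the options area, in which case the packet is invalid.
    """
    parsed = []
    offset = 0
    while offset < len(options):
        if len(options) - offset < 4:
            raise ValueError("truncated option header")
        opt_class, opt_type, flags_len = struct.unpack_from(
            "!HBB", options, offset)
        # Low 5 bits: data length in 4-byte words; top 3 bits are the
        # reserved 'R' flags, which are ignored on receipt.
        data_len = (flags_len & 0x1F) * 4
        end = offset + 4 + data_len
        if end > len(options):
            raise ValueError("options do not match Opt Len")
        parsed.append({
            "class": opt_class,
            "type": opt_type,
            "critical": bool(opt_type & 0x80),  # high-order bit of Type
            "data": bytes(options[offset + 4:end]),
        })
        offset = end
    return parsed
```

   A 'Length' of 0 yields an option consisting of the 4-byte option
   header alone, and a 'Length' of 31 yields the 128-byte maximum.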

730	3.5.1.  Options Processing

732	   Geneve options are intended to be originated and processed by tunnel
733	   endpoints.  However, options MAY be interpreted by transit devices
734	   along the tunnel path.  Transit devices not interpreting Geneve
735	   headers (that may or may not include options) MUST handle Geneve
736	   packets as any other UDP packet and maintain consistent forwarding
737	   behavior.

739	   In tunnel endpoints, the generation and interpretation of options is
740	   determined by the control plane, which is out of the scope of this
741	   document.  However, to ensure interoperability between heterogeneous
742	   devices some requirements are imposed on options and the devices that
743	   process them:

745	   o  Receiving tunnel endpoints MUST drop packets containing unknown
746	      options with the 'C' bit set in the option type.  Conversely,
747	      transit devices MUST NOT drop packets as a result of encountering
748	      unknown options, including those with the 'C' bit set.

750	   o  Some options may be defined in such a way that the position in the
751	      option list is significant.  Options MUST NOT be changed by
752	      transit devices.

754	   o  An option SHOULD NOT be dependent upon any other option in the
755	      packet, i.e., options can be processed independently of one
756	      another.  Architecturally, options are intended to be self-
757	      descriptive and independent.  This enables parallelism in option
758	      processing and reduces implementation complexity.
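
   The difference between tunnel endpoint and transit device behavior
   for unknown critical options can be summarized in a short,
   non-normative sketch.  The function names and the representation of
   options as (Option Class, Type) pairs are assumptions of this
   example.

```python
CRITICAL = 0x80  # high-order bit of the option 'Type' field


def endpoint_should_drop(base_c_bit, options, known_options):
    """Decide whether a receiving tunnel endpoint drops the packet.

    options: iterable of (option_class, option_type) pairs.
    known_options: set of (option_class, option_type) pairs that this
    endpoint implements.
    """
    if not base_c_bit:
        # No critical options are signaled; the endpoint may skip or
        # strip the options without parsing them.
        return False
    return any((opt_type & CRITICAL) and
               (opt_class, opt_type) not in known_options
               for opt_class, opt_type in options)


def transit_should_drop(base_c_bit, options):
    """Transit devices never drop on the basis of the 'C' bit."""
    return False
```

   Each option is examined on its own, which reflects the expectation
   that options are self-descriptive and can be processed
   independently of one another.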

760	   When designing a Geneve option, it is important to consider how the
761	   option will evolve in the future.  Once an option is defined it is
762	   reasonable to expect that implementations may come to depend on a
763	   specific behavior.  As a result, the scope of any future changes must
764	   be carefully described upfront.

766	   Unexpectedly significant interoperability issues may result from
767	   changing the length of an option that was defined to be a certain
768	   size.  A particular option is specified to have either a fixed
769	   length, which is constant, or a variable length, which may change
770	   over time or for different use cases.  This property is part of the
771	   definition of the option and conveyed by the 'Type'.  For fixed
772	   length options, some implementations may choose to ignore the length
773	   field in the option header and instead parse based on the well known
774	   length associated with the type.  In this case, redefining the length
775	   will impact not only parsing of the option in question but also any
776	   options that follow.  Therefore, options that are defined to be fixed
777	   length in size MUST NOT be redefined to a different length.  Instead,
778	   a new 'Type' should be allocated.

780	   Options may be processed by NIC hardware utilizing offloads (e.g.,
781	   LSO and LRO) as described in Section 4.6.  Careful consideration
782	   should be given to how the offload capabilities outlined in
783	   Section 4.6 impact an option's design.

785	4.  Implementation and Deployment Considerations

787	4.1.  Applicability Statement

789	   Geneve is a network virtualization overlay encapsulation protocol
790	   designed to establish tunnels between NVEs over an existing IP
791	   network.  It is intended for use in public or private data center
792	   environments, for deploying multi-tenant overlay networks over an
793	   existing IP underlay network.

795	   Geneve is a UDP based encapsulation protocol transported over
796	   existing IPv4 and IPv6 networks.  Hence, as a UDP based protocol,
797	   Geneve adheres to the UDP usage guidelines as specified in [RFC8085].
798	   The applicability of these guidelines is dependent on the underlay
799	   IP network and the nature of the Geneve payload protocol (for
800	   example, TCP/IP or IP/Ethernet).

802	   [RFC8085] outlines two applicability scenarios for UDP applications,
803	   1) general Internet and 2) controlled environment.  The controlled
804	   environment means a single administrative domain or adjacent set of
805	   cooperating domains.  A network in a controlled environment can be
806	   managed to operate under certain conditions whereas in the general
807	   Internet this cannot be done.  Hence requirements for a tunnel
808	   protocol operating under a controlled environment can be less
809	   restrictive than the requirements of the general Internet.

811	   Geneve is intended to be deployed in a data center network
812	   environment operated by a single operator or adjacent set of
813	   cooperating network operators that fits with the definition of
814	   controlled environments in [RFC8085].

816	   For the purpose of this document, a traffic-managed controlled
817	   environment (TMCE) is defined as an IP network that is traffic-
818	   engineered and/or otherwise managed (e.g., via use of traffic rate
819	   limiters) to avoid congestion.  The concept of TMCE is outlined in
820	   [RFC8086].  Significant portions of text in Section 4.1 through
821	   Section 4.3 are based on [RFC8086] as applicable to Geneve.

823	   It is the responsibility of the operator to ensure that the
824	   guidelines/requirements in this section are followed as applicable to
825	   their Geneve deployment(s).

827	4.2.  Congestion Control Functionality

829	   Geneve does not natively provide congestion control functionality and
830	   relies on the payload protocol traffic for congestion control.  As
831	   such Geneve MUST be used with congestion controlled traffic or within
832	   a network that is traffic managed to avoid congestion (TMCE).  An
833	   operator of a traffic managed network (TMCE) may avoid congestion by
834	   careful provisioning of their networks, rate-limiting of user data
835	   traffic and traffic engineering according to path capacity.

837	4.3.  UDP Checksum

839	   In order to provide integrity of Geneve headers, options and payload,
840	   for example to avoid mis-delivery of payload to different tenant
841	   systems in case of data corruption, outer UDP checksum SHOULD be used
842	   with Geneve when transported over IPv4.  The UDP checksum provides a
843	   statistical guarantee that a payload was not corrupted in transit.
844	   These integrity checks are not strong from a coding or cryptographic
845	   perspective and are not designed to detect physical-layer errors or
846	   malicious modification of the datagram (see Section 3.4 of
847	   [RFC8085]).  In deployments where such a risk exists, an operator
848	   SHOULD use additional data integrity mechanisms such as offered by
849	   IPsec (see Section 6.2).

851	   An operator MAY choose to disable UDP checksum and use zero checksum
852	   if Geneve packet integrity is provided by other data integrity
853	   mechanisms such as IPsec or additional checksums or if one of the
854	   conditions a, b, or c in Section 4.3.1 is met.

856	   By default, UDP checksum MUST be used when Geneve is transported over
857	   IPv6.  A tunnel endpoint MAY be configured for use with zero UDP
858	   checksum if additional requirements in Section 4.3.1 are met.

860	4.3.1.  UDP Zero Checksum Handling with IPv6

862	   When Geneve is used over IPv6, UDP checksum is used to protect IPv6
863	   headers, UDP headers and Geneve headers, options and payload from
864	   potential data corruption.  As such, by default Geneve MUST use UDP
865	   checksum when transported over IPv6.  An operator MAY choose to
866	   operate with zero UDP checksum in a traffic-managed controlled
867	   environment, as stated in Section 4.1, if one of the following
868	   conditions is met:

870	   a.  It is known that packet corruption is exceptionally unlikely
871	       (perhaps based on knowledge of equipment types in the underlay
872	       network) and the operator is willing to take the risk of
873	       undetected packet corruption.

875	   b.  It is judged through observational measurements (perhaps through
876	       historic or current traffic flows that use non-zero checksum)
877	       that the level of packet corruption is tolerably low and the
878	       operator is willing to take the risk of undetected
879	       corruption.

881	   c.  The Geneve payload is carrying applications that are tolerant of
882	       misdelivered or corrupted packets (perhaps through higher layer
883	       checksum validation and/or reliability through retransmission).

885	   In addition, Geneve tunnel implementations using zero UDP checksum
886	   MUST meet the following requirements:

888	   1.  Use of UDP checksum over IPv6 MUST be the default configuration
889	       for all Geneve tunnels.

891	   2.  If Geneve is used with zero UDP checksum over IPv6 then such
892	       tunnel endpoint implementation MUST meet all the requirements
893	       specified in Section 4 of [RFC6936] and requirement 1 as
894	       specified in Section 5 of [RFC6936].

896	   3.  The Geneve tunnel endpoint that decapsulates the tunnel SHOULD
897	       check that the source and destination IPv6 addresses are valid
898	       for the Geneve tunnel that is configured to receive zero UDP
899	       checksum and discard packets for which this check fails.

901	   4.  The Geneve tunnel endpoint that encapsulates the tunnel MAY use
902	       different IPv6 source addresses for each Geneve tunnel that uses
903	       zero UDP checksum mode in order to strengthen the decapsulator's
904	       check of the IPv6 source address (i.e., the same IPv6 source
905	       address is not to be used with more than one IPv6 destination
906	       address, irrespective of whether that destination address is a
907	       unicast or multicast address).  When this is not possible, it is
908	       RECOMMENDED to use each source address for as few Geneve tunnels
909	       that use zero UDP checksum as is feasible.

911	   5.  Measures SHOULD be taken to prevent Geneve traffic over IPv6 with
912	       zero UDP checksum from escaping into the general Internet.
913	       Examples of such measures include employing packet filters at the
914	       Gateways or edge of Geneve network and/or keeping logical or
915	       physical separation of Geneve network from networks carrying
916	       General Internet.

918	   The above requirements do not change either the requirements
919	   specified in [RFC2460] as modified by [RFC6935] or the requirements
920	   specified in [RFC6936].

922	   The requirement to check the source IPv6 address in addition to the
923	   destination IPv6 address, plus the recommendation against reuse of
924	   source IPv6 addresses among Geneve tunnels collectively provide some
925	   mitigation for the absence of UDP checksum coverage of the IPv6
926	   header.  A traffic-managed controlled environment that satisfies at
927	   least one of the three conditions listed at the beginning of this
928	   section provides additional assurance.
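
   As a non-normative illustration of the decapsulator's address check
   described in this section, the sketch below accepts a zero-checksum
   packet over IPv6 only when its outer source and destination
   addresses match a tunnel explicitly configured for zero UDP
   checksum.  The function name, its parameters, and the use of a set
   of address pairs are assumptions of this example; checksum
   verification itself is assumed to happen elsewhere and is passed in
   as a boolean.

```python
import ipaddress


def accept_ipv6_geneve(src, dst, udp_checksum, checksum_valid,
                       zero_checksum_tunnels):
    """Return True if the packet may proceed to decapsulation.

    zero_checksum_tunnels: set of (source, destination) IPv6Address
    pairs for tunnels configured to receive zero UDP checksum.
    checksum_valid: result of verifying a non-zero checksum.
    """
    if udp_checksum != 0:
        # A non-zero checksum is validated; bad packets are dropped.
        return bool(checksum_valid)
    pair = (ipaddress.IPv6Address(src), ipaddress.IPv6Address(dst))
    # A zero checksum is accepted only for explicitly configured
    # tunnels, which keeps checksum use as the default behavior.
    return pair in zero_checksum_tunnels
```

   An empty set of configured tunnels corresponds to the default
   configuration, in which every zero-checksum packet over IPv6 is
   discarded.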

930	   Editorial Note (The following paragraph to be removed by the RFC
931	   Editor before publication)

933	   It was discussed during TSVART early review if the level of
934	   requirement for using different IPv6 source addresses for different
935	   tunnel destinations would need to be "MAY" or "SHOULD".  The
936	   discussion concluded that it was appropriate to keep this as "MAY",
937	   since it was considered not realistic for control planes having to
938	   maintain a high level of state on a per tunnel destination basis.  In
939	   addition, the text above provides sufficient guidance to operators
940	   and implementors on possible mitigations.

942	4.4.  Encapsulation of Geneve in IP

944	   As an IP-based tunnel protocol, Geneve shares many properties and
945	   techniques with existing protocols.  The application of some of these
946	   are described in further detail, although in general most concepts
947	   applicable to the IP layer or to IP tunnels generally also function
948	   in the context of Geneve.

950	4.4.1.  IP Fragmentation

952	   It is strongly RECOMMENDED that Path MTU Discovery ([RFC1191],
953	   [RFC8201]) be used by setting the DF bit in the IP header when Geneve
954	   packets are transmitted over IPv4 (this is the default with IPv6).
955	   The use of Path MTU Discovery on the transit network provides the
956	   encapsulating tunnel endpoint with soft-state about the link that it
957	   may use to prevent or minimize fragmentation depending on its role in
958	   the virtualized network.  The NVE control plane MAY use configuration
959	   mechanisms or path discovery information to maintain the MTU size of
960	   the tunnel link(s) associated with the tunnel endpoint, so if a
961	   tenant system sends large packets that when encapsulated exceed the
962	   MTU size of the tunnel link, the tunnel endpoint can discard such
963	   packets and send exception messages to the tenant system(s).  If the
964	   tunnel endpoint is associated with a routing or forwarding function
965	   and/or has the capability to send ICMP messages, the encapsulating
966	   tunnel endpoint MAY send ICMP fragmentation needed [RFC0792] or
967	   Packet Too Big [RFC4443] messages to the tenant system(s).  For
968	   example, recommendations/guidance for handling fragmentation in
969	   similar overlay encapsulation services like PWE3 are provided in
970	   section 5.3 of [RFC3985].
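
   The size check implied above can be sketched as follows.  This is a
   non-normative example that assumes an outer IPv4 header without
   options (20 bytes) or an outer IPv6 header without extension headers
   (40 bytes), and an inner Ethernet frame length that excludes the
   FCS.  The names are assumptions of the example.

```python
OUTER_IPV4_LEN = 20   # assumes no IPv4 options
OUTER_IPV6_LEN = 40   # assumes no IPv6 extension headers
UDP_LEN = 8
GENEVE_BASE_LEN = 8


def encapsulated_size(inner_frame_len, opt_len_bytes=0, ipv6=False):
    """Size of the outer IP packet carrying one inner frame."""
    if opt_len_bytes % 4 or not 0 <= opt_len_bytes <= 252:
        raise ValueError("options occupy 0 to 252 bytes in 4-byte units")
    outer_ip = OUTER_IPV6_LEN if ipv6 else OUTER_IPV4_LEN
    return (outer_ip + UDP_LEN + GENEVE_BASE_LEN
            + opt_len_bytes + inner_frame_len)


def fits_tunnel_mtu(inner_frame_len, tunnel_mtu, opt_len_bytes=0,
                    ipv6=False):
    """True if the encapsulated packet fits without fragmentation."""
    return encapsulated_size(inner_frame_len, opt_len_bytes,
                             ipv6) <= tunnel_mtu
```

   For instance, a 1514-byte inner Ethernet frame with no options over
   IPv4 becomes a 1550-byte outer IP packet, so it does not fit a
   1500-byte tunnel MTU and would be a candidate for the discard and
   exception handling described above.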

972	   Note that some implementations may not be capable of supporting
973	   fragmentation or other less common features of the IP header, such as
974	   options and extension headers.  For example, some of the issues
975	   associated with MTU size and fragmentation in IP tunneling and use of
976	   ICMP messages are outlined in section 4.2 of
977	   [I-D.ietf-intarea-tunnels].

979	   Editorial Note (The following paragraph to be removed by the RFC
980	   Editor before publication)

982	   It was discussed during TSVART early review if the level of
983	   requirement for maintaining tunnel MTU at the ingress has to be "MAY"
984	   or "SHOULD".  The discussion concluded that it was appropriate to
985	   leave this as "MAY", considering the high level of state to be
986	   maintained.

988	4.4.2.  DSCP, ECN and TTL

990	   When encapsulating IP (including over Ethernet) packets in Geneve,
991	   there are several considerations for propagating DSCP and ECN bits
992	   from the inner header to the tunnel on transmission and the reverse
993	   on reception.

995	   [RFC2983] provides guidance for mapping DSCP between inner and outer
996	   IP headers.  Network virtualization is typically more closely aligned
997	   with the Pipe model described, where the DSCP value on the tunnel
998	   header is set based on a policy (which may be a fixed value, one
999	   based on the inner traffic class, or some other mechanism for
1000	   grouping traffic).  Aspects of the Uniform model (which treats the
1001	   inner and outer DSCP value as a single field by copying on ingress
1002	   and egress) may also apply, such as the ability to remark the inner
1003	   header on tunnel egress based on transit marking.  However, the
1004	   Uniform model is not conceptually consistent with network
1005	   virtualization, which seeks to provide strong isolation between
1006	   encapsulated traffic and the physical network.

1008	   [RFC6040] describes the mechanism for exposing ECN capabilities on IP
1009	   tunnels and propagating congestion markers to the inner packets.
1010	   This behavior MUST be followed for IP packets encapsulated in Geneve.
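
   The following non-normative sketch summarizes the ECN handling of
   [RFC6040] as understood by the editors of this example: the outer
   ECN field is copied from the inner header on encapsulation (normal
   mode) and the two fields are combined on decapsulation.  The
   function names are assumptions, and [RFC6040] remains the
   authoritative definition of the mapping.

```python
# Two-bit ECN codepoints as carried in the IP header.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def encap_ecn(inner, compatibility_mode=False):
    """Outer ECN field chosen by the encapsulating endpoint."""
    return NOT_ECT if compatibility_mode else inner


def decap_ecn(inner, outer):
    """ECN field of the forwarded inner packet, or None to drop."""
    if inner == NOT_ECT:
        # Congestion was signaled to a transport that cannot react.
        return None if outer == CE else NOT_ECT
    if inner == CE or outer == CE:
        return CE
    if outer == ECT1:
        return ECT1
    return inner
```

   Returning None for the combination of an inner Not-ECT and an outer
   CE reflects that the congestion indication cannot be conveyed to the
   inner transport by marking, so the packet is dropped instead.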

1012	   Though Uniform or Pipe models could be used for TTL (or Hop Limit in
1013	   case of IPv6) handling when tunneling IP packets, Pipe model is more
1014	   aligned with network virtualization.  [RFC2003] provides guidance on
1015	   handling TTL between inner IP header and outer IP tunnels; this model
1016	   is more aligned with the Pipe model and is recommended for use with
1017	   Geneve for network virtualization applications.

1019	4.4.3.  Broadcast and Multicast

1021	   Geneve tunnels may either be point-to-point unicast between two
1022	   tunnel endpoints or may utilize broadcast or multicast addressing.
1023	   It is not required that inner and outer addressing match in this
1024	   respect.  For example, in physical networks that do not support
1025	   multicast, encapsulated multicast traffic may be replicated into
1026	   multiple unicast tunnels or forwarded by policy to a unicast location
1027	   (possibly to be replicated there).

1029	   With physical networks that do support multicast it may be desirable
1030	   to use this capability to take advantage of hardware replication for
1031	   encapsulated packets.  In this case, multicast addresses may be
1032	   allocated in the physical network corresponding to tenants,
1033	   encapsulated multicast groups, or some other factor.  The allocation
1034	   of these groups is a component of the control plane and therefore
1035	   outside of the scope of this document.  When physical multicast is in
1036	   use, the 'C' bit in the Geneve header may be used with groups of
1037	   devices with heterogeneous capabilities as each device can interpret
1038	   only the options that are significant to it if they are not critical.

1040	   In addition, [RFC8293] provides examples of various mechanisms that
1041	   can be used for multicast handling in network virtualization overlay
1042	   networks.

1044	4.4.4.  Unidirectional Tunnels

1046	   Generally speaking, a Geneve tunnel is a unidirectional concept.  IP
1047	   is not a connection oriented protocol and it is possible for two
1048	   tunnel endpoints to communicate with each other using different paths
1049	   or to have one side not transmit anything at all.  As Geneve is an
1050	   IP-based protocol, the tunnel layer inherits these same
1051	   characteristics.

1053	   It is possible for a tunnel to encapsulate a protocol, such as TCP,
1054	   which is connection oriented and maintains session state at that
1055	   layer.  In addition, implementations MAY model Geneve tunnels as
1056	   connected, bidirectional links, such as to provide the abstraction of
1057	   a virtual port.  In both of these cases, bidirectionality of the
1058	   tunnel is handled at a higher layer and does not affect the operation
1059	   of Geneve itself.

1061	4.5.  Constraints on Protocol Features

1063	   Geneve is intended to be flexible to a wide range of current and
1064	   future applications.  As a result, certain constraints may be placed
1065	   on the use of metadata or other aspects of the protocol in order to
1066	   optimize for a particular use case.  For example, some applications
1067	   may limit the types of options which are supported or enforce a
1068	   maximum number or length of options.  Other applications may only
1069	   handle certain encapsulated payload types, such as Ethernet or IP.
1070	   This could be either globally throughout the system or, for example,
1071	   restricted to certain classes of devices or network paths.

1073	   These constraints may be communicated to tunnel endpoints either
1074	   explicitly through a control plane or implicitly by the nature of the
1075	   application.  As Geneve is defined as a data plane protocol that is
1076	   control plane agnostic, the exact mechanism is not defined in this
1077	   document.

1079	4.5.1.  Constraints on Options

1081	   While Geneve options are flexible, a control plane may restrict
1082	   the number of option TLVs as well as the order and size of the TLVs,
1083	   between tunnel endpoints, to make it simpler for a data plane
1084	   implementation in software or hardware to handle
1085	   [I-D.ietf-nvo3-encap].  For example, there may be some critical
1086	   information such as a secure hash that must be processed in a certain
1087	   order to provide lowest latency.

1089	   A control plane may negotiate a subset of option TLVs and certain TLV
1090	   ordering, as well as limit the total number of option TLVs present
1091	   in the packet, for example, to accommodate hardware capable of
1092	   processing fewer options [I-D.ietf-nvo3-encap].  Hence, a control
1093	   plane needs to have the ability to describe the supported TLVs subset
1094	   and their order to the tunnel endpoints.  In the absence of a control
1095	   plane, alternative configuration mechanisms may be used for this
1096	   purpose.  The exact mechanism is not defined in this document.
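
   As a non-normative illustration, the check below sketches how a
   tunnel endpoint might validate an option list against constraints
   received from a control plane.  The names OptionConstraints,
   allowed_order, max_options, and check_options are hypothetical, as is
   the choice to identify an option by its (class, type) pair and to
   reject repeated options; none of these are defined by this document.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumption: an option is identified by its (option class, type) pair.
OptionId = Tuple[int, int]


@dataclass(frozen=True)
class OptionConstraints:
    """Hypothetical constraints a control plane might give an endpoint."""
    allowed_order: Sequence[OptionId]  # permitted options, in required order
    max_options: int                   # cap on option TLVs per packet


def check_options(options: List[OptionId], c: OptionConstraints) -> bool:
    """Return True if options obey the subset, order, and count limits."""
    if len(options) > c.max_options:
        return False
    rank = {opt: i for i, opt in enumerate(c.allowed_order)}
    last = -1
    for opt in options:
        if opt not in rank:      # outside the negotiated subset
            return False
        if rank[opt] <= last:    # out of order, or repeated (assumption)
            return False
        last = rank[opt]
    return True
```

   An endpoint could run such a check before transmitting, so that a
   peer with limited parsing capability only sees option lists it has
   agreed to handle.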

1098	4.6.  NIC Offloads

1100	   Modern NICs currently provide a variety of offloads to enable the
1101	   efficient processing of packets.  The implementation of many of these
1102	   offloads requires only that the encapsulated packet be easily parsed
1103	   (for example, checksum offload).  However, optimizations such as LSO
1104	   and LRO involve some processing of the options themselves since they
1105	   must be replicated/merged across multiple packets.  In these
1106	   situations, it is desirable to not require changes to the offload
1107	   logic to handle the introduction of new options.  To enable this,
1108	   some constraints are placed on the definitions of options to allow
1109	   for simple processing rules:

1111	   o  When performing LSO, a NIC MUST replicate the entire Geneve header
1112	      and all options, including those unknown to the device, onto each
1113	      resulting segment.  However, a given option definition may
1114	      override this rule and specify different behavior in supporting
1115	      devices.  Conversely, when performing LRO, a NIC MAY assume that a
1116	      binary comparison of the options (including unknown options) is
1117	      sufficient to ensure equality and MAY merge packets with equal
1118	      Geneve headers.

1120	   o  Options MUST NOT be reordered during the course of offload
1121	      processing, including when merging packets for the purpose of LRO.

1123	   o  NICs performing offloads MUST NOT drop packets with unknown
1124	      options, including those marked as critical, unless explicitly
1125	      configured.

1127	   There is no requirement that a given implementation of Geneve employ
1128	   the offloads listed as examples above.  However, as these offloads
1129	   are currently widely deployed in commercially available NICs, the
1130	   rules described here are intended to enable efficient handling of
1131	   current and future options across a variety of devices.
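
   A minimal sketch of the LSO and LRO rules, assuming the Geneve
   header and its options are handled as one opaque byte string.  The
   function names lso_segments and lro_can_merge and the mss parameter
   are illustrative only; a real NIC also adjusts outer and inner
   headers (lengths, checksums, sequence numbers), which is omitted
   here.

```python
from typing import List


def lso_segments(geneve_header: bytes, payload: bytes, mss: int) -> List[bytes]:
    """Split payload into segments, each with a verbatim header copy.

    geneve_header is the full Geneve header including all options.  It
    is never parsed, so unknown options are replicated unchanged and in
    their original order.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    chunks = [payload[i:i + mss] for i in range(0, len(payload), mss)]
    if not chunks:
        chunks = [b""]  # an empty payload still yields one segment
    return [geneve_header + chunk for chunk in chunks]


def lro_can_merge(header_a: bytes, header_b: bytes) -> bool:
    """Binary comparison of two Geneve headers, options included."""
    return header_a == header_b
```

   Because neither function interprets the option contents, adding a
   new option definition would not require any change to this logic.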

1133	4.7.  Inner VLAN Handling

1135	   Geneve is capable of encapsulating a wide range of protocols and
1136	   therefore a given implementation is likely to support only a small
1137	   subset of the possibilities.  However, as Ethernet is expected to be
1138	   widely deployed, it is useful to describe the behavior of VLANs
1139	   inside encapsulated Ethernet frames.

1141	   As with any protocol, support for inner VLAN headers is OPTIONAL.  In
1142	   many cases, the use of encapsulated VLANs may be disallowed due to
1143	   security or implementation considerations.  However, in other cases
1144	   trunking of VLAN frames across a Geneve tunnel can prove useful.  As
1145	   a result, the processing of inner VLAN tags upon ingress or egress
1146	   from a tunnel endpoint is based upon the configuration of the tunnel
1147	   endpoint and/or control plane and not explicitly defined as part of
1148	   the data format.
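
   As an illustration of configuration-driven handling, the sketch
   below checks whether an encapsulated Ethernet frame carries a VLAN
   tag and applies a per-endpoint setting.  The allow_inner_vlans flag
   and both function names are hypothetical; the TPID values 0x8100 and
   0x88A8 are those used for IEEE 802.1Q C-tags and S-tags.

```python
import struct
from typing import Optional

VLAN_TPIDS = {0x8100, 0x88A8}  # IEEE 802.1Q C-tag and S-tag TPIDs


def inner_vlan_id(frame: bytes) -> Optional[int]:
    """Return the outermost VLAN ID of an Ethernet frame, or None."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype not in VLAN_TPIDS:
        return None
    if len(frame) < 18:
        raise ValueError("truncated VLAN tag")
    (tci,) = struct.unpack_from("!H", frame, 14)
    return tci & 0x0FFF  # low 12 bits of the TCI hold the VLAN ID


def accept_inner_frame(frame: bytes, allow_inner_vlans: bool) -> bool:
    """Hypothetical policy: reject tagged frames unless trunking is on."""
    return allow_inner_vlans or inner_vlan_id(frame) is None
```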

1150	5.  Interoperability Issues

1152	   Viewed exclusively from the data plane, Geneve does not introduce any
1153	   interoperability issues as it appears to most devices as UDP packets.
1154	   However, as there are already a number of tunnel protocols deployed
1155	   in network virtualization environments, there is a practical question
1156	   of transition and coexistence.

1158	   Since Geneve is a superset of the functionality of the most common
1159	   protocols used for network virtualization (VXLAN, NVGRE), it should
1160	   be straightforward to port an existing control plane to run on top of
1161	   it with minimal effort.  With both the old and new packet formats
1162	   supporting the same set of capabilities, there is no need for a hard
1163	   transition - tunnel endpoints directly communicating with each other
1164	   use any common protocol, which may be different even within a single
1165	   overall system.  As transit devices are primarily forwarding packets
1166	   on the basis of the IP header, all protocols appear similar and these
1167	   devices do not introduce additional interoperability concerns.

1169	   To assist with this transition, it is strongly suggested that
1170	   implementations support simultaneous operation of both Geneve and
1171	   existing tunnel protocols as it is expected to be common for a single
1172	   node to communicate with a mixture of other nodes.  Eventually, older
1173	   protocols may be phased out as they are no longer in use.

1175	6.  Security Considerations

1177	   Encapsulated within a UDP/IP packet, Geneve does not have any
1178	   inherent security mechanisms.  As a result, an attacker with access
1179	   to the underlay network transporting the IP packets has the ability
1180	   to snoop or inject packets.  Compromised tunnel endpoints may also
1181	   spoof identifiers in the tunnel header to gain access to networks
1182	   owned by other tenants.

1184	   Within a particular security domain, such as a data center operated
1185	   by a single service provider, the most common and highest performing
1186	   security mechanism is isolation of trusted components.  Tunnel
1187	   traffic can be carried over a separate VLAN and filtered at any
1188	   untrusted boundaries.  In addition, tunnel endpoints should only be
1189	   operated in environments controlled by the service provider, such as
1190	   the hypervisor itself rather than within a customer VM.

1192	   When crossing an untrusted link, such as the public Internet, IPsec
1193	   [RFC4301] may be used to provide authentication and/or encryption of
1194	   the IP packets formed as part of Geneve encapsulation.

1196	   Geneve does not otherwise affect the security of the encapsulated
1197	   packets.  As per the guidelines of BCP 72 [RFC3552], the following
1198	   sections describe potential security risks that may be applicable to
1199	   Geneve deployments and approaches to mitigate such risks.  It is also
1200	   noted that not all such risks are applicable to all Geneve deployment
1201	   scenarios, i.e., only a subset may be applicable to certain
1202	   deployments.  So an operator has to make an assessment based on their
1203	   network environment and determine the risks that are applicable to
1204	   their specific environment and use appropriate mitigation approaches
1205	   as applicable.

1207	6.1.  Data Confidentiality

1209	   Geneve is a network virtualization overlay encapsulation protocol
1210	   designed to establish tunnels between NVEs over an existing IP
1211	   network.  It can be used to deploy multi-tenant overlay networks over
1212	   an existing IP underlay network in a public or private data center.
1213	   The overlay service is typically provided by a service provider, for
1214	   example a cloud services provider or a private data center operator,
1215	   which may or may not be the same provider as an underlay service
1216	   provider.  Due to the nature of multi-tenancy in such environments, a
1217	   tenant system may expect data confidentiality to ensure its packet
1218	   data is not tampered with (active attack) in transit or a target of
1219	   unauthorized monitoring (passive attack).  A tenant may expect the
1220	   overlay service provider to provide data confidentiality as part of
1221	   the service or a tenant may bring its own data confidentiality
1222	   mechanisms like IPsec or TLS to protect the data end to end between
1223	   its tenant systems.

1225	   If an operator determines data confidentiality is necessary in their
1226	   environment based on their risk analysis, for example as in multi-
1227	   tenant environments, then an encryption mechanism SHOULD be used to
1228	   encrypt the tenant data end to end between the NVEs.  The NVEs may
1229	   use existing well established encryption mechanisms such as IPsec,
1230	   DTLS, etc.

1232	6.1.1.  Inter-Data Center Traffic

1234	   A tenant system in a customer premises (private data center) may want
1235	   to connect to tenant systems on their tenant overlay network in a
1236	   public cloud data center or a tenant may want to have its tenant
1237	   systems located in multiple geographically separated data centers for
1238	   high availability.  Geneve data traffic between tenant systems across
1239	   such separated networks should be protected from threats when
1240	   traversing public networks.  Any Geneve overlay data leaving the data
1241	   center network beyond the operator's security domain SHOULD be
1242	   secured by encryption mechanisms such as IPsec or other VPN
1243	   mechanisms to protect the communications between the NVEs when they
1244	   are geographically separated over untrusted network links.
1245	   Specification of data protection mechanisms employed between data
1246	   centers is beyond the scope of this document.

1248	6.2.  Data Integrity

1250	   Geneve encapsulation is used between NVEs to establish overlay
1251	   tunnels over an existing IP underlay network.  In a multi-tenant data
1252	   center, a rogue or compromised tenant system may try to launch a
1253	   passive attack such as monitoring the traffic of other tenants, or an
1254	   active attack such as trying to inject unauthorized Geneve
1255	   encapsulated traffic such as spoofing, replay, etc., into the
1256	   network.  To prevent such attacks, an NVE MUST NOT propagate Geneve
1257	   packets beyond the NVE to tenant systems and SHOULD employ packet
1258	   filtering mechanisms so as not to forward unauthorized traffic
1259	   between TSs in different tenant networks.
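
   One way to picture such filtering, as a sketch only: the NVE keeps a
   separate forwarding table per VNI and resolves the inner destination
   strictly within the table selected by the VNI of the received
   packet.  The table layout and the name egress_port are assumptions
   made for this example.

```python
from typing import Dict, Optional

# Assumption: one MAC-to-port table per VNI, populated by configuration
# or a control plane.
ForwardingTables = Dict[int, Dict[str, str]]


def egress_port(vni: int, dst_mac: str, tables: ForwardingTables) -> Optional[str]:
    """Resolve dst_mac only inside the tenant network named by vni.

    Returns None when the VNI is unknown or the destination is not part
    of that tenant network; the caller is expected to drop the frame.
    """
    return tables.get(vni, {}).get(dst_mac)
```

   Since the lookup never consults another tenant's table, a frame
   received on one VNI cannot be delivered to a tenant system attached
   to a different VNI.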

1261	   A compromised network node or a transit device within a data center
1262	   may launch an active attack trying to tamper with the Geneve packet
1263	   data between NVEs.  Malicious tampering of Geneve header fields may
1264	   cause the packet from one tenant to be forwarded to a different
1265	   tenant network.  If an operator determines the possibility of such
1266	   threat in their environment, the operator may choose to employ data
1267	   integrity mechanisms between NVEs.  In order to prevent such risks, a
1268	   data integrity mechanism SHOULD be used in such environments to
1269	   protect the integrity of Geneve packets including packet headers,
1270	   options and payload on communications between NVE pairs.  A
1271	   cryptographic data protection mechanism such as IPsec may be used to
1272	   provide data integrity protection.  A data center operator may choose
1273	   to deploy any other data integrity mechanisms as applicable and
1274	   supported in their underlay networks.

1276	6.3.  Authentication of NVE peers

1278	   A rogue network device or a compromised NVE in a data center
1279	   environment might be able to spoof Geneve packets as if they came
1280	   from a legitimate NVE.  In order to mitigate such a risk, an operator
1281	   SHOULD use an authentication mechanism, such as IPsec, to ensure that
1282	   the Geneve packet originated from the intended NVE peer, in
1283	   environments where the operator determines spoofing or rogue devices
1284	   are a potential threat.  Other simpler source checks such as ingress
1285	   filtering for VLAN/MAC/IP address, reverse path forwarding checks,
1286	   etc., may be used in certain trusted environments to ensure Geneve
1287	   packets originated from the intended NVE peer.
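
   The simpler source checks mentioned above could look like the
   following sketch, which compares the outer source address of a
   received packet with a configured list of NVE peers.  The function
   name and the peer list format are assumptions; this is address
   filtering, not cryptographic authentication, and it offers no
   protection where source addresses can be forged.

```python
import ipaddress
from typing import Iterable


def from_configured_peer(outer_src: str, peers: Iterable[str]) -> bool:
    """Return True if outer_src falls within any configured peer prefix.

    Each entry in peers is an address or prefix in text form, IPv4 or
    IPv6.  An address never matches a prefix of the other IP version.
    """
    addr = ipaddress.ip_address(outer_src)
    return any(addr in ipaddress.ip_network(p) for p in peers)
```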

1289	6.4.  Options Interpretation by Transit Devices

1291	   Options, if present in the packet, are generated and terminated by
1292	   tunnel endpoints.  As indicated in Section 2.2.1, transit devices may
1293	   interpret the options.  However, if the packet is protected by tunnel
1294	   endpoint to tunnel endpoint encryption, for example through IPsec,
1295	   transit devices will not have visibility into the Geneve header or
1296	   options in the packet.  In such cases transit devices MUST handle
1297	   Geneve packets as any other IP packet and maintain consistent
1298	   forwarding behavior.  In cases where options are interpreted by
1299	   transit devices, the operator MUST ensure that transit devices are
1300	   trusted and not compromised.  Implementation of a mechanism to ensure
1301	   this trust is beyond the scope of this document.

1303	6.5.  Multicast/Broadcast

1305	   In typical data center networks where IP multicasting is not
1306	   supported in the underlay network, multicasting may be supported
1307	   using multiple unicast tunnels.  The same security requirements as
1308	   described in the above sections can be used to protect Geneve
1309	   communications between NVE peers.  If IP multicasting is supported in
1310	   the underlay network and the operator chooses to use it for multicast
1311	   traffic among tunnel endpoints, then the operator in such
1312	   environments may use data protection mechanisms such as IPsec with
1313	   Multicast extensions [RFC5374] to protect multicast traffic among
1314	   Geneve NVE groups.

1316	6.6.  Control Plane Communications

1318	   A Network Virtualization Authority (NVA) as outlined in [RFC8014] may
1319	   be used as a control plane for configuring and managing the Geneve
1320	   NVEs.  The data center operator is expected to use security
1321	   mechanisms to protect the communications between the NVA and NVEs and
1322	   use authentication mechanisms to detect any rogue or compromised NVEs
1323	   within their administrative domain.  Data protection mechanisms for
1324	   control plane communication or authentication mechanisms between the
1325	   NVA and the NVEs are beyond the scope of this document.

1327	7.  IANA Considerations

1329	   IANA has allocated UDP port 6081 as the well-known destination port
1330	   for Geneve.  Upon publication, the registry should be updated to cite
1331	   this document.  The original request was:

1333	   Service Name: geneve
1334	   Transport Protocol(s): UDP
1335	   Assignee: Jesse Gross <jesse@kernel.org>
1336	   Contact: Jesse Gross <jesse@kernel.org>
1337	   Description: Generic Network Virtualization Encapsulation (Geneve)
1338	   Reference: This document
1339	   Port Number: 6081

1341	   In addition, IANA is requested to create a "Geneve Option Class"
1342	   registry to allocate Option Classes.  This shall be a registry of
1343	   16-bit hexadecimal values along with descriptive strings.  The
1344	   identifiers 0x0000-0x00FF are to be reserved for standardized
1345	   options for allocation by IETF Review [RFC8126] and 0xFFF0-0xFFFF
1346	   for Experimental Use.  Otherwise, identifiers are to be assigned to
1347	   any organization with an interest in creating Geneve options on a
1348	   First Come First Served basis.  The registry is to be populated with
1349	   the following initial values:

1351	         +----------------+--------------------------------------+
1352	         | Option Class   | Description                          |
1353	         +----------------+--------------------------------------+
1354	         | 0x0000..0x00FF | Unassigned - IETF Review             |
1355	         | 0x0100         | Linux                                |
1356	         | 0x0101         | Open vSwitch (OVS)                   |
1357	         | 0x0102         | Open Virtual Networking (OVN)        |
1358	         | 0x0103         | In-band Network Telemetry (INT)      |
1359	         | 0x0104         | VMware, Inc.                         |
1360	         | 0x0105         | Amazon.com, Inc.                     |
1361	         | 0x0106         | Cisco Systems, Inc.                  |
1362	         | 0x0107         | Oracle Corporation                   |
1363	         | 0x0108..0x0110 | Amazon.com, Inc.                     |
1364	         | 0x0111..0xFFEF | Unassigned - First Come First Served |
1365	         | 0xFFF0..0xFFFF | Experimental                         |
1366	         +----------------+--------------------------------------+
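
   The allocation ranges described above can be expressed as a small
   lookup.  This is an illustrative sketch; the function name
   option_class_policy is hypothetical and the returned strings simply
   echo the policies named in this section.

```python
def option_class_policy(option_class: int) -> str:
    """Map a 16-bit Option Class value to its allocation policy."""
    if not 0 <= option_class <= 0xFFFF:
        raise ValueError("Option Class is a 16-bit value")
    if option_class <= 0x00FF:
        return "IETF Review"
    if option_class >= 0xFFF0:
        return "Experimental Use"
    return "First Come First Served"
```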

1368	8.  Contributors

1370	   The following individuals were authors of an earlier version of this
1371	   document and made significant contributions:

1373	   Pankaj Garg
1374	   Microsoft Corporation
1375	   1 Microsoft Way
1376	   Redmond, WA  98052
1377	   USA

1379	   Email: pankajg@microsoft.com

1381	   Chris Wright
1382	   Red Hat Inc.
1383	   1801 Varsity Drive
1384	   Raleigh, NC  27606
1385	   USA

1387	   Email: chrisw@redhat.com

1389	   Kenneth Duda
1390	   Arista Networks
1391	   5453 Great America Parkway
1392	   Santa Clara, CA  95054
1393	   USA

1395	   Email: kduda@arista.com

1397	   Dinesh G. Dutt
1398	   Independent

1400	   Email: didutt@gmail.com

1402	   Jon Hudson
1403	   Independent

1405	   Email: jon.hudson@gmail.com

1407	   Ariel Hendel
1408	   Facebook, Inc.
1409	   1 Hacker Way
1410	   Menlo Park, CA  94025
1411	   USA

1413	   Email: ahendel@fb.com

1415	9.  Acknowledgements

1417	   The authors wish to thank Martin Casado, Bruce Davie and Dave Thaler
1418	   for their input, feedback, and helpful suggestions.

1420	   The authors would like to thank Magnus Nystrom for his reviews and
1421	   feedback.

1423	   Thanks to Daniel Migault, Anoop Ghanwani, Greg Mirsky, Puneet
1424	   Agarwal, and Tal Mizrahi for their reviews, comments and feedback.

1426	   The authors would like to thank David Black for his detailed reviews
1427	   and valuable inputs.

1429	   Thanks to Sami Boutros for his inputs and helpful feedback.

1431	   The authors would like to thank Matthew Bocci, Sam Aldrin, Benson
1432	   Schliesser, Martin Vigoureux, and Alia Atlas for their guidance
1433	   throughout the process.

1435	10.  References

1437	10.1.  Normative References

1439	   [RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768,
1440	              DOI 10.17487/RFC0768, August 1980,
1441	              <https://www.rfc-editor.org/info/rfc768>.

1443	   [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5,
1444	              RFC 792, DOI 10.17487/RFC0792, September 1981,
1445	              <https://www.rfc-editor.org/info/rfc792>.

1447	   [RFC1112]  Deering, S., "Host extensions for IP multicasting", STD 5,
1448	              RFC 1112, DOI 10.17487/RFC1112, August 1989,
1449	              <https://www.rfc-editor.org/info/rfc1112>.

1451	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1452	              Requirement Levels", BCP 14, RFC 2119,
1453	              DOI 10.17487/RFC2119, March 1997,
1454	              <https://www.rfc-editor.org/info/rfc2119>.

1456	   [RFC4443]  Conta, A., Deering, S., and M. Gupta, Ed., "Internet
1457	              Control Message Protocol (ICMPv6) for the Internet
1458	              Protocol Version 6 (IPv6) Specification", STD 89,
1459	              RFC 4443, DOI 10.17487/RFC4443, March 2006,
1460	              <https://www.rfc-editor.org/info/rfc4443>.

1462	   [RFC6935]  Eubanks, M., Chimento, P., and M. Westerlund, "IPv6 and
1463	              UDP Checksums for Tunneled Packets", RFC 6935,
1464	              DOI 10.17487/RFC6935, April 2013,
1465	              <https://www.rfc-editor.org/info/rfc6935>.

1467	   [RFC6936]  Fairhurst, G. and M. Westerlund, "Applicability Statement
1468	              for the Use of IPv6 UDP Datagrams with Zero Checksums",
1469	              RFC 6936, DOI 10.17487/RFC6936, April 2013,
1470	              <https://www.rfc-editor.org/info/rfc6936>.

1472	   [RFC8085]  Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
1473	              Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
1474	              March 2017, <https://www.rfc-editor.org/info/rfc8085>.

1476	   [RFC8126]  Cotton, M., Leiba, B., and T. Narten, "Guidelines for
1477	              Writing an IANA Considerations Section in RFCs", BCP 26,
1478	              RFC 8126, DOI 10.17487/RFC8126, June 2017,
1479	              <https://www.rfc-editor.org/info/rfc8126>.

1481	   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
1482	              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
1483	              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

1485	10.2.  Informative References

1487	   [ETYPES]   The IEEE Registration Authority, "IEEE 802 Numbers", 2013,
1488	              <http://www.iana.org/assignments/ieee-802-numbers/ieee-
1489	              802-numbers.xml>.

1491	   [I-D.ietf-intarea-tunnels]
1492	              Touch, J. and M. Townsley, "IP Tunnels in the Internet
1493	              Architecture", draft-ietf-intarea-tunnels-09 (work in
1494	              progress), July 2018.

1496	   [I-D.ietf-nvo3-dataplane-requirements]
1497	              Bitar, N., Lasserre, M., Balus, F., Morin, T., Jin, L.,
1498	              and B. Khasnabish, "NVO3 Data Plane Requirements", draft-
1499	              ietf-nvo3-dataplane-requirements-03 (work in progress),
1500	              April 2014.

1502	   [I-D.ietf-nvo3-encap]
1503	              Boutros, S., "NVO3 Encapsulation Considerations", draft-
1504	              ietf-nvo3-encap-02 (work in progress), September 2018.

1506	   [IEEE.802.1Q_2014]
1507	              IEEE, "IEEE Standard for Local and metropolitan area
1508	              networks--Bridges and Bridged Networks", IEEE 802.1Q-2014,
1509	              DOI 10.1109/ieeestd.2014.6991462, December 2014,
1510	              <http://ieeexplore.ieee.org/servlet/
1511	              opac?punumber=6991460>.

1513	   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
1514	              DOI 10.17487/RFC1191, November 1990,
1515	              <https://www.rfc-editor.org/info/rfc1191>.

1517	   [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003,
1518	              DOI 10.17487/RFC2003, October 1996,
1519	              <https://www.rfc-editor.org/info/rfc2003>.

1521	   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
1522	              (IPv6) Specification", RFC 2460, DOI 10.17487/RFC2460,
1523	              December 1998, <https://www.rfc-editor.org/info/rfc2460>.

1525	   [RFC2983]  Black, D., "Differentiated Services and Tunnels",
1526	              RFC 2983, DOI 10.17487/RFC2983, October 2000,
1527	              <https://www.rfc-editor.org/info/rfc2983>.

1529	   [RFC3031]  Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol
1530	              Label Switching Architecture", RFC 3031,
1531	              DOI 10.17487/RFC3031, January 2001,
1532	              <https://www.rfc-editor.org/info/rfc3031>.

1534	   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC
1535	              Text on Security Considerations", BCP 72, RFC 3552,
1536	              DOI 10.17487/RFC3552, July 2003,
1537	              <https://www.rfc-editor.org/info/rfc3552>.

1539	   [RFC3985]  Bryant, S., Ed. and P. Pate, Ed., "Pseudo Wire Emulation
1540	              Edge-to-Edge (PWE3) Architecture", RFC 3985,
1541	              DOI 10.17487/RFC3985, March 2005,
1542	              <https://www.rfc-editor.org/info/rfc3985>.

1544	   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
1545	              Internet Protocol", RFC 4301, DOI 10.17487/RFC4301,
1546	              December 2005, <https://www.rfc-editor.org/info/rfc4301>.

1548	   [RFC5374]  Weis, B., Gross, G., and D. Ignjatic, "Multicast
1549	              Extensions to the Security Architecture for the Internet
1550	              Protocol", RFC 5374, DOI 10.17487/RFC5374, November 2008,
1551	              <https://www.rfc-editor.org/info/rfc5374>.

1553	   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
1554	              Notification", RFC 6040, DOI 10.17487/RFC6040, November
1555	              2010, <https://www.rfc-editor.org/info/rfc6040>.

1557	   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
1558	              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
1559	              eXtensible Local Area Network (VXLAN): A Framework for
1560	              Overlaying Virtualized Layer 2 Networks over Layer 3
1561	              Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014,
1562	              <https://www.rfc-editor.org/info/rfc7348>.

1564	   [RFC7365]  Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
1565	              Rekhter, "Framework for Data Center (DC) Network
1566	              Virtualization", RFC 7365, DOI 10.17487/RFC7365, October
1567	              2014, <https://www.rfc-editor.org/info/rfc7365>.

1569	   [RFC7637]  Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network
1570	              Virtualization Using Generic Routing Encapsulation",
1571	              RFC 7637, DOI 10.17487/RFC7637, September 2015,
1572	              <https://www.rfc-editor.org/info/rfc7637>.

1574	   [RFC8014]  Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T.
1575	              Narten, "An Architecture for Data-Center Network
1576	              Virtualization over Layer 3 (NVO3)", RFC 8014,
1577	              DOI 10.17487/RFC8014, December 2016,
1578	              <https://www.rfc-editor.org/info/rfc8014>.

1580	   [RFC8086]  Yong, L., Ed., Crabbe, E., Xu, X., and T. Herbert, "GRE-
1581	              in-UDP Encapsulation", RFC 8086, DOI 10.17487/RFC8086,
1582	              March 2017, <https://www.rfc-editor.org/info/rfc8086>.

1584	   [RFC8201]  McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed.,
1585	              "Path MTU Discovery for IP version 6", STD 87, RFC 8201,
1586	              DOI 10.17487/RFC8201, July 2017,
1587	              <https://www.rfc-editor.org/info/rfc8201>.

1589	   [RFC8293]  Ghanwani, A., Dunbar, L., McBride, M., Bannai, V., and R.
1590	              Krishnan, "A Framework for Multicast in Network
1591	              Virtualization over Layer 3", RFC 8293,
1592	              DOI 10.17487/RFC8293, January 2018,
1593	              <https://www.rfc-editor.org/info/rfc8293>.

1595	   [VL2]      "VL2: A Scalable and Flexible Data Center Network", ACM
1596	              SIGCOMM Computer Communication Review,
1597	              DOI 10.1145/1594977.1592576, 2009,
1598	              <http://www.sigcomm.org/sites/default/files/ccr/
1599	              papers/2009/October/1594977-1592576.pdf>.

1601	Authors' Addresses

1603	   Jesse Gross (editor)

1605	   Email: jesse@kernel.org

1607	   Ilango Ganga (editor)
1608	   Intel Corporation
1609	   2200 Mission College Blvd.
1610	   Santa Clara, CA  95054
1611	   USA

1613	   Email: ilango.s.ganga@intel.com

1615	   T. Sridhar (editor)
1616	   VMware, Inc.
1617	   3401 Hillview Ave.
1618	   Palo Alto, CA  94304
1619	   USA

1621	   Email: tsridhar@vmware.com