Internet Engineering Task Force                          T. Narten, Ed.
Internet-Draft                                                      IBM
Intended status: Informational                             M. Sridharan
Expires: May 3, 2012                                           Microsoft
                                                                 D. Dutt
                                                                   Cisco
                                                                D. Black
                                                                     EMC
                                                              L. Kreeger
                                                                   Cisco
                                                        October 31, 2011

         Problem Statement: Overlays for Network Virtualization
             draft-narten-nvo3-overlay-problem-statement-01

Abstract

This document describes issues associated with providing multi-tenancy in large data center networks and an overlay-based network virtualization approach to addressing them. A key multi-tenancy requirement is traffic isolation, so that a tenant's traffic is not visible to any other tenant. This isolation can be achieved by assigning one or more virtual networks to each tenant such that traffic within a virtual network is isolated from traffic in other virtual networks. The primary functionality required is provisioning virtual networks, associating a virtual machine's NIC with the appropriate virtual network, and maintaining that association as the virtual machine is activated, migrated and/or deactivated. Use of an overlay-based approach enables scalable deployment on large network infrastructures.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on May 3, 2012.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Problem Details
       2.1.  Multi-tenant Environment Scale
       2.2.  Virtual Machine Mobility Requirements
       2.3.  Span of Virtual Networks
       2.4.  Inadequate Forwarding Table Sizes in Switches
       2.5.  Decoupling Logical and Physical Configuration
       2.6.  Support Communication Between VMs and Non-virtualized Devices
       2.7.  Overlay Design Characteristics
   3.  Defining Virtual Networks and Tenants
       3.1.  Limitations of Existing Virtual Network Models
       3.2.  Virtual Network Instance
       3.3.  Tenant
   4.  Network Overlays
       4.1.  Benefits of an Overlay Approach
       4.2.  Standardization Issues for Overlay Networks
             4.2.1.  Overlay Header Format
             4.2.2.  Fragmentation
             4.2.3.  Checksums and FCS
             4.2.4.  Middlebox Traversal
             4.2.5.  OAM
   5.  Control Plane
       5.1.  Populating the Forwarding Table of a Virtual Network Instance
       5.2.  Handling Multi-destination Frames
       5.3.  Associating a VNID With An Endpoint
       5.4.  Disassociating a VNID on Termination or Move
   6.  Related Work
       6.1.  ARMD
       6.2.  TRILL
       6.3.  L2VPNs
       6.4.  Proxy Mobile IP
       6.5.  LISP
       6.6.  Individual Submissions
   7.  Further Work
   8.  Summary
   9.  Acknowledgments
   10. IANA Considerations
   11. Security Considerations
   12. Informative References
   Authors' Addresses

1.  Introduction

Server virtualization is increasingly becoming the norm in data centers. With server virtualization, each physical server supports multiple virtual machines (VMs), each running its own operating system, middleware and applications. Virtualization is a key enabler of workload agility, i.e., allowing any server to host any application and providing the flexibility of adding, shrinking, or moving services within the physical infrastructure. Server virtualization provides numerous benefits, including higher utilization, increased data security, reduced user downtime, reduced power usage, etc.

Large scale multi-tenant data centers are taking advantage of the benefits of server virtualization to provide a new kind of hosting, a virtual hosted data center. Multi-tenant data centers are ones in which each tenant could belong to a different company (in the case of a public provider) or a different department (in the case of an internal company data center). Each tenant has the expectation of a level of security and privacy separating their resources from those of other tenants. Each virtual data center looks similar to its physical counterpart, consisting of end stations connected by a network, complete with services such as load balancers and firewalls. The network within each virtual data center can be a pure routed network, a pure bridged network or a combination of bridged and routed networks. The key requirement is that each such virtual network is isolated from the others, whether the networks belong to the same tenant or different tenants.

This document outlines the problems encountered in scaling the number of isolated networks in a data center, as well as the problems of managing the creation/deletion, membership and span of these networks. It makes the case that an overlay-based approach, in which individual networks are implemented as virtual networks that are dynamically controlled by a standardized control plane, provides a number of advantages over current approaches. The purpose of this document is to identify the set of problems that any solution has to address in building multi-tenant data centers. With this approach, the goal is to enable standardized, interoperable implementations that allow the construction of multi-tenant data centers.

Section 2 describes the problem space details. Section 3 defines virtual networks. Section 4 provides a general discussion of overlays and standardization issues. Section 5 discusses the control plane issues that require addressing for virtual networks. Sections 6 and 7 discuss related work and further work.

2.  Problem Details

The following subsections describe aspects of multi-tenant networking that pose problems for large scale network infrastructure. Different problem aspects may arise based on the network architecture and scale.

2.1.  Multi-tenant Environment Scale

Cloud computing involves on-demand elastic provisioning of resources for multi-tenant environments. A common example of cloud computing is the public cloud, where a cloud service provider offers these elastic services to multiple customers over the same infrastructure. This elastic, on-demand nature, in conjunction with trusted hypervisors that control network access by VMs, calls for resilient, distributed network control mechanisms.

2.2.  Virtual Machine Mobility Requirements

A key benefit of server virtualization is virtual machine (VM) mobility. A VM can be migrated from one server to another live, i.e., while it continues to run, without shutting down the VM and restarting it at a new location. A key requirement for live migration is that a VM retain its IP address(es) and MAC address(es) in its new location (to avoid tearing down existing communication). Today, servers are assigned IP addresses based on their physical location, typically based on the ToR (Top of Rack) switch for the server rack or the VLAN configured to the server. This works well for physical servers, which cannot move, but it restricts the placement and movement of the more mobile VMs within the data center (DC). Any solution for a scalable multi-tenant DC must allow a VM to be placed in, or moved to, any location within the data center, without being constrained by the subnet boundary concerns of the host servers.

2.3.  Span of Virtual Networks

Another use case is cross-pod expansion. A pod typically consists of one or more racks of servers with associated network and storage connectivity. Tenants may start off on one pod and, due to expansion, require servers/VMs on other pods, especially when tenants on the other pods are not fully utilizing all their resources. This use case requires that virtual networks span multiple pods in order to provide connectivity to all of a tenant's servers/VMs.

2.4.  Inadequate Forwarding Table Sizes in Switches

Today's virtualized environments place additional demands on the forwarding tables of switches. Instead of just one link-layer address per server, the switching infrastructure has to learn the addresses of the individual VMs (which could range in the hundreds per server), since traffic between the VMs and the rest of the network traverses the physical network infrastructure. This places a much larger demand on the switches' forwarding table capacity compared to non-virtualized environments, causing more traffic to be flooded or dropped when the number of addresses in use exceeds the forwarding table capacity.
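
A rough calculation can make this demand concrete. The sketch below is illustrative only; the rack, server and VM counts are assumptions, not figures from this document:

   # Illustrative only: assumed deployment sizes, not data from this document.
   racks = 20
   servers_per_rack = 40
   vms_per_server = 100                 # "could range in the hundreds per server"

   physical_macs = racks * servers_per_rack            # 800 entries, one per server
   virtualized_macs = physical_macs * vms_per_server   # 80,000 entries, one per VM

   print(physical_macs, virtualized_macs)

Even with these assumed numbers, the number of addresses the switching infrastructure must learn grows by two orders of magnitude, which is the pressure on forwarding table capacity described above.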

2.5.  Decoupling Logical and Physical Configuration

Data center operators must be able to achieve high utilization of server and network capacity. For efficient and flexible allocation, operators should be able to spread a virtual network instance across servers in any rack in the data center. It should also be possible to migrate compute workloads to any server anywhere in the network while retaining the workload's addresses. This can be achieved today by stretching VLANs (e.g., by using TRILL or OTV).

However, in order to limit the broadcast domain of each VLAN, multi-destination frames within a VLAN should optimally flow only to those devices that have that VLAN configured. When workloads migrate, the physical network (e.g., access lists) may need to be reconfigured, which is typically time consuming and error prone.

2.6.  Support Communication Between VMs and Non-virtualized Devices

Within data centers, not all communication will be between VMs. Network operators will continue to use non-virtualized servers for various reasons, as well as traditional routers to provide L2VPN and L3VPN services, traditional load balancers, firewalls, intrusion detection engines and so on. Any virtual network solution should be capable of working with these existing systems.

2.7.  Overlay Design Characteristics

Layer 2 overlay protocols already exist, but they were not necessarily designed to solve the problem in the environment of a highly virtualized data center. Below are some characteristics of such environments that must be taken into account by the overlay technology:

1. Highly distributed systems. The overlay should work in an environment where there could be many thousands of access switches (e.g., residing within the hypervisors) and many more end systems (e.g., VMs) connected to them. This calls for a distributed mapping system that puts a low overhead on the overlay tunnel endpoints.

2. Many highly distributed virtual networks with sparse connectivity. Each virtual network could be highly dispersed inside the data center. Also, along with the expectation of many virtual networks, the number of end systems connected to any one virtual network is expected to be relatively low; therefore, the percentage of access switches participating in any given virtual network would also be expected to be low. For this reason, efficient pruning of multi-destination traffic should be taken into consideration.

3. Highly dynamic end systems. End systems connected to virtual networks can be very dynamic, both in terms of creation/deletion/power-on/off and in terms of mobility across the access switches.

4. Work with existing, widely deployed Ethernet switches and IP routers without requiring wholesale replacement. The first-hop switch that adds and removes the overlay header will require new equipment and/or new software.

5. Network infrastructure administered by a single administrative domain. This is consistent with operation within a data center, and not across the Internet.

3.  Defining Virtual Networks and Tenants

Virtual networks are used to isolate a tenant's traffic from that of other tenants (or even traffic within the same tenant that requires isolation). There are two main characteristics of virtual networks:

1. Providing a network address space that is isolated from other virtual networks. The same network addresses may be used in different virtual networks on the same underlying network infrastructure.

2. Limiting the scope of frames so that they do not exit a virtual network except through controlled exit points or "gateways".

3.1.  Limitations of Existing Virtual Network Models

Virtual networks are not new to networking. VLANs are a well-known construct in the networking industry. A VLAN is a bridging construct that provides the semantics of virtual networks mentioned above: a MAC address is unique within a VLAN, but not necessarily across VLANs, and broadcast traffic is limited to the VLAN it originates from.

In the case of IP networks, routers have the concept of a Virtual Routing and Forwarding (VRF) instance. The same router can run multiple instances of routing protocols, each with its own forwarding table. Each instance is referred to as a VRF, which is a mechanism that provides address isolation. Since broadcasts are never forwarded across IP subnets, limiting broadcasts is not applicable to VRFs. In the case of both VLANs and VRFs, the forwarding table is looked up using the tuple {VLAN, MAC address} or {VRF, IP address}.

But there are two problems with these constructs. VLANs are a pure bridging construct while a VRF is a pure routing construct. A VLAN tag is carried along with a frame to allow each forwarding point to know what VLAN the frame belongs to. A VLAN ID today is defined as a 12-bit number, limiting the total number of VLANs to 4096 (though typically this number is 4094, since 0 and 4095 are reserved). Due to the large number of tenants that a cloud provider might service, the 4094 VLAN limit is often inadequate. In addition, there is often a need for multiple VLANs per tenant, which exacerbates the issue.

There is no VRF indicator carried in frames. The VRF is derived at each hop using a combination of the incoming interface and some information in the frame. Furthermore, the VRF model has typically assumed that a separate control plane governs the population of the forwarding table within that VRF. Thus, a traditional VRF model assumes multiple, independent control planes and has no specific tag within a frame to identify the VRF of the frame.

3.2.  Virtual Network Instance

To overcome the limitations of a traditional VLAN or VRF model, we define a new mechanism for virtual networks called a virtual network instance. Each virtual network is assigned a virtual network instance ID, shortened to VNID for convenience. A virtual network instance provides the semantics of a virtual network: address disambiguation and multi-destination frame scoping. A virtual network can be either routed or bridged, so a VNID can be used for both bridged networks and routed networks; in this respect it is unlike a VLAN or a VRF. To build large multi-tenant data centers, a larger number space than the 12-bit VLAN ID is required. 24 bits is the most common value identified by multiple solutions that attempt to address this problem space (or similar problem spaces). To simplify the building and administration of these large data centers, we require that the VNID be carried with each frame (similar to a VLAN, but unlike a VRF). Finally, because of the nature of a virtual data center and to allow scaling virtual networks to massive scales, we do not require a separate control plane to run for each virtual network. We identify other possible mechanisms to populate the forwarding tables for virtual networks in Section 5.1.

3.3.  Tenant

A tenant is the administrative entity that is responsible for and manages a specific virtual network and its associated services (whether virtual or physical). In a cloud environment, a tenant would correspond to the customer that has defined and is using a particular virtual network. However, there is a one-to-many mapping between tenants and virtual network instances: a single tenant may operate multiple individual virtual networks, each associated with a different service.
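
To make the semantics of Sections 3.1 and 3.2 concrete, the sketch below shows a forwarding table keyed by the tuple {VNID, inner address}, generalizing the {VLAN, MAC address} and {VRF, IP address} tuples described above. It is illustrative only; the identifiers and addresses are assumptions, and no particular data structure is mandated by this document.

   # Illustrative sketch: a forwarding table keyed by (VNID, inner address).
   # All identifiers and addresses below are assumptions for the example.

   VNID_BITS = 24
   MAX_VNIDS = 2 ** VNID_BITS       # 16,777,216 virtual networks, versus 4094 usable VLANs

   forwarding_table = {
       # Bridged VNIs are keyed by MAC address; routed VNIs by IP address.
       # The same inner address may appear in different VNIs without conflict.
       (0x100001, "52:54:00:aa:bb:01"): "egress-switch-A",
       (0x100002, "52:54:00:aa:bb:01"): "egress-switch-B",
       (0x200001, "10.1.1.5"):          "egress-switch-C",
   }

   def lookup(vnid, inner_address):
       # Address disambiguation: the VNID scopes the inner address.
       return forwarding_table.get((vnid, inner_address))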

4.  Network Overlays

To address the problems of decoupling physical and logical configuration and allowing VM mobility without exploding the forwarding table sizes in the switches and routers, a network overlay model can be used.

The idea behind an overlay is quite straightforward. The original frame is encapsulated by the first-hop network device. The encapsulation identifies the destination as the device that will perform the decapsulation before delivering the frame to the endpoint. The rest of the network forwards the frame based on the encapsulation header and can be oblivious to the payload that is carried inside. Note that the first-hop network device can be a traditional switch or router or the virtual switch residing inside a hypervisor, and the endpoint can be a VM or a physical server. Some examples of network overlays are tunnels such as IP GRE [RFC2784], LISP [I-D.ietf-lisp] or TRILL [RFC6325].

With an overlay, the VNID can be carried within the overlay header so that every frame has its VNID explicitly identified. Since both routed and bridged semantics can be supported by a virtual data center, the original frame carried within the overlay header can be an Ethernet frame complete with MAC addresses, or just an IP packet.

4.1.  Benefits of an Overlay Approach

The use of a large (e.g., 24-bit) VNID would allow 16 million distinct virtual networks within a single data center, eliminating current VLAN size limitations. This VNID needs to be carried in the data plane along with the packet. Adding an overlay header provides a place to carry this VNID.

A key aspect of overlays is the decoupling of the "virtual" MAC and IP addresses used by VMs from the physical network infrastructure and the infrastructure IP addresses used by the data center. If a VM changes location, the switches at the edge of the overlay simply update their mapping tables to reflect the new location of the VM within the data center's infrastructure space. Because an overlay network is used, a VM can now be located anywhere in the data center that the overlay reaches, without regard to traditional constraints implied by L2 properties such as VLAN numbering or the span of an L2 broadcast domain scoped to a single pod or access switch.

Multi-tenancy is supported by isolating the traffic of one virtual network instance from traffic of another. Traffic from one virtual network instance cannot be delivered to another instance without (conceptually) exiting the instance and entering the other instance via an entity that has connectivity to both virtual network instances. Without such an entity, tenant traffic remains isolated within each individual virtual network instance. External communication (from a VM within a virtual network instance to a machine outside of any virtual network instance, e.g., on the Internet) is handled by having an ingress switch forward traffic to an external router: an egress switch decapsulates the tunneled packet and delivers it to the router for normal processing. This router is external to the overlay and behaves much like existing external-facing routers in data centers today.

Overlays are designed to allow a set of VMs to be placed within a single virtual network instance, whether that virtual network is bridged or routed.
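
The sketch below illustrates the data path just described: the ingress device maps the inner destination to the infrastructure address of the egress device, adds an overlay header carrying the VNID, and the egress device removes it again. It is a minimal sketch under assumed names and addresses; the "header" is a placeholder, not a definition of any particular encapsulation.

   # Illustrative sketch of the overlay data path; names and addresses are
   # assumptions, and the header below is a placeholder rather than a format.

   mapping_table = {
       # (VNID, inner destination MAC) -> infrastructure IP of the egress device
       (0x100001, "52:54:00:aa:bb:02"): "192.0.2.20",
   }

   def encapsulate(vnid, inner_frame, inner_dst_mac, local_ip):
       egress_ip = mapping_table[(vnid, inner_dst_mac)]
       outer_header = {"src": local_ip, "dst": egress_ip, "vnid": vnid}
       # The underlay forwards on the outer header only and never inspects
       # the inner frame.
       return (outer_header, inner_frame)

   def decapsulate(packet):
       outer_header, inner_frame = packet
       return outer_header["vnid"], inner_frame

   # If the VM moves, only the edge mapping changes; the VM keeps its inner
   # MAC and IP addresses (Section 4.1).
   mapping_table[(0x100001, "52:54:00:aa:bb:02")] = "192.0.2.99"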

4.2.  Standardization Issues for Overlay Networks

4.2.1.  Overlay Header Format

Different overlay header formats are possible, as are different possible encodings of the VNID. Existing overlay headers may be extended or new ones defined. This document does not address the exact header format or VNID encoding except to state that any solution MUST:

1. Carry the VNID in each frame.

2. Allow the payload to be either a complete Ethernet frame or only an IP packet.

4.2.2.  Fragmentation

Whenever tunneling is used, one faces the potential problem that the packet plus the encapsulation overhead will exceed the MTU of the path to the egress router. If the outer encapsulation is IP, fragmentation could be left to the IP layer, or it could be done at the overlay level in a more optimized fashion that is independent of the overlay encapsulation header, or it could be left out altogether, if it is believed that data center networks can be engineered to prevent MTU issues from arising.

Related to fragmentation is the question of how best to handle Path MTU issues, should they occur. Ideally, the original source of any packet (i.e., the sending VM) would be notified of the optimal MTU to use. Path MTU problems occurring within an overlay network would result in ICMP MTU exceeded messages being sent back to the ingress tunnel switch at the entry point of the overlay. If that switch is embedded within a hypervisor, the hypervisor could notify the VM of a more appropriate MTU to use. It may be appropriate to specify a set of best practices for implementers related to the handling of Path MTU issues.
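
As a purely hypothetical illustration of the two requirements in Section 4.2.1 and the MTU arithmetic in Section 4.2.2, the sketch below packs a 24-bit VNID and a payload-type flag into an assumed 4-byte overlay header and checks the resulting packet size against a path MTU. None of the field sizes, values, or names are defined by this document.

   # Hypothetical encoding: 1 byte of flags/payload type + 3 bytes of VNID.
   # Everything here is an assumption for illustration, not a proposal.
   import struct

   PAYLOAD_ETHERNET = 0
   PAYLOAD_IP = 1

   OUTER_IP_UDP_OVERHEAD = 20 + 8     # assumed IPv4 + UDP outer encapsulation
   OVERLAY_HEADER_LEN = 4

   def build_overlay_header(vnid, payload_type):
       if not 0 <= vnid < 2 ** 24:
           raise ValueError("VNID must fit in 24 bits")
       return struct.pack("!B3s", payload_type, vnid.to_bytes(3, "big"))

   def parse_overlay_header(header):
       payload_type, vnid_bytes = struct.unpack("!B3s", header)
       return payload_type, int.from_bytes(vnid_bytes, "big")

   def fits_in_path_mtu(inner_frame_len, path_mtu=1500):
       # Section 4.2.2: the inner frame plus all encapsulation overhead must
       # fit, or fragmentation / Path MTU handling is needed.
       return inner_frame_len + OVERLAY_HEADER_LEN + OUTER_IP_UDP_OVERHEAD <= path_mtu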

4.2.3.  Checksums and FCS

When tunneling packets, both the inner and outer headers could have their own checksum, duplicating effort and impacting performance. Therefore, we strongly recommend that any solution carry only one checksum or frame FCS.

When the inner packet is TCP or UDP, it already includes its own checksum, and adding a second outer checksum (using the same 1's complement algorithm) provides little value. Similarly, if the inner packet is an Ethernet frame, the frame FCS protects the original frame, and a new frame FCS over both the original frame and the overlay header protects the new encapsulated frame.

In IPv4, UDP checksums can be disabled on a per-packet basis simply by setting the checksum field to zero. IPv6, however, specifies that UDP checksums must always be included. But even for IPv6, the LISP protocol [I-D.ietf-lisp] already allows a zero checksum field. The 6man working group is also currently considering relaxing the IPv6 UDP checksum requirement [I-D.ietf-6man-udpzero].

For Ethernet frames, L2 overlays such as TRILL already mandate only a single frame FCS.

4.2.4.  Middlebox Traversal

One issue to consider is whether the overlay will need to run over networks that include middleboxes such as NATs. Middleboxes may have difficulty properly supporting multicast or other aspects of an overlay header. Inside a data center, it may well be the case that middlebox traversal is a non-issue. But if overlays are extended across the broader Internet, the presence of middleboxes may be of concern.

4.2.5.  OAM

Successful deployment of an overlay approach will likely require appropriate Operations, Administration and Maintenance (OAM) facilities.

5.  Control Plane

The control plane needs to address at least the following pieces:

1. A mechanism to populate the forwarding table of a virtual network instance.

2. A mechanism to handle multi-destination frames within a virtual network instance.

3. A mechanism to allow an endpoint to inform the access switch which virtual network instance it wishes to join on a virtual network interface.

4. A mechanism to allow an endpoint to inform the access switch that it is leaving the network, so that the access switch can clean up state.

5.1.  Populating the Forwarding Table of a Virtual Network Instance

When an access switch has to forward a frame from one endpoint to another across the network, it has to consult some form of forwarding table. When network overlays are used, the problem boils down to deriving the mapping between the inner and outer addresses, i.e., deriving the destination address in the overlay header based on the destination address sent by the endpoint. Two well-known mechanisms for populating the forwarding table (or deriving the mapping table) of a switch are (i) via a routing control protocol and (ii) learning from the data plane as Ethernet bridges do. Another mechanism is a centralized mapping database. Any solution must avoid problems associated with scaling a virtual network instance across a large data center.

5.2.  Handling Multi-destination Frames

Another aspect of address mapping concerns the handling of multi-destination frames, i.e., broadcast and multicast frames, or the delivery of unicast packets when no mapping exists. Associating an infrastructure multicast address with each VNID is one possible way of connecting together all the machines belonging to the same VNID. However, existing multicast implementations do not scale to efficiently handle hundreds of thousands of multicast groups, as would be required if one multicast group were assigned to each VNID.

5.3.  Associating a VNID With An Endpoint

When an endpoint, such as a VM or physical server, connects to the infrastructure, we must define a mechanism to allow the endpoint to identify to the access switch the virtual network instance that it wishes to join. Typically, it is a virtual NIC (the one connected to the VM) coming up that triggers this association. The access switch can then determine the VNID to be associated with this virtual NIC. A standard protocol that all types of overlay encapsulation points can use to identify the VNID associated with an endpoint would be beneficial for supporting multi-vendor implementations. This protocol could also be used to distribute any per-virtual-network information (e.g., a multicast group address). This signaling can provide the stimulus that triggers the overlay termination points to perform any actions needed within the infrastructure network (e.g., use IGMP to join a multicast group).

5.4.  Disassociating a VNID on Termination or Move

To enable cleaning up state in the access switch, we must define a mechanism to allow an endpoint to signal its disconnection from the network.
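
A minimal sketch of the association and disassociation behavior described in Sections 5.2 through 5.4 is shown below. The message names, per-VNI information, and the IGMP step are assumptions for illustration; this document does not define such a protocol.

   # Hypothetical access-switch state for endpoint association; everything
   # here is illustrative and not defined by this document.

   class AccessSwitch:
       def __init__(self, per_vni_info):
           self.vnic_to_vnid = {}            # state per attached virtual NIC
           self.per_vni_info = per_vni_info  # e.g., {vnid: multicast group}

       def associate(self, vnic_id, vnid):
           # Triggered when a virtual NIC comes up (Section 5.3): record the
           # VNID and, if the VNI uses an infrastructure multicast group for
           # multi-destination frames (Section 5.2), join it (e.g., via IGMP).
           self.vnic_to_vnid[vnic_id] = vnid
           group = self.per_vni_info.get(vnid)
           if group:
               self.join_multicast_group(group)

       def disassociate(self, vnic_id):
           # Triggered when the endpoint terminates or moves (Section 5.4),
           # so that stale state is removed from the access switch.
           self.vnic_to_vnid.pop(vnic_id, None)

       def join_multicast_group(self, group):
           pass  # placeholder for an IGMP join in a real implementation

   switch = AccessSwitch({0x100001: "239.1.1.1"})
   switch.associate("vnic-7", 0x100001)
   switch.disassociate("vnic-7")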

6.  Related Work

6.1.  ARMD

ARMD is chartered to look at data center scaling issues with a focus on address resolution. ARMD is currently chartered to develop a problem statement and is not currently developing solutions. While an overlay-based approach may address some of the "pain points" that have been raised in ARMD (e.g., better support for multi-tenancy), an overlay approach may also push some of the L2 scaling concerns (e.g., excessive flooding) to the IP level (flooding via IP multicast). Analysis will be needed to understand the scaling trade-offs of an overlay-based approach compared with existing approaches. On the other hand, existing IP-based approaches such as proxy ARP may help mitigate some concerns.

6.2.  TRILL

TRILL is an L2-based approach aimed at improving deficiencies and limitations of current Ethernet networks. Approaches to extend TRILL to support more than 4094 VLANs are currently under investigation [I-D.eastlake-trill-rbridge-fine-labeling].

6.3.  L2VPNs

The IETF has specified a number of approaches for connecting L2 domains together as part of the L2VPN Working Group. That group, however, has historically been focused on provider-provisioned L2 VPNs, where the service provider participates in the management and provisioning of the VPN. In addition, much of the target environment for such deployments involves carrying L2 traffic over WANs. Overlay approaches are intended to be used within data centers, where the overlay network is managed by the data center operator rather than by an outside party. While overlays can run across the Internet as well, they will extend well into the data center itself (e.g., up to and including hypervisors) and include large numbers of machines within the data center itself.

Other L2VPN approaches, such as L2TP [RFC2661], require significant tunnel state at the encapsulating and decapsulating endpoints. Overlays require less tunnel state than such approaches, which is important to allow overlays to scale to hundreds of thousands of endpoints. It is assumed that smaller switches (i.e., virtual switches in hypervisors or the physical switches to which VMs connect) will be part of the overlay network and be responsible for encapsulating and decapsulating packets.

6.4.  Proxy Mobile IP

Proxy Mobile IP [RFC5213] [RFC5844] makes use of the GRE Key Field [RFC5845] [RFC6245], but not in a way that supports multi-tenancy.

6.5.  LISP

LISP [I-D.ietf-lisp] essentially provides an IP-over-IP overlay in which the inner addresses are end-station identifiers and the outer IP addresses represent the location of the end station within the core IP network topology. The LISP overlay header includes a 24-bit Instance ID used to support overlapping inner IP addresses.

6.6.  Individual Submissions

Many individual submissions also aim to address some or all of the issues discussed in this draft. Examples of such drafts are VXLAN [I-D.mahalingam-dutt-dcops-vxlan], NVGRE [I-D.sridharan-virtualization-nvgre] and Virtual Machine Mobility in L3 Networks [I-D.wkumari-dcops-l3-vmmobility].

7.  Further Work

It is believed that overlay-based approaches may be able to reduce the overall amount of flooding and other multicast- and broadcast-related traffic (e.g., ARP and ND) experienced within current data centers with a large flat L2 network.
Further analysis is needed to characterize expected improvements.

8.  Summary

This document has argued that network virtualization using L3 overlays addresses a number of issues being faced as data centers scale in size. In addition, careful consideration of a number of issues would lead to the development of interoperable implementations of virtualization overlays.

9.  Acknowledgments

Helpful comments and improvements to this document have come from Ariel Hendel, Vinit Jain, and Benson Schliesser.

10.  IANA Considerations

This memo includes no request to IANA.

11.  Security Considerations

TBD

12.  Informative References

   [I-D.eastlake-trill-rbridge-fine-labeling]
              Eastlake, D., Zhang, M., Agarwal, P., Dutt, D., and R. Perlman, "RBridges: Fine-Grained Labeling", draft-eastlake-trill-rbridge-fine-labeling-02 (work in progress), October 2011.

   [I-D.hasmit-otv]
              Grover, H., Rao, D., Farinacci, D., and V. Moreno, "Overlay Transport Virtualization", draft-hasmit-otv-03 (work in progress), July 2011.

   [I-D.ietf-6man-udpzero]
              Fairhurst, G. and M. Westerlund, "IPv6 UDP Checksum Considerations", draft-ietf-6man-udpzero-04 (work in progress), October 2011.

   [I-D.ietf-lisp]
              Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "Locator/ID Separation Protocol (LISP)", draft-ietf-lisp-15 (work in progress), July 2011.

   [I-D.mahalingam-dutt-dcops-vxlan]
              Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, L., Sridhar, T., Bursell, M., and C. Wright, "VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-00 (work in progress), August 2011.

   [I-D.sridharan-virtualization-nvgre]
              Sridharan, M., Duda, K., Ganga, I., Greenberg, A., Lin, G., Pearson, M., Thaler, P., Tumuluri, C., Venkataramaiah, N., and Y. Wang, "NVGRE: Network Virtualization using Generic Routing Encapsulation", draft-sridharan-virtualization-nvgre-00 (work in progress), September 2011.

   [I-D.wkumari-dcops-l3-vmmobility]
              Kumari, W. and J. Halpern, "Virtual Machine mobility in L3 Networks", draft-wkumari-dcops-l3-vmmobility-00 (work in progress), August 2011.

   [RFC2661]  Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", RFC 2661, August 1999.

   [RFC2784]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, March 2000.

   [RFC2890]  Dommety, G., "Key and Sequence Number Extensions to GRE", RFC 2890, September 2000.

   [RFC5213]  Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K., and B. Patil, "Proxy Mobile IPv6", RFC 5213, August 2008.

   [RFC5844]  Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy Mobile IPv6", RFC 5844, May 2010.

   [RFC5845]  Muhanna, A., Khalil, M., Gundavelli, S., and K. Leung, "Generic Routing Encapsulation (GRE) Key Option for Proxy Mobile IPv6", RFC 5845, June 2010.

   [RFC6245]  Yegani, P., Leung, K., Lior, A., Chowdhury, K., and J. Navali, "Generic Routing Encapsulation (GRE) Key Extension for Mobile IPv4", RFC 6245, May 2011.

   [RFC6325]  Perlman, R., Eastlake, D., Dutt, D., Gai, S., and A. Ghanwani, "Routing Bridges (RBridges): Base Protocol Specification", RFC 6325, July 2011.

Authors' Addresses

   Thomas Narten (editor)
   IBM

   Email: narten@us.ibm.com

   Murari Sridharan
   Microsoft

   Email: muraris@microsoft.com

   Dinesh Dutt
   Cisco

   Email: ddutt@cisco.com

   David Black
   EMC

   Email: david.black@emc.com

   Lawrence Kreeger
   Cisco

   Email: kreeger@cisco.com