Internet Engineering Task Force                          T. Narten, Ed.
Internet-Draft                                                       IBM
Intended status: Informational                             M. Sridharan
Expires: December 17, 2012                                     Microsoft
                                                                 D. Dutt
                                                                D. Black
                                                                     EMC
                                                              L. Kreeger
                                                                   Cisco
                                                           June 15, 2012

        Problem Statement: Overlays for Network Virtualization
           draft-narten-nvo3-overlay-problem-statement-02

Abstract

This document describes issues associated with providing multi-tenancy in large data center networks and an overlay-based network virtualization approach to addressing them. A key multi-tenancy requirement is traffic isolation, so that a tenant's traffic is not visible to any other tenant. This isolation can be achieved by assigning one or more virtual networks to each tenant such that traffic within a virtual network is isolated from traffic in other virtual networks. The primary functionality required is provisioning virtual networks, associating a virtual machine's NIC with the appropriate virtual network, and maintaining that association as the virtual machine is activated, migrated and/or deactivated. Use of an overlay-based approach enables scalable deployment on large network infrastructures.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on December 17, 2012.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Problem Details
       2.1.  Multi-tenant Environment Scale
       2.2.  Virtual Machine Mobility Requirements
       2.3.  Span of Virtual Networks
       2.4.  Inadequate Forwarding Table Sizes in Switches
       2.5.  Decoupling Logical and Physical Configuration
       2.6.  Support Communication Between VMs and Non-virtualized Devices
       2.7.  Overlay Design Characteristics
   3.  Network Overlays
       3.1.  Limitations of Existing Virtual Network Models
       3.2.  Benefits of Network Overlays
       3.3.  Overlay Networking Work Areas
   4.  Related Work
       4.1.  IEEE 802.1aq - Shortest Path Bridging
       4.2.  ARMD
       4.3.  TRILL
       4.4.  L2VPNs
       4.5.  Proxy Mobile IP
       4.6.  LISP
       4.7.  Individual Submissions
   5.  Further Work
   6.  Summary
   7.  Acknowledgments
   8.  IANA Considerations
   9.  Security Considerations
   10. Informative References
   Appendix A.  Change Log
       A.1.  Changes from -01
   Authors' Addresses

1. Introduction

Server virtualization is increasingly becoming the norm in data centers. With server virtualization, each physical server supports multiple virtual machines (VMs), each running its own operating system, middleware and applications. Virtualization is a key enabler of workload agility, i.e., allowing any server to host any application and providing the flexibility of adding, shrinking, or moving services within the physical infrastructure.
Server virtualization provides numerous benefits, including higher utilization, increased data security, reduced user downtime, reduced power usage, etc.

Large-scale multi-tenant data centers are taking advantage of the benefits of server virtualization to provide a new kind of hosting, a virtual hosted data center. Multi-tenant data centers are ones where individual tenants could belong to a different company (in the case of a public provider) or a different department (in the case of an internal company data center). Each tenant has the expectation of a level of security and privacy separating their resources from those of other tenants. For example, one tenant's traffic must never be exposed to another tenant, except through carefully controlled interfaces, such as a security gateway.

To a tenant, virtual data centers are similar to their physical counterparts, consisting of end stations attached to a network, complete with services such as load balancers and firewalls. But unlike a physical data center, end stations connect to a virtual network. To end stations, a virtual network looks like a normal network (e.g., providing an Ethernet service), except that the only end stations connected to the virtual network are those belonging to the tenant.

A tenant is the administrative entity that is responsible for and manages a specific virtual network instance and its associated services (whether virtual or physical). In a cloud environment, a tenant would correspond to the customer that has defined and is using a particular virtual network. A tenant may also find it useful to create multiple different virtual network instances. Hence, there is a one-to-many mapping between tenants and virtual network instances: a single tenant may operate multiple individual virtual network instances, each associated with a different service.

How a virtual network is implemented does not matter to the tenant. It could be a pure routed network, a pure bridged network, or a combination of bridged and routed networks. The key requirement is that each individual virtual network instance be isolated from other virtual network instances.

This document outlines the problems encountered in scaling the number of isolated networks in a data center, as well as the problems of managing the creation/deletion, membership, and span of these networks. It makes the case that an overlay-based approach, in which individual virtual networks are implemented as overlays dynamically controlled by a standardized control plane, provides a number of advantages over current approaches. The purpose of this document is to identify the set of problems that any solution has to address in building multi-tenant data centers; the goal is to enable standardized, interoperable implementations for building such data centers.

Section 2 describes the problem space details. Section 3 describes network overlays in more detail and the potential work areas. Sections 4 and 5 review related and further work, while Section 6 closes with a summary.

2. Problem Details

The following subsections describe aspects of multi-tenant networking that pose problems for large scale network infrastructure.
Different problem aspects may arise based on the network architecture and scale.

2.1. Multi-tenant Environment Scale

Cloud computing involves on-demand elastic provisioning of resources for multi-tenant environments. A common example of cloud computing is the public cloud, where a cloud service provider offers these elastic services to multiple customers over the same infrastructure. This elastic, on-demand nature, in conjunction with trusted hypervisors controlling network access by VMs, calls for resilient, distributed network control mechanisms.

2.2. Virtual Machine Mobility Requirements

A key benefit of server virtualization is virtual machine (VM) mobility. A VM can be migrated from one server to another "live", i.e., while it continues to run, without shutting down the VM and restarting it at the new location. A key requirement for live migration is that a VM retain its IP address(es) and MAC address(es) in its new location (to avoid tearing down existing communication). Today, servers are assigned IP addresses based on their physical location, typically based on the ToR (Top of Rack) switch for the server rack or the VLAN configured to the server. This works well for physical servers, which cannot move, but it restricts the placement and movement of the more mobile VMs within the data center (DC). Any solution for a scalable multi-tenant DC must allow a VM to be placed (or moved to) anywhere within the data center, without being constrained by the subnet boundary concerns of the host servers.

2.3. Span of Virtual Networks

Another use case is cross-pod expansion. A pod typically consists of one or more racks of servers with its associated network and storage connectivity. Tenants may start off on a pod and, due to expansion, require servers/VMs on other pods, especially when tenants on the other pods are not fully utilizing all their resources. This use case requires that virtual networks span multiple pods in order to provide connectivity to all of the tenant's servers/VMs.

2.4. Inadequate Forwarding Table Sizes in Switches

Today's virtualized environments place additional demands on the forwarding tables of switches. Instead of just one link-layer address per server, the switching infrastructure has to learn addresses of the individual VMs (which could range in the 100s per server). This is a requirement because traffic to and from the VMs and the rest of the physical network traverses the physical network infrastructure. This places a much larger demand on the switches' forwarding table capacity compared to non-virtualized environments, causing more traffic to be flooded or dropped when the addresses in use exceed the forwarding table capacity.
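To make the scale concern concrete, the back-of-the-envelope sketch below uses purely illustrative numbers (the rack, server, VM, and table-size figures are assumptions for this example, not values from this document):

```python
# Illustrative arithmetic only; all counts below are assumed for the example.

racks = 40                 # racks reachable through an aggregation switch
servers_per_rack = 40      # physical servers per rack
vms_per_server = 50        # VMs per server ("100s per server" is possible)

macs_physical_only = racks * servers_per_rack
macs_with_vms = racks * servers_per_rack * vms_per_server

mac_table_capacity = 32_000   # hypothetical switch forwarding table size

print(f"MAC entries, physical servers only: {macs_physical_only:,}")  # 1,600
print(f"MAC entries, virtualized servers:   {macs_with_vms:,}")       # 80,000
print("table exceeded (flooding/drops likely):",
      macs_with_vms > mac_table_capacity)                             # True
```

Even modest per-server VM counts can push the number of learned addresses well past typical forwarding table capacities, which is the flooding/drop risk described above.
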
2.5. Decoupling Logical and Physical Configuration

Data center operators must be able to achieve high utilization of server and network capacity. For efficient and flexible allocation, operators should be able to spread a virtual network instance across servers in any rack in the data center. It should also be possible to migrate compute workloads to any server anywhere in the network while retaining the workload's addresses. This can be achieved today by stretching VLANs (e.g., by using TRILL or SPB).

However, in order to limit the broadcast domain of each VLAN, multi-destination frames within a VLAN should optimally flow only to those devices that have that VLAN configured. When workloads migrate, the physical network (e.g., access lists) may need to be reconfigured, which is typically time consuming and error prone.

2.6. Support Communication Between VMs and Non-virtualized Devices

Within data centers, not all communication will be between VMs. Network operators will continue to use non-virtualized servers for various reasons, as well as traditional routers providing L2VPN and L3VPN services, traditional load balancers, firewalls, intrusion detection engines, and so on. Any virtual network solution should be capable of working with these existing systems.

2.7. Overlay Design Characteristics

Layer 2 overlay protocols already exist, but they were not necessarily designed to solve this problem in the environment of a highly virtualized data center. Below are some characteristics of such environments that the overlay technology must take into account:

1. Highly distributed systems. The overlay should work in an environment where there could be many thousands of access switches (e.g., residing within the hypervisors) and many more end systems (e.g., VMs) connected to them. This calls for a distributed mapping system that puts low overhead on the overlay tunnel endpoints.

2. Many highly distributed virtual networks with sparse membership. Each virtual network could be highly dispersed inside the data center. Also, while many virtual networks are expected, the number of end systems connected to any one virtual network is expected to be relatively low; therefore, the percentage of access switches participating in any given virtual network would also be expected to be low (a rough illustration follows this list). For this reason, efficient pruning of multi-destination traffic should be taken into consideration.

3. Highly dynamic end systems. End systems connected to virtual networks can be very dynamic, both in terms of creation/deletion/power-on/off and in terms of mobility across the access switches.

4. Work with existing, widely deployed Ethernet switches and IP routers without requiring wholesale replacement. The first hop switch that adds and removes the overlay header will require new equipment and/or new software.

5. Network infrastructure administered by a single administrative domain. This is consistent with operation within a data center, and not across the Internet.
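As a rough illustration of the sparse-membership point in item 2 above, the short sketch below uses assumed numbers (not values from this document) to estimate how few access switches participate in any one virtual network:

```python
# Assumed, illustrative numbers only.
access_switches = 10_000      # e.g., one virtual switch per hypervisor
end_systems_in_vn = 50        # end systems attached to one virtual network

# Worst case for dispersion: every end system of this virtual network
# sits behind a different access switch.
participating = min(end_systems_in_vn, access_switches)
share = participating / access_switches

print(f"Access switches participating in this virtual network: {share:.2%}")
# -> 0.50%.  Flooding multi-destination traffic to all switches would mostly
#    reach switches with no members, hence the interest in efficient pruning.
```
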
3. Network Overlays

Virtual networks are used to isolate a tenant's traffic from that of other tenants (or even traffic within the same tenant that requires isolation). There are two main characteristics of virtual networks:

1. Providing network address space that is isolated from other virtual networks. The same network addresses may be used in different virtual networks on the same underlying network infrastructure.

2. Limiting the scope of frames sent on the virtual network. Frames sent by end systems attached to a virtual network are delivered as expected to other end systems on that virtual network and may exit a virtual network only through controlled exit points, such as a security gateway. Likewise, frames sourced outside of the virtual network may enter the virtual network only through controlled entry points, such as a security gateway.

3.1. Limitations of Existing Virtual Network Models

Virtual networks are not new to networking. For example, VLANs are a well known construct in the networking industry. A VLAN is an L2 bridging construct that provides some of the semantics of virtual networks mentioned above: a MAC address is unique within a VLAN, but not necessarily across VLANs. Traffic sourced within a VLAN (including broadcast and multicast traffic) remains within the VLAN it originates from. Traffic forwarded from one VLAN to another typically involves router (L3) processing. The forwarding table lookup operation is keyed on {VLAN, MAC address} tuples.

But there are problems and limitations with L2 VLANs. VLANs are a pure L2 bridging construct, and VLAN identifiers are carried along with data frames to allow each forwarding point to know what VLAN the frame belongs to. A VLAN today is defined as a 12-bit number, limiting the total number of VLANs to 4096 (though typically this number is 4094, since 0 and 4095 are reserved). Due to the large number of tenants that a cloud provider might service, the 4094 VLAN limit is often inadequate. In addition, there is often a need for multiple VLANs per tenant, which exacerbates the issue.

In the case of IP networks, many routers provide a Virtual Routing and Forwarding (VRF) service. The same router operates multiple instances of forwarding tables, one for each tenant. Each forwarding table instance is populated separately via routing protocols, either running (conceptually) as separate instances for each VRF, or as a single instance-aware routing protocol that supports VRFs directly (e.g., [RFC4364]). Each VRF instance provides address and traffic isolation. The forwarding table lookup operation is keyed on {VRF, IP address} tuples.

VRFs are a pure routing construct and do not have end-to-end significance, in the sense that the data plane does not carry a VRF indicator on an end-to-end basis. Instead, the VRF is derived at each hop using a combination of the incoming interface and some information in the frame (e.g., a local VLAN tag). Furthermore, the VRF model has typically assumed that a separate control plane governs the population of the forwarding table within that VRF. Thus, a traditional VRF model assumes multiple, independent control planes and has no specific tag within a data frame to identify the VRF of the frame.

There are a number of VPN approaches that provide some of the desired semantics of virtual networks (e.g., [RFC4364]). But VPN approaches have traditionally been deployed across WANs and have not seen widespread deployment within enterprise data centers. They are not necessarily seen as supporting the characteristics outlined in Section 2.7.
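The two lookup keyings described above can be made concrete with a small data-structure sketch. This is illustrative only (the VLAN/VRF names, MAC addresses, and IP addresses are made up): a bridge keys forwarding state on {VLAN, MAC address}, a VRF-capable router keys it on {VRF, IP address}, and in both cases the same inner address can safely recur in different virtual contexts:

```python
# Illustrative only: made-up VLAN/VRF names, MAC and IP addresses.

# L2 bridging: forwarding state keyed on (VLAN ID, MAC address).
l2_fdb = {
    (100, "52:54:00:aa:bb:01"): "port-1",
    (100, "52:54:00:aa:bb:02"): "port-7",
    (200, "52:54:00:aa:bb:01"): "port-3",       # same MAC, different VLAN
}

# L3 VRF: per-tenant forwarding state keyed on (VRF, destination IP).
vrf_fib = {
    ("tenant-red",  "10.1.1.10"): "nexthop-A",
    ("tenant-blue", "10.1.1.10"): "nexthop-B",  # same IP, different VRF
}

print(l2_fdb[(200, "52:54:00:aa:bb:01")])       # -> port-3
print(vrf_fib[("tenant-blue", "10.1.1.10")])    # -> nexthop-B
```
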
The rest 366 of the network forwards the frame based on the encapsulation header 367 and can be oblivious to the payload that is carried inside. To avoid 368 belaboring the point each time, the first hop network device can be a 369 traditional switch or router or the virtual switch residing inside a 370 hypervisor. Furthermore, the endpoint can be a VM or it can be a 371 physical server. Some examples of network overlays are tunnels such 372 as IP GRE [RFC2784], LISP [I-D.ietf-lisp] or TRILL [RFC6325]. 374 With the overlay, a virtual network identifier (or VNID) can be 375 carried as part of the overlay header so that every data frame 376 explicitly identifies the specific virtual network the frame belongs 377 to. Since both routed and bridged semantics can be supported by a 378 virtual data center, the original frame carried within the overlay 379 header can be an Ethernet frame complete with MAC addresses or just 380 the IP packet. 382 The use of a large (e.g., 24-bit) VNID would allow 16 million 383 distinct virtual networks within a single data center, eliminating 384 current VLAN size limitations. This VNID needs to be carried in the 385 data plane along with the packet. Adding an overlay header provides 386 a place to carry this VNID. 388 A key aspect of overlays is the decoupling of the "virtual" MAC and 389 IP addresses used by VMs from the physical network infrastructure and 390 the infrastructure IP addresses used by the data center. If a VM 391 changes location, the switches at the edge of the overlay simply 392 update their mapping tables to reflect the new location of the VM 393 within the data center's infrastructure space. Because an overlay 394 network is used, a VM can now be located anywhere in the data center 395 that the overlay reaches without regards to traditional constraints 396 implied by L2 properties such as VLAN numbering, or the span of an L2 397 broadcast domain scoped to a single pod or access switch. 399 Multi-tenancy is supported by isolating the traffic of one virtual 400 network instance from traffic of another. Traffic from one virtual 401 network instance cannot be delivered to another instance without 402 (conceptually) exiting the instance and entering the other instance 403 via an entity that has connectivity to both virtual network 404 instances. Without the existence of this entity, tenant traffic 405 remains isolated within each individual virtual network instance. 406 External communications (from a VM within a virtual network instance 407 to a machine outside of any virtual network instance, e.g. on the 408 Internet) is handled by having an ingress switch forward traffic to 409 an external router, where an egress switch decapsulates a tunneled 410 packet and delivers it to the router for normal processing. This 411 router is external to the overlay, and behaves much like existing 412 external facing routers in data centers today. 414 Overlays are designed to allow a set of VMs to be placed within a 415 single virtual network instance, whether that virtual network 416 provides the bridged network or a routed network. 418 3.3. Overlay Networking Work Areas 420 There are three specific and separate potential work areas needed to 421 realize an overlay solution. The areas correspond to different 422 possible "on-the-wire" protocols, where distinct entities interact 423 with each other. 
A key aspect of overlays is the decoupling of the "virtual" MAC and IP addresses used by VMs from the physical network infrastructure and the infrastructure IP addresses used by the data center. If a VM changes location, the switches at the edge of the overlay simply update their mapping tables to reflect the new location of the VM within the data center's infrastructure space. Because an overlay network is used, a VM can now be located anywhere in the data center that the overlay reaches, without regard to traditional constraints implied by L2 properties such as VLAN numbering, or the span of an L2 broadcast domain scoped to a single pod or access switch.

Multi-tenancy is supported by isolating the traffic of one virtual network instance from traffic of another. Traffic from one virtual network instance cannot be delivered to another instance without (conceptually) exiting the instance and entering the other instance via an entity that has connectivity to both virtual network instances. Without the existence of this entity, tenant traffic remains isolated within each individual virtual network instance. External communication (from a VM within a virtual network instance to a machine outside of any virtual network instance, e.g., on the Internet) is handled by having an ingress switch forward traffic to an external router, where an egress switch decapsulates a tunneled packet and delivers it to the router for normal processing. This router is external to the overlay and behaves much like existing external-facing routers in data centers today.

Overlays are designed to allow a set of VMs to be placed within a single virtual network instance, whether that virtual network provides a bridged network or a routed network.

3.3. Overlay Networking Work Areas

There are three specific and separate potential work areas needed to realize an overlay solution. The areas correspond to different possible "on-the-wire" protocols, where distinct entities interact with each other.

One area of work concerns the address dissemination protocol an NVE uses to build and maintain the mapping tables it uses to deliver encapsulated frames to their proper destination. One approach is to build mapping tables entirely via learning (as is done in 802.1 networks). But to provide better scaling properties, a more sophisticated approach is needed, i.e., the use of a specialized control plane protocol. While there are some advantages to using or leveraging an existing protocol for maintaining mapping tables, the fact that large numbers of NVEs will likely reside in hypervisors places constraints on the resources (CPU and memory) that can be dedicated to such functions. For example, routing protocols (e.g., IS-IS, BGP) may have scaling difficulties if implemented directly in all NVEs, based on both flooding and convergence time concerns. This suggests that use of a standard lookup protocol between NVEs and a smaller number of network nodes that implement the actual routing protocol (or the directory-based "oracle") is a more promising approach at larger scale.

From an architectural perspective, one can view the address mapping dissemination problem as having two distinct and separable components. The first component consists of a back-end "oracle" that is responsible for distributing and maintaining the mapping information for the entire overlay system. The second component consists of the on-the-wire protocols an NVE uses when interacting with the oracle.

The back-end oracle could provide high performance, high resiliency, failover, etc. and could be implemented in different ways. For example, one model uses a traditional, centralized "directory-based" database, using replicated instances for reliability and failover (e.g., LISP-XXX). A second model involves using and possibly extending an existing routing protocol (e.g., BGP, IS-IS, etc.). To support different architectural models, it is useful to have one standard protocol for the NVE-oracle interaction while allowing different protocols and architectural approaches for the oracle itself. Separating the two allows NVEs to interact with different types of oracles, i.e., either of the two architectural models described above. Having separate protocols also allows for a simplified NVE that only interacts with the oracle for the mapping table entries it needs and allows the oracle (and its associated protocols) to evolve independently over time with minimal impact to the NVEs.

A third work area considers the attachment and detachment of VMs (or Tenant End Systems [I-D.lasserre-nvo3-framework] more generally) from a specific virtual network instance. When a VM attaches, the Network Virtualization Edge (NVE) [I-D.lasserre-nvo3-framework] associates the VM with a specific overlay for the purposes of tunneling traffic sourced from or destined to the VM. When a VM disconnects, it is removed from the overlay and the NVE effectively terminates any tunnels associated with the VM. To achieve this functionality, a standardized interaction between the NVE and hypervisor may be needed, for example in the case where the NVE resides on a separate device from the VM.
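To tie the work areas together before the summary below, here is a small illustrative sketch of the state an NVE might keep and how it might interact with an oracle and with local VM attach/detach events. The class and method names are hypothetical and no specific protocol is implied; this is a sketch of the concepts, not a proposed design:

```python
# Hypothetical, illustrative sketch only; the names, data structures, and
# oracle interface below are assumptions, not protocols defined by this draft.

class Oracle:
    """Stand-in for the back-end mapping service (directory- or routing-based)."""

    def __init__(self):
        self.mappings = {}                    # (vnid, inner_mac) -> outer NVE IP

    def register(self, vnid, inner_mac, nve_ip):
        self.mappings[(vnid, inner_mac)] = nve_ip

    def deregister(self, vnid, inner_mac):
        self.mappings.pop((vnid, inner_mac), None)

    def resolve(self, vnid, inner_mac):
        return self.mappings.get((vnid, inner_mac))


class NVE:
    """Network Virtualization Edge: holds only the mappings it actually needs."""

    def __init__(self, my_ip, oracle):
        self.my_ip = my_ip
        self.oracle = oracle
        self.cache = {}                       # (vnid, inner_mac) -> outer NVE IP
        self.local_vms = {}                   # inner_mac -> vnid

    # Work area: VM attach/detach (hypervisor <-> NVE interaction).
    def vm_attach(self, inner_mac, vnid):
        self.local_vms[inner_mac] = vnid
        self.oracle.register(vnid, inner_mac, self.my_ip)

    def vm_detach(self, inner_mac):
        vnid = self.local_vms.pop(inner_mac)
        self.oracle.deregister(vnid, inner_mac)

    # Work area: NVE <-> oracle lookup instead of data-plane flood-and-learn.
    def outer_destination(self, vnid, dest_mac):
        key = (vnid, dest_mac)
        if key not in self.cache:
            self.cache[key] = self.oracle.resolve(vnid, dest_mac)
        return self.cache[key]                # None: unknown; query again or flood


oracle = Oracle()
nve_a, nve_b = NVE("192.0.2.1", oracle), NVE("192.0.2.2", oracle)
nve_b.vm_attach("52:54:00:aa:bb:02", vnid=5001)
print(nve_a.outer_destination(5001, "52:54:00:aa:bb:02"))    # -> 192.0.2.2
```
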
In summary, there are three areas of potential work. The first area concerns the oracle itself and any on-the-wire protocols it needs. A second area concerns the interaction between the oracle and NVEs. The third work area concerns protocols associated with attaching and detaching a VM from a particular virtual network instance. The latter two items are the priority work areas and can be done largely independently of any oracle-related work.

4. Related Work

4.1. IEEE 802.1aq - Shortest Path Bridging

Shortest Path Bridging (SPB) is an IS-IS-based overlay for L2 Ethernets. SPB supports multi-pathing and addresses a number of shortcomings in the original Ethernet Spanning Tree Protocol. SPB-M uses IEEE 802.1ah MAC-in-MAC encapsulation and supports a 24-bit I-SID, which can be used to identify virtual network instances. SPB is entirely L2 based, extending the L2 Ethernet bridging model.

4.2. ARMD

ARMD is chartered to look at data center scaling issues with a focus on address resolution. ARMD is currently chartered to develop a problem statement and is not currently developing solutions. While an overlay-based approach may address some of the "pain points" that have been raised in ARMD (e.g., better support for multi-tenancy), an overlay approach may also push some of the L2 scaling concerns (e.g., excessive flooding) to the IP level (flooding via IP multicast). Analysis will be needed to understand the scaling trade-offs of an overlay-based approach compared with existing approaches. On the other hand, existing IP-based approaches such as proxy ARP may help mitigate some concerns.

4.3. TRILL

TRILL is an L2-based approach aimed at improving deficiencies and limitations with current Ethernet networks and STP in particular. Although it differs from Shortest Path Bridging in many architectural and implementation details, it is similar in that it provides an L2-based service to end systems. TRILL, as defined today, supports only the standard (and limited) 12-bit VLAN model. Approaches to extend TRILL to support more than 4094 VLANs are currently under investigation [I-D.ietf-trill-fine-labeling].

4.4. L2VPNs

The IETF has specified a number of approaches for connecting L2 domains together as part of the L2VPN Working Group. That group, however, has historically been focused on provider-provisioned L2 VPNs, where the service provider participates in management and provisioning of the VPN. In addition, much of the target environment for such deployments involves carrying L2 traffic over WANs. Overlay approaches are intended to be used within data centers, where the overlay network is managed by the data center operator rather than by an outside party. While overlays can run across the Internet as well, they will extend well into the data center itself (e.g., up to and including hypervisors) and include large numbers of machines within the data center itself.

Other L2VPN approaches, such as L2TP [RFC2661], require significant tunnel state at the encapsulating and decapsulating end points. Overlays require less tunnel state than other approaches, which is important to allow overlays to scale to hundreds of thousands of end points. It is assumed that smaller switches (i.e., virtual switches in hypervisors or the physical switches to which VMs connect) will be part of the overlay network and be responsible for encapsulating and decapsulating packets.
4.5. Proxy Mobile IP

Proxy Mobile IP [RFC5213] [RFC5844] makes use of the GRE Key Field [RFC5845] [RFC6245], but not in a way that supports multi-tenancy.

4.6. LISP

LISP [I-D.ietf-lisp] essentially provides an IP-over-IP overlay where the internal addresses are end station identifiers and the outer IP addresses represent the location of the end station within the core IP network topology. The LISP overlay header includes a 24-bit Instance ID used to support overlapping inner IP addresses.

4.7. Individual Submissions

Many individual submissions also aim to address some or all of the issues discussed in this document. Examples of such drafts are VXLAN [I-D.mahalingam-dutt-dcops-vxlan], NVGRE [I-D.sridharan-virtualization-nvgre], and Virtual Machine Mobility in L3 Networks [I-D.wkumari-dcops-l3-vmmobility].

5. Further Work

It is believed that overlay-based approaches may be able to reduce the overall amount of flooding and other multicast- and broadcast-related traffic (e.g., ARP and ND) currently experienced within data centers with a large flat L2 network. Further analysis is needed to characterize expected improvements.

6. Summary

This document has argued that network virtualization using L3 overlays addresses a number of issues being faced as data centers scale in size. In addition, careful consideration of a number of issues would lead to the development of interoperable implementations of virtualization overlays.

Three potential work areas were identified. The first involves the interactions that take place when a VM attaches to or detaches from an overlay. A second involves the protocol an NVE would use to communicate with a backend "oracle" to learn and disseminate mapping information about the VMs the NVE communicates with. The third potential work area involves the backend oracle itself, i.e., how it provides failover and how it interacts with oracles in other domains.

7. Acknowledgments

Helpful comments and improvements to this document have come from Ariel Hendel, Vinit Jain, and Benson Schliesser.

8. IANA Considerations

This memo includes no request to IANA.

9. Security Considerations

TBD

10. Informative References

[I-D.ietf-lisp]
   Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "Locator/ID Separation Protocol (LISP)", draft-ietf-lisp-23 (work in progress), May 2012.

[I-D.ietf-trill-fine-labeling]
   Eastlake, D., Zhang, M., Agarwal, P., Perlman, R., and D. Dutt, "TRILL: Fine-Grained Labeling", draft-ietf-trill-fine-labeling-01 (work in progress), June 2012.

[I-D.kreeger-nvo3-overlay-cp]
   Black, D., Dutt, D., Kreeger, L., Sridharan, M., and T. Narten, "Network Virtualization Overlay Control Protocol Requirements", draft-kreeger-nvo3-overlay-cp-00 (work in progress), January 2012.

[I-D.lasserre-nvo3-framework]
   Lasserre, M., Balus, F., Morin, T., Bitar, N., Rekhter, Y., and Y. Ikejiri, "Framework for DC Network Virtualization", draft-lasserre-nvo3-framework-01 (work in progress), March 2012.

[I-D.mahalingam-dutt-dcops-vxlan]
   Sridhar, T., Bursell, M., Kreeger, L., Dutt, D., Wright, C., Mahalingam, M., Duda, K., and P. Agarwal, "VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-01 (work in progress), February 2012.
[I-D.sridharan-virtualization-nvgre]
   Sridharan, M., Duda, K., Ganga, I., Greenberg, A., Lin, G., Pearson, M., Thaler, P., Tumuluri, C., and Y. Wang, "NVGRE: Network Virtualization using Generic Routing Encapsulation", draft-sridharan-virtualization-nvgre-00 (work in progress), September 2011.

[I-D.wkumari-dcops-l3-vmmobility]
   Kumari, W. and J. Halpern, "Virtual Machine mobility in L3 Networks", draft-wkumari-dcops-l3-vmmobility-00 (work in progress), August 2011.

[RFC2661]
   Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", RFC 2661, August 1999.

[RFC2784]
   Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, March 2000.

[RFC4364]
   Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private Networks (VPNs)", RFC 4364, February 2006.

[RFC5213]
   Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K., and B. Patil, "Proxy Mobile IPv6", RFC 5213, August 2008.

[RFC5844]
   Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy Mobile IPv6", RFC 5844, May 2010.

[RFC5845]
   Muhanna, A., Khalil, M., Gundavelli, S., and K. Leung, "Generic Routing Encapsulation (GRE) Key Option for Proxy Mobile IPv6", RFC 5845, June 2010.

[RFC6245]
   Yegani, P., Leung, K., Lior, A., Chowdhury, K., and J. Navali, "Generic Routing Encapsulation (GRE) Key Extension for Mobile IPv4", RFC 6245, May 2011.

[RFC6325]
   Perlman, R., Eastlake, D., Dutt, D., Gai, S., and A. Ghanwani, "Routing Bridges (RBridges): Base Protocol Specification", RFC 6325, July 2011.

Appendix A. Change Log

A.1. Changes from -01

1. Removed Section 4.2 (Standardization Issues) and Section 5 (Control Plane), as those are more appropriately covered in and overlap with material in [I-D.lasserre-nvo3-framework] and [I-D.kreeger-nvo3-overlay-cp].

2. Expanded the introduction and better explained terms such as tenant and virtual network instance. These had been covered in a section that has since been removed.

3. Added Section 3.3 "Overlay Networking Work Areas" to better articulate the three separable work components (or "on-the-wire protocols") where work is needed.

4. Added a section on Shortest Path Bridging to the Related Work section.

5. Revised some of the terminology to be consistent with [I-D.lasserre-nvo3-framework] and [I-D.kreeger-nvo3-overlay-cp].

Authors' Addresses

Thomas Narten (editor)
IBM

Email: narten@us.ibm.com

Murari Sridharan
Microsoft

Email: muraris@microsoft.com

Dinesh Dutt

Email: ddutt.ietf@hobbesdutt.com

David Black
EMC

Email: david.black@emc.com

Lawrence Kreeger
Cisco

Email: kreeger@cisco.com