Internet Engineering Task Force                          T. Narten, Ed.
Internet-Draft                                                       IBM
Intended status: Informational                             M. Sridharan
Expires: January 18, 2013                                      Microsoft
                                                                 D. Dutt
                                                                D. Black
                                                                     EMC
                                                              L. Kreeger
                                                                   Cisco
                                                           July 17, 2012

         Problem Statement: Overlays for Network Virtualization
              draft-narten-nvo3-overlay-problem-statement-03

Abstract

This document describes issues associated with providing multi-tenancy in large data center networks and an overlay-based network virtualization approach to addressing them. A key multi-tenancy requirement is traffic isolation, so that a tenant's traffic is not visible to any other tenant. This isolation can be achieved by assigning one or more virtual networks to each tenant such that traffic within a virtual network is isolated from traffic in other virtual networks. The primary functionality required is provisioning virtual networks, associating a virtual machine's virtual network interface(s) with the appropriate virtual network, and maintaining that association as the virtual machine is activated, migrated and/or deactivated. Use of an overlay-based approach enables scalable deployment on large network infrastructures.
Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 18, 2013.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Problem Details
       2.1.  Dynamic Provisioning
       2.2.  Virtual Machine Mobility Requirements
       2.3.  Span of Virtual Networks
       2.4.  Inadequate Forwarding Table Sizes in Switches
       2.5.  Decoupling Logical and Physical Configuration
       2.6.  Separating Tenant Addressing from Infrastructure Addressing
       2.7.  Communication Between Virtual and Traditional Networks
       2.8.  Communication Between Virtual Networks
       2.9.  Overlay Design Characteristics
   3.  Network Overlays
       3.1.  Limitations of Existing Virtual Network Models
       3.2.  Benefits of Network Overlays
       3.3.  Overlay Networking Work Areas
   4.  Related Work
       4.1.  IEEE 802.1aq - Shortest Path Bridging
       4.2.  ARMD
       4.3.  TRILL
       4.4.  L2VPNs
       4.5.  Proxy Mobile IP
       4.6.  LISP
       4.7.  Individual Submissions
   5.  Further Work
   6.  Summary
   7.  Acknowledgments
   8.  IANA Considerations
   9.  Security Considerations
   10. Informative References
   Appendix A.  Change Log
       A.1.  Changes from -01
       A.2.  Changes from -02
   Authors' Addresses
1. Introduction

Server virtualization is increasingly becoming the norm in data centers. With server virtualization, each physical server supports multiple virtual machines (VMs), each running its own operating system, middleware and applications. Virtualization is a key enabler of workload agility, i.e., allowing any server to host any application and providing the flexibility of adding, shrinking, or moving services within the physical infrastructure. Server virtualization provides numerous benefits, including higher utilization, increased security, reduced user downtime, reduced power usage, etc.

Large-scale multi-tenant data centers are taking advantage of the benefits of server virtualization to provide a new kind of hosting, a virtual hosted data center. Multi-tenant data centers are ones where individual tenants could belong to a different company (in the case of a public provider) or a different department (in the case of an internal company data center). Each tenant has the expectation of a level of security and privacy separating their resources from those of other tenants. For example, one tenant's traffic must never be exposed to another tenant, except through carefully controlled interfaces, such as a security gateway.

To a tenant, virtual data centers are similar to their physical counterparts, consisting of end stations attached to a network, complete with services such as load balancers and firewalls. But unlike a physical data center, end stations connect to a virtual network. To end stations, a virtual network looks like a normal network (e.g., providing an Ethernet service), except that the only end stations connected to the virtual network are those belonging to the tenant.

A tenant is the administrative entity that is responsible for and manages a specific virtual network instance and its associated services (whether virtual or physical). In a cloud environment, a tenant would correspond to the customer that has defined and is using a particular virtual network. However, a tenant may also find it useful to create multiple different virtual network instances. Hence, there is a one-to-many mapping between tenants and virtual network instances. A single tenant may operate multiple individual virtual network instances, each associated with a different service.

How a virtual network is implemented does not matter to the tenant. It could be a pure routed network, a pure bridged network or a combination of bridged and routed networks. The key requirement is that each individual virtual network instance be isolated from other virtual network instances.

This document outlines the problems encountered in scaling the number of isolated virtual networks in a data center, as well as the problems of managing the creation/deletion, membership and span of these networks. It makes the case that an overlay-based approach, in which individual isolated networks are implemented as virtual networks dynamically controlled by a standardized control plane, provides a number of advantages over current approaches. The purpose of this document is to identify the set of problems that any solution has to address in building multi-tenant data centers.
With this approach, the goal is to enable standardized, interoperable implementations that allow the construction of multi-tenant data centers.

Section 2 describes the problem space details. Section 3 describes network overlays in more detail and the potential work areas. Sections 4 and 5 review related and further work, while Section 6 closes with a summary.

2. Problem Details

The following subsections describe aspects of multi-tenant networking that pose problems for large-scale network infrastructure. Different problem aspects may arise based on the network architecture and scale.

2.1. Dynamic Provisioning

Cloud computing involves on-demand provisioning of resources for multi-tenant environments. A common example of cloud computing is the public cloud, where a cloud service provider offers elastic services to multiple customers over the same infrastructure. The on-demand nature of provisioning, in conjunction with trusted hypervisors controlling network access by VMs, can be achieved through resilient, distributed network control mechanisms.

2.2. Virtual Machine Mobility Requirements

A key benefit of server virtualization is virtual machine (VM) mobility. A VM can be migrated from one server to another, live, i.e., while continuing to run and without needing to shut it down and restart it at the new location. A key requirement for live migration is that a VM retain critical network state at its new location, including its IP and MAC address(es). Preservation of MAC addresses may be necessary, for example, when software licenses are bound to MAC addresses. More generally, any change in the VM's MAC addresses resulting from a move would be visible to the VM and thus potentially result in unexpected disruptions. Retaining IP addresses after a move is necessary to prevent existing transport connections (e.g., TCP) from breaking and needing to be restarted.

In traditional data centers, servers are assigned IP addresses based on their physical location, for example based on the Top of Rack (ToR) switch for the server rack or the VLAN configured to the server. Servers can only move to other locations within the same IP subnet. This constraint is not problematic for physical servers, which move infrequently, but it restricts the placement and movement of VMs within the data center. Any solution for a scalable multi-tenant data center must allow a VM to be placed (or moved) anywhere within the data center, without being constrained by the subnet boundary concerns of the host servers.

2.3. Span of Virtual Networks

Another use case is cross-pod expansion. A pod typically consists of one or more racks of servers with its associated network and storage connectivity. Tenants may start off on a pod and, due to expansion, require servers/VMs on other pods, especially when tenants on the other pods are not fully utilizing all their resources. This use case requires that virtual networks span multiple pods in order to provide connectivity to all of the tenant's servers/VMs.

2.4. Inadequate Forwarding Table Sizes in Switches

Today's virtualized environments place additional demands on the forwarding tables of switches. Instead of just one link-layer address per server, the switching infrastructure has to learn addresses of the individual VMs (which could number in the hundreds per server). This is necessary because traffic between the VMs and the rest of the physical network traverses the physical network infrastructure. This places a much larger demand on the switches' forwarding table capacity compared to non-virtualized environments, causing more traffic to be flooded or dropped when the addresses in use exceed the forwarding table capacity.
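As a rough illustration of this pressure on forwarding tables, the short calculation below uses assumed, hypothetical numbers for rack count, servers per rack, VMs per server, and switch table size; the figures are not measurements from any particular deployment.

    # Back-of-envelope estimate of MAC forwarding-table demand.
    # All figures below are illustrative assumptions, not measurements.
    racks = 200             # racks in the data center
    servers_per_rack = 40   # physical servers per rack
    vms_per_server = 50     # VMs hosted on each virtualized server

    physical_macs = racks * servers_per_rack        # 8,000 entries
    virtual_macs = physical_macs * vms_per_server   # 400,000 entries

    typical_tor_table = 32_000   # assumed MAC table size of a commodity switch

    print(virtual_macs > typical_tor_table)   # True: tables overflow,
                                              # leading to flooding or drops

Even with conservative assumptions of this kind, the number of VM addresses visible to the switching fabric can exceed typical forwarding-table sizes by an order of magnitude.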
2.5. Decoupling Logical and Physical Configuration

Data center operators must be able to achieve high utilization of server and network capacity. For efficient and flexible allocation, operators should be able to spread a virtual network instance across servers in any rack in the data center. It should also be possible to migrate compute workloads to any server anywhere in the network while retaining the workload's addresses. This can be achieved today by stretching VLANs (e.g., by using TRILL or SPB).

However, in order to limit the broadcast domain of each VLAN, multi-destination frames within a VLAN should optimally flow only to those devices that have that VLAN configured. When workloads migrate, the physical network (e.g., access lists) may need to be reconfigured, which is typically time-consuming and error-prone.

2.6. Separating Tenant Addressing from Infrastructure Addressing

It is highly desirable to be able to number the data center underlay network using whatever addresses make sense for it, without having to worry about address collisions between addresses used by the underlay and those used by tenants.

2.7. Communication Between Virtual and Traditional Networks

Not all communication will be between devices connected to virtualized networks. Devices using overlays will continue to access devices and make use of services on traditional, non-virtualized networks, whether in the data center, the public Internet, or at remote/branch campuses. Any virtual network solution must be capable of interoperating with existing routers, VPN services, load balancers, intrusion detection services, firewalls, etc. on external networks.

Communication between devices attached to a virtual network and devices connected to non-virtualized networks is handled architecturally by having specialized gateway devices that receive packets from a virtualized network, decapsulate them, process them as regular (i.e., non-virtualized) traffic, and finally forward them on to their appropriate destination (and vice versa). Additional identification, such as VLAN tags, could be used on the non-virtualized side of such a gateway to enable forwarding of traffic for multiple virtual networks over a common non-virtualized link.

A wide range of implementation approaches is possible. Overlay gateway functionality could be combined with other network functionality into a network device that implements the overlay functionality, and then forwards traffic between other internal components that implement functionality such as full router service, load balancing, firewall support, VPN gateway, etc.
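As one illustration of the gateway behavior described above, the sketch below shows a hypothetical mapping between virtual network identifiers and VLAN tags on the non-virtualized side of such a gateway. The identifier values, VLAN assignments, and function names are assumptions chosen for illustration; no specific encapsulation or product behavior is implied.

    # Hypothetical gateway between an overlay and a non-virtualized link.
    # VNID values, VLAN assignments, and function names are illustrative only.

    vnid_to_vlan = {   # one VLAN per virtual network on the external link
        10001: 100,
        10002: 200,
    }
    vlan_to_vnid = {vlan: vnid for vnid, vlan in vnid_to_vlan.items()}

    def from_overlay(vnid, inner_frame):
        """Decapsulated traffic leaving the overlay: tag it with the VLAN
        assigned to its virtual network and forward it on the external link."""
        vlan = vnid_to_vlan[vnid]
        return ("external_link", vlan, inner_frame)

    def from_external(vlan, frame):
        """Traffic arriving from the non-virtualized side: map its VLAN back
        to a virtual network and hand it to the overlay for encapsulation."""
        vnid = vlan_to_vnid[vlan]
        return ("overlay", vnid, frame)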
2.8. Communication Between Virtual Networks

Communication between devices on different virtual networks is handled architecturally by adding specialized interconnect functionality among the otherwise isolated virtual networks. For a virtual network providing an Ethernet service, such interconnect functionality could be IP forwarding configured as part of the "default gateway" for each virtual network. For a virtual network providing IP service, the interconnect functionality could be IP forwarding configured as part of the IP addressing structure of each virtual network. In both cases, the implementation of the interconnect functionality could be distributed across the NVEs (Network Virtualization Edges; see Section 3.3), and could be combined with other network functionality (e.g., load balancing, firewall support) that is applied to traffic that is forwarded between virtual networks.

2.9. Overlay Design Characteristics

Layer 2 overlay protocols already exist, but they were not necessarily designed to solve the problem in the environment of a highly virtualized data center. Below are some of the characteristics of such environments that must be taken into account by the overlay technology:

1. Highly distributed systems. The overlay should work in an environment where there could be many thousands of access switches (e.g., residing within the hypervisors) and many more end systems (e.g., VMs) connected to them. This argues for a distributed mapping system that places low overhead on the overlay tunnel endpoints.

2. Many highly distributed virtual networks with sparse membership. Each virtual network could be highly dispersed inside the data center. Also, along with the expectation of many virtual networks, the number of end systems connected to any one virtual network is expected to be relatively low; therefore, the percentage of access switches participating in any given virtual network would also be expected to be low. For this reason, efficient pruning of multi-destination traffic should be taken into consideration.

3. Highly dynamic end systems. End systems connected to virtual networks can be very dynamic, both in terms of creation/deletion/power-on/off and in terms of mobility across the access switches.

4. Work with existing, widely deployed Ethernet switches and IP routers without requiring wholesale replacement. The first hop switch that adds and removes the overlay header will require new equipment and/or new software.

5. Network infrastructure administered by a single administrative domain. This is consistent with operation within a data center, and not across the Internet.

3. Network Overlays

Virtual networks are used to isolate a tenant's traffic from that of other tenants (or even traffic within the same tenant that requires isolation). There are two main characteristics of virtual networks:

1. Providing network address space that is isolated from other virtual networks. The same network addresses may be used in different virtual networks on the same underlying network infrastructure.

2. Limiting the scope of frames sent on the virtual network. Frames sent by end systems attached to a virtual network are delivered as expected to other end systems on that virtual network and may exit a virtual network only through controlled exit points, such as a security gateway.
Likewise, frames sourced outside of the virtual network may enter the virtual network only through controlled entry points, such as a security gateway.

3.1. Limitations of Existing Virtual Network Models

Virtual networks are not new to networking. For example, VLANs are a well-known construct in the networking industry. A VLAN is an L2 bridging construct that provides some of the semantics of virtual networks mentioned above: a MAC address is unique within a VLAN, but not necessarily across VLANs. Traffic sourced within a VLAN (including broadcast and multicast traffic) remains within the VLAN it originates from. Traffic forwarded from one VLAN to another typically involves router (L3) processing. The forwarding table lookup operation is keyed on {VLAN, MAC address} tuples.

But there are problems and limitations with L2 VLANs. VLANs are a pure L2 bridging construct, and VLAN identifiers are carried along with data frames to allow each forwarding point to know what VLAN the frame belongs to. A VLAN today is defined as a 12-bit number, limiting the total number of VLANs to 4096 (though typically this number is 4094, since 0 and 4095 are reserved). Due to the large number of tenants that a cloud provider might service, the 4094 VLAN limit is often inadequate. In addition, there is often a need for multiple VLANs per tenant, which exacerbates the issue. The use of a sufficiently large VNID, present in the overlay control plane and possibly also in the data plane, would eliminate current VLAN size limitations associated with single 12-bit VLAN tags.

For IP/MPLS networks, Ethernet Virtual Private Network (E-VPN) [I-D.ietf-l2vpn-evpn] provides an emulated Ethernet service in which each tenant has its own Ethernet network over a common IP or MPLS infrastructure, and a BGP/MPLS control plane is used to distribute the tenant MAC addresses and the MPLS labels that identify the tenants and tenant MAC addresses. Within the BGP/MPLS control plane, a 32-bit Ethernet Tag is used to identify the broadcast domains (VLANs) associated with a given L2 VLAN service instance, and these Ethernet Tags are mapped to VLAN IDs understood by the tenant at the service edges. This means that the limit of 4096 VLANs is associated with an individual tenant service edge, enabling a much higher level of scalability. Interconnectivity between tenants is also allowed in a controlled fashion.

IP/MPLS networks also provide an IP VPN service (L3 VPN) [RFC4364] in which each tenant has its own IP network over a common IP or MPLS infrastructure, and a BGP/MPLS control plane is used to distribute the tenant IP routes and the MPLS labels that identify the tenants and tenant IP routes. As with E-VPNs, interconnectivity between tenants is also allowed in a controlled fashion.

VM Mobility [I-D.raggarwa-data-center-mobility] introduces the concept of a combined L2/L3 VPN service in order to support the mobility of individual Virtual Machines (VMs) between data centers connected over a common IP or MPLS infrastructure.

There are a number of VPN approaches that provide some, if not all, of the desired semantics of virtual networks. A gap analysis will be needed to assess how well existing approaches satisfy the requirements.
3.2. Benefits of Network Overlays

To address the problems described earlier, a network overlay model can be used.

The idea behind an overlay is quite straightforward. Each virtual network instance is implemented as an overlay. The original frame is encapsulated by the first hop network device. The encapsulation identifies the destination device that will perform the decapsulation before delivering the frame to the endpoint. The rest of the network forwards the frame based on the encapsulation header and can be oblivious to the payload that is carried inside. Note that the first hop network device can be a traditional switch or router, or the virtual switch residing inside a hypervisor. Furthermore, the endpoint can be a VM or it can be a physical server. Examples of architectures based on network overlays include BGP/MPLS VPNs [RFC4364], TRILL [RFC6325], LISP [I-D.ietf-lisp], and Shortest Path Bridging [SPB].

With the overlay, a virtual network identifier (or VNID) can be carried as part of the overlay header so that every data frame explicitly identifies the specific virtual network the frame belongs to. Since both routed and bridged semantics can be supported by a virtual data center, the original frame carried within the overlay header can be an Ethernet frame complete with MAC addresses or just the IP packet.

The use of a sufficiently large VNID would address current VLAN limitations associated with single 12-bit VLAN tags. This VNID can be carried in the control plane. In the data plane, an overlay header provides a place to carry either the VNID or a locally significant identifier. In both cases, the identifier in the overlay header specifies which virtual network the data packet belongs to.

A key aspect of overlays is the decoupling of the "virtual" MAC and IP addresses used by VMs from the physical network infrastructure and the infrastructure IP addresses used by the data center. If a VM changes location, the switches at the edge of the overlay simply update their mapping tables to reflect the new location of the VM within the data center's infrastructure space. Because an overlay network is used, a VM can now be located anywhere in the data center that the overlay reaches, without regard to traditional constraints implied by L2 properties such as VLAN numbering, or the span of an L2 broadcast domain scoped to a single pod or access switch.

Multi-tenancy is supported by isolating the traffic of one virtual network instance from traffic of another. Traffic from one virtual network instance cannot be delivered to another instance without (conceptually) exiting the instance and entering the other instance via an entity that has connectivity to both virtual network instances. Without the existence of this entity, tenant traffic remains isolated within each individual virtual network instance.

Overlays are designed to allow a set of VMs to be placed within a single virtual network instance, whether that virtual network provides a bridged network or a routed network.
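A minimal sketch of the data-plane behavior described in this section is given below. It assumes a hypothetical overlay header carrying a 24-bit VNID and a per-edge-device mapping table from (VNID, tenant MAC address) to the underlay IP address of the remote decapsulating device; the field sizes, names, and table layout are illustrative assumptions, not a proposal for any specific encapsulation.

    # Illustrative-only model of an overlay edge device's data plane.
    # The header layout and the 24-bit VNID are assumptions for illustration;
    # they do not correspond to any particular encapsulation format.

    MAX_VNID = 2**24 - 1    # ~16.7M virtual networks vs. 4094 usable VLANs

    # Mapping table: (vnid, tenant MAC) -> underlay IP of the remote edge.
    # How this table gets populated is a control-plane question (Section 3.3).
    mapping_table = {
        (10001, "00:11:22:33:44:55"): "192.0.2.10",
        (10001, "00:11:22:33:44:66"): "192.0.2.20",
        (10002, "00:11:22:33:44:55"): "192.0.2.30",  # same MAC, different VN
    }

    def encapsulate(vnid, dest_mac, inner_frame):
        """Look up the remote edge for this tenant destination and build an
        outer header; the underlay forwards on the outer header only."""
        remote_edge = mapping_table.get((vnid, dest_mac))
        if remote_edge is None:
            return None   # unknown mapping: consult the control plane or flood
        outer_header = {"dst_ip": remote_edge, "vnid": vnid}
        return (outer_header, inner_frame)

If the VM behind a given tenant MAC address moves, only the corresponding table entry changes; the addresses visible to the VM itself stay the same, which is the decoupling described above.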
3.3. Overlay Networking Work Areas

There are three specific and separate potential work areas needed to realize an overlay solution. The areas correspond to different possible "on-the-wire" protocols, where distinct entities interact with each other.

One area of work concerns the address dissemination protocol a Network Virtualization Edge (NVE) [I-D.lasserre-nvo3-framework] uses to build and maintain the mapping tables it uses to deliver encapsulated frames to their proper destination. One approach is to build mapping tables entirely via learning (as is done in 802.1 networks). But to provide better scaling properties, a more sophisticated approach is needed, i.e., the use of a specialized control plane protocol. While there are some advantages to using or leveraging an existing protocol for maintaining mapping tables, the fact that large numbers of NVEs will likely reside in hypervisors places constraints on the resources (CPU and memory) that can be dedicated to such functions. For example, routing protocols (e.g., IS-IS, BGP) may have scaling difficulties if implemented directly in all NVEs, based on both flooding and convergence time concerns. An alternative approach would be to use a standard query protocol between NVEs and the set of network nodes that maintain the address mappings used across the data center for the entire overlay system.

From an architectural perspective, one can view the address mapping dissemination problem as having two distinct and separable components. The first component consists of a back-end "oracle" that is responsible for distributing and maintaining the mapping information for the entire overlay system. The second component consists of the on-the-wire protocols an NVE uses when interacting with the oracle.

The back-end oracle could provide high performance, high resiliency, failover, etc. and could be implemented in significantly different ways. For example, one model uses a traditional, centralized "directory-based" database, using replicated instances for reliability and failover. A second model involves using and possibly extending an existing routing protocol (e.g., BGP, IS-IS, etc.). To support different architectural models, it is useful to have one standard protocol for the NVE-oracle interaction while allowing different protocols and architectural approaches for the oracle itself. Separating the two allows NVEs to transparently interact with different types of oracles, i.e., either of the two architectural models described above. Having separate protocols could also allow for a simplified NVE that only interacts with the oracle for the mapping table entries it needs, and allows the oracle (and its associated protocols) to evolve independently over time with minimal impact on the NVEs.

A third work area considers the attachment and detachment of VMs (or Tenant End Systems [I-D.lasserre-nvo3-framework] more generally) from a specific virtual network instance. When a VM attaches, the NVE associates the VM with a specific overlay for the purposes of tunneling traffic sourced from or destined to the VM. When a VM disconnects, it is removed from the overlay and the NVE effectively terminates any tunnels associated with the VM. To achieve this functionality, a standardized interaction between the NVE and hypervisor may be needed, for example in the case where the NVE resides on a separate device from the VM.
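The following sketch illustrates, under assumed message names and semantics, how the NVE-oracle split described above might look from the NVE's side: the NVE registers a mapping when a VM attaches, queries the oracle on a table miss, and withdraws the mapping when the VM detaches. The class names, methods, and message fields are hypothetical; they are not drawn from any existing protocol.

    # Hypothetical NVE-to-oracle interaction; names and messages are assumptions.

    class MappingOracle:
        """Stand-in for the back-end oracle, however it is implemented
        (directory-based database, extended routing protocol, etc.)."""
        def __init__(self):
            self.mappings = {}           # (vnid, mac) -> underlay IP of NVE

        def register(self, vnid, mac, nve_ip):
            self.mappings[(vnid, mac)] = nve_ip

        def withdraw(self, vnid, mac):
            self.mappings.pop((vnid, mac), None)

        def query(self, vnid, mac):
            return self.mappings.get((vnid, mac))

    class NVE:
        def __init__(self, my_underlay_ip, oracle):
            self.my_ip = my_underlay_ip
            self.oracle = oracle
            self.local_cache = {}        # only the entries this NVE needs

        def vm_attach(self, vnid, mac):
            # Triggered by the hypervisor when a VM joins a virtual network.
            self.oracle.register(vnid, mac, self.my_ip)

        def vm_detach(self, vnid, mac):
            self.oracle.withdraw(vnid, mac)

        def lookup(self, vnid, dest_mac):
            # Query the oracle only on a cache miss, keeping NVE state small.
            key = (vnid, dest_mac)
            if key not in self.local_cache:
                self.local_cache[key] = self.oracle.query(vnid, dest_mac)
            return self.local_cache[key]

In such a model the NVE holds only the cache entries it actively needs, while replication, resiliency, and failover remain internal to the oracle, matching the separation of concerns argued for above.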
In summary, there are three areas of potential work. The first area concerns the oracle itself and any on-the-wire protocols it needs. A second area concerns the interaction between the oracle and NVEs. The third work area concerns protocols associated with attaching and detaching a VM from a particular virtual network instance. All three work areas are important to the development of a scalable, interoperable solution.

4. Related Work

4.1. IEEE 802.1aq - Shortest Path Bridging

Shortest Path Bridging (SPB) is an IS-IS-based overlay for L2 Ethernets. SPB supports multi-pathing and addresses a number of shortcomings in the original Ethernet Spanning Tree Protocol. SPB-M uses IEEE 802.1ah MAC-in-MAC encapsulation and supports a 24-bit I-SID, which can be used to identify virtual network instances. SPB is entirely L2 based, extending the L2 Ethernet bridging model.

4.2. ARMD

ARMD is chartered to look at data center scaling issues with a focus on address resolution. ARMD is currently chartered to develop a problem statement and is not currently developing solutions. While an overlay-based approach may address some of the "pain points" that have been raised in ARMD (e.g., better support for multi-tenancy), an overlay approach may also push some of the L2 scaling concerns (e.g., excessive flooding) to the IP level (flooding via IP multicast). Analysis will be needed to understand the scaling tradeoffs of an overlay-based approach compared with existing approaches. On the other hand, existing IP-based approaches such as proxy ARP may help mitigate some concerns.

4.3. TRILL

TRILL is an L2-based approach aimed at addressing deficiencies and limitations of current Ethernet networks, and STP in particular. Although it differs from Shortest Path Bridging in many architectural and implementation details, it is similar in that it provides an L2-based service to end systems. TRILL, as defined today, supports only the standard (and limited) 12-bit VLAN model. Approaches to extend TRILL to support more than 4094 VLANs are currently under investigation [I-D.ietf-trill-fine-labeling].

4.4. L2VPNs

The IETF has specified a number of approaches for connecting L2 domains together as part of the L2VPN Working Group. That group, however, has historically been focused on provider-provisioned L2 VPNs, where the service provider participates in management and provisioning of the VPN. In addition, much of the target environment for such deployments involves carrying L2 traffic over WANs. Overlay approaches are intended to be used within data centers where the overlay network is managed by the data center operator, rather than by an outside party. While overlays can run across the Internet as well, they will extend well into the data center itself (e.g., up to and including hypervisors) and include large numbers of machines within the data center.

Other L2VPN approaches, such as L2TP [RFC2661], require significant tunnel state at the encapsulating and decapsulating end points. Overlays require less tunnel state than other approaches, which is important to allow overlays to scale to hundreds of thousands of end points. It is assumed that smaller switches (i.e., virtual switches in hypervisors or the physical switches to which VMs connect) will be part of the overlay network and be responsible for encapsulating and decapsulating packets.
4.5. Proxy Mobile IP

Proxy Mobile IP [RFC5213] [RFC5844] makes use of the GRE Key Field [RFC5845] [RFC6245], but not in a way that supports multi-tenancy.

4.6. LISP

LISP [I-D.ietf-lisp] essentially provides an IP-over-IP overlay where the internal addresses are end-station identifiers and the outer IP addresses represent the location of the end station within the core IP network topology. The LISP overlay header includes a 24-bit Instance ID to support overlapping inner IP addresses.

4.7. Individual Submissions

Many individual submissions also seek to address some or all of the issues identified in this document. Examples of such drafts are VXLAN [I-D.mahalingam-dutt-dcops-vxlan], NVGRE [I-D.sridharan-virtualization-nvgre], and Virtual Machine Mobility in L3 Networks [I-D.wkumari-dcops-l3-vmmobility].

5. Further Work

It is believed that overlay-based approaches may be able to reduce the overall amount of flooding and other multicast- and broadcast-related traffic (e.g., ARP and ND) currently experienced within data centers built around a large, flat L2 network. Further analysis is needed to characterize expected improvements.

6. Summary

This document has argued that network virtualization using L3 overlays addresses a number of issues being faced as data centers scale in size. In addition, careful consideration of a number of issues would lead to the development of interoperable implementations of virtualization overlays.

Three potential work areas were identified. The first involves the interactions that take place when a VM attaches to or detaches from an overlay. A second involves the protocol an NVE would use to communicate with a back-end "oracle" to learn and disseminate mapping information about the VMs the NVE communicates with. The third potential work area involves the back-end oracle itself, i.e., how it provides failover and how it interacts with oracles in other domains.

7. Acknowledgments

Helpful comments and improvements to this document have come from Ariel Hendel, Vinit Jain, and Benson Schliesser.

8. IANA Considerations

This memo includes no request to IANA.

9. Security Considerations

TBD

10. Informative References

[I-D.ietf-l2vpn-evpn]  Sajassi, A., Aggarwal, R., Henderickx, W., Balus, F., Isaac, A., and J. Uttaro, "BGP MPLS Based Ethernet VPN", draft-ietf-l2vpn-evpn-01 (work in progress), July 2012.

[I-D.ietf-lisp]  Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "Locator/ID Separation Protocol (LISP)", draft-ietf-lisp-23 (work in progress), May 2012.

[I-D.ietf-trill-fine-labeling]  Eastlake, D., Zhang, M., Agarwal, P., Perlman, R., and D. Dutt, "TRILL: Fine-Grained Labeling", draft-ietf-trill-fine-labeling-01 (work in progress), June 2012.

[I-D.kreeger-nvo3-overlay-cp]  Black, D., Dutt, D., Kreeger, L., Sridharan, M., and T. Narten, "Network Virtualization Overlay Control Protocol Requirements", draft-kreeger-nvo3-overlay-cp-00 (work in progress), January 2012.

[I-D.lasserre-nvo3-framework]  Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. Rekhter, "Framework for DC Network Virtualization", draft-lasserre-nvo3-framework-03 (work in progress), July 2012.
Agarwal, "VXLAN: A 691 Framework for Overlaying Virtualized Layer 2 Networks over 692 Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-01 693 (work in progress), February 2012. 695 [I-D.raggarwa-data-center-mobility] 696 Aggarwal, R., Rekhter, Y., Henderickx, W., Shekhar, R., 697 and L. Fang, "Data Center Mobility based on BGP/MPLS, IP 698 Routing and NHRP", draft-raggarwa-data-center-mobility-03 699 (work in progress), June 2012. 701 [I-D.sridharan-virtualization-nvgre] 702 Sridhavan, M., Greenberg, A., Venkataramaiah, N., Wang, 703 Y., Duda, K., Ganga, I., Lin, G., Pearson, M., Thaler, P., 704 and C. Tumuluri, "NVGRE: Network Virtualization using 705 Generic Routing Encapsulation", 706 draft-sridharan-virtualization-nvgre-01 (work in 707 progress), July 2012. 709 [I-D.wkumari-dcops-l3-vmmobility] 710 Kumari, W. and J. Halpern, "Virtual Machine mobility in L3 711 Networks.", draft-wkumari-dcops-l3-vmmobility-00 (work in 712 progress), August 2011. 714 [RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, 715 G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", 716 RFC 2661, August 1999. 718 [RFC4023] Worster, T., Rekhter, Y., and E. Rosen, "Encapsulating 719 MPLS in IP or Generic Routing Encapsulation (GRE)", 720 RFC 4023, March 2005. 722 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 723 Networks (VPNs)", RFC 4364, February 2006. 725 [RFC5036] Andersson, L., Minei, I., and B. Thomas, "LDP 726 Specification", RFC 5036, October 2007. 728 [RFC5213] Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K., 729 and B. Patil, "Proxy Mobile IPv6", RFC 5213, August 2008. 731 [RFC5844] Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy 732 Mobile IPv6", RFC 5844, May 2010. 734 [RFC5845] Muhanna, A., Khalil, M., Gundavelli, S., and K. Leung, 735 "Generic Routing Encapsulation (GRE) Key Option for Proxy 736 Mobile IPv6", RFC 5845, June 2010. 738 [RFC6245] Yegani, P., Leung, K., Lior, A., Chowdhury, K., and J. 739 Navali, "Generic Routing Encapsulation (GRE) Key Extension 740 for Mobile IPv4", RFC 6245, May 2011. 742 [RFC6325] Perlman, R., Eastlake, D., Dutt, D., Gai, S., and A. 743 Ghanwani, "Routing Bridges (RBridges): Base Protocol 744 Specification", RFC 6325, July 2011. 746 [SPB] "IEEE P802.1aq/D4.5 Draft Standard for Local and 747 Metropolitan Area Networks -- Media Access Control (MAC) 748 Bridges and Virtual Bridged Local Area Networks, 749 Amendment 8: Shortest Path Bridging", February 2012. 751 Appendix A. Change Log 753 A.1. Changes from -01 755 1. Removed Section 4.2 (Standardization Issues) and Section 5 756 (Control Plane) as those are more appropriately covered in and 757 overlap with material in [I-D.lasserre-nvo3-framework] and 758 [I-D.kreeger-nvo3-overlay-cp]. 760 2. Expanded introduction and better explained terms such as tenant 761 and virtual network instance. These had been covered in a 762 section that has since been removed. 764 3. Added Section 3.3 "Overlay Networking Work Areas" to better 765 articulate the three separable work components (or "on-the-wire 766 protocols") where work is needed. 768 4. Added section on Shortest Path Bridging in Related Work section. 770 5. Revised some of the terminology to be consistent with 771 [I-D.lasserre-nvo3-framework] and [I-D.kreeger-nvo3-overlay-cp]. 773 A.2. Changes from -02 775 1. Numerous changes in response to discussions on the nvo3 mailing 776 list, with majority of changes in Section 2 (Problem Details) and 777 Section 3 (Network Overlays). 
Best to see diffs for specific text changes.

Authors' Addresses

   Thomas Narten (editor)
   IBM

   Email: narten@us.ibm.com


   Murari Sridharan
   Microsoft

   Email: muraris@microsoft.com


   Dinesh Dutt

   Email: ddutt.ietf@hobbesdutt.com


   David Black
   EMC

   Email: david.black@emc.com


   Lawrence Kreeger
   Cisco

   Email: kreeger@cisco.com