idnits 2.17.1

draft-kompella-nvo3-server2nve-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------

     No issues found here.

  Checking nits according to
  https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist:
  ----------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line
     does not match the current year

  -- The document date (April 29, 2013) is 4014 days in the past.  Is
     this intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------

  == Unused Reference: 'I-D.ietf-nvo3-overlay-problem-statement' is
     defined on line 777, but no explicit reference was found in the
     text

  == Unused Reference: 'I-D.kreeger-nvo3-overlay-cp' is defined on line
     784, but no explicit reference was found in the text

  == Outdated reference: A later version (-11) exists of
     draft-ietf-l2vpn-evpn-03

  == Outdated reference: A later version (-09) exists of
     draft-ietf-nvo3-framework-02

  == Outdated reference: A later version (-04) exists of
     draft-ietf-nvo3-overlay-problem-statement-02

  == Outdated reference: A later version (-04) exists of
     draft-kreeger-nvo3-overlay-cp-02

  == Outdated reference: A later version (-09) exists of
     draft-mahalingam-dutt-dcops-vxlan-03

  == Outdated reference: A later version (-08) exists of
     draft-sridharan-virtualization-nvgre-02

  Summary: 0 errors (**), 0 flaws (~~), 9 warnings (==), 1 comment (--).

  Run idnits with the --verbose option for more detailed information
  about the items above.

--------------------------------------------------------------------------------

Network Working Group                                        K. Kompella
Internet-Draft                                                Y. Rekhter
Intended status: Informational                          Juniper Networks
Expires: October 31, 2013                                       T. Morin
                                            France Telecom - Orange Labs
                                                                D. Black
                                                         EMC Corporation
                                                          April 29, 2013

 Signaling Virtual Machine Activity to the Network Virtualization Edge
                   draft-kompella-nvo3-server2nve-02

Abstract

   This document proposes a simplified approach for provisioning
   network parameters related to Virtual Machine creation, migration
   and termination on servers.  The idea is to provision the server,
   then have the server signal the requisite parameters to the relevant
   network device(s).  Such an approach reduces the workload on the
   provisioning system and simplifies the data model that the
   provisioning system needs to maintain.  It is also more resilient to
   topology changes in server-network connectivity, for example,
   reconnecting a server to a different network port or switch.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 31, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  VM Creation
     1.2.  VM Live Migration
     1.3.  VM Termination
   2.  Acronyms Used
   3.  Virtual Networks
     3.1.  Current Mode of Operation
     3.2.  Future Mode of Operation
   4.  Provisioning DCVPNs
   5.  Signaling
     5.1.  Preliminaries
     5.2.  VM Operations
       5.2.1.  Network Parameters
       5.2.2.  Creating a VM
       5.2.3.  Terminating a VM
       5.2.4.  Migrating a VM
     5.3.  Signaling Protocols
   6.  Interfacing with DCVPN Control Planes
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgments
   10. Informative References
   Authors' Addresses

1.  Introduction

   To create a Virtual Machine (VM) on a server in a data center, one
   must specify parameters for the compute, storage, network and
   appliance aspects of the VM.  At a minimum, this requires
   provisioning the server that will host the VM, and the Network
   Virtualization Edge (NVE) that will implement the virtual network
   for the VM, in addition to the VM's storage.  Similar considerations
   apply to live migration and to terminating VMs.  This document
   proposes mechanisms whereby a server can be provisioned with all of
   the parameters for the VM, and the server in turn signals the
   networking aspects to the NVE.  The NVE may be located on the server
   or in an external network switch that may be directly connected to
   the server or accessed via an L2 (Ethernet) LAN or VLAN.  The
   following sections capture the abstract sequence of steps for VM
   creation, live migration and termination.
   While much of the material in this draft may apply to virtual
   entities other than virtual machines that exist on physical entities
   other than servers, this draft is written in terms of virtual
   machines and servers for clarity.

1.1.  VM Creation

   This section describes an abstract sequence of steps involved in
   creating a VM and making it operational (the latter is also known as
   "powering on" the VM).  The following steps are intended as an
   illustrative example, not as prescriptive text; the goal is to
   capture sufficient detail to set a context for the signaling
   described in Section 5.

   Creating a VM requires:

   1.  gathering the compute, network, storage, and appliance
       parameters required for the VM;

   2.  deciding which server, network, storage and network appliance
       devices best match the VM requirements in the current state of
       the data center;

   3.  provisioning the server with the VM parameters;

   4.  provisioning the network element(s) to which the server is
       connected with the network-related parameters of the VM;

   5.  informing the network element(s) to which the server is
       connected about the VM's peer VMs, storage devices and other
       network appliances with which the VM needs to communicate;

   6.  informing the network element(s) to which a VM's peer VMs are
       connected about the new VM and its addresses;

   7.  provisioning storage with the storage-related parameters; and

   8.  provisioning necessary network appliances (firewalls, load
       balancers and "middle boxes").

   Steps 1 and 2 are primarily information gathering.  For Steps 3 to
   8, the provisioning system talks actively to servers, network
   switches, storage and appliances, and must know the details of the
   physical server, network, storage and appliance connectivity
   topologies.  Step 4 is typically done using just provisioning,
   whereas Steps 5 and 6 may be a combination of provisioning and other
   techniques that may defer discovery of the relevant information.
   Steps 4 to 6 accomplish the task of provisioning the network for a
   VM, the result of which is a Data Center Virtual Private Network
   (DCVPN) overlaid on the physical network.

   While shown as a numbered sequence above, some of these steps may be
   concurrent (e.g., server, storage and network provisioning for the
   new VM may be done concurrently), and the two "informing" steps for
   the network (5 and 6) may be partially or fully lazily evaluated,
   based on network traffic that the VM sends or receives after it
   becomes operational.

   This document focuses on the case where the network elements in Step
   4 are not co-resident with the server, and describes how the
   provisioning in Step 4 can be replaced by signaling between server
   and network, using information from Step 3.
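   As a non-normative illustration of this division of labor, the
   Python sketch below shows a provisioning workflow in which Step 4 is
   replaced by server-to-NVE signaling.  All object and function names
   here are hypothetical, not part of any defined API.

      # Hypothetical orchestration of Steps 1-8, with Step 4 replaced
      # by server-to-NVE signaling as proposed in this document.

      def create_vm(request, provisioning):
          # Steps 1-2: gather parameters and choose devices.
          params = provisioning.gather_parameters(request)
          server, storage, appliances = provisioning.place(params)

          # Step 3: provision the server with compute AND network
          # parameters; the server then signals the network parameters
          # to its l-NVE itself (Section 5), replacing Step 4.
          server.provision(params)

          # Steps 5-6 may be deferred: the DCVPN control plane or data
          # plane discovery can propagate peer information lazily.

          # Steps 7-8: storage and appliances, possibly concurrently.
          storage.provision(params.storage)
          for appliance in appliances:
              appliance.provision(params.network_policies)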
1.2.  VM Live Migration

   This subsection describes an abstract sequence of steps involved in
   live migration of a VM.  Live migration is sometimes referred to as
   "hot" migration, in that, from an external viewpoint, the VM appears
   to continue to run while being migrated to another server (e.g., TCP
   connections generally survive this class of migration).  In
   contrast, suspend/resume (or "cold") migration consists of
   suspending VM execution on one server and resuming it on another.
   The following live migration steps are intended as an illustrative
   example, not as prescriptive text; the goal is to capture sufficient
   detail to provide context for the signaling described in Section 5.

   For simplicity, this set of abstract steps assumes shared storage,
   so that the VM's storage is accessible to the source and destination
   servers.  Live migration of a VM requires:

   1.   deciding which server should be the destination of the
        migration, based on the VM's requirements, the data center
        state and the reason for the migration;

   2.   provisioning the destination server with the VM parameters and
        creating a VM to receive the live migration;

   3.   provisioning the network element(s) to which the destination
        server is connected with the network-related parameters of the
        VM;

   4.   transferring the VM's memory image between the source and
        destination servers;

   5.   actually moving the VM: pausing the VM's execution on the
        source server, transferring the VM's execution state and any
        remaining memory state to the destination server, and
        continuing the VM's execution on the destination server;

   6.   informing the network element(s) to which the destination
        server is connected about the VM's peer VMs, storage devices
        and other network appliances with which the VM needs to
        communicate;

   7.   informing the network element(s) to which a VM's peer VMs are
        connected about the VM's new location;

   8.   activating the VM's network parameters at the destination
        server;

   9.   deprovisioning the VM from the network element(s) to which the
        source server is connected; and

   10.  deleting the VM from the source server.

   Step 1 is primarily information gathering.  For Steps 2, 3, 9 and
   10, the provisioning system talks actively to servers, network
   switches and appliances, and must know the details of the physical
   server, network and appliance connectivity topologies.  Steps 4 and
   5 are usually handled directly by the servers involved.  Steps 6 to
   9 may be handled by the servers (e.g., one or more "gratuitous" ARPs
   or RARPs from the destination server may accomplish all four steps)
   or by other techniques.  For Steps 6 and 7, the other techniques may
   involve discovery of the relevant information after the VM has been
   migrated.

   While shown as a numbered sequence above, some of these steps may be
   concurrent (e.g., moving the VM and the associated network changes),
   and the two "informing" steps (6 and 7) may be partially or fully
   lazily evaluated, based on network traffic that the VM sends and/or
   receives after it is migrated to the destination server.

   This document focuses on the case where the network elements are not
   co-resident with the server, and shows how the provisioning in Step
   3 and the deprovisioning in Step 9 can be replaced by signaling
   between server and network, using information from Step 2.
1.3.  VM Termination

   This subsection describes an abstract sequence of steps involved in
   termination of a VM, also referred to as "powering off" a VM.  The
   following termination steps are intended as an illustrative example,
   not as prescriptive text; the goal is to capture sufficient detail
   to set a context for the signaling described in Section 5.

   Termination of a VM requires:

   1.  ensuring that the VM is no longer executing;

   2.  deprovisioning the VM from the network element(s) to which the
       server is connected; and

   3.  deleting the VM from the server (the VM's image may remain on
       storage for reuse).

   Steps 1 and 3 are handled by the server, based on instructions from
   the provisioning system.  For Step 2, the provisioning system talks
   actively to servers, network switches, storage and appliances, and
   must know the details of the physical server, network, storage and
   appliance connectivity topologies.

   While shown as a numbered sequence above, some of these steps may be
   concurrent (e.g., network deprovisioning and VM deletion).

   This document focuses on the case where the network elements in Step
   2 are not co-resident with the server, and shows how the
   deprovisioning in Step 2 can be replaced by signaling between server
   and network.

2.  Acronyms Used

   The following acronyms are used:

   DCVPN:  Data Center Virtual Private Network -- a virtual
      connectivity topology overlaid on physical devices to provide
      virtual devices with the connectivity they need and isolation
      from other DCVPNs.  This corresponds to the concept of a Virtual
      Network Instance (VNI) in [I-D.ietf-nvo3-framework].

   NVE:  Network Virtualization Edge -- the entities that realize
      private communication among VMs in a DCVPN.

   l-NVE:  local NVE -- with respect to a VM, the NVE elements to which
      it is directly connected.

   r-NVE:  remote NVE -- with respect to a VM, the NVE elements to
      which the VM's peer VMs are connected.

   NVGRE:  Network Virtualization using Generic Routing Encapsulation.

   VDP:  VSI Discovery and Configuration Protocol.

   VID:  12-bit VLAN tag or identifier, used locally between a server
      and its l-NVE.

   VLAN:  Virtual Local Area Network.

   VM:  Virtual Machine (same as Virtual Station).

   Peer VM:  with respect to a VM, the other VMs in the VM's DCVPN.

   VNID:  DCVPN Identifier.

   VSI:  Virtual Station Interface.

   VXLAN:  Virtual eXtensible Local Area Network.
3.  Virtual Networks

   The goal of provisioning a network for VMs is to create an
   "isolation domain" wherein a group of VMs can talk freely to each
   other, but communication to and from VMs outside that group is
   restricted (either prohibited, or mediated via a router, firewall or
   other network gateway).  Such an isolation domain, sometimes called
   a Closed User Group, will here be called a Data Center Virtual
   Private Network (DCVPN).  The network elements on the outer border
   or edge of the overlay portion of a Virtual Network are called
   Network Virtualization Edges (NVEs).

   A DCVPN is assigned a global "name" that identifies it in the
   management plane; this name is unique in the scope of the data
   center, but may also be unique across several cooperating data
   centers.  A DCVPN is also assigned an identifier unique in the scope
   of the data center, the Virtual Network Group ID (VNID).  The VNID
   is a control plane entity.  A data plane tag is also needed to
   distinguish different DCVPNs' traffic; more on this later.

   For a given VM, the NVE can be classified into two parts: the
   network elements to which the VM's server is directly connected (the
   local NVE or l-NVE), and those to which peer VMs are connected (the
   remote NVE or r-NVE).  In some cases, the l-NVE is co-resident with
   the server hosting the VM; in other cases, the l-NVE is separate
   (distributed l-NVE).  The latter case is the one of primary interest
   in this document.

   A created VM is added to a DCVPN through Steps 4 to 6 in Section
   1.1, which can be recast as follows.  In Step 4, the l-NVE(s) are
   informed about the VM's VNID, network addresses and policies, and
   the l-NVE and server agree on how to distinguish traffic for
   different DCVPNs from and to the server.  In Step 5, the relevant
   r-NVE elements and the addresses of their VMs are discovered, and in
   Step 6, the r-NVE(s) are informed of the presence of the new VM and
   obtain or discover its addresses.  For both Steps 5 and 6, the
   discovery may be lazily evaluated, so that it occurs after the VM
   begins sending and receiving DCVPN traffic.

   Once a DCVPN is created, the next steps for network provisioning are
   to create and apply policies, such as those for QoS or access
   control.  These occur in three flavors: policies for all VMs in the
   group, policies for individual VMs, and policies for communication
   across DCVPN boundaries.

3.1.  Current Mode of Operation

   DCVPNs are often realized as Ethernet VLAN segments.  A VLAN segment
   satisfies the communication properties of a DCVPN.  A VLAN also has
   data plane mechanisms for discovering network elements (Layer 2
   switches, a.k.a. bridges) and VM addresses.  When a DCVPN is
   realized as a VLAN, Step 4 in Section 1.1 requires provisioning both
   the server and the l-NVE with the VLAN tag that identifies the
   DCVPN.  Step 6 requires provisioning all involved network elements
   with the same VLAN tag.  Address learning is done by flooding, and
   the announcement of a new VM or the new location of a migrated VM is
   often via a "gratuitous" ARP or RARP.

   While VLANs are familiar and well understood, they have scaling
   challenges because they are Layer 2 infrastructure.  The number of
   independent VLANs in a Layer 2 domain is limited by the 12-bit size
   of the VLAN tag.  In addition, data plane techniques (flooding and
   broadcast) are another source of scaling concerns as the overall
   size of the network grows.

3.2.  Future Mode of Operation

   There are multiple scalable realizations of DCVPNs that address the
   isolation requirements of DCVPNs, the need for a scalable substrate
   for DCVPNs, and the need for scalable mechanisms for NVE and VM
   address discovery.  While describing these approaches is beyond the
   scope of this document, a secondary goal of this document is to show
   how the signaling that replaces Step 4 in Section 1.1 can seamlessly
   interact with such realizations of DCVPNs.

   VLAN tags (VIDs) will be used as the data plane tag to distinguish
   different DCVPNs' traffic between a server and its l-NVE.  Note
   that, as used here, VIDs have only local significance between server
   and NVE, and should not be confused with data-center-wide usage of
   VLANs.  If VLAN tags are used for traffic between NVEs, that tag
   usage depends on the encapsulation mechanism among the NVEs and is
   orthogonal to VLAN tag usage between servers and l-NVEs.

4.  Provisioning DCVPNs

   For VM creation as described in Section 1.1, Step 3 provisions the
   server; Steps 4 and 5 provision the l-NVE elements; Step 6
   provisions the r-NVE elements.

   In some cases, the l-NVE is located within the server (e.g., a
   software-implemented switch within a hypervisor); in this case,
   Steps 3 and 4 are "single-touch" in that the provisioning system
   need only talk to the server, as both compute and network parameters
   are applied by the server.  However, in other cases, the l-NVE is
   separate from the server, requiring that the provisioning system
   talk to both the server and the l-NVE.  This scenario, which we call
   "distributed local NVE", is the one considered in this document.
   This draft's goal is to describe how "single-touch" provisioning can
   be achieved in the distributed l-NVE case.

   The overall approach is to provision the server, and have the server
   signal the requisite parameters to the l-NVE.  This approach reduces
   the workload on the provisioning system, allowing it to scale both
   in the number of elements it can manage and in the rate at which it
   can process changes.  It also simplifies the data model of the
   network that is used by the provisioning system, because a complete,
   up-to-date map of server-to-network connectivity is not required.
   This approach is also more resilient to server-network connectivity
   or topology changes that have not yet been transmitted to the
   provisioning system.  For example, if a server is reconnected to a
   different port or a different l-NVE to recover from a malfunctioning
   port, the server can contact the new l-NVE over the new port without
   the provisioning system needing to be immediately aware of the
   change.

   While this draft focuses on provisioning networking parameters via
   signaling, extensions may address the provisioning of storage and
   network appliance parameters in a similar fashion.

5.  Signaling

5.1.  Preliminaries

   This draft considers three common VM operations in a virtualized
   data center: creating a VM, migrating a VM from one physical server
   to another, and terminating a VM.  Creating a VM requires
   "associating" it with its DCVPN and "activating" that association;
   decommissioning a VM requires "dissociating" the VM from its DCVPN.
   Moving a VM consists of associating it with its DCVPN in its new
   location, activating that association, and dissociating the VM from
   its old location.
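   These three operations induce a simple per-attachment lifecycle at
   the l-NVE.  The following sketch is a non-normative illustration;
   the enum and the transition table are not defined by this document.

      from enum import Enum, auto

      class AttachmentState(Enum):
          # State of one VM/DCVPN attachment, as seen by an l-NVE.
          UNKNOWN = auto()     # no state installed
          ASSOCIATED = auto()  # associate done; forwarding not enabled
          ACTIVE = auto()      # activate done; forwarding enabled

      # Transitions implied by Section 5.1.  Migration runs associate
      # and activate at the new l-NVE, then dissociate at the old one.
      TRANSITIONS = {
          ("associate",  AttachmentState.UNKNOWN):
              AttachmentState.ASSOCIATED,
          ("activate",   AttachmentState.ASSOCIATED):
              AttachmentState.ACTIVE,
          ("dissociate", AttachmentState.ASSOCIATED):
              AttachmentState.UNKNOWN,
          ("dissociate", AttachmentState.ACTIVE):
              AttachmentState.UNKNOWN,
      }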
5.2.  VM Operations

5.2.1.  Network Parameters

   For each VM association or dissociation operation, a subset of the
   following information is needed from server to l-NVE:

   operation:  one of associate or dissociate.

   authentication:  proof that this operation was authorized by the
      provisioning system.

   VNID:  identifier of the DCVPN to which the VM belongs.

   VID:  tag to use between server and l-NVE to distinguish DCVPN
      traffic; the value zero in an associate operation is a request
      that the l-NVE assign an unused VID.  This approach provides
      extensibility by allowing the VID to be a VLAN ID, although other
      local means of multiplexing traffic between the server and the
      NVE could be used instead of VIDs.

   encapsulation type:  type of encapsulation used by the DCVPN for
      traffic exchanged between NVEs (see below).

   network addresses:  network addresses for the VM on the server
      (e.g., MACs).

   policy:  VM-specific and/or network-address-specific network
      policies, such as access control lists and/or QoS policies.

   hold time:  time (in milliseconds) to keep a VM's addresses after it
      migrates away from this l-NVE.  This is usually set to zero when
      a VM is terminated.

   per-address-VID-allocation:  boolean flag that can optionally be set
      to "yes", resulting in the VID allocated to this address being
      distinct from the VIDs allocated to other addresses (for the same
      VM or other VMs) connected to the same DCVPN on the same NVE
      port; this behavior will result in traffic always transiting
      through the NVE, even to/from other addresses for the same DCVPN
      on the same server.
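   As a non-normative illustration, these parameters can be collected
   into a message structure along the following lines (Python
   dataclasses; the field names mirror the list above, while the types
   and the Operation enum are assumptions, not a defined encoding).

      from dataclasses import dataclass, field
      from enum import Enum
      from typing import List, Optional

      class Operation(Enum):
          ASSOCIATE = "associate"
          DISSOCIATE = "dissociate"

      @dataclass
      class VMNetworkMessage:
          operation: Operation
          vnid: int                  # DCVPN identifier
          vid: int                   # 0 in an associate asks the l-NVE
                                     # to allocate an unused VID
          addresses: List[str]       # MAC/IPv4/IPv6 addresses
          authentication: Optional[bytes] = None
          encapsulation_type: Optional[str] = None  # e.g., VXLAN, NVGRE
          policies: List[str] = field(default_factory=list)
          hold_time_ms: int = 0      # dissociate: 0 on termination
          per_address_vid_allocation: bool = False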
   The "activate" operation is a dataplane operation that references a
   previously established association via the address and VID; all
   other parameters are obtained at the NVE by mapping the source
   address, VID and port involved to the information established by a
   prior associate operation.

   Realizations of DCVPNs include E-VPNs ([I-D.ietf-l2vpn-evpn]), IP
   VPNs ([RFC4364]), NVGRE ([I-D.sridharan-virtualization-nvgre]), VPLS
   ([RFC4761], [RFC4762]), and VXLAN
   ([I-D.mahalingam-dutt-dcops-vxlan]).  The encapsulation type
   determines whether forwarding at the NVE for the DCVPN is based on
   Layer 2 or Layer 3 service.

   Typically, for the associate message, all of the above information
   except the hold time would be needed.  Similarly, for the dissociate
   message, all of the above information except the VID and
   encapsulation type would typically be needed.

   These operations are stateful, in that their results remain in place
   until superseded by another operation.  For example, on receiving an
   associate message, an NVE is expected to create and maintain the
   DCVPN information for the addresses until the NVE receives a
   dissociate message to remove that information.  A separate liveness
   protocol may be run between server and NVE to let each side know
   that the other is still operational; if the liveness protocol fails,
   each side may remove state installed in response to messages from
   the other.

   The descriptions below generally assume that the NVEs participate in
   a mechanism for control plane distribution of VM addresses, as
   opposed to doing this in the data plane.  If this is not the case,
   NVE elements can lazily evaluate (via data plane discovery) the
   parts of the procedures below that involve address distribution.

   As VIDs are local to server-NVE communication (in fact, to a
   specific port connecting these two elements), a mapping table
   containing 4-tuples of the following form will prove useful to the
   NVE:

      <port, VID, VNID, list of network addresses>

   The valid VID values are from 1 to 4094, inclusive.  The value 0 is
   used to mean "unassigned".  When a VID can be shared by more than
   one VM, it is necessary to reference-count entries in this table;
   the list of addresses in an entry serves this purpose.  Entries in
   this table have multiple uses:

   o  Finding the VNID for a VID and port for association, activation
      and traffic forwarding;

   o  Determining whether a VID exists (has already been assigned) for
      a VNID and port;

   o  Determining which <port, VID> pairs to use for forwarding traffic
      that requires flooding on the DCVPN.

   For simplicity and clarity, this draft assumes that the network
   interfaces in VMs (vNICs) do not use VLAN tags.
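   A minimal sketch of such a table, keyed by <port, VID> and
   supporting the three uses above, follows; the class and method names
   are illustrative only.

      from dataclasses import dataclass, field
      from typing import Dict, List, Tuple

      @dataclass
      class MappingEntry:
          vnid: int
          addresses: List[str] = field(default_factory=list)
          # len(addresses) doubles as the reference count.

      class MappingTable:
          """Holds <port, VID, VNID, list of addresses> 4-tuples."""

          def __init__(self) -> None:
              self.entries: Dict[Tuple[str, int], MappingEntry] = {}

          def add(self, port: str, vid: int, vnid: int,
                  address: str) -> None:
              # Install or extend the entry for <port, VID>.
              entry = self.entries.setdefault((port, vid),
                                              MappingEntry(vnid=vnid))
              if address not in entry.addresses:
                  entry.addresses.append(address)

          def vnid_for(self, port: str, vid: int) -> int:
              # Use 1: VNID for a <VID, port> (association, activation,
              # forwarding).  Raises KeyError if no association exists.
              return self.entries[(port, vid)].vnid

          def vid_for(self, port: str, vnid: int) -> int:
              # Use 2: has a VID been assigned for <VNID, port>?
              # Valid VIDs are 1..4094; 0 means "unassigned".
              for (p, vid), entry in self.entries.items():
                  if p == port and entry.vnid == vnid:
                      return vid
              return 0

          def flood_targets(self, vnid: int) -> List[Tuple[str, int]]:
              # Use 3: all <port, VID> pairs for flooded DCVPN traffic.
              return [key for key, e in self.entries.items()
                      if e.vnid == vnid]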
   There may also be network policies specific to the VM or its
   interfaces.  To connect the VM to its DCVPN, the server signals
   these parameters to the l-NVE via an "associate" operation, followed
   by an "activate" operation to put the parameters into use.  (Note
   that the l-NVE may consist of more than one device.)

   On receiving an associate message on port P from server S, an NVE
   device does the following for each network address in that message:

   A.1:  Validate the authentication (if present).  If validation
      fails, inform the provisioning system, log the error, and stop
      processing the associate message.  This validation may include
      authorization checks.

   A.2:  Check the per-address-VID-allocation flag in the associate
      message:

      *  If this flag is not set:

         +  Check if the VID in the associate message is zero (i.e.,
            the associate message requests VID allocation); if so, look
            up the VID for <P, VNID>; if there is no current VID for
            that tuple, allocate a new VID.

         +  If the VID in the associate message is non-zero, look up
            the VID for <P, VNID>.  If that lookup results in the same
            VID as the one in the associate message, associate that VID
            with <P, VNID, network address>.  If the lookup indicates
            that there is no current VID for that tuple, associate the
            VID in the associate message with <P, VNID, network
            address>.  Otherwise, the VID in the associate message does
            not match the VID that is currently in use for <P, VNID>,
            so respond to S with an error, and stop processing the
            associate message.

      *  If this flag is set, check if the VID in the associate message
         is zero:

         +  If so, this is an allocation request, so allocate a new
            VID, distinct from other VIDs allocated on this port;

         +  If the VID is non-zero, check that the provided VID is
            distinct from other VIDs allocated on this port; if so,
            associate the VID with <P, VNID, network address>.  If not,
            the provided VID is already in use and hence cannot be
            dedicated to this network address, so respond to S with an
            error, and stop processing the associate message.

   A.3:  Add the entry <P, VID, VNID, network address> to the NVE's
      mapping table.  This table entry includes information about the
      DCVPN encapsulation type for the VNID.

   A.4:  Communicate with the control plane to advertise the network
      address, and (if the VNID is new to the NVE) also to get the
      other network addresses in the DCVPN.  Populate the NVE's mapping
      table with all of these network addresses (some control planes
      may not provide all, or even any, of the other addresses in the
      DCVPN at this point).

   A.5:  Finally, respond to S with the VID for <VNID, network
      address>, and indicate that the operation was successful.
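   The following non-normative sketch shows one way Steps A.1 to A.5
   could be implemented at an NVE device, reusing the VMNetworkMessage
   and MappingTable sketches above.  The nve object and its helpers
   (validate_auth, log_and_notify, allocate_vid, control_plane) are
   stand-ins, not defined by this document.

      def handle_associate(nve, port: str, msg: VMNetworkMessage):
          """Sketch of Steps A.1-A.5 for one associate message."""
          # A.1: validate authentication, if present (this may include
          # authorization checks).
          if msg.authentication and not nve.validate_auth(
                  msg.authentication):
              nve.log_and_notify("associate: authentication failed")
              return {"status": "error"}

          vids = {}
          for address in msg.addresses:
              if not msg.per_address_vid_allocation:
                  # A.2, flag unset: one shared VID per <port, VNID>.
                  current = nve.table.vid_for(port, msg.vnid)  # 0=none
                  if msg.vid == 0:               # allocation request
                      vid = current or nve.allocate_vid(port)
                  elif current in (0, msg.vid):  # accept offered VID
                      vid = msg.vid
                  else:                          # VID conflict
                      return {"status": "error",
                              "reason": "VID in use for <P, VNID>"}
              else:
                  # A.2, flag set: the VID must be distinct from all
                  # other VIDs on this port (allocate_vid is assumed
                  # to return such a VID).
                  vid = msg.vid or nve.allocate_vid(port)
                  if msg.vid != 0 and (port, vid) in nve.table.entries:
                      return {"status": "error",
                              "reason": "VID not distinct on port"}
              # A.3: install <P, VID, VNID, address>; the real entry
              # also records the DCVPN encapsulation type.
              nve.table.add(port, vid, msg.vnid, address)
              # A.4: advertise the address; for a VNID new to this NVE,
              # also fetch the DCVPN's other addresses.
              nve.control_plane.advertise(msg.vnid, address)
              vids[address] = vid
          # A.5: report the VID(s) and success back to server S.
          return {"status": "ok", "vids": vids}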
   After a successful associate, the network has been provisioned (at
   least in the local NVE) for traffic, but forwarding has not been
   enabled.  On receiving an activate message on port P from server S,
   an NVE device does the following (activate is a one-way message that
   does not have a response):

   B.1:  Validate the authentication (if present).  If validation
      fails, inform the provisioning system, log the error, and stop
      processing the activate message.  This validation may include
      authorization checks.  The authentication and authorization may
      be implicit when the activate message is a dataplane frame (e.g.,
      a "gratuitous" ARP or RARP).

   B.2:  Check if the VID in the activate message is zero.  If so, log
      the error, and stop processing the activate message.

   B.3:  Use the VID and port P to look up the VNID from a previous
      associate message.  If there is no mapping table state for that
      VID and port, log the error and stop processing the activate
      message.

   B.4:  If forwarding is not enabled for <P, VID, VNID>, activate it,
      mapping VID -> VNID on this port (P) for traffic sent to and
      received from r-NVEs.

   B.5:  If the activate message is a dataplane frame that requires
      forwarding beyond the NVE (e.g., a "gratuitous" ARP or RARP), use
      the activated forwarding to send the frame onward via the virtual
      network identified by the VNID.
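   A matching non-normative sketch of Steps B.1 to B.5, under the same
   assumptions as the associate sketch above; dissociate processing
   (Steps D.1 to D.4 in the next section) would follow the same
   pattern.

      from typing import Optional

      def handle_activate(nve, port: str, vid: int,
                          frame: Optional[bytes] = None) -> None:
          """Sketch of Steps B.1-B.5; activate has no response, so
          failures are only logged."""
          # B.1: validation elided here; it may be implicit when the
          # activate message is a dataplane frame such as a
          # "gratuitous" ARP or RARP.
          # B.2: a zero VID can never refer to an association.
          if vid == 0:
              nve.log_and_notify("activate: VID must be non-zero")
              return
          # B.3: map <VID, port> back to the VNID of a prior associate.
          try:
              vnid = nve.table.vnid_for(port, vid)
          except KeyError:
              nve.log_and_notify("activate: no association for this "
                                 "<VID, port>")
              return
          # B.4: enable forwarding for <P, VID, VNID>, mapping
          # VID -> VNID on this port for traffic to and from r-NVEs.
          nve.enable_forwarding(port, vid, vnid)
          # B.5: forward the triggering frame itself, if any, onto the
          # virtual network identified by the VNID.
          if frame is not None:
              nve.forward(vnid, frame)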
5.2.3.  Terminating a VM

   On receiving a request from the provisioning system to terminate
   execution of a VM (powering off the VM, whether or not the VM's
   image is retained on storage), the server sends a dissociate message
   to the l-NVE with the hold time set to zero.  The dissociate message
   contains the operation, authentication, VNID, hold time, and VM
   addresses.  On receiving the dissociate message on port P from
   server S, each NVE device L does the following:

   D.1:  Validate the authentication (if present).  If validation
      fails, inform the provisioning system, log the error, and stop
      processing the dissociate message.

   D.2:  Communicate with the control plane to withdraw the VM's
      addresses.  If the hold time is non-zero, wait until the hold
      time expires before proceeding to the next step.

   D.3:  Delete the VM's addresses from the mapping table, and delete
      any VM-specific network policies associated with any of the VM's
      addresses.  If a mapping tuple contains no VM addresses as a
      result, delete that tuple.  If the mapping table contains no
      entries for the VNID involved after deleting the tuple,
      optionally delete any network policies for the VNID.

   D.4:  Respond to S saying that the operation was successful.

   At Step D.2, the control plane is responsible for not disrupting
   network operation if the addresses are in use at another l-NVE.
   Also, l-NVEs cannot rely on receiving dissociate messages for all
   terminated VMs, as a server crash may implicitly terminate a VM
   before a dissociate message can be sent.

5.2.4.  Migrating a VM

   Consider a VM that is being migrated from server S (connected to
   l-NVE device L) to server S' (connected to l-NVE device L').  This
   section assumes shared storage, so that both S and S' have access to
   the VM's storage.  The sequence of steps for a successful VM
   migration is:

   M.1:  S' gets a request to prepare to receive a copy of the VM from
      S.

   M.2:  S gets a request to copy the VM to S'.

   M.3:  The copy of the VM (memory, configuration state, etc.) occurs
      while the VM continues to execute.

   M.4:  When that copy has made sufficient progress, S pauses the VM
      and completes the copy, including the VM's execution state.

   M.5:  S' gets a request to resume the paused VM.

   M.6:  After that resume has succeeded, S then proceeds to terminate
      the paused VM on S (see Section 5.2.3), but this operation may
      specify a non-zero hold time during which traffic received may be
      forwarded to the VM's new location.

   Steps M.1 and M.2 initiate the copy of the VM.  During Step M.3, S'
   sends an "associate" message to L' for each of the VM's network
   addresses (S' receives information about these addresses as part of
   the VM copy).  Step M.4 occurs when the VM copy has made sufficient
   progress that the pause required to transfer the VM's execution from
   S to S' is sufficiently short.  At Step M.4, or at Step M.5 at the
   latest, S' sends an "activate" message to L' for each of the VM's
   interfaces.  At Step M.6, S sends a "dissociate" message to L for
   each of the VM's network addresses, optionally with a non-zero hold
   time.

   From the DCVPN's view, there are two important overlaps in the
   apparent network location of the VM's addresses:

   o  The VM's addresses are associated with both L and L' between
      Steps M.3 and M.6.

   o  The VM's addresses are activated at L' during Step M.4, or Step
      M.5 at the latest (e.g., if activate is a dataplane operation
      based on traffic sent at that step); both of these typically
      occur before these addresses are dissociated at L during Step
      M.6.

   The DCVPN control plane must work correctly in the presence of these
   overlaps, and in particular must not:

   o  Fail to activate the VM's network addresses at L' because they
      have not yet been withdrawn at L, or

   o  Disruptively withdraw the VM's network addresses from use at Step
      M.6 of a migration when the VM continues to execute on a
      different server.

   An additional scenario that is important for migration is that the
   source and destination servers, S and S', may share a common l-NVE,
   i.e., L and L' are the same.  In this scenario, there is no need for
   remote interaction of that l-NVE with other NVEs, but that NVE must
   be aware of the possibility of a new association of the VM's
   addresses with a different port, and of the need to promptly
   activate them on that port even though they have not (yet) been
   dissociated from their original port.
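   To summarize the sequencing of the signaling during migration, the
   following non-normative sketch shows Steps M.1 to M.6 from the
   servers' point of view; the source, destination and l_nve objects
   and their methods are hypothetical.

      def live_migrate(vm, source, destination, hold_time_ms: int):
          """Sketch of Steps M.1-M.6 with the associated signaling."""
          destination.prepare_for(vm)                      # M.1
          source.start_copy(vm, destination)               # M.2, M.3

          # During M.3: associate at L' while the VM still runs at S,
          # so the addresses are briefly known at both L and L'.
          for address in vm.addresses:
              destination.l_nve.associate(vm.vnid, address)

          source.pause_and_complete_copy(vm, destination)  # M.4

          # At M.4, or at M.5 at the latest: activate at L'.
          for address in vm.addresses:
              destination.l_nve.activate(vm.vnid, address)
          destination.resume(vm)                           # M.5

          # M.6: dissociate at L, with a hold time so that traffic
          # still arriving at L can reach the VM's new location.
          for address in vm.addresses:
              source.l_nve.dissociate(vm.vnid, address, hold_time_ms)
          source.delete(vm)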
5.3.  Signaling Protocols

   There are multiple protocols that can be used to signal the above
   messages.  One could invent a new protocol for this purpose, or
   reuse existing protocols, among them LLDP, XMPP, HTTP REST, and VDP
   [VDP], a new protocol standardized for the purpose of signaling a
   VM's network parameters from server to l-NVE.  Multiple factors
   influence the choice of protocol(s); this draft's focus is on what
   needs to be signaled, leaving the choice of how the information is
   signaled, and the specific encodings, for other drafts to consider.

6.  Interfacing with DCVPN Control Planes

   The control plane for a DCVPN manages the creation/deletion,
   membership and span of the DCVPN
   ([I-D.ietf-nvo3-overlay-problem-statement],
   [I-D.kreeger-nvo3-overlay-cp]).  Such a control plane needs to work
   with the server-to-NVE signaling in a coordinated manner, to ensure
   that address changes at a local NVE are reflected appropriately in
   remote NVEs.  The details of such coordination are specified in
   separate documents.

7.  Security Considerations

8.  IANA Considerations

9.  Acknowledgments

   Many thanks to Amit Shukla for his help with the details of EVB and
   his insight into data center issues.  Many thanks to members of the
   nvo3 WG for their comments, including Yingjie Gu.

10.  Informative References

   [I-D.ietf-l2vpn-evpn]
              Sajassi, A., Aggarwal, R., Henderickx, W., Balus, F.,
              Isaac, A., and J. Uttaro, "BGP MPLS Based Ethernet VPN",
              draft-ietf-l2vpn-evpn-03 (work in progress),
              February 2013.

   [I-D.ietf-nvo3-framework]
              Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
              Rekhter, "Framework for DC Network Virtualization",
              draft-ietf-nvo3-framework-02 (work in progress),
              February 2013.

   [I-D.ietf-nvo3-overlay-problem-statement]
              Narten, T., Gray, E., Black, D., Dutt, D., Fang, L.,
              Kreeger, L., Napierala, M., and M. Sridharan, "Problem
              Statement: Overlays for Network Virtualization",
              draft-ietf-nvo3-overlay-problem-statement-02 (work in
              progress), February 2013.

   [I-D.kreeger-nvo3-overlay-cp]
              Kreeger, L., Dutt, D., Narten, T., and M. Sridharan,
              "Network Virtualization Overlay Control Protocol
              Requirements", draft-kreeger-nvo3-overlay-cp-02 (work in
              progress), October 2012.

   [I-D.mahalingam-dutt-dcops-vxlan]
              Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "VXLAN: A
              Framework for Overlaying Virtualized Layer 2 Networks
              over Layer 3 Networks",
              draft-mahalingam-dutt-dcops-vxlan-03 (work in progress),
              February 2013.

   [I-D.sridharan-virtualization-nvgre]
              Sridharan, M., Greenberg, A., Venkataramaiah, N., Wang,
              Y., Duda, K., Ganga, I., Lin, G., Pearson, M., Thaler,
              P., and C. Tumuluri, "NVGRE: Network Virtualization using
              Generic Routing Encapsulation",
              draft-sridharan-virtualization-nvgre-02 (work in
              progress), February 2013.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, February 2006.

   [RFC4761]  Kompella, K. and Y. Rekhter, "Virtual Private LAN Service
              (VPLS) Using BGP for Auto-Discovery and Signaling",
              RFC 4761, January 2007.

   [RFC4762]  Lasserre, M. and V. Kompella, "Virtual Private LAN
              Service (VPLS) Using Label Distribution Protocol (LDP)
              Signaling", RFC 4762, January 2007.

   [VDP]      IEEE, "Edge Virtual Bridging (IEEE Std 802.1Qbg-2012)",
              July 2012.

Authors' Addresses

   Kireeti Kompella
   Juniper Networks
   1194 N. Mathilda Ave.
   Sunnyvale, CA  94089
   US

   Email: kireeti@juniper.net

   Yakov Rekhter
   Juniper Networks
   1194 N. Mathilda Ave.
   Sunnyvale, CA  94089
   US

   Email: yakov@juniper.net

   Thomas Morin
   France Telecom - Orange Labs
   2, avenue Pierre Marzin
   Lannion  22307
   France

   Email: thomas.morin@orange.com

   David L. Black
   EMC Corporation
   176 South St.
   Hopkinton, MA  01748

   Email: david.black@emc.com