2 Internet Engineering Task Force D. Black 3 Internet-Draft EMC 4 Intended status: Informational J. Hudson 5 Expires: June 20, 2014 Brocade 6 L. Kreeger 7 Cisco 8 M. Lasserre 9 Alcatel-Lucent 10 T. Narten 11 IBM 12 December 17, 2013 14 An Architecture for Overlay Networks (NVO3) 15 draft-ietf-nvo3-arch-00 17 Abstract 19 This document presents a high-level overview architecture for 20 building overlay networks in NVO3. The architecture is given at a 21 high-level, showing the major components of an overall system. An 22 important goal is to divide the space into individual smaller 23 components that can be implemented independently and with clear 24 interfaces and interactions with other components. It should be 25 possible to build and implement individual components in isolation 26 and have them work with other components with no changes to other 27 components. That way implementers have flexibility in implementing 28 individual components and can optimize and innovate within their 29 respective components without requiring changes to other components. 31 Status of This Memo 33 This Internet-Draft is submitted in full conformance with the 34 provisions of BCP 78 and BCP 79. 36 Internet-Drafts are working documents of the Internet Engineering 37 Task Force (IETF). Note that other groups may also distribute 38 working documents as Internet-Drafts. The list of current Internet- 39 Drafts is at http://datatracker.ietf.org/drafts/current/. 41 Internet-Drafts are draft documents valid for a maximum of six months 42 and may be updated, replaced, or obsoleted by other documents at any 43 time. It is inappropriate to use Internet-Drafts as reference 44 material or to cite them other than as "work in progress." 46 This Internet-Draft will expire on June 20, 2014.
48 Copyright Notice 50 Copyright (c) 2013 IETF Trust and the persons identified as the 51 document authors. All rights reserved. 53 This document is subject to BCP 78 and the IETF Trust's Legal 54 Provisions Relating to IETF Documents 55 (http://trustee.ietf.org/license-info) in effect on the date of 56 publication of this document. Please review these documents 57 carefully, as they describe your rights and restrictions with respect 58 to this document. Code Components extracted from this document must 59 include Simplified BSD License text as described in Section 4.e of 60 the Trust Legal Provisions and are provided without warranty as 61 described in the Simplified BSD License. 63 Table of Contents 65 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 66 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 67 3. Background . . . . . . . . . . . . . . . . . . . . . . . . . 4 68 3.1. VN Service (L2 and L3) . . . . . . . . . . . . . . . . . 5 69 3.2. Network Virtualization Edge (NVE) . . . . . . . . . . . . 6 70 3.3. Network Virtualization Authority (NVA) . . . . . . . . . 8 71 3.4. VM Orchestration Systems . . . . . . . . . . . . . . . . 8 72 4. Network Virtualization Edge (NVE) . . . . . . . . . . . . . . 9 73 4.1. NVE Co-located With Server Hypervisor . . . . . . . . . . 10 74 4.2. Split-NVE . . . . . . . . . . . . . . . . . . . . . . . . 10 75 4.3. NVE State . . . . . . . . . . . . . . . . . . . . . . . . 11 76 5. Tenant System Types . . . . . . . . . . . . . . . . . . . . . 12 77 5.1. Overlay-Aware Network Service Appliances . . . . . . . . 12 78 5.2. Bare Metal Servers . . . . . . . . . . . . . . . . . . . 12 79 5.3. Gateways . . . . . . . . . . . . . . . . . . . . . . . . 13 80 5.4. Distributed Gateways . . . . . . . . . . . . . . . . . . 13 81 6. Network Virtualization Authority . . . . . . . . . . . . . . 14 82 6.1. How an NVA Obtains Information . . . . . . . . . . . . . 14 83 6.2. Internal NVA Architecture . . . . . . . . . . . . . . . . 15 84 6.3. NVA External Interface . . . . . . . . . . . . . . . . . 15 85 7. NVE-to-NVA Protocol . . . . . . . . . . . . . . . . . . . . . 17 86 7.1. NVE-NVA Interaction Models . . . . . . . . . . . . . . . 17 87 7.2. Direct NVE-NVA Protocol . . . . . . . . . . . . . . . . . 18 88 7.3. Propagating Information Between NVEs and NVAs . . . . . . 19 89 8. Federated NVAs . . . . . . . . . . . . . . . . . . . . . . . 20 90 8.1. Inter-NVA Peering . . . . . . . . . . . . . . . . . . . . 22 91 9. Control Protocol Work Areas . . . . . . . . . . . . . . . . . 23 92 10. NVO3 Data Plane Encapsulation . . . . . . . . . . . . . . . . 23 93 11. Operations and Management . . . . . . . . . . . . . . . . . . 24 94 12. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 95 13. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 24 96 14. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24 97 15. Security Considerations . . . . . . . . . . . . . . . . . . . 24 98 16. Informative References . . . . . . . . . . . . . . . . . . . 24 99 Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 26 100 A.1. Changes From draft-narten-nvo3 to draft-ietf-nvo3 . . . . 26 101 A.2. Changes From -00 to -01 (of draft-narten-nvo3-arch) . . . 26 102 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 26 104 1. Introduction 106 This document presents a high-level architecture for building overlay 107 networks in NVO3. 
The architecture is given at a high level, showing 108 the major components of an overall system. An important goal is to 109 divide the space into smaller individual components that can be 110 implemented independently and with clear interfaces and interactions 111 with other components. It should be possible to build and implement 112 individual components in isolation and have them work with other 113 components with no changes to other components. That way 114 implementers have flexibility in implementing individual components 115 and can optimize and innovate within their respective components 116 without necessarily requiring changes to other components. 118 The motivation for overlay networks is given in 119 [I-D.ietf-nvo3-overlay-problem-statement]. "Framework for DC Network 120 Virtualization" [I-D.ietf-nvo3-framework] provides a framework for 121 discussing overlay networks generally and the various components that 122 must work together in building such systems. This document differs 123 from the framework document in that it doesn't attempt to cover all 124 possible approaches within the general design space. Rather, it 125 describes one particular approach. 127 This document is intended to be a concrete strawman that can be used 128 for discussion within the IETF NVO3 WG on what the NVO3 architecture 129 should look like. 131 2. Terminology 133 This document uses the same terminology as [I-D.ietf-nvo3-framework]. 134 In addition, the following terms are used: 136 NV Domain A Network Virtualization Domain is an administrative 137 construct that defines a Network Virtualization Authority (NVA), 138 the set of Network Virtualization Edges (NVEs) associated with 139 that NVA, and the set of virtual networks the NVA manages and 140 supports. NVEs are associated with a (logically centralized) NVA, 141 and an NVE supports communication for any of the virtual networks 142 in the domain. 144 NV Region A region over which information about a set of virtual 145 networks is shared. In the degenerate case, an NV Region 146 corresponds to a single NV Domain. The 147 more interesting case occurs when two or more NV Domains share 148 information about part or all of a set of virtual networks that 149 they manage. Two NVAs share information about particular virtual 150 networks for the purpose of supporting connectivity between 151 tenants located in different NV Domains. NVAs can share 152 information about an entire NV Domain, or just individual virtual 153 networks. 155 Tenant System Interface (TSI) Interface to a Virtual Network as 156 presented to a Tenant System. The TSI logically connects to the 157 NVE via a Virtual Access Point (VAP). To the Tenant System, the 158 TSI is like a NIC; the TSI presents itself to a Tenant System as a 159 normal network interface. 161 3. Background 163 Overlay networks are an approach for providing network virtualization 164 services to a set of Tenant Systems (TSs) [I-D.ietf-nvo3-framework]. 165 With overlays, data traffic between tenants is tunneled across the 166 underlying data center's IP network. The use of tunnels provides a 167 number of benefits by decoupling the network as viewed by tenants 168 from the underlying physical network across which they communicate. 170 Tenant Systems connect to Virtual Networks (VNs), with each VN having 171 associated attributes defining properties of the network, such as the 172 set of members that connect to it.
Tenant Systems connected to a 173 virtual network typically communicate freely with other Tenant 174 Systems on the same VN, but communication between Tenant Systems on 175 one VN and those external to the VN (whether on another VN or 176 connected to the Internet) is carefully controlled and governed by 177 policy. 179 A Network Virtualization Edge (NVE) [I-D.ietf-nvo3-framework] is the 180 entity that implements the overlay functionality. An NVE resides at 181 the boundary between a Tenant System and the overlay network as shown 182 in Figure 1. An NVE creates and maintains local state about each 183 Virtual Network for which it is providing service on behalf of a 184 Tenant System. 186 +--------+ +--------+ 187 | Tenant +--+ +----| Tenant | 188 | System | | (') | System | 189 +--------+ | ................ ( ) +--------+ 190 | +-+--+ . . +--+-+ (_) 191 | | NVE|--. .--| NVE| | 192 +--| | . . | |---+ 193 +-+--+ . . +--+-+ 194 / . . 195 / . L3 Overlay . +--+-++--------+ 196 +--------+ / . Network . | NVE|| Tenant | 197 | Tenant +--+ . .- -| || System | 198 | System | . . +--+-++--------+ 199 +--------+ ................ 200 | 201 +----+ 202 | NVE| 203 | | 204 +----+ 205 | 206 | 207 ===================== 208 | | 209 +--------+ +--------+ 210 | Tenant | | Tenant | 211 | System | | System | 212 +--------+ +--------+ 214 The dotted line indicates a network connection (i.e., IP). 216 Figure 1: NVO3 Generic Reference Model 218 The following subsections describe key aspects of an overlay system 219 in more detail. Section 3.1 describes the service model (Ethernet 220 vs. IP) provided to Tenant Systems. Section 3.2 describes NVEs in 221 more detail. Section 3.3 introduces the Network Virtualization 222 Authority, from which NVEs obtain information about virtual networks. 223 Section 3.4 provides background on VM orchestration systems and their 224 use of virtual networks. 226 3.1. VN Service (L2 and L3) 228 A Virtual Network provides either L2 or L3 service to connected 229 tenants. For L2 service, VNs transport Ethernet frames, and a Tenant 230 System is provided with a service that is analogous to being 231 connected to a specific L2 C-VLAN. L2 broadcast frames are delivered 232 to all (and multicast frames delivered to a subset of) the other 233 Tenant Systems on the VN. To a Tenant System, it appears as if they 234 are connected to a regular L2 Ethernet link. Within NVO3, tenant 235 frames are tunneled to remote NVEs based on the MAC addresses of the 236 frame headers as originated by the Tenant System. On the underlay, 237 NVO3 packets are forwarded between NVEs based on the outer addresses 238 of tunneled packets. 240 For L3 service, VNs transport IP datagrams, and a Tenant System is 241 provided with a service that only supports IP traffic. Within NVO3, 242 tenant frames are tunneled to remote NVEs based on the IP addresses 243 of the packet originated by the Tenant System; any L2 destination 244 addresses provided by Tenant Systems are effectively ignored. 246 L2 service is intended for systems that need native L2 Ethernet 247 service and the ability to run protocols directly over Ethernet 248 (i.e., not based on IP). L3 service is intended for systems in which 249 all the traffic can safely be assumed to be IP. It is important to 250 note that whether NVO3 provides L2 or L3 service to a Tenant System, 251 the Tenant System does not generally need to be aware of the 252 distinction. In both cases, the virtual network presents itself to 253 the Tenant System as an L2 Ethernet interface. 
An Ethernet interface 254 is used in both cases simply as a widely supported interface type 255 that essentially all Tenant Systems already support. Consequently, 256 no special software is needed on Tenant Systems to use an L3 vs. an 257 L2 overlay service. 259 3.2. Network Virtualization Edge (NVE) 261 Tenant Systems connect to NVEs via a Tenant System Interface (TSI). 262 The TSI logically connects to the NVE via a Virtual Access Point 263 (VAP) as shown in Figure 2. To the Tenant System, the TSI is like a 264 NIC; the TSI presents itself to a Tenant System as a normal network 265 interface. On the NVE side, a VAP is a logical network port (virtual 266 or physical) into a specific virtual network. Note that two 267 different Tenant Systems (and TSIs) attached to a common NVE can 268 share a VAP (e.g., TS1 and TS2 in Figure 2) so long as they connect 269 to the same Virtual Network. 271 | Data Center Network (IP) | 272 | | 273 +-----------------------------------------+ 274 | | 275 | Tunnel Overlay | 276 +------------+---------+ +---------+------------+ 277 | +----------+-------+ | | +-------+----------+ | 278 | | Overlay Module | | | | Overlay Module | | 279 | +---------+--------+ | | +---------+--------+ | 280 | | | | | | 281 NVE1 | | | | | | NVE2 282 | +--------+-------+ | | +--------+-------+ | 283 | | |VNI1| |VNI2| | | | |VNI1| |VNI2| | 284 | +-+----------+---+ | | +-+-----------+--+ | 285 | | VAP1 | VAP2 | | | VAP1 | VAP2| 286 +----+------------+----+ +----+-----------+ ----+ 287 | | | | 288 |\ | | | 289 | \ | | /| 290 -------+--\-------+-------------------+---------/-+------- 291 | \ | Tenant | / | 292 TSI1 |TSI2\ | TSI3 TSI1 TSI2/ TSI3 293 +---+ +---+ +---+ +---+ +---+ +---+ 294 |TS1| |TS2| |TS3| |TS4| |TS5| |TS6| 295 +---+ +---+ +---+ +---+ +---+ +---+ 297 Figure 2: NVE Reference Model 299 The Overlay Module performs the actual encapsulation and 300 decapsulation of tunneled packets. The NVE maintains state about the 301 virtual networks it is a part of so that it can provide the Overlay 302 Module with such information as the destination address of the NVE to 303 tunnel a packet to, or the Context ID that should be placed in the 304 encapsulation header to identify the virtual network a tunneled 305 packet belong to. 307 On the data center network side, the NVE sends and receives native IP 308 traffic. When ingressing traffic from a Tenant System, the NVE 309 identifies the egress NVE to which the packet should be sent, adds an 310 overlay encapsulation header, and sends the packet on the underlay 311 network. When receiving traffic from a remote NVE, an NVE strips off 312 the encapsulation header, and delivers the (original) packet to the 313 appropriate Tenant System. 315 Conceptually, the NVE is a single entity implementing the NVO3 316 functionality. In practice, there are a number of different 317 implementation scenarios, as described in detail in Section 4. 319 3.3. Network Virtualization Authority (NVA) 321 Address dissemination refers to the process of learning, building and 322 distributing the mapping/forwarding information that NVEs need in 323 order to tunnel traffic to each other on behalf of communicating 324 Tenant Systems. For example, in order to send traffic to a remote 325 Tenant System, the sending NVE must know the destination NVE for that 326 Tenant System. 328 One way to build and maintain mapping tables is to use learning, as 329 802.1 bridges do [IEEE-802.1Q]. 
When forwarding traffic to multicast 330 or unknown unicast destinations, an NVE could simply flood traffic 331 everywhere. While flooding works, it can lead to traffic hot spots 332 and to problems in larger networks. 334 Alternatively, NVEs can make use of a Network Virtualization 335 Authority (NVA). An NVA is the entity that provides address mapping 336 and other information to NVEs. NVEs interact with an NVA to obtain 337 any required address mapping information they need in order to 338 properly forward traffic on behalf of tenants. The term NVA refers 339 to the overall system, without regard to its scope or how it is 340 implemented. NVAs provide a service, and NVEs access that service 341 via an NVE-to-NVA protocol. 343 Even when an NVA is present, learning could be used as a fallback 344 mechanism, should the NVA be unable to provide an answer or for other 345 reasons. This document does not consider flooding approaches in 346 detail, as there are a number of benefits in using an approach that 347 depends on the presence of an NVA. 349 NVAs are discussed in more detail in Section 6. 351 3.4. VM Orchestration Systems 353 VM Orchestration systems manage server virtualization across a set of 354 servers. Although VM management is a separate topic from network 355 virtualization, the two areas are closely related. Managing the 356 creation, placement, and movement of VMs also involves creating, 357 attaching to and detaching from virtual networks. A number of 358 existing VM orchestration systems have incorporated aspects of 359 virtual network management into their systems. 361 When a new VM image is started, the VM Orchestration system 362 determines where the VM should be placed, interacts with the 363 hypervisor on the target server to load and start the VM, and 364 controls when a VM should be shut down or migrated elsewhere. VM 365 Orchestration systems also have knowledge about how a VM should 366 connect to a network, possibly including the name of the virtual 367 network to which a VM is to connect. The VM orchestration system can 368 pass such information to the hypervisor when a VM is instantiated. 369 VM orchestration systems have significant (and sometimes global) 370 knowledge over the domain they manage. They typically know on what 371 servers a VM is running, and metadata associated with VM images can 372 be useful from a network virtualization perspective. For example, 373 the metadata may include the addresses (MAC and IP) the VMs will use 374 and the name(s) of the virtual network(s) they connect to. 376 VM orchestration systems run a protocol with an agent running on the 377 hypervisor of the servers they manage. That protocol can also carry 378 information about what virtual network a VM is associated with. When 379 the orchestrator instantiates a VM on a hypervisor, the hypervisor 380 interacts with the NVE in order to attach the VM to the virtual 381 networks it has access to. In general, the hypervisor will need to 382 communicate significant VM state changes to the NVE. In the reverse 383 direction, the NVE may need to communicate network connectivity 384 information back to the hypervisor. Example VM orchestration systems 385 in use today include VMware's vCenter Server and Microsoft's System 386 Center Virtual Machine Manager. Both can pass information about what 387 virtual networks a VM connects to down to the hypervisor. The 388 protocol used between the VM orchestration system and hypervisors is 389 generally proprietary.
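As a rough illustration of the interactions described above, the sketch below shows the kind of per-VM network metadata an orchestration system might hand to a hypervisor agent, and the resulting attach/detach notifications toward the NVE. The data structures and function names (VMNetworkMetadata, attach_tsi, and so on) are hypothetical and are not defined by this document or by any existing orchestration product; the sketch only illustrates the flow of information, not a standard interface.

   from dataclasses import dataclass, field
   from typing import Dict, List, Set

   @dataclass
   class VMNetworkMetadata:
       """Per-VM network information an orchestration system might hold."""
       vm_name: str
       interfaces: List[dict] = field(default_factory=list)  # one dict per vNIC

   class Nve:
       """Toy NVE: tracks which local TSIs are attached to which VNs."""
       def __init__(self) -> None:
           self.attached: Dict[str, Set[str]] = {}   # VN name -> set of TSI ids

       def attach_tsi(self, vn_name: str, tsi_id: str) -> None:
           # Create per-VN state on first attach; a real NVE would also
           # notify its NVA at this point (see Section 6.1).
           self.attached.setdefault(vn_name, set()).add(tsi_id)

       def detach_tsi(self, vn_name: str, tsi_id: str) -> None:
           tsis = self.attached.get(vn_name, set())
           tsis.discard(tsi_id)
           if not tsis:              # last local TSI gone: reclaim per-VN state
               self.attached.pop(vn_name, None)

   def hypervisor_start_vm(nve: Nve, meta: VMNetworkMetadata) -> None:
       """What a hypervisor agent might do when the orchestrator starts a VM."""
       for nic in meta.interfaces:
           nve.attach_tsi(nic["vn"], f'{meta.vm_name}:{nic["mac"]}')

   # Example: orchestration metadata for one VM with a single vNIC on VN "blue".
   vm = VMNetworkMetadata("vm-17", [{"mac": "00:11:22:33:44:55",
                                     "ip": "10.1.1.7", "vn": "blue"}])
   nve = Nve()
   hypervisor_start_vm(nve, vm)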
391 It should be noted that VM orchestration systems may not have direct 392 access to all networking-related information a VM uses. For example, 393 a VM may make use of additional IP or MAC addresses that the VM 394 management system is not aware of. 396 4. Network Virtualization Edge (NVE) 398 As introduced in Section 3.2, an NVE is the entity that implements the 399 overlay functionality. This section describes NVEs in more detail. 400 An NVE will have two external interfaces: 402 Tenant Facing: On the tenant facing side, an NVE interacts with the 403 hypervisor (or equivalent entity) to provide the NVO3 service. An 404 NVE will need to be notified when a Tenant System "attaches" to a 405 virtual network (so it can validate the request and set up any 406 state needed to send and receive traffic on behalf of the Tenant 407 System on that VN). Likewise, an NVE will need to be informed 408 when the Tenant System "detaches" from the virtual network so that 409 it can reclaim state and resources appropriately. 411 DCN Facing: On the data center network facing side, an NVE 412 interfaces with the data center underlay network, sending and 413 receiving tunneled IP packets to and from the underlay. The NVE 414 may also run a control protocol with other entities on the 415 network, such as the Network Virtualization Authority. 417 4.1. NVE Co-located With Server Hypervisor 419 When server virtualization is used, the entire NVE functionality will 420 typically be implemented as part of the hypervisor and/or virtual 421 switch on the server. In such cases, the Tenant System interacts 422 with the hypervisor and the hypervisor interacts with the NVE. 423 Because the interaction between the hypervisor and NVE is implemented 424 entirely in software on the server, there is no "on-the-wire" 425 protocol between Tenant Systems (or the hypervisor) and the NVE that 426 needs to be standardized. While there may be APIs between the NVE 427 and hypervisor to support necessary interaction, the details of such 428 an API are not in scope for the IETF to work on. 430 Implementing NVE functionality entirely on a server has the 431 disadvantage that server CPU resources must be spent implementing the 432 NVO3 functionality. Experimentation with overlay approaches and 433 previous experience with TCP and checksum adapter offloads suggest 434 that offloading certain NVE operations (e.g., encapsulation and 435 decapsulation operations) onto the physical network adaptor can 436 produce performance improvements. As has been done with checksum 437 and/or TCP server offload and other optimization approaches, there may 438 be benefits to offloading common operations onto adaptors where 439 possible. Just as important, the addition of an overlay header can 440 disable existing adaptor offload capabilities that are generally not 441 prepared to handle the addition of a new header or other operations 442 associated with an NVE. 444 While the details of how to split the implementation of specific NVE 445 functionality between a server and its network adaptors are outside 446 the scope of IETF standardization, the NVO3 architecture should 447 support such separation. Ideally, it may even be possible to bypass 448 the hypervisor completely on critical data path operations so that 449 packets between a TS and its VN can be sent and received without 450 having the hypervisor involved in each individual packet operation. 452 4.2. Split-NVE 454 Another possible scenario leads to the need for a split NVE 455 implementation.
A hypervisor running on a server could be aware that 456 NVO3 is in use, but have some of the actual NVO3 functionality 457 implemented on an adjacent switch to which the server is attached. 458 While one could imagine a number of link types between a server and 459 the NVE, the simplest deployment scenario would involve a server and 460 NVE separated by a simple L2 Ethernet link, across which LLDP runs. 461 A more complicated scenario would have the server and NVE separated 462 by a bridged access network, such as when the NVE resides on a ToR, 463 with an embedded switch residing between servers and the ToR. 465 While the above talks about a scenario involving a hypervisor, it 466 should be noted that the same scenario can apply to Network Service 467 Appliances as discussed in Section 5.1. In general, when this 468 document discusses the interaction between a hypervisor and NVE, the 469 discussion applies to Network Service Appliances as well. 471 For the split NVE case, protocols will be needed that allow the 472 hypervisor and NVE to negotiate and setup the necessary state so that 473 traffic sent across the access link between a server and the NVE can 474 be associated with the correct virtual network instance. 475 Specifically, on the access link, traffic belonging to a specific 476 Tenant System would be tagged with a specific VLAN C-TAG that 477 identifies which specific NVO3 virtual network instance it belongs 478 to. The hypervisor-NVE protocol would negotiate which VLAN C-TAG to 479 use for a particular virtual network instance. More details of the 480 protocol requirements for functionality between hypervisors and NVEs 481 can be found in [I-D.kreeger-nvo3-hypervisor-nve-cp]. 483 4.3. NVE State 485 NVEs maintain internal data structures and state to support the 486 sending and receiving of tenant traffic. An NVE may need some or all 487 of the following information: 489 1. An NVE keeps track of which attached Tenant Systems are connected 490 to which virtual networks. When a Tenant System attaches to a 491 virtual network, the NVE will need to create or update local 492 state for that virtual network. When the last Tenant System 493 detaches from a given VN, the NVE can reclaim state associated 494 with that VN. 496 2. For tenant unicast traffic, an NVE maintains a per-VN table of 497 mappings from Tenant System (inner) addresses to remote NVE 498 (outer) addresses. 500 3. For tenant multicast (or broadcast) traffic, an NVE maintains a 501 per-VN table of mappings and other information on how to deliver 502 multicast (or broadcast) traffic. If the underlying network 503 supports IP multicast, the NVE could use IP multicast to deliver 504 tenant traffic. In such a case, the NVE would need to know what 505 IP underlay multicast address to use for a given VN. 506 Alternatively, if the underlying network does not support 507 multicast, an NVE could use serial unicast to deliver traffic. 508 In such a case, an NVE would need to know which destinations are 509 subscribers to the tenant multicast group. An NVE could use both 510 approaches, switching from one mode to the other depending on 511 such factors as bandwidth efficiency and group membership 512 sparseness. 514 4. An NVE maintains necessary information to encapsulate outgoing 515 traffic, including what type of encapsulation and what value to 516 use for a Context ID within the encapsulation header. 518 5. 
In order to deliver incoming encapsulated packets to the correct 519 Tenant Systems, an NVE maintains the necessary information to map 520 incoming traffic to the appropriate VAP and Tenant System. 522 6. An NVE may find it convenient to maintain additional per-VN 523 information such as QoS settings, Path MTU information, ACLs, 524 etc. 526 5. Tenant System Types 528 This section describes a number of special Tenant System types and 529 how they fit into an NVO3 system. 531 5.1. Overlay-Aware Network Service Appliances 533 Some Network Service Appliances [I-D.ietf-nvo3-nve-nva-cp-req] 534 (virtual or physical) provide tenant-aware services. That is, the 535 specific service they provide depends on the identity of the tenant 536 making use of the service. For example, firewalls are now becoming 537 available that support multi-tenancy where a single firewall provides 538 virtual firewall service on a per-tenant basis, using per-tenant 539 configuration rules and maintaining per-tenant state. Such 540 appliances will be aware of the VN an activity corresponds to while 541 processing requests. Unlike server virtualization, which shields VMs 542 from needing to know about multi-tenancy, a Network Service Appliance 543 explicitly supports multi-tenancy. In such cases, the Network 544 Service Appliance itself will be aware of network virtualization and 545 either embed an NVE directly, or implement a split NVE as described 546 in Section 4.2. Unlike server virtualization, however, the Network 547 Service Appliance will not be running a traditional hypervisor and 548 the VM Orchestration system may not interact with the Network Service 549 Appliance. The NVE on such appliances will need to support a control 550 plane to obtain the necessary information needed to fully participate 551 in an NVO3 Domain. 553 5.2. Bare Metal Servers 554 Many data centers will continue to have at least some servers 555 operating as non-virtualized (or "bare metal") machines running a 556 traditional operating system and workload. In such systems, there 557 will be no NVE functionality on the server, and the server will have 558 no knowledge of NVO3 (including whether overlays are even in use). 559 In such environments, the NVE functionality can reside on the first- 560 hop physical switch. In such a case, the network administrator would 561 (manually) configure the switch to enable the appropriate NVO3 562 functionality on the switch port connecting the server and associate 563 that port with a specific virtual network. Such configuration would 564 typically be static, since the server is not virtualized, and once 565 configured, is unlikely to change frequently. Consequently, this 566 scenario does not require any protocol or standards work. 568 5.3. Gateways 570 Gateways on VNs relay traffic onto and off of a virtual network. 571 Tenant Systems use gateways to reach destinations outside of the 572 local VN. Gateways receive encapsulated traffic from one VN, remove 573 the encapsulation header, and send the native packet out onto the 574 data center network for delivery. Outside traffic enters a VN in a 575 reverse manner. 577 Gateways can be either virtual (i.e., implemented as a VM) or 578 physical (i.e., as a standalone physical device). For performance 579 reasons, standalone hardware gateways may be desirable in some cases. 580 Such gateways could consist of a simple switch forwarding traffic 581 from a VN onto the local data center network, or could embed router 582 functionality. 
On such gateways, network interfaces connecting to 583 virtual networks will (at least conceptually) embed NVE (or split- 584 NVE) functionality within them. As in the case with Network Service 585 Appliances, gateways will not support a hypervisor and will need an 586 appropriate control plane protocol to obtain the information needed 587 to provide NVO3 service. 589 Gateways handle several different use cases. For example, a virtual 590 network could consist of systems supporting overlays together with 591 legacy Tenant Systems that do not. Gateways could be used to connect 592 legacy systems supporting, e.g., L2 VLANs, to specific virtual 593 networks, effectively making them part of the same virtual network. 594 Gateways could also forward traffic between a virtual network and 595 other hosts on the data center network or relay traffic between 596 different VNs. Finally, gateways can provide external connectivity 597 such as Internet or VPN access. 599 5.4. Distributed Gateways 600 The relaying of traffic from one VN to another deserves special 601 consideration. The previous section described gateways performing 602 this function. If such gateways are centralized, traffic between 603 TSes on different VNs can take suboptimal paths, i.e., triangular 604 routing results in paths that always traverse the gateway. As an 605 optimization, individual NVEs can be part of a distributed gateway 606 that performs such relaying, reducing or completely eliminating 607 triangular routing. In a distributed gateway, each ingress NVE can 608 perform such relaying activity directly, so long as it has access to 609 the policy information needed to determine whether cross-VN 610 communication is allowed. Having individual NVEs be part of a 611 distributed gateway allows them to tunnel traffic directly to the 612 destination NVE without the need to take suboptimal paths. 614 The NVO3 architecture should [must? or just say it does?] support 615 distributed gateways. Such support requires that NVO3 control 616 protocols include mechanisms for the maintenance and distribution of 617 policy information about what type of cross-VN communication is 618 allowed so that NVEs acting as distributed gateways can tunnel 619 traffic from one VN to another as appropriate. 621 6. Network Virtualization Authority 623 Before sending to and receiving traffic from a virtual network, an 624 NVE must obtain the information needed to build its internal 625 forwarding tables and state as listed in Section 4.3. An NVE obtains 626 such information from a Network Virtualization Authority. 628 The Network Virtualization Authority (NVA) is the entity that 629 provides address mapping and other information to NVEs. NVEs 630 interact with an NVA to obtain any required information they need in 631 order to properly forward traffic on behalf of tenants. The term NVA 632 refers to the overall system, without regards to its scope or how it 633 is implemented. 635 6.1. How an NVA Obtains Information 637 There are two primary ways in which an NVA can obtain the address 638 dissemination information it manages. The NVA can obtain information 639 either from the VM orchestration system, or directly from the NVEs 640 themselves. 642 On virtualized systems, the NVA may be able to obtain the address 643 mapping information associated with VMs from the VM orchestration 644 system itself. 
If the VM orchestration system contains a master 645 database for all the virtualization information, having the NVA 646 obtain information directly from the orchestration system would be a 647 natural approach. Indeed, the NVA could effectively be co-located 648 with the VM orchestration system itself. In such systems, the VM 649 orchestration system communicates with the NVE indirectly through the 650 hypervisor. 652 However, as described in Section 4, not all NVEs are associated with 653 hypervisors. In such cases, NVAs cannot leverage VM orchestration 654 protocols to interact with an NVE and will instead need to peer 655 directly with them. By peering directly with an NVE, NVAs can obtain 656 information about the TSes connected to that NVE and can distribute 657 information to the NVE about the VNs those TSes are associated with. 658 For example, whenever a Tenant System attaches to an NVE, that NVE 659 would notify the NVA that the TS is now associated with that NVE. 660 Likewise, when a TS detaches from an NVE, that NVE would inform the 661 NVA. By communicating directly with NVEs, both the NVA and the NVE 662 are able to maintain up-to-date information about all active tenants 663 and the NVEs to which they are attached. 665 6.2. Internal NVA Architecture 667 For reliability and fault tolerance reasons, an NVA would be 668 implemented in a distributed or replicated manner without single 669 points of failure. How the NVA is implemented, however, is not 670 important to an NVE so long as the NVA provides a consistent and 671 well-defined interface to the NVE. For example, an NVA could be 672 implemented via database techniques whereby a server stores address 673 mapping information in a traditional (possibly replicated) database. 674 Alternatively, an NVA could be implemented in a distributed fashion 675 using an existing (or modified) routing protocol to maintain and 676 distribute mappings. So long as there is a clear interface between 677 the NVE and NVA, how an NVA is architected and implemented is not 678 important to an NVE. 680 A number of architectural approaches could be used to implement NVAs 681 themselves. NVAs manage address bindings and distribute them to 682 where they need to go. One approach would be to use BGP (possibly 683 with extensions) and route reflectors. Another approach could use a 684 transaction-based database model with replicated servers. Because 685 the implementation details are local to an NVA, there is no need to 686 pick exactly one solution technology, so long as the external 687 interfaces to the NVEs (and remote NVAs) are sufficiently well 688 defined to achieve interoperability. 690 6.3. NVA External Interface 692 [note: the following section discusses various options that the WG 693 has not yet expressed an opinion on. Discussion is encouraged. ] 694 Conceptually, from the perspective of an NVE, an NVA is a single 695 entity. An NVE interacts with the NVA, and it is the NVA's 696 responsibility to ensure that interactions between the NVE and NVA 697 result in consistent behavior across the NVA and all other NVEs using 698 the same NVA. Because an NVA is built from multiple internal 699 components, an NVA will have to ensure that information flows to all 700 internal NVA components appropriately. 702 One architectural question is how the NVA presents itself to the NVE. 703 For example, an NVA could be required to provide access via a single 704 IP address.
If NVEs only have one IP address to interact with, it 705 would be the responsibility of the NVA to handle NVA component 706 failures, e.g., by using a "floating IP address" that migrates among 707 NVA components to ensure that the NVA can always be reached via the 708 one address. Having all NVA accesses go through a single IP address, 709 however, adds constraints to implementing robust failover, load 710 balancing, etc. 712 [Note: the following is a strawman proposal.] 714 In the NVO3 architecture, an NVA is accessed through one or more IP 715 addresses (or IP address/port combinations). If multiple IP addresses 716 are used, each IP address provides equivalent functionality, meaning 717 that an NVE can use any of the provided addresses to interact with 718 the NVA. Should one address stop working, an NVE is expected to 719 fail over to another. While the different addresses result in 720 equivalent functionality, one address may respond more 721 quickly than another, e.g., due to network conditions, load on the 722 server, etc. 724 [Note: should we support the following? ] To provide some control 725 over load balancing, NVA addresses may have an associated priority. 726 Addresses are used in order of priority, with no explicit preference 727 among NVA addresses having the same priority. To provide basic load-balancing 728 among NVAs of equal priorities, NVEs use some randomization 729 input to select among equal-priority NVAs. Such a priority scheme 730 facilitates failover and load balancing, for example, allowing a 731 network operator to specify a set of primary and backup NVAs. 733 [note: should we support the following? It would presumably add 734 considerable complexity to the NVE.] It may be desirable to have 735 individual NVA addresses responsible for a subset of information 736 about an NV Domain. In such a case, NVEs would use different NVA 737 addresses for obtaining or updating information about particular VNs 738 or TS bindings. A key question with such an approach is how 739 information would be partitioned, and how an NVE could determine 740 which address to use to get the information it needs. 742 Another possibility is to treat the information on which NVA 743 addresses to use as cached (soft-state) information at the NVEs, so 744 that any NVA address can be used to obtain any information, but NVEs 745 are informed of preferences for which addresses to use for particular 746 information on VNs or TS bindings. That preference information would 747 be cached for future use to improve behavior - e.g., if all requests 748 for a specific subset of VNs are forwarded to a specific NVA 749 component, the NVE can optimize future requests within that subset by 750 sending them directly to that NVA component via its address. 752 7. NVE-to-NVA Protocol 754 [Note: this and later sections are a bit sketchy and need work. 755 Discussion is encouraged.] 757 As outlined in Section 4.3, an NVE needs certain information in order 758 to perform its functions. To obtain such information from an NVA, an 759 NVE-to-NVA protocol is needed. The NVE-to-NVA protocol provides two 760 functions. First, it allows an NVE to obtain information about the 761 location and status of other TSes with which it needs to 762 communicate. Second, the NVE-to-NVA protocol provides a way for 763 NVEs to provide updates to the NVA about the TSes attached to that 764 NVE (e.g., when a TS attaches or detaches from the NVE), or about 765 communication errors encountered when sending traffic to remote NVEs. 766 For example, an NVE could indicate that a destination it is trying to 767 reach at a destination NVE is unreachable for some reason.
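As a rough illustration of these two functions, the following sketch models the kinds of messages such a protocol might carry in each direction. The message names and fields (AddressQuery, TsUpdate, and so on) are purely illustrative assumptions made here for explanation; the actual operations and their requirements are the subject of [I-D.ietf-nvo3-nve-nva-cp-req].

   from dataclasses import dataclass
   from typing import Dict, Optional, Tuple

   # Direction 1: an NVE asks the NVA where a remote Tenant System resides.
   @dataclass
   class AddressQuery:
       vn_context: int          # VN the lookup applies to
       tenant_addr: str         # inner (tenant) MAC or IP address

   @dataclass
   class AddressAnswer:
       vn_context: int
       tenant_addr: str
       egress_nve: Optional[str]   # underlay address of the NVE to tunnel to, or None

   # Direction 2: an NVE reports local TS attach/detach events, or errors
   # encountered when sending to a remote NVE, up to the NVA.
   @dataclass
   class TsUpdate:
       vn_context: int
       tenant_addr: str
       event: str               # "attach", "detach", or "unreachable"
       reporting_nve: str       # underlay address of the NVE sending the update

   def handle_query(mappings: Dict[Tuple[int, str], str],
                    q: AddressQuery) -> AddressAnswer:
       """Toy NVA-side lookup over a {(vn, tenant_addr): egress_nve} table."""
       return AddressAnswer(q.vn_context, q.tenant_addr,
                            mappings.get((q.vn_context, q.tenant_addr)))

   # Example exchange for a tenant MAC address on VN context 1217.
   table = {(1217, "00:11:22:33:44:55"): "192.0.2.10"}
   print(handle_query(table, AddressQuery(1217, "00:11:22:33:44:55")))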
769 While having a direct NVE-to-NVA protocol might seem straightforward, 770 the presence of existing VM orchestration systems complicates the 771 choices an NVE has for interacting with the NVA. 773 7.1. NVE-NVA Interaction Models 775 An NVE interacts with an NVA in at least two (quite different) ways: 777 o NVEs supporting VMs and hypervisors can obtain necessary 778 information entirely through the hypervisor-facing side of the 779 NVE. Such an approach is a natural extension to existing VM 780 orchestration systems supporting server virtualization because an 781 existing protocol between the hypervisor and VM Orchestration 782 system already exists and can be leveraged to obtain any needed 783 information. Specifically, VM orchestration systems used to 784 create, terminate and migrate VMs already use well-defined (though 785 typically proprietary) protocols to handle the interactions 786 between the hypervisor and VM orchestration system. For such 787 systems, it is a natural extension to leverage the existing 788 orchestration protocol as a sort of proxy protocol for handling 789 the interactions between an NVE and the NVA. Indeed, existing 790 implementations already do this. 792 o Alternatively, an NVE can obtain needed information by interacting 793 directly with an NVA via a protocol operating over the data center 794 underlay network. Such an approach is needed to support NVEs that 795 are not associated with systems performing server virtualization 796 (e.g., as in the case of a standalone gateway) or where the NVE 797 needs to communicate directly with the NVA for other reasons. 799 [Note: The following paragraph is included to stimulate discussion, 800 and the WG will need to decide what direction it wants to take.] 802 The NVO3 architecture should support both of the above models, 803 as in practice it is likely that both models will coexist 804 and be used simultaneously in a deployment. Existing 805 virtualization environments are already using the first model. But 806 that model alone is not sufficient to cover the case of standalone gateways -- 807 such gateways do not support virtualization and do not interface with 808 existing VM orchestration systems. Also, a hybrid approach might be 809 desirable in some cases where the first model is used to obtain the 810 information, but the latter approach is used to validate and further 811 authenticate the information before using it. 813 7.2. Direct NVE-NVA Protocol 815 An NVE can interact directly with an NVA via an NVE-to-NVA protocol. 816 Such a protocol can be either independent of the NVA internal 817 protocol or an extension of it. Using a dedicated protocol provides 818 architectural separation and independence between the NVE and NVA. 819 The NVE and NVA interact in a well-defined way, and changes in the 820 NVA (or NVE) need not impact the other. Using a dedicated 821 protocol also ensures that both NVE and NVA implementations can 822 evolve independently and without dependencies on each other. Such 823 independence is important because the upgrade path for NVEs and NVAs 824 is quite different. Upgrading all the NVEs at a site will likely be 825 more difficult in practice than upgrading NVAs because of their large 826 number - one on each end device.
In practice, it is assumed that an 827 NVE will be implemented once, and then (hopefully) not again, whereas 828 an NVA (and its associated protocols) are more likely to evolve over 829 time as experience is gained from usage. 831 Requirements for a direct NVE-NVA protocol can be found in 832 [I-D.ietf-nvo3-nve-nva-cp-req]. 834 7.3. Propagating Information Between NVEs and NVAs 836 [Note: This section has been completely redone to move away from the 837 push/pull discussion at an abstract level.] 839 Information flows between NVEs and NVAs in both directions. The NVA 840 maintains information about all VNs in the NV Domain, so that NVEs do 841 not need to do so themselves. NVEs obtain from the NVA information 842 about where a given remote TS destination resides. NVAs in turn 843 obtain information from NVEs about the individual TSs attached to 844 those NVEs. 846 While the NVA could push information about every virtual network to 847 every NVE, such an approach scales poorly and is unnecessary. In 848 practice, a given NVE will only need and want to know about VNs to 849 which it is attached. Thus, an NVE should be able to subscribe to 850 updates only for the virtual networks it is interested in. 851 The NVO3 architecture supports a model where an NVE is 852 not required to have full mapping tables for all virtual networks in 853 an NV Domain. 855 Before sending unicast traffic to a remote TS, an NVE must know where 856 the remote TS currently resides. When a TS attaches to a virtual 857 network, the NVE obtains information about that VN from the NVA. The 858 NVA can provide that information to the NVE at the time the TS 859 attaches to the VN, either because the NVE requests the information 860 when the attach operation occurs, or because the VM orchestration 861 system has initiated the attach operation and provides associated 862 mapping information to the NVE at the same time. A similar process 863 can take place with regard to obtaining the information needed 864 for delivery of tenant broadcast or multicast traffic. 866 There are scenarios where an NVE may wish to query the NVA about 867 individual mappings within a VN. For example, when sending traffic 868 to a remote TS on a remote NVE, that TS may become unavailable (e.g., 869 because it has migrated elsewhere or has been shut down), in which case 870 the remote NVE may return an error indication. In such situations, 871 the NVE may need to query the NVA to obtain updated mapping 872 information for a specific TS, or verify that the information is 873 still correct despite the error condition. Note that such a query 874 could also be used by the NVA as an indication that there may be an 875 inconsistency in the network and that it should take steps to verify 876 that the information it has about the current state and location of a 877 specific TS is still correct. 879 For very large virtual networks, the amount of state an NVE needs to 880 maintain for a given virtual network could be significant. Moreover, 881 an NVE may only be communicating with a small subset of the TSes on 882 such a virtual network. In such cases, the NVE may find it desirable 883 to maintain state only for those destinations it is actively 884 communicating with. In such scenarios, an NVE may not want to 885 maintain full mapping information about all destinations on a VN. 886 Should it then need to communicate with a destination for which it 887 does not have mapping information, however, it will need to be 888 able to query the NVA on demand for the missing information on a 889 per-destination basis.
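The sketch below illustrates the kind of NVE-side behavior just described: the NVE keeps mappings only for destinations it is actively using, and falls back to an on-demand NVA query when it has no mapping for a destination. The class and function names are hypothetical and only illustrate the caching idea; they do not define an NVE implementation or a protocol.

   from typing import Callable, Dict, Optional, Tuple

   MappingKey = Tuple[int, str]              # (VN context, tenant address)

   class NveMappingCache:
       """Partial, on-demand cache of tenant-address -> egress-NVE mappings."""

       def __init__(self, query_nva: Callable[[int, str], Optional[str]]) -> None:
           self.query_nva = query_nva        # stand-in for the NVE-to-NVA protocol
           self.cache: Dict[MappingKey, str] = {}

       def egress_nve_for(self, vn: int, tenant_addr: str) -> Optional[str]:
           key = (vn, tenant_addr)
           if key not in self.cache:         # miss: ask the NVA for this destination only
               answer = self.query_nva(vn, tenant_addr)
               if answer is None:
                   return None               # NVA has no mapping; traffic cannot be forwarded
               self.cache[key] = answer
           return self.cache[key]

       def invalidate(self, vn: int, tenant_addr: str) -> None:
           # Called, e.g., after a delivery error so the next packet re-queries the NVA.
           self.cache.pop((vn, tenant_addr), None)

   # Example: the NVA knows one mapping; the NVE learns it only when needed.
   nva_table = {(1217, "10.1.1.7"): "192.0.2.10"}
   cache = NveMappingCache(lambda vn, addr: nva_table.get((vn, addr)))
   print(cache.egress_nve_for(1217, "10.1.1.7"))   # first use triggers an NVA query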
891 The NVO3 architecture will need to support a range of operations 892 between the NVE and NVA. Requirements for those operations can be 893 found in [I-D.ietf-nvo3-nve-nva-cp-req]. 895 8. Federated NVAs 897 An NVA provides service to the set of NVEs in its NV Domain. Each 898 NVA manages network virtualization information for the virtual 899 networks within its NV Domain. An NV Domain is administered by a 900 single entity. 902 In some cases, it will be necessary to expand the scope of a specific 903 VN or even an entire NV Domain beyond a single NVA. For example, 904 an administrator managing multiple data centers may wish to 905 operate all of those data centers as a single NV Region. Such cases 906 are handled by having different NVAs peer with each other to exchange 907 mapping information about specific VNs. NVAs operate in a federated 908 manner, with a set of NVAs operating as a loosely-coupled federation 909 of individual NVAs. If a virtual network spans multiple NVAs (e.g., 910 located at different data centers), and an NVE needs to deliver 911 tenant traffic to an NVE associated with a remote NVA, it still interacts only 912 with its own NVA, even when obtaining mappings for NVEs associated with 913 the remote NVA's domain. 915 Figure 3 shows a scenario where two separate NV Domains (1 and 916 2) share information about Virtual Network "1217". VM1 and VM2 both 917 connect to the same Virtual Network (1217), even though the two VMs 918 are in separate NV Domains. There are two cases to consider. In the 919 first case, NV Domain 2 does not allow NVE-A to tunnel traffic 920 directly to NVE-B. There could be a number of reasons for this. For 921 example, NV Domains 1 and 2 may not share a common address space 922 (i.e., require traversal through a NAT device), or for policy 923 reasons, a domain might require that all traffic between separate NV 924 Domains be funneled through a particular device (e.g., a firewall). 925 In such cases, NVA-2 will advertise to NVA-1 that VM2 on virtual 926 network 1217 is available, and direct that traffic between the two 927 nodes go through IP-G. IP-G would then decapsulate received traffic 928 from one NV Domain, translate it appropriately for the other domain, 929 and re-encapsulate the packet for delivery. 931 xxxxxx xxxxxx +-----+ 932 +-----+ xxxxxxxx xxxxxx xxxxxxx xxxxx | VM2 | 933 | VM1 | xx xx xxx xx |-----| 934 |-----| xx + x xx x |NVE-B| 935 |NVE-A| x x +----+ x x +-----+ 936 +--+--+ x NV Domain 1 x |IP-G|--x x | 937 +-------x xx--+ | x xx | 938 x x +----+ x NV Domain 2 x | 939 +---x xx xx x---+ 940 | xxxx xx +->xx xx 941 | xxxxxxxxxx | xx xx 942 +---+-+ | xx xx 943 |NVA-1| +--+--+ xx xxx 944 +-----+ |NVA-2| xxxx xxxx 945 +-----+ xxxxxxx 947 Figure 3: VM1 and VM2 are in different NV Domains.
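To make the Figure 3 scenario concrete, the sketch below shows the kind of mapping record NVA-1 might install as a result of NVA-2's advertisement. When direct tunneling between the two domains is not allowed, the advertised next hop is the gateway IP-G; a contrasting entry for the case where direct NVE-to-NVE tunneling is permitted is shown for comparison. The record layout and the tenant address used (10.1.2.2) are illustrative assumptions, not values defined by this document.

   from dataclasses import dataclass

   @dataclass
   class PeerAdvertisement:
       """What one NVA might tell a peer NVA about a TS on a shared VN."""
       vn_context: int
       tenant_addr: str      # address of the advertised TS (VM2 here)
       next_hop: str         # where the peer's NVEs should tunnel traffic

   # Case 1: direct NVE-to-NVE tunneling between domains is not allowed, so
   # NVA-2 directs NV Domain 1 to send traffic for VM2 via the gateway IP-G.
   via_gateway = PeerAdvertisement(1217, "10.1.2.2", next_hop="IP-G")

   # Contrasting case: direct tunneling is permitted, so the advertised next
   # hop is simply the remote NVE (NVE-B) itself.
   direct = PeerAdvertisement(1217, "10.1.2.2", next_hop="NVE-B")

   # NVA-1 folds whichever advertisement it receives into the mapping
   # information it later distributes to its own NVEs.
   nva1_mappings = {(via_gateway.vn_context, via_gateway.tenant_addr):
                    via_gateway.next_hop}
   print(nva1_mappings)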
949 NVAs at one site share information and interact with NVAs at other 950 sites, but only in a controlled manner. It is expected that policy 951 and access control will be applied at the boundaries between 952 different sites (and NVAs) so as to minimize dependencies on external 953 NVAs that could negatively impact the operation within a site. It is 954 an architectural principle that operations involving NVAs at one site 955 not be immediately impacted by failures or errors at another site. 956 (Of course, communication between NVEs in different NVO3 domains may 957 be impacted by such failures or errors.) It is a strong requirement 958 that an NVA continue to operate properly for local NVEs even if 959 external communication is interrupted (e.g., should communication 960 between a local and remote NVA fail). 962 At a high level, a federation of interconnected NVAs has some 963 analogies to BGP and Autonomous Systems. Like an Autonomous System, 964 NVAs at one site are managed by a single administrative entity and do 965 not interact with external NVAs except as allowed by policy. 966 Likewise, the interface between NVAs at different sites is well 967 defined, so that the internal details of operations at one site are 968 largely hidden from other sites. Finally, an NVA only peers with other 969 NVAs that it has a trusted relationship with, i.e., where a virtual 970 network is intended to span multiple NVAs. 972 [Note: the following are motivations for having a federated NVA model 973 and are intended for discussion. Depending on discussion, these may 974 be removed from future versions of this document. ] Reasons for using 975 a federated model include: 977 o Provide isolation between NVAs operating at different sites in 978 different geographic locations. 980 o Control the quantity and rate of information updates that flow 981 (and must be processed) between different NVAs in different data 982 centers. 984 o Control the set of external NVAs (and external sites) a site peers 985 with. A site will only peer with other sites that are cooperating 986 in providing an overlay service. 988 o Allow policy to be applied between sites. A site will want to 989 carefully control what information it exports (and to whom) as 990 well as what information it is willing to import (and from whom). 992 o Allow different protocols and architectures to be used for 993 intra- vs. inter-NVA communication. For example, within a single 994 data center, a replicated transaction server using database 995 techniques might be an attractive implementation option for an 996 NVA, and protocols optimized for intra-NVA communication would 997 likely be different from protocols involving inter-NVA 998 communication between different sites. 1000 o Allow for optimized protocols, rather than using a one-size-fits-all 1001 approach. Within a data center, networks tend to have lower latency, 1002 higher speed, and higher redundancy when compared with WAN 1003 links interconnecting data centers. The design constraints and 1004 tradeoffs for a protocol operating within a data center network 1005 are different from those operating over WAN links. While a single 1006 protocol could be used for both cases, there could be advantages 1007 to using different and more specialized protocols for the intra- 1008 and inter-NVA cases. 1010 8.1. Inter-NVA Peering 1012 To support peering between different NVAs, an inter-NVA protocol is 1013 needed. The inter-NVA protocol defines what information is exchanged 1014 between NVAs. It is assumed that the protocol will be used to share 1015 addressing information between data centers and must scale well over 1016 WAN links. 1018 9. Control Protocol Work Areas 1020 The NVO3 architecture consists of two major distinct entities: NVEs 1021 and NVAs. In order to provide isolation and independence between 1022 these two entities, the NVO3 architecture calls for well-defined 1023 protocols for interfacing between them.
For an individual NVA, the 1024 architecture calls for a single conceptual entity that could be 1025 implemented in a distributed or replicated fashion. While the IETF 1026 may choose to define one or more specific architectural approaches to 1027 building individual NVAs, there is little need for it to pick exactly 1028 one approach to the exclusion of others. An NVA for a single domain 1029 will likely be deployed as a single vendor product and thus there is 1030 little benefit in standardizing the internal structure of an NVA. 1032 Individual NVAs peer with each other in a federated manner. The NVO3 1033 architecture calls for a well-defined interface between NVAs. 1035 Finally, a hypervisor-to-NVE protocol is needed to cover the split-NVE 1036 scenario described in Section 4.2. 1038 10. NVO3 Data Plane Encapsulation 1040 When tunneling tenant traffic, NVEs add an encapsulation header to the 1041 original tenant packet. The exact encapsulation to use for NVO3 does 1042 not seem to be critical. The main requirement is that the 1043 encapsulation support a Context ID of sufficient size 1044 [I-D.ietf-nvo3-dataplane-requirements]. A number of encapsulations 1045 already exist that provide a VN Context of sufficient size for NVO3. 1046 For example, VXLAN [I-D.mahalingam-dutt-dcops-vxlan] has a 24-bit 1047 VXLAN Network Identifier (VNI). NVGRE 1048 [I-D.sridharan-virtualization-nvgre] has a 24-bit Tenant Network ID 1049 (TNI). MPLS-over-GRE provides a 20-bit label field. While there is 1050 widespread recognition that a 12-bit VN Context would be too small 1051 (only 4096 distinct values), it is generally agreed that 20 bits (1 1052 million distinct values) and 24 bits (16.8 million distinct values) 1053 are sufficient for a wide variety of deployment scenarios. 1055 [Note: the following paragraph is included for WG discussion. Future 1056 versions of this document may omit this text.] 1058 While one might argue that a new encapsulation should be defined just 1059 for NVO3, no compelling requirements for doing so have been 1060 identified yet. Moreover, optimized implementations for existing 1061 encapsulations are already starting to become available on the market 1062 (i.e., in silicon). If the IETF were to define a new encapsulation 1063 format, it would take at least 2 (and likely more) years before 1064 optimized implementations of the new format would become available in 1065 products. In addition, a new encapsulation format would not likely 1066 displace existing formats, at least not for years. Thus, there seems 1067 little reason to define a new encapsulation. However, it does make 1068 sense for NVO3 to support multiple encapsulation formats, so as to 1069 allow NVEs to use their preferred encapsulations when possible. This 1070 implies that the address dissemination protocols must also include an 1071 indication of supported encapsulations along with the address mapping 1072 details. 1074 11. Operations and Management 1076 The simplicity of operating and debugging overlay networks will be 1077 critical for successful deployment. Some architectural choices can 1078 facilitate or hinder OAM. Related OAM drafts include 1079 [I-D.ashwood-nvo3-operational-requirement]. 1081 12. Summary 1083 This document provides a start at a general architecture for overlays 1084 in NVO3. The architecture calls for three main areas of protocol 1085 work: 1087 1. A hypervisor-to-NVE protocol to support Split NVEs as discussed 1088 in Section 4.2. 1090 2. An NVE-to-NVA protocol for address dissemination.
1092 3. An NVA-to-NVA protocol for exchange of information about specific 1093 virtual networks between NVAs. 1095 It should be noted that existing protocols or extensions of existing 1096 protocols are applicable. 1098 13. Acknowledgments 1100 Helpful comments and improvements to this document have come from 1101 Lizhong Jin, Dennis (Xiaohong) Qin and Lucy Yong. 1103 14. IANA Considerations 1105 This memo includes no request to IANA. 1107 15. Security Considerations 1109 Yep, kind of sparse. But we'll get there eventually. :-) 1111 16. Informative References 1113 [I-D.ashwood-nvo3-operational-requirement] 1114 Ashwood-Smith, P., Iyengar, R., Tsou, T., Sajassi, A., 1115 Boucadair, M., Jacquenet, C., and M. Daikoku, "NVO3 1116 Operational Requirements", draft-ashwood-nvo3-operational- 1117 requirement-03 (work in progress), July 2013. 1119 [I-D.ietf-nvo3-dataplane-requirements] 1120 Bitar, N., Lasserre, M., Balus, F., Morin, T., Jin, L., 1121 and B. Khasnabish, "NVO3 Data Plane Requirements", draft- 1122 ietf-nvo3-dataplane-requirements-02 (work in progress), 1123 November 2013. 1125 [I-D.ietf-nvo3-framework] 1126 Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. 1127 Rekhter, "Framework for DC Network Virtualization", draft- 1128 ietf-nvo3-framework-04 (work in progress), November 2013. 1130 [I-D.ietf-nvo3-nve-nva-cp-req] 1131 Kreeger, L., Dutt, D., Narten, T., and D. Black, "Network 1132 Virtualization NVE to NVA Control Protocol Requirements", 1133 draft-ietf-nvo3-nve-nva-cp-req-01 (work in progress), 1134 October 2013. 1136 [I-D.ietf-nvo3-overlay-problem-statement] 1137 Narten, T., Gray, E., Black, D., Fang, L., Kreeger, L., 1138 and M. Napierala, "Problem Statement: Overlays for Network 1139 Virtualization", draft-ietf-nvo3-overlay-problem- 1140 statement-04 (work in progress), July 2013. 1142 [I-D.kreeger-nvo3-hypervisor-nve-cp] 1143 Kreeger, L., Narten, T., and D. Black, "Network 1144 Virtualization Hypervisor-to-NVE Overlay Control Protocol 1145 Requirements", draft-kreeger-nvo3-hypervisor-nve-cp-01 1146 (work in progress), February 2013. 1148 [I-D.mahalingam-dutt-dcops-vxlan] 1149 Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, 1150 L., Sridhar, T., Bursell, M., and C. Wright, "VXLAN: A 1151 Framework for Overlaying Virtualized Layer 2 Networks over 1152 Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-06 1153 (work in progress), November 2013. 1155 [I-D.sridharan-virtualization-nvgre] 1156 Sridharan, M., Greenberg, A., Wang, Y., Garg, P., 1157 Venkataramiah, N., Duda, K., Ganga, I., Lin, G., Pearson, 1158 M., Thaler, P., and C. Tumuluri, "NVGRE: Network 1159 Virtualization using Generic Routing Encapsulation", 1160 draft-sridharan-virtualization-nvgre-03 (work in 1161 progress), August 2013. 1163 [IEEE-802.1Q] 1164 IEEE 802.1Q-2011, , "IEEE standard for local and 1165 metropolitan area networks: Media access control (MAC) 1166 bridges and virtual bridged local area networks,", August 1167 2011. 1169 Appendix A. Change Log 1171 A.1. Changes From draft-narten-nvo3 to draft-ietf-nvo3 1173 1. No changes between draft-narten-nvo3-arch-01 and draft-ietf-nvoe- 1174 arch-00. 1176 A.2. Changes From -00 to -01 (of draft-narten-nvo3-arch) 1178 1. Editorial and clarity improvements. 1180 2. Replaced "push vs. pull" section with section more focussed on 1181 triggers where an event implies or triggers some action. 1183 3. Clarified text on co-located NVE to show how offloading NVE 1184 functionality onto adaptors is desirable. 1186 4. 
Added new section on distributed gateways. 1188 5. Expanded Section on NVA external interface, adding requirement 1189 for NVE to support multiple IP NVA addresses. 1191 Authors' Addresses 1193 David Black 1194 EMC 1196 Email: david.black@emc.com 1198 Jon Hudson 1199 Brocade 1200 120 Holger Way 1201 San Jose, CA 95134 1202 USA 1204 Email: jon.hudson@gmail.com 1205 Lawrence Kreeger 1206 Cisco 1208 Email: kreeger@cisco.com 1210 Marc Lasserre 1211 Alcatel-Lucent 1213 Email: marc.lasserre@alcatel-lucent.com 1215 Thomas Narten 1216 IBM 1218 Email: narten@us.ibm.com