Internet Engineering Task Force                                 D. Black
Internet-Draft                                                       EMC
Intended status: Informational                                 J. Hudson
Expires: April 25, 2014                                          Brocade
                                                              L. Kreeger
                                                                   Cisco
                                                             M. Lasserre
                                                          Alcatel-Lucent
                                                               T. Narten
                                                                     IBM
                                                        October 22, 2013

              An Architecture for Overlay Networks (NVO3)
                       draft-narten-nvo3-arch-01

Abstract

This document presents a high-level overview architecture for building overlay networks in NVO3. The architecture is given at a high level, showing the major components of an overall system. An important goal is to divide the space into individual smaller components that can be implemented independently and with clear interfaces and interactions with other components. It should be possible to build and implement individual components in isolation and have them work with other components with no changes to other components. That way implementers have flexibility in implementing individual components and can optimize and innovate within their respective components without requiring changes to other components.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.
It is inappropriate to use Internet-Drafts as reference 44 material or to cite them other than as "work in progress." 46 This Internet-Draft will expire on April 25, 2014. 48 Copyright Notice 50 Copyright (c) 2013 IETF Trust and the persons identified as the 51 document authors. All rights reserved. 53 This document is subject to BCP 78 and the IETF Trust's Legal 54 Provisions Relating to IETF Documents 55 (http://trustee.ietf.org/license-info) in effect on the date of 56 publication of this document. Please review these documents 57 carefully, as they describe your rights and restrictions with respect 58 to this document. Code Components extracted from this document must 59 include Simplified BSD License text as described in Section 4.e of 60 the Trust Legal Provisions and are provided without warranty as 61 described in the Simplified BSD License. 63 Table of Contents 65 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 66 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 67 3. Background . . . . . . . . . . . . . . . . . . . . . . . . . 4 68 3.1. VN Service (L2 and L3) . . . . . . . . . . . . . . . . . 5 69 3.2. Network Virtualization Edge (NVE) . . . . . . . . . . . . 6 70 3.3. Network Virtualization Authority (NVA) . . . . . . . . . 7 71 3.4. VM Orchestration Systems . . . . . . . . . . . . . . . . 8 72 4. Network Virtualization Edge (NVE) . . . . . . . . . . . . . . 9 73 4.1. NVE Co-located With Server Hypervisor . . . . . . . . . . 9 74 4.2. Split-NVE . . . . . . . . . . . . . . . . . . . . . . . . 10 75 4.3. NVE State . . . . . . . . . . . . . . . . . . . . . . . . 11 76 5. Tenant System Types . . . . . . . . . . . . . . . . . . . . . 12 77 5.1. Overlay-Aware Network Service Appliances . . . . . . . . 12 78 5.2. Bare Metal Servers . . . . . . . . . . . . . . . . . . . 12 79 5.3. Gateways . . . . . . . . . . . . . . . . . . . . . . . . 13 80 5.4. Distributed Gateways . . . . . . . . . . . . . . . . . . 13 81 6. Network Virtualization Authority . . . . . . . . . . . . . . 14 82 6.1. How an NVA Obtains Information . . . . . . . . . . . . . 14 83 6.2. Internal NVA Architecture . . . . . . . . . . . . . . . . 15 84 6.3. NVA External Interface . . . . . . . . . . . . . . . . . 15 85 7. NVE-to-NVA Protocol . . . . . . . . . . . . . . . . . . . . . 17 86 7.1. NVE-NVA Interaction Models . . . . . . . . . . . . . . . 17 87 7.2. Direct NVE-NVA Protocol . . . . . . . . . . . . . . . . . 18 88 7.3. Propagating Information Between NVEs and NVAs . . . . . . 19 89 8. Federated NVAs . . . . . . . . . . . . . . . . . . . . . . . 20 90 8.1. Inter-NVA Peering . . . . . . . . . . . . . . . . . . . . 22 91 9. Control Protocol Work Areas . . . . . . . . . . . . . . . . . 23 92 10. NVO3 Data Plane Encapsulation . . . . . . . . . . . . . . . . 23 93 11. Operations and Management . . . . . . . . . . . . . . . . . . 24 94 12. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 95 13. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 24 96 14. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24 97 15. Security Considerations . . . . . . . . . . . . . . . . . . . 24 98 16. Informative References . . . . . . . . . . . . . . . . . . . 24 99 Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 26 100 A.1. Changes From -00 to -01 . . . . . . . . . . . . . . . . . 26 101 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 26 103 1. 
Introduction

This document presents a high-level architecture for building overlay networks in NVO3. The architecture is given at a high level, showing the major components of an overall system. An important goal is to divide the space into smaller individual components that can be implemented independently and with clear interfaces and interactions with other components. It should be possible to build and implement individual components in isolation and have them work with other components with no changes to other components. That way implementers have flexibility in implementing individual components and can optimize and innovate within their respective components without necessarily requiring changes to other components.

The motivation for overlay networks is given in [I-D.ietf-nvo3-overlay-problem-statement]. "Framework for DC Network Virtualization" [I-D.ietf-nvo3-framework] provides a framework for discussing overlay networks generally and the various components that must work together in building such systems. This document differs from the framework document in that it does not attempt to cover all possible approaches within the general design space. Rather, it describes one particular approach.

This document is intended to be a concrete strawman that can be used for discussion within the IETF NVO3 WG on what the NVO3 architecture should look like.

2. Terminology

This document uses the same terminology as [I-D.ietf-nvo3-framework]. In addition, the following terms are used:

NV Domain  A Network Virtualization Domain is an administrative construct that defines a Network Virtualization Authority (NVA), the set of Network Virtualization Edges (NVEs) associated with that NVA, and the set of virtual networks the NVA manages and supports. NVEs are associated with a (logically centralized) NVA, and an NVE supports communication for any of the virtual networks in the domain.

NV Region  A region over which information about a set of virtual networks is shared. In the degenerate case, an NV Region corresponds to a single NV Domain. The more interesting case occurs when two or more NV Domains share information about part or all of a set of virtual networks that they manage. Two NVAs share information about particular virtual networks for the purpose of supporting connectivity between tenants located in different NV Domains. NVAs can share information about an entire NV Domain, or just individual virtual networks.

Tenant System Interface (TSI)  The interface to a Virtual Network as presented to a Tenant System. The TSI logically connects to the NVE via a Virtual Access Point (VAP). To the Tenant System, the TSI is like a NIC; the TSI presents itself to a Tenant System as a normal network interface.

3. Background

Overlay networks are an approach for providing network virtualization services to a set of Tenant Systems (TSs) [I-D.ietf-nvo3-framework]. With overlays, data traffic between tenants is tunneled across the underlying data center's IP network. The use of tunnels provides a number of benefits by decoupling the network as viewed by tenants from the underlying physical network across which they communicate.
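As a purely illustrative sketch of this decoupling (Python; the field and function names here are invented for illustration and are not part of any NVO3 specification), an overlay packet can be thought of as the original tenant frame wrapped in an outer header that only the NVEs and the underlay need to understand:

   # Illustrative only: the tenant frame is carried unchanged inside an
   # outer header meaningful only to the NVEs and the underlay network.
   from dataclasses import dataclass

   @dataclass
   class TenantFrame:            # what the Tenant System sends and receives
       inner_dst: str            # tenant-visible MAC or IP address
       inner_src: str
       payload: bytes

   @dataclass
   class OverlayPacket:          # what actually crosses the data center IP network
       outer_src: str            # ingress NVE underlay IP address
       outer_dst: str            # egress NVE underlay IP address
       vn_context: int           # identifies the virtual network (VN Context)
       inner: TenantFrame        # original tenant frame, untouched

   def encapsulate(frame: TenantFrame, local_nve: str, remote_nve: str,
                   vn_context: int) -> OverlayPacket:
       """Ingress NVE: wrap a tenant frame for transport across the underlay."""
       return OverlayPacket(local_nve, remote_nve, vn_context, frame)

   def decapsulate(pkt: OverlayPacket) -> tuple:
       """Egress NVE: recover the VN Context and the original tenant frame."""
       return pkt.vn_context, pkt.inner

Section 3.2 introduces the NVE, which performs this encapsulation and decapsulation; Section 10 discusses concrete encapsulation formats.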
169 Tenant Systems connect to Virtual Networks (VNs), with each VN having 170 associated attributes defining properties of the network, such as the 171 set of members that connect to it. Tenant Systems connected to a 172 virtual network typically communicate freely with other Tenant 173 Systems on the same VN, but communication between Tenant Systems on 174 one VN and those external to the VN (whether on another VN or 175 connected to the Internet) is carefully controlled and governed by 176 policy. 178 A Network Virtualization Edge (NVE) [I-D.ietf-nvo3-framework] is the 179 entity that implements the overlay functionality. An NVE resides at 180 the boundary between a Tenant System and the overlay network as shown 181 in Figure 1. An NVE creates and maintains local state about each 182 Virtual Network for which it is providing service on behalf of a 183 Tenant System. 185 +--------+ +--------+ 186 | Tenant +--+ +----| Tenant | 187 | System | | (') | System | 188 +--------+ | ................ ( ) +--------+ 189 | +-+--+ . . +--+-+ (_) 190 | | NVE|--. .--| NVE| | 191 +--| | . . | |---+ 192 +-+--+ . . +--+-+ 193 / . . 194 / . L3 Overlay . +--+-++--------+ 195 +--------+ / . Network . | NVE|| Tenant | 196 | Tenant +--+ . .- -| || System | 197 | System | . . +--+-++--------+ 198 +--------+ ................ 199 | 200 +----+ 201 | NVE| 202 | | 203 +----+ 204 | 205 | 206 ===================== 207 | | 208 +--------+ +--------+ 209 | Tenant | | Tenant | 210 | System | | System | 211 +--------+ +--------+ 213 The dotted line indicates a network connection (i.e., IP). 215 Figure 1: NVO3 Generic Reference Model 217 The following subsections describe key aspects of an overlay system 218 in more detail. Section 3.1 describes the service model (Ethernet 219 vs. IP) provided to Tenant Systems. Section 3.2 describes NVEs in 220 more detail. Section 3.3 introduces the Network Virtualization 221 Authority, from which NVEs obtain information about virtual networks. 222 Section 3.4 provides background on VM orchestration systems and their 223 use of virtual networks. 225 3.1. VN Service (L2 and L3) 226 A Virtual Network provides either L2 or L3 service to connected 227 tenants. For L2 service, VNs transport Ethernet frames, and a Tenant 228 System is provided with a service that is analogous to being 229 connected to a specific L2 C-VLAN. L2 broadcast frames are delivered 230 to all (and multicast frames delivered to a subset of) the other 231 Tenant Systems on the VN. To a Tenant System, it appears as if they 232 are connected to a regular L2 Ethernet link. Within NVO3, tenant 233 frames are tunneled to remote NVEs based on the MAC addresses of the 234 frame headers as originated by the Tenant System. On the underlay, 235 NVO3 packets are forwarded between NVEs based on the outer addresses 236 of tunneled packets. 238 For L3 service, VNs transport IP datagrams, and a Tenant System is 239 provided with a service that only supports IP traffic. Within NVO3, 240 tenant frames are tunneled to remote NVEs based on the IP addresses 241 of the packet originated by the Tenant System; any L2 destination 242 addresses provided by Tenant Systems are effectively ignored. 244 L2 service is intended for systems that need native L2 Ethernet 245 service and the ability to run protocols directly over Ethernet 246 (i.e., not based on IP). L3 service is intended for systems in which 247 all the traffic can safely be assumed to be IP. 
It is important to 248 note that whether NVO3 provides L2 or L3 service to a Tenant System, 249 the Tenant System does not generally need to be aware of the 250 distinction. In both cases, the virtual network presents itself to 251 the Tenant System as an L2 Ethernet interface. An Ethernet interface 252 is used in both cases simply as a widely supported interface type 253 that essentially all Tenant Systems already support. Consequently, 254 no special software is needed on Tenant Systems to use an L3 vs. an 255 L2 overlay service. 257 3.2. Network Virtualization Edge (NVE) 259 Tenant Systems connect to NVEs via a Tenant System Interface (TSI). 260 The TSI logically connects to the NVE via a Virtual Access Point 261 (VAP) as shown in Figure 2. To the Tenant System, the TSI is like a 262 NIC; the TSI presents itself to a Tenant System as a normal network 263 interface. On the NVE side, a VAP is a logical network port (virtual 264 or physical) into a specific virtual network. Note that two 265 different Tenant Systems (and TSIs) attached to a common NVE can 266 share a VAP (e.g., TS1 and TS2 in Figure 2) so long as they connect 267 to the same Virtual Network. 269 | Data Center Network (IP) | 270 | | 271 +-----------------------------------------+ 272 | | 273 | Tunnel Overlay | 274 +------------+---------+ +---------+------------+ 275 | +----------+-------+ | | +-------+----------+ | 276 | | Overlay Module | | | | Overlay Module | | 277 | +---------+--------+ | | +---------+--------+ | 278 | | | | | | 279 NVE1 | | | | | | NVE2 280 | +--------+-------+ | | +--------+-------+ | 281 | | |VNI1| |VNI2| | | | |VNI1| |VNI2| | 282 | +-+----------+---+ | | +-+-----------+--+ | 283 | | VAP1 | VAP2 | | | VAP1 | VAP2| 284 +----+------------+----+ +----+-----------+ ----+ 285 | | | | 286 |\ | | | 287 | \ | | /| 288 -------+--\-------+-------------------+---------/-+------- 289 | \ | Tenant | / | 290 TSI1 |TSI2\ | TSI3 TSI1 TSI2/ TSI3 291 +---+ +---+ +---+ +---+ +---+ +---+ 292 |TS1| |TS2| |TS3| |TS4| |TS5| |TS6| 293 +---+ +---+ +---+ +---+ +---+ +---+ 295 Figure 2: NVE Reference Model 297 The Overlay Module performs the actual encapsulation and 298 decapsulation of tunneled packets. The NVE maintains state about the 299 virtual networks it is a part of so that it can provide the Overlay 300 Module with such information as the destination address of the NVE to 301 tunnel a packet to, or the Context ID that should be placed in the 302 encapsulation header to identify the virtual network a tunneled 303 packet belong to. 305 On the data center network side, the NVE sends and receives native IP 306 traffic. When ingressing traffic from a Tenant System, the NVE 307 identifies the egress NVE to which the packet should be sent, adds an 308 overlay encapsulation header, and sends the packet on the underlay 309 network. When receiving traffic from a remote NVE, an NVE strips off 310 the encapsulation header, and delivers the (original) packet to the 311 appropriate Tenant System. 313 Conceptually, the NVE is a single entity implementing the NVO3 314 functionality. In practice, there are a number of different 315 implementation scenarios, as described in detail in Section 4. 317 3.3. Network Virtualization Authority (NVA) 319 Address dissemination refers to the process of learning, building and 320 distributing the mapping/forwarding information that NVEs need in 321 order to tunnel traffic to each other on behalf of communicating 322 Tenant Systems. 
For example, in order to send traffic to a remote Tenant System, the sending NVE must know the destination NVE for that Tenant System.

One way to build and maintain mapping tables is to use learning, as 802.1 bridges do [IEEE-802.1Q]. When forwarding traffic to multicast or unknown unicast destinations, an NVE could simply flood traffic everywhere. While flooding works, it can lead to traffic hot spots and to problems in larger networks.

Alternatively, NVEs can make use of a Network Virtualization Authority (NVA). An NVA is the entity that provides address mapping and other information to NVEs. NVEs interact with an NVA to obtain the address mapping information they need in order to properly forward traffic on behalf of tenants. The term NVA refers to the overall system, without regard to its scope or how it is implemented. NVAs provide a service, and NVEs access that service via an NVE-to-NVA protocol.

Even when an NVA is present, learning could be used as a fallback mechanism, should the NVA be unable to provide an answer or for other reasons. This document does not consider flooding approaches in detail, as there are a number of benefits in using an approach that depends on the presence of an NVA.

NVAs are discussed in more detail in Section 6.

3.4. VM Orchestration Systems

VM Orchestration systems manage server virtualization across a set of servers. Although VM management is a separate topic from network virtualization, the two areas are closely related. Managing the creation, placement, and movement of VMs also involves creating, attaching to, and detaching from virtual networks. A number of existing VM orchestration systems have incorporated aspects of virtual network management into their systems.

When a new VM image is started, the VM Orchestration system determines where the VM should be placed, interacts with the hypervisor on the target server to load and start the VM, and controls when a VM should be shut down or migrated elsewhere. VM Orchestration systems also have knowledge about how a VM should connect to a network, possibly including the name of the virtual network to which a VM is to connect. The VM orchestration system can pass such information to the hypervisor when a VM is instantiated. VM orchestration systems have significant (and sometimes global) knowledge of the domain they manage. They typically know on what servers a VM is running, and metadata associated with VM images can be useful from a network virtualization perspective. For example, the metadata may include the addresses (MAC and IP) the VMs will use and the name(s) of the virtual network(s) they connect to.

VM orchestration systems run a protocol with an agent running on the hypervisor of the servers they manage. That protocol can also carry information about what virtual network a VM is associated with. When the orchestrator instantiates a VM on a hypervisor, the hypervisor interacts with the NVE in order to attach the VM to the virtual networks it has access to. In general, the hypervisor will need to communicate significant VM state changes to the NVE. In the reverse direction, the NVE may need to communicate network connectivity information back to the hypervisor. Example VM orchestration systems in use today include VMware's vCenter Server and Microsoft's System Center Virtual Machine Manager. Both can pass information about what virtual networks a VM connects to down to the hypervisor. The protocol used between the VM orchestration system and hypervisors is generally proprietary.
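As a rough illustration only (the field names below are hypothetical and not drawn from any particular product), the per-VM network metadata that an orchestration system might pass to the hypervisor, and that the hypervisor could in turn relay to the NVE, can be as simple as:

   # Hypothetical per-VM network metadata handed from a VM orchestration
   # system to the hypervisor (and relayed to the NVE) when a VM is started.
   vm_network_metadata = {
       "vm_id": "vm-0042",
       "interfaces": [
           {
               "mac": "52:54:00:12:34:56",
               "ip": "10.1.1.42",                     # may be absent or incomplete
               "virtual_network": "tenant-blue-web",  # VN name known to the NVA
           },
       ],
   }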
It should be noted that VM orchestration systems may not have direct access to all networking-related information a VM uses. For example, a VM may make use of additional IP or MAC addresses that the VM management system is not aware of.

4. Network Virtualization Edge (NVE)

As introduced in Section 3.2, an NVE is the entity that implements the overlay functionality. This section describes NVEs in more detail. An NVE will have two external interfaces:

Tenant Facing: On the tenant-facing side, an NVE interacts with the hypervisor (or equivalent entity) to provide the NVO3 service. An NVE will need to be notified when a Tenant System "attaches" to a virtual network (so it can validate the request and set up any state needed to send and receive traffic on behalf of the Tenant System on that VN). Likewise, an NVE will need to be informed when the Tenant System "detaches" from the virtual network so that it can reclaim state and resources appropriately.

DCN Facing: On the data center network facing side, an NVE interfaces with the data center underlay network, sending and receiving tunneled IP packets to and from the underlay. The NVE may also run a control protocol with other entities on the network, such as the Network Virtualization Authority.

4.1. NVE Co-located With Server Hypervisor

When server virtualization is used, the entire NVE functionality will typically be implemented as part of the hypervisor and/or virtual switch on the server. In such cases, the Tenant System interacts with the hypervisor and the hypervisor interacts with the NVE. Because the interaction between the hypervisor and NVE is implemented entirely in software on the server, there is no "on-the-wire" protocol between Tenant Systems (or the hypervisor) and the NVE that needs to be standardized. While there may be APIs between the NVE and hypervisor to support necessary interaction, the details of such an API are not in scope for the IETF to work on.

Implementing NVE functionality entirely on a server has the disadvantage that server CPU resources must be spent implementing the NVO3 functionality. Experimentation with overlay approaches and previous experience with TCP and checksum adapter offloads suggest that offloading certain NVE operations (e.g., encapsulation and decapsulation operations) onto the physical network adaptor can produce performance improvements. As has been done with checksum and/or TCP server offload and other optimization approaches, there may be benefits to offloading common operations onto adaptors where possible. Just as important, the addition of an overlay header can disable existing adaptor offload capabilities that are generally not prepared to handle the addition of a new header or other operations associated with an NVE.

While the details of how to split the implementation of specific NVE functionality between a server and its network adaptors are outside the scope of IETF standardization, the NVO3 architecture should support such separation. Ideally, it may even be possible to bypass the hypervisor completely on critical data path operations so that packets between a TS and its VN can be sent and received without having the hypervisor involved in each individual packet operation.
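However the functionality is divided among hypervisor, virtual switch, and adaptor, the NVE's externally visible behavior can be summarized in terms of the two interfaces introduced at the start of this section. The following minimal sketch (Python) is illustrative only; the method names are hypothetical and do not correspond to any standardized API:

   # Loose sketch of an NVE's two external interfaces; method names are
   # hypothetical and not a standardized API.
   class NVE:
       # Tenant-facing side, driven by the hypervisor (or equivalent entity)
       def attach(self, tsi_id: str, vn_name: str) -> None:
           """Validate the request and create per-VN state for this TSI."""

       def detach(self, tsi_id: str, vn_name: str) -> None:
           """Reclaim state and resources when the TSI leaves the VN."""

       # Data center network (DCN) facing side
       def send(self, tsi_id: str, frame: bytes) -> None:
           """Look up the egress NVE, encapsulate, and transmit on the underlay."""

       def receive(self, packet: bytes) -> None:
           """Decapsulate and deliver the inner frame to the correct TSI."""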
4.2. Split-NVE

Another possible scenario leads to the need for a split NVE implementation. A hypervisor running on a server could be aware that NVO3 is in use, but have some of the actual NVO3 functionality implemented on an adjacent switch to which the server is attached. While one could imagine a number of link types between a server and the NVE, the simplest deployment scenario would involve a server and NVE separated by a simple L2 Ethernet link, across which LLDP runs. A more complicated scenario would have the server and NVE separated by a bridged access network, such as when the NVE resides on a ToR, with an embedded switch residing between servers and the ToR.

While the above talks about a scenario involving a hypervisor, it should be noted that the same scenario can apply to Network Service Appliances as discussed in Section 5.1. In general, when this document discusses the interaction between a hypervisor and NVE, the discussion applies to Network Service Appliances as well.

For the split NVE case, protocols will be needed that allow the hypervisor and NVE to negotiate and set up the necessary state so that traffic sent across the access link between a server and the NVE can be associated with the correct virtual network instance. Specifically, on the access link, traffic belonging to a specific Tenant System would be tagged with a VLAN C-TAG that identifies the NVO3 virtual network instance it belongs to. The hypervisor-NVE protocol would negotiate which VLAN C-TAG to use for a particular virtual network instance. More details of the protocol requirements for functionality between hypervisors and NVEs can be found in [I-D.kreeger-nvo3-hypervisor-nve-cp].

4.3. NVE State

NVEs maintain internal data structures and state to support the sending and receiving of tenant traffic. An NVE may need some or all of the following information (a consolidated sketch of this state appears after the list):

1. An NVE keeps track of which attached Tenant Systems are connected to which virtual networks. When a Tenant System attaches to a virtual network, the NVE will need to create or update local state for that virtual network. When the last Tenant System detaches from a given VN, the NVE can reclaim state associated with that VN.

2. For tenant unicast traffic, an NVE maintains a per-VN table of mappings from Tenant System (inner) addresses to remote NVE (outer) addresses.

3. For tenant multicast (or broadcast) traffic, an NVE maintains a per-VN table of mappings and other information on how to deliver multicast (or broadcast) traffic. If the underlying network supports IP multicast, the NVE could use IP multicast to deliver tenant traffic. In such a case, the NVE would need to know what IP underlay multicast address to use for a given VN. Alternatively, if the underlying network does not support multicast, an NVE could use serial unicast to deliver traffic. In such a case, an NVE would need to know which destinations are subscribers to the tenant multicast group. An NVE could use both approaches, switching from one mode to the other depending on such factors as bandwidth efficiency and group membership sparseness.

4. An NVE maintains necessary information to encapsulate outgoing traffic, including what type of encapsulation and what value to use for a Context ID within the encapsulation header.

5. In order to deliver incoming encapsulated packets to the correct Tenant Systems, an NVE maintains the necessary information to map incoming traffic to the appropriate VAP and Tenant System.

6. An NVE may find it convenient to maintain additional per-VN information such as QoS settings, Path MTU information, ACLs, etc.
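The following sketch (Python) consolidates the state items listed above into a single per-VN structure. The structure and names are illustrative assumptions, not a specification of NVE internals:

   # Illustrative per-VN state kept by an NVE; all names are assumptions.
   from dataclasses import dataclass, field
   from typing import Optional

   @dataclass
   class VNState:
       vn_context: int                      # value carried in the encapsulation header
       encapsulation: str                   # e.g., "vxlan", "nvgre", "mpls-gre"
       attached_tsis: set = field(default_factory=set)
       unicast_map: dict = field(default_factory=dict)
       # unicast_map: Tenant System (inner) address -> remote NVE (outer) IP address
       mcast_underlay_group: Optional[str] = None   # if the underlay supports IP multicast
       mcast_replication_list: list = field(default_factory=list)
       # mcast_replication_list: remote NVEs to serially unicast to otherwise
       path_mtu: Optional[int] = None       # optional per-VN attributes (QoS, ACLs, ...)

   # An NVE would typically keep a table of these, keyed by virtual network:
   vn_table = {}                            # VN identifier -> VNState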
5. Tenant System Types

This section describes a number of special Tenant System types and how they fit into an NVO3 system.

5.1. Overlay-Aware Network Service Appliances

Some Network Service Appliances [I-D.ietf-nvo3-nve-nva-cp-req] (virtual or physical) provide tenant-aware services. That is, the specific service they provide depends on the identity of the tenant making use of the service. For example, firewalls are now becoming available that support multi-tenancy where a single firewall provides virtual firewall service on a per-tenant basis, using per-tenant configuration rules and maintaining per-tenant state. Such appliances will be aware of the VN an activity corresponds to while processing requests. Unlike server virtualization, which shields VMs from needing to know about multi-tenancy, a Network Service Appliance explicitly supports multi-tenancy. In such cases, the Network Service Appliance itself will be aware of network virtualization and either embed an NVE directly, or implement a split NVE as described in Section 4.2. Unlike server virtualization, however, the Network Service Appliance will not be running a traditional hypervisor and the VM Orchestration system may not interact with the Network Service Appliance. The NVE on such appliances will need to support a control plane to obtain the information needed to fully participate in an NVO3 Domain.

5.2. Bare Metal Servers

Many data centers will continue to have at least some servers operating as non-virtualized (or "bare metal") machines running a traditional operating system and workload. In such systems, there will be no NVE functionality on the server, and the server will have no knowledge of NVO3 (including whether overlays are even in use). In such environments, the NVE functionality can reside on the first-hop physical switch. In such a case, the network administrator would (manually) configure the switch to enable the appropriate NVO3 functionality on the switch port connecting the server and associate that port with a specific virtual network. Such configuration would typically be static, since the server is not virtualized, and once configured, is unlikely to change frequently. Consequently, this scenario does not require any protocol or standards work.

5.3. Gateways

Gateways on VNs relay traffic onto and off of a virtual network. Tenant Systems use gateways to reach destinations outside of the local VN. Gateways receive encapsulated traffic from one VN, remove the encapsulation header, and send the native packet out onto the data center network for delivery. Outside traffic enters a VN in the reverse manner.
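In pseudo-code form, the relay function just described might look like the following sketch (Python). The helper functions are hypothetical placeholders for NVE and NVA machinery described elsewhere in this document:

   # Illustrative gateway relay steps; every helper called here is a
   # hypothetical placeholder, not a defined interface.
   def gateway_to_outside(overlay_packet):
       vn_context, inner_packet = decapsulate(overlay_packet)  # strip the overlay header
       if not policy_allows_exit(vn_context, inner_packet):    # per-VN policy check
           return                                              # drop
       forward_natively(inner_packet)                          # onto the data center network

   def gateway_from_outside(native_packet):
       vn_context = classify_to_vn(native_packet)              # which VN does the traffic enter?
       egress_nve = lookup_mapping(vn_context, native_packet)  # mapping obtained from the NVA
       send_encapsulated(native_packet, vn_context, egress_nve)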
Gateways can be either virtual (i.e., implemented as a VM) or physical (i.e., as a standalone physical device). For performance reasons, standalone hardware gateways may be desirable in some cases. Such gateways could consist of a simple switch forwarding traffic from a VN onto the local data center network, or could embed router functionality. On such gateways, network interfaces connecting to virtual networks will (at least conceptually) embed NVE (or split-NVE) functionality within them. As with Network Service Appliances, gateways will not support a hypervisor and will need an appropriate control plane protocol to obtain the information needed to provide NVO3 service.

Gateways handle several different use cases. For example, a virtual network could consist of systems supporting overlays together with legacy Tenant Systems that do not. Gateways could be used to connect legacy systems supporting, e.g., L2 VLANs, to specific virtual networks, effectively making them part of the same virtual network. Gateways could also forward traffic between a virtual network and other hosts on the data center network or relay traffic between different VNs. Finally, gateways can provide external connectivity such as Internet or VPN access.

5.4. Distributed Gateways

The relaying of traffic from one VN to another deserves special consideration. The previous section described gateways performing this function. If such gateways are centralized, traffic between TSes on different VNs can take suboptimal paths, i.e., triangular routing results in paths that always traverse the gateway. As an optimization, individual NVEs can be part of a distributed gateway that performs such relaying, reducing or completely eliminating triangular routing. In a distributed gateway, each ingress NVE can perform such relaying activity directly, so long as it has access to the policy information needed to determine whether cross-VN communication is allowed. Having individual NVEs be part of a distributed gateway allows them to tunnel traffic directly to the destination NVE without the need to take suboptimal paths.

The NVO3 architecture should support distributed gateways. Such support requires that NVO3 control protocols include mechanisms for the maintenance and distribution of policy information about what type of cross-VN communication is allowed so that NVEs acting as distributed gateways can tunnel traffic from one VN to another as appropriate.

6. Network Virtualization Authority

Before sending traffic to or receiving traffic from a virtual network, an NVE must obtain the information needed to build its internal forwarding tables and state as listed in Section 4.3. An NVE obtains such information from a Network Virtualization Authority.

The Network Virtualization Authority (NVA) is the entity that provides address mapping and other information to NVEs. NVEs interact with an NVA to obtain the information they need in order to properly forward traffic on behalf of tenants. The term NVA refers to the overall system, without regard to its scope or how it is implemented.

6.1. How an NVA Obtains Information

There are two primary ways in which an NVA can obtain the address dissemination information it manages.
The NVA can obtain information either from the VM orchestration system or directly from the NVEs themselves.

On virtualized systems, the NVA may be able to obtain the address mapping information associated with VMs from the VM orchestration system itself. If the VM orchestration system contains a master database for all the virtualization information, having the NVA obtain information directly from the orchestration system would be a natural approach. Indeed, the NVA could effectively be co-located with the VM orchestration system itself. In such systems, the VM orchestration system communicates with the NVE indirectly through the hypervisor.

However, as described in Section 4, not all NVEs are associated with hypervisors. In such cases, NVAs cannot leverage VM orchestration protocols to interact with NVEs and will instead need to peer directly with them. By peering directly with an NVE, NVAs can obtain information about the TSes connected to that NVE and can distribute information to the NVE about the VNs those TSes are associated with. For example, whenever a Tenant System attaches to an NVE, that NVE would notify the NVA that the TS is now associated with that NVE. Likewise, when a TS detaches from an NVE, that NVE would inform the NVA. By communicating directly with NVEs, both the NVA and the NVE are able to maintain up-to-date information about all active tenants and the NVEs to which they are attached.

6.2. Internal NVA Architecture

For reliability and fault tolerance reasons, an NVA would be implemented in a distributed or replicated manner without single points of failure. How the NVA is implemented, however, is not important to an NVE so long as the NVA provides a consistent and well-defined interface to the NVE. For example, an NVA could be implemented via database techniques whereby a server stores address mapping information in a traditional (possibly replicated) database. Alternatively, an NVA could be implemented in a distributed fashion using an existing (or modified) routing protocol to maintain and distribute mappings. So long as there is a clear interface between the NVE and NVA, how an NVA is architected and implemented is not important to an NVE.

A number of architectural approaches could be used to implement NVAs themselves. NVAs manage address bindings and distribute them to where they need to go. One approach would be to use BGP (possibly with extensions) and route reflectors. Another approach could use a transaction-based database model with replicated servers. Because the implementation details are local to an NVA, there is no need to pick exactly one solution technology, so long as the external interfaces to the NVEs (and remote NVAs) are sufficiently well defined to achieve interoperability.

6.3. NVA External Interface

[note: the following section discusses various options that the WG has not yet expressed an opinion on. Discussion is encouraged. ]

Conceptually, from the perspective of an NVE, an NVA is a single entity. An NVE interacts with the NVA, and it is the NVA's responsibility to ensure that interactions between the NVE and NVA result in consistent behavior across the NVA and all other NVEs using the same NVA.
Because an NVA is built from multiple internal components, an NVA will have to ensure that information flows to all internal NVA components appropriately.

One architectural question is how the NVA presents itself to the NVE. For example, an NVA could be required to provide access via a single IP address. If NVEs only have one IP address to interact with, it would be the responsibility of the NVA to handle NVA component failures, e.g., by using a "floating IP address" that migrates among NVA components to ensure that the NVA can always be reached via the one address. Having all NVA accesses go through a single IP address, however, adds constraints to implementing robust failover, load balancing, etc.

[Note: the following is a strawman proposal.]

In the NVO3 architecture, an NVA is accessed through one or more IP addresses (or IP address/port combinations). If multiple IP addresses are used, each IP address provides equivalent functionality, meaning that an NVE can use any of the provided addresses to interact with the NVA. Should one address stop working, an NVE is expected to fail over to another. While the different addresses result in equivalent functionality, one address may respond more quickly than another, e.g., due to network conditions, load on the server, etc.

[Note: should we support the following?] To provide some control over load balancing, NVA addresses may have an associated priority. Addresses are used in order of priority, with no explicit preference among NVA addresses having the same priority. To provide basic load balancing among NVA addresses of equal priority, NVEs use some randomization input to select among equal-priority addresses. Such a priority scheme facilitates failover and load balancing, for example, allowing a network operator to specify a set of primary and backup NVAs.
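If the priority scheme above were adopted, the NVE-side selection logic could be as simple as the following sketch (Python; it is purely illustrative and contingent on the open questions noted in this section):

   # Illustrative NVA address selection: lower priority value is preferred,
   # random order within a priority level, failover by walking the list.
   import random

   def order_nva_addresses(addresses):
       """addresses: list of (ip, priority) pairs, lower priority value preferred."""
       by_priority = {}
       for ip, priority in addresses:
           by_priority.setdefault(priority, []).append(ip)
       ordered = []
       for priority in sorted(by_priority):
           group = list(by_priority[priority])
           random.shuffle(group)        # basic load balancing among equal priorities
           ordered.extend(group)
       return ordered                   # the NVE tries addresses in this order, failing over as needed

   # Example: two primary NVA addresses and one backup.
   print(order_nva_addresses([("192.0.2.1", 10), ("192.0.2.2", 10), ("198.51.100.7", 20)]))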
[note: should we support the following? It would presumably add considerable complexity to the NVE.] It may be desirable to have individual NVA addresses responsible for a subset of information about an NV Domain. In such a case, NVEs would use different NVA addresses for obtaining or updating information about particular VNs or TS bindings. A key question with such an approach is how information would be partitioned, and how an NVE could determine which address to use to get the information it needs.

Another possibility is to treat the information on which NVA addresses to use as cached (soft-state) information at the NVEs, so that any NVA address can be used to obtain any information, but NVEs are informed of preferences for which addresses to use for particular information on VNs or TS bindings. That preference information would be cached for future use to improve behavior -- e.g., if all requests for a specific subset of VNs are forwarded to a specific NVA component, the NVE can optimize future requests within that subset by sending them directly to that NVA component via its address.

7. NVE-to-NVA Protocol

[Note: this and later sections are a bit sketchy and need work. Discussion is encouraged.]

As outlined in Section 4.3, an NVE needs certain information in order to perform its functions. To obtain such information from an NVA, an NVE-to-NVA protocol is needed. The NVE-to-NVA protocol provides two functions. First, it allows an NVE to obtain information about the location and status of other TSes with which it needs to communicate. Second, the NVE-to-NVA protocol provides a way for NVEs to provide updates to the NVA about the TSes attached to that NVE (e.g., when a TS attaches to or detaches from the NVE), or about communication errors encountered when sending traffic to remote NVEs. For example, an NVE could indicate that a destination it is trying to reach at a destination NVE is unreachable for some reason.

While having a direct NVE-to-NVA protocol might seem straightforward, the presence of existing VM orchestration systems complicates the choices an NVE has for interacting with the NVA.

7.1. NVE-NVA Interaction Models

An NVE interacts with an NVA in at least two (quite different) ways:

o NVEs supporting VMs and hypervisors can obtain necessary information entirely through the hypervisor-facing side of the NVE. Such an approach is a natural extension to existing VM orchestration systems supporting server virtualization because a protocol between the hypervisor and VM Orchestration system already exists and can be leveraged to obtain any needed information. Specifically, VM orchestration systems used to create, terminate and migrate VMs already use well-defined (though typically proprietary) protocols to handle the interactions between the hypervisor and VM orchestration system. For such systems, it is a natural extension to leverage the existing orchestration protocol as a sort of proxy protocol for handling the interactions between an NVE and the NVA. Indeed, existing implementations already do this.

o Alternatively, an NVE can obtain needed information by interacting directly with an NVA via a protocol operating over the data center underlay network. Such an approach is needed to support NVEs that are not associated with systems performing server virtualization (e.g., as in the case of a standalone gateway) or where the NVE needs to communicate directly with the NVA for other reasons.

[Note: The following paragraph is included to stimulate discussion, and the WG will need to decide what direction it wants to take.]

The NVO3 architecture should support both of the above models, as it is likely that both will coexist and be used simultaneously in a deployment. Existing virtualization environments are already using the first model. But the first model alone is not sufficient to cover the case of standalone gateways -- such gateways do not support virtualization and do not interface with existing VM orchestration systems. Also, a hybrid approach might be desirable in some cases, where the first model is used to obtain the information but the latter approach is used to validate and further authenticate the information before using it.
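To make the direct model concrete, the kinds of messages an NVE-to-NVA exchange might carry (covering both functions identified at the start of this section) are sketched below. The message and field names are invented for illustration and do not correspond to any defined NVO3 protocol:

   # Invented message shapes for illustration only; no protocol is implied.
   attach_notification = {            # NVE -> NVA: a TS joined a VN on this NVE
       "type": "tsi-attach",
       "nve": "192.0.2.10",
       "vn": "tenant-blue-web",
       "tenant_addresses": ["52:54:00:12:34:56", "10.1.1.42"],
   }

   mapping_query = {                  # NVE -> NVA: where does this destination reside?
       "type": "map-request",
       "vn": "tenant-blue-web",
       "tenant_address": "10.1.1.99",
   }

   mapping_reply = {                  # NVA -> NVE: tunnel to this egress NVE
       "type": "map-reply",
       "vn": "tenant-blue-web",
       "tenant_address": "10.1.1.99",
       "egress_nve": "192.0.2.23",
       "encapsulations": ["vxlan", "nvgre"],   # cf. Section 10
   }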
7.2. Direct NVE-NVA Protocol

An NVE can interact directly with an NVA via an NVE-to-NVA protocol. Such a protocol can be either independent of the NVA internal protocol or an extension of it. Using a dedicated protocol provides architectural separation and independence between the NVE and NVA. The NVE and NVA interact in a well-defined way, and changes in the NVA (or NVE) do not need to impact each other. Using a dedicated protocol also ensures that both NVE and NVA implementations can evolve independently and without dependencies on each other. Such independence is important because the upgrade path for NVEs and NVAs is quite different. Upgrading all the NVEs at a site will likely be more difficult in practice than upgrading NVAs because of their large number -- one on each end device. In practice, it is assumed that an NVE will be implemented once, and then (hopefully) not again, whereas an NVA (and its associated protocols) are more likely to evolve over time as experience is gained from usage.

Requirements for a direct NVE-NVA protocol can be found in [I-D.ietf-nvo3-nve-nva-cp-req].

7.3. Propagating Information Between NVEs and NVAs

[Note: This section has been completely redone to move away from the push/pull discussion at an abstract level.]

Information flows between NVEs and NVAs in both directions. The NVA maintains information about all VNs in the NV Domain, so that NVEs do not need to do so themselves. NVEs obtain from the NVA information about where a given remote TS destination resides. NVAs in turn obtain information from NVEs about the individual TSs attached to those NVEs.

While the NVA could push information about every virtual network to every NVE, such an approach scales poorly and is unnecessary. In practice, a given NVE will only need and want to know about VNs to which it is attached. Thus, an NVE should be able to subscribe to updates only for the virtual networks it is interested in. The NVO3 architecture supports a model where an NVE is not required to have full mapping tables for all virtual networks in an NV Domain.

Before sending unicast traffic to a remote TS, an NVE must know where the remote TS currently resides. When a TS attaches to a virtual network, the NVE obtains information about that VN from the NVA. The NVA can provide that information to the NVE at the time the TS attaches to the VN, either because the NVE requests the information when the attach operation occurs, or because the VM orchestration system has initiated the attach operation and provides associated mapping information to the NVE at the same time. A similar process can take place with regard to obtaining the information needed for delivery of tenant broadcast or multicast traffic.

There are scenarios where an NVE may wish to query the NVA about individual mappings within a VN. For example, when sending traffic to a remote TS on a remote NVE, that TS may become unavailable (e.g., because it has migrated elsewhere or has been shut down, in which case the remote NVE may return an error indication). In such situations, the NVE may need to query the NVA to obtain updated mapping information for a specific TS, or verify that the information is still correct despite the error condition. Note that such a query could also be used by the NVA as an indication that there may be an inconsistency in the network and that it should take steps to verify that the information it has about the current state and location of a specific TS is still correct.

For very large virtual networks, the amount of state an NVE needs to maintain for a given virtual network could be significant.
Moreover, an NVE may only be communicating with a small subset of the TSes on such a virtual network. In such cases, the NVE may find it desirable to maintain state only for those destinations it is actively communicating with, rather than full mapping information about all destinations on the VN. Should it then need to communicate with a destination for which it does not have mapping information, however, it will need to be able to query the NVA on demand for the missing information on a per-destination basis.

The NVO3 architecture will need to support a range of operations between the NVE and NVA. Requirements for those operations can be found in [I-D.ietf-nvo3-nve-nva-cp-req].

8. Federated NVAs

An NVA provides service to the set of NVEs in its NV Domain. Each NVA manages network virtualization information for the virtual networks within its NV Domain. An NV Domain is administered by a single entity.

In some cases, it will be necessary to expand the scope of a specific VN or even an entire NV Domain beyond a single NVA. For example, the administrator of multiple data centers may wish to operate all of those data centers as a single NV Region. Such cases are handled by having different NVAs peer with each other to exchange mapping information about specific VNs. NVAs operate in a federated manner, with a set of NVAs operating as a loosely coupled federation of individual NVAs. If a virtual network spans multiple NVAs (e.g., located at different data centers), and an NVE needs to deliver tenant traffic to an NVE at a remote NVA, it still interacts only with its NVA, even when obtaining mappings for NVEs associated with domains at a remote NVA.

Figure 3 shows a scenario where two separate NV Domains (1 and 2) share information about Virtual Network "1217". VM1 and VM2 both connect to the same Virtual Network (1217), even though the two VMs are in separate NV Domains. There are two cases to consider. In the first case, NV Domain 2 does not allow NVE-A to tunnel traffic directly to NVE-B. There could be a number of reasons for this. For example, NV Domains 1 and 2 may not share a common address space (i.e., require traversal through a NAT device), or for policy reasons, a domain might require that all traffic between separate NV Domains be funneled through a particular device (e.g., a firewall). In such cases, NVA-2 will advertise to NVA-1 that VM2 on virtual network 1217 is available, and direct that traffic between the two nodes go through IP-G. IP-G would then decapsulate received traffic from one NV Domain, translate it appropriately for the other domain, and re-encapsulate the packet for delivery.

   +-----+     ..........                        ..........     +-----+
   | VM1 |    .          .                      .          .    | VM2 |
   |-----|    .    NV    .        +----+        .    NV    .    |-----|
   |NVE-A|----.  Domain  .--------|IP-G|--------.  Domain  .----|NVE-B|
   +-----+    .    1     .        +----+        .    2     .    +-----+
              .          .                      .          .
               ..........                        ..........
                   |                                  |
                +--+--+                            +--+--+
                |NVA-1|                            |NVA-2|
                +-----+                            +-----+

          Figure 3: VM1 and VM2 are in different NV Domains.

NVAs at one site share information and interact with NVAs at other sites, but only in a controlled manner.
It is expected that policy 951 and access control will be applied at the boundaries between 952 different sites (and NVAs) so as to minimize dependencies on external 953 NVAs that could negatively impact the operation within a site. It is 954 an architectural principle that operations involving NVAs at one site 955 not be immediately impacted by failures or errors at another site. 956 (Of course, communication between NVEs in different NVO3 domains may 957 be impacted by such failures or errors.) It is a strong requirement 958 that an NVA continue to operate properly for local NVEs even if 959 external communication is interrupted (e.g., should communication 960 between a local and remote NVA fail). 962 At a high level, a federation of interconnected NVAs has some 963 analogies to BGP and Autonomous Systems. Like an Autonomous System, 964 NVAs at one site are managed by a single administrative entity and do 965 not interact with external NVAs except as allowed by policy. 966 Likewise, the interface between NVAs at different sites is well 967 defined, so that the internal details of operations at one site are 968 largely hidden to other sites. Finally, an NVA only peers with other 969 NVAs that it has a trusted relationship with, i.e., where a virtual 970 network is intended to span multiple NVAs. 972 [Note: the following are motivations for having a federated NVA model 973 and are intended for discussion. Depending on discussion, these may 974 be removed from future versions of this document. ] Reasons for using 975 a federated model include: 977 o Provide isolation between NVAs operating at different sites at 978 different geographic locations. 980 o Control the quantity and rate of information updates that flow 981 (and must be processed) between different NVAs in different data 982 centers. 984 o Control the set of external NVAs (and external sites) a site peers 985 with. A site will only peer with other sites that are cooperating 986 in providing an overlay service. 988 o Allow policy to be applied between sites. A site will want to 989 carefully control what information it exports (and to whom) as 990 well as what information it is willing to import (and from whom). 992 o Allow different protocols and architectures to be used to for 993 intra- vs. inter-NVA communication. For example, within a single 994 data center, a replicated transaction server using database 995 techniques might be an attractive implementation option for an 996 NVA, and protocols optimized for intra-NVA communication would 997 likely be different from protocols involving inter-NVA 998 communication between different sites. 1000 o Allow for optimized protocols, rather than using a one-size-fits 1001 all approach. Within a data center, networks tend to have lower- 1002 latency, higher-speed and higher redundancy when compared with WAN 1003 links interconnecting data centers. The design constraints and 1004 tradeoffs for a protocol operating within a data center network 1005 are different from those operating over WAN links. While a single 1006 protocol could be used for both cases, there could be advantages 1007 to using different and more specialized protocols for the intra- 1008 and inter-NVA case. 1010 8.1. Inter-NVA Peering 1012 To support peering between different NVAs, an inter-NVA protocol is 1013 needed. The inter-NVA protocol defines what information is exchanged 1014 between NVAs. 
It is assumed that the protocol will be used to share addressing information between data centers and must scale well over WAN links.

9. Control Protocol Work Areas

The NVO3 architecture consists of two major distinct entities: NVEs and NVAs. In order to provide isolation and independence between these two entities, the NVO3 architecture calls for well-defined protocols for interfacing between them. For an individual NVA, the architecture calls for a single conceptual entity that could be implemented in a distributed or replicated fashion. While the IETF may choose to define one or more specific architectural approaches to building individual NVAs, there is little need for it to pick exactly one approach to the exclusion of others. An NVA for a single domain will likely be deployed as a single vendor product and thus there is little benefit in standardizing the internal structure of an NVA.

Individual NVAs peer with each other in a federated manner. The NVO3 architecture calls for a well-defined interface between NVAs.

Finally, a hypervisor-to-NVE protocol is needed to cover the split-NVE scenario described in Section 4.2.

10. NVO3 Data Plane Encapsulation

When tunneling tenant traffic, NVEs add an encapsulation header to the original tenant packet. The exact encapsulation to use for NVO3 does not seem to be critical. The main requirement is that the encapsulation support a Context ID of sufficient size [I-D.ietf-nvo3-dataplane-requirements]. A number of encapsulations already exist that provide a VN Context of sufficient size for NVO3. For example, VXLAN [I-D.mahalingam-dutt-dcops-vxlan] has a 24-bit VXLAN Network Identifier (VNI). NVGRE [I-D.sridharan-virtualization-nvgre] has a 24-bit Tenant Network ID (TNI). MPLS-over-GRE provides a 20-bit label field. While there is widespread recognition that a 12-bit VN Context would be too small (only 4096 distinct values), it is generally agreed that 20 bits (1 million distinct values) and 24 bits (16.8 million distinct values) are sufficient for a wide variety of deployment scenarios.

[Note: the following paragraph is included for WG discussion. Future versions of this document may omit this text.]

While one might argue that a new encapsulation should be defined just for NVO3, no compelling requirements for doing so have been identified yet. Moreover, optimized implementations for existing encapsulations are already starting to become available on the market (i.e., in silicon). If the IETF were to define a new encapsulation format, it would take at least 2 (and likely more) years before optimized implementations of the new format would become available in products. In addition, a new encapsulation format would not likely displace existing formats, at least not for years. Thus, there seems little reason to define a new encapsulation. However, it does make sense for NVO3 to support multiple encapsulation formats, so as to allow NVEs to use their preferred encapsulations when possible. This implies that the address dissemination protocols must also include an indication of supported encapsulations along with the address mapping details.

11. Operations and Management

The simplicity of operating and debugging overlay networks will be critical for successful deployment.
Some architectural choices can 1078 facilitate or hinder OAM. Related OAM drafts include 1079 [I-D.ashwood-nvo3-operational-requirement]. 1081 12. Summary 1083 This document provides a start at a general architecture for overlays 1084 in NVO3. The architecture calls for three main areas of protocol 1085 work: 1087 1. A hypervisor-to-NVE protocol to support Split NVEs as discussed 1088 in Section 4.2. 1090 2. An NVE to NVA protocol for address dissemination. 1092 3. An NVA-to-NVA protocol for exchange of information about specific 1093 virtual networks between NVAs. 1095 It should be noted that existing protocols or extensions of existing 1096 protocols are applicable. 1098 13. Acknowledgments 1100 Helpful comments and improvements to this document have come from 1101 Lizhong Jin, Dennis (Xiaohong) Qin and Lucy Yong. 1103 14. IANA Considerations 1105 This memo includes no request to IANA. 1107 15. Security Considerations 1109 Yep, kind of sparse. But we'll get there eventually. :-) 1111 16. Informative References 1113 [I-D.ashwood-nvo3-operational-requirement] 1114 Ashwood-Smith, P., Iyengar, R., Tsou, T., Sajassi, A., 1115 Boucadair, M., Jacquenet, C., and M. Daikoku, "NVO3 1116 Operational Requirements", draft-ashwood-nvo3-operational- 1117 requirement-03 (work in progress), July 2013. 1119 [I-D.ietf-nvo3-dataplane-requirements] 1120 Bitar, N., Lasserre, M., Balus, F., Morin, T., Jin, L., 1121 and B. Khasnabish, "NVO3 Data Plane Requirements", draft- 1122 ietf-nvo3-dataplane-requirements-01 (work in progress), 1123 July 2013. 1125 [I-D.ietf-nvo3-framework] 1126 Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. 1127 Rekhter, "Framework for DC Network Virtualization", draft- 1128 ietf-nvo3-framework-03 (work in progress), July 2013. 1130 [I-D.ietf-nvo3-nve-nva-cp-req] 1131 Kreeger, L., Dutt, D., Narten, T., and D. Black, "Network 1132 Virtualization NVE to NVA Control Protocol Requirements", 1133 draft-ietf-nvo3-nve-nva-cp-req-00 (work in progress), July 1134 2013. 1136 [I-D.ietf-nvo3-overlay-problem-statement] 1137 Narten, T., Gray, E., Black, D., Fang, L., Kreeger, L., 1138 and M. Napierala, "Problem Statement: Overlays for Network 1139 Virtualization", draft-ietf-nvo3-overlay-problem- 1140 statement-04 (work in progress), July 2013. 1142 [I-D.kreeger-nvo3-hypervisor-nve-cp] 1143 Kreeger, L., Narten, T., and D. Black, "Network 1144 Virtualization Hypervisor-to-NVE Overlay Control Protocol 1145 Requirements", draft-kreeger-nvo3-hypervisor-nve-cp-01 1146 (work in progress), February 2013. 1148 [I-D.mahalingam-dutt-dcops-vxlan] 1149 Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, 1150 L., Sridhar, T., Bursell, M., and C. Wright, "VXLAN: A 1151 Framework for Overlaying Virtualized Layer 2 Networks over 1152 Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-05 1153 (work in progress), October 2013. 1155 [I-D.sridharan-virtualization-nvgre] 1156 Sridharan, M., Greenberg, A., Wang, Y., Garg, P., 1157 Venkataramiah, N., Duda, K., Ganga, I., Lin, G., Pearson, 1158 M., Thaler, P., and C. Tumuluri, "NVGRE: Network 1159 Virtualization using Generic Routing Encapsulation", 1160 draft-sridharan-virtualization-nvgre-03 (work in 1161 progress), August 2013. 1163 [IEEE-802.1Q] 1164 IEEE 802.1Q-2011, ., "IEEE standard for local and 1165 metropolitan area networks: Media access control (MAC) 1166 bridges and virtual bridged local area networks, ", August 1167 2011. 1169 Appendix A. Change Log 1171 A.1. Changes From -00 to -01 1173 1. Editorial and clarity improvements. 1175 2. 
Replaced "push vs. pull" section with section more focussed on 1176 triggers where an event implies or triggers some action. 1178 3. Clarified text on co-located NVE to show how offloading NVE 1179 functionality onto adaptors is desirable. 1181 4. Added new section on distributed gateways. 1183 5. Expanded Section on NVA external interface, adding requirement 1184 for NVE to support multiple IP NVA addresses. 1186 Authors' Addresses 1188 David Black 1189 EMC 1191 Email: david.black@emc.com 1193 Jon Hudson 1194 Brocade 1195 120 Holger Way 1196 San Jose, CA 95134 1197 USA 1199 Email: jon.hudson@gmail.com 1201 Lawrence Kreeger 1202 Cisco 1204 Email: kreeger@cisco.com 1205 Marc Lasserre 1206 Alcatel-Lucent 1208 Email: marc.lasserre@alcatel-lucent.com 1210 Thomas Narten 1211 IBM 1213 Email: narten@us.ibm.com