2 Internet Engineering Task Force D. Black 3 Internet-Draft EMC 4 Intended status: Informational J. Hudson 5 Expires: June 20, 2014 Brocade 6 L. Kreeger 7 Cisco 8 M. Lasserre 9 Alcatel-Lucent 10 T. Narten 11 IBM 12 December 17, 2013 14 An Architecture for Overlay Networks (NVO3) 15 draft-ietf-nvo3-arch-00 17 Abstract 19 This document presents a high-level overview architecture for 20 building overlay networks in NVO3. The architecture is given at a 21 high-level, showing the major components of an overall system. An 22 important goal is to divide the space into individual smaller 23 components that can be implemented independently and with clear 24 interfaces and interactions with other components. It should be 25 possible to build and implement individual components in isolation 26 and have them work with other components with no changes to other 27 components. That way implementers have flexibility in implementing 28 individual components and can optimize and innovate within their 29 respective components without requiring changes to other components. 31 Status of This Memo 33 This Internet-Draft is submitted in full conformance with the 34 provisions of BCP 78 and BCP 79. 36 Internet-Drafts are working documents of the Internet Engineering 37 Task Force (IETF). Note that other groups may also distribute 38 working documents as Internet-Drafts. The list of current Internet- 39 Drafts is at http://datatracker.ietf.org/drafts/current/. 41 Internet-Drafts are draft documents valid for a maximum of six months 42 and may be updated, replaced, or obsoleted by other documents at any 43 time. It is inappropriate to use Internet-Drafts as reference 44 material or to cite them other than as "work in progress." 46 This Internet-Draft will expire on June 20, 2014.
48 Copyright Notice 50 Copyright (c) 2013 IETF Trust and the persons identified as the 51 document authors. All rights reserved. 53 This document is subject to BCP 78 and the IETF Trust's Legal 54 Provisions Relating to IETF Documents 55 (http://trustee.ietf.org/license-info) in effect on the date of 56 publication of this document. Please review these documents 57 carefully, as they describe your rights and restrictions with respect 58 to this document. Code Components extracted from this document must 59 include Simplified BSD License text as described in Section 4.e of 60 the Trust Legal Provisions and are provided without warranty as 61 described in the Simplified BSD License. 63 Table of Contents 65 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 66 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 67 3. Background . . . . . . . . . . . . . . . . . . . . . . . . . 4 68 3.1. VN Service (L2 and L3) . . . . . . . . . . . . . . . . . 5 69 3.2. Network Virtualization Edge (NVE) . . . . . . . . . . . . 6 70 3.3. Network Virtualization Authority (NVA) . . . . . . . . . 8 71 3.4. VM Orchestration Systems . . . . . . . . . . . . . . . . 8 72 4. Network Virtualization Edge (NVE) . . . . . . . . . . . . . . 9 73 4.1. NVE Co-located With Server Hypervisor . . . . . . . . . . 10 74 4.2. Split-NVE . . . . . . . . . . . . . . . . . . . . . . . . 10 75 4.3. NVE State . . . . . . . . . . . . . . . . . . . . . . . . 11 76 5. Tenant System Types . . . . . . . . . . . . . . . . . . . . . 12 77 5.1. Overlay-Aware Network Service Appliances . . . . . . . . 12 78 5.2. Bare Metal Servers . . . . . . . . . . . . . . . . . . . 12 79 5.3. Gateways . . . . . . . . . . . . . . . . . . . . . . . . 13 80 5.4. Distributed Gateways . . . . . . . . . . . . . . . . . . 13 81 6. Network Virtualization Authority . . . . . . . . . . . . . . 14 82 6.1. How an NVA Obtains Information . . . . . . . . . . . . . 14 83 6.2. Internal NVA Architecture . . . . . . . . . . . . . . . . 15 84 6.3. NVA External Interface . . . . . . . . . . . . . . . . . 15 85 7. NVE-to-NVA Protocol . . . . . . . . . . . . . . . . . . . . . 17 86 7.1. NVE-NVA Interaction Models . . . . . . . . . . . . . . . 17 87 7.2. Direct NVE-NVA Protocol . . . . . . . . . . . . . . . . . 18 88 7.3. Propagating Information Between NVEs and NVAs . . . . . . 19 89 8. Federated NVAs . . . . . . . . . . . . . . . . . . . . . . . 20 90 8.1. Inter-NVA Peering . . . . . . . . . . . . . . . . . . . . 22 91 9. Control Protocol Work Areas . . . . . . . . . . . . . . . . . 23 92 10. NVO3 Data Plane Encapsulation . . . . . . . . . . . . . . . . 23 93 11. Operations and Management . . . . . . . . . . . . . . . . . . 24 94 12. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 95 13. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 24 96 14. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24 97 15. Security Considerations . . . . . . . . . . . . . . . . . . . 24 98 16. Informative References . . . . . . . . . . . . . . . . . . . 24 99 Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 26 100 A.1. Changes From draft-narten-nvo3 to draft-ietf-nvo3 . . . . 26 101 A.2. Changes From -00 to -01 (of draft-narten-nvo3-arch) . . . 26 102 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 26 104 1. Introduction 106 This document presents a high-level architecture for building overlay 107 networks in NVO3. 
The architecture is given at a high level, showing 108 the major components of an overall system. An important goal is to 109 divide the space into smaller individual components that can be 110 implemented independently and with clear interfaces and interactions 111 with other components. It should be possible to build and implement 112 individual components in isolation and have them work with other 113 components with no changes to other components. That way 114 implementers have flexibility in implementing individual components 115 and can optimize and innovate within their respective components 116 without necessarily requiring changes to other components. 118 The motivation for overlay networks is given in 119 [I-D.ietf-nvo3-overlay-problem-statement]. "Framework for DC Network 120 Virtualization" [I-D.ietf-nvo3-framework] provides a framework for 121 discussing overlay networks generally and the various components that 122 must work together in building such systems. This document differs 123 from the framework document in that it doesn't attempt to cover all 124 possible approaches within the general design space. Rather, it 125 describes one particular approach. 127 This document is intended to be a concrete strawman that can be used 128 for discussion within the IETF NVO3 WG on what the NVO3 architecture 129 should look like. 131 2. Terminology 133 This document uses the same terminology as [I-D.ietf-nvo3-framework]. 134 In addition, the following terms are used: 136 NV Domain A Network Virtualization Domain is an administrative 137 construct that defines a Network Virtualization Authority (NVA), 138 the set of Network Virtualization Edges (NVEs) associated with 139 that NVA, and the set of virtual networks the NVA manages and 140 supports. NVEs are associated with a (logically centralized) NVA, 141 and an NVE supports communication for any of the virtual networks 142 in the domain. 144 NV Region A region over which information about a set of virtual 145 networks is shared. In the degenerate case, an NV Region 146 corresponds to a single NV Domain. The 147 more interesting case occurs when two or more NV Domains share 148 information about part or all of a set of virtual networks that 149 they manage. Two NVAs share information about particular virtual 150 networks for the purpose of supporting connectivity between 151 tenants located in different NV Domains. NVAs can share 152 information about an entire NV Domain, or just individual virtual 153 networks. 155 Tenant System Interface (TSI) Interface to a Virtual Network as 156 presented to a Tenant System. The TSI logically connects to the 157 NVE via a Virtual Access Point (VAP). To the Tenant System, the 158 TSI is like a NIC; the TSI presents itself to a Tenant System as a 159 normal network interface. 161 3. Background 163 Overlay networks are an approach for providing network virtualization 164 services to a set of Tenant Systems (TSs) [I-D.ietf-nvo3-framework]. 165 With overlays, data traffic between tenants is tunneled across the 166 underlying data center's IP network. The use of tunnels provides a 167 number of benefits by decoupling the network as viewed by tenants 168 from the underlying physical network across which they communicate. 170 Tenant Systems connect to Virtual Networks (VNs), with each VN having 171 associated attributes defining properties of the network, such as the 172 set of members that connect to it.
Tenant Systems connected to a 173 virtual network typically communicate freely with other Tenant 174 Systems on the same VN, but communication between Tenant Systems on 175 one VN and those external to the VN (whether on another VN or 176 connected to the Internet) is carefully controlled and governed by 177 policy. 179 A Network Virtualization Edge (NVE) [I-D.ietf-nvo3-framework] is the 180 entity that implements the overlay functionality. An NVE resides at 181 the boundary between a Tenant System and the overlay network as shown 182 in Figure 1. An NVE creates and maintains local state about each 183 Virtual Network for which it is providing service on behalf of a 184 Tenant System. 186 +--------+ +--------+ 187 | Tenant +--+ +----| Tenant | 188 | System | | (') | System | 189 +--------+ | ................ ( ) +--------+ 190 | +-+--+ . . +--+-+ (_) 191 | | NVE|--. .--| NVE| | 192 +--| | . . | |---+ 193 +-+--+ . . +--+-+ 194 / . . 195 / . L3 Overlay . +--+-++--------+ 196 +--------+ / . Network . | NVE|| Tenant | 197 | Tenant +--+ . .- -| || System | 198 | System | . . +--+-++--------+ 199 +--------+ ................ 200 | 201 +----+ 202 | NVE| 203 | | 204 +----+ 205 | 206 | 207 ===================== 208 | | 209 +--------+ +--------+ 210 | Tenant | | Tenant | 211 | System | | System | 212 +--------+ +--------+ 214 The dotted line indicates a network connection (i.e., IP). 216 Figure 1: NVO3 Generic Reference Model 218 The following subsections describe key aspects of an overlay system 219 in more detail. Section 3.1 describes the service model (Ethernet 220 vs. IP) provided to Tenant Systems. Section 3.2 describes NVEs in 221 more detail. Section 3.3 introduces the Network Virtualization 222 Authority, from which NVEs obtain information about virtual networks. 223 Section 3.4 provides background on VM orchestration systems and their 224 use of virtual networks. 226 3.1. VN Service (L2 and L3) 228 A Virtual Network provides either L2 or L3 service to connected 229 tenants. For L2 service, VNs transport Ethernet frames, and a Tenant 230 System is provided with a service that is analogous to being 231 connected to a specific L2 C-VLAN. L2 broadcast frames are delivered 232 to all (and multicast frames delivered to a subset of) the other 233 Tenant Systems on the VN. To a Tenant System, it appears as if they 234 are connected to a regular L2 Ethernet link. Within NVO3, tenant 235 frames are tunneled to remote NVEs based on the MAC addresses of the 236 frame headers as originated by the Tenant System. On the underlay, 237 NVO3 packets are forwarded between NVEs based on the outer addresses 238 of tunneled packets. 240 For L3 service, VNs transport IP datagrams, and a Tenant System is 241 provided with a service that only supports IP traffic. Within NVO3, 242 tenant frames are tunneled to remote NVEs based on the IP addresses 243 of the packet originated by the Tenant System; any L2 destination 244 addresses provided by Tenant Systems are effectively ignored. 246 L2 service is intended for systems that need native L2 Ethernet 247 service and the ability to run protocols directly over Ethernet 248 (i.e., not based on IP). L3 service is intended for systems in which 249 all the traffic can safely be assumed to be IP. It is important to 250 note that whether NVO3 provides L2 or L3 service to a Tenant System, 251 the Tenant System does not generally need to be aware of the 252 distinction. In both cases, the virtual network presents itself to 253 the Tenant System as an L2 Ethernet interface. 
An Ethernet interface 254 is used in both cases simply as a widely supported interface type 255 that essentially all Tenant Systems already support. Consequently, 256 no special software is needed on Tenant Systems to use an L3 vs. an 257 L2 overlay service. 259 3.2. Network Virtualization Edge (NVE) 261 Tenant Systems connect to NVEs via a Tenant System Interface (TSI). 262 The TSI logically connects to the NVE via a Virtual Access Point 263 (VAP) as shown in Figure 2. To the Tenant System, the TSI is like a 264 NIC; the TSI presents itself to a Tenant System as a normal network 265 interface. On the NVE side, a VAP is a logical network port (virtual 266 or physical) into a specific virtual network. Note that two 267 different Tenant Systems (and TSIs) attached to a common NVE can 268 share a VAP (e.g., TS1 and TS2 in Figure 2) so long as they connect 269 to the same Virtual Network. 271 | Data Center Network (IP) | 272 | | 273 +-----------------------------------------+ 274 | | 275 | Tunnel Overlay | 276 +------------+---------+ +---------+------------+ 277 | +----------+-------+ | | +-------+----------+ | 278 | | Overlay Module | | | | Overlay Module | | 279 | +---------+--------+ | | +---------+--------+ | 280 | | | | | | 281 NVE1 | | | | | | NVE2 282 | +--------+-------+ | | +--------+-------+ | 283 | | |VNI1| |VNI2| | | | |VNI1| |VNI2| | 284 | +-+----------+---+ | | +-+-----------+--+ | 285 | | VAP1 | VAP2 | | | VAP1 | VAP2| 286 +----+------------+----+ +----+-----------+ ----+ 287 | | | | 288 |\ | | | 289 | \ | | /| 290 -------+--\-------+-------------------+---------/-+------- 291 | \ | Tenant | / | 292 TSI1 |TSI2\ | TSI3 TSI1 TSI2/ TSI3 293 +---+ +---+ +---+ +---+ +---+ +---+ 294 |TS1| |TS2| |TS3| |TS4| |TS5| |TS6| 295 +---+ +---+ +---+ +---+ +---+ +---+ 297 Figure 2: NVE Reference Model 299 The Overlay Module performs the actual encapsulation and 300 decapsulation of tunneled packets. The NVE maintains state about the 301 virtual networks it is a part of so that it can provide the Overlay 302 Module with such information as the destination address of the NVE to 303 tunnel a packet to, or the Context ID that should be placed in the 304 encapsulation header to identify the virtual network a tunneled 305 packet belong to. 307 On the data center network side, the NVE sends and receives native IP 308 traffic. When ingressing traffic from a Tenant System, the NVE 309 identifies the egress NVE to which the packet should be sent, adds an 310 overlay encapsulation header, and sends the packet on the underlay 311 network. When receiving traffic from a remote NVE, an NVE strips off 312 the encapsulation header, and delivers the (original) packet to the 313 appropriate Tenant System. 315 Conceptually, the NVE is a single entity implementing the NVO3 316 functionality. In practice, there are a number of different 317 implementation scenarios, as described in detail in Section 4. 319 3.3. Network Virtualization Authority (NVA) 321 Address dissemination refers to the process of learning, building and 322 distributing the mapping/forwarding information that NVEs need in 323 order to tunnel traffic to each other on behalf of communicating 324 Tenant Systems. For example, in order to send traffic to a remote 325 Tenant System, the sending NVE must know the destination NVE for that 326 Tenant System. 328 One way to build and maintain mapping tables is to use learning, as 329 802.1 bridges do [IEEE-802.1Q]. 
When forwarding traffic to multicast 330 or unknown unicast destinations, an NVE could simply flood traffic 331 everywhere. While flooding works, it can lead to traffic hot spots 332 and to problems in larger networks. 334 Alternatively, NVEs can make use of a Network Virtualization 335 Authority (NVA). An NVA is the entity that provides address mapping 336 and other information to NVEs. NVEs interact with an NVA to obtain 337 any required address mapping information they need in order to 338 properly forward traffic on behalf of tenants. The term NVA refers 339 to the overall system, without regard to its scope or how it is 340 implemented. NVAs provide a service, and NVEs access that service 341 via an NVE-to-NVA protocol. 343 Even when an NVA is present, learning could be used as a fallback 344 mechanism, should the NVA be unable to provide an answer or for other 345 reasons. This document does not consider flooding approaches in 346 detail, as there are a number of benefits in using an approach that 347 depends on the presence of an NVA. 349 NVAs are discussed in more detail in Section 6. 351 3.4. VM Orchestration Systems 353 VM Orchestration systems manage server virtualization across a set of 354 servers. Although VM management is a separate topic from network 355 virtualization, the two areas are closely related. Managing the 356 creation, placement, and movement of VMs also involves creating, 357 attaching to and detaching from virtual networks. A number of 358 existing VM orchestration systems have incorporated aspects of 359 virtual network management into their systems. 361 When a new VM image is started, the VM Orchestration system 362 determines where the VM should be placed, interacts with the 363 hypervisor on the target server to load and start the VM, and 364 controls when a VM should be shut down or migrated elsewhere. VM 365 Orchestration systems also have knowledge about how a VM should 366 connect to a network, possibly including the name of the virtual 367 network to which a VM is to connect. The VM orchestration system can 368 pass such information to the hypervisor when a VM is instantiated. 369 VM orchestration systems have significant (and sometimes global) 370 knowledge over the domain they manage. They typically know on what 371 servers a VM is running, and metadata associated with VM images can 372 be useful from a network virtualization perspective. For example, 373 the metadata may include the addresses (MAC and IP) the VMs will use 374 and the name(s) of the virtual network(s) they connect to. 376 VM orchestration systems run a protocol with an agent running on the 377 hypervisor of the servers they manage. That protocol can also carry 378 information about what virtual network a VM is associated with. When 379 the orchestrator instantiates a VM on a hypervisor, the hypervisor 380 interacts with the NVE in order to attach the VM to the virtual 381 networks it has access to. In general, the hypervisor will need to 382 communicate significant VM state changes to the NVE. In the reverse 383 direction, the NVE may need to communicate network connectivity 384 information back to the hypervisor. Example VM orchestration systems 385 in use today include VMware's vCenter Server and Microsoft's System 386 Center Virtual Machine Manager. Both can pass information about what 387 virtual networks a VM connects to down to the hypervisor. The 388 protocol used between the VM orchestration system and hypervisors is 389 generally proprietary.
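As a rough illustration of the interactions described above, the sketch below shows the kind of per-VM network metadata an orchestration system might hand to a hypervisor agent, and the resulting attach/detach notifications toward the NVE. The data structures and function names (VMNetworkMetadata, attach_tsi, and so on) are hypothetical and are not defined by this document or by any existing orchestration product; the sketch only illustrates the flow of information, not a standard interface.

   from dataclasses import dataclass, field
   from typing import Dict, List, Set

   @dataclass
   class VMNetworkMetadata:
       """Per-VM network information an orchestration system might hold."""
       vm_name: str
       interfaces: List[dict] = field(default_factory=list)  # one dict per vNIC

   class Nve:
       """Toy NVE: tracks which local TSIs are attached to which VNs."""
       def __init__(self) -> None:
           self.attached: Dict[str, Set[str]] = {}   # VN name -> set of TSI ids

       def attach_tsi(self, vn_name: str, tsi_id: str) -> None:
           # Create per-VN state on first attach; a real NVE would also
           # notify its NVA at this point (see Section 6.1).
           self.attached.setdefault(vn_name, set()).add(tsi_id)

       def detach_tsi(self, vn_name: str, tsi_id: str) -> None:
           tsis = self.attached.get(vn_name, set())
           tsis.discard(tsi_id)
           if not tsis:              # last local TSI gone: reclaim per-VN state
               self.attached.pop(vn_name, None)

   def hypervisor_start_vm(nve: Nve, meta: VMNetworkMetadata) -> None:
       """What a hypervisor agent might do when the orchestrator starts a VM."""
       for nic in meta.interfaces:
           nve.attach_tsi(nic["vn"], f'{meta.vm_name}:{nic["mac"]}')

   # Example: orchestration metadata for one VM with a single vNIC on VN "blue".
   vm = VMNetworkMetadata("vm-17", [{"mac": "00:11:22:33:44:55",
                                     "ip": "10.1.1.7", "vn": "blue"}])
   nve = Nve()
   hypervisor_start_vm(nve, vm)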
391 It should be noted that VM orchestration systems may not have direct 392 access to all networking-related information a VM uses. For example, 393 a VM may make use of additional IP or MAC addresses that the VM 394 management system is not aware of. 396 4. Network Virtualization Edge (NVE) 398 As introduced in Section 3.2, an NVE is the entity that implements the 399 overlay functionality. This section describes NVEs in more detail. 400 An NVE will have two external interfaces: 402 Tenant Facing: On the tenant facing side, an NVE interacts with the 403 hypervisor (or equivalent entity) to provide the NVO3 service. An 404 NVE will need to be notified when a Tenant System "attaches" to a 405 virtual network (so it can validate the request and set up any 406 state needed to send and receive traffic on behalf of the Tenant 407 System on that VN). Likewise, an NVE will need to be informed 408 when the Tenant System "detaches" from the virtual network so that 409 it can reclaim state and resources appropriately. 411 DCN Facing: On the data center network facing side, an NVE 412 interfaces with the data center underlay network, sending and 413 receiving tunneled IP packets to and from the underlay. The NVE 414 may also run a control protocol with other entities on the 415 network, such as the Network Virtualization Authority. 417 4.1. NVE Co-located With Server Hypervisor 419 When server virtualization is used, the entire NVE functionality will 420 typically be implemented as part of the hypervisor and/or virtual 421 switch on the server. In such cases, the Tenant System interacts 422 with the hypervisor and the hypervisor interacts with the NVE. 423 Because the interaction between the hypervisor and NVE is implemented 424 entirely in software on the server, there is no "on-the-wire" 425 protocol between Tenant Systems (or the hypervisor) and the NVE that 426 needs to be standardized. While there may be APIs between the NVE 427 and hypervisor to support necessary interaction, the details of such 428 an API are not in scope for the IETF to work on. 430 Implementing NVE functionality entirely on a server has the 431 disadvantage that server CPU resources must be spent implementing the 432 NVO3 functionality. Experimentation with overlay approaches and 433 previous experience with TCP and checksum adapter offloads suggest 434 that offloading certain NVE operations (e.g., encapsulation and 435 decapsulation operations) onto the physical network adaptor can 436 produce performance improvements. As has been done with checksum 437 and/or TCP server offload and other optimization approaches, there may 438 be benefits to offloading common operations onto adaptors where 439 possible. Just as important, the addition of an overlay header can 440 disable existing adaptor offload capabilities that are generally not 441 prepared to handle the addition of a new header or other operations 442 associated with an NVE. 444 While the details of how to split the implementation of specific NVE 445 functionality between a server and its network adaptors are outside 446 the scope of IETF standardization, the NVO3 architecture should 447 support such separation. Ideally, it may even be possible to bypass 448 the hypervisor completely on critical data path operations so that 449 packets between a TS and its VN can be sent and received without 450 having the hypervisor involved in each individual packet operation. 452 4.2. Split-NVE 454 Another possible scenario leads to the need for a split NVE 455 implementation.
A hypervisor running on a server could be aware that 456 NVO3 is in use, but have some of the actual NVO3 functionality 457 implemented on an adjacent switch to which the server is attached. 458 While one could imagine a number of link types between a server and 459 the NVE, the simplest deployment scenario would involve a server and 460 NVE separated by a simple L2 Ethernet link, across which LLDP runs. 461 A more complicated scenario would have the server and NVE separated 462 by a bridged access network, such as when the NVE resides on a ToR, 463 with an embedded switch residing between servers and the ToR. 465 While the above talks about a scenario involving a hypervisor, it 466 should be noted that the same scenario can apply to Network Service 467 Appliances as discussed in Section 5.1. In general, when this 468 document discusses the interaction between a hypervisor and NVE, the 469 discussion applies to Network Service Appliances as well. 471 For the split NVE case, protocols will be needed that allow the 472 hypervisor and NVE to negotiate and setup the necessary state so that 473 traffic sent across the access link between a server and the NVE can 474 be associated with the correct virtual network instance. 475 Specifically, on the access link, traffic belonging to a specific 476 Tenant System would be tagged with a specific VLAN C-TAG that 477 identifies which specific NVO3 virtual network instance it belongs 478 to. The hypervisor-NVE protocol would negotiate which VLAN C-TAG to 479 use for a particular virtual network instance. More details of the 480 protocol requirements for functionality between hypervisors and NVEs 481 can be found in [I-D.kreeger-nvo3-hypervisor-nve-cp]. 483 4.3. NVE State 485 NVEs maintain internal data structures and state to support the 486 sending and receiving of tenant traffic. An NVE may need some or all 487 of the following information: 489 1. An NVE keeps track of which attached Tenant Systems are connected 490 to which virtual networks. When a Tenant System attaches to a 491 virtual network, the NVE will need to create or update local 492 state for that virtual network. When the last Tenant System 493 detaches from a given VN, the NVE can reclaim state associated 494 with that VN. 496 2. For tenant unicast traffic, an NVE maintains a per-VN table of 497 mappings from Tenant System (inner) addresses to remote NVE 498 (outer) addresses. 500 3. For tenant multicast (or broadcast) traffic, an NVE maintains a 501 per-VN table of mappings and other information on how to deliver 502 multicast (or broadcast) traffic. If the underlying network 503 supports IP multicast, the NVE could use IP multicast to deliver 504 tenant traffic. In such a case, the NVE would need to know what 505 IP underlay multicast address to use for a given VN. 506 Alternatively, if the underlying network does not support 507 multicast, an NVE could use serial unicast to deliver traffic. 508 In such a case, an NVE would need to know which destinations are 509 subscribers to the tenant multicast group. An NVE could use both 510 approaches, switching from one mode to the other depending on 511 such factors as bandwidth efficiency and group membership 512 sparseness. 514 4. An NVE maintains necessary information to encapsulate outgoing 515 traffic, including what type of encapsulation and what value to 516 use for a Context ID within the encapsulation header. 518 5. 
In order to deliver incoming encapsulated packets to the correct 519 Tenant Systems, an NVE maintains the necessary information to map 520 incoming traffic to the appropriate VAP and Tenant System. 522 6. An NVE may find it convenient to maintain additional per-VN 523 information such as QoS settings, Path MTU information, ACLs, 524 etc. 526 5. Tenant System Types 528 This section describes a number of special Tenant System types and 529 how they fit into an NVO3 system. 531 5.1. Overlay-Aware Network Service Appliances 533 Some Network Service Appliances [I-D.ietf-nvo3-nve-nva-cp-req] 534 (virtual or physical) provide tenant-aware services. That is, the 535 specific service they provide depends on the identity of the tenant 536 making use of the service. For example, firewalls are now becoming 537 available that support multi-tenancy where a single firewall provides 538 virtual firewall service on a per-tenant basis, using per-tenant 539 configuration rules and maintaining per-tenant state. Such 540 appliances will be aware of the VN an activity corresponds to while 541 processing requests. Unlike server virtualization, which shields VMs 542 from needing to know about multi-tenancy, a Network Service Appliance 543 explicitly supports multi-tenancy. In such cases, the Network 544 Service Appliance itself will be aware of network virtualization and 545 either embed an NVE directly, or implement a split NVE as described 546 in Section 4.2. Unlike server virtualization, however, the Network 547 Service Appliance will not be running a traditional hypervisor and 548 the VM Orchestration system may not interact with the Network Service 549 Appliance. The NVE on such appliances will need to support a control 550 plane to obtain the necessary information needed to fully participate 551 in an NVO3 Domain. 553 5.2. Bare Metal Servers 554 Many data centers will continue to have at least some servers 555 operating as non-virtualized (or "bare metal") machines running a 556 traditional operating system and workload. In such systems, there 557 will be no NVE functionality on the server, and the server will have 558 no knowledge of NVO3 (including whether overlays are even in use). 559 In such environments, the NVE functionality can reside on the first- 560 hop physical switch. In such a case, the network administrator would 561 (manually) configure the switch to enable the appropriate NVO3 562 functionality on the switch port connecting the server and associate 563 that port with a specific virtual network. Such configuration would 564 typically be static, since the server is not virtualized, and once 565 configured, is unlikely to change frequently. Consequently, this 566 scenario does not require any protocol or standards work. 568 5.3. Gateways 570 Gateways on VNs relay traffic onto and off of a virtual network. 571 Tenant Systems use gateways to reach destinations outside of the 572 local VN. Gateways receive encapsulated traffic from one VN, remove 573 the encapsulation header, and send the native packet out onto the 574 data center network for delivery. Outside traffic enters a VN in a 575 reverse manner. 577 Gateways can be either virtual (i.e., implemented as a VM) or 578 physical (i.e., as a standalone physical device). For performance 579 reasons, standalone hardware gateways may be desirable in some cases. 580 Such gateways could consist of a simple switch forwarding traffic 581 from a VN onto the local data center network, or could embed router 582 functionality. 
On such gateways, network interfaces connecting to 583 virtual networks will (at least conceptually) embed NVE (or split- 584 NVE) functionality within them. As in the case with Network Service 585 Appliances, gateways will not support a hypervisor and will need an 586 appropriate control plane protocol to obtain the information needed 587 to provide NVO3 service. 589 Gateways handle several different use cases. For example, a virtual 590 network could consist of systems supporting overlays together with 591 legacy Tenant Systems that do not. Gateways could be used to connect 592 legacy systems supporting, e.g., L2 VLANs, to specific virtual 593 networks, effectively making them part of the same virtual network. 594 Gateways could also forward traffic between a virtual network and 595 other hosts on the data center network or relay traffic between 596 different VNs. Finally, gateways can provide external connectivity 597 such as Internet or VPN access. 599 5.4. Distributed Gateways 600 The relaying of traffic from one VN to another deserves special 601 consideration. The previous section described gateways performing 602 this function. If such gateways are centralized, traffic between 603 TSes on different VNs can take suboptimal paths, i.e., triangular 604 routing results in paths that always traverse the gateway. As an 605 optimization, individual NVEs can be part of a distributed gateway 606 that performs such relaying, reducing or completely eliminating 607 triangular routing. In a distributed gateway, each ingress NVE can 608 perform such relaying activity directly, so long as it has access to 609 the policy information needed to determine whether cross-VN 610 communication is allowed. Having individual NVEs be part of a 611 distributed gateway allows them to tunnel traffic directly to the 612 destination NVE without the need to take suboptimal paths. 614 The NVO3 architecture should [must? or just say it does?] support 615 distributed gateways. Such support requires that NVO3 control 616 protocols include mechanisms for the maintenance and distribution of 617 policy information about what type of cross-VN communication is 618 allowed so that NVEs acting as distributed gateways can tunnel 619 traffic from one VN to another as appropriate. 621 6. Network Virtualization Authority 623 Before sending to and receiving traffic from a virtual network, an 624 NVE must obtain the information needed to build its internal 625 forwarding tables and state as listed in Section 4.3. An NVE obtains 626 such information from a Network Virtualization Authority. 628 The Network Virtualization Authority (NVA) is the entity that 629 provides address mapping and other information to NVEs. NVEs 630 interact with an NVA to obtain any required information they need in 631 order to properly forward traffic on behalf of tenants. The term NVA 632 refers to the overall system, without regards to its scope or how it 633 is implemented. 635 6.1. How an NVA Obtains Information 637 There are two primary ways in which an NVA can obtain the address 638 dissemination information it manages. The NVA can obtain information 639 either from the VM orchestration system, or directly from the NVEs 640 themselves. 642 On virtualized systems, the NVA may be able to obtain the address 643 mapping information associated with VMs from the VM orchestration 644 system itself. 
If the VM orchestration system contains a master 645 database for all the virtualization information, having the NVA 646 obtain information directly from the orchestration system would be a 647 natural approach. Indeed, the NVA could effectively be co-located 648 with the VM orchestration system itself. In such systems, the VM 649 orchestration system communicates with the NVE indirectly through the 650 hypervisor. 652 However, as described in Section 4, not all NVEs are associated with 653 hypervisors. In such cases, NVAs cannot leverage VM orchestration 654 protocols to interact with an NVE and will instead need to peer 655 directly with them. By peering directly with an NVE, NVAs can obtain 656 information about the TSes connected to that NVE and can distribute 657 information to the NVE about the VNs those TSes are associated with. 658 For example, whenever a Tenant System attaches to an NVE, that NVE 659 would notify the NVA that the TS is now associated with that NVE. 660 Likewise, when a TS detaches from an NVE, that NVE would inform the 661 NVA. By communicating directly with NVEs, both the NVA and the NVE 662 are able to maintain up-to-date information about all active tenants 663 and the NVEs to which they are attached. 665 6.2. Internal NVA Architecture 667 For reliability and fault tolerance reasons, an NVA would be 668 implemented in a distributed or replicated manner without single 669 points of failure. How the NVA is implemented, however, is not 670 important to an NVE so long as the NVA provides a consistent and 671 well-defined interface to the NVE. For example, an NVA could be 672 implemented via database techniques whereby a server stores address 673 mapping information in a traditional (possibly replicated) database. 674 Alternatively, an NVA could be implemented in a distributed fashion 675 using an existing (or modified) routing protocol to maintain and 676 distribute mappings. So long as there is a clear interface between 677 the NVE and NVA, how an NVA is architected and implemented is not 678 important to an NVE. 680 A number of architectural approaches could be used to implement NVAs 681 themselves. NVAs manage address bindings and distribute them to 682 where they need to go. One approach would be to use BGP (possibly 683 with extensions) and route reflectors. Another approach could use a 684 transaction-based database model with replicated servers. Because 685 the implementation details are local to an NVA, there is no need to 686 pick exactly one solution technology, so long as the external 687 interfaces to the NVEs (and remote NVAs) are sufficiently well 688 defined to achieve interoperability. 690 6.3. NVA External Interface 692 [note: the following section discusses various options that the WG 693 has not yet expressed an opinion on. Discussion is encouraged. ] 694 Conceptually, from the perspective of an NVE, an NVA is a single 695 entity. An NVE interacts with the NVA, and it is the NVA's 696 responsibility to ensure that interactions between the NVE and NVA 697 result in consistent behavior across the NVA and all other NVEs using 698 the same NVA. Because an NVA is built from multiple internal 699 components, an NVA will have to ensure that information flows to all 700 internal NVA components appropriately. 702 One architectural question is how the NVA presents itself to the NVE. 703 For example, an NVA could be required to provide access via a single 704 IP address.
If NVEs only have one IP address to interact with, it 705 would be the responsibility of the NVA to handle NVA component 706 failures, e.g., by using a "floating IP address" that migrates among 707 NVA components to ensure that the NVA can always be reached via the 708 one address. Having all NVA accesses go through a single IP address, 709 however, adds constraints to implementing robust failover, load 710 balancing, etc. 712 [Note: the following is a strawman proposal.] 714 In the NVO3 architecture, an NVA is accessed through one or more IP 715 addresses (or IP address/port combinations). If multiple IP addresses 716 are used, each IP address provides equivalent functionality, meaning 717 that an NVE can use any of the provided addresses to interact with 718 the NVA. Should one address stop working, an NVE is expected to 719 fail over to another. While the different addresses result in 720 equivalent functionality, one address may respond more 721 quickly than another, e.g., due to network conditions, load on the 722 server, etc. 724 [Note: should we support the following? ] To provide some control 725 over load balancing, NVA addresses may have an associated priority. 726 Addresses are used in order of priority, with no explicit preference 727 among NVA addresses having the same priority. To provide basic load-balancing 728 among NVAs of equal priorities, NVEs use some randomization 729 input to select among equal-priority NVAs. Such a priority scheme 730 facilitates failover and load balancing, for example, allowing a 731 network operator to specify a set of primary and backup NVAs. 733 [note: should we support the following? It would presumably add 734 considerable complexity to the NVE.] It may be desirable to have 735 individual NVA addresses responsible for a subset of information 736 about an NV Domain. In such a case, NVEs would use different NVA 737 addresses for obtaining or updating information about particular VNs 738 or TS bindings. A key question with such an approach is how 739 information would be partitioned, and how an NVE could determine 740 which address to use to get the information it needs. 742 Another possibility is to treat the information on which NVA 743 addresses to use as cached (soft-state) information at the NVEs, so 744 that any NVA address can be used to obtain any information, but NVEs 745 are informed of preferences for which addresses to use for particular 746 information on VNs or TS bindings. That preference information would 747 be cached for future use to improve behavior - e.g., if all requests 748 for a specific subset of VNs are forwarded to a specific NVA 749 component, the NVE can optimize future requests within that subset by 750 sending them directly to that NVA component via its address. 752 7. NVE-to-NVA Protocol 754 [Note: this and later sections are a bit sketchy and need work. 755 Discussion is encouraged.] 757 As outlined in Section 4.3, an NVE needs certain information in order 758 to perform its functions. To obtain such information from an NVA, an 759 NVE-to-NVA protocol is needed. The NVE-to-NVA protocol provides two 760 functions. First, it allows an NVE to obtain information about the 761 location and status of other TSes with which it needs to 762 communicate. Second, the NVE-to-NVA protocol provides a way for 763 NVEs to provide updates to the NVA about the TSes attached to that 764 NVE (e.g., when a TS attaches or detaches from the NVE), or about 765 communication errors encountered when sending traffic to remote NVEs. 766 For example, an NVE could indicate that a destination it is trying to 767 reach at a destination NVE is unreachable for some reason.
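As a rough illustration of these two functions, the following sketch models the kinds of messages such a protocol might carry in each direction. The message names and fields (AddressQuery, TsUpdate, and so on) are purely illustrative assumptions made here for explanation; the actual operations and their requirements are the subject of [I-D.ietf-nvo3-nve-nva-cp-req].

   from dataclasses import dataclass
   from typing import Dict, Optional, Tuple

   # Direction 1: an NVE asks the NVA where a remote Tenant System resides.
   @dataclass
   class AddressQuery:
       vn_context: int          # VN the lookup applies to
       tenant_addr: str         # inner (tenant) MAC or IP address

   @dataclass
   class AddressAnswer:
       vn_context: int
       tenant_addr: str
       egress_nve: Optional[str]   # underlay address of the NVE to tunnel to, or None

   # Direction 2: an NVE reports local TS attach/detach events, or errors
   # encountered when sending to a remote NVE, up to the NVA.
   @dataclass
   class TsUpdate:
       vn_context: int
       tenant_addr: str
       event: str               # "attach", "detach", or "unreachable"
       reporting_nve: str       # underlay address of the NVE sending the update

   def handle_query(mappings: Dict[Tuple[int, str], str],
                    q: AddressQuery) -> AddressAnswer:
       """Toy NVA-side lookup over a {(vn, tenant_addr): egress_nve} table."""
       return AddressAnswer(q.vn_context, q.tenant_addr,
                            mappings.get((q.vn_context, q.tenant_addr)))

   # Example exchange for a tenant MAC address on VN context 1217.
   table = {(1217, "00:11:22:33:44:55"): "192.0.2.10"}
   print(handle_query(table, AddressQuery(1217, "00:11:22:33:44:55")))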
769 While having a direct NVE-to-NVA protocol might seem straightforward, 770 the presence of existing VM orchestration systems complicates the 771 choices an NVE has for interacting with the NVA. 773 7.1. NVE-NVA Interaction Models 775 An NVE interacts with an NVA in at least two (quite different) ways: 777 o NVEs supporting VMs and hypervisors can obtain necessary 778 information entirely through the hypervisor-facing side of the 779 NVE. Such an approach is a natural extension to existing VM 780 orchestration systems supporting server virtualization because an 781 existing protocol between the hypervisor and VM Orchestration 782 system already exists and can be leveraged to obtain any needed 783 information. Specifically, VM orchestration systems used to 784 create, terminate and migrate VMs already use well-defined (though 785 typically proprietary) protocols to handle the interactions 786 between the hypervisor and VM orchestration system. For such 787 systems, it is a natural extension to leverage the existing 788 orchestration protocol as a sort of proxy protocol for handling 789 the interactions between an NVE and the NVA. Indeed, existing 790 implementations already do this. 792 o Alternatively, an NVE can obtain needed information by interacting 793 directly with an NVA via a protocol operating over the data center 794 underlay network. Such an approach is needed to support NVEs that 795 are not associated with systems performing server virtualization 796 (e.g., as in the case of a standalone gateway) or where the NVE 797 needs to communicate directly with the NVA for other reasons. 799 [Note: The following paragraph is included to stimulate discussion, 800 and the WG will need to decide what direction it wants to take.] 802 The NVO3 architecture should support both of the above models, 803 as in practice it is likely that both models will coexist 804 and be used simultaneously in a deployment. Existing 805 virtualization environments are already using the first model. But 806 that model alone is not sufficient to cover the case of standalone gateways -- 807 such gateways do not support virtualization and do not interface with 808 existing VM orchestration systems. Also, a hybrid approach might be 809 desirable in some cases where the first model is used to obtain the 810 information, but the latter approach is used to validate and further 811 authenticate the information before using it. 813 7.2. Direct NVE-NVA Protocol 815 An NVE can interact directly with an NVA via an NVE-to-NVA protocol. 816 Such a protocol can be either independent of the NVA internal 817 protocol or an extension of it. Using a dedicated protocol provides 818 architectural separation and independence between the NVE and NVA. 819 The NVE and NVA interact in a well-defined way, and changes in the 820 NVA (or NVE) need not impact the other. Using a dedicated 821 protocol also ensures that both NVE and NVA implementations can 822 evolve independently and without dependencies on each other. Such 823 independence is important because the upgrade path for NVEs and NVAs 824 is quite different. Upgrading all the NVEs at a site will likely be 825 more difficult in practice than upgrading NVAs because of their large 826 number - one on each end device.
In practice, it is assumed that an 827 NVE will be implemented once, and then (hopefully) not again, whereas 828 an NVA (and its associated protocols) are more likely to evolve over 829 time as experience is gained from usage. 831 Requirements for a direct NVE-NVA protocol can be found in 832 [I-D.ietf-nvo3-nve-nva-cp-req]. 834 7.3. Propagating Information Between NVEs and NVAs 836 [Note: This section has been completely redone to move away from the 837 push/pull discussion at an abstract level.] 839 Information flows between NVEs and NVAs in both directions. The NVA 840 maintains information about all VNs in the NV Domain, so that NVEs do 841 not need to do so themselves. NVEs obtain from the NVA information 842 about where a given remote TS destination resides. NVAs in turn 843 obtain information from NVEs about the individual TSs attached to 844 those NVEs. 846 While the NVA could push information about every virtual network to 847 every NVE, such an approach scales poorly and is unnecessary. In 848 practice, a given NVE will only need and want to know about VNs to 849 which it is attached. Thus, an NVE should be able to subscribe to 850 updates only for the virtual networks it is interested in. 851 The NVO3 architecture supports a model where an NVE is 852 not required to have full mapping tables for all virtual networks in 853 an NV Domain. 855 Before sending unicast traffic to a remote TS, an NVE must know where 856 the remote TS currently resides. When a TS attaches to a virtual 857 network, the NVE obtains information about that VN from the NVA. The 858 NVA can provide that information to the NVE at the time the TS 859 attaches to the VN, either because the NVE requests the information 860 when the attach operation occurs, or because the VM orchestration 861 system has initiated the attach operation and provides associated 862 mapping information to the NVE at the same time. A similar process 863 can take place with regard to obtaining the information needed 864 for delivery of tenant broadcast or multicast traffic. 866 There are scenarios where an NVE may wish to query the NVA about 867 individual mappings within a VN. For example, when sending traffic 868 to a remote TS on a remote NVE, that TS may become unavailable (e.g., 869 because it has migrated elsewhere or has been shut down), in which case 870 the remote NVE may return an error indication. In such situations, 871 the NVE may need to query the NVA to obtain updated mapping 872 information for a specific TS, or verify that the information is 873 still correct despite the error condition. Note that such a query 874 could also be used by the NVA as an indication that there may be an 875 inconsistency in the network and that it should take steps to verify 876 that the information it has about the current state and location of a 877 specific TS is still correct. 879 For very large virtual networks, the amount of state an NVE needs to 880 maintain for a given virtual network could be significant. Moreover, 881 an NVE may only be communicating with a small subset of the TSes on 882 such a virtual network. In such cases, the NVE may find it desirable 883 to maintain state only for those destinations it is actively 884 communicating with. In such scenarios, an NVE may not want to 885 maintain full mapping information about all destinations on a VN. 886 Should it then need to communicate with a destination for which it 887 does not have mapping information, however, it will need to be 888 able to query the NVA on demand for the missing information on a 889 per-destination basis.
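The sketch below illustrates the kind of NVE-side behavior just described: the NVE keeps mappings only for destinations it is actively using, and falls back to an on-demand NVA query when it has no mapping for a destination. The class and function names are hypothetical and only illustrate the caching idea; they do not define an NVE implementation or a protocol.

   from typing import Callable, Dict, Optional, Tuple

   MappingKey = Tuple[int, str]              # (VN context, tenant address)

   class NveMappingCache:
       """Partial, on-demand cache of tenant-address -> egress-NVE mappings."""

       def __init__(self, query_nva: Callable[[int, str], Optional[str]]) -> None:
           self.query_nva = query_nva        # stand-in for the NVE-to-NVA protocol
           self.cache: Dict[MappingKey, str] = {}

       def egress_nve_for(self, vn: int, tenant_addr: str) -> Optional[str]:
           key = (vn, tenant_addr)
           if key not in self.cache:         # miss: ask the NVA for this destination only
               answer = self.query_nva(vn, tenant_addr)
               if answer is None:
                   return None               # NVA has no mapping; traffic cannot be forwarded
               self.cache[key] = answer
           return self.cache[key]

       def invalidate(self, vn: int, tenant_addr: str) -> None:
           # Called, e.g., after a delivery error so the next packet re-queries the NVA.
           self.cache.pop((vn, tenant_addr), None)

   # Example: the NVA knows one mapping; the NVE learns it only when needed.
   nva_table = {(1217, "10.1.1.7"): "192.0.2.10"}
   cache = NveMappingCache(lambda vn, addr: nva_table.get((vn, addr)))
   print(cache.egress_nve_for(1217, "10.1.1.7"))   # first use triggers an NVA query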
891 The NVO3 architecture will need to support a range of operations 892 between the NVE and NVA. Requirements for those operations can be 893 found in [I-D.ietf-nvo3-nve-nva-cp-req]. 895 8. Federated NVAs 897 An NVA provides service to the set of NVEs in its NV Domain. Each 898 NVA manages network virtualization information for the virtual 899 networks within its NV Domain. An NV Domain is administered by a 900 single entity. 902 In some cases, it will be necessary to expand the scope of a specific 903 VN or even an entire NV Domain beyond a single NVA. For example, 904 an administrator managing multiple data centers may wish to 905 operate all of those data centers as a single NV Region. Such cases 906 are handled by having different NVAs peer with each other to exchange 907 mapping information about specific VNs. NVAs operate in a federated 908 manner, with a set of NVAs operating as a loosely-coupled federation 909 of individual NVAs. If a virtual network spans multiple NVAs (e.g., 910 located at different data centers), and an NVE needs to deliver 911 tenant traffic to an NVE associated with a remote NVA, it still interacts only 912 with its own NVA, even when obtaining mappings for NVEs associated with 913 the remote NVA's domain. 915 Figure 3 shows a scenario where two separate NV Domains (1 and 916 2) share information about Virtual Network "1217". VM1 and VM2 both 917 connect to the same Virtual Network (1217), even though the two VMs 918 are in separate NV Domains. There are two cases to consider. In the 919 first case, NV Domain 2 does not allow NVE-A to tunnel traffic 920 directly to NVE-B. There could be a number of reasons for this. For 921 example, NV Domains 1 and 2 may not share a common address space 922 (i.e., require traversal through a NAT device), or for policy 923 reasons, a domain might require that all traffic between separate NV 924 Domains be funneled through a particular device (e.g., a firewall). 925 In such cases, NVA-2 will advertise to NVA-1 that VM2 on virtual 926 network 1217 is available, and direct that traffic between the two 927 nodes go through IP-G. IP-G would then decapsulate received traffic 928 from one NV Domain, translate it appropriately for the other domain, 929 and re-encapsulate the packet for delivery. 931 xxxxxx xxxxxx +-----+ 932 +-----+ xxxxxxxx xxxxxx xxxxxxx xxxxx | VM2 | 933 | VM1 | xx xx xxx xx |-----| 934 |-----| xx + x xx x |NVE-B| 935 |NVE-A| x x +----+ x x +-----+ 936 +--+--+ x NV Domain 1 x |IP-G|--x x | 937 +-------x xx--+ | x xx | 938 x x +----+ x NV Domain 2 x | 939 +---x xx xx x---+ 940 | xxxx xx +->xx xx 941 | xxxxxxxxxx | xx xx 942 +---+-+ | xx xx 943 |NVA-1| +--+--+ xx xxx 944 +-----+ |NVA-2| xxxx xxxx 945 +-----+ xxxxxxx 947 Figure 3: VM1 and VM2 are in different NV Domains.
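To make the Figure 3 scenario concrete, the sketch below shows the kind of mapping record NVA-1 might install as a result of NVA-2's advertisement. When direct tunneling between the two domains is not allowed, the advertised next hop is the gateway IP-G; a contrasting entry for the case where direct NVE-to-NVE tunneling is permitted is shown for comparison. The record layout and the tenant address used (10.1.2.2) are illustrative assumptions, not values defined by this document.

   from dataclasses import dataclass

   @dataclass
   class PeerAdvertisement:
       """What one NVA might tell a peer NVA about a TS on a shared VN."""
       vn_context: int
       tenant_addr: str      # address of the advertised TS (VM2 here)
       next_hop: str         # where the peer's NVEs should tunnel traffic

   # Case 1: direct NVE-to-NVE tunneling between domains is not allowed, so
   # NVA-2 directs NV Domain 1 to send traffic for VM2 via the gateway IP-G.
   via_gateway = PeerAdvertisement(1217, "10.1.2.2", next_hop="IP-G")

   # Contrasting case: direct tunneling is permitted, so the advertised next
   # hop is simply the remote NVE (NVE-B) itself.
   direct = PeerAdvertisement(1217, "10.1.2.2", next_hop="NVE-B")

   # NVA-1 folds whichever advertisement it receives into the mapping
   # information it later distributes to its own NVEs.
   nva1_mappings = {(via_gateway.vn_context, via_gateway.tenant_addr):
                    via_gateway.next_hop}
   print(nva1_mappings)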
949 NVAs at one site share information and interact with NVAs at other 950 sites, but only in a controlled manner. It is expected that policy 951 and access control will be applied at the boundaries between 952 different sites (and NVAs) so as to minimize dependencies on external 953 NVAs that could negatively impact the operation within a site. It is 954 an architectural principle that operations involving NVAs at one site 955 not be immediately impacted by failures or errors at another site. 956 (Of course, communication between NVEs in different NVO3 domains may 957 be impacted by such failures or errors.) It is a strong requirement 958 that an NVA continue to operate properly for local NVEs even if 959 external communication is interrupted (e.g., should communication 960 between a local and remote NVA fail). 962 At a high level, a federation of interconnected NVAs has some 963 analogies to BGP and Autonomous Systems. Like an Autonomous System, 964 NVAs at one site are managed by a single administrative entity and do 965 not interact with external NVAs except as allowed by policy. 966 Likewise, the interface between NVAs at different sites is well 967 defined, so that the internal details of operations at one site are 968 largely hidden from other sites. Finally, an NVA only peers with other 969 NVAs that it has a trusted relationship with, i.e., where a virtual 970 network is intended to span multiple NVAs. 972 [Note: the following are motivations for having a federated NVA model 973 and are intended for discussion. Depending on discussion, these may 974 be removed from future versions of this document. ] Reasons for using 975 a federated model include: 977 o Provide isolation between NVAs operating at different sites in 978 different geographic locations. 980 o Control the quantity and rate of information updates that flow 981 (and must be processed) between different NVAs in different data 982 centers. 984 o Control the set of external NVAs (and external sites) a site peers 985 with. A site will only peer with other sites that are cooperating 986 in providing an overlay service. 988 o Allow policy to be applied between sites. A site will want to 989 carefully control what information it exports (and to whom) as 990 well as what information it is willing to import (and from whom). 992 o Allow different protocols and architectures to be used for 993 intra- vs. inter-NVA communication. For example, within a single 994 data center, a replicated transaction server using database 995 techniques might be an attractive implementation option for an 996 NVA, and protocols optimized for intra-NVA communication would 997 likely be different from protocols involving inter-NVA 998 communication between different sites. 1000 o Allow for optimized protocols, rather than using a one-size-fits-all 1001 approach. Within a data center, networks tend to have lower latency, 1002 higher speed, and higher redundancy when compared with WAN 1003 links interconnecting data centers. The design constraints and 1004 tradeoffs for a protocol operating within a data center network 1005 are different from those operating over WAN links. While a single 1006 protocol could be used for both cases, there could be advantages 1007 to using different and more specialized protocols for the intra- 1008 and inter-NVA cases. 1010 8.1. Inter-NVA Peering 1012 To support peering between different NVAs, an inter-NVA protocol is 1013 needed. The inter-NVA protocol defines what information is exchanged 1014 between NVAs. It is assumed that the protocol will be used to share 1015 addressing information between data centers and must scale well over 1016 WAN links. 1018 9. Control Protocol Work Areas 1020 The NVO3 architecture consists of two major distinct entities: NVEs 1021 and NVAs. In order to provide isolation and independence between 1022 these two entities, the NVO3 architecture calls for well-defined 1023 protocols for interfacing between them.
For an individual NVA, the 1024 architecture calls for a single conceptual entity that could be 1025 implemented in a distributed or replicated fashion. While the IETF 1026 may choose to define one or more specific architectural approaches to 1027 building individual NVAs, there is little need for it to pick exactly 1028 one approach to the exclusion of others. An NVA for a single domain 1029 will likely be deployed as a single vendor product and thus there is 1030 little benefit in standardizing the internal structure of an NVA. 1032 Individual NVAs peer with each other in a federated manner. The NVO3 1033 architecture calls for a well-defined interface between NVAs. 1035 Finally, a hypervisor-to-NVE protocol is needed to cover the split-NVE 1036 scenario described in Section 4.2. 1038 10. NVO3 Data Plane Encapsulation 1040 When tunneling tenant traffic, NVEs add an encapsulation header to the 1041 original tenant packet. The exact encapsulation to use for NVO3 does 1042 not seem to be critical. The main requirement is that the 1043 encapsulation support a Context ID of sufficient size 1044 [I-D.ietf-nvo3-dataplane-requirements]. A number of encapsulations 1045 already exist that provide a VN Context of sufficient size for NVO3. 1046 For example, VXLAN [I-D.mahalingam-dutt-dcops-vxlan] has a 24-bit 1047 VXLAN Network Identifier (VNI). NVGRE 1048 [I-D.sridharan-virtualization-nvgre] has a 24-bit Tenant Network ID 1049 (TNI). MPLS-over-GRE provides a 20-bit label field. While there is 1050 widespread recognition that a 12-bit VN Context would be too small 1051 (only 4096 distinct values), it is generally agreed that 20 bits (1 1052 million distinct values) and 24 bits (16.8 million distinct values) 1053 are sufficient for a wide variety of deployment scenarios. 1055 [Note: the following paragraph is included for WG discussion. Future 1056 versions of this document may omit this text.] 1058 While one might argue that a new encapsulation should be defined just 1059 for NVO3, no compelling requirements for doing so have been 1060 identified yet. Moreover, optimized implementations for existing 1061 encapsulations are already starting to become available on the market 1062 (i.e., in silicon). If the IETF were to define a new encapsulation 1063 format, it would take at least 2 (and likely more) years before 1064 optimized implementations of the new format would become available in 1065 products. In addition, a new encapsulation format would not likely 1066 displace existing formats, at least not for years. Thus, there seems 1067 little reason to define a new encapsulation. However, it does make 1068 sense for NVO3 to support multiple encapsulation formats, so as to 1069 allow NVEs to use their preferred encapsulations when possible. This 1070 implies that the address dissemination protocols must also include an 1071 indication of supported encapsulations along with the address mapping 1072 details. 1074 11. Operations and Management 1076 The simplicity of operating and debugging overlay networks will be 1077 critical for successful deployment. Some architectural choices can 1078 facilitate or hinder OAM. Related OAM drafts include 1079 [I-D.ashwood-nvo3-operational-requirement]. 1081 12. Summary 1083 This document provides a start at a general architecture for overlays 1084 in NVO3. The architecture calls for three main areas of protocol 1085 work: 1087 1. A hypervisor-to-NVE protocol to support Split NVEs as discussed 1088 in Section 4.2. 1090 2. An NVE-to-NVA protocol for address dissemination.
1092 3. An NVA-to-NVA protocol for exchange of information about specific 1093 virtual networks between NVAs. 1095 It should be noted that existing protocols or extensions of existing 1096 protocols are applicable. 1098 13. Acknowledgments 1100 Helpful comments and improvements to this document have come from 1101 Lizhong Jin, Dennis (Xiaohong) Qin and Lucy Yong. 1103 14. IANA Considerations 1105 This memo includes no request to IANA. 1107 15. Security Considerations 1109 Yep, kind of sparse. But we'll get there eventually. :-) 1111 16. Informative References 1113 [I-D.ashwood-nvo3-operational-requirement] 1114 Ashwood-Smith, P., Iyengar, R., Tsou, T., Sajassi, A., 1115 Boucadair, M., Jacquenet, C., and M. Daikoku, "NVO3 1116 Operational Requirements", draft-ashwood-nvo3-operational- 1117 requirement-03 (work in progress), July 2013. 1119 [I-D.ietf-nvo3-dataplane-requirements] 1120 Bitar, N., Lasserre, M., Balus, F., Morin, T., Jin, L., 1121 and B. Khasnabish, "NVO3 Data Plane Requirements", draft- 1122 ietf-nvo3-dataplane-requirements-02 (work in progress), 1123 November 2013. 1125 [I-D.ietf-nvo3-framework] 1126 Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. 1127 Rekhter, "Framework for DC Network Virtualization", draft- 1128 ietf-nvo3-framework-04 (work in progress), November 2013. 1130 [I-D.ietf-nvo3-nve-nva-cp-req] 1131 Kreeger, L., Dutt, D., Narten, T., and D. Black, "Network 1132 Virtualization NVE to NVA Control Protocol Requirements", 1133 draft-ietf-nvo3-nve-nva-cp-req-01 (work in progress), 1134 October 2013. 1136 [I-D.ietf-nvo3-overlay-problem-statement] 1137 Narten, T., Gray, E., Black, D., Fang, L., Kreeger, L., 1138 and M. Napierala, "Problem Statement: Overlays for Network 1139 Virtualization", draft-ietf-nvo3-overlay-problem- 1140 statement-04 (work in progress), July 2013. 1142 [I-D.kreeger-nvo3-hypervisor-nve-cp] 1143 Kreeger, L., Narten, T., and D. Black, "Network 1144 Virtualization Hypervisor-to-NVE Overlay Control Protocol 1145 Requirements", draft-kreeger-nvo3-hypervisor-nve-cp-01 1146 (work in progress), February 2013. 1148 [I-D.mahalingam-dutt-dcops-vxlan] 1149 Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, 1150 L., Sridhar, T., Bursell, M., and C. Wright, "VXLAN: A 1151 Framework for Overlaying Virtualized Layer 2 Networks over 1152 Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-06 1153 (work in progress), November 2013. 1155 [I-D.sridharan-virtualization-nvgre] 1156 Sridharan, M., Greenberg, A., Wang, Y., Garg, P., 1157 Venkataramiah, N., Duda, K., Ganga, I., Lin, G., Pearson, 1158 M., Thaler, P., and C. Tumuluri, "NVGRE: Network 1159 Virtualization using Generic Routing Encapsulation", 1160 draft-sridharan-virtualization-nvgre-03 (work in 1161 progress), August 2013. 1163 [IEEE-802.1Q] 1164 IEEE 802.1Q-2011, , "IEEE standard for local and 1165 metropolitan area networks: Media access control (MAC) 1166 bridges and virtual bridged local area networks,", August 1167 2011. 1169 Appendix A. Change Log 1171 A.1. Changes From draft-narten-nvo3 to draft-ietf-nvo3 1173 1. No changes between draft-narten-nvo3-arch-01 and draft-ietf-nvoe- 1174 arch-00. 1176 A.2. Changes From -00 to -01 (of draft-narten-nvo3-arch) 1178 1. Editorial and clarity improvements. 1180 2. Replaced "push vs. pull" section with section more focussed on 1181 triggers where an event implies or triggers some action. 1183 3. Clarified text on co-located NVE to show how offloading NVE 1184 functionality onto adaptors is desirable. 1186 4. 
Added new section on distributed gateways. 1188 5. Expanded Section on NVA external interface, adding requirement 1189 for NVE to support multiple IP NVA addresses. 1191 Authors' Addresses 1193 David Black 1194 EMC 1196 Email: david.black@emc.com 1198 Jon Hudson 1199 Brocade 1200 120 Holger Way 1201 San Jose, CA 95134 1202 USA 1204 Email: jon.hudson@gmail.com 1205 Lawrence Kreeger 1206 Cisco 1208 Email: kreeger@cisco.com 1210 Marc Lasserre 1211 Alcatel-Lucent 1213 Email: marc.lasserre@alcatel-lucent.com 1215 Thomas Narten 1216 IBM 1218 Email: narten@us.ibm.com