Internet Engineering Task Force                          T. Narten, Ed.
Internet-Draft                                                       IBM
Intended status: Informational                             E. Gray, Ed.
Expires: August 11, 2013                                        Ericsson
                                                                D. Black
                                                                     EMC
                                                                 D. Dutt
                                                        Cumulus Networks
                                                                 L. Fang
                                                           Cisco Systems
                                                              L. Kreeger
                                                                   Cisco
                                                            M. Napierala
                                                                    AT&T
                                                            M. Sridharan
                                                               Microsoft
                                                        February 7, 2013

        Problem Statement: Overlays for Network Virtualization
              draft-ietf-nvo3-overlay-problem-statement-02

Abstract

This document describes issues associated with providing multi-tenancy in large data center networks and how these issues may be addressed using an overlay-based network virtualization approach. A key multi-tenancy requirement is traffic isolation, so that one tenant's traffic is not visible to any other tenant. Another requirement is address space isolation, so that different tenants can use the same address space within different virtual networks. Traffic and address space isolation is achieved by assigning one or more virtual networks to each tenant, where traffic within a virtual network can only cross into another virtual network in a controlled fashion (e.g., via a configured router and/or a security gateway). Additional functionality is required to provision virtual networks, associate a virtual machine's network interface(s) with the appropriate virtual network, and maintain that association as the virtual machine is activated, migrated, and/or deactivated. Use of an overlay-based approach enables scalable deployment on large network infrastructures.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts.
The list of current Internet- 51 Drafts is at http://datatracker.ietf.org/drafts/current/. 53 Internet-Drafts are draft documents valid for a maximum of six months 54 and may be updated, replaced, or obsoleted by other documents at any 55 time. It is inappropriate to use Internet-Drafts as reference 56 material or to cite them other than as "work in progress." 58 This Internet-Draft will expire on August 11, 2013. 60 Copyright Notice 62 Copyright (c) 2013 IETF Trust and the persons identified as the 63 document authors. All rights reserved. 65 This document is subject to BCP 78 and the IETF Trust's Legal 66 Provisions Relating to IETF Documents 67 (http://trustee.ietf.org/license-info) in effect on the date of 68 publication of this document. Please review these documents 69 carefully, as they describe your rights and restrictions with respect 70 to this document. Code Components extracted from this document must 71 include Simplified BSD License text as described in Section 4.e of 72 the Trust Legal Provisions and are provided without warranty as 73 described in the Simplified BSD License. 75 Table of Contents 77 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 78 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6 79 3. Problem Areas . . . . . . . . . . . . . . . . . . . . . . . . 6 80 3.1. Need For Dynamic Provisioning . . . . . . . . . . . . . . 6 81 3.2. Virtual Machine Mobility Limitations . . . . . . . . . . . 6 82 3.3. Inadequate Forwarding Table Sizes . . . . . . . . . . . . 7 83 3.4. Need to Decouple Logical and Physical Configuration . . . 7 84 3.5. Need For Address Separation Between Virtual Networks . . . 8 85 3.6. Need For Address Separation Between Virtual Networks 86 and Infrastructure . . . . . . . . . . . . . . . . . . . . 8 87 3.7. Optimal Forwarding . . . . . . . . . . . . . . . . . . . . 8 88 4. Using Network Overlays to Provide Virtual Networks . . . . . . 9 89 4.1. Overview of Network Overlays . . . . . . . . . . . . . . . 10 90 4.2. Communication Between Virtual and Non-virtualized 91 Networks . . . . . . . . . . . . . . . . . . . . . . . . . 11 92 4.3. Communication Between Virtual Networks . . . . . . . . . . 12 93 4.4. Overlay Design Characteristics . . . . . . . . . . . . . . 12 94 4.5. Control Plane Overlay Networking Work Areas . . . . . . . 13 95 4.6. Data Plane Work Areas . . . . . . . . . . . . . . . . . . 14 96 5. Related IETF and IEEE Work . . . . . . . . . . . . . . . . . . 15 97 5.1. BGP/MPLS IP VPNs . . . . . . . . . . . . . . . . . . . . . 15 98 5.2. BGP/MPLS Ethernet VPNs . . . . . . . . . . . . . . . . . . 15 99 5.3. 802.1 VLANs . . . . . . . . . . . . . . . . . . . . . . . 16 100 5.4. IEEE 802.1aq - Shortest Path Bridging . . . . . . . . . . 16 101 5.5. ARMD . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 102 5.6. TRILL . . . . . . . . . . . . . . . . . . . . . . . . . . 17 103 5.7. L2VPNs . . . . . . . . . . . . . . . . . . . . . . . . . . 17 104 5.8. Proxy Mobile IP . . . . . . . . . . . . . . . . . . . . . 18 105 5.9. LISP . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 106 5.10. VDP . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 107 6. Further Work . . . . . . . . . . . . . . . . . . . . . . . . . 18 108 7. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 109 8. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 19 110 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 111 10. Security Considerations . . . . . . . . . . . . . . . . . . . 
19 112 11. Informative References . . . . . . . . . . . . . . . . . . . . 20 113 Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 21 114 A.1. Changes From -01 to -02 . . . . . . . . . . . . . . . . . 21 115 A.2. Changes From -00 to -01 . . . . . . . . . . . . . . . . . 21 116 A.3. Changes from 117 draft-narten-nvo3-overlay-problem-statement-04.txt . . . . 22 118 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 22 120 1. Introduction 122 Data Centers are increasingly being consolidated and outsourced in an 123 effort to improve the deployment time of applications and reduce 124 operational costs. This coincides with an increasing demand for 125 compute, storage, and network resources from applications. In order 126 to scale compute, storage, and network resources, physical resources 127 are being abstracted from their logical representation, in what is 128 referred to as server, storage, and network virtualization. 129 Virtualization can be implemented in various layers of computer 130 systems or networks. 132 The demand for server virtualization is increasing in data centers. 133 With server virtualization, each physical server supports multiple 134 virtual machines (VMs), each running its own operating system, 135 middleware and applications. Virtualization is a key enabler of 136 workload agility, i.e., allowing any server to host any application 137 and providing the flexibility of adding, shrinking, or moving 138 services within the physical infrastructure. Server virtualization 139 provides numerous benefits, including higher utilization, increased 140 security, reduced user downtime, reduced power usage, etc. 142 Multi-tenant data centers are taking advantage of the benefits of 143 server virtualization to provide a new kind of hosting, a virtual 144 hosted data center. Multi-tenant data centers are ones where 145 individual tenants could belong to a different company (in the case 146 of a public provider) or a different department (in the case of an 147 internal company data center). Each tenant has the expectation of a 148 level of security and privacy separating their resources from those 149 of other tenants. For example, one tenant's traffic must never be 150 exposed to another tenant, except through carefully controlled 151 interfaces, such as a security gateway (e.g., a firewall). 153 To a tenant, virtual data centers are similar to their physical 154 counterparts, consisting of end stations attached to a network, 155 complete with services such as load balancers and firewalls. But 156 unlike a physical data center, tenant systems connect to a virtual 157 network. To tenant systems, a virtual network looks like a normal 158 network (e.g., providing an ethernet or L3 service), except that the 159 only end stations connected to the virtual network are those 160 belonging to a tenant's specific virtual network. 162 A tenant is the administrative entity on whose behalf one or more 163 specific virtual network instance and its associated services 164 (whether virtual or physical) are managed. In a cloud environment, a 165 tenant would correspond to the customer that is using a particular 166 virtual network. However, a tenant may also find it useful to create 167 multiple different virtual network instances. Hence, there is a one- 168 to-many mapping between tenants and virtual network instances. A 169 single tenant may operate multiple individual virtual network 170 instances, each associated with a different service. 
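As a purely illustrative aid (not part of any NVO3 specification), the short sketch below models this one-to-many relationship: a tenant operating several virtual network instances, each with its own independent address space. All class and field names here are hypothetical.

   from dataclasses import dataclass, field
   from typing import Dict, List

   @dataclass
   class VirtualNetworkInstance:
       """One isolated virtual network; addresses have meaning only inside it."""
       vn_name: str                  # e.g., "web-tier", "db-tier"
       service_type: str             # "L2" or "L3" service semantics
       addresses: Dict[str, str] = field(default_factory=dict)  # end station -> address

   @dataclass
   class Tenant:
       """A tenant may operate many virtual network instances (one-to-many)."""
       tenant_name: str
       instances: List[VirtualNetworkInstance] = field(default_factory=list)

   # The same tenant can reuse the same address in two instances, because an
   # address is only significant within the virtual network that carries it.
   tenant = Tenant("tenant-a", [
       VirtualNetworkInstance("web-tier", "L3", {"vm1": "10.0.0.1"}),
       VirtualNetworkInstance("db-tier", "L3", {"vm9": "10.0.0.1"}),
   ])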
How a virtual network is implemented does not generally matter to the tenant; what matters is that the service provided (L2 or L3) has the right semantics, performance, etc. It could be implemented via a pure routed network, a pure bridged network, or a combination of bridged and routed networks. A key requirement is that each individual virtual network instance be isolated from other virtual network instances, with traffic crossing from one virtual network to another only when allowed by policy.

For data center virtualization, two key issues must be addressed. First, address space separation between tenants must be supported. Second, it must be possible to place (and migrate) VMs anywhere in the data center, without restricting VM addressing to match the subnet boundaries of the underlying data center network.

This document outlines problems encountered in scaling the number of isolated virtual networks in a data center. Furthermore, the document presents issues associated with managing those virtual networks in relation to operations such as virtual network creation/deletion and end-node membership change. Finally, the document makes the case that an overlay-based approach has a number of advantages over traditional, non-overlay approaches. The purpose of this document is to identify the set of issues that any solution has to address in building multi-tenant data centers. The goal is to enable standardized, interoperable implementations from which multi-tenant data centers can be built.

This document is the problem statement for the "Network Virtualization over L3" (NVO3) Working Group. NVO3 is focused on the construction of overlay networks that operate over an IP (L3) underlay transport network. NVO3 expects to provide both L2 service and IP service to end devices (though perhaps as two different solutions). Some deployments require an L2 service, others an L3 service, and some may require both.

Section 2 gives terminology. Section 3 describes the problem space in detail. Section 4 describes overlay networks in more detail. Sections 5 and 6 review related and further work, while Section 7 closes with a summary.

2. Terminology

This document uses the same terminology as [I-D.lasserre-nvo3-framework]. In addition, this document uses the following terms.

In-Band Virtual Network: A Virtual Network that separates tenant traffic without hiding tenant forwarding information from the physical infrastructure. The Tenant System may also retain visibility of a tenant within the underlying physical infrastructure. IEEE 802.1 networks using C-VIDs are an example of an in-band Virtual Network.

Overlay Virtual Network: A Virtual Network in which the separation of tenants is hidden from the underlying physical infrastructure. That is, the underlying transport network does not need to know about tenancy separation to correctly forward traffic.

VLANs: An informal term referring to IEEE 802.1 networks using C-VIDs.

3. Problem Areas

The following subsections describe aspects of multi-tenant data center networking that pose problems for network infrastructure. Different problem aspects may arise based on the network architecture and scale.

3.1.
Need For Dynamic Provisioning 243 Cloud computing involves on-demand provisioning of resources for 244 multi-tenant environments. A common example of cloud computing is 245 the public cloud, where a cloud service provider offers elastic 246 services to multiple customers over the same infrastructure. In 247 current systems, it can be difficult to provision resources for 248 individual tenants (e.g., QoS) in such a way that provisioned 249 properties migrate automatically when services are dynamically moved 250 around within the data center to optimize workloads. 252 3.2. Virtual Machine Mobility Limitations 254 A key benefit of server virtualization is virtual machine (VM) 255 mobility. A VM can be migrated from one server to another, live, 256 i.e., while continuing to run and without needing to shut it down and 257 restart it at the new location. A key requirement for live migration 258 is that a VM retain critical network state at its new location, 259 including its IP and MAC address(es). Preservation of MAC addresses 260 may be necessary, for example, when software licenses are bound to 261 MAC addresses. More generally, any change in the VM's MAC addresses 262 resulting from a move would be visible to the VM and thus potentially 263 result in unexpected disruptions. Retaining IP addresses after a 264 move is necessary to prevent existing transport connections (e.g., 265 TCP) from breaking and needing to be restarted. 267 In data center networks, servers are typically assigned IP addresses 268 based on their physical location, for example based on the Top of 269 Rack (ToR) switch for the server rack or the VLAN configured to the 270 server. Servers can only move to other locations within the same IP 271 subnet. This constraint is not problematic for physical servers, 272 which move infrequently, but it restricts the placement and movement 273 of VMs within the data center. Any solution for a scalable multi- 274 tenant data center must allow a VM to be placed (or moved) anywhere 275 within the data center, without being constrained by the subnet 276 boundary concerns of the host servers. 278 3.3. Inadequate Forwarding Table Sizes 280 Today's virtualized environments place additional demands on the 281 forwarding tables of forwarding nodes in the physical infrastructure. 282 The core problem is that location independence results in specific 283 end state information being propagated into the forwarding system 284 (e.g., /32 host routes in L3 networks, or MAC addresses in L2 285 networks). In L2 networks, for instance, instead of just one link- 286 layer address per server, the switching infrastructure may have to 287 learn addresses of the individual VMs (which could range in the 100s 288 per server). This increases the demand on a forwarding node's table 289 capacity compared to non-virtualized environments. 291 3.4. Need to Decouple Logical and Physical Configuration 293 Data center operators must be able to achieve high utilization of 294 server and network capacity. For efficient and flexible allocation, 295 operators should be able to spread a virtual network instance across 296 servers in any rack in the data center. It should also be possible 297 to migrate compute workloads to any server anywhere in the network 298 while retaining the workload's addresses. In networks using VLANs, 299 moving servers elsewhere in the network may require expanding the 300 scope of the VLAN beyond its original boundaries. 
While this can be done, it requires potentially complex network configuration changes and can conflict with the desire to bound the size of broadcast domains, especially in larger data centers. In addition, when VMs migrate, the physical network (e.g., access lists) may need to be reconfigured, which can be time-consuming and error prone.

An important use case is cross-pod expansion. A pod typically consists of one or more racks of servers with associated network and storage connectivity. A tenant's virtual network may start off on a pod and, due to expansion, require servers/VMs on other pods, especially when other pods are not fully utilizing all their resources. This use case requires that virtual networks span multiple pods in order to provide connectivity to all of a tenant's servers/VMs. Such expansion can be difficult to achieve when tenant addressing is tied to the addressing used by the underlay network or when the expansion requires that the scope of the underlying L2 VLAN expand beyond its original pod boundary.

3.5. Need For Address Separation Between Virtual Networks

Individual tenants need control over the addresses they use within a virtual network. But it can be problematic when different tenants want to use the same addresses, or even if the same tenant wants to reuse the same addresses in different virtual networks. Consequently, virtual networks must allow tenants to use whatever addresses they want without concern for what addresses are being used by other tenants or other virtual networks.

3.6. Need For Address Separation Between Virtual Networks and Infrastructure

As in the previous case, a tenant needs to be able to use whatever addresses it wants in a virtual network independent of what addresses the underlying data center network is using. Tenants (and the underlay infrastructure provider) should be able to use whatever addresses make sense for them, without having to worry about address collisions between addresses used by tenants and those used by the underlay data center network.

3.7. Optimal Forwarding

Another problem area relates to the routing of traffic into and out of a virtual network. A virtual network may have two routers handling traffic to/from other VNs or destinations external to all VNs, and the optimal choice of router may depend on where the VM is located. The two routers may not be equally "close" to a given VM. The issue appears both when a VM is initially instantiated on a virtual network and when a VM migrates or is moved to a different location. After a migration, the VM's closest router for such traffic may change, i.e., the VM may get better service by switching to the "closer" router, and this may improve the utilization of network resources.

IP implementations in network endpoints typically do not distinguish between multiple routers on the same subnet - there may only be a single default gateway in use, and any use of multiple routers usually considers all of them to be one-hop away. Routing protocol functionality is constrained by the requirement to cope with these endpoint limitations - for example, VRRP has one router serve as the master to handle all outbound traffic.
This problem can be 360 particularly acute when the virtual network spans multiple data 361 centers, as a VM is likely to receive significantly better service 362 when forwarding external traffic through a local router by comparison 363 to using a router at a remote data center. 365 The optimal forwarding problem applies to both outbound and inbound 366 traffic. For outbound traffic, the choice of outbound router 367 determines the path of outgoing traffic from the VM, which may be 368 sub-optimal after a VM move. For inbound traffic, the location of 369 the VM within the IP subnet for the VM is not visible to the routers 370 beyond the virtual network. Thus, the routing infrastructure will 371 have no information as to which of the two externally visible 372 gateways leading into the virtual network would be the better choice 373 for reaching a particular VM. 375 The issue is further complicated when middleboxes (e.g., load- 376 balancers, firewalls, etc.) must be traversed. Middle boxes may have 377 session state that must be preserved for ongoing communication, and 378 traffic must continue to flow through the middle box, regardless of 379 which router is "closest". 381 4. Using Network Overlays to Provide Virtual Networks 383 Virtual Networks are used to isolate a tenant's traffic from that of 384 other tenants (or even traffic within the same tenant network that 385 requires isolation). There are two main characteristics of virtual 386 networks: 388 1. Virtual networks isolate the address space used in one virtual 389 network from the address space used by another virtual network. 390 The same network addresses may be used in different virtual 391 networks at the same time. In addition, the address space used 392 by a virtual network is independent from that used by the 393 underlying physical network. 395 2. Virtual Networks limit the scope of packets sent on the virtual 396 network. Packets sent by Tenant Systems attached to a virtual 397 network are delivered as expected to other Tenant Systems on that 398 virtual network and may exit a virtual network only through 399 controlled exit points such as a security gateway. Likewise, 400 packets sourced from outside of the virtual network may enter the 401 virtual network only through controlled entry points, such as a 402 security gateway. 404 4.1. Overview of Network Overlays 406 To address the problems described in Section 3, a network overlay 407 approach can be used. 409 The idea behind an overlay is quite straightforward. Each virtual 410 network instance is implemented as an overlay. The original packet 411 is encapsulated by the first-hop network device, called a Network 412 Virtualization Edge (NVE). The encapsulation identifies the 413 destination of the device that will perform the decapsulation (i.e., 414 the egress NVE) before delivering the original packet to the 415 endpoint. The rest of the network forwards the packet based on the 416 encapsulation header and can be oblivious to the payload that is 417 carried inside. 419 Overlays are based on what is commonly known as a "map-and-encap" 420 architecture. When processing and forwarding packets, three distinct 421 and logically separable steps take place: 423 1. The first-hop overlay device implements a mapping operation that 424 determines where the encapsulated packet should be sent to reach 425 its intended destination VM. 
Specifically, the mapping function 426 maps the destination address (either L2 or L3) of a packet 427 received from a VM into the corresponding destination address of 428 the egress NVE device. The destination address will be the 429 underlay address of the NVE device doing the decapsulation and is 430 an IP address. 432 2. Once the mapping has been determined, the ingress overlay NVE 433 device encapsulates the received packet within an overlay header. 435 3. The final step is to actually forward the (now encapsulated) 436 packet to its destination. The packet is forwarded by the 437 underlay (i.e., the IP network) based entirely on its outer 438 address. Upon receipt at the destination, the egress overlay NVE 439 device decapsulates the original packet and delivers it to the 440 intended recipient VM. 442 Each of the above steps is logically distinct, though an 443 implementation might combine them for efficiency or other reasons. 444 It should be noted that in L3 BGP/VPN terminology, the above steps 445 are commonly known as "forwarding" or "virtual forwarding". 447 The first hop network NVE device can be a traditional switch or 448 router or the virtual switch residing inside a hypervisor. 449 Furthermore, the endpoint can be a VM or it can be a physical server. 450 Examples of architectures based on network overlays include BGP/MPLS 451 VPNs [RFC4364], TRILL [RFC6325], LISP [RFC6830], and Shortest Path 452 Bridging (SPB) [SPB]. 454 In the data plane, an overlay header provides a place to carry either 455 the virtual network identifier, or an identifier that is locally- 456 significant to the edge device. In both cases, the identifier in the 457 overlay header specifies which specific virtual network the data 458 packet belongs to. Since both routed and bridged semantics can be 459 supported by a virtual data center, the original packet carried 460 within the overlay header can be an Ethernet frame or just the IP 461 packet. 463 A key aspect of overlays is the decoupling of the "virtual" MAC 464 and/or IP addresses used by VMs from the physical network 465 infrastructure and the infrastructure IP addresses used by the data 466 center. If a VM changes location, the overlay edge devices simply 467 update their mapping tables to reflect the new location of the VM 468 within the data center's infrastructure space. Because an overlay 469 network is used, a VM can now be located anywhere in the data center 470 that the overlay reaches without regards to traditional constraints 471 imposed by the underlay network such as the L2 VLAN scope, or the IP 472 subnet scope. 474 Multi-tenancy is supported by isolating the traffic of one virtual 475 network instance from traffic of another. Traffic from one virtual 476 network instance cannot be delivered to another instance without 477 (conceptually) exiting the instance and entering the other instance 478 via an entity (e.g., a gateway) that has connectivity to both virtual 479 network instances. Without the existence of a gateway entity, tenant 480 traffic remains isolated within each individual virtual network 481 instance. 483 Overlays are designed to allow a set of VMs to be placed within a 484 single virtual network instance, whether that virtual network 485 provides a bridged network or a routed network. 487 4.2. Communication Between Virtual and Non-virtualized Networks 489 Not all communication will be between devices connected to 490 virtualized networks. 
Devices using overlays will continue to access 491 devices and make use of services on non-virtualized networks, whether 492 in the data center, the public Internet, or at remote/branch 493 campuses. Any virtual network solution must be capable of 494 interoperating with existing routers, VPN services, load balancers, 495 intrusion detection services, firewalls, etc. on external networks. 497 Communication between devices attached to a virtual network and 498 devices connected to non-virtualized networks is handled 499 architecturally by having specialized gateway devices that receive 500 packets from a virtualized network, decapsulate them, process them as 501 regular (i.e., non-virtualized) traffic, and finally forward them on 502 to their appropriate destination (and vice versa). 504 A wide range of implementation approaches are possible. Overlay 505 gateway functionality could be combined with other network 506 functionality into a network device that implements the overlay 507 functionality, and then forwards traffic between other internal 508 components that implement functionality such as full router service, 509 load balancing, firewall support, VPN gateway, etc. 511 4.3. Communication Between Virtual Networks 513 Communication between devices on different virtual networks is 514 handled architecturally by adding specialized interconnect 515 functionality among the otherwise isolated virtual networks. For a 516 virtual network providing an L2 service, such interconnect 517 functionality could be IP forwarding configured as part of the 518 "default gateway" for each virtual network. For a virtual network 519 providing L3 service, the interconnect functionality could be IP 520 forwarding configured as part of routing between IP subnets or it can 521 be based on configured inter-virtual-network traffic policies. In 522 both cases, the implementation of the interconnect functionality 523 could be distributed across the NVEs and could be combined with other 524 network functionality (e.g., load balancing, firewall support) that 525 is applied to traffic forwarded between virtual networks. 527 4.4. Overlay Design Characteristics 529 Below are some of the characteristics of environments that must be 530 taken into account by the overlay technology. 532 1. Highly distributed systems: The overlay should work in an 533 environment where there could be many thousands of access 534 switches (e.g. residing within the hypervisors) and many more 535 Tenant Systems (e.g. VMs) connected to them. This leads to a 536 distributed mapping system that puts a low overhead on the 537 overlay tunnel endpoints. 539 2. Many highly distributed virtual networks with sparse membership: 540 Each virtual network could be highly dispersed inside the data 541 center. Also, along with expectation of many virtual networks, 542 the number of end systems connected to any one virtual network is 543 expected to be relatively low; Therefore, the percentage of NVEs 544 participating in any given virtual network would also be expected 545 to be low. For this reason, efficient delivery of multi- 546 destination traffic within a virtual network instance should be 547 taken into consideration. 549 3. Highly dynamic Tenant Systems: Tenant Systems connected to 550 virtual networks can be very dynamic, both in terms of creation/ 551 deletion/power-on/off and in terms of mobility from one access 552 device to another. 554 4. 
Be incrementally deployable, without necessarily requiring major 555 upgrade of the entire network: The first hop device (or end 556 system) that adds and removes the overlay header may require new 557 software and may require new hardware (e.g., for improved 558 performance). But the rest of the network should not need to 559 change just to enable the use of overlays. 561 5. Work with existing data center network deployments without 562 requiring major changes in operational or other practices: For 563 example, some data centers have not enabled multicast beyond 564 link-local scope. Overlays should be capable of leveraging 565 underlay multicast support where appropriate, but not require its 566 enablement in order to use an overlay solution. 568 6. Network infrastructure administered by a single administrative 569 domain: This is consistent with operation within a data center, 570 and not across the Internet. 572 4.5. Control Plane Overlay Networking Work Areas 574 There are three specific and separate potential work areas in the 575 area of control plane protocols needed to realize an overlay 576 solution. The areas correspond to different possible "on-the-wire" 577 protocols, where distinct entities interact with each other. 579 One area of work concerns the address dissemination protocol an NVE 580 uses to build and maintain the mapping tables it uses to deliver 581 encapsulated packets to their proper destination. One approach is to 582 build mapping tables entirely via learning (as is done in 802.1 583 networks). Another approach is to use a specialized control plane 584 protocol. While there are some advantages to using or leveraging an 585 existing protocol for maintaining mapping tables, the fact that large 586 numbers of NVE's will likely reside in hypervisors places constraints 587 on the resources (cpu and memory) that can be dedicated to such 588 functions. 590 From an architectural perspective, one can view the address mapping 591 dissemination problem as having two distinct and separable 592 components. The first component consists of a back-end "oracle" that 593 is responsible for distributing and maintaining the mapping 594 information for the entire overlay system. For this document, we use 595 the term "oracle" in its generic sense, referring to an entity that 596 supplies answers, without regard to how it knows the answers it is 597 providing. The second component consists of the on-the-wire 598 protocols an NVE uses when interacting with the oracle. 600 The back-end oracle could provide high performance, high resiliency, 601 failover, etc. and could be implemented in significantly different 602 ways. For example, one model uses a traditional, centralized 603 "directory-based" database, using replicated instances for 604 reliability and failover. A second model involves using and possibly 605 extending an existing routing protocol (e.g., BGP, IS-IS, etc.). To 606 support different architectural models, it is useful to have one 607 standard protocol for the NVE-oracle interaction while allowing 608 different protocols and architectural approaches for the oracle 609 itself. Separating the two allows NVEs to transparently interact 610 with different types of oracles, i.e., either of the two 611 architectural models described above. 
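As an informal illustration of this separation (and of the map-and-encap steps of Section 4.1), the sketch below shows an NVE that consults an abstract oracle interface for only the mapping entries it needs. The class names, the 24-bit VN identifier, and the directory-style back end are assumptions made for this example, not part of any NVO3 solution.

   from abc import ABC, abstractmethod
   from typing import Dict, Tuple

   class MappingOracle(ABC):
       """Back-end 'oracle' abstraction; how it learns answers is hidden from the NVE."""
       @abstractmethod
       def lookup(self, vn_id: int, tenant_dest: str) -> str:
           """Return the underlay IP address of the egress NVE for a tenant destination."""

   class DirectoryOracle(MappingOracle):
       """One possible back end: a (replicated) directory keyed by (VN, tenant address)."""
       def __init__(self, entries: Dict[Tuple[int, str], str]):
           self.entries = entries

       def lookup(self, vn_id: int, tenant_dest: str) -> str:
           return self.entries[(vn_id, tenant_dest)]

   class NVE:
       """Ingress NVE: map, encapsulate, then hand off to the IP underlay (Section 4.1)."""
       def __init__(self, oracle: MappingOracle):
           self.oracle = oracle
           self.cache: Dict[Tuple[int, str], str] = {}   # only the entries this NVE needs

       def forward(self, vn_id: int, tenant_dest: str, payload: bytes) -> Tuple[str, bytes]:
           key = (vn_id, tenant_dest)
           if key not in self.cache:                     # step 1: mapping
               self.cache[key] = self.oracle.lookup(vn_id, tenant_dest)
           egress_ip = self.cache[key]
           overlay_header = vn_id.to_bytes(3, "big")     # step 2: encapsulation (VN context)
           return egress_ip, overlay_header + payload    # step 3: underlay forwards on outer address

   # A different oracle back end (e.g., one fed by a routing protocol) could be
   # substituted without changing the NVE, which is the point of the separation.
   nve = NVE(DirectoryOracle({(100, "10.1.1.5"): "192.0.2.20"}))
   outer_destination, packet = nve.forward(100, "10.1.1.5", b"tenant frame")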
Having separate protocols 612 could also allow for a simplified NVE that only interacts with the 613 oracle for the mapping table entries it needs and allows the oracle 614 (and its associated protocols) to evolve independently over time with 615 minimal impact to the NVEs. 617 A third work area considers the attachment and detachment of VMs (or 618 Tenant Systems [I-D.lasserre-nvo3-framework] more generally) from a 619 specific virtual network instance. When a VM attaches, the NVE 620 associates the VM with a specific overlay for the purposes of 621 tunneling traffic sourced from or destined to the VM. When a VM 622 disconnects, the NVE should notify the oracle that the Tenant System 623 to NVE address mapping is no longer valid. In addition, if this VM 624 was the last remaining member of the virtual network, then the NVE 625 can also terminate any tunnels used to deliver tenant multi- 626 destination packets within the VN to the NVE. In the case where an 627 NVE and hypervisor are on separate physical devices separated by an 628 access network, a standardized protocol may be needed. 630 In summary, there are three areas of potential work. The first area 631 concerns the implementation of the oracle function itself and any 632 protocols it needs (e.g., if implemented in a distributed fashion). 633 A second area concerns the interaction between the oracle and NVEs. 634 The third work area concerns protocols associated with attaching and 635 detaching a VM from a particular virtual network instance. All three 636 work areas are important to the development of scalable, 637 interoperable solutions. 639 4.6. Data Plane Work Areas 641 The data plane carries encapsulated packets for Tenant Systems. The 642 data plane encapsulation header carries a VN Context identifier 643 [I-D.lasserre-nvo3-framework] for the virtual network to which the 644 data packet belongs. Numerous encapsulation or tunneling protocols 645 already exist that can be leveraged. In the absence of strong and 646 compelling justification, it would not seem necessary or helpful to 647 develop yet another encapsulation format just for NVO3. 649 5. Related IETF and IEEE Work 651 The following subsections discuss related IETF and IEEE work. The 652 items are not meant to provide complete coverage of all IETF and IEEE 653 data center related work, nor should the descriptions be considered 654 comprehensive. Each area aims to address particular limitations of 655 today's data center networks. In all areas, scaling is a common 656 theme as are multi-tenancy and VM mobility. Comparing and evaluating 657 the work result and progress of each work area listed is out of scope 658 of this document. The intent of this section is to provide a 659 reference to the interested readers. Note that NVO3 is scoped to 660 running over an IP/L3 underlay network. 662 5.1. BGP/MPLS IP VPNs 664 BGP/MPLS IP VPNs [RFC4364] support multi-tenancy, VPN traffic 665 isolation, address overlapping and address separation between tenants 666 and network infrastructure. The BGP/MPLS control plane is used to 667 distribute the VPN labels and the tenant IP addresses that identify 668 the tenants (or to be more specific, the particular VPN/virtual 669 network) and tenant IP addresses. Deployment of enterprise L3 VPNs 670 has been shown to scale to thousands of VPNs and millions of VPN 671 prefixes. BGP/MPLS IP VPNs are currently deployed in some large 672 enterprise data centers. 
A potential limitation for deploying BGP/MPLS IP VPNs in data center environments is the practicality of using BGP in the data center, especially reaching into the servers or hypervisors. There may be workforce skill-set issues, equipment support issues, and potential new scaling challenges. A combination of BGP and lighter-weight IP signaling protocols, e.g., XMPP, has been proposed to extend the solution into the data center environment [I-D.marques-l3vpn-end-system], while taking advantage of built-in VPN features and their rich policy support; this is especially useful for inter-tenant connectivity.

5.2. BGP/MPLS Ethernet VPNs

Ethernet Virtual Private Networks (E-VPNs) [I-D.ietf-l2vpn-evpn] provide an emulated L2 service in which each tenant has its own Ethernet network over a common IP or MPLS infrastructure. A BGP/MPLS control plane is used to distribute the tenant MAC addresses and the MPLS labels that identify the tenants and tenant MAC addresses. Within the BGP/MPLS control plane, a 32-bit Ethernet Tag is used to identify the broadcast domains (VLANs) associated with a given L2 VLAN service instance, and these Ethernet Tags are mapped to VLAN IDs understood by the tenant at the service edges. This means that the 4096-VLAN limit applies only to an individual tenant service edge, enabling a much higher level of scalability. Interconnection between tenants is also allowed in a controlled fashion.

VM Mobility [I-D.raggarwa-data-center-mobility] introduces the concept of a combined L2/L3 VPN service in order to support the mobility of individual Virtual Machines (VMs) between Data Centers connected over a common IP or MPLS infrastructure.

5.3. 802.1 VLANs

VLANs are a well-understood construct in the networking industry, providing an L2 service via an in-band L2 Virtual Network. A VLAN is an L2 bridging construct that provides the semantics of virtual networks mentioned above: a MAC address can be kept unique within a VLAN, but it is not necessarily unique across VLANs. Traffic scoped within a VLAN (including broadcast and multicast traffic) can be kept within the VLAN it originates from. Traffic forwarded from one VLAN to another typically involves router (L3) processing. The forwarding table lookup operation may be keyed on {VLAN, MAC address} tuples.

VLANs are a pure L2 bridging construct, and VLAN identifiers are carried along with data frames to allow each forwarding point to know what VLAN the frame belongs to. Various types of VLANs are available today and can be used for network virtualization, even in combination. The C-VLAN, S-VLAN, and B-VLAN IDs are 12 bits each. The 24-bit I-SID allows the support of more than 16 million virtual networks.

5.4. IEEE 802.1aq - Shortest Path Bridging

Shortest Path Bridging (SPB) [SPB] is an IS-IS-based overlay that operates over L2 Ethernets. SPB supports multi-pathing and addresses a number of shortcomings in the original Ethernet Spanning Tree Protocol. Shortest Path Bridging MAC (SPBM) uses IEEE 802.1ah PBB (MAC-in-MAC) encapsulation and supports a 24-bit I-SID, which can be used to identify virtual network instances. SPBM provides multi-pathing and supports easy virtual network creation or update.
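The identifier-space figures quoted in Sections 5.3 and 5.4 follow directly from the field widths; the back-of-the-envelope check below is illustrative only.

   C_VID_BITS = 12   # C-VLAN/S-VLAN/B-VLAN IDs (Section 5.3)
   I_SID_BITS = 24   # PBB/SPBM I-SID (Sections 5.3 and 5.4)

   print(2 ** C_VID_BITS)   # 4096 possible VLAN IDs
   print(2 ** I_SID_BITS)   # 16777216 I-SIDs, i.e., "more than 16 million"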
SPBM extends IS-IS in order to perform link-state routing among core SPBM nodes, obviating the need for address learning among those nodes. Learning is still used to build and maintain the mapping tables of edge nodes to encapsulate Tenant System traffic for transport across the SPBM core.

SPB is compatible with all other 802.1 standards and thus allows leveraging of other features, e.g., VSI Discovery Protocol (VDP), OAM, or scalability solutions.

5.5. ARMD

The ARMD WG examined data center scaling issues with a focus on address resolution and developed a problem statement document [RFC6820]. While an overlay-based approach may address some of the "pain points" that were raised in ARMD (e.g., better support for multi-tenancy), an overlay approach may also push some of the L2 scaling concerns (e.g., excessive flooding) to the IP level (flooding via IP multicast). Analysis will be needed to understand the scaling tradeoffs of an overlay-based approach compared with existing approaches. On the other hand, existing IP-based approaches such as proxy ARP may help mitigate some concerns.

5.6. TRILL

TRILL is a network protocol that provides an Ethernet L2 service to end systems and is designed to operate over any L2 link type. TRILL establishes forwarding paths using IS-IS routing and encapsulates traffic within its own TRILL header. TRILL, as defined today, supports only the standard (and limited) 12-bit C-VID identifier. Approaches to extend TRILL to support more than 4094 VLANs are currently under investigation [I-D.ietf-trill-fine-labeling].

5.7. L2VPNs

The IETF has specified a number of approaches for connecting L2 domains together as part of the L2VPN Working Group. That group, however, has historically been focused on Provider-provisioned L2 VPNs, where the service provider participates in management and provisioning of the VPN. In addition, much of the target environment for such deployments involves carrying L2 traffic over WANs. Overlay approaches as discussed in this document are intended to be used within data centers where the overlay network is managed by the data center operator, rather than by an outside party. While overlays can run across the Internet as well, they will extend well into the data center itself (e.g., up to and including hypervisors) and include large numbers of machines within the data center.

Other L2VPN approaches, such as L2TP [RFC3931], require significant tunnel state at the encapsulating and decapsulating end points. Overlays require less tunnel state than other approaches, which is important to allow overlays to scale to hundreds of thousands of end points. It is assumed that smaller switches (i.e., virtual switches in hypervisors or the adjacent devices to which VMs connect) will be part of the overlay network and be responsible for encapsulating and decapsulating packets.

5.8. Proxy Mobile IP

Proxy Mobile IP [RFC5213] [RFC5844] makes use of the GRE Key Field [RFC5845] [RFC6245], but not in a way that supports multi-tenancy.

5.9. LISP

LISP [RFC6830] essentially provides an IP-over-IP overlay where the internal addresses are end-station identifiers and the outer IP addresses represent the location of the end station within the core IP network topology.
The LISP overlay header carries a 24-bit Instance ID that is used to support overlapping inner IP addresses.

5.10. VDP

VDP is the Virtual Station Interface (VSI) Discovery and Configuration Protocol specified by IEEE P802.1Qbg [Qbg]. VDP is a protocol that supports the association of a VSI with a port. VDP is run between the end system (e.g., a hypervisor) and its adjacent switch, i.e., the device on the edge of the network. VDP is used, for example, to communicate to the switch that a Virtual Machine (Virtual Station) is moving; that is, it is designed to support VM migration.

6. Further Work

It is believed that overlay-based approaches may be able to reduce the overall amount of flooding and other multicast- and broadcast-related traffic (e.g., ARP and ND) experienced within current data centers that have a large flat L2 network. Further analysis is needed to characterize expected improvements.

There are a number of VPN approaches that provide some, if not all, of the desired semantics of virtual networks. A gap analysis will be needed to assess how well existing approaches satisfy the requirements.

7. Summary

This document has argued that network virtualization using overlays addresses a number of issues being faced as data centers scale in size. In addition, careful study of current data center problems is needed for development of proper requirements and standard solutions.

This document identified three potential control protocol work areas. The first involves a back-end "oracle" and how it learns and distributes the mapping information NVEs use when processing tenant traffic. A second involves the protocol an NVE would use to communicate with the back-end oracle to obtain the mapping information. The third potential work area concerns the interactions that take place when a VM attaches to or detaches from a specific virtual network instance.

8. Acknowledgments

Helpful comments and improvements to this document have come from Lou Berger, John Drake, Janos Farkas, Ilango Ganga, Ariel Hendel, Vinit Jain, Petr Lapukhov, Thomas Morin, Benson Schliesser, Xiaohu Xu, Lucy Yong, and many others on the NVO3 mailing list.

9. IANA Considerations

This memo includes no request to IANA.

10. Security Considerations

Because this document describes the problem space associated with the need for virtualization of networks in complex, large-scale data center networks, it does not itself introduce any security risks. However, it is clear that security concerns need to be a consideration of any solutions proposed to address this problem space.

Solutions will need to address both data plane and control plane security concerns. In the data plane, isolation between NVO3 domains is a primary concern. Assurances against spoofing, snooping, transit modification, and denial of service are examples of other important considerations. Some limited environments may even require confidentiality within domains.

In the control plane, the primary security concern is ensuring that unauthorized control information is not installed for use in the data plane. The prevention of the installation of improper control information and other forms of denial of service are also concerns. Here too, some environments may also be concerned about confidentiality of the control plane.

11.
Informative References 879 [I-D.ietf-l2vpn-evpn] 880 Sajassi, A., Aggarwal, R., Henderickx, W., Balus, F., 881 Isaac, A., and J. Uttaro, "BGP MPLS Based Ethernet VPN", 882 draft-ietf-l2vpn-evpn-02 (work in progress), October 2012. 884 [I-D.ietf-trill-fine-labeling] 885 Eastlake, D., Zhang, M., Agarwal, P., Perlman, R., and D. 886 Dutt, "TRILL: Fine-Grained Labeling", 887 draft-ietf-trill-fine-labeling-04 (work in progress), 888 December 2012. 890 [I-D.lasserre-nvo3-framework] 891 Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. 892 Rekhter, "Framework for DC Network Virtualization", 893 draft-lasserre-nvo3-framework-03 (work in progress), 894 July 2012. 896 [I-D.marques-l3vpn-end-system] 897 Marques, P., Fang, L., Pan, P., Shukla, A., Napierala, M., 898 and N. Bitar, "BGP-signaled end-system IP/VPNs.", 899 draft-marques-l3vpn-end-system-07 (work in progress), 900 August 2012. 902 [I-D.raggarwa-data-center-mobility] 903 Aggarwal, R., Rekhter, Y., Henderickx, W., Shekhar, R., 904 Fang, L., and A. Sajassi, "Data Center Mobility based on 905 E-VPN, BGP/MPLS IP VPN, IP Routing and NHRP", 906 draft-raggarwa-data-center-mobility-04 (work in progress), 907 December 2012. 909 [Qbg] "IEEE P802.1Qbg Edge Virtual Bridging", February 2012. 911 [RFC3931] Lau, J., Townsley, M., and I. Goyret, "Layer Two Tunneling 912 Protocol - Version 3 (L2TPv3)", RFC 3931, March 2005. 914 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 915 Networks (VPNs)", RFC 4364, February 2006. 917 [RFC5213] Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K., 918 and B. Patil, "Proxy Mobile IPv6", RFC 5213, August 2008. 920 [RFC5844] Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy 921 Mobile IPv6", RFC 5844, May 2010. 923 [RFC5845] Muhanna, A., Khalil, M., Gundavelli, S., and K. Leung, 924 "Generic Routing Encapsulation (GRE) Key Option for Proxy 925 Mobile IPv6", RFC 5845, June 2010. 927 [RFC6245] Yegani, P., Leung, K., Lior, A., Chowdhury, K., and J. 928 Navali, "Generic Routing Encapsulation (GRE) Key Extension 929 for Mobile IPv4", RFC 6245, May 2011. 931 [RFC6325] Perlman, R., Eastlake, D., Dutt, D., Gai, S., and A. 932 Ghanwani, "Routing Bridges (RBridges): Base Protocol 933 Specification", RFC 6325, July 2011. 935 [RFC6820] Narten, T., Karir, M., and I. Foo, "Address Resolution 936 Problems in Large Data Center Networks", RFC 6820, 937 January 2013. 939 [RFC6830] Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "The 940 Locator/ID Separation Protocol (LISP)", RFC 6830, 941 January 2013. 943 [SPB] "IEEE P802.1aq/D4.5 Draft Standard for Local and 944 Metropolitan Area Networks -- Media Access Control (MAC) 945 Bridges and Virtual Bridged Local Area Networks, 946 Amendment 8: Shortest Path Bridging", February 2012. 948 Appendix A. Change Log 950 A.1. Changes From -01 to -02 952 1. Security Considerations changes (Lou Berger) 954 2. Changes to section on Optimal Forwarding (Xuxiaohu) 956 3. More wording improvements in L2 details (Janos Farkas) 958 4. Referennces to ARMD and LISP documets are now RFCs. 960 A.2. Changes From -00 to -01 962 1. Numerous editorial and clarity improvements. 964 2. Picked up updated terminology from the framework document (e.g., 965 Tenant System). 967 3. Significant changes regarding IEEE 802.1 Ethernets and VLANs. 968 All text moved to the Related Work section, where the technology 969 is summarized. 971 4. Removed section on Forwarding Table Size limitations. 
This issue 972 only occurs in some deployments with L2 bridging, and is not 973 considered a motivating factor for the NVO3 work. 975 5. Added paragraph in Introduction that makes clear that NVO3 is 976 focused on providing both L2 and L3 service to end systems, and 977 that IP is assumed as the underlay transport in the data center. 979 6. Added new section (2.6) on Optimal Forwarding. 981 7. Added a section on Data Plane issues. 983 8. Significant improvement to Section describing SPBM. 985 9. Added sub-section on VDP in "Related Work" 987 A.3. Changes from draft-narten-nvo3-overlay-problem-statement-04.txt 989 1. This document has only one substantive change relative to 990 draft-narten-nvo3-overlay-problem-statement-04.txt. Two 991 sentences were removed per the discussion that led to WG adoption 992 of this document. 994 Authors' Addresses 996 Thomas Narten (editor) 997 IBM 999 Email: narten@us.ibm.com 1001 Eric Gray (editor) 1002 Ericsson 1004 Email: eric.gray@ericsson.com 1006 David Black 1007 EMC 1009 Email: david.black@emc.com 1010 Dinesh Dutt 1011 Cumulus Networks 1013 Email: ddutt.ietf@hobbesdutt.com 1015 Luyuan Fang 1016 Cisco Systems 1017 111 Wood Avenue South 1018 Iselin, NJ 08830 1019 USA 1021 Email: lufang@cisco.com 1023 Lawrence Kreeger 1024 Cisco 1026 Email: kreeger@cisco.com 1028 Maria Napierala 1029 AT&T 1030 200 Laurel Avenue 1031 Middletown, NJ 07748 1032 USA 1034 Email: mnapierala@att.com 1036 Murari Sridharan 1037 Microsoft 1039 Email: muraris@microsoft.com