2 Internet Engineering Task Force T. Narten, Ed. 3 Internet-Draft IBM 4 Intended status: Informational D. Black 5 Expires: February 11, 2013 EMC 6 D. Dutt 8 L. Fang 9 Cisco Systems 10 E. Gray 11 Ericsson 12 L. Kreeger 13 Cisco 14 M. Napierala 15 AT&T 16 M. Sridharan 17 Microsoft 18 August 10, 2012
20 Problem Statement: Overlays for Network Virtualization 21 draft-narten-nvo3-overlay-problem-statement-04
23 Abstract
25 This document describes issues associated with providing multi- 26 tenancy in large data center networks that require an overlay-based 27 network virtualization approach to addressing them. A key multi- 28 tenancy requirement is traffic isolation, so that a tenant's traffic 29 is not visible to any other tenant. This isolation can be achieved 30 by assigning one or more virtual networks to each tenant such that 31 traffic within a virtual network is isolated from traffic in other 32 virtual networks. The primary functionality required is provisioning 33 virtual networks, associating a virtual machine's virtual network 34 interface(s) with the appropriate virtual network, and maintaining 35 that association as the virtual machine is activated, migrated and/or 36 deactivated. Use of an overlay-based approach enables scalable 37 deployment on large network infrastructures.
39 Status of this Memo
41 This Internet-Draft is submitted in full conformance with the 42 provisions of BCP 78 and BCP 79.
44 Internet-Drafts are working documents of the Internet Engineering 45 Task Force (IETF). Note that other groups may also distribute 46 working documents as Internet-Drafts. The list of current Internet- 47 Drafts is at http://datatracker.ietf.org/drafts/current/.
49 Internet-Drafts are draft documents valid for a maximum of six months 50 and may be updated, replaced, or obsoleted by other documents at any 51 time.
It is inappropriate to use Internet-Drafts as reference 52 material or to cite them other than as "work in progress." 54 This Internet-Draft will expire on February 11, 2013. 56 Copyright Notice 58 Copyright (c) 2012 IETF Trust and the persons identified as the 59 document authors. All rights reserved. 61 This document is subject to BCP 78 and the IETF Trust's Legal 62 Provisions Relating to IETF Documents 63 (http://trustee.ietf.org/license-info) in effect on the date of 64 publication of this document. Please review these documents 65 carefully, as they describe your rights and restrictions with respect 66 to this document. Code Components extracted from this document must 67 include Simplified BSD License text as described in Section 4.e of 68 the Trust Legal Provisions and are provided without warranty as 69 described in the Simplified BSD License. 71 Table of Contents 73 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 74 2. Problem Areas . . . . . . . . . . . . . . . . . . . . . . . . 5 75 2.1. Need For Dynamic Provisioning . . . . . . . . . . . . . . 5 76 2.2. Virtual Machine Mobility Limitations . . . . . . . . . . . 6 77 2.3. Inadequate Forwarding Table Sizes in Switches . . . . . . 6 78 2.4. Need to Decouple Logical and Physical Configuration . . . 7 79 2.5. Need For Address Separation Between Tenants . . . . . . . 7 80 2.6. Need For Address Separation Between Tenant and 81 Infrastructure . . . . . . . . . . . . . . . . . . . . . . 7 82 2.7. IEEE 802.1 VLAN Limitations . . . . . . . . . . . . . . . 8 83 3. Network Overlays . . . . . . . . . . . . . . . . . . . . . . . 8 84 3.1. Benefits of Network Overlays . . . . . . . . . . . . . . . 9 85 3.2. Communication Between Virtual and Traditional Networks . . 10 86 3.3. Communication Between Virtual Networks . . . . . . . . . . 11 87 3.4. Overlay Design Characteristics . . . . . . . . . . . . . . 11 88 3.5. Overlay Networking Work Areas . . . . . . . . . . . . . . 12 89 4. Related IETF and IEEE Work . . . . . . . . . . . . . . . . . 14 90 4.1. L3 BGP/MPLS IP VPNs . . . . . . . . . . . . . . . . . . . 14 91 4.2. L2 BGP/MPLS IP VPNs . . . . . . . . . . . . . . . . . . . 15 92 4.3. IEEE 802.1aq - Shortest Path Bridging . . . . . . . . . . 15 93 4.4. ARMD . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 94 4.5. TRILL . . . . . . . . . . . . . . . . . . . . . . . . . . 15 95 4.6. L2VPNs . . . . . . . . . . . . . . . . . . . . . . . . . . 16 96 4.7. Proxy Mobile IP . . . . . . . . . . . . . . . . . . . . . 16 97 4.8. LISP . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 98 5. Further Work . . . . . . . . . . . . . . . . . . . . . . . . . 16 99 6. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 100 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 17 101 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 17 102 9. Security Considerations . . . . . . . . . . . . . . . . . . . 17 103 10. Informative References . . . . . . . . . . . . . . . . . . . . 17 104 Appendix A. Change Log . . . . . . . . . . . . . . . . . . . . . 19 105 A.1. Changes from -01 . . . . . . . . . . . . . . . . . . . . . 19 106 A.2. Changes from -02 . . . . . . . . . . . . . . . . . . . . . 19 107 A.3. Changes from -03 . . . . . . . . . . . . . . . . . . . . . 20 108 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 20 110 1. 
Introduction
112 Data Centers are increasingly being consolidated and outsourced in an 113 effort both to improve the deployment time of applications and to 114 reduce operational costs. This coincides with an increasing 115 demand for compute, storage, and network resources from applications. 116 In order to scale compute, storage, and network resources, physical 117 resources are being abstracted from their logical representation, in 118 what is referred to as server, storage, and network virtualization. 119 Virtualization can be implemented in various layers of computer 120 systems or networks.
122 The demand for server virtualization is increasing in data centers. 123 With server virtualization, each physical server supports multiple 124 virtual machines (VMs), each running its own operating system, 125 middleware, and applications. Virtualization is a key enabler of 126 workload agility, i.e., allowing any server to host any application 127 and providing the flexibility of adding, shrinking, or moving 128 services within the physical infrastructure. Server virtualization 129 provides numerous benefits, including higher utilization, increased 130 security, reduced user downtime, reduced power usage, etc.
132 Multi-tenant data centers are taking advantage of the benefits of 133 server virtualization to provide a new kind of hosting, a virtual 134 hosted data center. Multi-tenant data centers are ones where 135 individual tenants could belong to a different company (in the case 136 of a public provider) or a different department (in the case of an 137 internal company data center). Each tenant has the expectation of a 138 level of security and privacy separating their resources from those 139 of other tenants. For example, one tenant's traffic must never be 140 exposed to another tenant, except through carefully controlled 141 interfaces, such as a security gateway.
143 To a tenant, virtual data centers are similar to their physical 144 counterparts, consisting of end stations attached to a network, 145 complete with services such as load balancers and firewalls. But 146 unlike a physical data center, end stations connect to a virtual 147 network. To end stations, a virtual network looks like a normal 148 network (e.g., providing an Ethernet or L3 service), except that the 149 only end stations connected to the virtual network are those 150 belonging to a tenant's specific virtual network.
152 A tenant is the administrative entity that is responsible for and 153 manages a specific virtual network instance and its associated 154 services (whether virtual or physical). In a cloud environment, a 155 tenant would correspond to the customer that has defined and is using 156 a particular virtual network. However, a tenant may also find it 157 useful to create multiple different virtual network instances.
159 Hence, there is a one-to-many mapping between tenants and virtual 160 network instances. A single tenant may operate multiple individual 161 virtual network instances, each associated with a different service.
163 How a virtual network is implemented does not generally matter to the 164 tenant; what matters is that the service provided (L2 or L3) has the 165 right semantics, performance, etc. It could be implemented via a 166 pure routed network, a pure bridged network, or a combination of 167 bridged and routed networks. A key requirement is that each 168 individual virtual network instance be isolated from other virtual 169 network instances.
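To make the tenant / virtual-network-instance relationship concrete, the following Python sketch models the one-to-many mapping and the basic isolation property described above. It is purely illustrative; the class and attribute names are hypothetical and do not correspond to any proposed data model or solution.

   # Illustrative sketch only: one tenant may operate many virtual
   # network instances, and traffic is confined to a single instance.
   class VirtualNetworkInstance:
       def __init__(self, vni_id, service_type):
           self.vni_id = vni_id              # e.g., a virtual network identifier
           self.service_type = service_type  # "L2" or "L3" service semantics
           self.end_stations = set()         # end stations attached to this instance

   class Tenant:
       def __init__(self, name):
           self.name = name
           self.instances = {}               # one-to-many: tenant -> VN instances

       def add_instance(self, vni_id, service_type):
           self.instances[vni_id] = VirtualNetworkInstance(vni_id, service_type)

   def may_deliver(src_instance, dst_instance):
       # Traffic stays within one instance unless it crosses a controlled
       # interconnection point (e.g., a gateway), which is not modeled here.
       return src_instance.vni_id == dst_instance.vni_id

   # Example: a single tenant operating separate instances for two services.
   acme = Tenant("example-tenant")
   acme.add_instance(vni_id=10001, service_type="L2")
   acme.add_instance(vni_id=10002, service_type="L3")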
171 For data center virtualization, two key issues must be addressed. 172 First, address space separation between tenants must be supported. 173 Second, it must be possible to place (and migrate) VMs anywhere in 174 the data center, without restricting VM addressing to match the 175 subnet boundaries of the underlying data center network.
177 This document outlines the problems encountered in scaling the number 178 of isolated networks in a data center, as well as the problems of 179 managing the creation/deletion, membership, and span of these networks. 180 It makes the case that an overlay-based approach, where individual 181 networks are implemented as virtual networks that are 182 dynamically controlled by a standardized control plane, provides a 183 number of advantages over current approaches. The purpose of this 184 document is to identify the set of problems that any solution has to 185 address in building multi-tenant data centers. With this approach, 186 the goal is to enable standardized, interoperable 187 implementations that allow the construction of multi-tenant data 188 centers.
190 Section 2 describes the problem space in detail. Section 3 describes 191 overlay networks in more detail. Sections 4 and 5 review related and 192 further work, while Section 6 closes with a summary.
194 2. Problem Areas
196 The following subsections describe aspects of multi-tenant data 197 center networking that pose problems for network infrastructure. 198 Different problem aspects may arise based on the network architecture 199 and scale.
201 2.1. Need For Dynamic Provisioning
203 Cloud computing involves on-demand provisioning of resources for 204 multi-tenant environments. A common example of cloud computing is 205 the public cloud, where a cloud service provider offers elastic 206 services to multiple customers over the same infrastructure. In 207 current systems, it can be difficult to provision resources for 208 individual tenants in such a way that provisioned properties migrate 209 automatically when services are dynamically moved around within the 210 data center to optimize workloads.
212 2.2. Virtual Machine Mobility Limitations
214 A key benefit of server virtualization is virtual machine (VM) 215 mobility. A VM can be migrated from one server to another, live, 216 i.e., while continuing to run and without needing to shut it down and 217 restart it at the new location. A key requirement for live migration 218 is that a VM retain critical network state at its new location, 219 including its IP and MAC address(es). Preservation of MAC addresses 220 may be necessary, for example, when software licenses are bound to 221 MAC addresses. More generally, any change in the VM's MAC addresses 222 resulting from a move would be visible to the VM and thus potentially 223 result in unexpected disruptions. Retaining IP addresses after a 224 move is necessary to prevent existing transport connections (e.g., 225 TCP) from breaking and needing to be restarted.
227 In traditional data centers, servers are assigned IP addresses based 228 on their physical location, for example based on the Top of Rack 229 (ToR) switch for the server rack or the VLAN configured for the 230 server. Servers can only move to other locations within the same IP 231 subnet. This constraint is not problematic for physical servers, 232 which move infrequently, but it restricts the placement and movement 233 of VMs within the data center.
Any solution for a scalable multi- 233 tenant data center must allow a VM to be placed (or moved) anywhere 234 within the data center, without being constrained by the subnet 235 boundary concerns of the host servers.
238 2.3. Inadequate Forwarding Table Sizes in Switches
240 Today's virtualized environments place additional demands on the 241 forwarding tables of switches in the physical infrastructure. 242 Instead of just one link-layer address per server, the switching 243 infrastructure has to learn addresses of the individual VMs (which 244 could range in the 100s per server). This is a requirement since 245 traffic between the VMs and the rest of the physical network will 246 traverse the physical network infrastructure. This places a much 247 larger demand on the switches' forwarding table capacity compared to 248 non-virtualized environments, causing more traffic to be flooded or 249 dropped when the number of addresses in use exceeds a switch's 250 forwarding table capacity.
252 2.4. Need to Decouple Logical and Physical Configuration
254 Data center operators must be able to achieve high utilization of 255 server and network capacity. For efficient and flexible allocation, 256 operators should be able to spread a virtual network instance across 257 servers in any rack in the data center. It should also be possible 258 to migrate compute workloads to any server anywhere in the network 259 while retaining the workload's addresses. In networks using VLANs, 260 moving servers elsewhere in the network may require expanding the 261 scope of the VLAN beyond its original boundaries. While this can be 262 done, it requires potentially complex network configuration changes 263 and can conflict with the desire to bound the size of broadcast 264 domains, especially in larger data centers.
266 However, in order to limit the broadcast domain of each VLAN, multi- 267 destination frames within a VLAN should ideally flow only to those 268 devices that have that VLAN configured. When workloads migrate, the 269 physical network (e.g., access lists) may need to be reconfigured, 270 which is typically time-consuming and error-prone.
272 An important use case is cross-pod expansion. A pod typically 273 consists of one or more racks of servers with its associated network 274 and storage connectivity. A tenant's virtual network may start off 275 on a pod and, due to expansion, require servers/VMs on other pods; 276 this is especially the case when other pods are not fully utilizing all their 277 resources. This use case requires that virtual networks span 278 multiple pods in order to provide connectivity to all of the tenant's 279 servers/VMs. Such expansion can be difficult to achieve when tenant 280 addressing is tied to the addressing used by the underlay network or 281 when it requires that the scope of the underlying L2 VLAN expand 282 beyond its original pod boundary.
284 2.5. Need For Address Separation Between Tenants
286 Individual tenants need control over the addresses they use within a 287 virtual network. But it can be problematic when different tenants 288 want to use the same addresses, or even if the same tenant wants to 289 reuse the same addresses in different virtual networks. 290 Consequently, virtual networks must allow tenants to use whatever 291 addresses they want without concern for what addresses are being used 292 by other tenants or other virtual networks.
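As an illustration of this separation requirement, address lookups can be thought of as being qualified by the virtual network instance, so that identical tenant addresses in different instances never collide. The Python sketch below is illustrative only and is not a proposed forwarding structure.

   # Illustrative only: lookups keyed by (virtual network instance, tenant
   # address), so overlapping tenant addresses can coexist.
   forwarding = {}   # (vni_id, tenant_address) -> underlay location

   def learn(vni_id, tenant_address, underlay_location):
       forwarding[(vni_id, tenant_address)] = underlay_location

   def lookup(vni_id, tenant_address):
       return forwarding.get((vni_id, tenant_address))

   # Two virtual networks reuse 10.1.1.5 without any conflict.
   learn(5001, "10.1.1.5", "192.0.2.10")   # tenant A's instance
   learn(5002, "10.1.1.5", "192.0.2.77")   # tenant B's instance
   assert lookup(5001, "10.1.1.5") != lookup(5002, "10.1.1.5")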
294 2.6. Need For Address Separation Between Tenant and Infrastructure
296 As in the previous case, a tenant needs to be able to use whatever 297 addresses it wants in a virtual network independent of what addresses 298 the underlying data center network is using. Tenants (and the 299 underlay infrastructure provider) should be able to use whatever 300 addresses make sense for them, without having to worry about address 301 collisions between addresses used by tenants and those used by the 302 underlay data center network.
304 2.7. IEEE 802.1 VLAN Limitations
306 VLANs are a well-known construct in the networking industry, 307 providing an L2 service via an L2 underlay. A VLAN is an L2 bridging 308 construct that provides some of the semantics of virtual networks 309 mentioned above: a MAC address is unique within a VLAN, but not 310 necessarily across VLANs. Traffic sourced within a VLAN (including 311 broadcast and multicast traffic) remains within the VLAN it 312 originates from. Traffic forwarded from one VLAN to another 313 typically involves router (L3) processing. The forwarding table 314 lookup operation is keyed on {VLAN, MAC address} tuples.
316 But there are problems and limitations with L2 VLANs. VLANs are a 317 pure L2 bridging construct, and VLAN identifiers are carried along 318 with data frames to allow each forwarding point to know what VLAN the 319 frame belongs to. A VLAN today is defined as a 12-bit number, 320 limiting the total number of VLANs to 4096 (though typically, this 321 number is 4094 since 0 and 4095 are reserved). Due to the large 322 number of tenants that a cloud provider might service, the 4094 VLAN 323 limit is often inadequate. In addition, there is often a need for 324 multiple VLANs per tenant, which exacerbates the issue. The use of a 325 sufficiently large VNID, present in the overlay control plane and 326 possibly also in the data plane, would eliminate current VLAN size 327 limitations associated with single 12-bit VLAN tags.
329 3. Network Overlays
331 Virtual Networks are used to isolate a tenant's traffic from that of 332 other tenants (or even traffic within the same tenant that requires 333 isolation). There are two main characteristics of virtual networks:
335 1. Virtual networks isolate the address space used in one virtual 336 network from the address space used by another virtual network. 337 The same network addresses may be used in different virtual 338 networks at the same time. In addition, the address space used 339 by a virtual network is independent from that used by the 340 underlying physical network.
342 2. Virtual Networks limit the scope of packets sent on the virtual 343 network. Packets sent by end systems attached to a virtual 344 network are delivered as expected to other end systems on that 345 virtual network and may exit a virtual network only through 346 controlled exit points such as a security gateway. Likewise, 347 packets sourced from outside of the virtual network may enter the 348 virtual network only through controlled entry points, such as a 349 security gateway.
351 3.1. Benefits of Network Overlays
353 To address the problems described in Section 2, a network overlay 354 model can be used.
356 The idea behind an overlay is quite straightforward. Each virtual 357 network instance is implemented as an overlay. The original packet 358 is encapsulated by the first-hop network device.
The encapsulation 359 identifies the destination of the device that will perform the 360 decapsulation before delivering the original packet to the endpoint. 361 The rest of the network forwards the packet based on the 362 encapsulation header and can be oblivious to the payload that is 363 carried inside. 365 Overlays are based on what is commonly known as a "map-and-encap" 366 architecture. There are three distinct and logically separable 367 steps: 369 1. The first-hop overlay device implements a mapping operation that 370 determines where the encapsulated packet should be sent to reach 371 its intended destination VM. Specifically, the mapping function 372 maps the destination address (either L2 or L3) of a packet 373 received from a VM into the corresponding destination address of 374 the egress device. The destination address will be the underlay 375 address of the device doing the decapsulation and is an IP 376 address. 378 2. Once the mapping has been determined, the ingress overlay device 379 encapsulates the received packet within an overlay header. 381 3. The final step is to actually forward the (now encapsulated) 382 packet to its destination. The packet is forwarded by the 383 underlay (i.e., the IP network) based entirely on its outer 384 address. Upon receipt at the destination, the egress overlay 385 device decapsulates the original packet and delivers it to the 386 intended recipient VM. 388 Each of the above steps is logically distinct, though an 389 implementation might combine them for efficiency or other reasons. 390 It should be noted that in L3 BGP/VPN terminology, the above steps 391 are commonly known as "forwarding" or "virtual forwarding". 393 The first hop network device can be a traditional switch or router or 394 the virtual switch residing inside a hypervisor. Furthermore, the 395 endpoint can be a VM or it can be a physical server. Examples of 396 architectures based on network overlays include BGP/MPLS VPNs 397 [RFC4364], TRILL [RFC6325], LISP [I-D.ietf-lisp], and Shortest Path 398 Bridging (SPB-M) [SPBM]. 400 In the data plane, a virtual network identifier (or VNID), or a 401 locally significant identifier, can be carried as part of the overlay 402 header so that every data packet explicitly identifies the specific 403 virtual network the packet belongs to. Since both routed and bridged 404 semantics can be supported by a virtual data center, the original 405 packet carried within the overlay header can be an Ethernet frame 406 complete with MAC addresses or just the IP packet. 408 The use of a sufficiently large VNID would address current VLAN 409 limitations associated with single 12-bit VLAN tags. This VNID can 410 be carried in the control plane. In the data plane, an overlay 411 header provides a place to carry either the VNID, or an identifier 412 that is locally-significant to the edge device. In both cases, the 413 identifier in the overlay header specifies which virtual network the 414 data packet belongs to. 416 A key aspect of overlays is the decoupling of the "virtual" MAC 417 and/or IP addresses used by VMs from the physical network 418 infrastructure and the infrastructure IP addresses used by the data 419 center. If a VM changes location, the overlay edge devices simply 420 update their mapping tables to reflect the new location of the VM 421 within the data center's infrastructure space. 
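A purely illustrative Python sketch of the map-and-encap steps described above follows; the table layout, header fields, and function names are hypothetical and are not intended to suggest any particular encapsulation or control plane.

   # Illustrative map-and-encap sketch.  Field and function names are
   # hypothetical; real NVO3 encapsulations and mapping interfaces differ.
   mapping_table = {}   # (vni_id, inner_destination) -> egress device underlay IP

   def update_mapping(vni_id, inner_dst, egress_underlay_ip):
       # When a VM moves, only this entry changes; the VM keeps its own
       # MAC and IP addresses.
       mapping_table[(vni_id, inner_dst)] = egress_underlay_ip

   def forward_from_vm(vni_id, inner_dst, original_packet):
       # Step 1: map the inner (tenant) destination to the egress device.
       egress_ip = mapping_table[(vni_id, inner_dst)]
       # Step 2: encapsulate the original packet within an overlay header.
       encapsulated = {"outer_dst": egress_ip, "vni": vni_id,
                       "payload": original_packet}
       # Step 3: the underlay forwards solely on the outer header; the
       # egress device decapsulates and delivers the payload to the VM.
       return encapsulated

In this sketch, only the mapping entry changes when a VM moves; the encapsulation format and the underlay forwarding are unaffected.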
Because an overlay 422 network is used, a VM can now be located anywhere in the data center 423 that the overlay reaches without regard to traditional constraints 424 implied by L2 properties such as VLAN numbering, or the span of an L2 425 broadcast domain scoped to a single pod or access switch.
427 Multi-tenancy is supported by isolating the traffic of one virtual 428 network instance from traffic of another. Traffic from one virtual 429 network instance cannot be delivered to another instance without 430 (conceptually) exiting the instance and entering the other instance 431 via an entity that has connectivity to both virtual network 432 instances. Without the existence of this entity, tenant traffic 433 remains isolated within each individual virtual network instance.
435 Overlays are designed to allow a set of VMs to be placed within a 436 single virtual network instance, whether that virtual network 437 provides a bridged network or a routed network.
439 3.2. Communication Between Virtual and Traditional Networks
441 Not all communication will be between devices connected to 442 virtualized networks. Devices using overlays will continue to access 443 devices and make use of services on traditional, non-virtualized 444 networks, whether in the data center, the public Internet, or at 445 remote/branch campuses. Any virtual network solution must be capable 446 of interoperating with existing routers, VPN services, load 447 balancers, intrusion detection services, firewalls, etc. on external 448 networks.
450 Communication between devices attached to a virtual network and 451 devices connected to non-virtualized networks is handled 452 architecturally by having specialized gateway devices that receive 453 packets from a virtualized network, decapsulate them, process them as 454 regular (i.e., non-virtualized) traffic, and finally forward them on 455 to their appropriate destination (and vice versa). Additional 456 identification, such as VLAN tags, could be used on the non- 457 virtualized side of such a gateway to enable forwarding of traffic 458 for multiple virtual networks over a common non-virtualized link.
460 A wide range of implementation approaches are possible. Overlay 461 gateway functionality could be combined with other network 462 functionality into a network device that implements the overlay 463 functionality, and then forwards traffic between other internal 464 components that implement functionality such as full router service, 465 load balancing, firewall support, VPN gateway, etc.
467 3.3. Communication Between Virtual Networks
469 Communication between devices on different virtual networks is 470 handled architecturally by adding specialized interconnect 471 functionality among the otherwise isolated virtual networks. For a 472 virtual network providing an L2 service, such interconnect 473 functionality could be IP forwarding configured as part of the 474 "default gateway" for each virtual network. For a virtual network 475 providing an L3 service, the interconnect functionality could be IP 476 forwarding configured as part of routing between IP subnets, or it 477 could be based on configured inter-virtual-network traffic policies. In 478 both cases, the implementation of the interconnect functionality 479 could be distributed across the Network Virtualization Edges (NVEs), and could be combined with 480 other network functionality (e.g., load balancing, firewall support) 481 that is applied to traffic that is forwarded between virtual 482 networks.
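The interconnect behavior described in Sections 3.2 and 3.3 can be sketched as a controlled point that decapsulates traffic, applies whatever policy is configured, and then re-encapsulates it into the destination virtual network (or forwards it natively toward a non-virtualized network). The sketch below is illustrative only; the policy model and field names are hypothetical.

   # Illustrative inter-virtual-network gateway.  Traffic leaves one
   # instance and enters another only through this controlled point; the
   # policy table shown is hypothetical.
   allowed_pairs = {(5001, 5002)}   # e.g., instance 5001 may reach 5002

   def interconnect(encapsulated_packet, dst_vni):
       src_vni = encapsulated_packet["vni"]
       payload = encapsulated_packet["payload"]      # decapsulate
       if (src_vni, dst_vni) not in allowed_pairs:
           return None                               # drop: no configured policy
       # Re-encapsulate into the destination virtual network instance (or
       # forward natively to a non-virtualized network, per Section 3.2).
       return {"vni": dst_vni, "payload": payload}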
484 3.4. Overlay Design Characteristics
486 There are existing layer 2 and layer 3 overlay protocols, 487 but they do not necessarily solve all of today's problems 488 in the environment of a highly virtualized data center. Below are 489 some of the characteristics of environments that must be taken into 490 account by the overlay technology:
492 1. Highly distributed systems. The overlay should work in an 493 environment where there could be many thousands of access devices 494 (e.g., residing within the hypervisors) and many more end systems 495 (e.g., VMs) connected to them. This leads to the need for a distributed 496 mapping system that puts a low overhead on the overlay tunnel 497 endpoints.
499 2. Many highly distributed virtual networks with sparse membership. 500 Each virtual network could be highly dispersed inside the data 501 center. Also, along with the expectation of many virtual networks, 502 the number of end systems connected to any one virtual network is 503 expected to be relatively low; therefore, the percentage of 504 access devices participating in any given virtual network would 505 also be expected to be low. For this reason, efficient delivery 506 of multi-destination traffic within a virtual network instance 507 should be taken into consideration.
509 3. Highly dynamic end systems. End systems connected to virtual 510 networks can be very dynamic, both in terms of creation/deletion/ 511 power-on/off and in terms of mobility across the access devices.
513 4. Work with existing, widely deployed Ethernet switches and 514 IP routers without requiring wholesale replacement. The first- 515 hop device (or end system) that adds and removes the overlay 516 header will require new equipment and/or new software.
518 5. Work with existing data center network deployments without 519 requiring major changes in operational or other practices. For 520 example, some data centers have not enabled multicast beyond 521 link-local scope. Overlays should be capable of leveraging 522 underlay multicast support where appropriate, but not require its 523 enablement in order to use an overlay solution.
525 6. Network infrastructure administered by a single administrative 526 domain. This is consistent with operation within a data center, 527 and not across the Internet.
529 3.5. Overlay Networking Work Areas
531 There are three specific and separate potential work areas needed to 532 realize an overlay solution. The areas correspond to different 533 possible "on-the-wire" protocols, where distinct entities interact 534 with each other.
536 One area of work concerns the address dissemination protocol an NVE 537 uses to build and maintain the mapping tables it uses to deliver 538 encapsulated packets to their proper destination. One approach is to 539 build mapping tables entirely via learning (as is done in 802.1 540 networks). But to provide better scaling properties, a more 541 sophisticated approach is needed, i.e., the use of a specialized 542 control plane protocol. While there are some advantages to using or 543 leveraging an existing protocol for maintaining mapping tables, the 544 fact that large numbers of NVEs will likely reside in hypervisors 545 places constraints on the resources (CPU and memory) that can be 546 dedicated to such functions. For example, routing protocols (e.g., 547 IS-IS, BGP) may have scaling difficulties if implemented directly in 548 all NVEs, based on both flooding and convergence time concerns. An 549 alternative approach would be to use a standard query protocol 550 between NVEs and the set of network nodes that maintain address 551 mappings used across the data center for the entire overlay system.
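One possible shape for such a query interaction is sketched below. The message fields, the JSON encoding, and the term "oracle" (used in the remainder of this section) are illustrative assumptions only; defining the actual protocol is precisely the work area being described.

   # Illustrative NVE-to-oracle mapping query.  The message layout and
   # encoding are hypothetical placeholders, not a proposed protocol.
   import json

   def build_map_request(vni_id, tenant_address):
       # The NVE asks where to send packets for a tenant address it does
       # not yet have a mapping for.
       return json.dumps({"type": "map-request",
                          "vni": vni_id,
                          "tenant_address": tenant_address})

   def handle_map_reply(reply_bytes, local_cache):
       reply = json.loads(reply_bytes)
       if reply.get("type") == "map-reply":
           # Cache only the entries this NVE needs, keeping per-NVE state
           # (CPU and memory) small.
           key = (reply["vni"], reply["tenant_address"])
           local_cache[key] = reply["egress_nve"]
       return local_cache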
553 From an architectural perspective, one can view the address mapping 554 dissemination problem as having two distinct and separable 555 components. The first component consists of a back-end "oracle" that 556 is responsible for distributing and maintaining the mapping 557 information for the entire overlay system. The second component 558 consists of the on-the-wire protocols an NVE uses when interacting 559 with the oracle.
561 The back-end oracle could provide high performance, high resiliency, 562 failover, etc., and could be implemented in significantly different 563 ways. For example, one model uses a traditional, centralized 564 "directory-based" database, using replicated instances for 565 reliability and failover. A second model involves using and possibly 566 extending an existing routing protocol (e.g., BGP or IS-IS). To 567 support different architectural models, it is useful to have one 568 standard protocol for the NVE-oracle interaction while allowing 569 different protocols and architectural approaches for the oracle 570 itself. Separating the two allows NVEs to transparently interact 571 with different types of oracles, i.e., either of the two 572 architectural models described above. Having separate protocols 573 could also allow for a simplified NVE that only interacts with the 574 oracle for the mapping table entries it needs and allows the oracle 575 (and its associated protocols) to evolve independently over time with 576 minimal impact on the NVEs.
578 A third work area considers the attachment and detachment of VMs (or 579 Tenant End Systems [I-D.lasserre-nvo3-framework] more generally) from 580 a specific virtual network instance. When a VM attaches, the Network 581 Virtualization Edge (NVE) [I-D.lasserre-nvo3-framework] associates 582 the VM with a specific overlay for the purposes of tunneling traffic 583 sourced from or destined to the VM. When a VM disconnects, it is 584 removed from the overlay and the NVE effectively terminates any 585 tunnels associated with the VM. To achieve this functionality, a 586 standardized interaction between the NVE and hypervisor may be 587 needed, for example, in the case where the NVE resides on a separate 588 device from the VM.
590 In summary, there are three areas of potential work. The first area 591 concerns the oracle itself and any on-the-wire protocols it needs. A 592 second area concerns the interaction between the oracle and NVEs. 593 The third work area concerns protocols associated with attaching and 594 detaching a VM from a particular virtual network instance. All three 595 work areas are important to the development of scalable, 596 interoperable solutions.
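The attach/detach work area can similarly be sketched as a pair of notifications between the hypervisor and the NVE; the event names and fields below are hypothetical placeholders for whatever interaction is eventually standardized.

   # Illustrative VM attach/detach handling at an NVE.  Event names and
   # fields are hypothetical placeholders.
   class NVE:
       def __init__(self):
           self.local_vms = {}      # vm_id -> (vni_id, addresses)

       def on_attach(self, vm_id, vni_id, addresses):
           # Associate the VM with a virtual network instance and begin
           # tunneling traffic sourced from or destined to it.
           self.local_vms[vm_id] = (vni_id, addresses)
           # A real NVE would also inform the oracle so that other NVEs
           # can learn the new mapping.

       def on_detach(self, vm_id):
           # Remove the VM from the overlay and tear down per-VM state.
           self.local_vms.pop(vm_id, None)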
598 4. Related IETF and IEEE Work
600 The following subsections discuss related IETF and IEEE work in 601 progress. The items are not meant to be complete coverage of all IETF 602 and IEEE data center related work, nor are the descriptions 603 comprehensive. Each area is currently trying to address certain 604 limitations of today's data center networks, e.g., scaling is a 605 common issue for every area listed, and multi-tenancy and VM mobility 606 are important focus areas as well. Comparing and evaluating the 607 results and progress of each work area listed is out of scope of this 608 document. The intent of this section is to provide a reference for 609 interested readers.
611 4.1. L3 BGP/MPLS IP VPNs
613 BGP/MPLS IP VPNs [RFC4364] support multi-tenancy, overlapping tenant addresses, 614 VPN traffic isolation, and address separation between tenants and 615 network infrastructure. The BGP/MPLS control plane is used to 616 distribute the VPN labels that identify 617 the tenants (or, to be more specific, the particular VPN/VN) and the 618 tenant IP addresses. Deployment of enterprise L3 VPNs has been shown 619 to scale to thousands of VPNs and millions of VPN prefixes. BGP/MPLS 620 IP VPNs are currently deployed in some large enterprise data centers. 621 The potential limitation for deploying BGP/MPLS IP VPNs in data 622 center environments is the practicality of using BGP in the data 623 center, especially reaching into the servers or hypervisors. There 624 may be work force skill set issues, equipment support 625 issues, and potential new scaling challenges. A combination of BGP 626 and lighter-weight IP signaling protocols, e.g., XMPP, has been 627 proposed to extend these solutions into the DC environment 628 [I-D.marques-end-system], while taking advantage of built-in VPN features and 629 their rich policy support; this is especially useful for inter-tenant 630 connectivity.
632 4.2. L2 BGP/MPLS IP VPNs
634 Ethernet Virtual Private Networks (E-VPNs) [I-D.ietf-l2vpn-evpn] 635 provide an emulated L2 service in which each tenant has its own 636 Ethernet network over a common IP or MPLS infrastructure. A BGP/ 637 MPLS control plane is used to distribute the tenant MAC addresses and 638 the MPLS labels that identify the tenants and tenant MAC addresses. 639 Within the BGP/MPLS control plane, a 32-bit Ethernet Tag is 640 used to identify the broadcast domains (VLANs) associated with a 641 given L2 VLAN service instance, and these Ethernet Tags are mapped to 642 VLAN IDs understood by the tenant at the service edges. This means 643 that the limit of 4096 VLANs is associated with an individual tenant 644 service edge, enabling a much higher level of scalability. 645 Interconnection between tenants is also allowed in a controlled 646 fashion.
648 VM Mobility [I-D.raggarwa-data-center-mobility] introduces the 649 concept of a combined L2/L3 VPN service in order to support the 650 mobility of individual Virtual Machines (VMs) between Data Centers 651 connected over a common IP or MPLS infrastructure.
653 4.3. IEEE 802.1aq - Shortest Path Bridging
655 Shortest Path Bridging (SPB-M) is an IS-IS-based overlay for L2 656 Ethernets. SPB-M supports multi-pathing and addresses a number of 657 shortcomings in the original Ethernet Spanning Tree Protocol. SPB-M 658 uses IEEE 802.1ah MAC-in-MAC encapsulation and supports a 24-bit 659 I-SID, which can be used to identify virtual network instances. 660 SPB-M is entirely L2 based, extending the L2 Ethernet bridging model.
662 4.4. ARMD
664 ARMD is chartered to look at data center scaling issues with a focus 665 on address resolution. It is currently chartered to develop a 666 problem statement and is not developing solutions. While 667 an overlay-based approach may address some of the "pain points" that 668 have been raised in ARMD (e.g., better support for multi-tenancy), an 669 overlay approach may also push some of the L2 scaling concerns (e.g., 670 excessive flooding) to the IP level (flooding via IP multicast).
Analysis will be needed to understand the scaling tradeoffs of an 672 overlay-based approach compared with existing approaches. On the 673 other hand, existing IP-based approaches such as proxy ARP may help 674 mitigate some concerns.
676 4.5. TRILL
678 TRILL is an L2-based approach aimed at addressing deficiencies and 679 limitations of current Ethernet networks, and of STP in particular. 681 Although it differs from Shortest Path Bridging in many architectural 682 and implementation details, it is similar in that it provides an L2- 683 based service to end systems. TRILL, as defined today, supports only 684 the standard (and limited) 12-bit VLAN model. Approaches to extend 685 TRILL to support more than 4094 VLANs are currently under 686 investigation [I-D.ietf-trill-fine-labeling].
688 4.6. L2VPNs
690 The IETF has specified a number of approaches for connecting L2 691 domains together as part of the L2VPN Working Group. That group, 692 however, has historically been focused on Provider-provisioned L2 693 VPNs, where the service provider participates in management and 694 provisioning of the VPN. In addition, much of the target environment 695 for such deployments involves carrying L2 traffic over WANs. Overlay 696 approaches are intended to be used within data centers, where the overlay 697 network is managed by the data center operator, rather than by an 698 outside party. While overlays can run across the Internet as well, 699 they will extend well into the data center itself (e.g., up to and 700 including hypervisors) and include large numbers of machines within 701 the data center.
703 Other L2VPN approaches, such as L2TP [RFC2661], require significant 704 tunnel state at the encapsulating and decapsulating end points. 705 Overlays require less tunnel state than other approaches, which is 706 important to allow overlays to scale to hundreds of thousands of end 707 points. It is assumed that smaller switches (i.e., virtual switches 708 in hypervisors or the adjacent devices to which VMs connect) will be 709 part of the overlay network and be responsible for encapsulating and 710 decapsulating packets.
712 4.7. Proxy Mobile IP
714 Proxy Mobile IP [RFC5213] [RFC5844] makes use of the GRE Key Field 715 [RFC5845] [RFC6245], but not in a way that supports multi-tenancy.
717 4.8. LISP
719 LISP [I-D.ietf-lisp] essentially provides an IP-over-IP overlay where 720 the internal addresses are end station identifiers and the outer IP 721 addresses represent the location of the end station within the core 722 IP network topology. The LISP overlay header includes a 24-bit Instance 723 ID that is used to support overlapping inner IP addresses.
725 5. Further Work
727 It is believed that overlay-based approaches may be able to reduce 728 the overall amount of flooding and other multicast- and broadcast- 729 related traffic (e.g., ARP and ND) currently experienced within 730 data centers with a large, flat L2 network. Further analysis 731 is needed to characterize expected improvements.
733 There are a number of VPN approaches that provide some, if not all, of 734 the desired semantics of virtual networks. A gap analysis will be 735 needed to assess how well existing approaches satisfy the 736 requirements.
738 6. Summary
740 This document has argued that network virtualization using overlays 741 addresses a number of issues being faced as data centers scale in 742 size. In addition, careful study of current data center problems is 743 needed for development of proper requirements and standard solutions.
745 Three potential work areas were identified. The first involves the 746 interactions that take place when a VM attaches to or detaches from an 747 overlay. A second involves the protocol an NVE would use to 748 communicate with a back-end "oracle" to learn and disseminate mapping 749 information about the VMs the NVE communicates with. The third 750 potential work area involves the back-end oracle itself, i.e., how it 751 provides failover and how it interacts with oracles in other domains.
753 7. Acknowledgments
755 Helpful comments and improvements to this document have come from 756 John Drake, Ariel Hendel, Vinit Jain, Thomas Morin, Benson Schliesser, 757 and many others on the mailing list.
759 8. IANA Considerations
761 This memo includes no request to IANA.
763 9. Security Considerations
765 TBD
767 10. Informative References
769 [I-D.fang-vpn4dc-problem-statement] 770 Napierala, M., Fang, L., and D. Cai, "IP-VPN Data Center 771 Problem Statement and Requirements", 772 draft-fang-vpn4dc-problem-statement-01 (work in progress), 773 June 2012.
775 [I-D.ietf-l2vpn-evpn] 776 Sajassi, A., Aggarwal, R., Henderickx, W., Balus, F., 777 Isaac, A., and J. Uttaro, "BGP MPLS Based Ethernet VPN", 778 draft-ietf-l2vpn-evpn-01 (work in progress), July 2012.
780 [I-D.ietf-lisp] 781 Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, 782 "Locator/ID Separation Protocol (LISP)", 783 draft-ietf-lisp-23 (work in progress), May 2012.
785 [I-D.ietf-trill-fine-labeling] 786 Eastlake, D., Zhang, M., Agarwal, P., Perlman, R., and D. 787 Dutt, "TRILL: Fine-Grained Labeling", 788 draft-ietf-trill-fine-labeling-01 (work in progress), 789 June 2012.
791 [I-D.kreeger-nvo3-overlay-cp] 792 Kreeger, L., Dutt, D., Narten, T., Black, D., and M. 793 Sridharan, "Network Virtualization Overlay Control 794 Protocol Requirements", draft-kreeger-nvo3-overlay-cp-01 795 (work in progress), July 2012.
797 [I-D.lasserre-nvo3-framework] 798 Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. 799 Rekhter, "Framework for DC Network Virtualization", 800 draft-lasserre-nvo3-framework-03 (work in progress), 801 July 2012.
803 [I-D.raggarwa-data-center-mobility] 804 Aggarwal, R., Rekhter, Y., Henderickx, W., Shekhar, R., 805 and L. Fang, "Data Center Mobility based on BGP/MPLS, IP 806 Routing and NHRP", draft-raggarwa-data-center-mobility-03 807 (work in progress), June 2012.
809 [RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, 810 G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", 811 RFC 2661, August 1999.
813 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 814 Networks (VPNs)", RFC 4364, February 2006.
816 [RFC5213] Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K., 817 and B. Patil, "Proxy Mobile IPv6", RFC 5213, August 2008.
819 [RFC5844] Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy 820 Mobile IPv6", RFC 5844, May 2010.
822 [RFC5845] Muhanna, A., Khalil, M., Gundavelli, S., and K. Leung, 823 "Generic Routing Encapsulation (GRE) Key Option for Proxy 824 Mobile IPv6", RFC 5845, June 2010.
826 [RFC6245] Yegani, P., Leung, K., Lior, A., Chowdhury, K., and J. 827 Navali, "Generic Routing Encapsulation (GRE) Key Extension 828 for Mobile IPv4", RFC 6245, May 2011.
830 [RFC6325] Perlman, R., Eastlake, D., Dutt, D., Gai, S., and A. 831 Ghanwani, "Routing Bridges (RBridges): Base Protocol 832 Specification", RFC 6325, July 2011.
834 [SPBM] "IEEE P802.1aq/D4.5 Draft Standard for Local and 835 Metropolitan Area Networks -- Media Access Control (MAC) 836 Bridges and Virtual Bridged Local Area Networks, 837 Amendment 8: Shortest Path Bridging", February 2012.
839 Appendix A. Change Log
841 A.1. Changes from -01
843 1. Removed Section 4.2 (Standardization Issues) and Section 5 844 (Control Plane) as those are more appropriately covered in and 845 overlap with material in [I-D.lasserre-nvo3-framework] and 846 [I-D.kreeger-nvo3-overlay-cp].
848 2. Expanded introduction and better explained terms such as tenant 849 and virtual network instance. These had been covered in a 850 section that has since been removed.
852 3. Added Section 3.3 "Overlay Networking Work Areas" to better 853 articulate the three separable work components (or "on-the-wire 854 protocols") where work is needed.
856 4. Added section on Shortest Path Bridging in Related Work section.
858 5. Revised some of the terminology to be consistent with 859 [I-D.lasserre-nvo3-framework] and [I-D.kreeger-nvo3-overlay-cp].
861 A.2. Changes from -02
863 1. Numerous changes in response to discussions on the nvo3 mailing 864 list, with the majority of changes in Section 2 (Problem Details) and 865 Section 3 (Network Overlays). Best to see diffs for specific 866 text changes.
868 A.3. Changes from -03
870 1. Too numerous to enumerate; moved solution-specific descriptions 871 to Related Work section. Pulled in additional text (and authors) 872 from [I-D.fang-vpn4dc-problem-statement], plus numerous editorial 873 improvements.
875 Authors' Addresses
877 Thomas Narten (editor) 878 IBM
880 Email: narten@us.ibm.com
882 David Black 883 EMC
885 Email: david.black@emc.com
887 Dinesh Dutt
889 Email: ddutt.ietf@hobbesdutt.com
891 Luyuan Fang 892 Cisco Systems 893 111 Wood Avenue South 894 Iselin, NJ 08830 895 USA
897 Email: lufang@cisco.com
899 Eric Gray 900 Ericsson
902 Email: eric.gray@ericsson.com
903 Lawrence Kreeger 904 Cisco
906 Email: kreeger@cisco.com
908 Maria Napierala 909 AT&T 910 200 Laurel Avenue 911 Middletown, NJ 07748 912 USA
914 Email: mnapierala@att.com
916 Murari Sridharan 917 Microsoft
919 Email: muraris@microsoft.com