Internet Engineering Task Force                          T. Narten, Ed.
Internet-Draft                                                       IBM
Intended status: Informational                             M. Sridharan
Expires: January 18, 2013                                      Microsoft
                                                                 D. Dutt
                                                                D. Black
                                                                     EMC
                                                              L. Kreeger
                                                                   Cisco
                                                           July 17, 2012

         Problem Statement: Overlays for Network Virtualization
              draft-narten-nvo3-overlay-problem-statement-03

Abstract

This document describes issues associated with providing multi-tenancy in large data center networks and an overlay-based network virtualization approach to addressing them. A key multi-tenancy requirement is traffic isolation, so that a tenant's traffic is not visible to any other tenant. This isolation can be achieved by assigning one or more virtual networks to each tenant such that traffic within a virtual network is isolated from traffic in other virtual networks. The primary functionality required is provisioning virtual networks, associating a virtual machine's virtual network interface(s) with the appropriate virtual network, and maintaining that association as the virtual machine is activated, migrated and/or deactivated. Use of an overlay-based approach enables scalable deployment on large network infrastructures.
Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 18, 2013.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Problem Details
       2.1.  Dynamic Provisioning
       2.2.  Virtual Machine Mobility Requirements
       2.3.  Span of Virtual Networks
       2.4.  Inadequate Forwarding Table Sizes in Switches
       2.5.  Decoupling Logical and Physical Configuration
       2.6.  Separating Tenant Addressing from Infrastructure Addressing
       2.7.  Communication Between Virtual and Traditional Networks
       2.8.  Communication Between Virtual Networks
       2.9.  Overlay Design Characteristics
   3.  Network Overlays
       3.1.  Limitations of Existing Virtual Network Models
       3.2.  Benefits of Network Overlays
       3.3.  Overlay Networking Work Areas
   4.  Related Work
       4.1.  IEEE 802.1aq - Shortest Path Bridging
       4.2.  ARMD
       4.3.  TRILL
       4.4.  L2VPNs
       4.5.  Proxy Mobile IP
       4.6.  LISP
       4.7.  Individual Submissions
   5.  Further Work
   6.  Summary
   7.  Acknowledgments
   8.  IANA Considerations
   9.  Security Considerations
   10. Informative References
   Appendix A.  Change Log
       A.1.  Changes from -01
       A.2.  Changes from -02
   Authors' Addresses
1. Introduction

Server virtualization is increasingly becoming the norm in data centers. With server virtualization, each physical server supports multiple virtual machines (VMs), each running its own operating system, middleware and applications. Virtualization is a key enabler of workload agility, i.e., allowing any server to host any application and providing the flexibility of adding, shrinking, or moving services within the physical infrastructure. Server virtualization provides numerous benefits, including higher utilization, increased security, reduced user downtime, reduced power usage, etc.

Large-scale multi-tenant data centers are taking advantage of the benefits of server virtualization to provide a new kind of hosting, a virtual hosted data center. Multi-tenant data centers are ones where individual tenants could belong to a different company (in the case of a public provider) or a different department (in the case of an internal company data center). Each tenant has the expectation of a level of security and privacy separating their resources from those of other tenants. For example, one tenant's traffic must never be exposed to another tenant, except through carefully controlled interfaces, such as a security gateway.

To a tenant, virtual data centers are similar to their physical counterparts, consisting of end stations attached to a network, complete with services such as load balancers and firewalls. But unlike a physical data center, end stations connect to a virtual network. To end stations, a virtual network looks like a normal network (e.g., providing an Ethernet service), except that the only end stations connected to the virtual network are those belonging to the tenant.

A tenant is the administrative entity that is responsible for and manages a specific virtual network instance and its associated services (whether virtual or physical). In a cloud environment, a tenant would correspond to the customer that has defined and is using a particular virtual network. However, a tenant may also find it useful to create multiple different virtual network instances. Hence, there is a one-to-many mapping between tenants and virtual network instances. A single tenant may operate multiple individual virtual network instances, each associated with a different service.

How a virtual network is implemented does not matter to the tenant. It could be a pure routed network, a pure bridged network or a combination of bridged and routed networks. The key requirement is that each individual virtual network instance be isolated from other virtual network instances.

This document outlines the problems encountered in scaling the number of isolated virtual networks in a data center, as well as the problems of managing the creation/deletion, membership and span of these networks. It makes the case that an overlay-based approach, in which individual isolated networks are implemented as virtual networks dynamically controlled by a standardized control plane, provides a number of advantages over current approaches. The purpose of this document is to identify the set of problems that any solution has to address in building multi-tenant data centers.
With this approach, the goal is to enable standardized, interoperable implementations that allow the construction of multi-tenant data centers.

Section 2 describes the problem space details. Section 3 describes network overlays in more detail and the potential work areas. Sections 4 and 5 review related and further work, while Section 6 closes with a summary.

2. Problem Details

The following subsections describe aspects of multi-tenant networking that pose problems for large-scale network infrastructure. Different problem aspects may arise based on the network architecture and scale.

2.1. Dynamic Provisioning

Cloud computing involves on-demand provisioning of resources for multi-tenant environments. A common example of cloud computing is the public cloud, where a cloud service provider offers elastic services to multiple customers over the same infrastructure. The on-demand nature of provisioning, in conjunction with trusted hypervisors controlling network access by VMs, can be achieved through resilient, distributed network control mechanisms.

2.2. Virtual Machine Mobility Requirements

A key benefit of server virtualization is virtual machine (VM) mobility. A VM can be migrated from one server to another, live, i.e., while continuing to run and without needing to shut it down and restart it at the new location. A key requirement for live migration is that a VM retain critical network state at its new location, including its IP and MAC address(es). Preservation of MAC addresses may be necessary, for example, when software licenses are bound to MAC addresses. More generally, any change in the VM's MAC addresses resulting from a move would be visible to the VM and thus potentially result in unexpected disruptions. Retaining IP addresses after a move is necessary to prevent existing transport connections (e.g., TCP) from breaking and needing to be restarted.

In traditional data centers, servers are assigned IP addresses based on their physical location, for example based on the Top of Rack (ToR) switch for the server rack or the VLAN configured to the server. Servers can only move to other locations within the same IP subnet. This constraint is not problematic for physical servers, which move infrequently, but it restricts the placement and movement of VMs within the data center. Any solution for a scalable multi-tenant data center must allow a VM to be placed (or moved) anywhere within the data center, without being constrained by the subnet boundary concerns of the host servers.

2.3. Span of Virtual Networks

Another use case is cross-pod expansion. A pod typically consists of one or more racks of servers with its associated network and storage connectivity. Tenants may start off on a pod and, due to expansion, require servers/VMs on other pods, especially when tenants on the other pods are not fully utilizing all their resources. This use case requires that virtual networks span multiple pods in order to provide connectivity to all of the tenant's servers/VMs.

2.4. Inadequate Forwarding Table Sizes in Switches

Today's virtualized environments place additional demands on the forwarding tables of switches. Instead of just one link-layer address per server, the switching infrastructure has to learn addresses of the individual VMs (which could number in the hundreds per server). This is necessary because traffic between the VMs and the rest of the physical network traverses the physical network infrastructure. This places a much larger demand on the switches' forwarding table capacity compared to non-virtualized environments, causing more traffic to be flooded or dropped when the addresses in use exceed the forwarding table capacity.
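As a rough illustration of this pressure on forwarding tables, the short calculation below uses assumed, hypothetical numbers for rack count, servers per rack, VMs per server, and switch table size; the figures are not measurements from any particular deployment.

    # Back-of-envelope estimate of MAC forwarding-table demand.
    # All figures below are illustrative assumptions, not measurements.
    racks = 200             # racks in the data center
    servers_per_rack = 40   # physical servers per rack
    vms_per_server = 50     # VMs hosted on each virtualized server

    physical_macs = racks * servers_per_rack        # 8,000 entries
    virtual_macs = physical_macs * vms_per_server   # 400,000 entries

    typical_tor_table = 32_000   # assumed MAC table size of a commodity switch

    print(virtual_macs > typical_tor_table)   # True: tables overflow,
                                              # leading to flooding or drops

Even with conservative assumptions of this kind, the number of VM addresses visible to the switching fabric can exceed typical forwarding-table sizes by an order of magnitude.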
2.5. Decoupling Logical and Physical Configuration

Data center operators must be able to achieve high utilization of server and network capacity. For efficient and flexible allocation, operators should be able to spread a virtual network instance across servers in any rack in the data center. It should also be possible to migrate compute workloads to any server anywhere in the network while retaining the workload's addresses. This can be achieved today by stretching VLANs (e.g., by using TRILL or SPB).

However, in order to limit the broadcast domain of each VLAN, multi-destination frames within a VLAN should optimally flow only to those devices that have that VLAN configured. When workloads migrate, the physical network (e.g., access lists) may need to be reconfigured, which is typically time-consuming and error-prone.

2.6. Separating Tenant Addressing from Infrastructure Addressing

It is highly desirable to be able to number the data center underlay network using whatever addresses make sense for it, without having to worry about address collisions between addresses used by the underlay and those used by tenants.

2.7. Communication Between Virtual and Traditional Networks

Not all communication will be between devices connected to virtualized networks. Devices using overlays will continue to access devices and make use of services on traditional, non-virtualized networks, whether in the data center, the public Internet, or at remote/branch campuses. Any virtual network solution must be capable of interoperating with existing routers, VPN services, load balancers, intrusion detection services, firewalls, etc. on external networks.

Communication between devices attached to a virtual network and devices connected to non-virtualized networks is handled architecturally by having specialized gateway devices that receive packets from a virtualized network, decapsulate them, process them as regular (i.e., non-virtualized) traffic, and finally forward them on to their appropriate destination (and vice versa). Additional identification, such as VLAN tags, could be used on the non-virtualized side of such a gateway to enable forwarding of traffic for multiple virtual networks over a common non-virtualized link.

A wide range of implementation approaches is possible. Overlay gateway functionality could be combined with other network functionality into a network device that implements the overlay functionality, and then forwards traffic between other internal components that implement functionality such as full router service, load balancing, firewall support, VPN gateway, etc.
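As one illustration of the gateway behavior described above, the sketch below shows a hypothetical mapping between virtual network identifiers and VLAN tags on the non-virtualized side of such a gateway. The identifier values, VLAN assignments, and function names are assumptions chosen for illustration; no specific encapsulation or product behavior is implied.

    # Hypothetical gateway between an overlay and a non-virtualized link.
    # VNID values, VLAN assignments, and function names are illustrative only.

    vnid_to_vlan = {   # one VLAN per virtual network on the external link
        10001: 100,
        10002: 200,
    }
    vlan_to_vnid = {vlan: vnid for vnid, vlan in vnid_to_vlan.items()}

    def from_overlay(vnid, inner_frame):
        """Decapsulated traffic leaving the overlay: tag it with the VLAN
        assigned to its virtual network and forward it on the external link."""
        vlan = vnid_to_vlan[vnid]
        return ("external_link", vlan, inner_frame)

    def from_external(vlan, frame):
        """Traffic arriving from the non-virtualized side: map its VLAN back
        to a virtual network and hand it to the overlay for encapsulation."""
        vnid = vlan_to_vnid[vlan]
        return ("overlay", vnid, frame)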
2.8. Communication Between Virtual Networks

Communication between devices on different virtual networks is handled architecturally by adding specialized interconnect functionality among the otherwise isolated virtual networks. For a virtual network providing an Ethernet service, such interconnect functionality could be IP forwarding configured as part of the "default gateway" for each virtual network. For a virtual network providing IP service, the interconnect functionality could be IP forwarding configured as part of the IP addressing structure of each virtual network. In both cases, the implementation of the interconnect functionality could be distributed across the NVEs (Network Virtualization Edges; see Section 3.3), and could be combined with other network functionality (e.g., load balancing, firewall support) that is applied to traffic that is forwarded between virtual networks.

2.9. Overlay Design Characteristics

Layer 2 overlay protocols already exist, but they were not necessarily designed to solve the problem in the environment of a highly virtualized data center. Below are some of the characteristics of such environments that must be taken into account by the overlay technology:

1. Highly distributed systems. The overlay should work in an environment where there could be many thousands of access switches (e.g., residing within the hypervisors) and many more end systems (e.g., VMs) connected to them. This argues for a distributed mapping system that places low overhead on the overlay tunnel endpoints.

2. Many highly distributed virtual networks with sparse membership. Each virtual network could be highly dispersed inside the data center. Also, along with the expectation of many virtual networks, the number of end systems connected to any one virtual network is expected to be relatively low; therefore, the percentage of access switches participating in any given virtual network would also be expected to be low. For this reason, efficient pruning of multi-destination traffic should be taken into consideration.

3. Highly dynamic end systems. End systems connected to virtual networks can be very dynamic, both in terms of creation/deletion/power-on/off and in terms of mobility across the access switches.

4. Work with existing, widely deployed Ethernet switches and IP routers without requiring wholesale replacement. The first hop switch that adds and removes the overlay header will require new equipment and/or new software.

5. Network infrastructure administered by a single administrative domain. This is consistent with operation within a data center, and not across the Internet.

3. Network Overlays

Virtual networks are used to isolate a tenant's traffic from that of other tenants (or even traffic within the same tenant that requires isolation). There are two main characteristics of virtual networks:

1. Providing network address space that is isolated from other virtual networks. The same network addresses may be used in different virtual networks on the same underlying network infrastructure.

2. Limiting the scope of frames sent on the virtual network. Frames sent by end systems attached to a virtual network are delivered as expected to other end systems on that virtual network and may exit a virtual network only through controlled exit points, such as a security gateway.
Likewise, frames sourced outside of the virtual network may enter the virtual network only through controlled entry points, such as a security gateway.

3.1. Limitations of Existing Virtual Network Models

Virtual networks are not new to networking. For example, VLANs are a well-known construct in the networking industry. A VLAN is an L2 bridging construct that provides some of the semantics of virtual networks mentioned above: a MAC address is unique within a VLAN, but not necessarily across VLANs. Traffic sourced within a VLAN (including broadcast and multicast traffic) remains within the VLAN it originates from. Traffic forwarded from one VLAN to another typically involves router (L3) processing. The forwarding table lookup operation is keyed on {VLAN, MAC address} tuples.

But there are problems and limitations with L2 VLANs. VLANs are a pure L2 bridging construct, and VLAN identifiers are carried along with data frames to allow each forwarding point to know what VLAN the frame belongs to. A VLAN today is defined as a 12-bit number, limiting the total number of VLANs to 4096 (though typically this number is 4094, since 0 and 4095 are reserved). Due to the large number of tenants that a cloud provider might service, the 4094 VLAN limit is often inadequate. In addition, there is often a need for multiple VLANs per tenant, which exacerbates the issue. The use of a sufficiently large VNID, present in the overlay control plane and possibly also in the data plane, would eliminate current VLAN size limitations associated with single 12-bit VLAN tags.

For IP/MPLS networks, Ethernet Virtual Private Network (E-VPN) [I-D.ietf-l2vpn-evpn] provides an emulated Ethernet service in which each tenant has its own Ethernet network over a common IP or MPLS infrastructure, and a BGP/MPLS control plane is used to distribute the tenant MAC addresses and the MPLS labels that identify the tenants and tenant MAC addresses. Within the BGP/MPLS control plane, a 32-bit Ethernet Tag is used to identify the broadcast domains (VLANs) associated with a given L2 VLAN service instance, and these Ethernet Tags are mapped to VLAN IDs understood by the tenant at the service edges. This means that the limit of 4096 VLANs is associated with an individual tenant service edge, enabling a much higher level of scalability. Interconnectivity between tenants is also allowed in a controlled fashion.

IP/MPLS networks also provide an IP VPN service (L3 VPN) [RFC4364] in which each tenant has its own IP network over a common IP or MPLS infrastructure, and a BGP/MPLS control plane is used to distribute the tenant IP routes and the MPLS labels that identify the tenants and tenant IP routes. As with E-VPNs, interconnectivity between tenants is also allowed in a controlled fashion.

VM Mobility [I-D.raggarwa-data-center-mobility] introduces the concept of a combined L2/L3 VPN service in order to support the mobility of individual Virtual Machines (VMs) between data centers connected over a common IP or MPLS infrastructure.

There are a number of VPN approaches that provide some, if not all, of the desired semantics of virtual networks. A gap analysis will be needed to assess how well existing approaches satisfy the requirements.
3.2. Benefits of Network Overlays

To address the problems described earlier, a network overlay model can be used.

The idea behind an overlay is quite straightforward. Each virtual network instance is implemented as an overlay. The original frame is encapsulated by the first hop network device. The encapsulation identifies the destination device that will perform the decapsulation before delivering the frame to the endpoint. The rest of the network forwards the frame based on the encapsulation header and can be oblivious to the payload that is carried inside. Note that the first hop network device can be a traditional switch or router, or the virtual switch residing inside a hypervisor. Furthermore, the endpoint can be a VM or it can be a physical server. Examples of architectures based on network overlays include BGP/MPLS VPNs [RFC4364], TRILL [RFC6325], LISP [I-D.ietf-lisp], and Shortest Path Bridging [SPB].

With the overlay, a virtual network identifier (or VNID) can be carried as part of the overlay header so that every data frame explicitly identifies the specific virtual network the frame belongs to. Since both routed and bridged semantics can be supported by a virtual data center, the original frame carried within the overlay header can be an Ethernet frame complete with MAC addresses or just the IP packet.

The use of a sufficiently large VNID would address current VLAN limitations associated with single 12-bit VLAN tags. This VNID can be carried in the control plane. In the data plane, an overlay header provides a place to carry either the VNID or a locally significant identifier. In both cases, the identifier in the overlay header specifies which virtual network the data packet belongs to.

A key aspect of overlays is the decoupling of the "virtual" MAC and IP addresses used by VMs from the physical network infrastructure and the infrastructure IP addresses used by the data center. If a VM changes location, the switches at the edge of the overlay simply update their mapping tables to reflect the new location of the VM within the data center's infrastructure space. Because an overlay network is used, a VM can now be located anywhere in the data center that the overlay reaches, without regard to traditional constraints implied by L2 properties such as VLAN numbering, or the span of an L2 broadcast domain scoped to a single pod or access switch.

Multi-tenancy is supported by isolating the traffic of one virtual network instance from traffic of another. Traffic from one virtual network instance cannot be delivered to another instance without (conceptually) exiting the instance and entering the other instance via an entity that has connectivity to both virtual network instances. Without the existence of this entity, tenant traffic remains isolated within each individual virtual network instance.

Overlays are designed to allow a set of VMs to be placed within a single virtual network instance, whether that virtual network provides a bridged network or a routed network.
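A minimal sketch of the data-plane behavior described in this section is given below. It assumes a hypothetical overlay header carrying a 24-bit VNID and a per-edge-device mapping table from (VNID, tenant MAC address) to the underlay IP address of the remote decapsulating device; the field sizes, names, and table layout are illustrative assumptions, not a proposal for any specific encapsulation.

    # Illustrative-only model of an overlay edge device's data plane.
    # The header layout and the 24-bit VNID are assumptions for illustration;
    # they do not correspond to any particular encapsulation format.

    MAX_VNID = 2**24 - 1    # ~16.7M virtual networks vs. 4094 usable VLANs

    # Mapping table: (vnid, tenant MAC) -> underlay IP of the remote edge.
    # How this table gets populated is a control-plane question (Section 3.3).
    mapping_table = {
        (10001, "00:11:22:33:44:55"): "192.0.2.10",
        (10001, "00:11:22:33:44:66"): "192.0.2.20",
        (10002, "00:11:22:33:44:55"): "192.0.2.30",  # same MAC, different VN
    }

    def encapsulate(vnid, dest_mac, inner_frame):
        """Look up the remote edge for this tenant destination and build an
        outer header; the underlay forwards on the outer header only."""
        remote_edge = mapping_table.get((vnid, dest_mac))
        if remote_edge is None:
            return None   # unknown mapping: consult the control plane or flood
        outer_header = {"dst_ip": remote_edge, "vnid": vnid}
        return (outer_header, inner_frame)

If the VM behind a given tenant MAC address moves, only the corresponding table entry changes; the addresses visible to the VM itself stay the same, which is the decoupling described above.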
3.3. Overlay Networking Work Areas

There are three specific and separate potential work areas needed to realize an overlay solution. The areas correspond to different possible "on-the-wire" protocols, where distinct entities interact with each other.

One area of work concerns the address dissemination protocol a Network Virtualization Edge (NVE) [I-D.lasserre-nvo3-framework] uses to build and maintain the mapping tables it uses to deliver encapsulated frames to their proper destination. One approach is to build mapping tables entirely via learning (as is done in 802.1 networks). But to provide better scaling properties, a more sophisticated approach is needed, i.e., the use of a specialized control plane protocol. While there are some advantages to using or leveraging an existing protocol for maintaining mapping tables, the fact that large numbers of NVEs will likely reside in hypervisors places constraints on the resources (CPU and memory) that can be dedicated to such functions. For example, routing protocols (e.g., IS-IS, BGP) may have scaling difficulties if implemented directly in all NVEs, based on both flooding and convergence time concerns. An alternative approach would be to use a standard query protocol between NVEs and the set of network nodes that maintain the address mappings used across the data center for the entire overlay system.

From an architectural perspective, one can view the address mapping dissemination problem as having two distinct and separable components. The first component consists of a back-end "oracle" that is responsible for distributing and maintaining the mapping information for the entire overlay system. The second component consists of the on-the-wire protocols an NVE uses when interacting with the oracle.

The back-end oracle could provide high performance, high resiliency, failover, etc. and could be implemented in significantly different ways. For example, one model uses a traditional, centralized "directory-based" database, using replicated instances for reliability and failover. A second model involves using and possibly extending an existing routing protocol (e.g., BGP, IS-IS, etc.). To support different architectural models, it is useful to have one standard protocol for the NVE-oracle interaction while allowing different protocols and architectural approaches for the oracle itself. Separating the two allows NVEs to transparently interact with different types of oracles, i.e., either of the two architectural models described above. Having separate protocols could also allow for a simplified NVE that only interacts with the oracle for the mapping table entries it needs, and allows the oracle (and its associated protocols) to evolve independently over time with minimal impact on the NVEs.

A third work area considers the attachment and detachment of VMs (or Tenant End Systems [I-D.lasserre-nvo3-framework] more generally) from a specific virtual network instance. When a VM attaches, the NVE associates the VM with a specific overlay for the purposes of tunneling traffic sourced from or destined to the VM. When a VM disconnects, it is removed from the overlay and the NVE effectively terminates any tunnels associated with the VM. To achieve this functionality, a standardized interaction between the NVE and hypervisor may be needed, for example in the case where the NVE resides on a separate device from the VM.
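The following sketch illustrates, under assumed message names and semantics, how the NVE-oracle split described above might look from the NVE's side: the NVE registers a mapping when a VM attaches, queries the oracle on a table miss, and withdraws the mapping when the VM detaches. The class names, methods, and message fields are hypothetical; they are not drawn from any existing protocol.

    # Hypothetical NVE-to-oracle interaction; names and messages are assumptions.

    class MappingOracle:
        """Stand-in for the back-end oracle, however it is implemented
        (directory-based database, extended routing protocol, etc.)."""
        def __init__(self):
            self.mappings = {}           # (vnid, mac) -> underlay IP of NVE

        def register(self, vnid, mac, nve_ip):
            self.mappings[(vnid, mac)] = nve_ip

        def withdraw(self, vnid, mac):
            self.mappings.pop((vnid, mac), None)

        def query(self, vnid, mac):
            return self.mappings.get((vnid, mac))

    class NVE:
        def __init__(self, my_underlay_ip, oracle):
            self.my_ip = my_underlay_ip
            self.oracle = oracle
            self.local_cache = {}        # only the entries this NVE needs

        def vm_attach(self, vnid, mac):
            # Triggered by the hypervisor when a VM joins a virtual network.
            self.oracle.register(vnid, mac, self.my_ip)

        def vm_detach(self, vnid, mac):
            self.oracle.withdraw(vnid, mac)

        def lookup(self, vnid, dest_mac):
            # Query the oracle only on a cache miss, keeping NVE state small.
            key = (vnid, dest_mac)
            if key not in self.local_cache:
                self.local_cache[key] = self.oracle.query(vnid, dest_mac)
            return self.local_cache[key]

In such a model the NVE holds only the cache entries it actively needs, while replication, resiliency, and failover remain internal to the oracle, matching the separation of concerns argued for above.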
In summary, there are three areas of potential work. The first area concerns the oracle itself and any on-the-wire protocols it needs. A second area concerns the interaction between the oracle and NVEs. The third work area concerns protocols associated with attaching and detaching a VM from a particular virtual network instance. All three work areas are important to the development of a scalable, interoperable solution.

4. Related Work

4.1. IEEE 802.1aq - Shortest Path Bridging

Shortest Path Bridging (SPB) is an IS-IS-based overlay for L2 Ethernets. SPB supports multi-pathing and addresses a number of shortcomings in the original Ethernet Spanning Tree Protocol. SPB-M uses IEEE 802.1ah MAC-in-MAC encapsulation and supports a 24-bit I-SID, which can be used to identify virtual network instances. SPB is entirely L2 based, extending the L2 Ethernet bridging model.

4.2. ARMD

ARMD is chartered to look at data center scaling issues with a focus on address resolution. ARMD is currently chartered to develop a problem statement and is not currently developing solutions. While an overlay-based approach may address some of the "pain points" that have been raised in ARMD (e.g., better support for multi-tenancy), an overlay approach may also push some of the L2 scaling concerns (e.g., excessive flooding) to the IP level (flooding via IP multicast). Analysis will be needed to understand the scaling tradeoffs of an overlay-based approach compared with existing approaches. On the other hand, existing IP-based approaches such as proxy ARP may help mitigate some concerns.

4.3. TRILL

TRILL is an L2-based approach aimed at addressing deficiencies and limitations of current Ethernet networks, and STP in particular. Although it differs from Shortest Path Bridging in many architectural and implementation details, it is similar in that it provides an L2-based service to end systems. TRILL, as defined today, supports only the standard (and limited) 12-bit VLAN model. Approaches to extend TRILL to support more than 4094 VLANs are currently under investigation [I-D.ietf-trill-fine-labeling].

4.4. L2VPNs

The IETF has specified a number of approaches for connecting L2 domains together as part of the L2VPN Working Group. That group, however, has historically been focused on provider-provisioned L2 VPNs, where the service provider participates in management and provisioning of the VPN. In addition, much of the target environment for such deployments involves carrying L2 traffic over WANs. Overlay approaches are intended to be used within data centers where the overlay network is managed by the data center operator, rather than by an outside party. While overlays can run across the Internet as well, they will extend well into the data center itself (e.g., up to and including hypervisors) and include large numbers of machines within the data center.

Other L2VPN approaches, such as L2TP [RFC2661], require significant tunnel state at the encapsulating and decapsulating end points. Overlays require less tunnel state than other approaches, which is important to allow overlays to scale to hundreds of thousands of end points. It is assumed that smaller switches (i.e., virtual switches in hypervisors or the physical switches to which VMs connect) will be part of the overlay network and be responsible for encapsulating and decapsulating packets.
4.5. Proxy Mobile IP

Proxy Mobile IP [RFC5213] [RFC5844] makes use of the GRE Key Field [RFC5845] [RFC6245], but not in a way that supports multi-tenancy.

4.6. LISP

LISP [I-D.ietf-lisp] essentially provides an IP-over-IP overlay where the internal addresses are end-station identifiers and the outer IP addresses represent the location of the end station within the core IP network topology. The LISP overlay header includes a 24-bit Instance ID to support overlapping inner IP addresses.

4.7. Individual Submissions

Many individual submissions also seek to address some or all of the issues identified in this document. Examples of such drafts are VXLAN [I-D.mahalingam-dutt-dcops-vxlan], NVGRE [I-D.sridharan-virtualization-nvgre], and Virtual Machine Mobility in L3 Networks [I-D.wkumari-dcops-l3-vmmobility].

5. Further Work

It is believed that overlay-based approaches may be able to reduce the overall amount of flooding and other multicast- and broadcast-related traffic (e.g., ARP and ND) currently experienced within data centers built around a large, flat L2 network. Further analysis is needed to characterize expected improvements.

6. Summary

This document has argued that network virtualization using L3 overlays addresses a number of issues being faced as data centers scale in size. In addition, careful consideration of a number of issues would lead to the development of interoperable implementations of virtualization overlays.

Three potential work areas were identified. The first involves the interactions that take place when a VM attaches to or detaches from an overlay. A second involves the protocol an NVE would use to communicate with a back-end "oracle" to learn and disseminate mapping information about the VMs the NVE communicates with. The third potential work area involves the back-end oracle itself, i.e., how it provides failover and how it interacts with oracles in other domains.

7. Acknowledgments

Helpful comments and improvements to this document have come from Ariel Hendel, Vinit Jain, and Benson Schliesser.

8. IANA Considerations

This memo includes no request to IANA.

9. Security Considerations

TBD

10. Informative References

[I-D.ietf-l2vpn-evpn]  Sajassi, A., Aggarwal, R., Henderickx, W., Balus, F., Isaac, A., and J. Uttaro, "BGP MPLS Based Ethernet VPN", draft-ietf-l2vpn-evpn-01 (work in progress), July 2012.

[I-D.ietf-lisp]  Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "Locator/ID Separation Protocol (LISP)", draft-ietf-lisp-23 (work in progress), May 2012.

[I-D.ietf-trill-fine-labeling]  Eastlake, D., Zhang, M., Agarwal, P., Perlman, R., and D. Dutt, "TRILL: Fine-Grained Labeling", draft-ietf-trill-fine-labeling-01 (work in progress), June 2012.

[I-D.kreeger-nvo3-overlay-cp]  Black, D., Dutt, D., Kreeger, L., Sridharan, M., and T. Narten, "Network Virtualization Overlay Control Protocol Requirements", draft-kreeger-nvo3-overlay-cp-00 (work in progress), January 2012.

[I-D.lasserre-nvo3-framework]  Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. Rekhter, "Framework for DC Network Virtualization", draft-lasserre-nvo3-framework-03 (work in progress), July 2012.
Agarwal, "VXLAN: A 691 Framework for Overlaying Virtualized Layer 2 Networks over 692 Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-01 693 (work in progress), February 2012. 695 [I-D.raggarwa-data-center-mobility] 696 Aggarwal, R., Rekhter, Y., Henderickx, W., Shekhar, R., 697 and L. Fang, "Data Center Mobility based on BGP/MPLS, IP 698 Routing and NHRP", draft-raggarwa-data-center-mobility-03 699 (work in progress), June 2012. 701 [I-D.sridharan-virtualization-nvgre] 702 Sridhavan, M., Greenberg, A., Venkataramaiah, N., Wang, 703 Y., Duda, K., Ganga, I., Lin, G., Pearson, M., Thaler, P., 704 and C. Tumuluri, "NVGRE: Network Virtualization using 705 Generic Routing Encapsulation", 706 draft-sridharan-virtualization-nvgre-01 (work in 707 progress), July 2012. 709 [I-D.wkumari-dcops-l3-vmmobility] 710 Kumari, W. and J. Halpern, "Virtual Machine mobility in L3 711 Networks.", draft-wkumari-dcops-l3-vmmobility-00 (work in 712 progress), August 2011. 714 [RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, 715 G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", 716 RFC 2661, August 1999. 718 [RFC4023] Worster, T., Rekhter, Y., and E. Rosen, "Encapsulating 719 MPLS in IP or Generic Routing Encapsulation (GRE)", 720 RFC 4023, March 2005. 722 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 723 Networks (VPNs)", RFC 4364, February 2006. 725 [RFC5036] Andersson, L., Minei, I., and B. Thomas, "LDP 726 Specification", RFC 5036, October 2007. 728 [RFC5213] Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K., 729 and B. Patil, "Proxy Mobile IPv6", RFC 5213, August 2008. 731 [RFC5844] Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy 732 Mobile IPv6", RFC 5844, May 2010. 734 [RFC5845] Muhanna, A., Khalil, M., Gundavelli, S., and K. Leung, 735 "Generic Routing Encapsulation (GRE) Key Option for Proxy 736 Mobile IPv6", RFC 5845, June 2010. 738 [RFC6245] Yegani, P., Leung, K., Lior, A., Chowdhury, K., and J. 739 Navali, "Generic Routing Encapsulation (GRE) Key Extension 740 for Mobile IPv4", RFC 6245, May 2011. 742 [RFC6325] Perlman, R., Eastlake, D., Dutt, D., Gai, S., and A. 743 Ghanwani, "Routing Bridges (RBridges): Base Protocol 744 Specification", RFC 6325, July 2011. 746 [SPB] "IEEE P802.1aq/D4.5 Draft Standard for Local and 747 Metropolitan Area Networks -- Media Access Control (MAC) 748 Bridges and Virtual Bridged Local Area Networks, 749 Amendment 8: Shortest Path Bridging", February 2012. 751 Appendix A. Change Log 753 A.1. Changes from -01 755 1. Removed Section 4.2 (Standardization Issues) and Section 5 756 (Control Plane) as those are more appropriately covered in and 757 overlap with material in [I-D.lasserre-nvo3-framework] and 758 [I-D.kreeger-nvo3-overlay-cp]. 760 2. Expanded introduction and better explained terms such as tenant 761 and virtual network instance. These had been covered in a 762 section that has since been removed. 764 3. Added Section 3.3 "Overlay Networking Work Areas" to better 765 articulate the three separable work components (or "on-the-wire 766 protocols") where work is needed. 768 4. Added section on Shortest Path Bridging in Related Work section. 770 5. Revised some of the terminology to be consistent with 771 [I-D.lasserre-nvo3-framework] and [I-D.kreeger-nvo3-overlay-cp]. 773 A.2. Changes from -02 775 1. Numerous changes in response to discussions on the nvo3 mailing 776 list, with majority of changes in Section 2 (Problem Details) and 777 Section 3 (Network Overlays). 
Best to see diffs for specific text changes.

Authors' Addresses

   Thomas Narten (editor)
   IBM

   Email: narten@us.ibm.com


   Murari Sridharan
   Microsoft

   Email: muraris@microsoft.com


   Dinesh Dutt

   Email: ddutt.ietf@hobbesdutt.com


   David Black
   EMC

   Email: david.black@emc.com


   Lawrence Kreeger
   Cisco

   Email: kreeger@cisco.com