Internet Engineering Task Force                          T. Narten, Ed.
Internet-Draft                                                      IBM
Intended status: Informational                             M. Sridharan
Expires: May 3, 2012                                           Microsoft
                                                                 D. Dutt
                                                                   Cisco
                                                                D. Black
                                                                     EMC
                                                              L. Kreeger
                                                                   Cisco
                                                        October 31, 2011

         Problem Statement: Overlays for Network Virtualization
             draft-narten-nvo3-overlay-problem-statement-01

Abstract

This document describes issues associated with providing multi-tenancy in large data center networks and an overlay-based network virtualization approach to addressing them. A key multi-tenancy requirement is traffic isolation, so that a tenant's traffic is not visible to any other tenant. This isolation can be achieved by assigning one or more virtual networks to each tenant such that traffic within a virtual network is isolated from traffic in other virtual networks. The primary functionality required is provisioning virtual networks, associating a virtual machine's NIC with the appropriate virtual network, and maintaining that association as the virtual machine is activated, migrated and/or deactivated. Use of an overlay-based approach enables scalable deployment on large network infrastructures.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on May 3, 2012.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Problem Details
       2.1.  Multi-tenant Environment Scale
       2.2.  Virtual Machine Mobility Requirements
       2.3.  Span of Virtual Networks
       2.4.  Inadequate Forwarding Table Sizes in Switches
       2.5.  Decoupling Logical and Physical Configuration
       2.6.  Support Communication Between VMs and Non-virtualized Devices
       2.7.  Overlay Design Characteristics
   3.  Defining Virtual Networks and Tenants
       3.1.  Limitations of Existing Virtual Network Models
       3.2.  Virtual Network Instance
       3.3.  Tenant
   4.  Network Overlays
       4.1.  Benefits of an Overlay Approach
       4.2.  Standardization Issues for Overlay Networks
             4.2.1.  Overlay Header Format
             4.2.2.  Fragmentation
             4.2.3.  Checksums and FCS
             4.2.4.  Middlebox Traversal
             4.2.5.  OAM
   5.  Control Plane
       5.1.  Populating the Forwarding Table of a Virtual Network Instance
       5.2.  Handling Multi-destination Frames
       5.3.  Associating a VNID With An Endpoint
       5.4.  Disassociating a VNID on Termination or Move
   6.  Related Work
       6.1.  ARMD
       6.2.  TRILL
       6.3.  L2VPNs
       6.4.  Proxy Mobile IP
       6.5.  LISP
       6.6.  Individual Submissions
   7.  Further Work
   8.  Summary
   9.  Acknowledgments
   10. IANA Considerations
   11. Security Considerations
   12. Informative References
   Authors' Addresses

1.  Introduction

Server virtualization is increasingly becoming the norm in data centers. With server virtualization, each physical server supports multiple virtual machines (VMs), each running its own operating system, middleware and applications. Virtualization is a key enabler of workload agility, i.e., allowing any server to host any application and providing the flexibility of adding, shrinking, or moving services within the physical infrastructure. Server virtualization provides numerous benefits, including higher utilization, increased data security, reduced user downtime, reduced power usage, etc.

Large scale multi-tenant data centers are taking advantage of the benefits of server virtualization to provide a new kind of hosting, a virtual hosted data center. Multi-tenant data centers are ones in which each tenant could belong to a different company (in the case of a public provider) or a different department (in the case of an internal company data center). Each tenant has the expectation of a level of security and privacy separating their resources from those of other tenants. Each virtual data center looks similar to its physical counterpart, consisting of end stations connected by a network, complete with services such as load balancers and firewalls. The network within each virtual data center can be a pure routed network, a pure bridged network or a combination of bridged and routed networks. The key requirement is that each such virtual network is isolated from the others, whether the networks belong to the same tenant or different tenants.

This document outlines the problems encountered in scaling the number of isolated networks in a data center, as well as the problems of managing the creation/deletion, membership and span of these networks. It makes the case that an overlay-based approach, in which individual networks are implemented as virtual networks that are dynamically controlled by a standardized control plane, provides a number of advantages over current approaches. The purpose of this document is to identify the set of problems that any solution has to address in building multi-tenant data centers. With this approach, the goal is to enable standardized, interoperable implementations that allow the construction of multi-tenant data centers.

Section 2 describes the problem space details. Section 3 defines virtual networks. Section 4 provides a general discussion of overlays and standardization issues. Section 5 discusses the control plane issues that require addressing for virtual networks. Sections 6 and 7 discuss related work and further work.

2.  Problem Details

The following subsections describe aspects of multi-tenant networking that pose problems for large scale network infrastructure. Different problem aspects may arise based on the network architecture and scale.

2.1.  Multi-tenant Environment Scale

Cloud computing involves on-demand elastic provisioning of resources for multi-tenant environments. A common example of cloud computing is the public cloud, where a cloud service provider offers these elastic services to multiple customers over the same infrastructure. This elastic, on-demand nature, in conjunction with trusted hypervisors that control network access by VMs, calls for resilient, distributed network control mechanisms.

2.2.  Virtual Machine Mobility Requirements

A key benefit of server virtualization is virtual machine (VM) mobility. A VM can be migrated from one server to another live, i.e., while it continues to run, without shutting down the VM and restarting it at a new location. A key requirement for live migration is that a VM retain its IP address(es) and MAC address(es) in its new location (to avoid tearing down existing communication). Today, servers are assigned IP addresses based on their physical location, typically based on the ToR (Top of Rack) switch for the server rack or the VLAN configured to the server. This works well for physical servers, which cannot move, but it restricts the placement and movement of the more mobile VMs within the data center (DC). Any solution for a scalable multi-tenant DC must allow a VM to be placed in, or moved to, any location within the data center, without being constrained by the subnet boundary concerns of the host servers.

2.3.  Span of Virtual Networks

Another use case is cross-pod expansion. A pod typically consists of one or more racks of servers with associated network and storage connectivity. Tenants may start off on one pod and, due to expansion, require servers/VMs on other pods, especially when tenants on the other pods are not fully utilizing all their resources. This use case requires that virtual networks span multiple pods in order to provide connectivity to all of a tenant's servers/VMs.

2.4.  Inadequate Forwarding Table Sizes in Switches

Today's virtualized environments place additional demands on the forwarding tables of switches. Instead of just one link-layer address per server, the switching infrastructure has to learn the addresses of the individual VMs (which could range in the hundreds per server), since traffic between the VMs and the rest of the network traverses the physical network infrastructure. This places a much larger demand on the switches' forwarding table capacity compared to non-virtualized environments, causing more traffic to be flooded or dropped when the number of addresses in use exceeds the forwarding table capacity.
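
A rough calculation can make this demand concrete. The sketch below is illustrative only; the rack, server and VM counts are assumptions, not figures from this document:

   # Illustrative only: assumed deployment sizes, not data from this document.
   racks = 20
   servers_per_rack = 40
   vms_per_server = 100                 # "could range in the hundreds per server"

   physical_macs = racks * servers_per_rack            # 800 entries, one per server
   virtualized_macs = physical_macs * vms_per_server   # 80,000 entries, one per VM

   print(physical_macs, virtualized_macs)

Even with these assumed numbers, the number of addresses the switching infrastructure must learn grows by two orders of magnitude, which is the pressure on forwarding table capacity described above.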

2.5.  Decoupling Logical and Physical Configuration

Data center operators must be able to achieve high utilization of server and network capacity. For efficient and flexible allocation, operators should be able to spread a virtual network instance across servers in any rack in the data center. It should also be possible to migrate compute workloads to any server anywhere in the network while retaining the workload's addresses. This can be achieved today by stretching VLANs (e.g., by using TRILL or OTV).

However, in order to limit the broadcast domain of each VLAN, multi-destination frames within a VLAN should optimally flow only to those devices that have that VLAN configured. When workloads migrate, the physical network (e.g., access lists) may need to be reconfigured, which is typically time consuming and error prone.

2.6.  Support Communication Between VMs and Non-virtualized Devices

Within data centers, not all communication will be between VMs. Network operators will continue to use non-virtualized servers for various reasons, as well as traditional routers to provide L2VPN and L3VPN services, traditional load balancers, firewalls, intrusion detection engines and so on. Any virtual network solution should be capable of working with these existing systems.

2.7.  Overlay Design Characteristics

Layer 2 overlay protocols already exist, but they were not necessarily designed to solve the problem in the environment of a highly virtualized data center. Below are some characteristics of such environments that must be taken into account by the overlay technology:

1. Highly distributed systems. The overlay should work in an environment where there could be many thousands of access switches (e.g., residing within the hypervisors) and many more end systems (e.g., VMs) connected to them. This calls for a distributed mapping system that puts a low overhead on the overlay tunnel endpoints.

2. Many highly distributed virtual networks with sparse connectivity. Each virtual network could be highly dispersed inside the data center. Also, along with the expectation of many virtual networks, the number of end systems connected to any one virtual network is expected to be relatively low; therefore, the percentage of access switches participating in any given virtual network would also be expected to be low. For this reason, efficient pruning of multi-destination traffic should be taken into consideration.

3. Highly dynamic end systems. End systems connected to virtual networks can be very dynamic, both in terms of creation/deletion/power-on/off and in terms of mobility across the access switches.

4. Work with existing, widely deployed Ethernet switches and IP routers without requiring wholesale replacement. The first-hop switch that adds and removes the overlay header will require new equipment and/or new software.

5. Network infrastructure administered by a single administrative domain. This is consistent with operation within a data center, and not across the Internet.

3.  Defining Virtual Networks and Tenants

Virtual networks are used to isolate a tenant's traffic from that of other tenants (or even traffic within the same tenant that requires isolation). There are two main characteristics of virtual networks:

1. Providing a network address space that is isolated from other virtual networks. The same network addresses may be used in different virtual networks on the same underlying network infrastructure.

2. Limiting the scope of frames so that they do not exit a virtual network except through controlled exit points or "gateways".

3.1.  Limitations of Existing Virtual Network Models

Virtual networks are not new to networking. VLANs are a well-known construct in the networking industry. A VLAN is a bridging construct that provides the semantics of virtual networks mentioned above: a MAC address is unique within a VLAN, but not necessarily across VLANs, and broadcast traffic is limited to the VLAN it originates from.

In the case of IP networks, routers have the concept of a Virtual Routing and Forwarding (VRF) instance. The same router can run multiple instances of routing protocols, each with its own forwarding table. Each instance is referred to as a VRF, which is a mechanism that provides address isolation. Since broadcasts are never forwarded across IP subnets, limiting broadcasts is not applicable to VRFs. In the case of both VLANs and VRFs, the forwarding table is looked up using the tuple {VLAN, MAC address} or {VRF, IP address}.

But there are two problems with these constructs. VLANs are a pure bridging construct while a VRF is a pure routing construct. A VLAN tag is carried along with a frame to allow each forwarding point to know what VLAN the frame belongs to. A VLAN ID today is defined as a 12-bit number, limiting the total number of VLANs to 4096 (though typically this number is 4094, since 0 and 4095 are reserved). Due to the large number of tenants that a cloud provider might service, the 4094 VLAN limit is often inadequate. In addition, there is often a need for multiple VLANs per tenant, which exacerbates the issue.

There is no VRF indicator carried in frames. The VRF is derived at each hop using a combination of the incoming interface and some information in the frame. Furthermore, the VRF model has typically assumed that a separate control plane governs the population of the forwarding table within that VRF. Thus, a traditional VRF model assumes multiple, independent control planes and has no specific tag within a frame to identify the VRF of the frame.

3.2.  Virtual Network Instance

To overcome the limitations of a traditional VLAN or VRF model, we define a new mechanism for virtual networks called a virtual network instance. Each virtual network is assigned a virtual network instance ID, shortened to VNID for convenience. A virtual network instance provides the semantics of a virtual network: address disambiguation and multi-destination frame scoping. A virtual network can be either routed or bridged, so a VNID can be used for both bridged networks and routed networks; in this respect it is unlike a VLAN or a VRF. To build large multi-tenant data centers, a larger number space than the 12-bit VLAN ID is required. 24 bits is the most common value identified by multiple solutions that attempt to address this problem space (or similar problem spaces). To simplify the building and administration of these large data centers, we require that the VNID be carried with each frame (similar to a VLAN, but unlike a VRF). Finally, because of the nature of a virtual data center and to allow scaling virtual networks to massive scales, we do not require a separate control plane to run for each virtual network. We identify other possible mechanisms to populate the forwarding tables for virtual networks in Section 5.1.

3.3.  Tenant

A tenant is the administrative entity that is responsible for and manages a specific virtual network and its associated services (whether virtual or physical). In a cloud environment, a tenant would correspond to the customer that has defined and is using a particular virtual network. However, there is a one-to-many mapping between tenants and virtual network instances: a single tenant may operate multiple individual virtual networks, each associated with a different service.
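
To make the semantics of Sections 3.1 and 3.2 concrete, the sketch below shows a forwarding table keyed by the tuple {VNID, inner address}, generalizing the {VLAN, MAC address} and {VRF, IP address} tuples described above. It is illustrative only; the identifiers and addresses are assumptions, and no particular data structure is mandated by this document.

   # Illustrative sketch: a forwarding table keyed by (VNID, inner address).
   # All identifiers and addresses below are assumptions for the example.

   VNID_BITS = 24
   MAX_VNIDS = 2 ** VNID_BITS       # 16,777,216 virtual networks, versus 4094 usable VLANs

   forwarding_table = {
       # Bridged VNIs are keyed by MAC address; routed VNIs by IP address.
       # The same inner address may appear in different VNIs without conflict.
       (0x100001, "52:54:00:aa:bb:01"): "egress-switch-A",
       (0x100002, "52:54:00:aa:bb:01"): "egress-switch-B",
       (0x200001, "10.1.1.5"):          "egress-switch-C",
   }

   def lookup(vnid, inner_address):
       # Address disambiguation: the VNID scopes the inner address.
       return forwarding_table.get((vnid, inner_address))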

4.  Network Overlays

To address the problems of decoupling physical and logical configuration and allowing VM mobility without exploding the forwarding table sizes in the switches and routers, a network overlay model can be used.

The idea behind an overlay is quite straightforward. The original frame is encapsulated by the first-hop network device. The encapsulation identifies the destination as the device that will perform the decapsulation before delivering the frame to the endpoint. The rest of the network forwards the frame based on the encapsulation header and can be oblivious to the payload that is carried inside. Note that the first-hop network device can be a traditional switch or router or the virtual switch residing inside a hypervisor, and the endpoint can be a VM or a physical server. Some examples of network overlays are tunnels such as IP GRE [RFC2784], LISP [I-D.ietf-lisp] or TRILL [RFC6325].

With an overlay, the VNID can be carried within the overlay header so that every frame has its VNID explicitly identified. Since both routed and bridged semantics can be supported by a virtual data center, the original frame carried within the overlay header can be an Ethernet frame complete with MAC addresses, or just an IP packet.

4.1.  Benefits of an Overlay Approach

The use of a large (e.g., 24-bit) VNID would allow 16 million distinct virtual networks within a single data center, eliminating current VLAN size limitations. This VNID needs to be carried in the data plane along with the packet. Adding an overlay header provides a place to carry this VNID.

A key aspect of overlays is the decoupling of the "virtual" MAC and IP addresses used by VMs from the physical network infrastructure and the infrastructure IP addresses used by the data center. If a VM changes location, the switches at the edge of the overlay simply update their mapping tables to reflect the new location of the VM within the data center's infrastructure space. Because an overlay network is used, a VM can now be located anywhere in the data center that the overlay reaches, without regard to traditional constraints implied by L2 properties such as VLAN numbering or the span of an L2 broadcast domain scoped to a single pod or access switch.

Multi-tenancy is supported by isolating the traffic of one virtual network instance from traffic of another. Traffic from one virtual network instance cannot be delivered to another instance without (conceptually) exiting the instance and entering the other instance via an entity that has connectivity to both virtual network instances. Without such an entity, tenant traffic remains isolated within each individual virtual network instance. External communication (from a VM within a virtual network instance to a machine outside of any virtual network instance, e.g., on the Internet) is handled by having an ingress switch forward traffic to an external router: an egress switch decapsulates the tunneled packet and delivers it to the router for normal processing. This router is external to the overlay and behaves much like existing external-facing routers in data centers today.

Overlays are designed to allow a set of VMs to be placed within a single virtual network instance, whether that virtual network is bridged or routed.
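
The sketch below illustrates the data path just described: the ingress device maps the inner destination to the infrastructure address of the egress device, adds an overlay header carrying the VNID, and the egress device removes it again. It is a minimal sketch under assumed names and addresses; the "header" is a placeholder, not a definition of any particular encapsulation.

   # Illustrative sketch of the overlay data path; names and addresses are
   # assumptions, and the header below is a placeholder rather than a format.

   mapping_table = {
       # (VNID, inner destination MAC) -> infrastructure IP of the egress device
       (0x100001, "52:54:00:aa:bb:02"): "192.0.2.20",
   }

   def encapsulate(vnid, inner_frame, inner_dst_mac, local_ip):
       egress_ip = mapping_table[(vnid, inner_dst_mac)]
       outer_header = {"src": local_ip, "dst": egress_ip, "vnid": vnid}
       # The underlay forwards on the outer header only and never inspects
       # the inner frame.
       return (outer_header, inner_frame)

   def decapsulate(packet):
       outer_header, inner_frame = packet
       return outer_header["vnid"], inner_frame

   # If the VM moves, only the edge mapping changes; the VM keeps its inner
   # MAC and IP addresses (Section 4.1).
   mapping_table[(0x100001, "52:54:00:aa:bb:02")] = "192.0.2.99"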

4.2.  Standardization Issues for Overlay Networks

4.2.1.  Overlay Header Format

Different overlay header formats are possible, as are different possible encodings of the VNID. Existing overlay headers may be extended or new ones defined. This document does not address the exact header format or VNID encoding except to state that any solution MUST:

1. Carry the VNID in each frame.

2. Allow the payload to be either a complete Ethernet frame or only an IP packet.

4.2.2.  Fragmentation

Whenever tunneling is used, one faces the potential problem that the packet plus the encapsulation overhead will exceed the MTU of the path to the egress router. If the outer encapsulation is IP, fragmentation could be left to the IP layer, or it could be done at the overlay level in a more optimized fashion that is independent of the overlay encapsulation header, or it could be left out altogether, if it is believed that data center networks can be engineered to prevent MTU issues from arising.

Related to fragmentation is the question of how best to handle Path MTU issues, should they occur. Ideally, the original source of any packet (i.e., the sending VM) would be notified of the optimal MTU to use. Path MTU problems occurring within an overlay network would result in ICMP MTU exceeded messages being sent back to the ingress tunnel switch at the entry point of the overlay. If that switch is embedded within a hypervisor, the hypervisor could notify the VM of a more appropriate MTU to use. It may be appropriate to specify a set of best practices for implementers related to the handling of Path MTU issues.
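
As a purely hypothetical illustration of the two requirements in Section 4.2.1 and the MTU arithmetic in Section 4.2.2, the sketch below packs a 24-bit VNID and a payload-type flag into an assumed 4-byte overlay header and checks the resulting packet size against a path MTU. None of the field sizes, values, or names are defined by this document.

   # Hypothetical encoding: 1 byte of flags/payload type + 3 bytes of VNID.
   # Everything here is an assumption for illustration, not a proposal.
   import struct

   PAYLOAD_ETHERNET = 0
   PAYLOAD_IP = 1

   OUTER_IP_UDP_OVERHEAD = 20 + 8     # assumed IPv4 + UDP outer encapsulation
   OVERLAY_HEADER_LEN = 4

   def build_overlay_header(vnid, payload_type):
       if not 0 <= vnid < 2 ** 24:
           raise ValueError("VNID must fit in 24 bits")
       return struct.pack("!B3s", payload_type, vnid.to_bytes(3, "big"))

   def parse_overlay_header(header):
       payload_type, vnid_bytes = struct.unpack("!B3s", header)
       return payload_type, int.from_bytes(vnid_bytes, "big")

   def fits_in_path_mtu(inner_frame_len, path_mtu=1500):
       # Section 4.2.2: the inner frame plus all encapsulation overhead must
       # fit, or fragmentation / Path MTU handling is needed.
       return inner_frame_len + OVERLAY_HEADER_LEN + OUTER_IP_UDP_OVERHEAD <= path_mtu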

4.2.3.  Checksums and FCS

When tunneling packets, both the inner and outer headers could have their own checksum, duplicating effort and impacting performance. Therefore, we strongly recommend that any solution carry only one checksum or frame FCS.

When the inner packet is TCP or UDP, it already includes its own checksum, and adding a second outer checksum (using the same 1's complement algorithm) provides little value. Similarly, if the inner packet is an Ethernet frame, the frame FCS protects the original frame, and a new frame FCS over both the original frame and the overlay header protects the new encapsulated frame.

In IPv4, UDP checksums can be disabled on a per-packet basis simply by setting the checksum field to zero. IPv6, however, specifies that UDP checksums must always be included. But even for IPv6, the LISP protocol [I-D.ietf-lisp] already allows a zero checksum field. The 6man working group is also currently considering relaxing the IPv6 UDP checksum requirement [I-D.ietf-6man-udpzero].

For Ethernet frames, L2 overlays such as TRILL already mandate only a single frame FCS.

4.2.4.  Middlebox Traversal

One issue to consider is whether the overlay will need to run over networks that include middleboxes such as NATs. Middleboxes may have difficulty properly supporting multicast or other aspects of an overlay header. Inside a data center, it may well be the case that middlebox traversal is a non-issue. But if overlays are extended across the broader Internet, the presence of middleboxes may be of concern.

4.2.5.  OAM

Successful deployment of an overlay approach will likely require appropriate Operations, Administration and Maintenance (OAM) facilities.

5.  Control Plane

The control plane needs to address at least the following pieces:

1. A mechanism to populate the forwarding table of a virtual network instance.

2. A mechanism to handle multi-destination frames within a virtual network instance.

3. A mechanism to allow an endpoint to inform the access switch which virtual network instance it wishes to join on a virtual network interface.

4. A mechanism to allow an endpoint to inform the access switch that it is leaving the network, so that the access switch can clean up state.

5.1.  Populating the Forwarding Table of a Virtual Network Instance

When an access switch has to forward a frame from one endpoint to another across the network, it has to consult some form of forwarding table. When network overlays are used, the problem boils down to deriving the mapping between the inner and outer addresses, i.e., deriving the destination address in the overlay header based on the destination address sent by the endpoint. Two well-known mechanisms for populating the forwarding table (or deriving the mapping table) of a switch are (i) via a routing control protocol and (ii) learning from the data plane as Ethernet bridges do. Another mechanism is a centralized mapping database. Any solution must avoid problems associated with scaling a virtual network instance across a large data center.

5.2.  Handling Multi-destination Frames

Another aspect of address mapping concerns the handling of multi-destination frames, i.e., broadcast and multicast frames, or the delivery of unicast packets when no mapping exists. Associating an infrastructure multicast address with each VNID is one possible way of connecting together all the machines belonging to the same VNID. However, existing multicast implementations do not scale to efficiently handle hundreds of thousands of multicast groups, as would be required if one multicast group were assigned to each VNID.

5.3.  Associating a VNID With An Endpoint

When an endpoint, such as a VM or physical server, connects to the infrastructure, we must define a mechanism to allow the endpoint to identify to the access switch the virtual network instance that it wishes to join. Typically, it is a virtual NIC (the one connected to the VM) coming up that triggers this association. The access switch can then determine the VNID to be associated with this virtual NIC. A standard protocol that all types of overlay encapsulation points can use to identify the VNID associated with an endpoint would be beneficial for supporting multi-vendor implementations. This protocol could also be used to distribute any per-virtual-network information (e.g., a multicast group address). This signaling can provide the stimulus that triggers the overlay termination points to perform any actions needed within the infrastructure network (e.g., use IGMP to join a multicast group).

5.4.  Disassociating a VNID on Termination or Move

To enable cleaning up state in the access switch, we must define a mechanism to allow an endpoint to signal its disconnection from the network.
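
A minimal sketch of the association and disassociation behavior described in Sections 5.2 through 5.4 is shown below. The message names, per-VNI information, and the IGMP step are assumptions for illustration; this document does not define such a protocol.

   # Hypothetical access-switch state for endpoint association; everything
   # here is illustrative and not defined by this document.

   class AccessSwitch:
       def __init__(self, per_vni_info):
           self.vnic_to_vnid = {}            # state per attached virtual NIC
           self.per_vni_info = per_vni_info  # e.g., {vnid: multicast group}

       def associate(self, vnic_id, vnid):
           # Triggered when a virtual NIC comes up (Section 5.3): record the
           # VNID and, if the VNI uses an infrastructure multicast group for
           # multi-destination frames (Section 5.2), join it (e.g., via IGMP).
           self.vnic_to_vnid[vnic_id] = vnid
           group = self.per_vni_info.get(vnid)
           if group:
               self.join_multicast_group(group)

       def disassociate(self, vnic_id):
           # Triggered when the endpoint terminates or moves (Section 5.4),
           # so that stale state is removed from the access switch.
           self.vnic_to_vnid.pop(vnic_id, None)

       def join_multicast_group(self, group):
           pass  # placeholder for an IGMP join in a real implementation

   switch = AccessSwitch({0x100001: "239.1.1.1"})
   switch.associate("vnic-7", 0x100001)
   switch.disassociate("vnic-7")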

6.  Related Work

6.1.  ARMD

ARMD is chartered to look at data center scaling issues with a focus on address resolution. ARMD is currently chartered to develop a problem statement and is not currently developing solutions. While an overlay-based approach may address some of the "pain points" that have been raised in ARMD (e.g., better support for multi-tenancy), an overlay approach may also push some of the L2 scaling concerns (e.g., excessive flooding) to the IP level (flooding via IP multicast). Analysis will be needed to understand the scaling trade-offs of an overlay-based approach compared with existing approaches. On the other hand, existing IP-based approaches such as proxy ARP may help mitigate some concerns.

6.2.  TRILL

TRILL is an L2-based approach aimed at improving deficiencies and limitations of current Ethernet networks. Approaches to extend TRILL to support more than 4094 VLANs are currently under investigation [I-D.eastlake-trill-rbridge-fine-labeling].

6.3.  L2VPNs

The IETF has specified a number of approaches for connecting L2 domains together as part of the L2VPN Working Group. That group, however, has historically been focused on provider-provisioned L2 VPNs, where the service provider participates in the management and provisioning of the VPN. In addition, much of the target environment for such deployments involves carrying L2 traffic over WANs. Overlay approaches are intended to be used within data centers, where the overlay network is managed by the data center operator rather than by an outside party. While overlays can run across the Internet as well, they will extend well into the data center itself (e.g., up to and including hypervisors) and include large numbers of machines within the data center itself.

Other L2VPN approaches, such as L2TP [RFC2661], require significant tunnel state at the encapsulating and decapsulating endpoints. Overlays require less tunnel state than such approaches, which is important to allow overlays to scale to hundreds of thousands of endpoints. It is assumed that smaller switches (i.e., virtual switches in hypervisors or the physical switches to which VMs connect) will be part of the overlay network and be responsible for encapsulating and decapsulating packets.

6.4.  Proxy Mobile IP

Proxy Mobile IP [RFC5213] [RFC5844] makes use of the GRE Key Field [RFC5845] [RFC6245], but not in a way that supports multi-tenancy.

6.5.  LISP

LISP [I-D.ietf-lisp] essentially provides an IP-over-IP overlay in which the inner addresses are end-station identifiers and the outer IP addresses represent the location of the end station within the core IP network topology. The LISP overlay header includes a 24-bit Instance ID used to support overlapping inner IP addresses.

6.6.  Individual Submissions

Many individual submissions also aim to address some or all of the issues discussed in this draft. Examples of such drafts are VXLAN [I-D.mahalingam-dutt-dcops-vxlan], NVGRE [I-D.sridharan-virtualization-nvgre] and Virtual Machine Mobility in L3 Networks [I-D.wkumari-dcops-l3-vmmobility].

7.  Further Work

It is believed that overlay-based approaches may be able to reduce the overall amount of flooding and other multicast- and broadcast-related traffic (e.g., ARP and ND) experienced within current data centers with a large flat L2 network.
Further analysis is needed to characterize expected improvements.

8.  Summary

This document has argued that network virtualization using L3 overlays addresses a number of issues being faced as data centers scale in size. In addition, careful consideration of a number of issues would lead to the development of interoperable implementations of virtualization overlays.

9.  Acknowledgments

Helpful comments and improvements to this document have come from Ariel Hendel, Vinit Jain, and Benson Schliesser.

10.  IANA Considerations

This memo includes no request to IANA.

11.  Security Considerations

TBD

12.  Informative References

   [I-D.eastlake-trill-rbridge-fine-labeling]
              Eastlake, D., Zhang, M., Agarwal, P., Dutt, D., and R. Perlman, "RBridges: Fine-Grained Labeling", draft-eastlake-trill-rbridge-fine-labeling-02 (work in progress), October 2011.

   [I-D.hasmit-otv]
              Grover, H., Rao, D., Farinacci, D., and V. Moreno, "Overlay Transport Virtualization", draft-hasmit-otv-03 (work in progress), July 2011.

   [I-D.ietf-6man-udpzero]
              Fairhurst, G. and M. Westerlund, "IPv6 UDP Checksum Considerations", draft-ietf-6man-udpzero-04 (work in progress), October 2011.

   [I-D.ietf-lisp]
              Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, "Locator/ID Separation Protocol (LISP)", draft-ietf-lisp-15 (work in progress), July 2011.

   [I-D.mahalingam-dutt-dcops-vxlan]
              Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, L., Sridhar, T., Bursell, M., and C. Wright, "VXLAN: A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks", draft-mahalingam-dutt-dcops-vxlan-00 (work in progress), August 2011.

   [I-D.sridharan-virtualization-nvgre]
              Sridharan, M., Duda, K., Ganga, I., Greenberg, A., Lin, G., Pearson, M., Thaler, P., Tumuluri, C., Venkataramaiah, N., and Y. Wang, "NVGRE: Network Virtualization using Generic Routing Encapsulation", draft-sridharan-virtualization-nvgre-00 (work in progress), September 2011.

   [I-D.wkumari-dcops-l3-vmmobility]
              Kumari, W. and J. Halpern, "Virtual Machine mobility in L3 Networks", draft-wkumari-dcops-l3-vmmobility-00 (work in progress), August 2011.

   [RFC2661]  Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", RFC 2661, August 1999.

   [RFC2784]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, March 2000.

   [RFC2890]  Dommety, G., "Key and Sequence Number Extensions to GRE", RFC 2890, September 2000.

   [RFC5213]  Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K., and B. Patil, "Proxy Mobile IPv6", RFC 5213, August 2008.

   [RFC5844]  Wakikawa, R. and S. Gundavelli, "IPv4 Support for Proxy Mobile IPv6", RFC 5844, May 2010.

   [RFC5845]  Muhanna, A., Khalil, M., Gundavelli, S., and K. Leung, "Generic Routing Encapsulation (GRE) Key Option for Proxy Mobile IPv6", RFC 5845, June 2010.

   [RFC6245]  Yegani, P., Leung, K., Lior, A., Chowdhury, K., and J. Navali, "Generic Routing Encapsulation (GRE) Key Extension for Mobile IPv4", RFC 6245, May 2011.

   [RFC6325]  Perlman, R., Eastlake, D., Dutt, D., Gai, S., and A. Ghanwani, "Routing Bridges (RBridges): Base Protocol Specification", RFC 6325, July 2011.

Authors' Addresses

   Thomas Narten (editor)
   IBM

   Email: narten@us.ibm.com

   Murari Sridharan
   Microsoft

   Email: muraris@microsoft.com

   Dinesh Dutt
   Cisco

   Email: ddutt@cisco.com

   David Black
   EMC

   Email: david.black@emc.com

   Lawrence Kreeger
   Cisco

   Email: kreeger@cisco.com