L2VPN Working Group                                          Nabil Bitar
Internet Draft                                                   Verizon
Intended status: Informational
Expires: November 2012                                      Florin Balus
                                                           Marc Lasserre
                                                          Wim Henderickx
                                                          Alcatel-Lucent

                                                             Ali Sajassi
                                                              Luyuan Fang
                                                                    Cisco

                                                           Yuichi Ikejiri
                                                       NTT Communications

                                                            Mircea Pisica
                                                                       BT

                                                             May 18, 2012

          Cloud Networking: Framework and VPN Applicability
           draft-bitar-datacenter-vpn-applicability-02.txt

Status of this Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

   This Internet-Draft will expire on November 18, 2012.
Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Abstract

   Cloud Computing has been attracting a lot of attention from the networking industry. Some of the most publicized requirements are related to the evolution of the Cloud Networking Infrastructure to accommodate a large number of tenants, efficient network utilization, scalable loop avoidance, and Virtual Machine Mobility.

   This draft describes a framework for cloud networking, highlighting the applicability of existing work in various IETF Working Groups (e.g., RFCs and drafts developed in the IETF L2VPN and L3VPN Working Groups) to cloud networking, and the gaps and problems that need to be further addressed. That is, the goal is to understand what can be re-used from current protocols, and to call out requirements specific to the Cloud space that need to be addressed by new standardization work, with proposed solutions in certain cases.

Table of Contents

   1. Introduction
   2. General terminology
      2.1. Conventions used in this document
   3. Brief overview of Ethernet, L2VPN and L3VPN deployments
   4. Cloud Networking Framework
   5. DC problem statement
      5.1. VLAN Space
      5.2. MAC, IP, ARP Explosion
      5.3. Per VLAN flood containment
      5.4. Convergence and multipath support
      5.5. Optimal traffic forwarding
      5.6. Efficient multicast support
      5.7. Connectivity to existing VPN sites
      5.8. DC Inter-connect requirements
      5.9. L3 virtualization considerations
      5.10. VM Mobility requirements
   6. L2VPN Applicability to Cloud Networking
      6.1. VLANs and L2VPN toolset
      6.2. PBB and L2VPN toolset
         6.2.1. Addressing VLAN space exhaustion and MAC explosion
         6.2.2. Fast convergence, L2 multi-pathing
         6.2.3. Per ISID flood containment
         6.2.4. Efficient multicast support
         6.2.5. Tunneling options for PBB ELAN: Ethernet, IP, MPLS
         6.2.6. Use Case examples
            6.2.6.1. PBBN in DC, L2VPN in DC GW
            6.2.6.2. PBBN in VSw, L2VPN in the ToR
         6.2.7. Connectivity to existing VPN sites and Internet
         6.2.8. DC Interconnect
         6.2.9. Interoperating with existing DC VLANs
      6.3. TRILL and L2VPN toolset
   7. L3VPN applicability to Cloud Networking
   8. Solutions for other DC challenges
      8.1. Addressing IP/ARP explosion
      8.2. Optimal traffic forwarding
      8.3. VM Mobility
   9. Security Considerations
   10. IANA Considerations
   11. References
      11.1. Normative References
      11.2. Informative References
   12. Acknowledgments

1. Introduction

   The initial Data Center (DC) networks were built to address the needs of individual enterprises and/or individual applications. Ethernet VLANs and regular IP routing are used to provide connectivity between compute and storage resources and the related customer sites.

   The virtualization of compute resources in a DC environment provides the foundation for selling compute and storage resources to multiple customers, or selling application services to multiple customers. For example, a customer may buy a group of Virtual Machines (VMs) that may reside on server blades distributed throughout a DC or across DCs. In the latter case, the DCs may be owned and operated by a cloud service provider connected to one or more network service providers, two or more cloud service providers each connected to one or more network service providers, or a hybrid of DCs operated by the customer and the cloud service provider(s). In addition, multiple customers may be assigned resources on the same compute and storage hardware.

   In order to provide access for multiple customers to the virtualized compute and storage resources, the DC network and DC interconnect have to evolve from the basic VLAN and IP routing architecture to provide equivalent connectivity virtualization at a large scale.

   This document describes in separate sections the existing DC networking architecture, the challenges faced by existing DC network models, and the applicability of VPN technologies to address such challenges. In addition, challenges not addressed by existing solutions are called out to describe the problem or to suggest solutions.

2. General terminology

   Some general terminology is defined here; most of the terminology used is from [802.1ah] and [RFC4026]. Terminology specific to this memo is introduced as needed in later sections.

   DC: Data Center

   ELAN: MEF ELAN, multipoint-to-multipoint Ethernet service

   EVPN: Ethernet VPN as defined in [EVPN]

   PBB: Provider Backbone Bridging, a new Ethernet encapsulation designed to address VLAN exhaustion and MAC explosion issues; specified in IEEE 802.1ah [802.1ah]

   PBB-EVPN: defines how EVPN can be used to transport PBB frames

   BMAC: Backbone MACs, the backbone source or destination MAC address fields defined in the 802.1ah provider MAC encapsulation header.

   CMAC: Customer MACs, the customer source or destination MAC address fields defined in the 802.1ah customer MAC encapsulation header.

   BEB: A Backbone Edge Bridge positioned at the edge of a provider backbone bridged network. It is usually the point in the network where PBB encapsulation is added to or removed from the frame.
   BCB: A Backbone Core Bridge positioned in the core of a provider backbone bridged network. It performs regular Ethernet switching using the outer Ethernet header.

2.1. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

   In this document, these words will appear with that interpretation only when in ALL CAPS. Lower case uses of these words are not to be interpreted as carrying RFC 2119 significance.

3. Brief overview of Ethernet, L2VPN and L3VPN deployments

   Initial Ethernet networks were deployed in LAN environments, where the total number of hosts (hence MAC addresses) to manage was limited. Physical Ethernet topologies in LANs were pretty simple. Hence, a simple loop resolution protocol such as the Spanning Tree Protocol (STP) was sufficient in the early days. Efficient utilization of physical links was not a major concern in LANs, and the approach leveraged existing and mature technologies.

   As more hosts got connected to a LAN, or the need arose to create multiple LANs on the same physical infrastructure, it became necessary to partition the physical topology into multiple Virtual LANs (VLANs). STP evolved to cope with multiple VLANs with Multiple-STP (MSTP). Bridges/switches evolved to learn behind which VLAN specific MACs resided, a process known as qualified learning. As Ethernet LANs moved into the provider space, the 12-bit VLAN space limitation (i.e., a total of 4K VLANs) led to Q-in-Q and later to Provider Backbone Bridging (PBB).

   With PBB, not only can over 16M virtual LAN instances (24-bit Service I-SID) be supported, but a clean separation between customer and provider domains has been defined with separate MAC address spaces (Customer-MACs (CMACs) versus Provider Backbone-MACs (BMACs)). CMACs are only learned at the edge of the PBB network on PBB Backbone Edge Bridges (BEBs) in the context of an I-component, while only BMACs are learned by PBB Backbone Core Bridges (BCBs). This results in BEB switches creating MAC-in-MAC tunnels to carry customer traffic, thereby hiding CMACs in the core.

   In the meantime, interconnecting L2 domains across geographical areas has become a necessity. VPN technologies have been defined to carry both L2 and L3 traffic across IP/MPLS core networks. The same technologies could also be used within the same data center to provide for scale or for interconnecting services across L3 domains, as needed. Virtual Private LAN Service (VPLS) has been playing a key role in providing transparent LAN services over IP/MPLS WANs, while IP VPNs, including BGP/MPLS IP VPNs and IPsec VPNs, have been used to provide virtual IP routing instances over a common IP/MPLS core network.

   All these technologies have been combined to maximize their respective benefits. At the edge of the network, such as in access networks, VLAN and PBB are commonly used technologies. Aggregation networks typically use VPLS or BGP/MPLS IP VPNs to groom traffic on a common IP/MPLS core.
   It should be noted that Ethernet has kept evolving because of its attractive features, specifically its auto-discovery capabilities and the ability of hosts to physically relocate on the same LAN without requiring renumbering. In addition, Ethernet switches have become a commodity, creating a financial incentive for interconnecting hosts in the same community with Ethernet switches. The network layer (layer3), on the other hand, has become predominantly IP. Thus, communication across LANs uses IP routing.

4. Cloud Networking Framework

   A generic architecture for Cloud Networking is depicted in Figure 1:

      Figure 1: A Generic Architecture for Cloud Networking
      (an IP/MPLS WAN at the top connects to a redundant pair of DC
      GWs; the GWs connect to Core SW/Rtr nodes, which aggregate ToRs,
      which in turn aggregate the VSws running on the server blades).

   A cloud network is composed of intra-Data Center (DC) networks and network services, and inter-DC network connectivity. DCs may belong to a cloud service provider connected to one or more network service providers, different cloud service providers each connected to one or more network service providers, or a hybrid of DCs operated by the enterprise customers and the cloud service provider(s). It may also provide access to the public and/or enterprise customers.

   The following network components are present in a DC:

   - VSw or virtual switch: a software-based Ethernet switch running inside the server blades. A VSw may be single- or dual-homed to the Top of Rack switches (ToRs). The individual VMs appear to a VSw as IP hosts connected via logical interfaces. The VSw may evolve to support IP routing functionality.

   - ToR or Top of Rack: a hardware-based Ethernet switch aggregating all Ethernet links from the server blades in a rack, representing the entry point into the physical DC network for the hosts. ToRs may also perform routing functionality. ToRs are usually dual-homed to the Core SWs. Other deployment scenarios may use an EoR (End of Row) switch to provide a similar function as a ToR.

   - Core SW (switch): a high-capacity core node aggregating multiple ToRs. This is usually a cost-effective Ethernet switch. Core switches can also support routing capabilities.

   - DC GW: the gateway to the outside world providing DC Interconnect and connectivity to Internet and VPN customers. In the current DC network model, this may be a router with virtual routing capabilities and/or an IPVPN/L2VPN PE.

   A DC network also contains other network services, such as firewalls, load-balancers, IPsec gateways, and SSL acceleration gateways. These network services are not currently discussed in this draft as the focus is on the routing and switching services. The usual DC deployment employs VLANs to isolate different VM groups throughout the Ethernet switching network within a DC. The VM groups are mapped to VLANs in the VSws. The ToRs and Core SWs may employ VLAN trunking to eliminate provisioning touches in the DC network. In some scenarios, IP routing is extended down to the ToRs, and may be further extended to the hypervisor.
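   As an informal illustration of the mapping just described, the following Python sketch models a VSw that assigns VM groups to 12-bit VLAN IDs. The tenant names and values are hypothetical, and the model is deliberately minimal; it is not taken from any implementation:

      # Minimal model of VM-group-to-VLAN assignment on a virtual switch (VSw).
      # Illustrative only; names and values are not from any deployment.

      MAX_VLANS = 4094  # usable 12-bit VLAN IDs (1-4094)

      class VSw:
          def __init__(self):
              self.vlan_by_group = {}   # (tenant, vm_group) -> VLAN ID
              self.next_vlan = 1

          def attach_vm_group(self, tenant, vm_group):
              key = (tenant, vm_group)
              if key in self.vlan_by_group:
                  return self.vlan_by_group[key]
              if self.next_vlan > MAX_VLANS:
                  # This is the exhaustion problem discussed in Section 5.1.
                  raise RuntimeError("12-bit VLAN space exhausted")
              self.vlan_by_group[key] = self.next_vlan
              self.next_vlan += 1
              return self.vlan_by_group[key]

      vsw = VSw()
      print(vsw.attach_vm_group("tenant1", "elan11"))  # -> 1
      print(vsw.attach_vm_group("tenant1", "elan12"))  # -> 2

   The sketch highlights the per-DC ceiling of roughly 4K VM groups when isolation relies on VLAN tags alone, which motivates the VLAN space discussion in Section 5.1.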
   Any new DC and cloud networking technology needs to fit as seamlessly as possible with this existing DC model, at least in a non-greenfield environment. In particular, it should be possible to introduce enhancements to the various tiers in this model in a phased approach without disrupting the other elements.

   Depending upon the scale, DC distribution, operations model, and Capex and Opex aspects, DC switching elements can act as strict L2 switches and/or provide IP routing capabilities, including VPN routing and/or MPLS support. In smaller DCs, it is likely that some tier layers will be collapsed, and that Internet connectivity, inter-DC connectivity and VPN support will be handled by Core Nodes which perform the DC GW role.

   The DC network architecture described in this section can be used to provide generic L2-L3 service connectivity to each tenant as depicted in Figure 2:

      Figure 2: Logical Service connectivity for one tenant
      (VRF1 and VRF2 on the DC GWs interconnect three Ethernet
      multipoint domains for Tenant1 - ELAN11, ELAN12 and ELAN13 -
      each attaching a group of VMs).

   In this example, one or more virtual routing contexts distributed on multiple DC GWs and one or more ELANs (e.g., one per application) running on DC switches are assigned to DC tenant 1. ELAN is a generic term for an Ethernet multipoint service, which in the current DC environment is implemented using 12-bit VLAN tags. Other possible ELAN technologies are discussed in Section 6.

   For a multi-tenant DC, this type of service connectivity or a variation could be used for each tenant. In some cases only L2 connectivity is required, i.e., only an ELAN may be used to interconnect VMs and customer sites.

5. DC problem statement

   This section summarizes the challenges faced with the present mode of operation described in the previous section and implicitly describes the requirements for the next generation DC network.

   With the introduction of compute virtualization, the DC network must support multiple customers or tenants that need access to their respective computing and storage resources, in addition to making some aspect of the service available to other businesses in a B-to-B model or to the public. Every tenant requires service connectivity to its own resources with secure separation from other tenant domains. Connectivity needs to support various deployment models, including interconnecting customer-hosted data center resources to cloud service provider hosted resources (a virtualized DC for the customer). This connectivity may be at layer2 or layer3.

   Currently, large DCs are often built on a service architecture where VLANs configured in Ethernet edge and core switches are interconnected by IP routing running in a few centralized routers. There may be some cases, though, where IP routing might be used in the core nodes or even in the ToRs inside a DC.

5.1. VLAN Space

   Existing DC deployments provide customer separation and flood containment, including support for DC infrastructure interconnectivity, using Ethernet VLANs. A 12-bit VLAN tag provides support for a maximum of 4K VLANs.

   4K VLANs are inadequate for a Cloud Provider looking to expand its customer base.
For example, there are a number of VPN deployments (VPLS and IP VPN) which serve more than 20K customers, each requiring multiple VLANs. Thus, 4K VLANs will likely support fewer than 4K customers.

   The cloud networking infrastructure needs to provide support for a much bigger number of virtual L2 domains.

5.2. MAC, IP, ARP Explosion

   Virtual Machines are the basic compute blocks being sold to Cloud customers. Today, every server blade supports 16-40 VMs, with 100 or more VMs per server blade coming in the near future. Every VM may have multiple interfaces for provider and enterprise management, VM mobility and tenant access, each with its own MAC and IP addresses. For a sizable DC, this may translate into millions of VM IP and MAC addresses. From a cloud network viewpoint, this scale number will be an order of magnitude higher.

   Supporting this number of IP and MAC addresses, including the associated dynamic behavior (e.g., ARP), throughout the DC Ethernet switches and routers is very challenging in an Ethernet VLAN and regular routing environment. Core Ethernet switches running Ethernet VLANs learn the MAC addresses for every single VM interface that sends traffic through that switch. Throwing memory at the problem to increase the MAC Forwarding DataBase (FDB) size affects the cost of these switches. In addition, as the number of MACs that switches need to learn increases, convergence time could increase, and flooding activity will increase upon a topology change as the core switches flush and re-learn the MAC addresses. Simple operational mistakes may lead to duplicate MAC entries within the same VLAN domain and to security issues, given the administrative MAC assignment used today for VM interfaces. Similar concerns about memory requirements and related cost apply to DC Edge switches (ToRs/EoRs) and DC GWs.

   From a router perspective, it is important to maximize the utilization of available resources in both the control and data planes through flexible mapping of VMs and related VLANs to routing interfaces. This is not easily done in the current VLAN-based deployment environment, where the use of VLAN trunking limits the allocation of VMs to only local routers.

   The amount of ARP traffic grows linearly with the number of hosts on a LAN. For 1 million VM hosts, it can be expected that the amount of ARP traffic will be in the range of half a million ARPs per second at the peak, which corresponds to over 200 Mbps of ARP traffic [MYERS]. Similarly, on a server, the amount of ARP traffic grows linearly with the number of virtual L2 domains/ELANs instantiated on that server and the number of VMs in each such domain. Besides the link capacity wasted, which may be small compared to the link capacities deployed in DCs, the computational burden may be prohibitive. In a large-DC environment, the large number of hosts and the distribution of ARP traffic may lead to a number of challenges:

   . Processing overload and exhaustion of ARP entries on the Server/Hypervisor. This is caused by the increased number of VMs per server blade and the size of the related ELAN domains. For example, a server blade with 100 VMs, each in a separate L2 domain with 100 VMs each, would need to support 10K ARP entries and the associated ARP processing while performing its other compute tasks.
   . Processing overload and exhaustion of ARP entries on the Routers/PEs and any other L3 Service Appliances (Firewall (FW), Load-Balancer (LB), etc.). This issue is magnified by the L3 virtualization at the service gateways. For example, a gateway PE handling 10K ELANs, each with 10 VMs, will result in 100K hosts sending/receiving traffic to/from the PE, thus requiring the PE to learn 100K ARP entries. It should be noted that if the PE supports Integrated Routing and Bridging (IRB), it must support the associated virtual IP RIBs/FIBs and MAC FDBs for these hosts in addition to the ARP entries.

   . Flood explosion throughout the Ethernet switching network. This is caused by the use of VLAN trunking and implicitly by the lack of per-VPN flood containment.

   DC and DC-interconnect technologies that minimize the negative impact of ARP, MAC and IP entry explosion on individual network elements in a DC or cloud network hierarchy are needed.

5.3. Per VLAN flood containment

   From an operational perspective, DC operators try to minimize the provisioning touches required for configuring a VLAN domain by employing VLAN trunks on the L2 switches. This comes at the cost of flooding broadcast, multicast and unknown unicast frames outside of the boundaries of the actual VLAN domain.

   The cloud networking infrastructure needs to prevent unnecessary traffic from being sent/leaked to undesired locations.

5.4. Convergence and multipath support

   Spanning Tree is used in the current DC environment for loop avoidance in the Ethernet switching domain.

   STP can take 30 to 50 seconds to repair a topology. Practical experience shows that Rapid STP (RSTP) can also take multiple seconds to converge, such as when the root bridge fails.

   STP eliminates loops by disabling ports. The result is that only one path is used to carry traffic. The capacity of disabled links cannot be utilized, leading to inefficient use of resources.

   In a small DC deployment, multi-chassis LAG (MC-LAG) support may be sufficient initially to provide for loop-free redundancy as an STP alternative. However, in medium or large DCs it is challenging to use MC-LAGs alone across the network to provide for resiliency and loop-free paths without introducing a layer2 routing protocol, i.e., for multi-homing of server blades to ToRs, ToRs to Core SWs, and Core SWs to DC GWs. MC-LAG may work as a local mechanism, but it has no knowledge of the end-to-end paths, so it does not provide any degree of traffic steering across the network.

   Efficient and mature link-state protocols, such as IS-IS, provide rapid failover times, can compute optimal paths, and can fully utilize multiple parallel paths to forward traffic between two nodes in the network.

   Unlike OSPF, IS-IS runs directly at L2 (i.e., with no reliance on IP) and does not require any configuration. Therefore, IS-IS based DC networks are to be favored over STP-based networks. IEEE Shortest Path Bridging (SPB), based on IEEE 802.1aq and IEEE 802.1Qbp, and IETF TRILL [RFC6325] are technologies that enable Layer2 networks to use IS-IS for Layer2 routing.

5.5. Optimal traffic forwarding

   Optimal traffic forwarding requires (1) efficient utilization of all available link capacity in a DC and DC-interconnect, and (2) traffic forwarding on the shortest path between any two communicating VMs within the DC or across DCs.
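   Requirement (1) is typically met with the multipath forwarding discussed in Section 5.4. A minimal Python sketch of the flow-based hashing such technologies commonly use to spread traffic over equal-cost paths follows; the hash inputs, addresses and path count are illustrative assumptions, not a description of any particular implementation:

      # Minimal sketch of flow-based load balancing over equal-cost paths.
      # The 5-tuple fields and path count are illustrative assumptions.
      import hashlib

      def pick_path(src_ip, dst_ip, proto, src_port, dst_port, num_paths):
          """Hash the flow identifier so that all packets of one flow follow
          the same path while different flows spread over all paths."""
          flow = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
          digest = hashlib.sha256(flow).digest()
          return int.from_bytes(digest[:4], "big") % num_paths

      # Two different flows between the same pair of VMs may take different
      # parallel paths; any single flow always stays on one path.
      print(pick_path("10.0.0.1", "10.0.1.5", 6, 49152, 443, 4))
      print(pick_path("10.0.0.1", "10.0.1.5", 6, 49153, 443, 4))

   Keeping a flow pinned to one path avoids packet reordering while still spreading load, which is why both the L2 multipath technologies and L3 ECMP referenced in this section rely on flow-based criteria rather than per-packet spraying.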
   Optimizing traffic forwarding between any VM pair in the same virtual domain depends on (1) the placement of these VMs and their relative proximity from a network viewpoint, and (2) the technology used for computing the routing/switching path between these VMs. The latter is especially important in the context of VMotion, i.e., moving a VM from one network location to another while maintaining its layer2 and layer3 addresses.

   Ethernet-based forwarding between two VMs relies on the destination MAC address, which is unique per VM interface in the context of a virtual domain. In traditional IEEE technologies (e.g., 802.1ad, 802.1ah) and IETF L2VPN (i.e., VPLS), Ethernet MAC reachability is always learned in the data plane. That applies to both BMACs and CMACs. IETF EVPN [EVPN] supports CMAC learning in the control plane via BGP. In addition, with newer IEEE technologies (802.1aq and 802.1Qbp) and IETF PBB-EVPN [PBB-EVPN], BMAC reachability is learned in the control plane while CMACs are learned in the data plane at BEBs and tunneled in PBB frames. In all these cases, it is important that as a VM is moved from one location to another: (1) VM MAC reachability convergence happens fast to minimize traffic black-holing, and (2) forwarding takes the shortest path.

   IP-based forwarding relies on the destination IP address. ECMP load balancing relies on flow-based criteria. An IP host address is unique per VM interface. However, hosts on a LAN share a subnet mask, and IP routing entries are based on that subnet address. Thus, when VMs are on the same LAN and traditional forwarding takes place, these VMs forward traffic to each other by relying on ARP or IPv6 Neighbor Discovery to identify the MAC address of the destination, and on the underlying layer2 network to deliver the resulting MAC frame to its destination. However, when VMs, as IP hosts across layer2 virtual domains, need to communicate, they rely on the underlying IP routing infrastructure.

   In addition, when a DC is an all-IP DC, VMs are assigned a /32 host address in the IPv4 case, or a /64 or /128 host address in the IPv6 case, and rely on the IP routing infrastructure to route the IP packets among VMs. In this latter case, there is really no need for layer2 awareness beyond, potentially, the hypervisor switch at the server hosting the VM. In either case, when a VM moves location from one physical router to another while maintaining its IP identity (address), the underlying IP network must be able to route the traffic to the destination, and must be able to do that on the shortest path.

   Thus, in the case of IP address aggregation as in a subnet, optimality in traffic forwarding to a VM will require reachability to the VM host address rather than only the subnet. That is what is often referred to as punching a hole in the aggregate, at the expense of an increase in routing and forwarding table size.

   As in layer2, layer3 may capitalize on hierarchical tunneling to optimize the routing/FIB resource utilization at different places in the network.
If a hybrid of subnet-based routing and host-based routing (host-based routing here refers to hole-punching in the aggregate) is used, then during VMotion a routing transition can take place, and traffic may be routed to a location based on subnet reachability, or to a location where the VM used to be attached. In either of these cases, traffic must not be black-holed. It must be directed, potentially via tunneling, to the location where the VM is. This requires that the old routing gateway knows where the VM is currently attached. How to obtain that information can be based on different techniques with tradeoffs. However, this traffic triangulation is not optimal and must only exist in the transition period until the network converges to a shortest path to the destination.

5.6. Efficient multicast support

   STP bridges typically perform IGMP and/or PIM snooping in order to optimize multicast data delivery. However, this snooping is performed locally by each bridge following the STP topology, where all the traffic goes through the root bridge. This may result in sub-optimal multicast traffic delivery. In addition, each customer multicast group is associated with a forwarding tree throughout the Ethernet switching network. Solutions must provide for efficient Layer2 multicast. In an all-IP network, explicit multicast trees in the DC network can be built via multicast signaling protocols (e.g., PIM-SSM) that follow the shortest path between the destinations and source(s). In an IPVPN context, Multicast IPVPN based on [MVPN] can be used to build multicast trees shared among IPVPNs, specific to VPNs, and/or shared among multicast groups across IPVPNs.

5.7. Connectivity to existing VPN sites

   It is expected that cloud services will have to span larger geographical areas in the near future and that existing VPN customers will require access to VM and storage facilities for virtualized data center applications. Hence, DC network virtualization must interoperate with deployed and evolving VPN solutions, e.g., IP VPN, VPLS, VPWS, PBB-VPLS, E-VPN and PBB-EVPN.

5.8. DC Inter-connect requirements

   Cloud computing requirements such as VM Mobility across DCs, management connectivity, and support for East-West traffic between customer applications located in different DCs imply that inter-DC connectivity must be supported. These DCs can be part of a hybrid cloud operated by the cloud service provider(s) and/or the end-customers.

   Mature VPN technologies can be used to provide L2/L3 DC interconnect among VLANs/virtual domains located in different DCs.

5.9. L3 virtualization considerations

   In order to provide customer L3 separation while supporting overlapping IP addressing and privacy, a number of schemes have been implemented in the DC environment. Some of these schemes, such as double NATing, are operationally complex and prone to operator errors. Virtual Routing contexts (or Virtual Device contexts) or dedicated hardware routers are positioned in the DC environment as an alternative to these mechanisms. Every customer is assigned a dedicated routing context with associated control plane protocols. For instance, every customer gets an IP forwarding instance controlled by its own BGP and/or IGP routing.
Assigning virtual or hardware routers to each customer while supporting thousands of customers in a DC is neither scalable nor cost-efficient.

5.10. VM Mobility requirements

   The ability to move VMs within a resource pool, whether it is a local move within the same DC to another server or a move to a distant DC, offers multiple advantages in a number of scenarios, for example:

   - In the event of a possible natural disaster, moving VMs to a safe DC location decreases downtime and allows for meeting SLA requirements.

   - Optimized resource location: VMs can be moved to locations that offer significant cost reduction (e.g., power savings), or to locations close to the application users. They can also be moved simply to load-balance across different locations.

   When VMs change location, it is often important to maintain the existing client sessions. The VM MAC and IP addresses must be preserved, and the state of the VM sessions must be copied to the new location.

   Current VM mobility tools like VMware VMotion require L2 connectivity among the hypervisors on the servers participating in a VMotion pool. This is in addition to "tenant ELAN" connectivity, which provides for communication between the VM and the client(s).

   A VMotion ELAN might need to cross multiple DC networks to provide the required protection or load-balancing. In addition, in the current VMotion procedure, the new VM location must be part of the tenant ELAN domain. When the new VM is activated, a Gratuitous ARP is sent so that the MAC FIB entries in the "tenant ELAN" are updated to direct traffic destined to that VM to the new VM location. In addition, if a portion of the path requires IP forwarding, the VM reachability information must be updated to direct the traffic on the shortest path to the VM.

   VM mobility requirements may be addressed through the use of inter-DC VLANs for the VMotion and tenant ELANs. However, expanding "tenant VLANs" across two or more DCs will accelerate the VLAN exhaustion and MAC explosion issues. In addition, STP needs to run across DCs, leading to increased convergence times and the blocking of expensive WAN bandwidth. VLAN trunking used throughout the network creates indiscriminate flooding across DCs.

   L2VPN solutions over IP/MPLS are designed to interconnect sites located across the WAN.

6. L2VPN Applicability to Cloud Networking

   The following sections discuss different solution alternatives, re-using IEEE and IETF technologies to provide a gradual migration path from the current Ethernet switching VLAN-based model to more advanced Ethernet switching and IP/MPLS based models. This evolution is targeted at addressing inter-DC requirements, cost considerations and the efficient use of processing/memory resources on DC networking components.

6.1. VLANs and L2VPN toolset

   One approach to address some of the DC challenges discussed in the previous section is to gradually deploy additional technologies within existing DC networks. For example, an operator may start by breaking its DC VLAN domains into different VLAN islands so that each island can support up to 4K VLANs. VLAN domains can then be interconnected via VPLS using the DC GW as a VPLS PE. An ELAN service can be identified with one VLAN ID in one island and another VLAN ID in another island, with the appropriate VLAN ID processed at the GW.
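   As an informal illustration of the per-island VLAN identification just described, the following Python sketch models a DC GW acting as a VPLS PE that maps locally significant VLAN IDs in each island to a common ELAN service. The island names, VLAN IDs and service identifiers are hypothetical:

      # Sketch of VLAN-ID-to-ELAN mapping at a DC GW acting as a VPLS PE.
      # All identifiers below are illustrative, not from any deployment.

      # (island, local VLAN ID) -> ELAN service
      elan_by_island_vlan = {
          ("island-A", 100): "tenant1-elan11",
          ("island-B", 200): "tenant1-elan11",   # same ELAN, different local VLAN
          ("island-A", 101): "tenant2-elan21",
      }

      # (ELAN service, island) -> local VLAN ID to impose toward that island
      vlan_by_elan_island = {
          (service, island): vlan
          for (island, vlan), service in elan_by_island_vlan.items()
      }

      def translate(in_island, in_vlan, out_island):
          """VLAN ID to impose when a frame received on (in_island, in_vlan)
          is forwarded by the VPLS PE toward out_island."""
          service = elan_by_island_vlan[(in_island, in_vlan)]
          return vlan_by_elan_island[(service, out_island)]

      print(translate("island-A", 100, "island-B"))   # -> 200

   Each island can therefore reuse its full 4K VLAN space independently, at the cost of per-island VLAN management at the GW, which is one of the operational drawbacks noted below.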
   As the number of tenants in individual VLAN islands surpasses 4K, the operator could push the VPLS deployment deeper into the DC network. It is possible, in the end, to retain the existing VLAN-based solution only in the VSw and to provide L2VPN support starting at the ToRs. The ToR and DC core elements need to be MPLS enabled with existing VPLS solutions.

   However, this model does not solve the MAC explosion issue, as ToRs still need to learn VM MAC addresses. In addition, it requires management of both VLAN and L2VPN addressing and the mapping of service profiles. Per-VLAN, per-port and per-VPLS configurations are required at the ToR, increasing the time it takes to bring up service connectivity and complicating the operational model.

6.2. PBB and L2VPN toolset

   As highlighted in the problem statement section, the expected large number of VM MAC addresses in the DC calls for a VM MAC hiding solution, so that the ToRs and the Core Switches only need to handle a limited number of MAC addresses.

   PBB IEEE 802.1ah encapsulation is a standard L2 technique developed by IEEE to achieve this goal. It was also designed to address other limitations of VLAN-based encapsulations while maintaining the native Ethernet operational model deployed in the DC network.

   A conceptual PBB encapsulation is described in Figure 3 (for the detailed encapsulation see [802.1ah]):

                  +----------------+
        Backbone  | BMAC DA, SA    | 12B
        Ethernet  |----------------|
        Header    | BVID optional  |  4B
                  |----------------|
      Service ID  | PBB I-tag      |  6B
                  |----------------|
        Regular   | VM MAC DA, SA  |
        Payload   |----------------|
                  |                |
                  | VM IP Payload  |
                  |                |
                  +----------------+

                Figure 3: PBB encapsulation

   The original Ethernet packet, used in this example for inter-VM communication, is encapsulated in the following PBB header:

   - I-tag field: organized similarly to the 802.1Q VLAN tag; it includes the Ethertype, PCP and DEI bits, and a 24-bit ISID tag which replaces the 12-bit VLAN tag, extending the number of supported virtual L2 domains to 16 million. It should be noted that the PBB I-Tag also includes some reserved bits and, most importantly, the CMAC DA and SA. What is designated as 6 bytes in the figure is the I-tag information excluding the CMAC DA and SA.

   - An optional Backbone VLAN field (BVLAN) may be used if grouping of tenant domains is desired.

   - An outer Backbone MAC header contains the source and destination MAC addresses of the related server blades, assuming the PBB encapsulation is done at the hypervisor virtual switch on the server blade.

   - The total resulting PBB overhead added to the VM-originated Ethernet frame is 18 or 22 bytes, depending on whether the BVID is included.

   - Note that the original PBB encapsulation allows the use of a CVLAN and SVLAN in between the VM MACs and the IP payload. These fields were removed from Figure 3 since, in a VM environment, they do not need to be used on the VSw; their function is relegated to the I-SID tag.

6.2.1. Addressing VLAN space exhaustion and MAC explosion

   In a DC environment, PBB maintains the traditional Ethernet forwarding plane and operational model. For example, a VSw implementation of PBB can make use of the 24-bit ISID tag instead of the 12-bit VLAN tag to identify the virtual bridging domains associated with different VM groups.
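   To make the header stack of Figure 3 concrete, the following Python sketch assembles a simplified PBB (MAC-in-MAC) frame around a VM-originated Ethernet frame. The EtherType values (0x88A8 for the B-tag, 0x88E7 for the I-tag) follow common 802.1ah usage, but the sketch is illustrative only and not a bit-exact rendering of [802.1ah]; all addresses and the ISID value are made up:

      # Simplified PBB (MAC-in-MAC) encapsulation following Figure 3.
      # Illustrative only; see [802.1ah] for the normative bit layout.
      import struct

      def mac(s):
          return bytes(int(b, 16) for b in s.split(":"))

      def pbb_encapsulate(b_da, b_sa, isid, inner_frame, bvid=None, pcp=0, dei=0):
          hdr = mac(b_da) + mac(b_sa)                  # outer BMAC DA, SA (12B)
          if bvid is not None:                         # optional B-tag (4B)
              hdr += struct.pack("!HH", 0x88A8, (pcp << 13) | (dei << 12) | bvid)
          # I-tag: EtherType plus 4 bytes carrying PCP/DEI/reserved and the
          # 24-bit I-SID (6B total, excluding the CMAC DA/SA of the inner frame)
          hdr += struct.pack("!HI", 0x88E7,
                             (pcp << 29) | (dei << 28) | (isid & 0xFFFFFF))
          return hdr + inner_frame                     # inner frame keeps VM CMACs

      # A dummy VM frame: CMAC DA, CMAC SA, EtherType, payload.
      inner = (mac("00:aa:00:00:00:02") + mac("00:aa:00:00:00:01")
               + b"\x08\x00" + b"IP...")
      frame = pbb_encapsulate("00:bb:00:00:00:20", "00:bb:00:00:00:10",
                              isid=0x01F4A3, inner_frame=inner, bvid=10)
      print(len(frame) - len(inner))   # 22 bytes of PBB overhead with a BVID

   With the BVID omitted, the overhead drops to 18 bytes, matching the figures quoted above; core switches forward on the outer backbone header only.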
The VSw uplink towards the ToR in Figure 1 can still be treated as an Ethernet backbone interface. A frame originated by a VM can be encapsulated with the ISID assigned to the VM VSw interface and with the outer DA and SA MACs associated with the respective destination and source server blades, and then sent to the ToR switch. Performing this encapsulation at the VSw distributes the VM MAC learning to the server blades with instances in the corresponding layer2 domain, and therefore alleviates this load from the ToRs that aggregate multiple server blades. Alternatively, the PBB encapsulation can be done at the ToR.

   With PBB encapsulation, ToRs and Core SWs do not have to handle VM MAC addresses, so the size of their MAC FIB tables may decrease by two or more orders of magnitude, depending on the number of VMs configured in each server blade and the number of VM virtual interfaces and associated MACs.

   The original PBB specification [802.1ah] did not introduce any new control plane or new forwarding concepts for the PBB core. Spanning Tree and regular Ethernet switching based on MAC learning and flooding were maintained to provide a smooth technology introduction in existing Ethernet networks.

6.2.2. Fast convergence and L2 multi-pathing

   Additional specification work on the PBB control plane has been done since then in both IEEE and IETF L2VPN.

   As stated earlier, STP-based layer2 networks underutilize the available network capacity as links are put in an idle state to prevent loops. Similarly, existing VPLS technology for interconnecting layer2 network islands over an IP/MPLS core does not support active-active dual-homing scenarios.

   IS-IS controlled layer2 networks allow traffic to flow on multiple parallel paths between any two servers, spreading traffic among available links on the path. IEEE 802.1aq Shortest Path Bridging (SPB) [802.1aq] and the emerging IEEE 802.1Qbp [802.1Qbp] are PBB control plane technologies that utilize different methods to compute parallel paths and forward traffic in order to maximize the utilization of available links in a DC. In addition, a BGP based solution [PBB-EVPN] was submitted and discussed in the IETF L2VPN WG.

   One or both mechanisms may be employed as required. IS-IS could be used inside the same administrative domain (e.g., a DC), while BGP may be employed to provide reachability among interconnected Autonomous Systems. Similar architectural models have been widely deployed in the Internet and for large VPN deployments.

   IS-IS and/or BGP are also used to advertise Backbone MAC addresses and to eliminate BMAC learning and unknown unicast flooding in the forwarding plane, albeit with tradeoffs. The BMAC FIB entries are populated as required from the resulting IS-IS or BGP RIBs.

   Legacy loop avoidance schemes using Spanning Tree and local Active/Active MC-LAG are no longer required, as their function (layer2 routing) is replaced by the indicated routing protocols (IS-IS and BGP).

6.2.3. Per ISID flood containment

   Service auto-discovery provided by 802.1aq SPB [802.1aq] and BGP [PBB-EVPN] is used to distribute ISID related information among DC nodes, eliminating any provisioning touches throughout the PBB infrastructure. This implicitly creates backbone distribution trees that provide per-ISID automatic flood and multicast containment.
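   A rough sketch of what this auto-discovery buys in the forwarding plane: once BEBs advertise the ISIDs they host (via SPB or BGP), a node can build a per-ISID flood list and send broadcast/unknown traffic only to the BEBs that actually have members in that ISID. The Python model below is purely conceptual; the advertisement format, node names and ISID values are invented:

      # Conceptual per-ISID flood containment built from control-plane
      # advertisements (SPB or PBB-EVPN style); all values are made up.

      # Each BEB advertises the set of ISIDs it has locally attached.
      advertisements = {
          "beb-tor1": {0x010000, 0x010001},
          "beb-tor2": {0x010000},
          "beb-gw1":  {0x010001, 0x020000},
      }

      def flood_list(isid, local_beb):
          """Remote BEBs that must receive broadcast/unknown traffic for isid."""
          return {beb for beb, isids in advertisements.items()
                  if isid in isids and beb != local_beb}

      # Broadcast in ISID 0x010000 from beb-tor1 reaches only beb-tor2,
      # instead of every node as it would with plain VLAN trunking.
      print(flood_list(0x010000, "beb-tor1"))

   This is the per-ISID containment property referred to above; with plain VLAN trunking, the equivalent traffic would reach every trunk port regardless of membership.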
6.2.4. Efficient multicast support

   IS-IS [802.1aq] and BGP [PBB-EVPN] could be used to build optimal multicast distribution trees. In addition, PBB and IP/MPLS tunnel hierarchy may be used to aggregate multiple customer multicast trees sharing the same nodes, by associating them with the same backbone forwarding tree, which may be represented by a common Group BMAC and optionally a P2MP LSP. More details will be discussed in a further version of the draft.

6.2.5. Tunneling options for PBB ELAN: Ethernet, IP and MPLS

   The previous section introduces a solution for DC ELAN domains based on PBB ISIDs, PBB encapsulation and an IS-IS and/or BGP control plane.

   IETF L2VPN specifications [PBB-VPLS] and [PBB-EVPN] enable the transport of PBB frames using PW/MPLS or simply MPLS, and implicitly allow the use of the MPLS Traffic Engineering and Resiliency toolset to provide for advanced traffic steering and faster convergence.

   Transport over IP/L2TPv3 [RFC4719] or IP/GRE is also possible as an alternative to MPLS tunneling. Additional header optimization for PBB over IP/GRE encapsulated packets may also be feasible. These specifications would allow for an ISID-based L2 overlay using a regular IP backbone.

6.2.6. Use Case examples

6.2.6.1. PBBN in DC, L2VPN in DC GW

   DC environments based on VLANs and the native Ethernet operational model may want to consider using the native PBB option to provide L2 multi-tenancy, in effect the DC ELAN from Figure 2. An example of a network architecture that addresses this scenario is depicted in Figure 4:

      Figure 4: PBB in DC, PBB-VPLS or PBB-EVPN for DC Interconnect
      (an inter-DC L2VPN - PBB-VPLS or PBB-EVPN - interconnects the PE
      GWs of an intra-DC PBBN; the PBBN is built from ToRs aggregating
      PBB-capable VSws).

   PBB inside the DC core interoperates seamlessly with VPLS used for L2 DC-Interconnect to extend ELAN domains across DCs. This expansion may be required to address VM Mobility requirements or to balance the load on DC PE gateways. Note that in the PBB-VPLS case, just one or a handful of infrastructure B-VPLS instances are required, providing a Backbone VLAN equivalent function.

   PBB encapsulation addresses the expansion of the ELAN service identification space with 16M ISIDs, and solves MAC explosion through VM MAC hiding from the Ethernet core.

   PBB SPB [802.1aq] is used for core routing in the ToRs, Core SWs and PEs. If the DCs that need to be interconnected at L2 are part of the same administrative domain, and scaling is not an issue, SPB/IS-IS may be extended across the VPLS infrastructure. If different AS domains are present, if better load balancing is required between the DCs and the WAN, or if IS-IS extension across DCs causes scaling issues, then the BGP extensions described in [PBB-EVPN] must be employed.

   The forwarding plane, MAC FIB requirements and the layer2 operational model in the ToR and Core SW are maintained. The VSw sends PBB encapsulated frames to the ToR as described in the previous section. ToRs and Core SWs still perform standard Ethernet switching using the outer Ethernet header.
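   The FIB-size benefit of switching only on the outer header can be made concrete with a back-of-the-envelope comparison; the rack, blade and VM counts below are assumptions chosen purely for illustration, not measurements:

      # Rough FIB-size comparison for a Core SW, with and without PBB.
      # All input numbers are illustrative assumptions.

      racks           = 50      # racks visible to one Core SW
      blades_per_rack = 40
      vms_per_blade   = 40
      macs_per_vm     = 2       # e.g., tenant access + management interface

      cmac_entries = racks * blades_per_rack * vms_per_blade * macs_per_vm
      bmac_entries = racks * blades_per_rack   # one BMAC per server blade,
                                               # assuming PBB is done at the VSw

      print(cmac_entries)   # 160000 VM CMACs to learn without PBB
      print(bmac_entries)   # 2000 BMACs to learn with PBB

   This is the kind of reduction referred to in Section 6.2.1; the exact ratio obviously depends on VM density and on whether the encapsulation is performed at the VSw or at the ToR.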
   From a control plane perspective, the VSw uses a default gateway configuration to send traffic to the ToR, as in the regular IP routing case. VSw BMAC learning on the ToR is done through either LLDP or the VM Discovery Protocol (VDP) described in [802.1Qbg]. Identical mechanisms may be used for the ISID. Once this information is learned on the ToR, it is automatically advertised through SPB. If PBB-EVPN is used in the DC GWs, Multiprotocol BGP (MP-BGP) will be used to advertise the ISID and BMAC over the WAN as described in [PBB-EVPN].

6.2.6.2. PBBN in VSw, L2VPN in the ToR

   A variation of the use case example from the previous section is depicted in Figure 5:

      Figure 5: PBB in VSw, L2VPN at the ToR
      (the PE GWs are still interconnected by an inter-DC L2VPN
      (PBB-VPLS or PBB-EVPN), but inside the DC the ToRs and PE GWs run
      an intra-DC L2VPN over IP or MPLS tunneling, with PBB performed
      at the VSws).

   The procedures from the previous section are used at the VSw: PBB encapsulation and Ethernet BVLANs can be used on the VSw uplink. The L2VPN infrastructure replaces the BVLAN at the ToR, enabling the use of IP (GRE or L2TP) or MPLS tunneling.

   L2 networking still has the same control plane choices, IS-IS [802.1aq] and/or BGP [PBB-EVPN], independently of the tunneling choice.

6.2.7. Connectivity to existing VPN sites and Internet

   The main reason for extending the ELAN space beyond 4K VLANs is to be able to serve multiple DC tenants whereby the total number of service domains needed exceeds 4K. Figure 6 represents the logical service view where PBB ELANs are used inside one or multiple DCs to connect to existing IP VPN sites. It should be noted that the PE GW should be able to perform integrated routing in a VPN context and bridging in a VSI context:

      Figure 6: Logical Service View with IP VPN
      (Tenant 1 sites are connected over an IP VPN on the IP/MPLS WAN,
      with VRFs on the WAN PEs; IP VPN VRFs on the PE GWs terminate the
      PBB ELANs - ELAN11, ELAN12, ELAN13 - that attach the tenant's VMs
      inside the DC).

   DC ELANs are identified with 24-bit ISIDs instead of VLANs. At the PE GWs, an IP VPN VRF is configured for every DC tenant. Each "ISID ELAN" for Tenant 1 is seen as a logical Ethernet endpoint and is assigned an IP interface on the Tenant 1 VRF. Tenant 1 enterprise sites are connected to IP VPN PEs distributed across the WAN. IP VPN instances on PE GWs can be automatically discovered and connected to the WAN IP VPN using standard procedures; see [RFC4364].

   In certain cases, the DC GW PEs are part of the IPVPN service provider network providing IPVPN services to the enterprise customers.
In other cases, DC PEs are operated and managed by the DC/cloud provider and interconnect to multiple IPVPN service providers using inter-AS BGP/MPLS models A, B, or C [RFC4364]. The same discussion applies to the case of IPsec VPNs from a PBB ELAN termination perspective.

   If tenant sites are connected to the DC using WAN VPLS, the PE GWs need to implement the BEB function described in the PBB-VPLS PE model [PBB-VPLS] and the procedures from [PBB-Interop] to perform the required translation. Figure 7 describes the VPLS WAN scenario:

      Figure 7: Logical Service View with VPLS WAN
      (customer sites are connected over VPLS on the IP/MPLS WAN; the
      PE GWs implement the PBB-VPLS/BEB function and terminate the PBB
      ELANs - ELAN11, ELAN13 - that attach the tenant's VMs inside the
      DC).

   One VSI is required at the PE GW for every DC ELAN domain. As in the IP VPN case, DC PE GWs may be fully integrated as part of the WAN provider network, or interconnected using Inter-AS/Inter-Provider models A, B or C.

   The VPN connectivity may be provided by one or multiple PE GWs, depending on capacity needs and/or the operational model used by the DC/cloud operator.

   If a VM group is serving Internet-connected customers, the related ISID ELAN will be terminated into a routing context (the global public instance or another VRF) connected to the Internet. As in the IP VPN case, the 24-bit ISID will be represented as a logical Ethernet endpoint on the Internet routing context and an IP interface will be allocated to it. The same PE GW may be used to provide both VPN and Internet connectivity, with the routing contexts separated internally using the IP VPN models.

6.2.8. DC Interconnect

   L2 DC interconnect may be required to expand the ELAN domains for Management, VM Mobility, or when a VM Group needs to be distributed across DCs.

   PBB may be used to provide ELAN extension across multiple DCs as depicted in Figure 8:

      Figure 8: PBB BCB providing VMotion ELAN
      (the PE GWs act as PBB BCBs across the IP/MPLS WAN, extending PBB
      ELAN11 between the two DCs; the hypervisors (Hvz) in both DCs
      attach directly to ELAN11).

   ELAN11 is expanded across DCs to provide interconnect for the pool of server blades assigned to the same VMotion domain. This time, the hypervisors are connected directly to ELAN11. The PE GW operates in this case as a PBB Backbone Core Bridge (BCB) [PBB-VPLS] combined with PBB-EVPN capabilities [PBB-EVPN]. The ISID ELANs do not require any additional provisioning touches and do not consume additional MPLS resources on the PE GWs. Per-ISID auto-discovery and flood containment is provided by IS-IS/SPB [802.1aq] and BGP [PBB-EVPN].
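   A compact way to picture the PE GW service model of Sections 6.2.7 and 6.2.8 is as a per-tenant table that binds each ISID ELAN to an IP interface in the tenant's VRF (or in the Internet routing context). The sketch below is conceptual only; tenant names, ISIDs and prefixes are invented:

      # Conceptual PE GW binding of ISID ELANs to routing contexts.
      # Names, ISIDs and prefixes are illustrative only.

      pe_gw = {
          "tenant1-vrf": {              # IP VPN VRF for tenant 1
              0x010001: "10.1.1.1/24",     # IRB interface on "ISID ELAN" 11
              0x010002: "10.1.2.1/24",     # IRB interface on "ISID ELAN" 12
          },
          "internet": {                 # global/public routing context
              0x020001: "203.0.113.1/28",
          },
      }

      def irb_interface(context, isid):
          """IP interface the PE GW uses to route for a given ISID ELAN."""
          return pe_gw[context].get(isid)

      print(irb_interface("tenant1-vrf", 0x010001))   # 10.1.1.1/24

   Each entry corresponds to "a logical Ethernet endpoint assigned an IP interface on the tenant VRF" as described above; the VRFs themselves are then interconnected with the WAN IP VPN using the standard [RFC4364] procedures.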
6.2.9. Interoperating with existing DC VLANs

   While greenfield deployments will definitely benefit from all the advantages described in the previous sections, in many other scenarios existing DC VLAN environments will have to be gradually migrated to the new architecture. Figure 9 depicts an example of a possible migration scenario where both PBB and VLAN technologies are present:

      Figure 9: DC with PBB and VLANs
      (a PBBN/SPB DC core with PE GWs providing inter-DC L2VPN
      (PBB-VPLS or PBB-EVPN); the VSws on the left run PBB toward their
      ToRs, while the VSws on the right still run VLANs toward theirs).

   This example assumes that the two VSws on the right do not support PBB but the ToRs do. The VSws on the left side are running PBB while the ones on the right side are still using VLANs. The left ToR is performing only Ethernet switching, whereas the one on the right is translating from VLANs to ISIDs and performing PBB encapsulation using the BEB function [802.1ah] and [PBB-VPLS]. The ToR in the middle is performing both functions: core Ethernet tunneling for the PBB VSw and the BEB function for the VLAN VSw.

   The SPB control plane is still used between the ToRs, providing the benefits described in the previous section. The VLAN VSws must use regular multi-homing functions toward the ToRs: for example, STP or multi-chassis LAG.

   DC VLANs may also be present initially on some of the legacy ToRs or Core SWs. PBB interoperability will be performed as follows:

   . If VLANs are used in the ToRs, the PBB BEB function may be performed by the Core SW(s) where the ToR uplink is connected.

   . If VLANs are used in the Core SW, the PBB BEB function may be performed by the PE GWs where the Core SW uplink is connected.

   It is possible that some DCs may run PBB or a PBB-VLAN combination while others may still be running VLANs. An example of this interoperability scenario is described in Figure 10:

      Figure 10: Interoperability to a VLAN-based DC
      (a PBB DC and a VLAN DC are interconnected over the IP/MPLS WAN;
      the PBB DC attaches through a PBB-VPLS PE GW, the VLAN DC through
      a VPLS PE GW).

   Interoperability with an existing VLAN DC is required for DC interconnect. The PE GW in the PBB DC or the PE GW in the VLAN DC must implement the PBB-VPLS PE model described in [PBB-VPLS]. This interoperability scenario is addressed in detail in [PBB-Interop].

   Connectivity to existing VPN customer sites (IP VPN, VPLS, IPsec) or the Internet does not require any additional procedures beyond the ones described in the VPN connectivity section. The PE GW in the VLAN DC will aggregate DC ELANs through IP interfaces assigned to VLAN logical endpoints, whereas the PE GW in the PBB DC will assign IP interfaces to ISID logical endpoints.

   If EVPN is used to interconnect the two DCs, the PBB-EVPN functions described in [PBB-EVPN] must be implemented in one of the PE GWs.
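   The VLAN-to-ISID translation performed by a BEB-capable ToR in this migration scenario can be sketched as a simple mapping step in front of the PBB encapsulation of Section 6.2; the port, VLAN and ISID values below are hypothetical:

      # Sketch of the BEB mapping step at a ToR during migration:
      # (access port, C-VLAN) on the legacy side selects the ISID used
      # on the PBB side. Values are illustrative only.

      isid_by_port_vlan = {
          ("port-1", 100): 0x010001,   # legacy VLAN 100 -> "ISID ELAN" 11
          ("port-1", 200): 0x010002,
          ("port-2", 100): 0x010001,   # same ELAN reachable on another port
      }

      def beb_select_isid(port, cvlan):
          """ISID to impose when PBB-encapsulating a frame received with this
          C-VLAN on this access port (encapsulation itself is as sketched in
          Section 6.2)."""
          return isid_by_port_vlan[(port, cvlan)]

      print(hex(beb_select_isid("port-1", 100)))   # 0x10001

   In the reverse direction, the BEB strips the PBB header and re-imposes the C-VLAN associated with the ISID, which is how the middle ToR in Figure 9 can serve a PBB VSw and a VLAN VSw at the same time.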
6.3. TRILL and L2VPN toolset

   The TRILL and SPB control planes provide similar functions. IS-IS is
   the base protocol used in both specifications to provide
   multi-pathing and fast convergence for core networking. [PBB-EVPN]
   describes how seamless Inter-DC connectivity can be provided over an
   MPLS/IP network for both TRILL [RFC6325] and SPB
   [802.1aq]/[802.1Qbp] networks.

   The main differences are in the encapsulation and data plane
   forwarding. The TRILL encapsulation [RFC6325] was initially designed
   for large enterprise and campus networks where 4K VLANs are
   sufficient. As a consequence, the ELAN space in [RFC6325] is limited
   to 4K VLANs; this VLAN scale issue is being addressed in
   [Fine-Grained].

7. L3VPN applicability to Cloud Networking

   This section discusses the role of IP VPN technology in addressing
   the L3 virtualization challenges described in section 5.

   IP VPN technology defined in the L3VPN working group may be used to
   provide L3 virtualization in support of multi-tenancy in the DC
   network, as depicted in Figure 11.

            ,-------------------------------.
           (    IP VPNs over IP/MPLS WAN     )
            `----.'------------------------`.'
             ,--+-'.                      ;-`.--.
        .....  VRF1  )......             .  VRF2  )
        |     '-----'      |              '-----'
        | Tenant1          |ELAN12       Tenant1|
        |ELAN11       .....|........           |ELAN13
    '':'''''''':'     |            |     '':'''''''':'
    ,'.        ,'.   ,+.          ,+.    ,'.        ,'.
   (VM )....  (VM ) (VM )...     (VM )  (VM )....  (VM )
    `-'        `-'   `-'          `-'    `-'        `-'

         Figure 11 Logical Service View with IP VPN

   Tenant 1 might buy Cloud services in different DC locations and
   choose to associate the VMs into 3 different groups, each mapped to
   a different ELAN: ELAN11, ELAN12 and ELAN13. L3 interconnect between
   the ELANs belonging to Tenant 1 is provided using an IP/MPLS VPN and
   the associated VRF1 and VRF2, possibly located in different DCs.
   Each tenant that requires L3 virtualization will be allocated a
   different IP VPN instance. Using full-fledged IP VPN for L3
   virtualization inside a DC presents the following advantages
   compared with existing DC technologies like Virtual Routing:

   - Interoperates with existing WAN VPN technology

   - Deployment tested, provides a full networking toolset

   - Scalable core routing - only one BGP-MP routing instance is
     required, compared with one per customer/tenant in the Virtual
     Routing case (a simplified illustration is sketched at the end of
     this section)

   - Service auto-discovery - automatic discovery and route
     distribution between related service instances

   - Well defined and deployed Inter-Provider/Inter-AS models

   - Supports a variety of VRF-to-VRF tunneling options accommodating
     different operational models: MPLS [RFC4364], IP or GRE [RFC4797]

   To provide Cloud services to related customer IP VPN instances
   located in the WAN, the following connectivity models may be
   employed:

   - The DC IP VPN instance may participate directly in the WAN IP VPN

   - Inter-AS Option A, B or C models may be employed, with
     applicability to both Intra- and Inter-Provider use cases
     [RFC4364]
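   To make the "one BGP-MP routing instance, many tenants" point above
   concrete, the Python sketch below models how a single route
   distribution process keeps per-tenant VRFs isolated using route
   targets, in the spirit of [RFC4364]. It is a simplified
   illustration, not an implementation of BGP/MPLS IP VPN; the route
   targets, prefixes and next hops are hypothetical.

      # One shared route distribution "table", per-tenant VRFs built
      # by route-target (RT) import filtering, as in RFC 4364.
      vpn_routes = []

      def advertise(prefix, next_hop, export_rt):
          vpn_routes.append({"prefix": prefix, "nh": next_hop,
                             "rt": export_rt})

      def build_vrf(import_rt):
          """A VRF imports only routes whose RT matches its import
          policy, so tenants with overlapping address space never see
          each other's routes."""
          return [r for r in vpn_routes if r["rt"] == import_rt]

      advertise("10.1.0.0/24", "PE-GW-1", "target:65000:11")  # Tenant 1
      advertise("10.1.0.0/24", "PE-GW-2", "target:65000:22")  # Tenant 2

      print(build_vrf("target:65000:11"))   # only Tenant 1's route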
8. Solutions for other DC challenges

   This section touches on some of the DC challenges that may be
   addressed by a combination of the IP VPN, L2VPN and IP toolsets.
   Additional details will be provided in a future revision.

8.1. Addressing IP/ARP explosion

   Possible solutions for IP/ARP explosion are discussed in [EVPN],
   [PBB-EVPN], [ARPproxy] and in the ARMD WG, each of which addresses
   certain aspects. More discussion is required to clarify the
   requirements in this space, taking into account the different
   network elements potentially impacted by ARP.

8.2. Optimal traffic forwarding

   IP networks, built using link-state protocols such as OSPF or IS-IS
   together with BGP, provide optimal traffic forwarding through the
   use of equal cost multipath (ECMP), ECMP traffic load-balancing, and
   traffic engineering tools based on BGP and/or MPLS-TE as applicable.
   In the Layer 2 case, SPB or TRILL based protocols provide
   load-balancing across parallel paths or equal cost paths between two
   nodes. Traffic follows the shortest path. For multicast, replication
   at Layer 2 or Layer 3 happens in the data plane, albeit with
   different attributes, after multicast trees are built via a control
   plane and/or snooping. In the presence of VM mobility, optimal
   forwarding relates to avoiding triangulation and providing optimum
   forwarding between any two VMs.
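   ECMP load-balancing is commonly implemented by hashing a flow's
   header fields and using the result to select one of the equal cost
   next hops, so that all packets of a flow stay on one path while
   different flows spread across the parallel paths. The Python sketch
   below illustrates the general idea only; the hash function and field
   selection are simplifying assumptions, not what any particular
   router, TRILL or [802.1Qbp] implementation uses.

      import hashlib

      def ecmp_next_hop(src_ip, dst_ip, proto, sport, dport, next_hops):
          """Pick one equal-cost next hop by hashing the 5-tuple.
          Packets of the same flow always hash to the same next hop,
          preserving per-flow ordering."""
          key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
          digest = hashlib.sha256(key).digest()
          return next_hops[int.from_bytes(digest[:4], "big")
                           % len(next_hops)]

      paths = ["spine-1", "spine-2", "spine-3", "spine-4"]
      print(ecmp_next_hop("10.1.0.5", "10.2.0.9", 6, 49152, 443, paths))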
Rekhter, " Use of Provider Edge to 1370 Provider Edge (PE-PE) Generic Routing encapsulation (GRE) 1371 or IP in BGP/MPLS IP Virtual Private Networks ", RFC 4797, 1372 January 2007. 1374 11.2. Informative References 1376 [RFC4026] Andersson, L. et Al., "Provider Provisioned Virtual Private 1377 Network (VPN) Terminology", RFC 4026, May 2005. 1379 [802.1Qbp] IEEE Draft P802.1Qbp/D0.1 "Virtual Bridged Local Area 1380 Networks, Amendment: Equal Cost Multiple Paths (ECMP)", 1381 Work in Progress, October 13, 2011 1383 [802.1Qbg] IEEE Draft P802.1Qbg/D1.8 "Virtual Bridged Local Area 1384 Networks, Amendment: Edge Virtual Bridging", Work in 1385 Progress, October 17, 2011 1387 [EVPN] Raggarwa, R. et al. "BGP MPLS based Ethernet VPN", draft- 1388 raggarwa-sajassi-l2vpn-evpn-04.txt (work in progress), 1389 September 2011. 1391 [PBB-EVPN] Sajassi, A. et al. "PBB-EVPN", draft-sajassi-l2vpn-pbb- 1392 evpn-02.txt (work in progress), July 2011. 1394 [VM-Mobility] Raggarwa, R. et al. "Data Center Mobility based on 1395 BGP/MPLS, IP Routing and NHRP", draft-raggarwa-data-center- 1396 mobility-01.txt (work in progress), September 2011. 1398 [RFC4719] Aggarwal, R. et al., "Transport of Ethernet over Layer 2 1399 Tunneling Protocol Version 3 (L2TPv3)", RFC 4719, November 1400 2006. 1402 [MVPN] Rosen, E. and Raggarwa, R. "Multicast in MPLS/BGP IP VPN", 1403 draft-ietf-l3vpn-2547bis-mcast-10.txt (work in progress), 1404 January 2010. 1406 [ARPproxy] Carl-Mitchell, S. and Quarterman, S., "Using ARP to 1407 implement transparent subnet gateways", RFC 1027, October 1408 1987. 1410 [MYERS] Myers, A., Ng, E. and Zhang, H., "Rethinking the Service 1411 Model: Scaling Ethernet to a Million Nodes" 1412 http://www.cs.cmu.edu/~acm/papers/myers-hotnetsIII.pdf 1414 [Fine-Grained] Eastlake, D. et Al., "RBridges: Fine-Grained 1415 Labeling", draft-eastlake-trill-rbridge-fine-labeling- 1416 01.txt (work in progress), October 2011. 1418 12. Acknowledgments 1420 In addition to the authors the following people have contributed to 1421 this document: 1423 Javier Benitez, Colt 1425 Dimitrios Stiliadis, Alcatel-Lucent 1427 Samer Salam, Cisco 1429 This document was prepared using 2-Word-v2.0.template.dot. 1431 Authors' Addresses 1433 Nabil Bitar 1434 Verizon 1435 40 Sylvan Road 1436 Waltham, MA 02145 1437 Email: nabil.bitar@verizon.com 1439 Florin Balus 1440 Alcatel-Lucent 1441 777 E. Middlefield Road 1442 Mountain View, CA, USA 94043 1443 Email: florin.balus@alcatel-lucent.com 1445 Marc Lasserre 1446 Alcatel-Lucent 1447 Email: marc.lasserre@alcatel-lucent.com 1448 Wim Henderickx 1449 Alcatel-Lucent 1450 Email: wim.henderickx@alcatel-lucent.com 1452 Ali Sajassi 1453 Cisco 1454 170 West Tasman Drive 1455 San Jose, CA 95134, USA 1456 Email: sajassi@cisco.com 1458 Luyuan Fang 1459 Cisco 1460 111 Wood Avenue South 1461 Iselin, NJ 08830 1462 Email: lufang@cisco.com 1464 Yuichi Ikejiri 1465 NTT Communications 1466 1-1-6, Uchisaiwai-cho, Chiyoda-ku 1467 Tokyo, 100-8019 Japan 1468 Email: y.ikejiri@ntt.com 1470 Mircea Pisica 1471 BT 1472 Telecomlaan 9 1473 Brussels 1831, Belgium 1474 Email: mircea.pisica@bt.com