NVo3                                                          L. Dunbar
Internet Draft                                                   Huawei
Intended status: Informational                            June 28, 2012
Expires: December 2012


              Issues of Mobility in DC Overlay Network

           draft-dunbar-nvo3-overlay-mobility-issues-00.txt


Abstract

   This draft describes the issues introduced by VM mobility in data
   center overlay networks.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 28, 2012.
Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the BSD License.

Table of Contents

   1. Introduction
   2. Terminology
   3. Issues associated with Multicast in Overlay Network
   4. Issues associated with more than 4k Tenant Separation
      4.1. Collision of Local VLAN Identifiers when VMs Move
         4.1.1. Local VIDs Managed by External Controller
         4.1.2. Local VIDs Managed by NVE
      4.2. Tenant Virtual Network Separation at the Physical Gateway
           Routers
   5. Summary and Recommendations
   6. Manageability Considerations
   7. Security Considerations
   8. IANA Considerations
   9. Acknowledgments
   10. References
   Authors' Addresses
   Intellectual Property Statement
   Disclaimer of Validity

1. Introduction

   Overlay networks, such as VXLAN and NVGRE, have been proposed to
   scale data center networks that host a massive number of hosts as a
   result of server virtualization and business demand. An overlay
   network can hide the massive number of VM addresses from the
   switches/routers in the core (i.e. the underlay network).

   One of the key requirements stated in [NVo3-Problem] is the ability
   to move VMs across a wide range of locations, which could be
   multiple server racks, PODs, or sites, without changing the VMs'
   IP/MAC addresses. That means the association of VMs with their
   corresponding NVEs changes as VMs migrate. This dynamic nature of
   VM mobility in data centers introduces new challenges and
   complications to overlay networks.

   This draft describes some of the issues introduced by VM migration
   in an overlay environment. The purpose of the draft is to ensure
   that those issues will be addressed by future solutions.

2. Terminology

   CE:   VPN Customer Edge Device

   DC:   Data Center

   DA:   Destination Address

   EOR:  End of Row switch in a data center

   VNID: Virtual Network Identifier

   NVE:  Network Virtualization Edge

   PE:   VPN Provider Edge Device

   SA:   Source Address

   ToR:  Top of Rack Switch, also known as an access switch

   VM:   Virtual Machine

   VPLS: Virtual Private LAN Service

3. Issues associated with Multicast in Overlay Network

   Some data centers avoid the use of IP multicast due, primarily, to
   perceptions of configuration/protocol complexity and multicast
   scaling limits. There are also many data center operators for whom
   multicast is critical. Among the latter group, multicast is used
   for Internet Television (IPTV), market data, cluster load
   balancing, and gaming, just to name a few.

   The use of multicast in an overlay environment can create issues
   for the network when VMs move. In particular, the association of
   multicast members with NVEs becomes dynamic as VMs move. At one
   moment, all members of a multicast group could be attached to one
   NVE. At another moment, some members of the multicast group could
   be attached to different NVEs. Among the VMs attached to one NVE,
   some can send, while others can only receive.

   In addition, an overlay, which hides the VM addresses, introduces
   an IGMP snooping issue in the core. With NVEs adding an outer
   header to data frames from VMs (i.e. applications), multicast
   addresses are hidden from the underlay network, so switches in the
   underlay network are unable to snoop the IGMP reports from
   multicast members.

   For unicast data frames, an overlay network edge (e.g. a TRILL
   edge) can learn the inner-outer address mapping by observing data
   frames passing by. Since a multicast address never appears in the
   SA field of a data frame's inner header, the learning approach used
   for unicast won't work for multicast in an overlay.

   TRILL solves the multicast inner-outer address learning issue by
   creating common multicast trees in the TRILL domain. If TRILL's
   multicast approach is used for a DC with VM mobility, the multicast
   state maintained by switches/routers in the underlay network has to
   change as VMs move, which means switches in the underlay network
   have to be aware of VM mobility and change multicast state
   accordingly.

   Overall, VM mobility in an overlay environment makes multicast more
   complicated for switches/routers in the underlay network and for
   NVEs.
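   As a non-normative illustration of the membership tracking that VM
   mobility forces onto the overlay edge, the Python sketch below
   shows one conceivable approach in which each NVE snoops IGMP
   reports from its attached VMs and shares the resulting (VNID,
   group) membership with other NVEs through some control plane. All
   names in the sketch (MulticastState, on_igmp_report, etc.) are
   hypothetical; no NVo3 document defines such an interface.

   <CODE BEGINS>
   # Hypothetical sketch: per-group membership tracking at NVEs,
   # built by snooping IGMP reports at the overlay edge (the
   # underlay cannot snoop them, as discussed above).
   from collections import defaultdict

   class MulticastState:
       def __init__(self):
           # (vnid, inner multicast group) -> set of egress NVEs
           self.members = defaultdict(set)

       def on_igmp_report(self, vnid, group, nve):
           # A VM attached to 'nve' joined 'group': record it so
           # that ingress NVEs learn to replicate toward 'nve'.
           self.members[(vnid, group)].add(nve)

       def on_igmp_leave(self, vnid, group, nve):
           # Simplification: assumes this was the last member
           # behind 'nve'; a real implementation would count
           # members per NVE before withdrawing it.
           self.members[(vnid, group)].discard(nve)

       def egress_nves(self, vnid, group):
           # Ingress NVE: send one copy per NVE with members.
           return set(self.members.get((vnid, group), set()))
   <CODE ENDS>

   Note that when a VM moves, a leave at the old NVE and a report at
   the new NVE are needed to keep such state correct, which restates
   the dynamic-membership issue described above.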
4. Issues associated with more than 4k Tenant Separation

   [NVo3-framework] has a good figure showing the logical network seen
   by each tenant: L2 domains connected by an L3 infrastructure. Each
   tenant can have multiple virtual networks, identified by IEEE
   802.1Q-compliant 12-bit VLAN IDs, under its logical routers (Rtr).
   Any VM communicating with peers in different subnets, either within
   the DC or outside the DC, will address its L2 frames to the MAC
   address of its local router (Rtr in the figure below).

               +----- L3 Infrastructure ----+
               |                            |
            ,--+-'.                      ;--+--.
      .....  Rtr1  )......              .  Rtr2 )
      |     '-----'      |               '-----'
      |   Tenant1        |LAN12    Tenant1|
      |LAN11        .....|........        |LAN13
   '':'''''''':'         |       |     '':'''''''':'
   ,'.        ,'.       ,+.     ,+.    ,'.        ,'.
  (VM ) ..   (VM )     (VM ) .. (VM ) (VM ) ..   (VM )
   `-'        `-'       `-'     `-'    `-'        `-'

    Figure 1: Logical Service Connectivity for a single tenant

   The overlay introduced by [NVo3-Problem] keeps the forwarding
   tables of the switches/routers in the core (i.e. the underlay
   network) unaffected when VMs belonging to different tenants are
   placed at, or moved to, any location, as shown in the figure below
   (copied from [NVo3-framework]).

    +--------+                                 +--------+----+
    | Tenant |                                 | TES:   |VM1 |
    | End    +--+                       +------| Blade  |VM2 |
    | System |  |                       |      | server |..  |
    +--------+  |  ..................   |      +--------+----+
                |  +-+--+     +--+-+    |----VM-a
                |  | NVE|     |NVE |    |----VM
                +--| #1 |     |#2  |----+----VM
                   +-+--+     +--+-+
                  /    .  L3 Overlay .    \
    +--------+   /     .   Network   .     \   +--------+
    | Tenant +--+      .             .      +--| Tenant |
    | End    |         .             .         | End    |
    | System |         .   +----+    .         | System |
    +--------+         ....|NVE |.....         +--------+
                           |#3  |
                           +----+
                              |
                              |
                         +--------+
                         | Tenant |--VM-b
                         | End    |--VM
                         | System |
                         +--------+

                     Figure 2: Overlay example

   For client traffic from "VM-a" to "VM-b", the ingress NVE
   encapsulates the client payload with an outer header which includes
   at least the egress NVE as DA, the ingress NVE as SA, and a VNID.
   The VNID is a 24-bit identifier proposed by [NVo3-Problem] to
   separate tens of thousands of tenant virtual networks. When the
   egress NVE receives the data frame on its ports facing the underlay
   network, it decapsulates the outer header and then forwards the
   decapsulated data frame to the attached VMs.
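   As a non-normative illustration of the encapsulation just
   described, the sketch below models the outer header as simply
   (egress NVE as DA, ingress NVE as SA, 24-bit VNID). The field
   layout is an assumption for illustration only; it is not the wire
   format of VXLAN, NVGRE, or any other specific encapsulation.

   <CODE BEGINS>
   # Hypothetical sketch of the generic NVo3 encapsulation above.
   from dataclasses import dataclass

   @dataclass
   class OuterHeader:
       outer_da: str    # egress NVE address
       outer_sa: str    # ingress NVE address
       vnid: int        # 24-bit virtual network identifier

   def encapsulate(inner_frame: bytes, egress_nve: str,
                   ingress_nve: str, vnid: int):
       # Ingress NVE: prepend the outer header; the underlay only
       # ever sees NVE addresses, never the VM addresses inside.
       assert 0 <= vnid < 2**24, "VNID is a 24-bit identifier"
       return (OuterHeader(egress_nve, ingress_nve, vnid),
               inner_frame)

   def decapsulate(packet):
       # Egress NVE: strip the outer header and hand the inner
       # frame toward the attached VMs.
       outer, inner_frame = packet
       return outer.vnid, inner_frame
   <CODE ENDS>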
   When "VM-b" is on the same subnet (or VLAN) as "VM-a" and located
   within the same data center, the corresponding egress NVE is
   usually on a virtual switch in a server, on a ToR switch, or on a
   blade switch.

   When "VM-b" is on a different subnet (or VLAN), the corresponding
   egress NVE should be next to (or located on) the logical Rtr
   (Figure 1), which is most likely located on the data center gateway
   router(s).

4.1. Collision of Local VLAN Identifiers when VMs Move

   Since the VMs attached to one NVE could belong to different virtual
   networks, the traffic under each NVE has to be identified by local
   network identifiers, usually VLAN IDs when VMs are attached to the
   NVE's access ports via L2.

   To support tens of thousands of virtual networks, the local VID
   associated with the client payload under each NVE has to be locally
   significant. If the ingress NVE simply encapsulates an outer header
   onto data frames received from VMs and forwards the encapsulated
   data frames to the egress NVE via the underlay network, the egress
   NVE can't simply decapsulate the outer header and send the
   decapsulated data frames to the attached VMs, as is done in TRILL.
   The egress NVE needs to convert the VID carried in the data frame
   to the local VID assigned to the virtual network before forwarding
   the data frame to the attached VMs.
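   To make the required conversion concrete, the sketch below keeps a
   bidirectional table between the global 24-bit VNID and the locally
   significant 12-bit VID at one NVE. This is an illustration only,
   and how the table gets populated (by a controller or by the NVE
   itself) is the subject of Sections 4.1.1 and 4.1.2.

   <CODE BEGINS>
   # Hypothetical sketch: per-NVE translation between the global
   # VNID and the locally significant VLAN ID.
   class VidTranslator:
       def __init__(self):
           self.vnid_to_vid = {}    # global VNID -> local VID
           self.vid_to_vnid = {}    # local VID   -> global VNID

       def bind(self, vnid, local_vid):
           assert 1 <= local_vid <= 4094, "valid 12-bit VLAN ID"
           self.vnid_to_vid[vnid] = local_vid
           self.vid_to_vnid[local_vid] = vnid

       def to_local(self, vnid):
           # Egress direction: rewrite the frame's VID to the
           # local value before forwarding to the attached VMs.
           return self.vnid_to_vid[vnid]

       def to_global(self, local_vid):
           # Ingress direction: choose the VNID to place in the
           # outer header from the frame's local VID.
           return self.vid_to_vnid[local_vid]
   <CODE ENDS>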
   In VPLS, the operator has to configure the local VIDs under each PE
   for specific VPN instances, and the mapping of local VIDs to VPN
   instance IDs doesn't change very much. In addition, a CE is most
   likely not shared by multiple tenants, so the VIDs on one physical
   PE-to-CE port belong to only one tenant. For the rare occasion of
   multiple tenants sharing one CE, the CE can convert the tuple
   [local customer VID & tenant access port] to the VID designated by
   the VPN operator for each VPN instance on the shared link between
   the CE port and the PE port. For example, in the figure below, the
   VIDs under CE#21 and the VIDs under CE#22 can be duplicated as long
   as the CEs can convert the local VIDs from their downstream links
   to the VIDs given by the VPN operators for the links between the
   PEs and the CEs.

    +--------+                                 +--------+
    | CE     |                                 | CE     |-> local VIDs
    | #11    +--+                       +------|        |
    |        |  |                       |      | #21    |
    +--------+  |  ..................   |      +--------+
                |  +-+--+     +--+-+    |
                |  | PE |     | PE |    |<- VIDs configured by
                +--| 1  |     | 2  |----+   the VPN operator
                   +-+--+     +--+-+
                  /    .     VPLS    .    \
    +--------+   /     .   Network   .     \   +--------+
    | CE     +--+      .             .      +--| CE     |-> local VIDs
    | #12    |         .             .         | #22    |
    |        |         .   +----+    .         |        |
    +--------+         ....| PE |.....         +--------+
                           | 3  |
                           +----+
                              |
                              |
                         +--------+
                         | CE     |
                         | #31    |
                         |        |
                         +--------+

                      Figure 3: VPLS example

   When all the VMs of one virtual network are moved away from an NVE,
   the local VID which was designated for this virtual network might
   need to be used for a different virtual network whose VMs move in
   later.

   In the figure below, NVE#1 may have local VIDs #100~#200 assigned
   to some of the virtual networks attached to it, while NVE#2 may
   have local VIDs #100~#150 assigned to different virtual networks.
   With the VNID encoded in the outer header of data frames, the
   traffic in the L3 overlay network is still strictly separated.

    +--------+                                 +--------+
    | Tenant |                                 | TES:   |
    | End    +--+                       +------| Blade  |
    | System |  |                       |      | server |
    +--------+  |  ..................   |      +--------+
                |  +-+--+     +--+-+    |
                |  | NVE|     |NVE |    |
                +--| #1 |     |#2  |----+
                   +-+--+     +--+-+  <- local VID to global VNID
                  /    .  L3 Overlay .  \  mapping becomes dynamic
    +--------+   /     .   Network   .   \     +--------+
    | Tenant +--+      .             .    +----| Tenant |
    | End    |         .             .         | End    |
    | System |         .   +----+    .         | System |
    +--------+         ....|NVE |.....         +--------+
                           |#3  |  <- may not be aware of VMs
                           +----+     being added/removed
                              |
                              |
                         +--------+
                         | Tenant |
                         | End    |
                         | System |
                         +--------+

                     Figure 4: Overlay example

   When some VMs associated with Virtual Network X, which uses VID 120
   under NVE#1, are moved to NVE#2, a new VID must be assigned to
   Virtual Network X under NVE#2.
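   The reuse and reassignment just described can be pictured as a
   small allocator at each NVE. The sketch below is illustrative only:
   it hands out a free local VID when the first VM of a virtual
   network arrives and reclaims the VID when the last VM leaves, so
   that the same VID can later identify a different virtual network.

   <CODE BEGINS>
   # Hypothetical sketch: dynamic local VID allocation at one NVE.
   class LocalVidPool:
       def __init__(self):
           self.free = set(range(2, 4095))  # usable 12-bit VIDs
           self.vid_of = {}                 # vnid -> local VID
           self.vm_count = {}               # vnid -> attached VMs

       def vm_attached(self, vnid):
           if vnid not in self.vid_of:
               # First VM of this virtual network under this NVE.
               # The VID picked here may differ from the VID the
               # network used under the NVE it moved away from.
               # (set.pop() raises KeyError if all VIDs are taken.)
               self.vid_of[vnid] = self.free.pop()
               self.vm_count[vnid] = 0
           self.vm_count[vnid] += 1
           return self.vid_of[vnid]

       def vm_detached(self, vnid):
           self.vm_count[vnid] -= 1
           if self.vm_count[vnid] == 0:
               # Last VM left: reclaim the VID so a virtual network
               # whose VMs move in later can reuse it.
               self.free.add(self.vid_of.pop(vnid))
               del self.vm_count[vnid]
   <CODE ENDS>

   In Figure 4 terms, when VMs of Virtual Network X (VID 120 under
   NVE#1) move to NVE#2, such an allocator on NVE#2 may well return a
   VID other than 120, which is exactly why the local-VID-to-VNID
   mapping becomes dynamic.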
   It gets complicated when the local VIDs are tagged by non-NVE
   devices, e.g. the VMs themselves, blade server switches, or virtual
   switches within servers.

   The devices which add a VID to untagged frames need to be informed
   of the local VID. If data frames from VMs already have a VID
   encoded, then there has to be a mechanism to notify the first
   switch port facing the VMs to convert the VID encoded by the VMs to
   the local VID assigned to the virtual network under the new NVE.
   That means when a VM is moved to a new location, its immediately
   adjacent switch port has to be informed of the local VID so that it
   can convert the VID encoded in the data frames from the VM.

   The NVE will need the mapping between the local VID and the VNID to
   be used toward the L3 underlay network.

4.1.1. Local VIDs Managed by External Controller

   Most likely, the assignment of VMs to physical locations is managed
   by a non-networking entity, e.g. a VM manager or a server manager.
   NVEs may not be aware of VMs being added or deleted unless the NVEs
   have a northbound interface to a controller which can communicate
   with the VM/server manager(s).

   When an NVE can be informed of VMs being added/deleted and their
   associated tenant virtual networks via its controller, the NVE
   should be able to get the specific VNID from its controller for
   untagged data frames arriving at its Virtual Access Points
   ([NVo3-framework] Section 3.1.1).

   Since local VIDs under each NVE are truly locally significant, it
   might be less confusing to the egress NVE if the ingress NVE
   removes the local VID attached to the data frame, so that the
   egress NVE always assigns its own local VID to a data frame before
   sending the decapsulated data frame to the attached VMs.

   If, for whatever reason, it is necessary to keep a local VID in the
   data frames before encapsulating the outer header of EgressNVE-DA /
   IngressNVE-SA / VNID, the NVE should get the specific local VID
   from the external controller for the untagged data frames coming to
   each Virtual Access Point.

   If the data frame is tagged before reaching the NVE's Virtual
   Access Point (e.g. tagged data frames from VMs) and the NVE is more
   than one hop away from the VMs, the first (virtual) port facing the
   VMs has to be informed by the external controller of the new local
   VID that replaces the VID encoded in the data frames. For the
   reverse direction, i.e. data frames coming from the core towards
   the VMs, the first switching port facing the VMs has to convert the
   VIDs encoded in the data frames back to the VIDs used by the VMs.

   IEEE802.1Qbg's VDP (Virtual Station Interface (VSI) Discovery and
   Configuration Protocol) requires the hypervisor to send a VM's
   profile when a new VM is instantiated. However, not all hypervisors
   support this function.

4.1.2. Local VIDs Managed by NVE

   If NVEs don't have an interface to any controller which can be
   informed of VMs being added to or deleted from NVEs, then the NVEs
   have to learn new VMs/VLANs being attached, figure out to which
   tenant virtual network those VMs/VLANs belong, and/or age out
   VMs/VLANs after a specified timer expires. The network management
   system has to assist NVEs in making those decisions, even if the
   network management system doesn't have an interface to the
   VM/server managers.

   When an NVE receives a tagged data frame with a new VM address
   (e.g. MAC) from its Virtual Access Point, the new VM could belong
   to an existing local virtual network, belong to a different virtual
   network (brought in as the VM is added), or be an illegal VM.

   When an NVE learns of a new VM being added, either by learning a
   new MAC address or a new VID, it needs its management system to
   confirm the validity of the new VID and/or new address. If the new
   address or VID is from an invalid or illegal source, the data frame
   has to be dropped.
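   As a non-normative illustration of the NVE-managed approach, the
   sketch below learns new (MAC, VID) sources from a Virtual Access
   Point, asks a management system to validate them, and ages entries
   out after a timer expires. The management callback is hypothetical;
   how a real management system would answer is out of scope here.

   <CODE BEGINS>
   # Hypothetical sketch of NVE-managed learning with validation
   # and aging (Section 4.1.2).
   import time

   AGING_TIMEOUT = 300   # seconds; deployment-specific assumption

   class LearnedTable:
       def __init__(self, management):
           self.entries = {}             # (mac, vid) -> last seen
           self.management = management  # validation callback

       def on_frame(self, src_mac, vid):
           key = (src_mac, vid)
           if key not in self.entries:
               # Unknown source: an existing local virtual network,
               # a VM that just moved in, or an illegal VM.
               if not self.management.is_valid(src_mac, vid):
                   return False          # drop: illegal source
           self.entries[key] = time.time()
           return True

       def age_out(self):
           # Called periodically; forget sources not seen lately.
           now = time.time()
           for key, seen in list(self.entries.items()):
               if now - seen > AGING_TIMEOUT:
                   del self.entries[key]
   <CODE ENDS>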
4.2. Tenant Virtual Network Separation at the Physical Gateway Routers

   When a VM communicates with peers in different subnets, data frames
   will be sent to the tenant's logical router (Rtr1 or Rtr2 in Figure
   1). Very often, the logical routers of all tenants in a data center
   are just logical entities (e.g. VRFs) on the gateway router(s).
   That means that all the VLANs of all tenants will be terminated at
   the data center gateway router(s), as shown in the figure below.

                        ,---------.
                      ,'           `.
                     (  IP/MPLS WAN  )
                      `.           ,'
                        `-+------+'
                     +--+--+  +-+---+
                     |DC GW|+-+|DC GW|
                     +-+---+  +-----+
                      /    \   <---- All VLANs of all tenants
                     /      \
               +-------+     +------+
              +/------+|    +/-----+|
              | Aggr11|+ ---|AggrN1|+    Aggregation
              +---+---+/    +------+/
                 / \           / \
                /   \         /   \
            +---+   +---+ +---+   +---+
            |T11|...|T1x| |T21| ..|T2y|    Access Layer
            +---+   +---+ +---+   +---+
              |       |     |       |
            +-|-+   +-|-+ +-|-+   +-|-+
            |   |...|   | |   | ..|   |
            +---+   +---+ +---+   +---+   Server racks
            |   |...|   | |   | ..|   |
            +---+   +---+ +---+   +---+
            |   |...|   | |   | ..|   |
            +---+   +---+ +---+   +---+

             Figure 5: Data Center Physical Topology

   Gateway routers can mitigate the overwhelming number of virtual
   network instances by integrating the NVE function within the
   router(s). If the routers' outbound connectivity to external
   networks is VPN based, this requires the routers to map VNIDs to
   VRFs directly, and therefore to support tens of thousands of VRF
   instances, which can be challenging for routers.
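   A minimal sketch of such an integrated gateway NVE, assuming a
   simple one-to-one binding of VNID to VRF, is shown below. The names
   are hypothetical; the point of the sketch is that with tens of
   thousands of tenant virtual networks this table implies tens of
   thousands of VRF instances, which is the scaling concern noted
   above.

   <CODE BEGINS>
   # Hypothetical sketch: gateway router integrating the NVE
   # function and mapping each VNID directly to a VRF.
   class GatewayNve:
       def __init__(self):
           self.vnid_to_vrf = {}     # 24-bit VNID -> VRF name

       def bind(self, vnid, vrf_name):
           self.vnid_to_vrf[vnid] = vrf_name

       def forward_decapsulated(self, vnid, inner_packet):
           vrf = self.vnid_to_vrf.get(vnid)
           if vrf is None:
               return None           # unknown tenant: drop
           # The inner destination would then be looked up in the
           # tenant's VRF (routing lookup omitted in this sketch).
           return (vrf, inner_packet)
   <CODE ENDS>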
   Data centers can also use multiple gateway routers, with each
   handling a subset of the tenants in the data center. That means
   that each tenant's VMs are only reachable via their designated
   routers or router ports. With the typical DC design shown in Figure
   5, the number of server racks reachable by each gateway router is
   limited by the number of router ports enabled for the tenant's
   virtual networks. That means the range of locations across which
   each tenant's VMs can be moved is limited.

   When VMs in a data center communicate with external peers, data
   frames have to go through a gateway. Even though the majority of
   data centers have much more east-west traffic volume than
   north-south traffic volume, the majority (as high as 90%) of
   applications (hosted on servers or VMs) in a data center still
   communicate with external peers; the volume of north-south traffic
   is simply much smaller in many data centers.

5. Summary and Recommendations

   An overlay network can hide individual VMs' addresses, making the
   switches/routers in the core scalable. However, overlay introduces
   other challenges, especially when VMs move across a wide range of
   NVEs. This draft identifies the issues introduced by mobility in an
   overlay environment, to ensure that they will be addressed by
   future solutions.

6. Manageability Considerations

7. Security Considerations

   Security will be addressed in a separate document.

8. IANA Considerations

   None.

9. Acknowledgments

   We want to acknowledge the following people for their valuable
   comments on this draft: David Black, Ben Mack-Crane, Peter
   Ashwood-Smith, Lucy Yong, and Young Lee.

   This document was prepared using 2-Word-v2.0.template.dot.

10. References

   [NVo3-Problem]   Narten, T., et al., "Problem Statement: Overlays
                    for Network Virtualization",
                    draft-narten-nvo3-overlay-problem-statement-02,
                    work in progress, June 2012.

   [NVo3-framework] Lasserre, M., et al., "Framework for DC Network
                    Virtualization", draft-lasserre-nvo3-framework-02,
                    work in progress, June 2012.

   [IEEE802.1Qbg]   "MAC Bridges and Virtual Bridged Local Area
                    Networks - Edge Virtual Bridging",
                    IEEE 802.1Qbg/D2.2, work in progress, February
                    2012.

   [ARMD-Problem]   Narten, T., et al.,
                    "draft-ietf-armd-problem-statement", work in
                    progress, October 2011.

   [ARMD-Multicast] McBride and Liu,
                    "draft-mcbride-armd-mcast-overview-01", work in
                    progress, March 2012.

   [Gratuitous ARP] Cheshire, S., "IPv4 Address Conflict Detection",
                    RFC 5227, July 2008.

Authors' Addresses

   Linda Dunbar
   Huawei Technologies
   5340 Legacy Drive, Suite 175
   Plano, TX 75024, USA
   Phone: (469) 277 5840
   Email: ldunbar@huawei.com

Intellectual Property Statement

   The IETF Trust takes no position regarding the validity or scope of
   any Intellectual Property Rights or other rights that might be
   claimed to pertain to the implementation or use of the technology
   described in any IETF Document or the extent to which any license
   under such rights might or might not be available; nor does it
   represent that it has made any independent effort to identify any
   such rights.

   Copies of Intellectual Property disclosures made to the IETF
   Secretariat and any assurances of licenses to be made available, or
   the result of an attempt made to obtain a general license or
   permission for the use of such proprietary rights by implementers
   or users of this specification can be obtained from the IETF
   on-line IPR repository at http://www.ietf.org/ipr

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   any standard or specification contained in an IETF Document. Please
   address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

   All IETF Documents and the information contained therein are
   provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION
   HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET
   SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE
   DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT
   LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION THEREIN
   WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.