Network Working Group                                        K. Kompella
Internet-Draft                                                Y. Rekhter
Intended status: Standards Track                         Juniper Networks
Expires: April 25, 2013                                          T. Morin
                                              France Telecom - Orange Labs
                                                                  D. Black
                                                           EMC Corporation
                                                          October 22, 2012

  Signaling Virtual Machine Activity to the Network Virtualization Edge
                    draft-kompella-nvo3-server2nve-01

Abstract

   This document proposes a simplified approach for provisioning the
   networking parameters related to Virtual Machine creation, migration
   and termination on servers.  The idea is to provision the server,
   then have the server signal the requisite parameters to the relevant
   network device(s).  Such an approach reduces the workload on the
   provisioning system and simplifies the data model that the
   provisioning system needs to maintain.  Furthermore, it is more
   resilient to topology changes in server-network connectivity, for
   example, reconnecting a server to a different network port or
   switch.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."
   This Internet-Draft will expire on April 25, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  VM Creation
     1.2.  VM Live Migration
     1.3.  VM Termination
   2.  Conventions and Acronyms Used
   3.  Virtual Networks
     3.1.  Current Mode of Operation
     3.2.  Future Mode of Operation
   4.  Provisioning DCVPNs
   5.  Signaling
     5.1.  Preliminaries
     5.2.  VM Operations
       5.2.1.  Network Parameters
       5.2.2.  Creating a VM
       5.2.3.  Terminating a VM
       5.2.4.  Migrating a VM
     5.3.  Signaling Protocols
     5.4.  Liveness
   6.  Interfacing with DCVPN Control Planes
   7.  Security Considerations
   8.  IANA Considerations
   9.  Acknowledgments
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   To create a Virtual Machine (VM) on a server in a data center, one
   must specify parameters for the CPU, storage, network and appliance
   aspects of the VM.  At a minimum, this requires provisioning the
   server that will host the VM, and the Network Virtualization Edge
   (NVE) that will implement the virtual network for the VM.  Similar
   considerations apply to live migration and termination of VMs.
   This document proposes mechanisms whereby a server can be
   provisioned with all of the parameters for the VM, and the server in
   turn signals the networking aspects to the NVE.  The NVE may be
   located on the server or in an external network switch that may be
   directly connected to the server or accessed via an L2 (Ethernet)
   LAN or VLAN.  The following subsections capture the abstract
   sequence of steps for VM creation, live migration and deletion.
1.1.  VM Creation

   This subsection describes an abstract sequence of steps involved in
   creating a VM and making it operational.  The following steps are
   intended as an illustrative example, not as prescriptive text; the
   goal is to capture sufficient detail to set a context for the
   signaling described in Section 5.

   Creating a VM requires:

   1.  gathering the CPU, network, storage, and appliance parameters
       required for the VM;

   2.  deciding which server, network, storage and appliance devices
       best match the VM requirements in the current state of the data
       center;

   3.  provisioning the server with the VM parameters;

   4.  provisioning the network element(s) to which the server is
       connected with the network-related parameters of the VM;

   5.  informing the network element(s) to which the server is
       connected about the VM's peer VMs, storage devices and other
       appliances with which the VM needs to communicate;

   6.  informing the network element(s) to which a VM's peer VMs are
       connected about the new VM and its addresses;

   7.  provisioning storage with the storage-related parameters; and

   8.  provisioning necessary appliances (firewalls, load balancers and
       "middle boxes").

   While shown as a numbered sequence above, some of these steps may be
   concurrent (e.g., server, storage and network provisioning for the
   new VM may be done concurrently).

   Steps 1 and 2 are primarily information gathering.  For Steps 3 to
   8, the provisioning system talks actively to servers, network
   switches, storage and appliances, and must know the details of the
   physical server, network, storage and appliance connectivity
   topologies.  Step 4 is typically done using just provisioning,
   whereas Steps 5 and 6 may be a combination of provisioning and other
   techniques.  Steps 4 to 6 accomplish the task of provisioning the
   network for a VM, the result of which is a Data Center Virtual
   Private Network (DCVPN) overlaid on the physical network.

   This document focuses on the case where the network elements in
   Step 4 are not co-resident with the server, and shows how the
   provisioning in Step 4 can be replaced by signaling between server
   and network, using information from Step 3.  This document also
   shows how Step 4 can interact seamlessly with some of the
   realizations of Steps 5 and 6.

1.2.  VM Live Migration

   This subsection describes an abstract sequence of steps involved in
   live migration of a VM.  Live migration is sometimes referred to as
   "hot" migration, in that from an external viewpoint, the VM appears
   to continue to run while being migrated to another server (e.g.,
   TCP connections generally survive this class of migration).  In
   contrast, suspend/resume (or "cold") migration consists of
   suspending VM execution on one server and resuming it on another.
   The following live migration steps are intended as an illustrative
   example, not as prescriptive text; the goal is to capture
   sufficient detail to set a context for the signaling described in
   Section 5.

   For simplicity, this set of abstract steps assumes shared storage,
   so that the VM's storage is accessible to the source and
   destination servers.  Live migration of a VM requires:

   1.   deciding which server should be the destination of the
        migration based on the VM's requirements, data center state
        and reason for the migration;
   2.   provisioning the destination server with the VM parameters and
        creating a VM to receive the live migration;

   3.   provisioning the network element(s) to which the destination
        server is connected with the network-related parameters of the
        VM;

   4.   transferring the VM's memory image between the source and
        destination servers;

   5.   actually moving the VM: pausing the VM's execution on the
        source server, transferring the VM's execution state and any
        remaining memory state to the destination server and
        continuing the VM's execution on the destination server;

   6.   informing the network element(s) to which the destination
        server is connected about the VM's peer VMs, storage devices
        and other appliances with which the VM needs to communicate;

   7.   informing the network element(s) to which a VM's peer VMs are
        connected about the VM's new location;

   8.   activating the VM's network parameters at the destination
        server;

   9.   deactivating the VM's network parameters at the source server;

   10.  deprovisioning the VM from the network element(s) to which the
        source server is connected; and

   11.  deleting the VM at the source server.

   While shown as a numbered sequence above, some of these steps may be
   concurrent (e.g., moving the VM and associated network changes).

   Step 1 is primarily information gathering.  For Steps 2, 3, 10 and
   11, the provisioning system talks actively to servers, network
   switches and appliances, and must know the details of the physical
   server, network and appliance connectivity topologies.  Steps 4 and
   5 are usually handled directly by the servers involved.  Steps 6 to
   9 may be handled by the servers (e.g., a gratuitous ARP or RARP
   from the destination server may accomplish all four steps) or other
   techniques.

   This document focuses on the case where the network elements are
   not co-resident with the server, and shows how the provisioning in
   Step 3 and the deprovisioning in Step 10 can be replaced by
   signaling between server and network, using information from
   Step 2.  This document also shows how Step 3 can interact
   seamlessly with some of the realizations of Steps 6 and 7.

1.3.  VM Termination

   This subsection describes an abstract sequence of steps involved in
   termination of a VM, also referred to as "powering off" a VM.  The
   following termination steps are intended as an illustrative
   example, not as prescriptive text; the goal is to capture
   sufficient detail to set a context for the signaling described in
   Section 5.

   Termination of a VM requires:

   1.  ensuring that the VM is no longer executing;

   2.  deactivating the VM's network parameters at the server;

   3.  deprovisioning the VM from the network element(s) to which the
       server is connected; and

   4.  deleting the VM from the server (the VM's image may remain in
       storage for reuse).

   While shown as a numbered sequence above, some of these steps may be
   concurrent (e.g., network deprovisioning and VM deletion).

   Steps 1, 2 and 4 are handled by the server, based on instructions
   from the provisioning system.  For Step 3, the provisioning system
   talks actively to servers, network switches, storage and
   appliances, and must know the details of the physical server,
   network, storage and appliance connectivity topologies.
   This document focuses on the case where the network elements in
   Step 3 are not co-resident with the server, and shows how the
   deprovisioning in Step 3 can be replaced by signaling between
   server and network.

2.  Conventions and Acronyms Used

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in [RFC2119].

   The following acronyms are used:

   DCVPN: Data Center Virtual Private Network -- a virtual
      connectivity topology overlaid on physical devices to provide
      virtual devices with the connectivity they need and isolation
      from other DCVPNs

   NVE: Network Virtualization Edge -- the entities that realize
      private communication among VMs in a DCVPN

   l-NVE: local NVE -- with respect to a VM, the NVE elements to which
      it is directly connected

   r-NVE: remote NVE -- with respect to a VM, the NVE elements to
      which the VM's peer VMs are connected

   NVGRE: Network Virtualization using Generic Routing Encapsulation

   VDP: VSI Discovery and Configuration Protocol

   VID: 12-bit VLAN tag or identifier used locally between a server
      and its l-NVE

   VLAN: Virtual Local Area Network

   VM: Virtual Machine (same as Virtual Station)

   peer VM: with respect to a VM, the other VMs in the VM's DCVPN

   VNID: DCVPN Identifier (sometimes called a Group Identifier)

   VSI: Virtual Station Interface

   VXLAN: Virtual eXtensible Local Area Network

3.  Virtual Networks

   The goal of provisioning networks for VMs is to create an
   "isolation domain" wherein a group of VMs can talk freely to each
   other, but communication to and from VMs outside that group is
   restricted (either prohibited, or mediated via a router, a firewall
   or other network gateway).  Such an isolation domain, sometimes
   called a Closed User Group, here will be called a Data Center
   Virtual Private Network (DCVPN).  The network elements on the outer
   border or edge of the overlay portion of a Virtual Network are
   called Network Virtualization Edges (NVEs).

   A DCVPN is assigned a global "name" that identifies it in the
   management plane; this name is unique in the scope of the data
   center, and may also be unique across several cooperating data
   centers.  A DCVPN is also assigned an identifier unique in the
   scope of the data center, the Virtual Network Group ID (VNID).  The
   VNID is a control plane entity.  A data plane tag is also needed to
   distinguish different DCVPNs' traffic; more on this later.

   For a given VM, the NVE can be classified into two parts: the
   network elements to which the VM's server is directly connected
   (the local NVE or l-NVE), and those to which peer VMs are connected
   (the remote NVE or r-NVE).  In some cases, the l-NVE is co-resident
   with the server hosting the VM; in other cases, the l-NVE is
   separate (distributed l-NVE).  The latter case is the one of
   interest in this document.

   A created VM is added to a DCVPN through Steps 4 to 6 in
   Section 1.1, which can be recast as follows.  In Step 4, the
   l-NVE(s) are informed about the VM's VNID, network addresses and
   policies, and the l-NVE and server agree on how to distinguish
   traffic for different DCVPNs from and to the server.  In Step 5,
   the relevant r-NVE elements and the addresses of their VMs are
   discovered.  In Step 6, the r-NVE(s) are informed of the presence
   of the new VM and obtain its addresses.
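   As an illustration of the identifier spaces and membership
   operations just described, here is a minimal Python sketch.  All
   class and field names are invented for this example; nothing in it
   is prescriptive.

      from dataclasses import dataclass, field
      from typing import Set

      @dataclass
      class DCVPN:
          # Management-plane name, unique within (at least) one data
          # center.
          name: str
          # Control-plane identifier (VNID), unique within the data
          # center.
          vnid: int
          # Addresses of member VMs (MAC, IPv4 or IPv6, as strings).
          vm_addresses: Set[str] = field(default_factory=set)

          def add_vm(self, address: str) -> None:
              # Recast Steps 4 to 6: the new VM's addresses become part
              # of the DCVPN; l-NVE and r-NVE elements learn of them via
              # signaling or a control plane.  The data plane tag is
              # chosen separately, per port, between server and l-NVE.
              self.vm_addresses.add(address)

      vpn = DCVPN(name="tenant-blue", vnid=1001)
      vpn.add_vm("00:11:22:33:44:55")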
   Once a DCVPN is created, the next steps for network provisioning
   are to create and apply policies such as for QoS or access control.
   These occur in three flavors: policies for all VMs in the group,
   policies for individual VMs, and policies for communication across
   DCVPN boundaries.

3.1.  Current Mode of Operation

   DCVPNs are often realized as Ethernet VLAN segments.  A VLAN
   segment satisfies the communication properties of a DCVPN.  A VLAN
   also has data plane mechanisms for discovering network elements
   (Layer 2 switches, aka bridges) and VM addresses.  When a DCVPN is
   realized as a VLAN, Step 4 requires provisioning both the server
   and l-NVE with the VLAN tag that identifies the DCVPN.  Step 6
   requires provisioning all involved network elements with the same
   VLAN tag.  Address learning is done by flooding, and the
   announcement of a new VM is typically by a "gratuitous ARP".

   While VLANs are familiar and well-understood, they fail to scale on
   several dimensions.  Underlying VLANs is a Layer 2 infrastructure.
   The number of independent VLANs in a Layer 2 domain is limited by
   the size of the VLAN tag.  Data plane techniques (flooding and
   broadcast) are another source of serious concern as the overall
   size of the network grows.

3.2.  Future Mode of Operation

   There are several scalable realizations of DCVPNs that address the
   isolation requirements of DCVPNs, as well as the need for a
   scalable substrate for DCVPNs and the need for scalable mechanisms
   for NVE and VM address discovery.  While these are not the goal of
   this document, a secondary goal of this document is to show how the
   signaling that replaces Step 4 can seamlessly interact with several
   of these realizations of DCVPNs.

   VLAN tags (VIDs) will be used as the data plane tag to distinguish
   traffic for different DCVPNs between a server and its l-NVE.  Note
   that, as used here, VIDs have only local significance between
   server and NVE; they are not to be confused with VLANs, which are a
   data center-wide concept.  The data plane tag between l-NVE and
   r-NVE depends on the encapsulation mechanism among the NVEs; the
   l-NVE is expected to map between VIDs and intra-NVE tags in both
   directions.

4.  Provisioning DCVPNs

   For VM creation as described in Section 1.1, Step 3 provisions the
   server; Steps 4 and 5 provision the l-NVE elements; Step 6
   provisions the r-NVE elements.

   In some cases, the l-NVE elements live within the server; in this
   case, Steps 3 and 4 are "single-touch" in that the provisioning
   system only needs to talk to the server, and both CPU and network
   parameters can be applied by the server.  However, in other cases,
   the l-NVE is separate from the server, requiring that the
   provisioning system talk independently to both the server and the
   l-NVE.  This scenario, which we call "distributed local NVE", is
   the one considered in this document.  This document resurrects
   "single-touch" provisioning in the distributed l-NVE case.

   The approach here is to provision the server, then have the server
   signal the requisite parameters to the l-NVE.  Such an approach
   reduces the workload on the provisioning system, allowing it to
   scale both in the number of elements it can manage and in the rate
   at which it can process changes.
   It also simplifies the data model that the provisioning system
   needs to have; in particular, the provisioning system does not have
   to maintain a full, up-to-date map of server-to-network
   connectivity.  Furthermore, it is more resilient to topology
   changes in server-network connectivity that have not yet been
   transmitted to the provisioning system.  For example, if a server
   is reconnected to a different port or a different l-NVE to recover
   from a malfunctioning port, the server can contact the new l-NVE
   over the new port without the provisioning system being aware of
   the change.

   While the current document focuses on provisioning networking
   parameters via signaling, future extensions may address the
   provisioning of storage and middle-box parameters in a similar
   fashion.  Companion documents will describe how NVEs to which peer
   VMs are connected can get the required networking information via
   signaling rather than by provisioning and/or other means.

5.  Signaling

5.1.  Preliminaries

   There are three common operations in a virtualized data center:
   creating a VM; migrating a VM from one physical server to another;
   and terminating a VM.  Creating a VM requires "associating" it with
   its DCVPN and "activating" that association; decommissioning a VM
   requires "deactivating" the VM's association with the DCVPN and
   then "dissociating" the VM from its DCVPN.  Moving a VM consists of
   associating it with its DCVPN in its new location, then
   dissociating it from its old location.  The deactivation operation
   is often implicit in another operation, but is called out here for
   symmetry and completeness.

5.2.  VM Operations

5.2.1.  Network Parameters

   For each VM association operation, a subset of the following
   information is needed from server to l-NVE:

   operation: one of pre-associate, associate, or dissociate.

   authentication: proof that this operation was authorized by the
      provisioning system.

   VNID: identifier of the DCVPN to which the VM belongs.

   VID: tag to use between server and l-NVE to distinguish DCVPN
      traffic; the value zero in an associate or pre-associate
      operation is a request to the l-NVE to assign an unused VID.
      This specification is meant to provide extensibility by allowing
      the VID to be a VLAN ID, but also any other means of locally
      multiplexing traffic between the server and the NVE.  In the
      case where the NVE is implemented on the server, the VID can be
      the local name of a virtual network interface.

   table type: realization of the DCVPN on the NVE (see below).

   address entries: addresses for the VM on the server.

   policy: VM-specific network policies, such as access control lists
      and/or QoS policies.

   hold time: time (in milliseconds) to keep a VM's addresses after it
      migrates away from this l-NVE.  This is set to zero when a VM is
      terminated.

   per-address-VID-allocation: boolean flag which can optionally be
      set to "yes", resulting in the VID allocated to the VM being
      distinct from the VIDs allocated to other VMs connected to the
      same DCVPN on the same NVE port.  This behavior results in
      traffic to and from the VM always transiting through the NVE,
      even traffic to and from VMs of the same DCVPN.

   Activate and deactivate are dataplane operations that reference the
   VID, and additionally provide authentication, table type and
   address entries information.
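   To make the above parameter list concrete, here is a minimal Python
   sketch of the information a server might carry in a pre-associate,
   associate or dissociate message.  The field names and types are
   invented for illustration; as noted in Section 5.3, the actual
   protocol and encoding are deliberately left open.

      from dataclasses import dataclass, field
      from enum import Enum
      from typing import List, Optional

      class Operation(Enum):
          PRE_ASSOCIATE = "pre-associate"
          ASSOCIATE = "associate"
          DISSOCIATE = "dissociate"

      @dataclass
      class VMNetworkParams:
          operation: Operation
          authentication: bytes        # proof the operation was authorized
          vnid: int                    # DCVPN to which the VM belongs
          vid: int = 0                 # 0 = ask the l-NVE to assign a VID
          table_type: Optional[str] = None   # DCVPN realization on the NVE
          address_entries: List[str] = field(default_factory=list)
          policy: Optional[str] = None # VM-specific ACL/QoS policies
          hold_time_ms: int = 0        # used on dissociate; 0 on terminate
          per_address_vid_allocation: bool = False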
   When an activate is realized via a "gratuitous ARP" in the data
   plane, the VID is in the Ethernet header, and all of the other
   parameters are obtained by mapping the VID and the port on which
   the frame containing it was received to information established by
   a prior associate operation.

   Realizations of DCVPNs include, among others, E-VPNs
   ([I-D.ietf-l2vpn-evpn]), IP VPNs ([RFC4364]), NVGRE
   ([I-D.sridharan-virtualization-nvgre]), TRILL ([RFC6325]), VPLS
   ([RFC4761], [RFC4762]), and VXLAN
   ([I-D.mahalingam-dutt-dcops-vxlan]).  The table type implicitly
   defines whether forwarding at the NVE for the DCVPN is at Layer 2
   or Layer 3 or both.

   Typically, for the pre-associate and associate messages, all of the
   information except the hold time would be needed.  For the
   dissociate message, all of the above information except the VID and
   table type would be needed.

   Operations are stateful, that is, they remain in place until
   superseded by another operation.  For example, on receiving an
   associate message, an NVE is expected to create and maintain the
   DCVPN table for a VM until the NVE receives a dissociate message to
   remove the table.  A separate liveness protocol may be run between
   server and NVE to let each side know that the other is still
   operational; if the liveness protocol fails, each side may remove
   all state installed in response to messages from the other.

   In the descriptions below, we assume that the NVE layer provides a
   mechanism for control plane distribution of VM addresses, as
   opposed to doing this in the data plane.  If this is not the case,
   NVE elements can skip the parts of the procedures below that
   involve address distribution.

   As VIDs are local to server-NVE communication, in fact to a
   specific port connecting these two elements, a mapping table
   containing 4-tuples of the following form will prove useful to the
   NVE:

      <port, VM addresses, VID, VNID>

   The procedures below assume that the NVE systematically reorders
   the provided VM address entries before inserting or looking up
   entries in this mapping table.

   Note that valid values of VID are from 1 to 4094, inclusive.  A
   value of 0 is used to mean "unassigned".  When a VID can be shared
   by more than one VM, it is necessary to reference-count entries in
   this table.  Entries in this table have multiple uses:

   o  Find the VNID for a VID and port for association, activation and
      traffic forwarding;

   o  Determine whether a VID exists (has already been assigned) for a
      VNID and port;

   o  Determine which <port, VID> pairs to use for forwarding VNID
      traffic that requires flooding.
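   One possible shape for this mapping table is sketched below in
   Python.  The structure and method names are invented; only the
   4-tuple contents, the VID value range, the reference counting, and
   the three uses above come from this document.

      from typing import Dict, List, Optional, Tuple

      class MappingTable:
          """Reference-counted <port, VM addresses, VID, VNID> entries."""

          def __init__(self) -> None:
              # Keyed by (port, VID); VM addresses are kept sorted, since
              # the procedures assume the NVE systematically reorders
              # them before inserting or looking up entries.
              self.entries: Dict[Tuple[str, int],
                                 Tuple[int, Tuple[str, ...], int]] = {}

          def add(self, port: str, addresses: List[str], vid: int,
                  vnid: int) -> None:
              key = (port, vid)
              if key in self.entries:
                  v, addrs, refs = self.entries[key]
                  merged = tuple(sorted(set(addrs) | set(addresses)))
                  self.entries[key] = (v, merged, refs + 1)  # shared VID
              else:
                  self.entries[key] = (vnid, tuple(sorted(addresses)), 1)

          def vnid_for(self, port: str, vid: int) -> Optional[int]:
              # Use 1: find the VNID for a VID and port.
              entry = self.entries.get((port, vid))
              return entry[0] if entry else None

          def vid_for(self, port: str, vnid: int) -> int:
              # Use 2: determine whether a VID has already been assigned
              # for a VNID and port; 0 means "unassigned".
              for (p, vid), (v, _addrs, _refs) in self.entries.items():
                  if p == port and v == vnid:
                      return vid
              return 0

          def flood_list(self, vnid: int) -> List[Tuple[str, int]]:
              # Use 3: the <port, VID> pairs over which VNID traffic
              # that requires flooding should be forwarded.
              return [(p, vid) for (p, vid), (v, _a, _r)
                      in self.entries.items() if v == vnid]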
5.2.2.  Creating a VM

   When a VM is instantiated on a server, it is assigned a VNID, VM
   addresses and a table type for the DCVPN.  The VM addresses may be
   any of IPv4, IPv6 and MAC addresses.  There may also be network
   policies specific to the VM.  To connect the VM to its DCVPN, the
   server signals these parameters to the l-NVE via an "associate"
   operation, followed by an "activate" operation to put the
   parameters into use.  (Note that the l-NVE may consist of more than
   one device.)

   On receiving an associate message on port P from server S, an NVE
   device does the following:

   A.1: Validate the authentication (if present).  If the validation
        fails, inform the provisioning system, log the error, and stop
        processing the associate message.  This validation may include
        authorization checks.

   A.2: Check the per-address-VID-allocation flag in the associate
        message:

        *  if this flag is not set:

           +  Check if the VID in the associate message is zero (i.e.,
              an allocation request); if so, look up the VID for
              <port P, VNID>; if there is none, allocate a new VID.

           +  If the VID in the associate message is non-zero, look up
              <port P, VNID> for an already allocated VID.  If the
              lookup returns the same VID, the association is already
              in place.  If the lookup returns zero (no VID assigned),
              associate the provided VID with <port P, VM addresses,
              VNID>.  Otherwise, the provided VID does not match the
              one in use for <P, VNID>, so respond to S with an error,
              and stop processing the associate message.

        *  if this flag is set, check if the VID in the associate
           message is zero:

           +  if so (this is an allocation request), allocate a new
              VID, distinct from other VIDs allocated on this port;

           +  if the VID is non-zero, check that the provided VID is
              distinct from other VIDs allocated on this port; if so,
              associate the VID with <port P, VM addresses, VNID>.  If
              not, the provided VID does not satisfy the per-address-
              VID constraint, so respond to S with an error, and stop
              processing the associate message.

        (The VID-selection logic of this step is illustrated in the
        sketch following step B.5 below.)

   A.3: Add the <port P, VM addresses, VID> -> VNID mapping to the
        mapping table.

   A.4: If a table of the appropriate type (as signaled) for the VNID
        does not already exist, create it, and add the VM's addresses
        to it.

   A.5: Communicate with the control plane to advertise the VM's
        addresses, and also to get the addresses of other VMs in the
        DCVPN.  Populate the table with the VM's addresses and any
        addresses learned from the control plane (some control planes
        may not provide all or even any of the other addresses in the
        DCVPN at this point).

   A.6: Finally, respond to S with the VID assigned for the VM, also
        saying that the operation was successful.

   After a successful associate, the network has been provisioned (at
   least in the local NVE) for the VM's traffic, but forwarding has
   not been enabled.  On receiving an activate message on port P from
   server S, an NVE device does the following (activate is a one-way
   message that does not have a response):

   B.1: Validate the authentication (if present).  If the validation
        fails, inform the provisioning system, log the error, and stop
        processing the activate message.  This validation may include
        authorization checks.

   B.2: Check if the VID in the activate message is zero.  If so, log
        the error, and stop processing the activate message.

   B.3: Use the VID and port P to look up the VNID from a previous
        associate message.  If there is no VNID, log the error and
        stop processing the activate message.

   B.4: If forwarding is not enabled for <port P, VID>, activate it,
        mapping VID -> VNID.

   B.5: If the activate message is a dataplane frame that requires
        forwarding beyond the NVE (e.g., a "gratuitous ARP"), use the
        activated forwarding to send the dataplane frame via the
        virtual network identified by the VNID.
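   The following Python sketch, promised in step A.2 above,
   illustrates the VID-selection logic for an associate message.  The
   helper names are invented; the mapping table is assumed to provide
   the lookups sketched earlier, and raising an exception stands in
   for the error response to S.

      VALID_VIDS = range(1, 4095)  # valid VIDs are 1..4094; 0 = unassigned

      def select_vid(table, port, vnid, requested_vid, per_address_alloc,
                     vids_in_use_on_port):
          if not per_address_alloc:
              existing = table.vid_for(port, vnid)   # 0 if unassigned
              if requested_vid == 0:                 # allocation request
                  return existing or allocate_new_vid(vids_in_use_on_port)
              if existing in (0, requested_vid):     # free, or already
                  return requested_vid               # in place
              raise ValueError("VID does not match the one in use "
                               "for <P, VNID>")
          else:
              # Per-address allocation: the VID must be distinct from
              # all other VIDs allocated on this port.
              if requested_vid == 0:                 # allocation request
                  return allocate_new_vid(vids_in_use_on_port)
              if requested_vid in vids_in_use_on_port:
                  raise ValueError("VID violates the per-address-VID "
                                   "constraint")
              return requested_vid

      def allocate_new_vid(vids_in_use_on_port):
          for vid in VALID_VIDS:
              if vid not in vids_in_use_on_port:
                  return vid
          raise RuntimeError("no unused VID left on this port")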
5.2.3.  Terminating a VM

   On receiving a request from the provisioning system to terminate a
   VM, the server sends a dissociate message to the l-NVE with the
   hold time set to zero.  The dissociate message contains the
   operation, authentication, VNID, table type, and VM addresses.  On
   receiving the dissociate message on port P from server S, each NVE
   device L does the following:

   D.1: Validate the authentication (if present).  If the validation
        fails, inform the provisioning system, log the error, and stop
        processing the dissociate message.

   D.2: Delete the VM's addresses from the mapping table and delete
        any VM-specific network policies associated with any of the VM
        addresses.  If the VNID table is empty after deleting the VM's
        addresses, optionally delete the table and any network
        policies for the VNID.

   D.3: Respond to S saying that the operation was successful.
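   A sketch of this dissociate handling follows, under the same
   assumptions as the earlier sketches: the names are invented, and
   the authentication and control-plane interactions are stubbed out.

      def handle_dissociate(nve, port, server, msg):
          # D.1: validate the authentication (if present).
          if msg.authentication and not nve.validate_auth(msg.authentication):
              nve.inform_provisioning_system("authentication failed")
              nve.log_error(port, msg)
              return

          # D.2: delete the VM's addresses and VM-specific policies.
          # (A non-zero hold time, used during migration, would instead
          # keep the addresses for msg.hold_time_ms; see Section 5.2.4.)
          table = nve.vnid_tables.get(msg.vnid)
          if table is not None:
              for addr in msg.address_entries:
                  table.delete_address(addr)
                  nve.delete_policies_for_address(addr)
              if table.is_empty():
                  # Optional: delete the now-empty table and any network
                  # policies for the VNID.
                  del nve.vnid_tables[msg.vnid]
                  nve.delete_policies_for_vnid(msg.vnid)

          # D.3: respond to S that the operation was successful.
          nve.respond_success(server, msg)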
5.2.4.  Migrating a VM

   NOTE: This subsection has not been updated from the -00 version of
   this draft; it will be updated in the forthcoming -02 version.  The
   set of VM migration steps is known to be incomplete; material on
   concurrent actions and race conditions (based on list discussion)
   should be added, and new step PA.5 is anticipated to need
   generalization to encompass control planes that may not push all
   addressing changes to all relevant r-NVEs.  Please ignore the text
   in this subsection and beyond - this document is a draft and the
   authors are working on it.

   Let's say that a VM is to be migrated from server S (connected to
   l-NVE device L) to server S' (connected to l-NVE device L').  The
   sequence of steps for migration is:

   M.1: S' gets a request to prepare to receive a copy of the VM from
        S.

   M.2: S gets a request to copy the VM to S'.

   M.3: S then gets a request to terminate the VM on S.

   M.4: Finally, S' gets a request to start up the VM on S'.

   At Step M.1, S' initiates the move, and also sends a pre-associate
   message to L', including the pre-associate information.  The
   processing of a pre-associate message (PA.1 to PA.6) for L' is the
   same as that of an associate message (A.1 to A.6), with the
   following change to step 5.

   PA.5: Communicate with each r-NVE device to advertise the VM's
         addresses, but as non-preferred destinations (*).  Also get
         the addresses of other VMs in the DCVPN.  Populate the table
         with the VM's addresses and addresses learned from each
         r-NVE.

   (*) See Section 6 for some mechanisms for doing this.  This is
   necessary so that L' does not attract traffic to the VM's new
   location before the migration is complete, yet L knows ahead of
   time how to send traffic to L' (Step D.2), minimizing traffic loss
   to the VM when migration is complete.

   At step M.2, S initiates the VM copy.  If at any time L hears
   advertisements from L' about how to communicate with the VM in its
   new location (as non-preferred destinations), L stores that
   information for use in step D.2.

   At step M.3, S terminates the running of the VM on itself, and
   sends a dissociate message to L with a non-zero hold time (either
   what the provisioning system sends, or a default value).  L
   processes the dissociate message as above.

5.3.  Signaling Protocols

   There are several options for protocols to use to signal the above
   messages.  One could invent a new protocol for this purpose, or one
   could reuse existing protocols, among them LLDP, XMPP, HTTP REST,
   and VDP [VDP], a new protocol standardized for the purpose of
   signaling a VM's network parameters from server to l-NVE.  Several
   factors influence the choice of protocol(s); at this time, the
   focus is on what needs to be signaled, leaving for later the choice
   of how the information is signaled, and specific encodings.

5.4.  Liveness

   Procedures to handle failures of the server or of the NVE will be
   covered in a further revision.

6.  Interfacing with DCVPN Control Planes

   The control plane for a DCVPN manages the creation/deletion,
   membership and span of the DCVPN
   ([I-D.narten-nvo3-overlay-problem-statement],
   [I-D.kreeger-nvo3-overlay-cp]).  Such a control plane needs to work
   with the server-to-NVE signaling in a coordinated manner, to ensure
   that address changes at a local NVE are reflected appropriately in
   remote NVEs.  The details of such coordination will be specified in
   a companion document.

7.  Security Considerations

8.  IANA Considerations

9.  Acknowledgments

   Many thanks to Amit Shukla for his help with the details of EVB and
   his insight into data center issues.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [VDP]      IEEE 802.1 Working Group, "Edge Virtual Bridging
              (802.1Qbg) (work in progress)", 2012.

10.2.  Informative References

   [I-D.ietf-l2vpn-evpn]
              Sajassi, A., Aggarwal, R., Henderickx, W., Balus, F.,
              Isaac, A., and J. Uttaro, "BGP MPLS Based Ethernet VPN",
              draft-ietf-l2vpn-evpn-01 (work in progress), July 2012.

   [I-D.kreeger-nvo3-overlay-cp]
              Kreeger, L., Dutt, D., Narten, T., Black, D., and M.
              Sridharan, "Network Virtualization Overlay Control
              Protocol Requirements", draft-kreeger-nvo3-overlay-cp-01
              (work in progress), July 2012.

   [I-D.mahalingam-dutt-dcops-vxlan]
              Mahalingam, M., Dutt, D., Duda, K., Agarwal, P.,
              Kreeger, L., Sridhar, T., Bursell, M., and C. Wright,
              "VXLAN: A Framework for Overlaying Virtualized Layer 2
              Networks over Layer 3 Networks",
              draft-mahalingam-dutt-dcops-vxlan-02 (work in progress),
              August 2012.

   [I-D.narten-nvo3-overlay-problem-statement]
              Narten, T., Black, D., Dutt, D., Fang, L., Gray, E.,
              Kreeger, L., Napierala, M., and M. Sridharan, "Problem
              Statement: Overlays for Network Virtualization",
              draft-narten-nvo3-overlay-problem-statement-04 (work in
              progress), August 2012.

   [I-D.sridharan-virtualization-nvgre]
              Sridharan, M., Greenberg, A., Venkataramaiah, N., Wang,
              Y., Duda, K., Ganga, I., Lin, G., Pearson, M., Thaler,
              P., and C. Tumuluri, "NVGRE: Network Virtualization
              using Generic Routing Encapsulation",
              draft-sridharan-virtualization-nvgre-01 (work in
              progress), July 2012.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, February 2006.

   [RFC4761]  Kompella, K. and Y. Rekhter, "Virtual Private LAN
              Service (VPLS) Using BGP for Auto-Discovery and
              Signaling", RFC 4761, January 2007.

   [RFC4762]  Lasserre, M. and V. Kompella, "Virtual Private LAN
              Service (VPLS) Using Label Distribution Protocol (LDP)
              Signaling", RFC 4762, January 2007.

   [RFC6325]  Perlman, R., Eastlake, D., Dutt, D., Gai, S., and A.
              Ghanwani, "Routing Bridges (RBridges): Base Protocol
              Specification", RFC 6325, July 2011.

Authors' Addresses

   Kireeti Kompella
   Juniper Networks
   1194 N. Mathilda Ave.
   Sunnyvale, CA  94089
   US

   Email: kireeti@juniper.net

   Yakov Rekhter
   Juniper Networks
   1194 N. Mathilda Ave.
   Sunnyvale, CA  94089
   US

   Email: yakov@juniper.net

   Thomas Morin
   France Telecom - Orange Labs
   2, avenue Pierre Marzin
   Lannion  22307
   France

   Email: thomas.morin@orange.com

   David L. Black
   EMC Corporation
   176 South St.
   Hopkinton, MA  01748

   Email: david.black@emc.com