NVO3 Working Group                                             Yizhou Li
INTERNET-DRAFT                                                 Lucy Yong
Intended Status: Informational                       Huawei Technologies
                                                        Lawrence Kreeger
                                                                   Cisco
                                                           Thomas Narten
                                                                     IBM
                                                             David Black
                                                                     EMC
Expires: January 2, 2015                                    July 1, 2014


              Hypervisor to NVE Control Plane Requirements
                   draft-ietf-nvo3-hpvr2nve-cp-req-00

Abstract

   This document describes the control plane protocol requirements when
   the NVE is not co-located with the hypervisor on a server. A control
   plane protocol (or protocols) between a hypervisor and its associated
   external NVE(s) is used by the hypervisor to populate its virtual
   machine states to the NVE(s) for further handling.
   This document illustrates the functionality required of such control
   plane signaling protocols and outlines the high level requirements to
   be fulfilled. Virtual machine states and state transitions are
   summarized to help clarify the requirements.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1 Terminology
      1.2 Target Scenarios
      1.3 Motivations and Purpose
   2. VM Lifecycle
      2.1 VM Creation
      2.2 VM Live Migration
      2.3 VM termination
      2.4 VM Pause, suspension and resumption
   3. Hypervisor-to-NVE Signaling protocol functionality
      3.1 VN connect and disconnect
      3.2 TSI associate and activate
      3.3 TSI disassociate, deactivate and clear
   4. Hypervisor-to-NVE Signaling Protocol requirements
   5. Security Considerations
   6. IANA Considerations
   7. Acknowledgements
   8. References
      8.1 Normative References
      8.2 Informative References
   Appendix A. IEEE 802.1Qbg VDP Illustration (For information only)
   Authors' Addresses

1. Introduction

   This document describes the control plane protocol requirements when
   the NVE is not co-located with the hypervisor on a server. A control
   plane protocol (or protocols) between a hypervisor and its associated
   external NVE(s) is used by the hypervisor to populate its virtual
   machine states to the NVE(s) for further handling. This protocol is
   mentioned in the NVO3 problem statement
   [I-D.ietf-nvo3-overlay-problem-statement] as the third work item.
   When the TS and the NVE are on separate devices, the arrangement is
   called the split TS-NVE architecture; it is the primary interest of
   this document.

   Virtual machine states and state transitions are summarized in this
   document to illustrate the functionality required of the control
   plane signaling protocols between the hypervisor and the external
   NVE. The high level requirements to be fulfilled are then outlined.

   This document uses the term "hypervisor" throughout when describing
   the scenario where NVE functionality is implemented on a separate
   device from the "hypervisor" that contains a VM connected to a VN.
   In this context, the term "hypervisor" is meant to cover any device
   type where the NVE functionality is offloaded in this fashion, e.g.,
   a Network Service Appliance.

   This document often uses the terms "VM" and "Tenant System" (TS)
   interchangeably, even though a VM is just one type of Tenant System
   that may connect to a VN. For example, a service instance within a
   Network Service Appliance may be another type of TS. When this
   document uses the term VM, it will in most cases apply to other types
   of TSs.

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   This document uses the same terminology as found in
   [I-D.ietf-nvo3-framework] and [I-D.ietf-nvo3-nve-nva-cp-req]. This
   section defines additional terminology used by this document.

   VN Profile: Meta data associated with a VN that is used by an NVE
   when ingressing/egressing packets to/from a specific VN. Meta data
   could include such information as ACLs, QoS settings, etc. The VN
   Profile contains parameters that apply to the VN as a whole. Control
   protocols could use the VN ID or VN Name to obtain the VN Profile.

   VSI: Virtual Station Interface.
   [IEEE 802.1Qbg]

   VDP: VSI Discovery and Configuration Protocol [IEEE 802.1Qbg]

1.2 Target Scenarios

   In split TS-NVE architecture, an external NVE can provide an offload
   of the encapsulation / decapsulation function, network policy
   enforcement, as well as the VN Overlay protocol overheads. This
   offloading may provide performance improvements and/or resource
   savings to the End Device (e.g. hypervisor) making use of the
   external NVE.

   The following figures give example scenarios where the Tenant System
   and NVE are on different devices in split TS-NVE architecture.

         Hypervisor              Access Switch
   +------------------+       +-----+-------+
   | +--+   +-------+ |       |     |       |
   | |VM|---|       | | VLAN  |     |       |
   | +--+   |Virtual|---------+ NVE |       +--- Underlying
   | +--+   |Switch | | Trunk |     |       |    Network
   | |VM|---|       | |       |     |       |
   | +--+   +-------+ |       |     |       |
   +------------------+       +-----+-------+

           Figure 1 Hypervisor with an External NVE

        Hypervisor           L2 Switch        NVE
   +------------------+       +-----+       +-----+
   | +--+   +-------+ |       |     |       |     |
   | |VM|---|       | | VLAN  |     | VLAN  |     |
   | +--+   |Virtual|---------+     +-------+     +--- Underlying
   | +--+   |Switch | | Trunk |     | Trunk |     |    Network
   | |VM|---|       | |       |     |       |     |
   | +--+   +-------+ |       |     |       |     |
   +------------------+       +-----+       +-----+

           Figure 2 Hypervisor with an External NVE
                across an Ethernet Access Switch

     Network Service Appliance           Access Switch
   +--------------------------+      +-----+-------+
   | +------------+    |\     |      |     |       |
   | |Net Service |----| \    |      |     |       |
   | |Instance    |    |  \   | VLAN |     |       |
   | +------------+    |   |---------+ NVE |       +--- Underlying
   | +------------+    |   |  | Trunk|     |       |    Network
   | |Net Service |----|  /   |      |     |       |
   | |Instance    |    | /    |      |     |       |
   | +------------+    |/     |      |     |       |
   +--------------------------+      +-----+-------+

    Figure 3 Physical Network Service Appliance with an External NVE

   We use the term hypervisor in this document to refer to the container
   that can
   run the control plane protocol on the device. Thus "hypervisor" has a
   more generic meaning that also covers the Network Service Appliance
   in Figure 3.

   Tenant Systems connect to NVEs via a Tenant System Interface (TSI).
   The TSI logically connects to the NVE via a Virtual Access Point
   (VAP) [I-D.ietf-nvo3-arch]. The NVE may provide Layer 2 or Layer 3
   forwarding. In split TS-NVE architecture, the external NVE may be
   able to reach multiple MAC and IP addresses via a TSI. For example,
   Tenant Systems that provide network services (such as a firewall,
   load balancer, or VPN gateway) are likely to have a complex address
   hierarchy. This implies that if a given TSI disassociates from a VN,
   all of its MAC and IP addresses are also disassociated. There is no
   need to signal the deletion of every MAC or IP address when the TSI
   is brought down or deleted. In the majority of cases, a VM will act
   as a simple host with a single TSI and a single MAC and IP address
   visible to the external NVE.

1.3 Motivations and Purpose

   The problem statement [I-D.ietf-nvo3-overlay-problem-statement]
   discusses the need for a control plane protocol (or protocols) to
   populate each NVE with the state needed to perform its functions.

   In one common scenario, an NVE provides overlay
   encapsulation/decapsulation packet forwarding services to Tenant
   Systems (TSs) that are co-resident with the NVE on the same End
   Device (e.g. when the NVE is embedded within a hypervisor or a
   Network Service Appliance). In such cases, there is no need for a
   standardized protocol between the hypervisor and the NVE, as the
   interaction is implemented via software on a single device. In the
   split TS-NVE architecture scenarios, as shown in Figure 1, however,
   a control plane signaling protocol needs to run between the
   hypervisor and the external NVE to pass the relevant state
   information. Such interaction is mandatory.
   This document identifies the requirements for such a signaling
   protocol.

   Section 2 describes VM states and state transitions in the VM
   lifecycle. Section 3 introduces Hypervisor-to-NVE signaling protocol
   functionality derived from VM operations and network events. Section
   4 outlines the requirements of the control plane protocol to achieve
   the required functionality.

2. VM Lifecycle

   [I-D.ietf-opsawg-vmm-mib] shows the state transitions of a VM in its
   figure 2. Some of the VM states are of interest to the external NVE.
   This section illustrates the relevant phases and events in the VM
   lifecycle. It should be noted that the following subsections do not
   give an exhaustive traversal of the VM lifecycle states. They are
   intended as illustrative examples relevant to the split TS-NVE
   architecture, not as prescriptive text; the goal is to capture
   sufficient detail to set a context for the signaling protocol
   functionality and requirements described in the following sections.

2.1 VM Creation

   VM creation runs through the states in the order of preparing,
   shutdown and running [I-D.ietf-opsawg-vmm-mib]. The end device
   allocates and initializes local virtual resources such as storage in
   the VM preparing state. In the shutdown state, the VM has everything
   ready except that CPU execution is not scheduled by the hypervisor
   and the VM's memory is not resident in the hypervisor. The transition
   from the shutdown state to the running state normally requires human
   action or a system-triggered event. The running state indicates that
   the VM is in the normal execution state: frames can be sent and
   received correctly, and no migration, suspension or shutdown is in
   progress.

   In the VM creation phase, the tenant system has to be associated with
   the external NVE. Association here indicates that the hypervisor and
   the external NVE have signaled each other and reached some agreement.
   Relevant parameters or information have been provisioned properly.
   For example, the external NVE should be informed of the VM's MAC
   address and/or IP address. Another example is that the hypervisor may
   use a locally significant VLAN ID to indicate the traffic destined to
   a specified VN. Both the hypervisor and the NVE should agree on that
   VID value for later traffic identification and forwarding.

   The external NVE needs to do some preparation work before it signals
   successful association with the tenant system. Such preparation work
   may include locally saving the states and binding information of the
   tenant system and its VN, communicating with peer NVEs and/or the NVA
   for network provisioning, etc.

   Tenant System association should be performed before the VM enters
   the running state, preferably in the shutdown state. If association
   with the external NVE fails, the VM should not go into the running
   state.

2.2 VM Live Migration

   Live migration is sometimes referred to as "hot" migration, in that
   from an external viewpoint, the VM appears to continue to run while
   being migrated to another server (e.g., TCP connections generally
   survive this class of migration). In contrast, suspend/resume (or
   "cold") migration consists of suspending VM execution on one server
   and resuming it on another. For simplicity, the following abstract
   summary of live migration assumes shared storage, so that the VM's
   storage is accessible to the source and destination servers. Assume
   the VM migrates from hypervisor 1 to hypervisor 2. VM live migration
   involves state transitions on both hypervisors, source hypervisor 1
   and destination hypervisor 2. The VM state on source hypervisor 1
   transits from running to migrating and then to shutdown
   [I-D.ietf-opsawg-vmm-mib]. The VM state on destination hypervisor 2
   transits from shutdown to migrating and then to running.
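
   The state sequences named in Sections 2.1 and 2.2 can be sketched as
   a small transition table. This is an illustrative sketch only: the
   names VmState, ALLOWED and is_valid_path are hypothetical, and the
   table covers only the transitions mentioned so far, not the full
   state machine of [I-D.ietf-opsawg-vmm-mib].

```python
from enum import Enum


class VmState(Enum):
    PREPARING = "preparing"
    SHUTDOWN = "shutdown"
    MIGRATING = "migrating"
    RUNNING = "running"


# Hypothetical table: only the transitions named in Sections 2.1 and 2.2.
# Other VM states and events are deliberately left out of this sketch.
ALLOWED = {
    (VmState.PREPARING, VmState.SHUTDOWN),   # creation
    (VmState.SHUTDOWN, VmState.RUNNING),     # creation
    (VmState.RUNNING, VmState.MIGRATING),    # live migration, source side
    (VmState.MIGRATING, VmState.SHUTDOWN),   # live migration, source side
    (VmState.SHUTDOWN, VmState.MIGRATING),   # live migration, destination
    (VmState.MIGRATING, VmState.RUNNING),    # live migration, destination
}


def is_valid_path(states):
    """Return True if every consecutive pair of states is an allowed step."""
    return all((a, b) in ALLOWED for a, b in zip(states, states[1:]))


CREATION = [VmState.PREPARING, VmState.SHUTDOWN, VmState.RUNNING]
MIGRATION_SOURCE = [VmState.RUNNING, VmState.MIGRATING, VmState.SHUTDOWN]
MIGRATION_DESTINATION = [VmState.SHUTDOWN, VmState.MIGRATING, VmState.RUNNING]
```

   A hypervisor-side implementation could use such a table to decide
   when signaling toward the external NVE is due, for example before a
   VM leaves the shutdown state.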

   The external NVE connected to destination hypervisor 2 has to
   associate the migrating VM with it by saving the VM's MAC and/or IP
   addresses, its VN, and its locally significant VID if any, and by
   provisioning other network related parameters of the VM. The NVE may
   be informed about the VM's peer VMs, storage devices and other
   network appliances with which the VM needs to communicate or is
   communicating. The VM on destination hypervisor 2 SHOULD NOT go to
   the running state before all the network provisioning and binding has
   been done.

   The VM on the source hypervisor and the VM on the destination
   hypervisor SHOULD NOT be in the running state at the same time during
   migration. The VM on the source hypervisor goes into the shutdown
   state only when the VM on the destination hypervisor has successfully
   entered the running state. It is possible that the VM on the source
   hypervisor stays in the migrating state for a while after the VM on
   the destination hypervisor is in the running state.

2.3 VM termination

   VM termination is also referred to as "powering off" a VM. VM
   termination moves the VM to the shutdown state. There are two
   possible causes of VM termination [I-D.ietf-opsawg-vmm-mib]: one is
   the normal "power off" of a running VM; the other is that the VM has
   been migrated elsewhere and the VM image on the source hypervisor has
   to stop executing and be shut down.

   On VM termination, the external NVE connected to that VM needs to
   deprovision the VM, i.e. delete the network parameters associated
   with that VM. In other words, the external NVE has to de-associate
   the VM.

2.4 VM Pause, suspension and resumption

   A VM pause event moves the VM from the running state to the paused
   state. The paused state indicates that the VM is resident in memory
   but no longer scheduled to execute by the hypervisor
   [I-D.ietf-opsawg-vmm-mib]. A VM can easily be re-activated from the
   paused state to the running state.

   VM suspension moves the VM from the running state to the suspended
   state, and VM resumption moves it from suspended back to running.
   The suspended state means that the memory and CPU execution state of
   the virtual machine are saved to persistent store. During this state,
   the virtual machine is not scheduled to execute by the hypervisor
   [I-D.ietf-opsawg-vmm-mib].

   In split TS-NVE architecture, the external NVE should keep any paused
   or suspended VM in association, as the VM can return to the running
   state at any time.

3. Hypervisor-to-NVE Signaling protocol functionality

   The following subsections show illustrative examples of the state
   transitions on the external NVE that are relevant to Hypervisor-to-NVE
   Signaling protocol functionality. It should be noted that they are
   not prescriptive text for full state machines.

3.1 VN connect and disconnect

   When an NVE is external, a protocol is needed between the End Device
   (e.g. Hypervisor) making use of the external NVE and the external NVE
   in order to make the NVE aware of the changing VN membership
   requirements of the Tenant Systems within the End Device.

   A key driver for using a protocol rather than static configuration of
   the external NVE is that the VN connectivity requirements can change
   frequently as VMs are brought up, moved and brought down on various
   hypervisors throughout the data center.

   +---------------+  Recv VN_connect;         +-------------------+
   |VN_Disconnected|  return Local_Tag value   |VN_Connected       |
   +---------------+  for VN if successful;    +-------------------+
   |VN_ID;         |-------------------------->|VN_ID;             |
   |VN_State=      |                           |VN_State=connected;|
   |disconnected;  |                           |Num_TSI_Associated;|
   |               |<----Recv VN_disconnect----|Local_Tag;         |
   +---------------+                           |VN_Context;        |
                                               +-------------------+

        Figure 4 State Transition Summary of a VAP Instance
                      on an External NVE

   Figure 4 shows the state transitions for a VAP on the external NVE.
   An NVE that supports the hypervisor to NVE signaling protocol should
   support one instance of the state machine for each active VN. The
   state transitions on the external NVE are normally triggered by
   events and behaviors on the hypervisor-facing side. Some of the
   interleaved interactions between the NVE and the NVA are illustrated
   to aid understanding of the whole procedure; others are not shown.
   More detailed information is available in
   [I-D.ietf-nvo3-nve-nva-cp-req].

   The NVE must be notified when an End Device requires connection to a
   particular VN and when it no longer requires connection. In addition,
   the external NVE must provide a local tag value for each connected VN
   to the End Device to use for exchange of packets between the End
   Device and the NVE (e.g. a locally significant 802.1Q tag value). How
   "local" the significance is depends on whether the Hypervisor has a
   direct physical connection to the NVE (in which case the significance
   is local to the physical link), or whether there is an Ethernet
   switch (e.g. a blade switch) connecting the Hypervisor to the NVE (in
   which case the significance is local to the intervening switch and
   all the links connected to it).
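
   The VAP behavior summarized in Figure 4 can be sketched as follows.
   This is a minimal sketch, not a protocol definition: the class name
   ExternalNve, its methods, and the tag range are hypothetical. It
   assumes the directly connected case, so tag significance is scoped to
   a single NVE port, and it assumes that a tag freed by VN_disconnect
   may be handed out again.

```python
class ExternalNve:
    """Hypothetical per-port VAP bookkeeping on an external NVE."""

    def __init__(self, tag_range=range(2, 4095)):
        # Assumed pool of locally significant tags (e.g. 802.1Q VIDs).
        self._tag_range = tag_range
        self._vaps = {}  # (port, vn_id) -> local tag

    def vn_connect(self, port, vn_id):
        """Handle VN_connect: return the local tag for vn_id on this port."""
        key = (port, vn_id)
        if key in self._vaps:
            return self._vaps[key]  # already connected, keep the same tag
        in_use = {tag for (p, _), tag in self._vaps.items() if p == port}
        for tag in self._tag_range:
            if tag not in in_use:
                self._vaps[key] = tag
                return tag
        raise RuntimeError("no free local tag on port %r" % (port,))

    def vn_disconnect(self, port, vn_id):
        """Handle VN_disconnect: release the VAP and its local tag."""
        self._vaps.pop((port, vn_id), None)

    def vn_state(self, port, vn_id):
        """Return the VN_State shown in Figure 4 for this port and VN."""
        return "connected" if (port, vn_id) in self._vaps else "disconnected"
```

   Because the scope is per port in this sketch, the same tag value can
   identify different VNs on different ports. With an intervening
   Ethernet switch, the scope would instead have to cover the switch and
   all links connected to it.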

   These VLAN tags are used to differentiate between different VNs as
   packets cross the shared access network to the external NVE. When the
   NVE receives packets, it uses the VLAN tag to identify the VN of
   packets coming from a given TSI, strips the tag, adds the appropriate
   overlay encapsulation for that VN, and sends them to the
   corresponding VAP.

   The identification of the VN in this protocol could be through either
   a VN Name or a VN ID. A globally unique VN Name facilitates
   portability of a Tenant's Virtual Data Center. Once an NVE receives a
   VN connect indication, the NVE needs a way to get a VN Context
   allocated (or receive the already allocated VN Context) for a given
   VN Name or ID (as well as any other information needed to transmit
   encapsulated packets). How this is done is the subject of the
   NVE-to-NVA protocol, which is part of work items 1 and 2 in
   [I-D.ietf-nvo3-overlay-problem-statement].

   A VN_connect message can be explicit or implicit. Explicit means that
   the hypervisor sends a message explicitly requesting connection to a
   VN. Implicit means that the external NVE receives another message,
   e.g. the very first TSI associate message for a given VN as described
   in the next subsection, that implicitly indicates the hypervisor's
   interest in connecting to a VN.

   A VN_disconnect message makes the NVE release all the resources for
   the disconnected VN and transit to the VN_Disconnected state. The
   local tag assigned to that VN may then be reclaimed for use by
   another VN.

3.2 TSI associate and activate

   Typically, a TSI is assigned a single MAC address and all frames
   transmitted and received on that TSI use that single MAC address. As
   mentioned earlier, it is also possible for a Tenant System to
   exchange frames using multiple MAC addresses or packets with multiple
   IP addresses.

   Particularly in the case of a TS that is forwarding frames or packets
   from other TSs, the NVE will need to communicate to the NVA, for each
   corresponding VN, the mapping between the NVE's IP address (on the
   underlying network) and ALL the addresses on whose behalf the TS is
   forwarding.

   The NVE has two ways in which it can discover the tenant addresses
   for which frames must be forwarded to a given End Device (and
   ultimately to the TS within that End Device).

   1. It can glean the addresses by inspecting the source addresses in
      packets it receives from the End Device.

   2. The hypervisor can explicitly signal the address associations of
      a TSI to the external NVE. The address association includes all
      the MAC and/or IP addresses possibly used as source addresses in a
      packet sent from the hypervisor to the external NVE. The external
      NVE may further use this information to filter future traffic from
      the hypervisor.

   To support the second approach above, the "hypervisor-to-NVE"
   protocol requires a means to allow End Devices to communicate new
   tenant address associations for a given TSI within a given VN.

   Figure 5 shows the state machine for a TSI connecting to a VAP on the
   external NVE. An NVE that supports the hypervisor to NVE signaling
   protocol should support one instance of the state machine for each
   TSI connecting to a given VN.

            disassociate;   +--------+
           +--------------->|  Init  |<--------clear-------+
           |or keepalive    +--------+                     |
           |timer timeout;  |        |                     |
           |                |        |                     |
           |                +--------+                     |
           |                 |      |                      |
           |      associate  |      |  activate            |
           |     +-----------+      +-----------+          |
           |     |                              |          |
           |     |                              |          |
           |    \|/                            \|/         |
     +--------------------+                  +---------------------+
     |     Associated     |                  |      Activated      |
     +--------------------+                  +---------------------+
     |TSI_ID;             |                  |TSI_ID;              |
     |Port;               |-----activate---->|Port;                |
     |VN_ID;              |                  |VN_ID;               |
     |State=associated;   |                  |State=activated;     |-+
   +-|Num_Of_Addr;        |<---deactivate;---|Num_Of_Addr;         | |
   | |List_Of_Addr;       |  or keepactive   |List_Of_Addr;        | |
   | |ResetKeepaliveTimer;|  timer timeout;  |ResetKeepactiveTimer;| |
   | +--------------------+                  +---------------------+ |
   |                    /|\                     /|\                  |
   |                     |                       |                   |
   +---------------------+                       +-------------------+
   add/remove/updt addr;                         add/remove/updt addr;
   or update port; or                            or update port; or
   Recv keepalive pkt                            Recv keepactive pkt
   from TSI;                                     or data msg from TSI;

        Figure 5 State Transition Summary of a TSI Instance
                      on an External NVE

   The Associated state of a TSI instance on an external NVE indicates
   that all the addresses for that TSI have already been associated with
   the VAP of the external NVE on port p for a given VN, but no real
   traffic to or from the TSI is expected or allowed to pass through.
   The NVE has reserved all the necessary resources for that TSI. The
   NVE may report the mappings between its underlay IP address and the
   associated TSI addresses to the NVA, and relevant network nodes may
   save such information in their mapping tables but not in their
   forwarding tables. The NVE may create ACL or filter rules based on
   the associated TSI addresses on the attached port p but does not
   enable them yet. The local tag for the VN corresponding to the TSI
   instance should be provisioned on port p to receive packets.
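
   The transitions drawn in Figure 5 can be written down as a lookup
   table. This is an illustrative sketch: the event names and the
   next_state helper are hypothetical labels for the arrows in the
   figure, not message names defined by this document.

```python
INIT, ASSOCIATED, ACTIVATED = "Init", "Associated", "Activated"

# (current state, event) -> next state, following the arrows of Figure 5.
# Event names are hypothetical labels chosen for this sketch.
TRANSITIONS = {
    (INIT, "associate"): ASSOCIATED,
    (INIT, "activate"): ACTIVATED,
    (ASSOCIATED, "activate"): ACTIVATED,
    (ASSOCIATED, "disassociate"): INIT,
    (ASSOCIATED, "keepalive_timeout"): INIT,
    (ACTIVATED, "deactivate"): ASSOCIATED,
    (ACTIVATED, "keepactive_timeout"): ASSOCIATED,
    (ACTIVATED, "clear"): INIT,
    # Self-loops: updates and liveness signals do not change the state.
    (ASSOCIATED, "update"): ASSOCIATED,
    (ASSOCIATED, "keepalive"): ASSOCIATED,
    (ACTIVATED, "update"): ACTIVATED,
    (ACTIVATED, "keepactive"): ACTIVATED,
    (ACTIVATED, "data"): ACTIVATED,
}


def next_state(state, event):
    """Return the next TSI state, or raise ValueError if not in Figure 5."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError("event %r not valid in state %r" % (event, state))
```

   An NVE would keep one such state value per TSI instance per VN and
   could answer an event that is not in the table with an error reply.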

   VM migration, discussed in Section 2, may cause the hypervisor to
   send an associate message to the NVE connected to the destination
   hypervisor to which the VM migrates. This is similar to a resource
   reservation request that makes sure the VM can be migrated
   successfully later. If such association fails, another destination
   hypervisor may be chosen for the migration, or an administrative
   alert may be raised. A VM creation event may lead to the same
   practice.

   The Activated state of a TSI instance on an external NVE indicates
   that all the addresses for that TSI are functioning correctly on port
   p and that traffic can be received from and sent to that TSI on the
   NVE. The mappings between the NVE's underlay IP address and the
   associated TSI addresses should be put into the forwarding table
   rather than the mapping table on relevant network nodes. ACL or
   filter rules based on the associated TSI addresses on the attached
   port p in the NVE are enabled. The local tag for the VN corresponding
   to the TSI instance MUST be provisioned on port p to receive packets.

   An activate message makes the state transit from Init or Associated
   to Activated. The VM creation, VM migration and VM resumption events
   discussed in Section 2 may trigger an activate message to be sent
   from the hypervisor to the external NVE.

   As mentioned in the last subsection, an associate or activate message
   from the very first TSI connecting to a VN on an NVE is also
   considered an implicit VN_connect signal to create a VAP for that VN.

   TSI information may be updated in either the Associated or the
   Activated state. Adding or removing associated addresses, updating
   current associated addresses (for example, updating the IP address
   for a given MAC address), and updating the NVE port on which messages
   are received are all considered TSI information updates. Such an
   update does not change the state of the TSI.
   When any address associated with a given TSI changes, the NVE should
   inform the NVA so that it updates the mapping information between the
   NVE's underlying address and the associated TSI addresses. The NVE
   should also change its local ACL or filter settings accordingly for
   the relevant addresses. A port information update causes the local
   tag for the VN corresponding to the TSI instance to be provisioned on
   the new port p and removed from the old port.

   The NVE keeps a timer for each TSI instance associated or activated
   on it. When the NVE receives a keepalive or keepactive message for a
   TSI instance, it should reset the timer. The keepactive timer may
   also be reset on receipt of a data packet from any associated address
   of the corresponding TSI instance. Expiry of the keepactive timer
   moves the state from Activated to Associated. Expiry of the keepalive
   timer moves the state from Associated to Init.

3.3 TSI disassociate, deactivate and clear

   Disassociate and deactivate are conceptually the reverse of associate
   and activate. On the transition from the Activated state to the
   Associated state, the NVE needs to make sure that the resources are
   still reserved but that the addresses associated with the TSI are not
   functioning, and that no traffic to or from the TSI is expected or
   allowed to pass through. For example, the NVE needs to inform the NVA
   to remove the relevant address mapping information from the
   forwarding or routing table. ACL or filtering rules regarding the
   relevant addresses should be disabled. On the transition from the
   Associated or Activated state to the Init state, the NVE releases all
   the resources relevant to the TSI instance. The NVE should also
   inform the NVA to remove the relevant entries from the mapping table.
   ACL or filtering rules regarding the relevant addresses should be
   removed. Local tag provisioning on the connecting port on the NVE
   should be cleared.
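
   The keepalive and keepactive timer behavior described in Section 3.2
   can be sketched as below. This is a sketch under stated assumptions:
   the class TsiInstance, its method names and the default timeout
   values are hypothetical, and the sketch assumes that a fresh
   keepalive timer is started when a keepactive timeout demotes the TSI
   to the Associated state. Time is passed in explicitly so that the
   behavior is deterministic.

```python
class TsiInstance:
    """Hypothetical TSI instance holding the timer kept by the NVE."""

    def __init__(self, state, now, keepalive_s=30.0, keepactive_s=10.0):
        if state not in ("Associated", "Activated"):
            raise ValueError("timers run only in Associated or Activated")
        self.state = state
        self.keepalive_s = keepalive_s    # assumed value, not from the text
        self.keepactive_s = keepactive_s  # assumed value, not from the text
        if state == "Activated":
            self.deadline = now + keepactive_s
        else:
            self.deadline = now + keepalive_s

    def on_keepalive(self, now):
        """A keepalive message resets the timer in the Associated state."""
        if self.state == "Associated":
            self.deadline = now + self.keepalive_s

    def on_keepactive_or_data(self, now):
        """A keepactive message or data packet resets the Activated timer."""
        if self.state == "Activated":
            self.deadline = now + self.keepactive_s

    def poll(self, now):
        """Apply at most one timeout step and return the resulting state."""
        if self.state != "Init" and now >= self.deadline:
            if self.state == "Activated":
                # Keepactive timeout: Activated -> Associated.
                self.state = "Associated"
                self.deadline = now + self.keepalive_s  # assumption
            else:
                # Keepalive timeout: Associated -> Init.
                self.state = "Init"
        return self.state
```

   On each transition reported by poll, an NVE would also carry out the
   actions listed above, such as disabling or removing ACL rules and
   informing the NVA.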

   VM suspension, discussed in Section 2, may cause the relevant TSI
   instance(s) on the NVE to transit from the Activated to the
   Associated state. VM pause normally does not affect the state of the
   relevant TSI instance(s) on the NVE, as the VM is expected to run
   again soon. VM shutdown causes the relevant TSI instance(s) on the
   NVE to transit from the Activated state to the Init state. All
   resources should be released.

   VM migration leads the TSI instance on the source NVE to leave the
   Activated state. Such a state transition on the source NVE should not
   occur earlier than the transition of the TSI instance on the
   destination NVE to the Activated state. Otherwise traffic
   interruption may occur. When a VM migrates to another hypervisor
   connected to the same NVE, i.e. the source and destination NVE are
   the same, the NVE should use the TSI_ID and the incoming port to
   differentiate the two TSI instances.

   Although the triggering messages for the state transitions shown in
   Figure 5 do not distinguish between VM creation/shutdown and VM
   migration arrival/departure, the NVE can make optimizations if it is
   notified of such information. For example, if the NVE knows that an
   incoming activate message is caused by migration rather than VM
   creation, some mechanisms may be employed or triggered to make sure
   that the dynamic configuration or provisioning on the destination NVE
   is the same as that on the source NVE for the migrated VM, for
   example multicast group memberships.

4. Hypervisor-to-NVE Signaling Protocol requirements

   Req-1: The protocol is able to run between the hypervisor and its
   associated external NVE, which may be directly connected or bridged
   in the split TS-NVE architecture.

   Req-2: The protocol MUST support the hypervisor initiating a request
   to its associated external NVE to be connected/disconnected to a
   given VN.

   Req-3: In response to the connection request to a given VN received
   on the NVE's port p as per Req-2, the protocol SHOULD support the
   NVE replying with a locally significant tag, for example an 802.1Q
   tag value, assigned to each of the VNs it is a member of. The NVE
   should keep a record of the triplet of VN ID, assigned local tag
   and port p.

   Req-4: The protocol MUST support the hypervisor initiating a
   request to associate/disassociate, activate/deactivate or clear
   address(es) of a TSI instance to a VN on an NVE port. All requests
   should be logically consistent with the text in sections 3.2 and
   3.3.

   Req-5: The protocol MUST support the hypervisor initiating a
   request to add, remove or update address(es) associated with a TSI
   instance on the external NVE. Addresses can be expressed in
   different formats, for example, MAC, IP or a pair of IP and MAC.

   Req-6: When any request of the protocol fails, a reason code MUST
   be provided in the reply.

   Req-7: The protocol MAY support the hypervisor explicitly informing
   the NVE when a migration starts. This may help the NVE to
   differentiate whether a newly associated/activated TSI results from
   VM creation or from VM migration.

   Req-8: The protocol SHOULD be extensible to carry more parameters
   to meet future requirements, for example, QoS settings.

   There are multiple candidate protocols, probably with some simple
   extensions, that can be used as the control plane protocol between
   the hypervisor and the external NVE. They include VDP
   [IEEE 802.1Qbg], LLDP, XMPP, and HTTP REST. Multiple factors
   influence the choice of protocol(s), for example, whether the
   connection between the hypervisor and the external NVE is L2 or L3.
   Appendix A illustrates VDP for the reader's information.

5. Security Considerations

   NVEs must ensure that only properly authorized Tenant Systems are
   allowed to join and become a part of any specific Virtual Network.
   In addition, NVEs will need appropriate mechanisms to ensure that
   any hypervisor wishing to use the services of an NVE is properly
   authorized to do so. One design point is whether the hypervisor
   should supply the NVE with the necessary information (e.g., VM
   addresses, VN information, or other parameters) that the NVE uses
   directly, or whether the hypervisor should only supply a VN ID and
   an identifier for the associated VM (e.g., its MAC address), with
   the NVE using that information to obtain the information needed to
   validate the hypervisor-provided parameters or to obtain related
   parameters in a secure manner.

6. IANA Considerations

   No IANA action is required. RFC Editor: please delete this section
   before publication.

7. Acknowledgements

   This document was initiated and merged from the drafts
   draft-kreeger-nvo3-hypervisor-nve-cp,
   draft-gu-nvo3-tes-nve-mechanism and draft-kompella-nvo3-server2nve.
   Thanks to all the co-authors and contributing members of those
   drafts.

8. References

8.1 Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2 Informative References

   [I-D.ietf-nvo3-overlay-problem-statement] Narten, T., Gray, E.,
             Black, D., Fang, L., Kreeger, L., and M. Napierala,
             "Problem Statement: Overlays for Network Virtualization",
             draft-ietf-nvo3-overlay-problem-statement-04 (work in
             progress), July 2013.

   [I-D.ietf-nvo3-framework] Lasserre, M., Balus, F., Morin, T.,
             Bitar, N., and Y. Rekhter, "Framework for DC Network
             Virtualization", draft-ietf-nvo3-framework-05 (work in
             progress), January 2014.

   [I-D.ietf-nvo3-nve-nva-cp-req] Kreeger, L., Dutt, D., Narten, T.,
             and D. Black, "Network Virtualization NVE to NVA Control
             Protocol Requirements", draft-ietf-nvo3-nve-nva-cp-req-01
             (work in progress), October 2013.

   [I-D.ietf-nvo3-arch] Black, D., Narten, T., et al, "An Architecture
             for Overlay Networks (NVO3)", draft-narten-nvo3-arch, work
             in progress.

   [I-D.ietf-opsawg-vmm-mib] Asai, H., MacFaden, M., Schoenwaelder, J.,
             Shima, K., and T. Tsou, "Management Information Base for
             Virtual Machines Controlled by a Hypervisor",
             draft-ietf-opsawg-vmm-mib-00 (work in progress), February
             2014.

   [IEEE 802.1Qbg] IEEE, "Media Access Control (MAC) Bridges and
             Virtual Bridged Local Area Networks - Amendment 21: Edge
             Virtual Bridging", IEEE Std 802.1Qbg, 2012.

   [8021Q]   IEEE, "Media Access Control (MAC) Bridges and Virtual
             Bridged Local Area Networks", IEEE Std 802.1Q-2011, August
             2011.

Appendix A. IEEE 802.1Qbg VDP Illustration (For information only)

VDP has the format shown in Figure A.1. A Virtual Station Interface
(VSI) is an interface to a virtual station that is attached to a
downlink port of an internal bridging function in a server. A VSI's VDP
packets are handled by an external bridge. VDP is the control protocol
running between the hypervisor and the external bridge.

+--------+--------+------+----+----+------+------+------+-----------+
|TLV type|TLV info|Status|VSI |VSI |VSIID | VSIID|Filter|Filter Info|
|   7b   |str len |      |Type|Type|Format|      | Info |           |
|        |   9b   | 1oct |ID  |Ver |      |      |format|           |
|        |        |      |3oct|1oct| 1oct |16oct |1oct  |  M oct    |
+--------+--------+------+----+----+------+------+------+-----------+
|        |        |      |                       |                  |
|        |        |      |<--VSI type&instance-->|<----Filter------>|
|        |        |      |<------------VSI attributes-------------->|
|<--TLV header--->|<-------TLV info string = 23 + M octets--------->|

                  Figure A.1: VDP TLV definitions

There are basically four TLV types.

1. Pre-Associate: Pre-Associate is used to pre-associate a VSI instance
with a bridge port. The bridge validates the request and returns a
failure Status in case of errors.
Successful pre-association does not imply that the indicated VSI Type
or provisioning will be applied to any traffic flowing through the VSI.
Pre-Associate enables a faster response to an Associate by allowing the
bridge to obtain the VSI Type prior to an association.

2. Pre-Associate with Resource Reservation: Pre-Associate with Resource
Reservation involves the same steps as Pre-Associate, but on successful
pre-association it also reserves resources in the bridge to prepare for
a subsequent Associate request.

3. Associate: Associate creates and activates an association between a
VSI instance and a bridge port. The bridge allocates any required
bridge resources for the referenced VSI and activates the configuration
for the VSI Type ID. This association is then applied to the traffic
flowing to and from the VSI instance.

4. De-Associate: De-Associate is used to remove an association between
a VSI instance and a bridge port. Both Pre-Associated and Associated
VSIs can be de-associated. De-Associate releases any resources that
were reserved as a result of prior Associate or Pre-Associate
operations for that VSI instance.

De-Associate can be initiated by either side; the other types of
messages can only be initiated by the server side.

Some important flag values in the VDP Status field:

1. M-bit (Bit 5): Indicates that the user of the VSI (e.g., the VM) is
migrating (M-bit = 1) or provides no guidance on the migration of the
user of the VSI (M-bit = 0). The M-bit is used as an indicator relative
to the VSI that the user is migrating to.

2. S-bit (Bit 6): Indicates that the VSI user (e.g., the VM) is
suspended (S-bit = 1) or provides no guidance as to whether the user of
the VSI is suspended (S-bit = 0). A keep-alive Associate request with
S-bit = 1 can be sent when the VSI user is suspended.
The S-bit is used as an indicator relative to the VSI that the user is
migrating from.

The filter information format currently supports the following four
types.

1. VID Filter Info format

+---------+------+-------+--------+
| #of     |  PS  | PCP   | VID    |
|entries  |(1bit)|(3bits)|(12bits)|
|(2octets)|      |       |        |
+---------+------+-------+--------+
          |<--Repeated per entry->|

          Figure A.2 VID Filter Info format

2. MAC/VID filter format

+---------+--------------+------+-------+--------+
| #of     | MAC address  |  PS  | PCP   | VID    |
|entries  | (6 octets)   |(1bit)|(3bits)|(12bits)|
|(2octets)|              |      |       |        |
+---------+--------------+------+-------+--------+
          |<--------Repeated per entry---------->|

          Figure A.3 MAC/VID filter format

3. GroupID/VID filter format

+---------+--------------+------+-------+--------+
| #of     | GroupID      |  PS  | PCP   | VID    |
|entries  | (4 octets)   |(1bit)|(3bits)|(12bits)|
|(2octets)|              |      |       |        |
+---------+--------------+------+-------+--------+
          |<--------Repeated per entry---------->|

          Figure A.4 GroupID/VID filter format

4. GroupID/MAC/VID filter format

+---------+----------+-------------+------+-----+--------+
| #of     | GroupID  | MAC address |  PS  | PCP | VID    |
|entries  |(4 octets)| (6 octets)  |(1bit)|(3b )|(12bits)|
|(2octets)|          |             |      |     |        |
+---------+----------+-------------+------+-----+--------+
          |<-------------Repeated per entry------------->|

          Figure A.5 GroupID/MAC/VID filter format

The null VID can be used in the VDP Request sent from the hypervisor to
the external bridge. Use of the null VID indicates that the set of VID
values associated with the VSI is expected to be supplied by the
bridge. The bridge can obtain VID values from the VSI Type whose
identity is specified by the VSI Type information in the VDP Request.
The set of VID values is returned to the station via the VDP Response.
The returned VID value can be a locally significant value. When GroupID
is used, it is equivalent to the VN ID in NVO3. The GroupID is provided
by the hypervisor to the bridge, and the bridge maps the GroupID to a
locally significant VLAN ID.

The VSIID in a VDP request, which identifies a VM, can be in one of the
following formats: IPv4 address, IPv6 address, MAC address, UUID or
locally defined.

Figure A.6 compares VDP against the requirements. Note that the
comparison is conceptual; detailed parameter checking is not performed.

+------+-----------+----------------------------------------------+
| Req  | VDP       | Remarks                                      |
|      | supported?|                                              |
+------+-----------+----------------------------------------------+
| Req-1| partial   | Supports directly connected but not bridged  |
+------+-----------+----------------------------------------------+
| Req-2| Yes       | VN is represented by GroupID                 |
+------+-----------+----------------------------------------------+
| Req-3| Yes       | VID=NULL in request and bridge returns the   |
|      |           | assigned value in response                   |
+------+-----------+------------------------+---------------------+
|      |           | requirements           | VDP equivalence     |
|      |           +------------------------+---------------------+
| Req-4| partial   | associate/disassociate | pre-asso/de-asso    |
|      |           | activate/deactivate    | associate/nil       |
|      |           | clear                  | de-associate        |
+------+-----------+------------------------+---------------------+
| Req-5| partial   | VDP can handle MAC addresses properly. For IP|
|      |           | addresses, it is not clearly specified.      |
+------+-----------+----------------------------------------------+
| Req-6| Yes       | Error type indicated in Status in response   |
+------+-----------+----------------------------------------------+
| Req-7| Yes       | M-bit indicated in Status in request         |
+------+-----------+----------------------------------------------+
|      |           | For certain information, e.g. new filter info|
| Req-8| partial   | format, VDP can easily be extended. For some,|
|      |           | extensibility may be limited.                |
+------+-----------+----------------------------------------------+

          Figure A.6 Compare VDP with the requirements

Authors' Addresses

   Yizhou Li
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China

   Phone: +86-25-56625409
   EMail: liyizhou@huawei.com

   Lucy Yong
   Huawei Technologies, USA

   EMail: lucy.yong@huawei.com

   Lawrence Kreeger
   Cisco

   EMail: kreeger@cisco.com

   Thomas Narten
   IBM

   EMail: narten@us.ibm.com

   David Black
   EMC

   EMail: david.black@emc.com