NVO3 Working Group                                             Yizhou Li
INTERNET-DRAFT                                                 Lucy Yong
Intended Status: Informational                       Huawei Technologies
                                                        Lawrence Kreeger
                                                                   Cisco
                                                           Thomas Narten
                                                                     IBM
                                                             David Black
                                                                     EMC
Expires: August 13, 2015                                February 9, 2015

              Hypervisor to NVE Control Plane Requirements
                   draft-ietf-nvo3-hpvr2nve-cp-req-02

Abstract

   In a Split-NVE architecture, the functions of the NVE are split
   across the hypervisor/container on a server and a piece of external
   network equipment, which is called an external NVE.
   A control plane protocol (or protocols) between a hypervisor and its
   associated external NVE(s) is used by the hypervisor to distribute
   its virtual machine networking state to the external NVE(s) for
   further handling. This document illustrates the functionality
   required by this type of control plane signaling protocol and
   outlines the high level requirements. Virtual machine states as well
   as state transitions are summarized to help clarify the needed
   protocol requirements.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1 Terminology
      1.2 Target Scenarios
   2. VM Lifecycle
      2.1 VM Creation Event
      2.2 VM Live Migration Event
      2.3 VM Termination Event
      2.4 VM Pause, Suspension and Resumption Events
   3. Hypervisor-to-NVE Control Plane Protocol Functionality
      3.1 VN connect and Disconnect
      3.2 TSI Associate and Activate
      3.3 TSI Disassociate and Deactivate
   4. Hypervisor-to-NVE Control Plane Protocol Requirements
   5. VDP Applicability and Enhancement Needs
   6. Security Considerations
   7. IANA Considerations
   8. Acknowledgements
   8. References
      8.1 Normative References
      8.2 Informative References
   Appendix A. IEEE 802.1Qbg VDP Illustration (For information only)
   Authors' Addresses

1. Introduction

   In the Split-NVE architecture shown in Figure 1, the functionality of
   the NVE is split across an end device supporting virtualization and
   an external network device, which is called an external NVE. The
   portion of the NVE functionality located on the hypervisor/container
   is called the tNVE, and the portion located on the external NVE is
   called the nNVE in this document. Overlay encapsulation/decapsulation
   functions are normally off-loaded to the nNVE on the external NVE.
   The tNVE is normally implemented as part of the hypervisor or
   container in a virtualized end device.

   The problem statement [RFC7364] discusses the need for a control
   plane protocol (or protocols) to populate each NVE with the state
   needed to perform the required functions. In one scenario, an NVE
   provides overlay encapsulation/decapsulation packet forwarding
   services to Tenant Systems (TSs) that are co-resident with the NVE
   on the same End Device (e.g., when the NVE is embedded within a
   hypervisor or a Network Service Appliance). In such cases, there is
   no need for a standardized protocol between the hypervisor and NVE,
   as the interaction is implemented via software on a single device.
   In contrast, in the Split-NVE architecture scenarios shown in Figures
   2 to 4, a control plane protocol (or protocols) between a hypervisor
   and its associated external NVE(s) is required for the hypervisor to
   distribute the virtual machines' networking states to the NVE(s) for
   further handling. This protocol is an NVE-internal protocol that
   runs between the tNVE and nNVE logical entities. It is mentioned in
   the NVO3 problem statement [RFC7364] as the third work item.

   Virtual machine states and state transitions are summarized in this
   document to show events where the NVE needs to take specific actions.
   Such events might correspond to actions that the control plane
   signaling protocols between the hypervisor and external NVE will
   need to take. The high level requirements to be fulfilled are then
   outlined.

                  +-- -- -- Split-NVE -- -- --+
                  |                           |
   +--------------|--------+                  |
   | +------------|------+ |        +---------|-----------+
   | |           \|/     | |        |        \|/          |
   | | +--+   +-------+  | |        |   +-----------+     |
   | | |VM|---|       |  | |        |   |           |     |
   | | +--+   | tNVE  |--|-|- - - - |---| nNVE      |     |
   | | +--+   |       |  | |        |   |           |     |
   | | |VM|---|       |  | |        |   |           |     |
   | | +--+   +-------+  | |        |   +-----------+     |
   | +--Hpvr/Container---+ |        +---------------------+
   +-----------------------+

          End Device                     External NVE

                 Figure 1 Split-NVE structure

   This document uses the term "hypervisor" throughout when describing
   the Split-NVE scenario, where part of the NVE functionality is
   off-loaded to a device separate from the "hypervisor" that contains
   a VM connected to a VN. In this context, the term "hypervisor" is
   meant to cover any device type where part of the NVE functionality
   is off-loaded in this fashion, e.g., a Network Service Appliance or a
   Linux Container.

   This document often uses the terms "VM" and "Tenant System" (TS)
   interchangeably, even though a VM is just one type of Tenant System
   that may connect to a VN. For example, a service instance within a
   Network Service Appliance may be another type of TS, as may a system
   running on an OS-level virtualization technology such as Linux
   Containers. When this document uses the term VM, it will in most
   cases apply to other types of TSs as well.

   Section 2 describes VM states and state transitions in the VM
   lifecycle. Section 3 introduces the Hypervisor-to-NVE control plane
   protocol functionality derived from VM operations and network events.
   Section 4 outlines the requirements of the control plane protocol to
   achieve the required functionality.

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   This document uses the same terminology as found in [RFC7365] and
   [I-D.ietf-nvo3-nve-nva-cp-req]. This section defines additional
   terminology used by this document.

   Split-NVE: a type of NVE whose functionality is split across an end
   device supporting virtualization and an external network device.

   tNVE: the portion of the Split-NVE functionality located on the end
   device supporting virtualization.

   nNVE: the portion of the Split-NVE functionality located on the
   network device that directly or indirectly connects to the end device
   holding the corresponding tNVE.

   External NVE: the physical network device holding the nNVE.

   Hypervisor/Container: the logical collection of software, firmware
   and/or hardware that allows the creation and running of server or
   service appliance virtualization. The tNVE is located on the
   Hypervisor/Container. The term is loosely used in this document to
   refer to the end device supporting the virtualization. For
   simplicity, this document also uses "Hypervisor" to represent both
   hypervisor and container.

   VN Profile: metadata associated with a VN that is applied to any
   attachment point to the VN. That is, VAP properties that are applied
   to all VAPs associated with a given VN and used by an NVE when
   ingressing/egressing packets to/from a specific VN. Metadata could
   include such information as ACLs, QoS settings, etc. The VN Profile
   contains parameters that apply to the VN as a whole. Control
   protocols between the NVE and NVA could use the VN ID or VN Name to
   obtain the VN Profile.

   VSI: Virtual Station Interface [IEEE 802.1Qbg].

   VDP: VSI Discovery and Configuration Protocol [IEEE 802.1Qbg].

1.2 Target Scenarios

   In the Split-NVE architecture, an external NVE can offload the
   encapsulation/decapsulation function, network policy enforcement,
   and the VN Overlay protocol overhead from the End Device. This
   offloading may provide performance improvements and/or resource
   savings to the End Device (e.g., hypervisor) making use of the
   external NVE.

   The following figures give example scenarios of a Split-NVE
   architecture.

        Hypervisor             Access Switch
   +------------------+       +-----+-------+
   | +--+   +-------+ |       |     |       |
   | |VM|---|       | | VLAN  |     |       |
   | +--+   | tNVE  |---------+ nNVE|       +--- Underlying
   | +--+   |       | | Trunk |     |       |    Network
   | |VM|---|       | |       |     |       |
   | +--+   +-------+ |       |     |       |
   +------------------+       +-----+-------+
           Figure 2 Hypervisor with an External NVE

      Hypervisor        L2 Switch
   +---------------+     +-----+     +----+---+
   | +--+   +----+ |     |     |     |    |   |
   | |VM|---|    | |VLAN |     |VLAN |    |   |
   | +--+   |tNVE|-------+     +-----+nNVE|   +--- Underlying
   | +--+   |    | |Trunk|     |Trunk|    |   |    Network
   | |VM|---|    | |     |     |     |    |   |
   | +--+   +----+ |     |     |     |    |   |
   +---------------+     +-----+     +----+---+
        Figure 3 Hypervisor with an External NVE
                 across an Ethernet Access Switch

    Network Service Appliance         Access Switch
   +--------------------------+      +-----+-------+
   | +------------+    | \    |      |     |       |
   | |Net Service |----|  \   |      |     |       |
   | |Instance    |    |   \  | VLAN |     |       |
   | +------------+    |tNVE| |------+nNVE |       +--- Underlying
   | +------------+    |    | | Trunk|     |       |    Network
   | |Net Service |----|   /  |      |     |       |
   | |Instance    |    |  /   |      |     |       |
   | +------------+    | /    |      |     |       |
   +--------------------------+      +-----+-------+
      Figure 4 Physical Network Service Appliance with an External NVE

   Tenant Systems connect to external NVEs via a Tenant System Interface
   (TSI).
   The TSI logically connects to the external NVE via a Virtual
   Access Point (VAP) [I-D.ietf-nvo3-arch]. The external NVE may provide
   Layer 2 or Layer 3 forwarding. In the Split-NVE architecture, the
   external NVE may be able to reach multiple MAC and IP addresses via a
   TSI. For example, Tenant Systems that provide network services (such
   as a transparent firewall, load balancer, or VPN gateway) are likely
   to have a complex address hierarchy. This implies that if a given TSI
   disassociates from one VN, all of its MAC and/or IP addresses are
   also disassociated; there is no need to signal the deletion of every
   MAC or IP address when the TSI is brought down or deleted. In the
   majority of cases, a VM will be acting as a simple host that has a
   single TSI and a single MAC and IP address visible to the external
   NVE.

2. VM Lifecycle

   Figure 2 of [I-D.ietf-opsawg-vmm-mib] shows the state transitions of
   a VM. Some of the VM states are of interest to the external NVE.
   This section illustrates the relevant phases and events in the VM
   lifecycle. Note that the following subsections do not give an
   exhaustive traversal of VM lifecycle states. They are intended as
   illustrative examples relevant to the Split-NVE architecture, not as
   prescriptive text; the goal is to capture sufficient detail to set a
   context for the signaling protocol functionality and requirements
   described in the following sections.

2.1 VM Creation Event

   The VM creation event causes the VM state to transition from
   Preparing to Shutdown and then to Running [I-D.ietf-opsawg-vmm-mib].
   The end device allocates and initializes local virtual resources,
   such as storage, in the VM Preparing state. In the Shutdown state,
   the VM has everything ready except that CPU execution is not
   scheduled by the hypervisor and the VM's memory is not resident in
   the hypervisor.
   The transition from the Shutdown state to the Running state normally
   requires human action or a system-triggered event. The Running state
   indicates that the VM is in the normal execution state, with no
   migration, suspension or shutdown in progress. As part of
   transitioning the VM to the Running state, the hypervisor must also
   provision network connectivity for the VM's TSI(s) so that Ethernet
   frames can be sent and received correctly.

   In the VM creation phase, the VM's TSI has to be associated with the
   external NVE. Association here means that the hypervisor and the
   external NVE have signaled each other and reached agreement, and
   that the relevant networking parameters or information have been
   provisioned properly. The external NVE should be informed of the
   VM's TSI MAC address and/or IP address. In addition to external
   network connectivity, the hypervisor may provide local network
   connectivity between the VM's TSI and the TSIs of other VMs that are
   co-resident on the same hypervisor. When the intra- or
   inter-hypervisor connectivity is extended to the external NVE, a
   locally significant tag, e.g., a VLAN ID, should be used between the
   hypervisor and the external NVE to differentiate each VN's traffic.
   Both the hypervisor and the external NVE must agree on that tag value
   for traffic identification, isolation and forwarding.

   The external NVE may need to do some preparation work before it
   signals successful association with the TSI. Such preparation work
   may include locally saving the states and binding information of the
   TSI and its VN, communicating with the NVA for network provisioning,
   etc.

   Tenant System Interface association should be performed before the
   VM enters the Running state, preferably while the VM is in the
   Shutdown state. If association with the external NVE fails, the VM
   should not go into the Running state.
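
   The creation-time ordering described above can be sketched as
   follows. This is a minimal, non-normative Python illustration that
   rests on several assumptions. The class Vm and the callback
   associate_tsi are hypothetical names introduced here. The callback
   stands in for whatever hypervisor-to-NVE signaling is actually used.
   Only the Preparing, Shutdown and Running states are modeled. The
   sketch shows one rule: the VM moves from Shutdown to Running only
   after the external NVE has reported successful TSI association.

```python
from enum import Enum
from typing import Callable


class VmState(Enum):
    # Only the states of [I-D.ietf-opsawg-vmm-mib] used in Section 2.1.
    PREPARING = "Preparing"
    SHUTDOWN = "Shutdown"
    RUNNING = "Running"


class Vm:
    """Hypothetical hypervisor-side view of a VM being created."""

    def __init__(self, name: str,
                 associate_tsi: Callable[[str], bool]) -> None:
        self.name = name
        self.state = VmState.PREPARING
        self.tsi_associated = False
        # Stand-in for hypervisor-to-NVE signaling: returns True once the
        # external NVE reports that the VM's TSI is successfully associated.
        self._associate_tsi = associate_tsi

    def finish_preparing(self) -> None:
        """Preparing -> Shutdown, then associate the TSI while in Shutdown."""
        if self.state is not VmState.PREPARING:
            raise RuntimeError("VM is not in the Preparing state")
        self.state = VmState.SHUTDOWN
        self.tsi_associated = self._associate_tsi(self.name)

    def start(self) -> bool:
        """Shutdown -> Running, allowed only if TSI association succeeded."""
        if self.state is not VmState.SHUTDOWN:
            raise RuntimeError("VM must be in the Shutdown state to start")
        if not self.tsi_associated:
            return False  # stay in Shutdown; do not go into Running
        self.state = VmState.RUNNING
        return True


if __name__ == "__main__":
    good = Vm("vm1", associate_tsi=lambda name: True)
    good.finish_preparing()
    print(good.start(), good.state.value)  # True Running

    bad = Vm("vm2", associate_tsi=lambda name: False)
    bad.finish_preparing()
    print(bad.start(), bad.state.value)  # False Shutdown
```

   A failing associate_tsi callback leaves the VM in the Shutdown
   state. This mirrors the rule that the VM should not go into the
   Running state when association with the external NVE fails. Retries,
   timeouts and the actual message exchange are deliberately left out.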

2.2 VM Live Migration Event

   Live migration is sometimes referred to as "hot" migration, in that,
   from an external viewpoint, the VM appears to continue to run while
   being migrated to another server (e.g., TCP connections generally
   survive this class of migration). In contrast, "cold" migration
   consists of shutting down VM execution on one server and restarting
   it on another. For simplicity, the following abstract summary of
   live migration assumes shared storage, so that the VM's storage is
   accessible to both the source and destination servers. Assume that a
   VM live migrates from hypervisor 1 to hypervisor 2. Such a migration
   event involves state transitions on both hypervisors, source
   hypervisor 1 and destination hypervisor 2. The VM state on source
   hypervisor 1 transitions from Running to Migrating and then to
   Shutdown [I-D.ietf-opsawg-vmm-mib]. The VM state on destination
   hypervisor 2 transitions from Shutdown to Migrating and then to
   Running.

   The external NVE connected to destination hypervisor 2 has to
   associate the migrating VM's TSI with it. It does so by discovering
   the TSI's MAC and/or IP addresses, its VN, and its locally
   significant VID, if any, and by provisioning other network-related
   parameters of the TSI. The external NVE may be informed about the
   VM's peer VMs, storage devices and other network appliances with
   which the VM needs to communicate or is communicating. The migrated
   VM on destination hypervisor 2 SHOULD NOT go to the Running state
   before all the network provisioning and binding has been done.

   The migrating VM SHOULD NOT be in the Running state at the same time
   on the source hypervisor and the destination hypervisor during
   migration. The VM on the source hypervisor does not transition into
   the Shutdown state until the VM successfully enters the Running state
   on the destination hypervisor.
   It is possible that the VM on the source hypervisor stays in the
   Migrating state for a while after the VM on the destination
   hypervisor is in the Running state.

2.3 VM Termination Event

   A VM termination event is also referred to as "powering off" a VM. A
   VM termination event causes the VM's state to go to Shutdown. There
   are two possible causes of VM termination [I-D.ietf-opsawg-vmm-mib]:
   one is the normal "power off" of a running VM; the other is that the
   VM has been migrated to another hypervisor and the VM image on the
   source hypervisor has to stop executing and be shut down.

   In VM termination, the external NVE connecting to that VM needs to
   deprovision the VM, i.e., delete the network parameters associated
   with that VM. In other words, the external NVE has to disassociate
   the VM's TSI.

2.4 VM Pause, Suspension and Resumption Events

   The VM pause event causes the VM to transition from the Running state
   to the Paused state. The Paused state indicates that the VM is
   resident in memory but no longer scheduled to execute by the
   hypervisor [I-D.ietf-opsawg-vmm-mib]. The VM can easily be
   re-activated from the Paused state to the Running state.

   The VM suspension event causes the VM to transition from the Running
   state to the Suspended state. The VM resumption event causes the VM
   to transition from the Suspended state to the Running state. The
   Suspended state means that the memory and CPU execution state of the
   virtual machine are saved to persistent storage. During this state,
   the virtual machine is not scheduled to execute by the hypervisor
   [I-D.ietf-opsawg-vmm-mib].

   In the Split-NVE architecture, the external NVE should keep any
   paused or suspended VM in association, as the VM can return to the
   Running state at any time.

3. Hypervisor-to-NVE Control Plane Protocol Functionality

   The following subsections show illustrative examples of the state
   transitions on an external NVE that are relevant to Hypervisor-to-NVE
   signaling protocol functionality. They are not intended as
   prescriptive text for full state machines.

3.1 VN connect and Disconnect

   In the Split-NVE scenario, a protocol is needed between the End
   Device (e.g., hypervisor) making use of the external NVE and the
   external NVE itself. This protocol makes the external NVE aware of
   the changing VN membership requirements of the Tenant Systems within
   the End Device.

   A key driver for using a protocol rather than static configuration
   of the external NVE is that the VN connectivity requirements can
   change frequently as VMs are brought up, moved and brought down on
   various hypervisors throughout the data center or external cloud.

   +---------------+  Recv VN_connect;         +-------------------+
   |VN_Disconnected|  return Local_Tag value   |VN_Connected       |
   +---------------+  for VN if successful;    +-------------------+
   |VN_ID;         |-------------------------->|VN_ID;             |
   |VN_State=      |                           |VN_State=connected;|
   |disconnected;  |                           |Num_TSI_Associated;|
   |               |<----Recv VN_disconnect----|Local_Tag;         |
   +---------------+                           |VN_Context;        |
                                               +-------------------+

          Figure 5 State Transition Example of a VAP Instance
                          on an External NVE

   Figure 5 shows the state transitions for a VAP on the external NVE.
   An NVE that supports the hypervisor-to-NVE control plane protocol
   should support one instance of the state machine for each active VN.
   State transitions on the external NVE are normally triggered by
   events and behaviors on the hypervisor-facing side. Some of the
   interleaved interactions between the NVE and the NVA are illustrated
   for a better understanding of the whole procedure, while others are
   not shown.
   More detailed information on these interactions is available in
   [I-D.ietf-nvo3-nve-nva-cp-req].

   The external NVE must be notified when an End Device requires
   connection to a particular VN and when it no longer requires
   connection. In addition, the external NVE must provide the End Device
   with a local tag value for each connected VN. The End Device uses
   this tag for the exchange of packets with the external NVE (e.g., a
   locally significant 802.1Q tag value). How "local" the significance
   is depends on how the Hypervisor reaches the NVE. If the Hypervisor
   has a direct physical connection to the external NVE, the
   significance is local to the physical link. If an Ethernet switch
   (e.g., a blade switch) connects the Hypervisor to the NVE, the
   significance is local to the intervening switch and all the links
   connected to it.

   These VLAN tags are used to differentiate between VNs as packets
   cross the shared access network to the external NVE. When the
   external NVE receives packets, it uses the VLAN tag to identify the
   VN of packets coming from a given TSI, strips the tag, adds the
   appropriate overlay encapsulation for that VN, and sends the packets
   toward the corresponding remote NVE across the underlying IP network.

   The VN in this protocol could be identified either through a VN Name
   or a VN ID. A globally unique VN Name facilitates portability of a
   Tenant's Virtual Data Center. Once an external NVE receives a VN
   connect indication, the NVE needs a way to get a VN Context allocated
   (or receive the already allocated VN Context) for a given VN Name or
   ID (as well as any other information needed to transmit encapsulated
   packets). How this is done is the subject of the NVE-to-NVA protocol,
   which is part of work items 1 and 2 in [RFC7364].

   A VN_connect message can be explicit or implicit.
   Explicit means that the hypervisor sends a message explicitly
   requesting connection to a VN. Implicit means that the external NVE
   receives other messages, e.g., the very first TSI associate message
   for a given VN (see the next subsection), that implicitly indicate
   interest in connecting to that VN.

   A VN_disconnect message indicates that the NVE can release all the
   resources for the disconnected VN and transition to the
   VN_Disconnected state. The local tag assigned to that VN may then be
   reclaimed for use by another VN.

3.2 TSI Associate and Activate

   Typically, a TSI is assigned a single MAC address and all frames
   transmitted and received on that TSI use that single MAC address. As
   mentioned earlier, it is also possible for a Tenant System to
   exchange frames using multiple MAC addresses or packets with multiple
   IP addresses.

   Particularly in the case of a TS that is forwarding frames or packets
   from other TSs, the external NVE will need to communicate to the NVA
   the mapping between the NVE's IP address (on the underlying network)
   and ALL the addresses the TS is forwarding on behalf of for the
   corresponding VN.

   The NVE has two ways in which it can discover the tenant addresses
   for which frames must be forwarded to a given End Device (and
   ultimately to the TS within that End Device).

   1. It can glean the addresses by inspecting the source addresses in
      packets it receives from the End Device.

   2. The hypervisor can explicitly signal the address associations of
      a TSI to the external NVE. The address association includes all
      the MAC and/or IP addresses that may be used as source addresses
      in packets sent from the hypervisor to the external NVE. The
      external NVE may further use this information to filter future
      traffic from the hypervisor.
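
   The following minimal Python sketch contrasts the two discovery
   approaches. It is illustrative only and assumes a hypothetical
   ExternalNve class keyed by a (VN ID, TSI ID) pair. It models neither
   the signaling messages themselves nor the reporting of mappings to
   the NVA. In gleaning mode the NVE learns every source address it
   sees. In explicit mode it accepts only addresses the hypervisor has
   signaled for the TSI, which is what makes filtering of future
   traffic possible.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# Hypothetical key for a TSI on the external NVE: (VN ID, TSI ID).
TsiKey = Tuple[int, str]


@dataclass
class ExternalNve:
    """Sketch of TSI address discovery on an external NVE.

    explicit_mode=False: approach 1, glean source addresses from traffic.
    explicit_mode=True:  approach 2, rely on hypervisor-signaled
                         associations and filter anything else.
    """

    explicit_mode: bool
    addresses: Dict[TsiKey, Set[str]] = field(default_factory=dict)

    def signal_association(self, key: TsiKey, addrs: Set[str]) -> None:
        # Approach 2: the hypervisor lists every MAC/IP address the TSI
        # may use as a source address; this replaces any earlier list.
        self.addresses[key] = set(addrs)

    def receive(self, key: TsiKey, src_addr: str) -> bool:
        """Return True if a packet with src_addr from this TSI is accepted."""
        if not self.explicit_mode:
            # Approach 1: learn the source address and forward the packet.
            self.addresses.setdefault(key, set()).add(src_addr)
            return True
        # Approach 2: drop traffic whose source address was never signaled.
        return src_addr in self.addresses.get(key, set())


if __name__ == "__main__":
    tsi = (100, "tsi-1")
    gleaning = ExternalNve(explicit_mode=False)
    gleaning.receive(tsi, "00:00:5e:00:53:01")
    print(gleaning.addresses[tsi])  # {'00:00:5e:00:53:01'}

    explicit = ExternalNve(explicit_mode=True)
    explicit.signal_association(tsi, {"00:00:5e:00:53:01", "192.0.2.10"})
    print(explicit.receive(tsi, "192.0.2.10"))  # True
    print(explicit.receive(tsi, "192.0.2.99"))  # False
```

   The example addresses are taken from the documentation ranges
   reserved in RFC 7042 (MAC) and RFC 5737 (IPv4).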

   To perform the second approach above, the "hypervisor-to-NVE"
   protocol requires a means to allow End Devices to communicate new
   tenant address associations for a given TSI within a given VN.

   Figure 6 shows an example of the state transitions for a TSI
   connecting to a VAP on the external NVE. An NVE that supports the
   hypervisor-to-NVE control plane protocol may support one instance of
   the state machine for each TSI connecting to a given VN.

           disassociate;    +--------+     disassociate
       +------------------->|  Init  |<---------------------+
       |                    +--------+                      |
       |                      |    |                        |
       |            associate |    | activate               |
       |     +----------------+    +------------------+     |
       |     |                                        |     |
       |    \|/                                      \|/    |
   +---+----------------+                  +----------------+---+
   |     Associated     |                  |     Activated      |
   +--------------------+                  +--------------------+
   |TSI_ID;             |                  |TSI_ID;             |
   |Port;               |-----activate---->|Port;               |
   |VN_ID;              |                  |VN_ID;              |
   |State=associated;   |                  |State=activated;    |
   |Num_Of_Addr;        |<---deactivate;---|Num_Of_Addr;        |
   |List_Of_Addr;       |                  |List_Of_Addr;       |
   +--------------------+                  +--------------------+
       |          /|\                          |          /|\
       |           |                           |           |
       +-----------+                           +-----------+
   add/remove/updt addr;                   add/remove/updt addr;
   or update port;                         or update port;

          Figure 6 State Transition Example of a TSI Instance
                          on an External NVE

   The Associated state of a TSI instance on an external NVE indicates
   that all the addresses for that TSI have already been associated with
   the VAP of the external NVE on port p for a given VN. However, no
   real traffic to or from the TSI is expected or allowed to pass
   through. The NVE has reserved all the necessary resources for that
   TSI. An external NVE may report the mappings of its underlay IP
   address and the associated TSI addresses to the NVA. Relevant network
   nodes may save such information to their mapping tables but not
   their forwarding tables.
   An NVE may create ACL or filter rules based on the associated TSI
   addresses on the attached port p but not yet enable them. The local
   tag for the VN corresponding to the TSI instance should be
   provisioned on port p to receive packets.

   A VM migration event (discussed in Section 2) may cause the
   hypervisor to send an associate message to the NVE connected to the
   destination hypervisor to which the VM migrates. A VM creation event
   may also cause an associate message to be sent.

   The Activated state of a TSI instance on an external NVE indicates
   that all the addresses for that TSI are functioning correctly on port
   p and traffic can be received from and sent to that TSI via the NVE.
   The mappings of the NVE's underlay IP address and the associated TSI
   addresses should be put into the forwarding table rather than the
   mapping table on relevant network nodes. ACL or filter rules based on
   the associated TSI addresses on the attached port p in the NVE are
   enabled. The local tag for the VN corresponding to the TSI instance
   MUST be provisioned on port p to receive packets.

   The Activate message makes the state transition from Init or
   Associated to Activated. The VM creation, VM migration and VM
   resumption events discussed in Section 2 may trigger the Activate
   message to be sent from the hypervisor to the external NVE.

   TSI information may be updated in either the Associated or the
   Activated state. The following are considered updates to the TSI
   information:

   o  adding or removing associated addresses;

   o  updating currently associated addresses (for example, updating
      the IP address for a given MAC address);

   o  updating NVE port information based on where the NVE receives
      messages.

   Such updates do not change the state of the TSI. When any address
   associated with a given TSI changes, the NVE should inform the NVA to
   update the mapping information between the NVE's underlay address and
   the associated TSI addresses.

   Likewise, the NVE should update its local ACL or filter settings
   for the affected addresses. A port information update causes the
   local tag for the VN corresponding to the TSI instance to be
   provisioned on the new port p and removed from the old port.

3.3 TSI Disassociate and Deactivate

   Disassociate and deactivate are conceptually the reverse of
   associate and activate. When moving from the Activated state to the
   Associated state, the external NVE needs to make sure that the
   resources remain reserved. However, the addresses associated with
   the TSI must no longer be functioning, and no traffic to or from
   the TSI is expected or allowed to pass through. For example, the
   NVE needs to inform the NVA to remove the relevant address mapping
   information from the forwarding or routing table. ACL or filtering
   rules for the relevant addresses should be disabled. When moving
   from the Associated or Activated state to the Init state, the NVE
   releases all the resources relevant to the TSI instance. The NVE
   should also inform the NVA to remove the relevant entries from the
   mapping table. ACL or filtering rules for the relevant addresses
   should be removed, and local tag provisioning on the connecting port
   of the NVE should be cleared.

   A VM suspension event (discussed in Section 2) may cause the
   relevant TSI instance(s) on the NVE to transition from the Activated
   state to the Associated state. A VM pause event normally does not
   affect the state of the relevant TSI instance(s) on the NVE, because
   the VM is expected to run again soon. A VM shutdown event normally
   causes the relevant TSI instance(s) on the NVE to transition from
   the Activated state to the Init state, and all resources should be
   released.

   A VM migration causes the TSI instance on the source NVE to leave
   the Activated state. When a VM migrates to another hypervisor
   connected to the same NVE, i.e., the source and destination NVE are
   the same, the NVE should use the TSI_ID and the incoming port to
   differentiate the two TSI instances.

   The triggering messages for the state transitions shown in Figure 6
   do not distinguish VM creation/shutdown events from VM migration
   arrival/departure events. Even so, the external NVE can make
   optimizations if it is notified of such information. For example,
   if the NVE knows that an incoming activate message is caused by
   migration rather than VM creation, mechanisms may be employed or
   triggered to make sure that the dynamic configuration or
   provisioning on the destination NVE matches that on the source NVE
   for the migrated VM. For instance, the destination external NVE can
   send an IGMP query [RFC2236] to the migrated VM on the destination
   hypervisor. This prompts the VM to answer with IGMP reports to the
   multicast router. The multicast router can then correctly send
   multicast traffic to the new external NVE for the multicast groups
   the VM had joined before the migration.

4. Hypervisor-to-NVE Control Plane Protocol Requirements

   Req-1: The protocol MUST support a bridged network connecting End
   Devices to the External NVE.

   Req-2: The protocol MUST support multiple End Devices sharing the
   same External NVE via the same physical port across a bridged
   network.

   Req-3: The protocol MAY support an End Device using multiple
   external NVEs simultaneously, but only one external NVE for each VN.

   Req-4: The protocol MAY support an End Device using multiple
   external NVEs simultaneously for the same VN.

   Req-5: The protocol MUST allow the End Device to initiate a request
   to its associated External NVE to be connected to, or disconnected
   from, a given VN.

   Req-6: The protocol MUST allow an External NVE to initiate a request
   to its connected End Devices to be disconnected from a given VN.

   Req-7: When a TS attaches to a VN, the protocol MUST allow an End
   Device and its external NVE to negotiate a locally significant tag
   for carrying traffic associated with a specific VN (e.g., 802.1Q
   tags).

   Req-8: The protocol MUST allow an End Device to initiate a request to
   associate/disassociate and/or activate/deactivate address(es) of a
   TSI instance to a VN on an NVE port.

   Req-9: The protocol MUST allow the External NVE to initiate a request
   to disassociate and/or deactivate address(es) of a TSI instance to a
   VN on an NVE port.

   Req-10: The protocol MUST allow an End Device to initiate a request
   to add, remove, or update address(es) associated with a TSI instance
   on the external NVE. Addresses can be expressed in different
   formats, for example, MAC, IP, or an IP and MAC pair.

   Req-11: The protocol MUST allow the External NVE to authenticate the
   connected End Device.

   Req-12: The protocol MUST be able to run over L2 links between the
   End Device and its External NVE.

   Req-13: The protocol SHOULD support the End Device indicating
   whether an associate or activate request from it results from a VM
   hot migration event.

5. VDP Applicability and Enhancement Needs

   The Virtual Station Interface (VSI) Discovery and Configuration
   Protocol (VDP) [IEEE 802.1Qbg] can be the control plane protocol
   running between the hypervisor and the external NVE. Appendix A
   illustrates VDP for the reader's information.

   VDP facilitates automatic discovery and configuration between an
   Edge Virtual Bridging (EVB) station and an EVB bridge. An EVB
   station is normally an end station running multiple VMs and is
   conceptually equivalent to a hypervisor in this document. An EVB
   bridge is conceptually equivalent to the external NVE.

   VDP is able to pre-associate, associate, and de-associate a VSI on
   an EVB station with a port on the EVB bridge.
   In the context of this document, a VSI roughly corresponds to the
   virtual port through which a VM connects to the hypervisor. The EVB
   station and the EVB bridge can agree on the VLAN ID(s) assigned to a
   VSI via a VDP message exchange. Other configuration parameters can
   be exchanged via VDP as well. VDP is carried over the Edge Control
   Protocol (ECP) [IEEE 802.1Qbg], which provides reliable transport
   over a layer 2 network.

   VDP needs some extensions to fulfill the requirements listed in this
   document. Table 1 shows the needed extensions and/or clarifications
   in the NVO3 context.

   +------+-----------+-----------------------------------------------+
   | Req  |    VDP    |                    Remarks                    |
   |      | supported?|                                               |
   +------+-----------+-----------------------------------------------+
   | Req-1|           |Needs extension. Destination MAC can be a      |
   +------+ Partially |specific unicast MAC besides the Nearest       |
   | Req-2|           |Customer Bridge group MAC.                     |
   +------+-----------+-----------------------------------------------+
   | Req-3|           |Needs clarification and extension for link     |
   |      |           |aggregation support.                           |
   +------+ Partially |For Req-4, (pre-)associate status needs to be  |
   | Req-4|           |synchronized on all NVE ports.                 |
   +------+-----------+-----------------------------------------------+
   | Req-5|    Yes    |VN is indicated by GroupID.                    |
   +------+-----------+-----------------------------------------------+
   | Req-6|    Yes    |Bridge sends De-Associate.                     |
   +------+-----------+-----------------------------------------------+
   | Req-7|    Yes    |VID==NULL in request and bridge returns the    |
   |      |           |assigned value in response.                    |
   +------+-----------+------------------------+----------------------+
   |      |           | NVO3 operation         | VDP equivalent       |
   |      |           +------------------------+----------------------+
   |      |           | associate/disassociate |pre-assoc/de-associate|
   | Req-8| Partially | activate/deactivate    |associate/de-associate|
   |      |           +------------------------+----------------------+
   |      |           |Needs extension to allow associate->pre-assoc  |
   +------+-----------+-----------------------------------------------+
   | Req-9|    Yes    |VDP bridge initiates De-Associate.             |
   +------+-----------+-----------------------------------------------+
   |Req-10| Partially |Needs extension for IPv4/IPv6 addresses.       |
   +------+-----------+-----------------------------------------------+
   |Req-11|    No     |Needs extension for authentication.            |
   +------+-----------+-----------------------------------------------+
   |Req-12|    Yes    |VDP is an L2 protocol.                         |
   +------+-----------+-----------------------------------------------+
   |      |           |M-bit for a migrating VM on the destination    |
   |      |           |hypervisor and S-bit for it on the source      |
   |      |           |hypervisor. When M/S is 0, "no guidance"       |
   |Req-13| Partially |cannot be distinguished from events not        |
   |      |           |caused by migration, where the NVE may act     |
   |      |           |differently. Needs extension to define both    |
   |      |           |cases clearly.                                 |
   +------+-----------+-----------------------------------------------+

             Table 1: Comparison of VDP with the Requirements

   With the ability to carry layer 3 addresses added, VDP can serve the
   hypervisor-to-NVE control plane functions reasonably well.
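
   The Req-8 row of Table 1 can be read as a small translation from the
   NVO3 operations of Figure 6 to VDP requests. The sketch below is
   illustrative only, and the function name and string labels are
   hypothetical. It assumes that "associate" maps to Pre-Associate with
   Resource Reservation, because the Associated state reserves
   resources. It also treats deactivate (Activated to Associated) as
   unsupported, following Table 1's note that VDP needs an extension to
   allow associate -> pre-associate.

```python
def vdp_request_for(state: str, operation: str) -> str:
    """Map an NVO3 TSI operation to a VDP request type (Table 1, Req-8).

    `state` is the current TSI state on the external NVE: "init",
    "associated", or "activated". Labels are illustrative only.
    """
    if operation == "associate" and state == "init":
        # Assumption: Associated reserves resources, so use the RR variant.
        return "Pre-Associate with Resource Reservation"
    if operation == "activate" and state in ("init", "associated"):
        return "Associate"
    if operation == "disassociate" and state in ("associated", "activated"):
        # De-Associate releases all resources, matching a return to Init.
        return "De-Associate"
    if operation == "deactivate" and state == "activated":
        # Activated -> Associated would need VDP to move an associated VSI
        # back to pre-associated, which Table 1 lists as a needed extension.
        raise NotImplementedError("deactivate needs a VDP extension")
    raise ValueError(f"{operation!r} is not valid in state {state!r}")
```

   Keeping the unsupported case as an explicit error, rather than
   silently mapping deactivate to De-Associate, avoids releasing
   resources that the Associated state is supposed to keep.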

   The other extensions listed in Table 1 improve the protocol's
   capabilities so that it better fits an NVO3 network.

6. Security Considerations

   NVEs must ensure that only properly authorized Tenant Systems are
   allowed to join and become part of any specific Virtual Network. In
   addition, NVEs will need appropriate mechanisms to ensure that any
   hypervisor wishing to use the services of an NVE is properly
   authorized to do so. One design point is whether the hypervisor
   should supply the NVE with the necessary information (e.g., VM
   addresses, VN information, or other parameters) that the NVE uses
   directly. The alternative is for the hypervisor to supply only a VN
   ID and an identifier for the associated VM (e.g., its MAC address).
   The NVE would then use that information to obtain the information
   needed to validate the hypervisor-provided parameters, or to obtain
   related parameters in a secure manner.

7. IANA Considerations

   No IANA action is required. RFC Editor: please delete this section
   before publication.

8. Acknowledgements

   This document was initiated and merged from the following drafts:
   draft-kreeger-nvo3-hypervisor-nve-cp,
   draft-gu-nvo3-tes-nve-mechanism, and draft-kompella-nvo3-server2nve.
   Thanks to all the co-authors and contributing members of those
   drafts.

   The authors would like to specially thank Jon Hudson for his
   generous help in improving the readability of this document.

9. References

9.1 Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2 Informative References

   [RFC2236]  Fenner, W., "Internet Group Management Protocol, Version
              2", RFC 2236, November 1997.

   [RFC7364]  Narten, T., Gray, E., Black, D., Fang, L., Kreeger, L.,
              and M. Napierala, "Problem Statement: Overlays for Network
              Virtualization", RFC 7364, October 2014.

   [RFC7365]  Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
              Rekhter, "Framework for Data Center (DC) Network
              Virtualization", RFC 7365, October 2014.

   [I-D.ietf-nvo3-nve-nva-cp-req]
              Kreeger, L., Dutt, D., Narten, T., and D. Black, "Network
              Virtualization NVE to NVA Control Protocol Requirements",
              draft-ietf-nvo3-nve-nva-cp-req-01 (work in progress),
              October 2013.

   [I-D.ietf-nvo3-arch]
              Black, D., Narten, T., et al., "An Architecture for
              Overlay Networks (NVO3)", draft-narten-nvo3-arch (work in
              progress).

   [I-D.ietf-opsawg-vmm-mib]
              Asai, H., MacFaden, M., Schoenwaelder, J., Shima, K., and
              T. Tsou, "Management Information Base for Virtual Machines
              Controlled by a Hypervisor", draft-ietf-opsawg-vmm-mib-00
              (work in progress), February 2014.

   [IEEE 802.1Qbg]
              IEEE, "Media Access Control (MAC) Bridges and Virtual
              Bridged Local Area Networks - Amendment 21: Edge Virtual
              Bridging", IEEE Std 802.1Qbg, 2012.

   [8021Q]    IEEE, "Media Access Control (MAC) Bridges and Virtual
              Bridged Local Area Networks", IEEE Std 802.1Q-2011, August
              2011.

Appendix A. IEEE 802.1Qbg VDP Illustration (For information only)

   VDP has the format shown in Figure A.1. A Virtual Station Interface
   (VSI) is an interface to a virtual station that is attached to a
   downlink port of an internal bridging function in a server. VDP
   packets for a VSI are handled by an external bridge. VDP is the
   control protocol running between the hypervisor and the external
   bridge.

   +--------+--------+------+----+----+------+------+------+-----------+
   |TLV type|TLV info|Status|VSI |VSI |VSIID | VSIID|Filter|Filter Info|
   |        |str len |      |Type|Type|Format|      | Info |           |
   |        |        |      |ID  |Ver |      |      |Format|           |
   |  7b    |  9b    | 1oct |3oct|1oct| 1oct |16oct | 1oct |  M oct    |
   +--------+--------+------+----+----+------+------+------+-----------+
   |                 |      |                       |                  |
   |                 |      |<-VSI type & instance->|<-----Filter----->|
   |                 |      |<-------------VSI attributes------------->|
   |<--TLV header--->|<--------TLV info string = 23 + M octets-------->|

                      Figure A.1: VDP TLV Definitions

   There are four VDP TLV types, described below.
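
   Before the individual types are described, the TLV layout of Figure
   A.1 can be illustrated with a small Python encoder and decoder. This
   is a sketch, not a conforming VDP implementation. It packs the 7-bit
   TLV type and the 9-bit information string length into a two-octet
   header. It then lays out the fixed 23 octets (Status 1, VSI Type ID
   3, VSI Type Version 1, VSIID Format 1, VSIID 16, Filter Info Format
   1) ahead of M octets of filter information. The numeric TLV type
   codes (1 to 4) are assumed values that should be checked against
   IEEE Std 802.1Qbg. The function names are hypothetical, and the
   Status octet is passed through unparsed.

```python
import struct

# Assumed VDP TLV type codes; verify against IEEE Std 802.1Qbg.
PRE_ASSOCIATE = 1
PRE_ASSOCIATE_RR = 2
ASSOCIATE = 3
DE_ASSOCIATE = 4

FIXED_INFO_LEN = 23  # Status through Filter Info Format, per Figure A.1


def encode_vdp_tlv(tlv_type: int, status: int, vsi_type_id: int,
                   vsi_type_ver: int, vsiid_format: int, vsiid: bytes,
                   filter_format: int, filter_info: bytes) -> bytes:
    """Build one VDP TLV: 2-octet header + (23 + M)-octet info string."""
    if not 0 <= tlv_type < 1 << 7:
        raise ValueError("TLV type is 7 bits")
    if len(vsiid) != 16:
        raise ValueError("VSIID is 16 octets")
    info = (bytes([status])
            + vsi_type_id.to_bytes(3, "big")       # VSI Type ID, 3 octets
            + bytes([vsi_type_ver, vsiid_format])  # 1 octet each
            + vsiid                                # 16 octets
            + bytes([filter_format])               # 1 octet
            + filter_info)                         # M octets
    if len(info) >= 1 << 9:
        raise ValueError("TLV information string length is 9 bits")
    header = (tlv_type << 9) | len(info)
    return struct.pack("!H", header) + info


def decode_vdp_tlv(data: bytes) -> dict:
    """Split a VDP TLV back into the fields of Figure A.1."""
    (header,) = struct.unpack_from("!H", data)
    tlv_type, length = header >> 9, header & 0x1FF
    info = data[2:2 + length]
    if len(info) != length or length < FIXED_INFO_LEN:
        raise ValueError("truncated VDP TLV")
    return {
        "type": tlv_type,
        "status": info[0],
        "vsi_type_id": int.from_bytes(info[1:4], "big"),
        "vsi_type_ver": info[4],
        "vsiid_format": info[5],
        "vsiid": info[6:22],
        "filter_format": info[22],
        "filter_info": info[23:],
    }
```

   With a 4-octet filter field, the information string is 27 octets
   long, which matches the "23 + M octets" annotation in Figure A.1.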

   1. Pre-Associate: Pre-Associate is used to pre-associate a VSI
      instance with a bridge port. The bridge validates the request and
      returns a failure Status in case of errors. Successful
      pre-association does not imply that the indicated VSI Type or
      provisioning will be applied to any traffic flowing through the
      VSI. Pre-Associate enables a faster response to a later Associate
      by allowing the bridge to obtain the VSI Type before association.

   2. Pre-Associate with Resource Reservation: This involves the same
      steps as Pre-Associate. On successful pre-association, it also
      reserves resources in the bridge to prepare for a subsequent
      Associate request.

   3. Associate: Associate creates and activates an association between
      a VSI instance and a bridge port. The bridge allocates any
      required bridge resources for the referenced VSI and activates the
      configuration for the VSI Type ID. This association is then
      applied to the traffic flowing to and from the VSI instance.

   4. De-Associate: De-Associate is used to remove an association
      between a VSI instance and a bridge port. Both Pre-Associated and
      Associated VSIs can be de-associated. De-Associate releases any
      resources that were reserved as a result of prior Associate or
      Pre-Associate operations for that VSI instance.

   De-Associate can be initiated by either side. The other message
   types can only be initiated by the server side.

   Some important flags in the VDP Status field are:

   1. M-bit (Bit 5): Indicates that the user of the VSI (e.g., the VM)
      is migrating (M-bit = 1), or provides no guidance on the migration
      of the user of the VSI (M-bit = 0). The M-bit is used as an
      indicator relative to the VSI that the user is migrating to.

   2. S-bit (Bit 6): Indicates that the user of the VSI (e.g., the VM)
      is suspended (S-bit = 1), or provides no guidance as to whether
      the user of the VSI is suspended (S-bit = 0). A keep-alive
      Associate request with S-bit = 1 can be sent when the VSI user is
      suspended. The S-bit is used as an indicator relative to the VSI
      that the user is migrating from.

   The filter information format currently supports the following four
   types:

   1. VID Filter Info format

      +---------+------+-------+--------+
      |  # of   |  PS  |  PCP  |  VID   |
      |entries  |(1bit)|(3bits)|(12bits)|
      |(2octets)|      |       |        |
      +---------+------+-------+--------+
                |<--Repeated per entry->|

                  Figure A.2: VID Filter Info Format

   2. MAC/VID filter format

      +---------+--------------+------+-------+--------+
      |  # of   | MAC address  |  PS  |  PCP  |  VID   |
      |entries  |  (6 octets)  |(1bit)|(3bits)|(12bits)|
      |(2octets)|              |      |       |        |
      +---------+--------------+------+-------+--------+
                |<---------Repeated per entry--------->|

                  Figure A.3: MAC/VID Filter Format

   3. GroupID/VID filter format

      +---------+--------------+------+-------+--------+
      |  # of   |   GroupID    |  PS  |  PCP  |  VID   |
      |entries  |  (4 octets)  |(1bit)|(3bits)|(12bits)|
      |(2octets)|              |      |       |        |
      +---------+--------------+------+-------+--------+
                |<---------Repeated per entry--------->|

                  Figure A.4: GroupID/VID Filter Format

   4. GroupID/MAC/VID filter format

      +---------+----------+-------------+------+-------+--------+
      |  # of   | GroupID  | MAC address |  PS  |  PCP  |  VID   |
      |entries  |(4 octets)| (6 octets)  |(1bit)|(3bits)|(12bits)|
      |(2octets)|          |             |      |       |        |
      +---------+----------+-------------+------+-------+--------+
                |<--------------Repeated per entry-------------->|

                  Figure A.5: GroupID/MAC/VID Filter Format

   The null VID can be used in the VDP Request sent from the hypervisor
   to the external bridge.
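
   The four filter formats in Figures A.2 to A.5 share a two-octet
   entry count followed by repeated entries that end in a 16-bit
   PS/PCP/VID field. The Python sketch below packs such entries. It
   reads the figures as placing PS in the most significant bit,
   followed by the 3-bit PCP and the 12-bit VID. That bit order, the
   function names, and the dictionary keys are assumptions made for
   illustration. The sketch also covers the null VID request just
   mentioned: a hypervisor can send a GroupID (the NVO3 VN ID) with VID
   0 and leave the choice of the locally significant VID to the bridge.

```python
import struct

NULL_VID = 0  # IEEE 802.1Q null VID: the bridge is expected to supply the VID


def pack_ps_pcp_vid(ps: int, pcp: int, vid: int) -> bytes:
    """Pack PS (1 bit), PCP (3 bits), and VID (12 bits) into 2 octets."""
    if ps not in (0, 1) or not 0 <= pcp <= 7 or not 0 <= vid <= 0xFFF:
        raise ValueError("PS/PCP/VID out of range")
    return struct.pack("!H", (ps << 15) | (pcp << 12) | vid)


def unpack_ps_pcp_vid(two_octets: bytes) -> tuple:
    """Inverse of pack_ps_pcp_vid, e.g. to read the VID a bridge returned."""
    (value,) = struct.unpack("!H", two_octets)
    return value >> 15, (value >> 12) & 0x7, value & 0xFFF


def pack_filter_info(entries, with_group_id=False, with_mac=False) -> bytes:
    """Encode Figure A.2 (no flags), A.3 (MAC), A.4 (GroupID), or A.5 (both)."""
    out = bytearray(struct.pack("!H", len(entries)))  # number of entries
    for entry in entries:
        if with_group_id:
            out += struct.pack("!I", entry["group_id"])  # 4 octets
        if with_mac:
            mac = bytes.fromhex(entry["mac"].replace(":", ""))
            if len(mac) != 6:
                raise ValueError("MAC address must be 6 octets")
            out += mac
        out += pack_ps_pcp_vid(entry.get("ps", 0), entry.get("pcp", 0),
                               entry.get("vid", NULL_VID))
    return bytes(out)
```

   In this sketch, a GroupID/VID request with no "vid" key carries the
   null VID. A bridge response would carry the assigned VID in the same
   field, which unpack_ps_pcp_vid can recover.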

   Use of the null VID indicates that the set of VID values associated
   with the VSI is expected to be supplied by the bridge. The bridge can
   obtain VID values from the VSI Type whose identity is specified by
   the VSI Type information in the VDP Request. The set of VID values
   is returned to the station via the VDP Response. The returned VID
   value can be a locally significant value. When GroupID is used, it
   is equivalent to the VN ID in NVO3. The hypervisor provides the
   GroupID to the bridge, and the bridge maps the GroupID to a locally
   significant VLAN ID.

   The VSIID in a VDP request that identifies a VM can be in one of the
   following formats: IPv4 address, IPv6 address, MAC address, UUID, or
   locally defined.

Authors' Addresses

   Yizhou Li
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China

   Phone: +86-25-56625409
   EMail: liyizhou@huawei.com

   Lucy Yong
   Huawei Technologies, USA

   Email: lucy.yong@huawei.com

   Lawrence Kreeger
   Cisco

   Email: kreeger@cisco.com

   Thomas Narten
   IBM

   Email: narten@us.ibm.com

   David Black
   EMC

   Email: david.black@emc.com