2 NVO3 Working Group Yizhou Li 3 INTERNET-DRAFT Lucy Yong 4 Intended Status: Informational Huawei Technologies 5 Lawrence Kreeger 6 Cisco 7 Thomas Narten 8 IBM 9 David Black 10 EMC 11 Expires: February 27, 2016 August 26, 2015 13 Split-NVE Control Plane Requirements 14 draft-ietf-nvo3-hpvr2nve-cp-req-03 16 Abstract 18 In a Split-NVE architecture, the functions of the NVE are split 19 across a server and an external network equipment which is called an 20 external NVE.
The server-resident control plane functionality 21 resides in control software, which may be part of a hypervisor or 22 container management software; for simplicity, this draft refers to 23 the hypervisor as the location of this software. 25 A control plane protocol (or protocols) between a hypervisor and its associated 26 external NVE(s) is used for the hypervisor to distribute its virtual 27 machine networking state to the external NVE(s) for further handling. 28 This document illustrates the functionality required by this type of 29 control plane signaling protocol and outlines the high level 30 requirements. Virtual machine states as well as state transitioning 31 are summarized to help clarify the needed protocol requirements. 33 Status of this Memo 35 This Internet-Draft is submitted to IETF in full conformance with the 36 provisions of BCP 78 and BCP 79. 38 Internet-Drafts are working documents of the Internet Engineering 39 Task Force (IETF), its areas, and its working groups. Note that 40 other groups may also distribute working documents as 41 Internet-Drafts. 43 Internet-Drafts are draft documents valid for a maximum of six months 44 and may be updated, replaced, or obsoleted by other documents at any 45 time. It is inappropriate to use Internet-Drafts as reference 46 material or to cite them other than as "work in progress." 48 The list of current Internet-Drafts can be accessed at 49 http://www.ietf.org/1id-abstracts.html 51 The list of Internet-Draft Shadow Directories can be accessed at 52 http://www.ietf.org/shadow.html 54 Copyright and License Notice 56 Copyright (c) 2015 IETF Trust and the persons identified as the 57 document authors. All rights reserved. 59 This document is subject to BCP 78 and the IETF Trust's Legal 60 Provisions Relating to IETF Documents 61 (http://trustee.ietf.org/license-info) in effect on the date of 62 publication of this document.
Please review these documents 63 carefully, as they describe your rights and restrictions with respect 64 to this document. Code Components extracted from this document must 65 include Simplified BSD License text as described in Section 4.e of 66 the Trust Legal Provisions and are provided without warranty as 67 described in the Simplified BSD License. 69 Table of Contents 71 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 72 1.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . 5 73 1.2 Target Scenarios . . . . . . . . . . . . . . . . . . . . . 6 74 2. VM Lifecycle . . . . . . . . . . . . . . . . . . . . . . . . . 8 75 2.1 VM Creation Event . . . . . . . . . . . . . . . . . . . . . 8 76 2.2 VM Live Migration Event . . . . . . . . . . . . . . . . . . 9 77 2.3 VM Termination Event . . . . . . . . . . . . . . . . . . . . 10 78 2.4 VM Pause, Suspension and Resumption Events . . . . . . . . . 10 79 3. Hypervisor-to-NVE Control Plane Protocol Functionality . . . . 10 80 3.1 VN connect and Disconnect . . . . . . . . . . . . . . . . . 11 81 3.2 TSI Associate and Activate . . . . . . . . . . . . . . . . . 12 82 3.3 TSI Disassociate and Deactivate . . . . . . . . . . . . . . 15 83 4. Hypervisor-to-NVE Control Plane Protocol Requirements . . . . . 16 84 5. VDP Applicability and Enhancement Needs . . . . . . . . . . . . 17 85 6. Security Considerations . . . . . . . . . . . . . . . . . . . . 19 86 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 19 87 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 19 88 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 20 89 9.1 Normative References . . . . . . . . . . . . . . . . . . . 20 90 9.2 Informative References . . . . . . . . . . . . . . . . . . 20 92 Appendix A. IEEE 802.1Qbg VDP Illustration (For information 93 only) . . . . . . . . . . . . . . . . . . . . . . . . . . 20 94 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 23 96 1.
Introduction 98 In the Split-NVE architecture shown in Figure 1, the functionality of 99 the NVE is split across an end device supporting virtualization and 100 an external network device which is called an external NVE. The 101 portion of the NVE functionality located on the end device is called 102 the tNVE and the portion located on the external NVE is called the 103 nNVE in this document. Overlay encapsulation/decapsulation functions 104 are normally off-loaded to the nNVE on the external NVE. 106 The tNVE is normally implemented as part of a hypervisor or container 107 and/or a virtual switch in a virtualized end device. This document 108 uses the term "hypervisor" throughout when describing the Split-NVE 109 scenario where part of the NVE functionality is off-loaded to a 110 separate device from the "hypervisor" that contains a VM connected to 111 a VN. In this context, the term "hypervisor" is meant to cover any 112 device type where part of the NVE functionality is off-loaded in this 113 fashion, e.g., a Network Service Appliance or Linux Container. 115 The problem statement [RFC7364] discusses the need for a control 116 plane protocol (or protocols) to populate each NVE with the state 117 needed to perform the required functions. In one scenario, an NVE 118 provides overlay encapsulation/decapsulation packet forwarding 119 services to Tenant Systems (TSs) that are co-resident within the NVE 120 on the same End Device (e.g. when the NVE is embedded within a 121 hypervisor or a Network Service Appliance). In such cases, there is 122 no need for a standardized protocol between the hypervisor and NVE, 123 as the interaction is implemented via software on a single device.
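In the Split-NVE case, by contrast, that interaction can no longer stay inside one device: the hypervisor must convey tenant networking state to the external NVE explicitly. As an informational sketch only (the field names below are hypothetical illustrations for this document's discussion, not a message format defined by this or any other specification), the per-TSI state involved might be modeled as:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TSIAssociation:
    """Illustrative record of the state a hypervisor (tNVE) would
    distribute to an external NVE (nNVE). Field names are hypothetical;
    they merely mirror items discussed in this document (TSI MAC/IP
    addresses, the VN, and a locally significant tag)."""
    tsi_id: str                       # identifies the Tenant System Interface
    vn_name: str                      # VN Name or VN ID the TSI attaches to
    mac_addresses: List[str]          # MACs the TS may use as source addresses
    ip_addresses: List[str] = field(default_factory=list)
    local_vlan: Optional[int] = None  # locally significant tag, if negotiated

assoc = TSIAssociation(
    tsi_id="tsi-1",
    vn_name="tenant-a-web",
    mac_addresses=["00:11:22:33:44:55"],
    ip_addresses=["192.0.2.10"],
    local_vlan=100,
)
print(assoc.vn_name)  # prints "tenant-a-web"
```

The point of the sketch is only that each item must be carried in explicit signaling once the tNVE and nNVE are on separate devices.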
124 In contrast, in the Split-NVE architecture scenarios shown in Figures 2 125 to 4, a control plane protocol between a hypervisor and its 126 associated external NVE(s) is required for the hypervisor to 127 distribute the virtual machines' networking state to the NVE(s) for 128 further handling. This protocol is in fact an NVE-internal protocol and 129 runs between the tNVE and nNVE logical entities. It is 130 mentioned in the NVO3 problem statement [RFC7364] and appears there as the 131 third work item. 133 Virtual machine states and state transitioning are summarized in this 134 document to show events where the NVE needs to take specific actions. 135 Such events might correspond to actions the control plane signaling 136 protocols between the hypervisor and external NVE will need to take. 137 Then the high level requirements to be fulfilled are outlined. 139 +-- -- -- -- Split-NVE -- -- -- --+ 140 | 141 | 142 +---------------|-----+ 143 | +------------- ----+| | 144 | | +--+ +---\|/--+|| +------ --------------+ 145 | | |VM|---+ ||| | \|/ | 146 | | +--+ | ||| |+--------+ | 147 | | +--+ | tNVE |||----- - - - - - -----|| | | 148 | | |VM|---+ ||| || nNVE | | 149 | | +--+ +--------+|| || | | 150 | | || |+--------+ | 151 | +--Hypervisor------+| +---------------------+ 152 +---------------------+ 154 End Device External NVE 156 Figure 1 Split-NVE structure 158 This document uses VMs as an example of Tenant Systems (TSs) in order 159 to describe the requirements, even though a VM is just one type of 160 Tenant System that may connect to a VN. For example, a service 161 instance within a Network Service Appliance is another type of TS, as 162 are systems running on OS-level virtualization technologies like 163 containers. The fact that VMs have lifecycles (e.g., can be created 164 and destroyed), can be moved, and can be started or stopped results 165 in a general set of protocol requirements, most of which are 166 applicable to other forms of TSs.
It should also be noted that not 167 all of the requirements are applicable to all forms of TSs. 169 Section 2 describes VM states and state transitioning in the VM 170 lifecycle. Section 3 introduces Hypervisor-to-NVE control plane 171 protocol functionality derived from VM operations and network events. 172 Section 4 outlines the requirements of the control plane protocol to 173 achieve the required functionality. 175 1.1 Terminology 177 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 178 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 179 document are to be interpreted as described in RFC 2119 [RFC2119]. 181 This document uses the same terminology as found in [RFC7365] and [I- 182 D.ietf-nvo3-nve-nva-cp-req]. This section defines additional 183 terminology used by this document. 185 Split-NVE: a type of NVE whose functionality is split 186 across an end device supporting virtualization and an external 187 network device. 189 tNVE: the portion of the Split-NVE functionality located on the end 190 device supporting virtualization. It interacts with tenant systems via 191 internal interfaces in the end device. 193 nNVE: the portion of the Split-NVE functionality located on the network 194 device that directly or indirectly connects to the end device 195 holding the corresponding tNVE. The nNVE normally performs encapsulation 196 to and decapsulation from the overlay network. 198 External NVE: the physical network device holding the nNVE. 200 Hypervisor/Container: the logical collection of software, firmware 201 and/or hardware that allows the creation and running of server or 202 service appliance virtualization. The tNVE is located on the 203 Hypervisor/Container. The term is used loosely in this document to refer to 204 the end device supporting the virtualization. For simplicity, we also 205 use Hypervisor in this document to represent both hypervisor and 206 container.
208 VN Profile: Meta data associated with a VN that is applied to any 209 attachment point to the VN. That is, VAP properties that are applied 210 to all VAPs associated with a given VN and used by an NVE when 211 ingressing/egressing packets to/from a specific VN. Meta data could 212 include such information as ACLs, QoS settings, etc. The VN Profile 213 contains parameters that apply to the VN as a whole. Control 214 protocols between the NVE and NVA could use the VN ID or VN Name to 215 obtain the VN Profile. 217 VSI: Virtual Station Interface. [IEEE 802.1Qbg] 219 VDP: VSI Discovery and Configuration Protocol [IEEE 802.1Qbg] 221 1.2 Target Scenarios 223 In the Split-NVE architecture, an external NVE can provide an offload 224 of the encapsulation / decapsulation function, network policy 225 enforcement, as well as the VN Overlay protocol overhead. This 226 offloading may provide performance improvements and/or resource 227 savings to the End Device (e.g. hypervisor) making use of the 228 external NVE. 230 The following figures give example scenarios of a Split-NVE 231 architecture.
233 Hypervisor Access Switch 234 +------------------+ +-----+-------+ 235 | +--+ +-------+ | | | | 236 | |VM|---| | | VLAN | | | 237 | +--+ | tNVE |---------+ nNVE| +--- Underlying 238 | +--+ | | | Trunk | | | Network 239 | |VM|---| | | | | | 240 | +--+ +-------+ | | | | 241 +------------------+ +-----+-------+ 242 Figure 2 Hypervisor with an External NVE 244 Hypervisor L2 Switch 245 +---------------+ +-----+ +----+---+ 246 | +--+ +----+ | | | | | | 247 | |VM|---| | |VLAN | |VLAN | | | 248 | +--+ |tNVE|-------+ +-----+nNVE| +--- Underlying 249 | +--+ | | |Trunk| |Trunk| | | Network 250 | |VM|---| | | | | | | | 251 | +--+ +----+ | | | | | | 252 +---------------+ +-----+ +----+---+ 253 Figure 3 Hypervisor with an External NVE 254 across an Ethernet Access Switch 256 Network Service Appliance Access Switch 257 +--------------------------+ +-----+-------+ 258 | +------------+ | \ | | | | 259 | |Net Service |----| \ | | | | 260 | |Instance | | \ | VLAN | | | 261 | +------------+ |tNVE| |------+nNVE | +--- Underlying 262 | +------------+ | | | Trunk| | | Network 263 | |Net Service |----| / | | | | 264 | |Instance | | / | | | | 265 | +------------+ | / | | | | 266 +--------------------------+ +-----+-------+ 267 Figure 4 Physical Network Service Appliance with an External NVE 269 Tenant Systems connect to external NVEs via a Tenant System Interface 270 (TSI). The TSI logically connects to the external NVE via a Virtual 271 Access Point (VAP) [I-D.ietf-nvo3-arch]. The external NVE may provide 272 Layer 2 or Layer 3 forwarding. In the Split-NVE architecture, the 273 external NVE may be able to reach multiple MAC and IP addresses via a 274 TSI. For example, Tenant Systems that are providing network services 275 (such as transparent firewall, load balancer, VPN gateway) are likely 276 to have a complex address hierarchy. This implies that if a given TSI 277 disassociates from one VN, all the MAC and/or IP addresses are also 278 disassociated.
There is no need to signal the deletion of every MAC 279 or IP when the TSI is brought down or deleted. In the majority of 280 cases, a VM will be acting as a simple host that will have a single 281 TSI and a single MAC and IP visible to the external NVE. 283 Figures 2-4 show the use of VLANs to separate traffic for multiple 284 VNs between the tNVE and nNVE; VLANs are not strictly necessary if 285 only one VN is involved, but multiple VNs are expected in most cases, 286 and hence this draft assumes their presence. 288 2. VM Lifecycle 290 Figure 2 of [I-D.ietf-opsawg-vmm-mib] shows the state transition of a 291 VM. Some of the VM states are of interest to the external NVE. This 292 section illustrates the relevant phases and events in the VM 293 lifecycle. It should be noted that the following subsections do not 294 give an exhaustive traversal of VM lifecycle state. They are intended 295 as illustrative examples relevant to the Split-NVE 296 architecture, not as prescriptive text; the goal is to capture 297 sufficient detail to set a context for the signaling protocol 298 functionality and requirements described in the following sections. 300 2.1 VM Creation Event 302 A VM creation event causes the VM state to transition from Preparing to 303 Shutdown and then to Running [I-D.ietf-opsawg-vmm-mib]. The end 304 device allocates and initializes local virtual resources like storage 305 in the VM Preparing state. In the Shutdown state, the VM has everything 306 ready except that CPU execution is not scheduled by the hypervisor 307 and the VM's memory is not resident in the hypervisor. The transition from the Shutdown 308 state to the Running state normally requires human intervention or a 309 system-triggered event. The Running state indicates the VM is in the 310 normal execution state. As part of transitioning the VM to the 311 Running state, the hypervisor must also provision network 312 connectivity for the VM's TSI(s) so that Ethernet frames can be sent 313 and received correctly.
No ongoing migration, suspension or shutdown 314 is in process. 316 In the VM creation phase, the VM's TSI has to be associated with the 317 external NVE. Association here indicates that the hypervisor and the 318 external NVE have signaled each other and reached some agreement. 319 Relevant networking parameters or information have been provisioned 320 properly. The external NVE should be informed of the VM's TSI MAC 321 address and/or IP address. In addition to external network 322 connectivity, the hypervisor may provide local network connectivity 323 between the VM's TSI and other VMs' TSIs that are co-resident on the 324 same hypervisor. When the intra- or inter-hypervisor connectivity is 325 extended to the external NVE, a locally significant tag, e.g. VLAN 326 ID, should be used between the hypervisor and the external NVE to 327 differentiate each VN's traffic. Both the hypervisor and external NVE 328 sides must agree on that tag value for traffic identification, 329 isolation and forwarding. 331 The external NVE may need to do some preparation work before it 332 signals successful association with the TSI. Such preparation work may 333 include locally saving the state and binding information of the 334 tenant system interface and its VN, communicating with the NVA for 335 network provisioning, etc. 337 Tenant System interface association should be performed before the VM 338 enters the Running state, preferably in the Shutdown state. If association 339 with the external NVE fails, the VM should not go into the Running state. 341 2.2 VM Live Migration Event 343 Live migration is sometimes referred to as "hot" migration, in that 344 from an external viewpoint, the VM appears to continue to run while 345 being migrated to another server (e.g., TCP connections generally 346 survive this class of migration). In contrast, "cold" migration 347 consists of shutting down VM execution on one server and restarting it on 348 another.
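For information only, the VM lifecycle states this section cites from [I-D.ietf-opsawg-vmm-mib] can be summarized as a small transition table; the event labels below are illustrative names chosen for this sketch, not protocol messages or MIB objects:

```python
# Hedged sketch of the VM state transitions referenced in Section 2,
# based on the states named in this document (Preparing, Shutdown,
# Running, Migrating, Paused, Suspended). Event names are illustrative.
TRANSITIONS = {
    ("Preparing", "create"):       "Shutdown",
    ("Shutdown",  "start"):        "Running",
    ("Running",   "migrate_out"):  "Migrating",   # source hypervisor
    ("Migrating", "migrate_done"): "Shutdown",    # source, after success
    ("Shutdown",  "migrate_in"):   "Migrating",   # destination hypervisor
    ("Migrating", "resume_exec"):  "Running",     # destination
    ("Running",   "pause"):        "Paused",
    ("Paused",    "unpause"):      "Running",
    ("Running",   "suspend"):      "Suspended",
    ("Suspended", "resume"):       "Running",
    ("Running",   "power_off"):    "Shutdown",
}

def step(state: str, event: str) -> str:
    # look up the next state; anything absent from the table is invalid
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state} on {event}")

# VM creation path: Preparing -> Shutdown -> Running
s = step("Preparing", "create")
s = step(s, "start")
print(s)  # prints "Running"
```

Each transition in the table corresponds to a point where the external NVE may need to associate, activate, deactivate or disassociate a TSI, as described in the rest of this document.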
For simplicity, the following abstract summary about live 349 migration assumes shared storage, so that the VM's storage is 350 accessible to the source and destination servers. Assume a VM live 351 migrates from hypervisor 1 to hypervisor 2. Such a migration event 352 involves state transitions on both hypervisors, source hypervisor 353 1 and destination hypervisor 2. The VM state on source hypervisor 1 354 transitions from Running to Migrating and then to Shutdown [I-D.ietf- 355 opsawg-vmm-mib]. The VM state on destination hypervisor 2 transitions from 356 Shutdown to Migrating and then to Running. 358 The external NVE connected to destination hypervisor 2 has to 359 associate the migrating VM's TSI with it by discovering the TSI's MAC 360 and/or IP addresses, its VN, the locally significant VID if any, and 361 provisioning other network-related parameters of the TSI. The 362 external NVE may be informed about the VM's peer VMs, storage devices 363 and other network appliances with which the VM needs to communicate 364 or is communicating. The migrated VM on destination hypervisor 2 365 SHOULD NOT go to the Running state before all the network provisioning 366 and binding has been done. 368 The migrating VM SHOULD NOT be in the Running state at the same time on 369 the source hypervisor and destination hypervisor during migration. 370 The VM on the source hypervisor does not transition into the Shutdown 371 state until the VM successfully enters the Running state on the 372 destination hypervisor. It is possible that the VM on the source 373 hypervisor stays in the Migrating state for a while after the VM on the 374 destination hypervisor is in the Running state. 376 2.3 VM Termination Event 378 A VM termination event is also referred to as "powering off" a VM. A VM 379 termination event leads to the VM state going to Shutdown.
There are two 380 possible causes to terminate a VM [I-D.ietf-opsawg-vmm-mib]: one is 381 the normal "power off" of a running VM; the other is that the VM has been 382 migrated to another hypervisor and the VM image on the source 383 hypervisor has to stop executing and be shut down. 385 In VM termination, the external NVE connecting to that VM needs to 386 deprovision the VM, i.e., delete the network parameters associated 387 with that VM. In other words, the external NVE has to de-associate 388 the VM's TSI. 390 2.4 VM Pause, Suspension and Resumption Events 392 The VM pause event leads to the VM transitioning from the Running state to the 393 Paused state. The Paused state indicates that the VM is resident in 394 memory but no longer scheduled to execute by the hypervisor [I- 395 D.ietf-opsawg-vmm-mib]. The VM can easily be re-activated from the Paused 396 state to the Running state. 398 The VM suspension event leads to the VM transitioning from the Running state 399 to the Suspended state. The VM resumption event leads to the VM 400 transitioning from the Suspended state to the Running state. The Suspended 401 state means the memory and CPU execution state of the virtual machine 402 are saved to persistent store. During this state, the virtual 403 machine is not scheduled to execute by the hypervisor [I-D.ietf- 404 opsawg-vmm-mib]. 406 In the Split-NVE architecture, the external NVE should keep any 407 paused or suspended VM in association as the VM can return to the Running 408 state at any time. 410 3. Hypervisor-to-NVE Control Plane Protocol Functionality 412 The following subsections show illustrative examples of the state 413 transitions on an external NVE which are relevant to Hypervisor-to-NVE 414 signaling protocol functionality. It should be noted they are not 415 prescriptive text for full state machines. 417 3.1 VN connect and Disconnect 419 In the Split-NVE scenario, a protocol is needed between the End 420 Device (e.g.
Hypervisor) making use of the external NVE and the 421 external NVE in order to make the external NVE aware of the changing 422 VN membership requirements of the Tenant Systems within the End 423 Device. 425 A key driver for using a protocol rather than using static 426 configuration of the external NVE is that the VN connectivity 427 requirements can change frequently as VMs are brought up, moved and 428 brought down on various hypervisors throughout the data center or 429 external cloud. 431 +---------------+ Recv VN_connect; +-------------------+ 432 |VN_Disconnected| return Local_Tag value |VN_Connected | 433 +---------------+ for VN if successful; +-------------------+ 434 |VN_ID; |-------------------------->|VN_ID; | 435 |VN_State= | |VN_State=connected;| 436 |disconnected; | |Num_TSI_Associated;| 437 | |<----Recv VN_disconnect----|Local_Tag; | 438 +---------------+ |VN_Context; | 439 +-------------------+ 441 Figure 5 State Transition Example of a VAP Instance 442 on an External NVE 444 Figure 5 shows the state transitions for a VAP on the external NVE. An 445 NVE that supports the hypervisor-to-NVE control plane protocol should 446 support one instance of the state machine for each active VN. The 447 state transitions on the external NVE are normally triggered by 448 hypervisor-facing events and behaviors. Some of the interleaved 449 interactions between the NVE and NVA are illustrated for better 450 understanding of the whole procedure; others may not be 451 shown. More detailed information is available in [I- 452 D.ietf-nvo3-nve-nva-cp-req]. 454 The external NVE must be notified when an End Device requires 455 connection to a particular VN and when it no longer requires 456 connection. In addition, the external NVE must provide a local tag 457 value for each connected VN to the End Device to use for exchange of 458 packets between the End Device and the external NVE (e.g. a locally 459 significant 802.1Q tag value).
How "local" the significance is 460 depends on whether the Hypervisor has a direct physical connection to 461 the external NVE (in which case the significance is local to the 462 physical link), or whether there is an Ethernet switch (e.g. a blade 463 switch) connecting the Hypervisor to the NVE (in which case the 464 significance is local to the intervening switch and all the links 465 connected to it). 467 These VLAN tags are used to differentiate between different VNs as 468 packets cross the shared access network to the external NVE. When the 469 external NVE receives packets, it uses the VLAN tag to identify the 470 VN of packets coming from a given TSI, strips the tag, adds the 471 appropriate overlay encapsulation for that VN, and sends the packet towards 472 the corresponding remote NVE across the underlying IP network. 474 The identification of the VN in this protocol could either be through 475 a VN Name or a VN ID. A globally unique VN Name facilitates 476 portability of a Tenant's Virtual Data Center. Once an external NVE 477 receives a VN connect indication, the NVE needs a way to get a VN 478 Context allocated (or receive the already allocated VN Context) for a 479 given VN Name or ID (as well as any other information needed to 480 transmit encapsulated packets). How this is done is the subject of 481 the NVE-to-NVA protocol, which is part of work items 1 and 2 in 482 [RFC7364]. 484 A VN_connect message can be explicit or implicit. Explicit means that the 485 hypervisor sends a message to explicitly request the connection 486 to a VN. Implicit means that the external NVE receives other messages, 487 e.g. the very first TSI associate message (see the next subsection) for a 488 given VN, that implicitly indicate its interest in connecting to a VN. 490 A VN_disconnect message will indicate that the NVE can release all 491 the resources for that disconnected VN and transition to the VN_disconnected 492 state.
The local tag assigned for that VN can then possibly be reclaimed 493 by another VN. 495 3.2 TSI Associate and Activate 497 Typically, a TSI is assigned a single MAC address and all frames 498 transmitted and received on that TSI use that single MAC address. As 499 mentioned earlier, it is also possible for a Tenant System to 500 exchange frames using multiple MAC addresses or packets with multiple 501 IP addresses. 503 Particularly in the case of a TS that is forwarding frames or packets 504 from other TSs, the external NVE will need to communicate to the NVA the mapping 505 between the NVE's IP address (on the underlying network) and ALL the 506 addresses the TS is forwarding on behalf of for the corresponding VN. 509 The NVE has two ways in which it can discover the tenant addresses 510 for which frames must be forwarded to a given End Device (and 511 ultimately to the TS within that End Device). 513 1. It can glean the addresses by inspecting the source addresses in 514 packets it receives from the End Device. 516 2. The hypervisor can explicitly signal the address associations of 517 a TSI to the external NVE. The address association includes all the 518 MAC and/or IP addresses possibly used as source addresses in a packet 519 sent from the hypervisor to the external NVE. The external NVE may 520 further use this information to filter future traffic from the 521 hypervisor. 523 To support the second approach above, the "hypervisor-to-NVE" 524 protocol requires a means to allow End Devices to communicate new 525 tenant address associations for a given TSI within a given VN. 527 Figure 6 shows an example of the state transitions for a TSI connecting 528 to a VAP on the external NVE. An NVE that supports the hypervisor-to- 529 NVE control plane protocol may support one instance of the state 530 machine for each TSI connecting to a given VN.
532 disassociate; +--------+ disassociate 533 +--------------->| Init |<--------------------+ 534 | +--------+ | 535 | | | | 536 | | | | 537 | +--------+ | 538 | | | | 539 | associate | | activate | 540 | +-----------+ +-----------+ | 541 | | | | 542 | | | | 543 | \|/ \|/ | 544 +--------------------+ +---------------------+ 545 | Associated | | Activated | 546 +--------------------+ +---------------------+ 547 |TSI_ID; | |TSI_ID; | 548 |Port; |-----activate---->|Port; | 549 |VN_ID; | |VN_ID; | 550 |State=associated; | |State=activated ; |-+ 551 +-|Num_Of_Addr; |<---deactivate;---|Num_Of_Addr; | | 552 | |List_Of_Addr; | |List_Of_Addr; | | 553 | +--------------------+ +---------------------+ | 554 | /|\ /|\ | 555 | | | | 556 +---------------------+ +-------------------+ 557 add/remove/updt addr; add/remove/updt addr; 558 or update port; or update port; 560 Figure 6 State Transition Example of a TSI Instance 561 on an External NVE 563 The Associated state of a TSI instance on an external NVE indicates that all 564 the addresses for that TSI have already been associated with the VAP of 565 the external NVE on port p for a given VN, but no real traffic to and 566 from the TSI is expected or allowed to pass through. An NVE has 567 reserved all the necessary resources for that TSI. An external NVE 568 may report the mappings of its underlay IP address and the 569 associated TSI addresses to the NVA, and relevant network nodes may save 570 such information to their mapping tables but not their forwarding tables. An NVE 571 may create ACL or filter rules based on the associated TSI addresses 572 on the attached port p but not enable them yet. The local tag for the VN 573 corresponding to the TSI instance should be provisioned on port p to 574 receive packets. 576 A VM migration event (discussed in Section 2) may cause the hypervisor to 577 send an associate message to the NVE connected to the destination 578 hypervisor the VM migrates to. A VM creation event may also lead to the 579 same behavior.
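For information only, the TSI instance state machine of Figure 6 can be read as the following minimal sketch; the class layout and method names are illustrative conveniences, not part of any specified protocol:

```python
# Hedged sketch of the TSI-instance state machine of Figure 6; message
# names follow the figure, everything else is illustrative only.
class TSIInstance:
    def __init__(self, tsi_id, vn_id, port):
        self.tsi_id, self.vn_id, self.port = tsi_id, vn_id, port
        self.state = "Init"
        self.addresses = set()       # MAC and/or IP addresses of the TSI

    def associate(self, addrs):
        # reserve resources, report mappings to the NVA, stage (but do
        # not enable) ACL/filter rules: Init -> Associated
        self.addresses |= set(addrs)
        self.state = "Associated"

    def activate(self, addrs=()):
        # allowed from Init or Associated (Figure 6); enables forwarding
        self.addresses |= set(addrs)
        self.state = "Activated"

    def deactivate(self):
        # keep reservations but stop traffic: Activated -> Associated
        if self.state == "Activated":
            self.state = "Associated"

    def disassociate(self):
        # release all resources: back to Init from either state
        self.addresses.clear()
        self.state = "Init"

    def update_addresses(self, add=(), remove=()):
        # add/remove/update addresses does not change the state
        self.addresses |= set(add)
        self.addresses -= set(remove)

tsi = TSIInstance("tsi-1", vn_id=7, port="p1")
tsi.associate(["00:11:22:33:44:55"])
tsi.activate()
print(tsi.state)  # prints "Activated"
```

Note that, as in Figure 6, address updates are self-loops: they modify the address list of an Associated or Activated instance without changing its state.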
581 The Activated state of a TSI instance on an external NVE indicates 582 that all the addresses for that TSI are functioning correctly on port p 583 and traffic can be received from and sent to that TSI via the NVE. 584 The mappings of the NVE's underlay IP address and the associated TSI 585 addresses should be put into the forwarding table rather than the 586 mapping table on relevant network nodes. ACL or filter rules based on 587 the associated TSI addresses on the attached port p in the NVE are 588 enabled. The local tag for the VN corresponding to the TSI instance MUST 589 be provisioned on port p to receive packets. 591 The Activate message makes the state transition from Init or Associated 592 to Activated. VM creation, VM migration and VM resumption events 593 discussed in Section 2 may trigger the Activate message to be sent 594 from the hypervisor to the external NVE. 596 TSI information may get updated either in the Associated or Activated 597 state. The following are considered updates to the TSI information: 598 add or remove the associated addresses, update current associated 599 addresses (for example, updating the IP for a given MAC), update NVE port 600 information based on where the NVE receives messages. Such updates do 601 not change the state of the TSI. When any address associated with a given 602 TSI changes, the NVE should inform the NVA to update the mapping 603 information between the NVE's underlay address and the associated TSI 604 addresses. The NVE should also change its local ACL or filter 605 settings accordingly for the relevant addresses. A port information 606 update will cause the local tag for the VN corresponding to the TSI 607 instance to be provisioned on the new port p and removed from the old 608 port. 610 3.3 TSI Disassociate and Deactivate 612 Disassociate and deactivate conceptually are the reverse behaviors of 613 associate and activate.
When transitioning from the Activated state to the
614 Associated state, the external NVE needs to make sure that the
615 resources are still reserved but that the addresses associated with
616 the TSI are no longer functioning, and that no traffic to and from
617 the TSI is expected or allowed to pass through. For example, the NVE
618 needs to inform the NVA to remove the relevant address mapping
619 information from the forwarding or routing table. ACL or filtering
620 rules regarding the relevant addresses should be disabled. When
621 transitioning from the Associated or Activated state to the Init
622 state, the NVE releases all the resources relevant to the TSI
623 instance. The NVE should also inform the NVA to remove the relevant
624 entries from the mapping table. ACL or filtering rules regarding the
625 relevant addresses should be removed. The local tag provisioned on
626 the connecting port of the NVE should be cleared.

627 A VM suspension event (discussed in Section 2) may cause the relevant
628 TSI instance(s) on the NVE to transition from the Activated to the
629 Associated state. A VM pause event normally does not affect the state
630 of the relevant TSI instance(s) on the NVE, as the VM is expected to
631 run again soon. A VM shutdown event will normally cause the relevant
632 TSI instance(s) on the NVE to transition from the Activated state to
633 the Init state. All resources should be released.

635 A VM migration will lead the TSI instance on the source NVE to leave
636 the Activated state. When a VM migrates to another hypervisor
637 connected to the same NVE, i.e., the source and destination NVEs are
638 the same, the NVE should use the TSI_ID and the incoming port to
639 differentiate the two TSI instances.

641 Although the triggering messages for the state transitions shown in
642 Figure 6 do not indicate the difference between VM creation/shutdown
643 events and VM migration arrival/departure events, the external NVE
644 can make optimizations if it is notified of such information.
For example, if
645 the NVE knows that an incoming activate message is caused by
646 migration rather than by VM creation, mechanisms may be employed or
647 triggered to make sure that the dynamic configuration or provisioning
648 on the destination NVE is the same as that on the source NVE for the
649 migrated VM. For example, an IGMP query [RFC2236] can be sent by the
650 destination external NVE to the migrated VM on the destination
651 hypervisor so that the VM is forced to answer with an IGMP report to
652 the multicast router. The multicast router can then correctly send
653 the multicast traffic to the new external NVE for those multicast
654 groups the VM had joined before the migration.

656 4. Hypervisor-to-NVE Control Plane Protocol Requirements

658 Req-1: The protocol MUST support a bridged network connecting End
659 Devices to the External NVE.

661 Req-2: The protocol MUST support multiple End Devices sharing the
662 same External NVE via the same physical port across a bridged
663 network.

665 Req-3: The protocol MAY support an End Device using multiple external
666 NVEs simultaneously, but with only one external NVE for each VN.

668 Req-4: The protocol MAY support an End Device using multiple external
669 NVEs simultaneously for the same VN.

671 Req-5: The protocol MUST allow an End Device to initiate a request to
672 its associated External NVE to be connected to or disconnected from a
673 given VN.

675 Req-6: The protocol MUST allow an External NVE to initiate a request
676 to its connected End Devices to be disconnected from a given VN.

678 Req-7: When a TS attaches to a VN, the protocol MUST allow the End
679 Device and its external NVE to negotiate one or more locally
680 significant tag(s) for carrying traffic associated with a specific VN
681 (e.g., 802.1Q tags).

683 Req-8: The protocol MUST allow an End Device to initiate a request to
684 associate/disassociate and/or activate/deactivate address(es) of a
685 TSI instance to a VN on an NVE port.
687 Req-9: The protocol MUST allow an External NVE to initiate a request
688 to disassociate and/or deactivate address(es) of a TSI instance from
689 a VN on an NVE port.

691 Req-10: The protocol MUST allow an End Device to initiate a request
692 to add, remove, or update address(es) associated with a TSI instance
693 on the external NVE. Addresses can be expressed in different formats,
694 for example, MAC, IP, or a pair of IP and MAC.

696 Req-11: The protocol MUST allow the External NVE to authenticate the
697 connected End Device.

699 Req-12: The protocol MUST be able to run over L2 links between the
700 End Device and its External NVE.

702 Req-13: The protocol SHOULD support the End Device indicating whether
703 an associate or activate request results from a VM hot migration
704 event.

706 5. VDP Applicability and Enhancement Needs

708 The Virtual Station Interface (VSI) Discovery and Configuration
709 Protocol (VDP) [IEEE 802.1Qbg] can be the control plane protocol
710 running between the hypervisor and the external NVE. Appendix A
711 illustrates VDP for the reader's information.

713 VDP facilitates the automatic discovery and configuration of Edge
714 Virtual Bridging (EVB) stations and EVB bridges. An EVB station is
715 normally an end station running multiple VMs; it is conceptually
716 equivalent to the hypervisor in this document. An EVB bridge is
717 conceptually equivalent to the external NVE.

719 VDP is able to pre-associate/associate/de-associate a VSI on an EVB
720 station with a port on the EVB bridge. A VSI approximately
721 corresponds, in this document's context, to the virtual port by which
722 a VM connects to the hypervisor. The EVB station and the EVB bridge
723 can reach agreement on the VLAN ID(s) assigned to a VSI via VDP
724 message exchange. Other configuration parameters can be exchanged via
725 VDP as well. VDP is carried over the Edge Control Protocol (ECP)
726 [IEEE 802.1Qbg], which provides reliable transport over a layer 2
    network.
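To illustrate the VLAN ID agreement described above, the following sketch models a bridge-side handler that returns a locally significant VLAN ID when the station supplies the null VID for a given GroupID (the VN identifier). This is an illustrative model only, not the 802.1Qbg wire protocol; the class name, method name, and starting VID value are all hypothetical:

```python
NULL_VID = 0  # the station asks the bridge to choose the VID

class EvbBridge:
    """Illustrative bridge-side VID assignment for VDP-style requests."""

    def __init__(self):
        self._group_to_vid = {}   # GroupID (VN) -> locally significant VID
        self._next_vid = 100      # arbitrary starting point for this sketch

    def handle_request(self, group_id, requested_vid):
        """Return the VID the station should use for this VSI.

        If the station proposes a concrete VID, accept it; if it sends
        the null VID, assign (or reuse) a locally significant VID for
        the GroupID, as the VDP null-VID behavior allows.
        """
        if requested_vid != NULL_VID:
            return requested_vid
        if group_id not in self._group_to_vid:
            self._group_to_vid[group_id] = self._next_vid
            self._next_vid += 1
        return self._group_to_vid[group_id]
```

Two requests for the same GroupID with the null VID return the same locally significant VID, while a different GroupID is mapped to a different one.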
728 The VDP protocol needs some extensions to fulfill the requirements
729 listed in this document. Table 1 shows the needed extensions and/or
730 clarifications in the NVO3 context.

732 +------+-----------+-----------------------------------------------+
733 | Req  | VDP       | Remarks                                       |
734 |      | supported?|                                               |
735 +------+-----------+-----------------------------------------------+
736 | Req-1|           |                                               |
737 +------+           |Needs extension. Must be able to send to a     |
738 | Req-2|           |specific unicast MAC and should be able to send|
739 +------+ Partially |to a non-reserved well-known multicast address |
740 | Req-3|           |other than the nearest customer bridge address |
741 +------+           |                                               |
742 | Req-4|           |                                               |
743 +------+-----------+-----------------------------------------------+
744 | Req-5| Yes       |VN is indicated by GroupID                     |
745 +------+-----------+-----------------------------------------------+
746 | Req-6| Yes       |Bridge sends De-Associate                      |
747 +------+-----------+-----------------------------------------------+
748 |      |           |VID==NULL in request and bridge returns the    |
749 | Req-7| Yes       |assigned value in response, or specify GroupID |
750 |      |           |in request and get VID assigned in the         |
751 |      |           |returned response. Multiple VLANs per group    |
752 |      |           |are allowed                                    |
753 +------+-----------+------------------------+----------------------+
754 |      |           |      requirement       |   VDP equivalent     |
755 |      |           +------------------------+----------------------+
756 | Req-8| Partially | associate/disassociate |pre-asso/de-associate |
757 |      |           | activate/deactivate    |associate/de-associate|
758 |      |           +------------------------+----------------------+
759 |      |           |Needs extension to allow associate->pre-assoc  |
760 +------+-----------+-----------------------------------------------+
761 | Req-9| Yes       |VDP bridge initiates de-associate              |
762 +------+-----------+-----------------------------------------------+
763 |Req-10| Partially |Needs extension for IPv4/IPv6 addresses. Add a |
764 |      |           |new "filter info format" type                  |
765 +------+-----------+-----------------------------------------------+
766 |Req-11| No        |Out-of-band mechanism is preferred, e.g.,      |
767 |      |           |MACsec or 802.1X                               |
768 +------+-----------+-----------------------------------------------+
769 |Req-12| Yes       |L2 protocol naturally                          |
770 +------+-----------+-----------------------------------------------+
771 |      |           |M bit for migrated VM on destination           |
772 |      |           |hypervisor and S bit for that on source        |
773 |Req-13| Partially |hypervisor. When M/S is 0, "no guidance"       |
774 |      |           |cannot be distinguished from "event not caused |
775 |      |           |by migration", where the NVE may act           |
776 |      |           |differently. Needs new bits for migration      |
777 |      |           |indication in a new "filter info format" type  |
778 +------+-----------+-----------------------------------------------+
779             Table 1: Comparing VDP with the requirements

780 By simply adding the ability to carry layer 3 addresses, VDP can
781 serve the hypervisor-to-NVE control plane functions well. The other
782 extensions improve the protocol capabilities for a better fit in an
783 NVO3 network.

785 6. Security Considerations

787 NVEs must ensure that only properly authorized Tenant Systems are
788 allowed to join and become a part of any specific Virtual Network. In
789 addition, NVEs will need appropriate mechanisms to ensure that any
790 hypervisor wishing to use the services of an NVE is properly
791 authorized to do so. One design point is whether the hypervisor
792 should supply the NVE with necessary information (e.g., VM addresses,
793 VN information, or other parameters) that the NVE uses directly, or
794 whether the hypervisor should only supply a VN ID and an identifier
795 for the associated VM (e.g., its MAC address), with the NVE using
796 that information to obtain the information needed to validate the
797 hypervisor-provided parameters or obtain related parameters in a
798 secure manner.

800 7.
IANA Considerations

802 No IANA action is required. RFC Editor: please delete this section
803 before publication.

805 8. Acknowledgements

807 This document was initiated and merged from the drafts draft-kreeger-
808 nvo3-hypervisor-nve-cp, draft-gu-nvo3-tes-nve-mechanism, and draft-
809 kompella-nvo3-server2nve. Thanks to all the co-authors and
810 contributing members of those drafts.

812 The authors would like to specially thank Jon Hudson for his generous
813 help in improving the readability of this document.

815 9. References

817 9.1 Normative References

819 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
820           Requirement Levels", BCP 14, RFC 2119, March 1997.

822 9.2 Informative References

824 [RFC2236] Fenner, W., "Internet Group Management Protocol, Version
825           2", RFC 2236, November 1997.

826 [RFC7364] Narten, T., Gray, E., Black, D., Fang, L., Kreeger, L., and
827           M. Napierala, "Problem Statement: Overlays for Network
828           Virtualization", RFC 7364, October 2014.

829 [RFC7365] Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
830           Rekhter, "Framework for Data Center (DC) Network
831           Virtualization", RFC 7365, October 2014.

832 [I-D.ietf-nvo3-nve-nva-cp-req] Kreeger, L., Dutt, D., Narten, T., and
833           D. Black, "Network Virtualization NVE to NVA Control
834           Protocol Requirements", draft-ietf-nvo3-nve-nva-cp-req-01
835           (work in progress), October 2013.

837 [I-D.ietf-nvo3-arch] Black, D., Narten, T., et al., "An Architecture
838           for Overlay Networks (NVO3)", draft-ietf-nvo3-arch (work
839           in progress).

841 [I-D.ietf-opsawg-vmm-mib] Asai, H., MacFaden, M., Schoenwaelder, J.,
842           Shima, K., and T. Tsou, "Management Information Base for
843           Virtual Machines Controlled by a Hypervisor", draft-ietf-
844           opsawg-vmm-mib-00 (work in progress), February 2014.

846 [IEEE 802.1Qbg] IEEE, "Media Access Control (MAC) Bridges and Virtual
847           Bridged Local Area Networks - Amendment 21: Edge Virtual
848           Bridging", IEEE Std 802.1Qbg, 2012.

850 [8021Q] IEEE, "Media Access Control (MAC) Bridges and Virtual Bridged
851           Local Area Networks", IEEE Std 802.1Q-2011, August 2011.

853 Appendix A.
IEEE 802.1Qbg VDP Illustration (For information only)

855 VDP has the format shown in Figure A.1. A Virtual Station Interface
856 (VSI) is an interface to a virtual station that is attached to a
857 downlink port of an internal bridging function in a server. A VSI's
858 VDP packets are handled by an external bridge. VDP is the control
859 protocol running between the hypervisor and the external bridge.

861 +--------+--------+------+----+----+------+------+------+-----------+
862 |TLV type|TLV info|Status|VSI |VSI |VSIID |VSIID |Filter|Filter Info|
863 |  7b    |str len |      |Type|Type|Format|      |Info  |           |
864 |        |  9b    | 1oct |ID  |Ver |      |      |format|           |
865 |        |        |      |3oct|1oct| 1oct |16oct |1oct  |   M oct   |
866 +--------+--------+------+----+----+------+------+------+-----------+
867                          |                       |                 |
868                          |<--VSI type&instance-->|<----Filter----->|
869                          |<-----------VSI attributes-------------->|
870 |<--TLV header--->|<-------TLV info string = 23 + M octets-------->|

872                     Figure A.1: VDP TLV definitions

874 There are basically four TLV types.

876 1. Pre-Associate: Pre-Associate is used to pre-associate a VSI
877 instance with a bridge port. The bridge validates the request and
878 returns a failure Status in case of errors. Successful
879 pre-association does not imply that the indicated VSI Type or
880 provisioning will be applied to any traffic flowing through the VSI.
881 Pre-Associate enables a faster response to an Associate by allowing
882 the bridge to obtain the VSI Type prior to an association.

884 2. Pre-Associate with resource reservation: Pre-Associate with
885 Resource Reservation involves the same steps as Pre-Associate, but on
886 successful pre-association it also reserves resources in the Bridge
887 to prepare for a subsequent Associate request.

889 3. Associate: The Associate creates and activates an association
890 between a VSI instance and a bridge port. The Bridge allocates any
891 required bridge resources for the referenced VSI.
The Bridge
892 activates the configuration for the VSI Type ID. This association is
893 then applied to the traffic flow to/from the VSI instance.

895 4. De-Associate: The De-Associate is used to remove an association
896 between a VSI instance and a bridge port. Pre-Associated and
897 Associated VSIs can be de-associated. De-Associate releases any
898 resources that were reserved as a result of prior Associate or
899 Pre-Associate operations for that VSI instance.

901 De-Associate can be initiated by either side; the other message types
902 can only be initiated by the server side.

904 Some important flag values in the VDP Status field:

906 1. M-bit (Bit 5): Indicates that the user of the VSI (e.g., the VM)
907 is migrating (M-bit = 1) or provides no guidance on the migration of
908 the user of the VSI (M-bit = 0). The M-bit is used as an indicator
909 relative to the VSI that the user is migrating to.

911 2. S-bit (Bit 6): Indicates that the VSI user (e.g., the VM) is
912 suspended (S-bit = 1) or provides no guidance as to whether the user
913 of the VSI is suspended (S-bit = 0). A keep-alive Associate request
914 with S-bit = 1 can be sent when the VSI user is suspended. The S-bit
915 is used as an indicator relative to the VSI that the user is
916 migrating from.

917 The filter information format currently supports the following four
918 types.

920 1. VID Filter Info format
921    +---------+------+-------+--------+
922    |   #of   |  PS  |  PCP  |  VID   |
923    | entries |(1bit)|(3bits)|(12bits)|
924    |(2octets)|      |       |        |
925    +---------+------+-------+--------+
926              |<--Repeated per entry->|

928         Figure A.2: VID Filter Info format

930 2. MAC/VID filter format
931    +---------+--------------+------+-------+--------+
932    |   #of   | MAC address  |  PS  |  PCP  |  VID   |
933    | entries |  (6 octets)  |(1bit)|(3bits)|(12bits)|
934    |(2octets)|              |      |       |        |
935    +---------+--------------+------+-------+--------+
936              |<--------Repeated per entry--------->|

938         Figure A.3: MAC/VID filter format

940 3.
GroupID/VID filter format
941    +---------+--------------+------+-------+--------+
942    |   #of   |   GroupID    |  PS  |  PCP  |  VID   |
943    | entries |  (4 octets)  |(1bit)|(3bits)|(12bits)|
944    |(2octets)|              |      |       |        |
945    +---------+--------------+------+-------+--------+
946              |<--------Repeated per entry--------->|

948         Figure A.4: GroupID/VID filter format

950 4. GroupID/MAC/VID filter format
951    +---------+----------+-------------+------+-------+--------+
952    |   #of   | GroupID  | MAC address |  PS  |  PCP  |  VID   |
953    | entries |(4 octets)| (6 octets)  |(1bit)|(3bits)|(12bits)|
954    |(2octets)|          |             |      |       |        |
955    +---------+----------+-------------+------+-------+--------+
956              |<------------Repeated per entry------------->|

957         Figure A.5: GroupID/MAC/VID filter format

959 The null VID can be used in the VDP Request sent from the hypervisor
960 to the external bridge. Use of the null VID indicates that the set of
961 VID values associated with the VSI is expected to be supplied by the
962 Bridge. The Bridge can obtain VID values from the VSI Type whose
963 identity is specified by the VSI Type information in the VDP Request.
964 The set of VID values is returned to the station via the VDP
965 Response. The returned VID value can be a locally significant value.
966 When the GroupID is used, it is equivalent to the VN ID in NVO3. The
967 GroupID will be provided by the hypervisor to the bridge, and the
968 bridge will map the GroupID to a locally significant VLAN ID.

970 The VSIID in a VDP request that identifies a VM can be in one of the
971 following formats: IPv4 address, IPv6 address, MAC address, UUID, or
972 locally defined.
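As an illustration of the filter formats above, the following sketch packs a MAC/VID Filter Info field following the Figure A.3 layout: a 2-octet entry count followed, per entry, by a 6-octet MAC address and 2 octets holding PS (1 bit), PCP (3 bits), and VID (12 bits). The function name and the bit ordering (PS taken as the most significant bit of the 2-octet field) are assumptions for illustration, not normative:

```python
import struct

def pack_mac_vid_filter(entries):
    """Pack a MAC/VID Filter Info field (Figure A.3 layout).

    entries: list of (mac_string, ps, pcp, vid) tuples.
    Assumes PS occupies the most significant bit of the 2-octet
    PS/PCP/VID field, followed by PCP (3 bits) and VID (12 bits).
    """
    out = struct.pack("!H", len(entries))          # 2-octet entry count
    for mac, ps, pcp, vid in entries:
        mac_bytes = bytes(int(b, 16) for b in mac.split(":"))  # 6 octets
        ps_pcp_vid = (ps << 15) | (pcp << 12) | (vid & 0x0FFF)
        out += mac_bytes + struct.pack("!H", ps_pcp_vid)       # 2 octets
    return out
```

For example, a single entry for MAC 00:11:22:33:44:55 with PS 0, PCP 3, and VID 100 yields a 10-octet field: a 2-octet count of 1 plus one 8-octet entry.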
974 Authors' Addresses

976 Yizhou Li
977 Huawei Technologies
978 101 Software Avenue,
979 Nanjing 210012
980 China

982 Phone: +86-25-56625409
983 EMail: liyizhou@huawei.com

985 Lucy Yong
986 Huawei Technologies, USA

988 Email: lucy.yong@huawei.com

990 Lawrence Kreeger
991 Cisco

993 Email: kreeger@cisco.com

994 Thomas Narten
995 IBM

997 Email: narten@us.ibm.com

998 David Black
999 EMC

1001 Email: david.black@emc.com