NVO3 Working Group                                                 Y. Li
INTERNET-DRAFT                                               D. Eastlake
Intended Status: Informational                       Huawei Technologies
                                                              L. Kreeger
                                                             Arrcus, Inc
                                                               T. Narten
                                                                     IBM
                                                                D. Black
                                                                     EMC
Expires: April 30, 2018                                 October 27, 2017

                  Split-NVE Control Plane Requirements
                   draft-ietf-nvo3-hpvr2nve-cp-req-10

Abstract

In a Split-NVE architecture, the functions of the NVE (Network Virtualization Edge) are split across a server and external network equipment which is called an external NVE.
The server-resident control plane functionality resides in control software, which may be part of a hypervisor or container management software; for simplicity, this draft refers to the hypervisor as the location of this software.

Control plane protocol(s) between a hypervisor and its associated external NVE(s) are used by the hypervisor to distribute its virtual machine networking state to the external NVE(s) for further handling. This document illustrates the functionality required by this type of control plane signaling protocol and outlines the high level requirements. Virtual machine states as well as state transitioning are summarized to help clarify the needed protocol requirements.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Copyright and License Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
   1.1 Terminology
   1.2 Target Scenarios
2. VM Lifecycle
   2.1 VM Creation Event
   2.2 VM Live Migration Event
   2.3 VM Termination Event
   2.4 VM Pause, Suspension and Resumption Events
3. Hypervisor-to-NVE Control Plane Protocol Functionality
   3.1 VN Connect and Disconnect
   3.2 TSI Associate and Activate
   3.3 TSI Disassociate and Deactivate
4. Hypervisor-to-NVE Control Plane Protocol Requirements
5. VDP Applicability and Enhancement Needs
6. Security Considerations
7. IANA Considerations
8. Acknowledgements
9. References
   9.1 Normative References
   9.2 Informative References
Appendix A. IEEE 802.1Qbg VDP Illustration (For information only)
Authors' Addresses

1.
Introduction

In the Split-NVE architecture shown in Figure 1, the functionality of the NVE (Network Virtualization Edge) is split across an end device supporting virtualization and an external network device which is called an external NVE. The portion of the NVE functionality located on the end device is called the tNVE and the portion located on the external NVE is called the nNVE in this document. Overlay encapsulation/decapsulation functions are normally off-loaded to the nNVE on the external NVE.

The tNVE is normally implemented as part of a hypervisor or container and/or a virtual switch in a virtualized end device. This document uses the term "hypervisor" throughout when describing the Split-NVE scenario where part of the NVE functionality is off-loaded to a separate device from the "hypervisor" that contains a VM connected to a VN. In this context, the term "hypervisor" is meant to cover any device type where part of the NVE functionality is off-loaded in this fashion, e.g., a Network Service Appliance or Linux Container.

The problem statement [RFC7364] discusses the need for a control plane protocol (or protocols) to populate each NVE with the state needed to perform the required functions. In one scenario, an NVE provides overlay encapsulation/decapsulation packet forwarding services to Tenant Systems (TSs) that are co-resident within the NVE on the same End Device (e.g. when the NVE is embedded within a hypervisor or a Network Service Appliance). In such cases, there is no need for a standardized protocol between the hypervisor and NVE, as the interaction is implemented via software on a single device.
In the Split-NVE architecture scenarios shown in Figures 2 through 4, by contrast, control plane protocol(s) between a hypervisor and its associated external NVE(s) are required for the hypervisor to distribute the virtual machines' networking state to the NVE(s) for further handling. The protocol is an NVE-internal protocol and runs between the tNVE and nNVE logical entities. This protocol is mentioned in the NVO3 problem statement [RFC7364] and appears as the third work item.

Virtual machine states and state transitioning are summarized in this document to show events where the NVE needs to take specific actions. Such events correspond to actions that the control plane signaling protocol(s) between the hypervisor and the external NVE need to take, from which the high level requirements to be fulfilled are derived.

                +-- -- -- -- Split-NVE -- -- -- --+
                |                                 |
                |                                 |
+---------------|-----+                           |
| +------------- ----+|                           |
| | +--+   +---\|/--+||              +------ --------------+
| | |VM|---+        |||              |      \|/            |
| | +--+   |        |||              |+--------+           |
| | +--+   |  tNVE  |||----- - - - - - -----||  |          |
| | |VM|---+        |||              ||  nNVE  |           |
| | +--+   +--------+||              ||        |           |
| |                  ||              |+--------+           |
| +--Hypervisor-----+ |              +---------------------+
+---------------------+

     End Device                          External NVE

                  Figure 1 Split-NVE structure

This document uses VMs as an example of Tenant Systems (TSs) in order to describe the requirements, even though a VM is just one type of Tenant System that may connect to a VN. For example, a service instance within a Network Service Appliance is another type of TS, as are systems running on OS-level virtualization technologies like containers. The fact that VMs have lifecycles (e.g., can be created and destroyed, can be moved, and can be started or stopped) results in a general set of protocol requirements, most of which are applicable to other forms of TSs.
Note that not all of the requirements are applicable to all forms of TSs.

Section 2 describes VM states and state transitioning in the VM's lifecycle. Section 3 introduces Hypervisor-to-NVE control plane protocol functionality derived from VM operations and network events. Section 4 outlines the requirements of the control plane protocol to achieve the required functionality.

1.1 Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

This document uses the same terminology as found in [RFC7365] and [I-D.ietf-nvo3-nve-nva-cp-req]. This section defines additional terminology used by this document.

Split-NVE: a type of NVE whose functionality is split across an end device supporting virtualization and an external network device.

tNVE: the portion of the Split-NVE functionality located on the end device supporting virtualization. It interacts with the tenant system via an internal interface in the end device.

nNVE: the portion of the Split-NVE functionality located on the network device that directly or indirectly connects to the end device holding the corresponding tNVE. The nNVE normally performs encapsulation to and decapsulation from the overlay network.

External NVE: the physical network device holding the nNVE.

Hypervisor/Container: the logical collection of software, firmware and/or hardware that allows the creation and running of server or service appliance virtualization. The tNVE is located in the Hypervisor/Container. The term is used loosely in this document to refer to the end device supporting the virtualization. For simplicity, this document also uses the term Hypervisor to represent both hypervisor and container.
VN Profile: metadata associated with a VN that is applied to any attachment point to the VN. That is, VAP properties that are applied to all VAPs associated with a given VN and used by an NVE when ingressing/egressing packets to/from a specific VN. Metadata could include such information as ACLs, QoS settings, etc. The VN Profile contains parameters that apply to the VN as a whole. Control protocols between the NVE and NVA could use the VN ID or VN Name to obtain the VN Profile.

VSI: Virtual Station Interface [IEEE 802.1Qbg].

VDP: VSI Discovery and Configuration Protocol [IEEE 802.1Qbg].

1.2 Target Scenarios

In the Split-NVE architecture, an external NVE can provide an offload of the encapsulation/decapsulation functions and network policy enforcement as well as the VN Overlay protocol overhead. This offloading may provide performance improvements and/or resource savings to the End Device (e.g. hypervisor) making use of the external NVE.

The following figures give example scenarios of a Split-NVE architecture.
      Hypervisor              Access Switch
 +------------------+         +-----+-------+
 | +--+    +-------+|         |     |       |
 | |VM|---|        || VLAN    |     |       |
 | +--+   | tNVE   |---------+ nNVE |       +--- Underlying
 | +--+   |        || Trunk   |     |       |    Network
 | |VM|---|        ||         |     |       |
 | +--+    +-------+|         |     |       |
 +------------------+         +-----+-------+
       Figure 2 Hypervisor with an External NVE

      Hypervisor            L2 Switch
 +---------------+         +-----+         +----+---+
 | +--+   +----+ |         |     |         |    |   |
 | |VM|---|    | | VLAN    |     | VLAN    |    |   |
 | +--+   |tNVE|-----------+     +---------+nNVE|   +--- Underlying
 | +--+   |    | | Trunk   |     | Trunk   |    |   |    Network
 | |VM|---|    | |         |     |         |    |   |
 | +--+   +----+ |         |     |         |    |   |
 +---------------+         +-----+         +----+---+
        Figure 3 Hypervisor with an External NVE
            across an Ethernet Access Switch

   Network Service Appliance        Access Switch
 +--------------------------+      +-----+-------+
 | +------------+      |\   |      |     |       |
 | |Net Service |------| \  |      |     |       |
 | |Instance    |      |  \ | VLAN |     |       |
 | +------------+      |tNVE|------+nNVE |       +--- Underlying
 | +------------+      |    | Trunk|     |       |    Network
 | |Net Service |------|  / |      |     |       |
 | |Instance    |      | /  |      |     |       |
 | +------------+      |/   |      |     |       |
 +--------------------------+      +-----+-------+
  Figure 4 Physical Network Service Appliance with an External NVE

Tenant Systems connect to external NVEs via a Tenant System Interface (TSI). The TSI logically connects to the external NVE via a Virtual Access Point (VAP) [I-D.ietf-nvo3-arch]. The external NVE may provide Layer 2 or Layer 3 forwarding. In the Split-NVE architecture, the external NVE may be able to reach multiple MAC and IP addresses via a TSI. For example, Tenant Systems that are providing network services (such as transparent firewall, load balancer, or VPN gateway) are likely to have a complex address hierarchy. This implies that if a given TSI disassociates from one VN, all the MAC and/or IP addresses are also disassociated.
There is no need to signal the deletion of every MAC or IP address when the TSI is brought down or deleted. In the majority of cases, a VM will be acting as a simple host that will have a single TSI and a single MAC and IP address visible to the external NVE.

Figures 2 through 4 show the use of VLANs to separate traffic for multiple VNs between the tNVE and nNVE; VLANs are not strictly necessary if only one VN is involved, but multiple VNs are expected in most cases, and hence this draft assumes their presence.

2. VM Lifecycle

Figure 2 of [I-D.ietf-opsawg-vmm-mib] shows the state transitions of a VM. Some of the VM states are of interest to the external NVE. This section illustrates the relevant phases and events in the VM lifecycle. Note that the following subsections do not give an exhaustive traversal of VM lifecycle states. They are intended as illustrative examples relevant to the Split-NVE architecture, not as prescriptive text; the goal is to capture sufficient detail to set a context for the signaling protocol functionality and requirements described in the following sections.

2.1 VM Creation Event

The VM creation event causes the VM state to transition from Preparing to Shutdown and then to Running [I-D.ietf-opsawg-vmm-mib]. The end device allocates and initializes local virtual resources like storage in the VM Preparing state. In the Shutdown state, the VM has everything ready except that CPU execution is not scheduled by the hypervisor and the VM's memory is not resident in the hypervisor. Transitioning from the Shutdown state to the Running state normally requires human action or a system-triggered event. The Running state indicates that the VM is in the normal execution state. As part of transitioning the VM to the Running state, the hypervisor must also provision network connectivity for the VM's TSI(s) so that Ethernet frames can be sent and received correctly.
No ongoing migration, suspension, or shutdown is in process.

In the VM creation phase, the VM's TSI has to be associated with the external NVE. Association here indicates that the hypervisor and the external NVE have signaled each other and reached some agreement, and that the relevant networking parameters or information have been provisioned properly. The external NVE SHOULD be informed of the VM's TSI MAC address and/or IP address. In addition to external network connectivity, the hypervisor may provide local network connectivity between the VM's TSI and the TSIs of other VMs that are co-resident on the same hypervisor. When the intra- or inter-hypervisor connectivity is extended to the external NVE, a locally significant tag, e.g. a VLAN ID, SHOULD be used between the hypervisor and the external NVE to differentiate each VN's traffic. Both the hypervisor and external NVE sides must agree on that tag value for traffic identification, isolation, and forwarding.

The external NVE may need to do some preparation before it signals successful association with the TSI. Such preparation may include locally saving the states and binding information of the tenant system interface and its VN, communicating with the NVA for network provisioning, etc.

Tenant System interface association SHOULD be performed before the VM enters the Running state, preferably in the Shutdown state. If association with the external NVE fails, the VM SHOULD NOT go into the Running state.

2.2 VM Live Migration Event

Live migration is sometimes referred to as "hot" migration in that, from an external viewpoint, the VM appears to continue to run while being migrated to another server (e.g., TCP connections generally survive this class of migration). In contrast, "cold" migration consists of shutting down VM execution on one server and restarting it on another.
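The VM states and events described in Section 2 (creation, migration, termination, pause, and suspension) can be summarized as a small table-driven state machine. The following Python sketch is purely illustrative; the state names come from [I-D.ietf-opsawg-vmm-mib] as used in the prose, but the event names and the transition table are an informal reading of this section, not a normative definition:

```python
# Illustrative sketch of the VM lifecycle states referenced in this
# section: Preparing, Shutdown, Running, Migrating, Paused, Suspended.
# The transition table is a hypothetical summary of the prose.
ALLOWED = {
    ("Preparing", "created"):    "Shutdown",
    ("Shutdown",  "started"):    "Running",    # requires TSI association first
    ("Running",   "migrate"):    "Migrating",  # source hypervisor side
    ("Migrating", "done"):       "Shutdown",   # source side after migration
    ("Shutdown",  "migrate_in"): "Migrating",  # destination hypervisor side
    ("Migrating", "resumed"):    "Running",    # destination side
    ("Running",   "pause"):      "Paused",
    ("Paused",    "unpause"):    "Running",
    ("Running",   "suspend"):    "Suspended",
    ("Suspended", "resume"):     "Running",
    ("Running",   "power_off"):  "Shutdown",
}

def next_state(state: str, event: str) -> str:
    """Return the next VM state, or raise on a transition the prose forbids."""
    try:
        return ALLOWED[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event} in state {state}")

# VM creation: Preparing -> Shutdown -> Running.
s = next_state("Preparing", "created")
s = next_state(s, "started")
assert s == "Running"
```

Note that the table encodes the rule above that a VM cannot, for example, move directly from Preparing to Running; TSI association is expected to happen while the VM is still in the Shutdown state.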
For simplicity, the following abstract summary of live migration assumes shared storage, so that the VM's storage is accessible to the source and destination servers. Assume that a VM live migrates from hypervisor 1 to hypervisor 2. Such a migration event involves state transitions on both hypervisors, source hypervisor 1 and destination hypervisor 2. The VM state on source hypervisor 1 transitions from Running to Migrating and then to Shutdown [I-D.ietf-opsawg-vmm-mib]. The VM state on destination hypervisor 2 transitions from Shutdown to Migrating and then Running.

The external NVE connected to destination hypervisor 2 has to associate the migrating VM's TSI with it by discovering the TSI's MAC and/or IP addresses, its VN, the locally significant VLAN ID if any, and provisioning other network related parameters of the TSI. The external NVE may be informed about the VM's peer VMs, storage devices and other network appliances with which the VM needs to communicate or is communicating. The migrated VM on destination hypervisor 2 SHOULD NOT go to the Running state before all the network provisioning and binding has been done.

The migrating VM SHOULD NOT be in the Running state at the same time on the source hypervisor and destination hypervisor during migration. The VM on the source hypervisor does not transition into the Shutdown state until the VM successfully enters the Running state on the destination hypervisor. It is possible that the VM on the source hypervisor stays in the Migrating state for a while after the VM on the destination hypervisor is in the Running state.

2.3 VM Termination Event

A VM termination event is also referred to as "powering off" a VM. A VM termination event leads to its state going to Shutdown. There are two possible causes of VM termination [I-D.ietf-opsawg-vmm-mib].
One is the normal "power off" of a running VM; the other occurs when the VM has been migrated to another hypervisor and the VM image on the source hypervisor has to stop executing and be shut down.

Upon VM termination, the external NVE connected to that VM needs to deprovision the VM, i.e., delete the network parameters associated with that VM. In other words, the external NVE has to disassociate the VM's TSI.

2.4 VM Pause, Suspension and Resumption Events

The VM pause event leads to the VM transitioning from the Running state to the Paused state. The Paused state indicates that the VM is resident in memory but is no longer scheduled to execute by the hypervisor [I-D.ietf-opsawg-vmm-mib]. The VM can be easily re-activated from the Paused state to the Running state.

The VM suspension event leads to the VM transitioning from the Running state to the Suspended state. The VM resumption event leads to the VM transitioning from the Suspended state to the Running state. In the Suspended state, the memory and CPU execution state of the virtual machine are saved to persistent store. During this state, the virtual machine is not scheduled to execute by the hypervisor [I-D.ietf-opsawg-vmm-mib].

In the Split-NVE architecture, the external NVE should keep any paused or suspended VM in association, as the VM can return to the Running state at any time.

3. Hypervisor-to-NVE Control Plane Protocol Functionality

The following subsections show illustrative examples of the state transitions on an external NVE which are relevant to Hypervisor-to-NVE signaling protocol functionality. It should be noted that they are not prescriptive text for full state machines.

3.1 VN Connect and Disconnect

In the Split-NVE scenario, a protocol is needed between the End Device (e.g.
Hypervisor) making use of the external NVE and the external NVE in order to make the external NVE aware of the changing VN membership requirements of the Tenant Systems within the End Device.

A key driver for using a protocol rather than static configuration of the external NVE is that the VN connectivity requirements can change frequently as VMs are brought up, moved, and brought down on various hypervisors throughout the data center or external cloud.

 +---------------+       Recv VN_connect;     +-------------------+
 |VN_Disconnected|    return Local_Tag value  |VN_Connected       |
 +---------------+    for VN if successful;   +-------------------+
 |VN_ID;         |--------------------------->|VN_ID;             |
 |VN_State=      |                            |VN_State=connected;|
 |disconnected;  |                            |Num_TSI_Associated;|
 |               |<----Recv VN_disconnect-----|Local_Tag;         |
 +---------------+                            |VN_Context;        |
                                              +-------------------+

        Figure 5. State Transition Example of a VAP Instance
                        on an External NVE

Figure 5 shows the state transition for a VAP on the external NVE. An NVE that supports the hypervisor-to-NVE control plane protocol should support one instance of the state machine for each active VN. The state transition on the external NVE is normally triggered by events and behaviors on the hypervisor-facing side. Some of the interleaved interactions between the NVE and the NVA are illustrated for a better understanding of the whole procedure, while others may not be shown. More detailed information regarding this is available in [I-D.ietf-nvo3-nve-nva-cp-req].

The external NVE MUST be notified when an End Device requires connection to a particular VN and when it no longer requires connection. In addition, the external NVE must provide a local tag value for each connected VN to the End Device to use for exchange of packets between the End Device and the external NVE (e.g. a locally significant 802.1Q tag value).
How "local" the significance is depends on whether the Hypervisor has a direct physical connection to the external NVE (in which case the significance is local to the physical link), or whether there is an Ethernet switch (e.g. a blade switch) connecting the Hypervisor to the NVE (in which case the significance is local to the intervening switch and all the links connected to it).

These VLAN tags are used to differentiate between different VNs as packets cross the shared access network to the external NVE. When the external NVE receives packets, it uses the VLAN tag to identify the VN of packets coming from a given TSI, strips the tag, adds the appropriate overlay encapsulation for that VN, and sends the packets towards the corresponding remote NVE across the underlying IP network.

The identification of the VN in this protocol could be either through a VN Name or a VN ID. A globally unique VN Name facilitates portability of a Tenant's Virtual Data Center. Once an external NVE receives a VN connect indication, the NVE needs a way to get a VN Context allocated (or receive the already allocated VN Context) for a given VN Name or ID (as well as any other information needed to transmit encapsulated packets). How this is done is the subject of the NVE-to-NVA protocol, which is part of work items 1 and 2 in [RFC7364].

The VN_connect message can be explicit or implicit. Explicit means the hypervisor sends a message explicitly requesting connection to a VN. Implicit means the external NVE receives other messages, e.g. the very first TSI associate message for a given VN (see the next subsection), that implicitly indicate its interest in connecting to the VN.

A VN_disconnect message indicates that the NVE can release all the resources for that disconnected VN and transition to the VN_Disconnected state.
The local tag assigned for that VN can possibly be reclaimed by another VN.

3.2 TSI Associate and Activate

Typically, a TSI is assigned a single MAC address, and all frames transmitted and received on that TSI use that single MAC address. As mentioned earlier, it is also possible for a Tenant System to exchange frames using multiple MAC addresses or packets with multiple IP addresses.

Particularly in the case of a TS that is forwarding frames or packets from other TSs, the external NVE will need to communicate to the NVA the mapping between the NVE's IP address (on the underlying network) and ALL the addresses the TS is forwarding on behalf of for the corresponding VN.

The NVE has two ways in which it can discover the tenant addresses for which frames are to be forwarded to a given End Device (and ultimately to the TS within that End Device).

1. It can glean the addresses by inspecting the source addresses in packets it receives from the End Device.

2. The hypervisor can explicitly signal the address associations of a TSI to the external NVE. The address associations include all the MAC and/or IP addresses possibly used as source addresses in a packet sent from the hypervisor to the external NVE. The external NVE may further use this information to filter future traffic from the hypervisor.

To perform the second approach above, the "hypervisor-to-NVE" protocol requires a means to allow End Devices to communicate new tenant address associations for a given TSI within a given VN.

Figure 6 shows an example of a state transition for a TSI connecting to a VAP on the external NVE. An NVE that supports the hypervisor-to-NVE control plane protocol may support one instance of the state machine for each TSI connecting to a given VN.
      disassociate;          +--------+          disassociate
    +----------------------->|  Init  |<----------------------+
    |                        +--------+                       |
    |                          |    |                         |
    |                          |    |                         |
    |                          +----+                         |
    |                          |    |                         |
    |              associate   |    |   activate              |
    |         +----------------+    +----------------+        |
    |         |                                      |        |
    |         |                                      |        |
    |        \|/                                    \|/       |
  +--------------------+                  +---------------------+
  |     Associated     |                  |      Activated      |
  +--------------------+                  +---------------------+
  |TSI_ID;             |                  |TSI_ID;              |
  |Port;               |-----activate---->|Port;                |
  |VN_ID;              |                  |VN_ID;               |
  |State=associated;   |                  |State=activated;     |-+
+-|Num_Of_Addr;        |<---deactivate;---|Num_Of_Addr;         | |
| |List_Of_Addr;       |                  |List_Of_Addr;        | |
| +--------------------+                  +---------------------+ |
|          /|\                                      /|\           |
|           |                                        |            |
+-----------+                                        +------------+
  add/remove/updt addr;                     add/remove/updt addr;
  or update port;                           or update port;

             Figure 6 State Transition Example of a TSI Instance
                            on an External NVE

The Associated state of a TSI instance on an external NVE indicates that all the addresses for that TSI have already been associated with the VAP of the external NVE on port p for a given VN, but no real traffic to and from the TSI is expected or allowed to pass through. The NVE has reserved all the necessary resources for that TSI. An external NVE may report the mappings of its underlay IP address and the associated TSI addresses to the NVA, and relevant network nodes may save such information to their mapping tables but not their forwarding tables. An NVE may create ACL or filter rules based on the associated TSI addresses on the attached port p but not enable them yet. The local tag for the VN corresponding to the TSI instance should be provisioned on port p to receive packets.

A VM migration event (discussed in Section 2) may cause the hypervisor to send an associate message to the NVE connected to the destination hypervisor the VM migrates to. A VM creation event may also lead to the same practice.
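The TSI instance behavior of Figure 6 can be sketched as a small class. This is an illustrative, non-normative sketch; the class and method names are hypothetical, and only the states (Init, Associated, Activated) and the triggering messages come from the text:

```python
# Illustrative sketch of a TSI instance on an external NVE, following
# the states of Figure 6. Resource reservation, NVA reporting, and
# ACL handling are represented only by comments.
class TSIInstance:
    def __init__(self, tsi_id: str, vn_id: str, port: str):
        self.tsi_id, self.vn_id, self.port = tsi_id, vn_id, port
        self.state = "Init"
        self.addresses = set()   # List_Of_Addr in Figure 6

    def associate(self, addrs):
        # Reserve resources, report mappings to the NVA mapping table,
        # and create (but do not enable) ACL/filter rules.
        self.addresses.update(addrs)
        self.state = "Associated"

    def activate(self, addrs=()):
        # Activate may come from Init or Associated; mappings move to
        # the forwarding table and filter rules are enabled.
        self.addresses.update(addrs)
        self.state = "Activated"

    def deactivate(self):
        # Keep resources reserved, but stop passing traffic.
        if self.state == "Activated":
            self.state = "Associated"

    def disassociate(self):
        # Release all resources and return to Init from either state.
        self.addresses.clear()
        self.state = "Init"

tsi = TSIInstance("tsi-1", "vn-blue", port="p1")
tsi.associate({"00:11:22:33:44:55"})
tsi.activate()
assert tsi.state == "Activated"
```

As in Figure 6, address or port updates would modify `addresses` or `port` without changing `state`.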
The Activated state of a TSI instance on an external NVE indicates that all the addresses for that TSI are functioning correctly on port p and traffic can be received from and sent to that TSI via the NVE. The mappings of the NVE's underlay IP address and the associated TSI addresses should be put into the forwarding table rather than the mapping table on relevant network nodes. ACL or filter rules based on the associated TSI addresses on the attached port p in the NVE are enabled. The local tag for the VN corresponding to the TSI instance MUST be provisioned on port p to receive packets.

The Activate message makes the state transition from Init or Associated to Activated. The VM creation, VM migration, and VM resumption events discussed in Section 2 may trigger the Activate message to be sent from the hypervisor to the external NVE.

TSI information may get updated in either the Associated or Activated state. The following are considered updates to the TSI information: adding or removing the associated addresses, updating the current associated addresses (for example, updating the IP address for a given MAC address), and updating NVE port information based on where the NVE receives messages. Such updates do not change the state of the TSI. When any address associated with a given TSI changes, the NVE should inform the NVA to update the mapping information between the NVE's underlying address and the associated TSI addresses. The NVE should also change its local ACL or filter settings accordingly for the relevant addresses. A port information update will cause the local tag for the VN corresponding to the TSI instance to be provisioned on the new port and removed from the old port.

3.3 TSI Disassociate and Deactivate

Disassociate and deactivate are conceptually the reverse behaviors of associate and activate.
   When transitioning from the Activated state to the Associated
   state, the external NVE needs to make sure that the resources
   remain reserved but that the addresses associated with the TSI are
   no longer functioning, and that no traffic to or from the TSI is
   expected or allowed to pass through. For example, the NVE needs to
   inform the NVA to remove the relevant address mapping information
   from the forwarding or routing table. ACL or filtering rules
   regarding the relevant addresses should be disabled. When
   transitioning from the Associated or Activated state to the Init
   state, the NVE releases all the resources relevant to the TSI
   instance. The NVE should also inform the NVA to remove the
   relevant entries from the mapping table. ACL or filtering rules
   regarding the relevant addresses should be removed. Local tag
   provisioning on the connecting port on the NVE SHOULD be cleared.

   A VM suspension event (discussed in Section 2) may cause the
   relevant TSI instance(s) on the NVE to transition from the
   Activated to the Associated state. A VM pause event normally does
   not affect the state of the relevant TSI instance(s) on the NVE,
   as the VM is expected to run again soon. A VM shutdown event will
   normally cause the relevant TSI instance(s) on the NVE to
   transition from the Activated state to the Init state. All
   resources should be released.

   A VM migration will lead the TSI instance on the source NVE to
   leave the Activated state. When a VM migrates to another
   hypervisor connected to the same NVE, i.e., when the source and
   destination NVEs are the same, the NVE should use the TSI_ID and
   the incoming port to differentiate the two TSI instances.

   Although the triggering messages for the state transitions shown
   in Figure 6 do not distinguish between a VM creation/shutdown
   event and a VM migration arrival/departure event, the external NVE
   can make optimizations if it is notified of such information.
   For example, if the NVE knows that an incoming Activate message is
   caused by migration rather than by VM creation, some mechanisms
   may be employed or triggered to make sure the dynamic
   configurations or provisioning on the destination NVE are the same
   as those on the source NVE for the migrated VM. For example, an
   IGMP query [RFC2236] can be triggered by the destination external
   NVE towards the migrated VM on the destination hypervisor so that
   the VM is forced to answer with an IGMP report to the multicast
   router. A multicast router can then correctly send the multicast
   traffic to the new external NVE for those multicast groups the VM
   had joined before the migration.

4. Hypervisor-to-NVE Control Plane Protocol Requirements

   Req-1: The protocol MUST support a bridged network connecting End
   Devices to the External NVE.

   Req-2: The protocol MUST support multiple End Devices sharing the
   same External NVE via the same physical port across a bridged
   network.

   Req-3: The protocol MAY support an End Device using multiple
   external NVEs simultaneously, but only one external NVE for each
   VN.

   Req-4: The protocol MAY support an End Device using multiple
   external NVEs simultaneously for the same VN.

   Req-5: The protocol MUST allow an End Device to initiate a request
   to its associated External NVE to be connected to or disconnected
   from a given VN.

   Req-6: The protocol MUST allow an External NVE to initiate a
   request to its connected End Devices to be disconnected from a
   given VN.

   Req-7: When a TS attaches to a VN, the protocol MUST allow an End
   Device and its external NVE to negotiate one or more locally
   significant tag(s) for carrying traffic associated with a specific
   VN (e.g., 802.1Q tags).

   Req-8: The protocol MUST allow an End Device to initiate a request
   to associate/disassociate and/or activate/deactivate address(es)
   of a TSI instance to a VN on an NVE port.
   Req-9: The protocol MUST allow the External NVE to initiate a
   request to disassociate and/or deactivate address(es) of a TSI
   instance from a VN on an NVE port.

   Req-10: The protocol MUST allow an End Device to initiate a
   request to add, remove, or update address(es) associated with a
   TSI instance on the external NVE. Addresses can be expressed in
   different formats, for example, as a MAC address, an IP address,
   or a pair of IP and MAC addresses.

   Req-11: The protocol MUST allow the External NVE to authenticate
   the connected End Device.

   Req-12: The protocol MUST be able to run over L2 links between the
   End Device and its External NVE.

   Req-13: The protocol SHOULD support the End Device indicating
   whether an associate or activate request from it results from a VM
   hot migration event.

5. VDP Applicability and Enhancement Needs

   The Virtual Station Interface (VSI) Discovery and Configuration
   Protocol (VDP) [IEEE 802.1Qbg] can be the control plane protocol
   running between the hypervisor and the external NVE. Appendix A
   illustrates VDP for the reader's information.

   VDP facilitates the automatic discovery and configuration of Edge
   Virtual Bridging (EVB) stations and EVB bridges. An EVB station is
   normally an end station running multiple VMs; it is conceptually
   equivalent to the hypervisor in this document. An EVB bridge is
   conceptually equivalent to the external NVE.

   VDP is able to pre-associate/associate/de-associate a VSI on an
   EVB station with a port on the EVB bridge. A VSI is approximately
   equivalent, in the context of this document, to the virtual port
   by which a VM connects to the hypervisor. The EVB station and the
   EVB bridge can reach agreement on the VLAN ID(s) assigned to a VSI
   via VDP message exchange. Other configuration parameters can be
   exchanged via VDP as well. VDP is carried over the Edge Control
   Protocol (ECP) [IEEE8021Qbg], which provides reliable transport
   over a layer 2 network.
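   ECP itself is specified in IEEE 802.1Qbg; as a rough, non-
   normative sketch of the kind of reliability it provides, the
   sender below tags each VDP payload with a sequence number and
   retransmits until that number is acknowledged. All names are
   illustrative assumptions, not the actual ECP state machine.

```python
class EcpSender:
    """Stop-and-wait reliability sketch for an ECP-like transport."""

    def __init__(self, link_send):
        self.link_send = link_send  # callable putting a frame on L2
        self.seq = 0
        self.unacked = None         # at most one outstanding frame

    def send(self, vdp_payload):
        # One frame in flight at a time, as in stop-and-wait.
        assert self.unacked is None, "previous frame not acknowledged"
        frame = (self.seq, vdp_payload)
        self.unacked = frame
        self.link_send(frame)

    def on_ack(self, acked_seq):
        # Matching ack releases the frame and advances the sequence.
        if self.unacked and self.unacked[0] == acked_seq:
            self.unacked = None
            self.seq = (self.seq + 1) % 256

    def on_timeout(self):
        # No ack within the timer: retransmit the same frame.
        if self.unacked:
            self.link_send(self.unacked)
```

   The design point this illustrates is that VDP itself carries no
   retransmission logic; it relies on the layer below it for
   delivery, which is why Req-12 only asks that the protocol run over
   L2 links.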
   VDP needs some extensions to fulfill the requirements listed in
   this document. Table 1 shows the needed extensions and/or
   clarifications in the NVO3 context.

   +------+-----------+-----------------------------------------------+
   | Req  |    VDP    |                    Remarks                    |
   |      | supported?|                                               |
   +------+-----------+-----------------------------------------------+
   | Req-1|           |                                               |
   +------+           |Needs extension. Must be able to send to a     |
   | Req-2|           |specific unicast MAC and should be able to send|
   +------+ Partially |to a non-reserved well-known multicast address |
   | Req-3|           |other than the nearest customer bridge address |
   +------+           |                                               |
   | Req-4|           |                                               |
   +------+-----------+-----------------------------------------------+
   | Req-5|    Yes    |VN is indicated by GroupID                     |
   +------+-----------+-----------------------------------------------+
   | Req-6|    Yes    |Bridge sends De-Associate                      |
   +------+-----------+-----------------------------------------------+
   |      |           |VID == NULL in the request and the bridge      |
   | Req-7|    Yes    |returns the assigned value in the response, or |
   |      |           |specify GroupID in the request and get the VID |
   |      |           |assigned in the returned response. Multiple    |
   |      |           |VLANs per group are allowed                    |
   +------+-----------+------------------------+----------------------+
   |      |           |      requirements      |   VDP equivalence    |
   |      |           +------------------------+----------------------+
   |      |           | associate/disassociate |pre-asso/de-associate |
   | Req-8| Partially |  activate/deactivate   |associate/de-associate|
   |      |           +------------------------+----------------------+
   |      |           |Needs extension to allow associate->pre-assoc  |
   +------+-----------+-----------------------------------------------+
   | Req-9|    Yes    |VDP bridge initiates De-Associate              |
   +------+-----------+-----------------------------------------------+
   |Req-10| Partially |Needs extension for IPv4/IPv6 addresses. Add a |
   |      |           |new "filter info format" type                  |
   +------+-----------+-----------------------------------------------+
   |Req-11|    No     |An out-of-band mechanism is preferred, e.g.,   |
   |      |           |MACsec or 802.1X                               |
   +------+-----------+-----------------------------------------------+
   |Req-12|    Yes    |Naturally an L2 protocol                       |
   +------+-----------+-----------------------------------------------+
   |      |           |M bit for a migrated VM on the destination     |
   |      |           |hypervisor and S bit for the VM on the source  |
   |Req-13| Partially |hypervisor. When M/S is 0, "no guidance" is    |
   |      |           |indistinguishable from events not caused by    |
   |      |           |migration, where the NVE may act differently.  |
   |      |           |Needs new bits for migration indication in a   |
   |      |           |new "filter info format" type                  |
   +------+-----------+-----------------------------------------------+

              Table 1  Comparing VDP with the requirements

   With the simple addition of the ability to carry layer 3
   addresses, VDP can serve the Hypervisor-to-NVE control plane
   functions well. The other extensions improve the protocol
   capabilities for a better fit in an NVO3 network.

6. Security Considerations

   NVEs must ensure that only properly authorized Tenant Systems are
   allowed to join and become a part of any specific Virtual Network.
   In addition, NVEs will need appropriate mechanisms to ensure that
   any hypervisor wishing to use the services of an NVE is properly
   authorized to do so. One design point is whether the hypervisor
   should supply the NVE with necessary information (e.g., VM
   addresses, VN information, or other parameters) that the NVE uses
   directly, or whether the hypervisor should only supply a VN ID and
   an identifier for the associated VM (e.g., its MAC address), with
   the NVE using that information to obtain the information needed to
   validate the hypervisor-provided parameters or to obtain related
   parameters in a secure manner.

7.
IANA Considerations

   No IANA action is required. RFC Editor: please delete this section
   before publication.

8. Acknowledgements

   This document was initiated based on the merger of the drafts
   draft-kreeger-nvo3-hypervisor-nve-cp, draft-gu-nvo3-tes-nve-
   mechanism, and draft-kompella-nvo3-server2nve. Thanks to all the
   co-authors and contributing members of those drafts.

   The authors would like to specially thank Lucy Yong and Jon Hudson
   for their generous help in improving this document.

9. References

9.1 Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2 Informative References

   [RFC2236] Fenner, W., "Internet Group Management Protocol, Version
             2", RFC 2236, November 1997.

   [RFC7364] Narten, T., Gray, E., Black, D., Fang, L., Kreeger, L.,
             and M. Napierala, "Problem Statement: Overlays for
             Network Virtualization", RFC 7364, October 2014.

   [RFC7365] Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
             Rekhter, "Framework for DC Network Virtualization",
             RFC 7365, October 2014.

   [I-D.ietf-nvo3-nve-nva-cp-req] Kreeger, L., Dutt, D., Narten, T.,
             and D. Black, "Network Virtualization NVE to NVA Control
             Protocol Requirements", draft-ietf-nvo3-nve-nva-cp-req-01
             (work in progress), October 2013.

   [I-D.ietf-nvo3-arch] Black, D., Narten, T., et al., "An
             Architecture for Overlay Networks (NVO3)",
             draft-narten-nvo3-arch (work in progress).

   [I-D.ietf-opsawg-vmm-mib] Asai, H., MacFaden, M., Schoenwaelder,
             J., Shima, K., and T. Tsou, "Management Information Base
             for Virtual Machines Controlled by a Hypervisor",
             draft-ietf-opsawg-vmm-mib-00 (work in progress),
             February 2014.

   [IEEE 802.1Qbg] IEEE, "Media Access Control (MAC) Bridges and
             Virtual Bridged Local Area Networks - Amendment 21: Edge
             Virtual Bridging", IEEE Std 802.1Qbg, 2012.

   [8021Q]   IEEE, "Media Access Control (MAC) Bridges and Virtual
             Bridged Local Area Networks", IEEE Std 802.1Q-2011,
             August 2011.

Appendix A.
IEEE 802.1Qbg VDP Illustration (For information only)

   VDP (VSI Discovery and Configuration Protocol) messages are
   formatted as a TLV, as shown in Figure A.1. A Virtual Station
   Interface (VSI) is an interface to a virtual station that is
   attached to a downlink port of an internal bridging function in a
   server. A VSI's VDP packets are handled by an external bridge. VDP
   is the controlling protocol running between the hypervisor and the
   external bridge.

   +--------+--------+------+----+----+------+------+------+-----------+
   |TLV type|TLV info|Status|VSI |VSI |VSIID | VSIID|Filter|Filter Info|
   |   7b   |str len |      |Type|Type|Format|      | Info |           |
   |        |   9b   | 1oct |ID  |Ver |      |      |format|           |
   |        |        |      |3oct|1oct| 1oct |16oct |1oct  |   M oct   |
   +--------+--------+------+----+----+------+------+------+-----------+
                     |      |                       |                  |
                     |      |<--VSI type&instance-->|<----Filter------>|
                     |      |<------------VSI attributes-------------->|
   |<--TLV header--->|<-------TLV info string = 23 + M octets--------->|

                       Figure A.1: VDP TLV definitions

   There are basically four TLV types.

   1. Pre-Associate: Pre-Associate is used to pre-associate a VSI
   instance with a bridge port. The bridge validates the request and
   returns a failure Status in case of errors. Successful
   pre-association does not imply that the indicated VSI Type or
   provisioning will be applied to any traffic flowing through the
   VSI. Pre-Associate enables a faster response to an Associate by
   allowing the bridge to obtain the VSI Type prior to an
   association.

   2. Pre-Associate with resource reservation: Pre-Associate with
   Resource Reservation involves the same steps as Pre-Associate, but
   on successful pre-association it also reserves resources in the
   bridge to prepare for a subsequent Associate request.

   3. Associate: Associate creates and activates an association
   between a VSI instance and a bridge port.
   The bridge allocates any required bridge resources for the
   referenced VSI. The bridge activates the configuration for the VSI
   Type ID. This association is then applied to the traffic flow
   to/from the VSI instance.

   4. De-Associate: De-Associate is used to remove an association
   between a VSI instance and a bridge port. Pre-Associated and
   Associated VSIs can be de-associated. De-Associate releases any
   resources that were reserved as a result of prior Associate or
   Pre-Associate operations for that VSI instance.

   De-Associate can be initiated by either side; the other message
   types can only be initiated by the server side.

   Some important flag values in the VDP Status field are:

   1. M-bit (Bit 5): Indicates that the user of the VSI (e.g., the
   VM) is migrating (M-bit = 1) or provides no guidance on the
   migration of the user of the VSI (M-bit = 0). The M-bit is used as
   an indicator relative to the VSI that the user is migrating to.

   2. S-bit (Bit 6): Indicates that the VSI user (e.g., the VM) is
   suspended (S-bit = 1) or provides no guidance as to whether the
   user of the VSI is suspended (S-bit = 0). A keep-alive Associate
   request with S-bit = 1 can be sent when the VSI user is suspended.
   The S-bit is used as an indicator relative to the VSI that the
   user is migrating from.

   The filter information format currently supports the following
   four types.

   1. VID Filter Info format
      +---------+------+-------+--------+
      |   #of   |  PS  |  PCP  |  VID   |
      | entries |(1bit)|(3bits)|(12bits)|
      |(2octets)|      |       |        |
      +---------+------+-------+--------+
                |<--Repeated per entry->|

          Figure A.2 VID Filter Info format

   2.
MAC/VID filter format
      +---------+--------------+------+-------+--------+
      |   #of   | MAC address  |  PS  |  PCP  |  VID   |
      | entries |  (6 octets)  |(1bit)|(3bits)|(12bits)|
      |(2octets)|              |      |       |        |
      +---------+--------------+------+-------+--------+
                |<--------Repeated per entry---------->|

            Figure A.3 MAC/VID filter format

   3. GroupID/VID filter format
      +---------+--------------+------+-------+--------+
      |   #of   |   GroupID    |  PS  |  PCP  |  VID   |
      | entries |  (4 octets)  |(1bit)|(3bits)|(12bits)|
      |(2octets)|              |      |       |        |
      +---------+--------------+------+-------+--------+
                |<--------Repeated per entry---------->|

            Figure A.4 GroupID/VID filter format

   4. GroupID/MAC/VID filter format
      +---------+----------+-------------+------+-----+--------+
      |   #of   | GroupID  | MAC address |  PS  | PCP |  VID   |
      | entries |(4 octets)| (6 octets)  |(1bit)|(3b) |(12bits)|
      |(2octets)|          |             |      |     |        |
      +---------+----------+-------------+------+-----+--------+
                |<-------------Repeated per entry------------->|

            Figure A.5 GroupID/MAC/VID filter format

   The null VID can be used in a VDP request sent from the hypervisor
   to the external bridge. Use of the null VID indicates that the set
   of VID values associated with the VSI is expected to be supplied
   by the bridge. The bridge can obtain VID values from the VSI Type
   whose identity is specified by the VSI Type information in the VDP
   request. The set of VID values is returned to the station via the
   VDP response. The returned VID value can be a locally significant
   value. When a GroupID is used, it is equivalent to the VN ID in
   NVO3. The GroupID is provided by the hypervisor to the bridge, and
   the bridge maps the GroupID to a locally significant VLAN ID.

   The VSIID in a VDP request that identifies a VM can be in one of
   the following formats: IPv4 address, IPv6 address, MAC address,
   UUID, or locally defined.
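   As a non-normative illustration of the MAC/VID filter format in
   Figure A.3, the helper below packs a Filter Info field: a 2-octet
   entry count followed, per entry, by a 6-octet MAC address and 2
   octets carrying PS (1 bit), PCP (3 bits), and VID (12 bits). The
   function name is hypothetical; bit positions follow the figure,
   not a normative reading of IEEE 802.1Qbg.

```python
import struct

def pack_mac_vid_filter(entries):
    """Pack a MAC/VID Filter Info field as laid out in Figure A.3.

    entries is a list of (mac, ps, pcp, vid) tuples, where mac is a
    6-byte value, ps fits in 1 bit, pcp in 3 bits, and vid in 12 bits.
    """
    out = struct.pack("!H", len(entries))   # 2-octet entry count
    for mac, ps, pcp, vid in entries:
        assert len(mac) == 6 and ps < 2 and pcp < 8 and vid < 4096
        # PS | PCP | VID packed big-endian into 2 octets.
        out += mac + struct.pack("!H", (ps << 15) | (pcp << 12) | vid)
    return out
```

   Each entry therefore occupies 8 octets after the count, matching
   the repeated-per-entry span shown in the figure.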
Authors' Addresses

   Yizhou Li
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China

   Phone: +86-25-56625409
   EMail: liyizhou@huawei.com

   Donald Eastlake
   Huawei R&D USA
   155 Beaver Street
   Milford, MA 01757 USA

   Phone: +1-508-333-2270
   EMail: d3e3e3@gmail.com

   Lawrence Kreeger
   Arrcus, Inc

   Email: lkreeger@gmail.com

   Thomas Narten
   IBM

   Email: narten@us.ibm.com

   David Black
   EMC

   Email: david.black@emc.com