NVO3 Working Group                                                 Y. Li
INTERNET-DRAFT                                               D. Eastlake
Intended Status: Informational                       Huawei Technologies
                                                              L. Kreeger
                                                             Arrcus, Inc
                                                               T. Narten
                                                                     IBM
                                                                D. Black
                                                                     EMC
Expires: July 28, 2018                                  January 24, 2018

 Split Network Virtualization Edge (Split-NVE) Control Plane Requirements
                    draft-ietf-nvo3-hpvr2nve-cp-req-13

Abstract

   In a Split Network Virtualization Edge (Split-NVE) architecture, the
   functions of the NVE (Network Virtualization Edge) are split across
   a server and a piece of external network equipment that is called an
   external NVE.
   The server-resident control plane functionality resides in control
   software, which may be part of the hypervisor or container
   management software; for simplicity, this document refers to the
   hypervisor as the location of this software.

   Control plane protocol(s) between a hypervisor and its associated
   external NVE(s) are used by the hypervisor to distribute its virtual
   machine networking state to the external NVE(s) for further
   handling.  This document illustrates the functionality required by
   this type of control plane signaling protocol and outlines the high
   level requirements.  Virtual machine states as well as state
   transitioning are summarized to help clarify the protocol
   requirements.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1. Introduction  . . . . . . . . . . . . . . . . . . . . . . . .  4
      1.1 Terminology . . . . . . . . . . . . . . . . . . . . . . .  5
      1.2 Target Scenarios  . . . . . . . . . . . . . . . . . . . .  6
   2. VM Lifecycle  . . . . . . . . . . . . . . . . . . . . . . . .  8
      2.1 VM Creation Event . . . . . . . . . . . . . . . . . . . .  8
      2.2 VM Live Migration Event . . . . . . . . . . . . . . . . .  9
      2.3 VM Termination Event  . . . . . . . . . . . . . . . . . . 10
      2.4 VM Pause, Suspension and Resumption Events  . . . . . . . 10
   3. Hypervisor-to-NVE Control Plane Protocol Functionality  . . . 10
      3.1 VN Connect and Disconnect . . . . . . . . . . . . . . . . 11
      3.2 TSI Associate and Activate  . . . . . . . . . . . . . . . 12
      3.3 TSI Disassociate and Deactivate . . . . . . . . . . . . . 15
   4. Hypervisor-to-NVE Control Plane Protocol Requirements . . . . 16
   5. VDP Applicability and Enhancement Needs . . . . . . . . . . . 17
   6. Security Considerations . . . . . . . . . . . . . . . . . . . 19
   7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19
   8. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . . 19
   9. References  . . . . . . . . . . . . . . . . . . . . . . . . . 20
      9.1 Normative References  . . . . . . . . . . . . . . . . . . 20
      9.2 Informative References  . . . . . . . . . . . . . . . . . 20
   Appendix A. IEEE 802.1Qbg VDP Illustration (For information
               only) . . . . . . . . . . . . . . . . . . . . . . . 21
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . 23

1.
Introduction

   In the Split-NVE architecture shown in Figure 1, the functionality
   of the NVE (Network Virtualization Edge) is split across an end
   device supporting virtualization and an external network device that
   is called an external NVE.  The portion of the NVE functionality
   located on the end device is called the tNVE, and the portion
   located on the external NVE is called the nNVE, in this document.
   Overlay encapsulation/decapsulation functions are normally
   off-loaded to the nNVE on the external NVE.

   The tNVE is normally implemented as part of the hypervisor or
   container and/or virtual switch in a virtualized end device.  This
   document uses the term "hypervisor" throughout when describing the
   Split-NVE scenario where part of the NVE functionality is off-loaded
   to a device separate from the "hypervisor" that contains a VM
   (Virtual Machine) connected to a VN (Virtual Network).  In this
   context, the term "hypervisor" is meant to cover any device type
   where part of the NVE functionality is off-loaded in this fashion,
   e.g., a Network Service Appliance or Linux Container.

   The NVO3 problem statement [RFC7364] discusses the need for a
   control plane protocol (or protocols) to populate each NVE with the
   state needed to perform the required functions.  In one scenario, an
   NVE provides overlay encapsulation/decapsulation packet forwarding
   services to Tenant Systems (TSs) that are co-resident within the NVE
   on the same End Device (e.g., when the NVE is embedded within a
   hypervisor or a Network Service Appliance).  In such cases, there is
   no need for a standardized protocol between the hypervisor and NVE,
   as the interaction is implemented via software on a single device.
   In the Split-NVE architecture scenarios shown in Figures 2 through
   4, by contrast, control plane protocol(s) between a hypervisor and
   its associated external NVE(s) are required for the hypervisor to
   distribute the virtual machines' networking state to the NVE(s) for
   further handling.  The protocol is an NVE-internal protocol and runs
   between the tNVE and nNVE logical entities.  This protocol is
   mentioned in the NVO3 problem statement [RFC7364] and appears as the
   third work item.

   Virtual machine states and state transitioning are summarized in
   this document, showing events where the NVE needs to take specific
   actions.  Such events might correspond to actions the control plane
   signaling protocol(s) need to take between the tNVE and nNVE in the
   Split-NVE scenario.  The high level requirements to be fulfilled are
   stated.

                +-- -- -- -- Split-NVE -- -- -- --+
                |                                 |
                |                                 |
    +-----------|---------+                       |
    | +---------|--------+|                       |
    | | +--+   \|/       ||           +-----------|---------+
    | | |VM|--+--------+ ||           |          \|/        |
    | | +--+  |        | ||           |    +----------+     |
    | | +--+  |  tNVE  |-||- - - - - -|- - |          |     |
    | | |VM|--+        | ||           |    |   nNVE   |     |
    | | +--+  +--------+ ||           |    |          |     |
    | |                  ||           |    +----------+     |
    | +--Hypervisor-----+|            +---------------------+
    +--------------------+
         End Device                        External NVE

                   Figure 1 Split-NVE structure

   This document uses VMs as an example of Tenant Systems (TSs) in
   order to describe the requirements, even though a VM is just one
   type of Tenant System that may connect to a VN.  For example, a
   service instance within a Network Service Appliance is another type
   of TS, as are systems running on OS-level virtualization
   technologies like containers.
   The fact that VMs have lifecycles (e.g., they can be created and
   destroyed, can be moved, and can be started or stopped) results in a
   general set of protocol requirements, most of which are applicable
   to other forms of TSs, although not all of the requirements are
   applicable to all forms of TSs.

   Section 2 describes VM states and state transitioning in the VM's
   lifecycle.  Section 3 introduces Hypervisor-to-NVE control plane
   protocol functionality derived from VM operations and network
   events.  Section 4 outlines the requirements of the control plane
   protocol to achieve the required functionality.

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119] and RFC 8174 [RFC8174].

   This document uses the same terminology as found in [RFC7365].
   This section defines additional terminology used by this document.

   Split-NVE: a type of NVE (Network Virtualization Edge) whose
   functionality is split across an end device supporting
   virtualization and an external network device.

   tNVE: the portion of Split-NVE functionality located on the end
   device supporting virtualization.  It interacts with a tenant
   system through an internal interface in the end device.

   nNVE: the portion of Split-NVE functionality located on the network
   device that is directly or indirectly connected to the end device
   holding the corresponding tNVE.  The nNVE normally performs
   encapsulation onto and decapsulation from the overlay network.

   External NVE: the physical network device holding the nNVE.

   Hypervisor: the logical collection of software, firmware, and/or
   hardware that allows the creation and running of server or service
   appliance virtualization.  The tNVE is located under a Hypervisor.
   "Hypervisor" is used loosely in this document to refer to the end
   device supporting the virtualization.  For simplicity, this
   document also uses the term "hypervisor" to cover both hypervisors
   and containers.

   Container: See Hypervisor.  For simplicity, this document uses the
   term "hypervisor" to cover both hypervisors and containers.

   VN Profile: Meta data associated with a VN (Virtual Network) that
   is applied to any attachment point to the VN.  That is, VAP
   (Virtual Access Point) properties that are applied to all VAPs
   associated with a given VN and used by an NVE when
   ingressing/egressing packets to/from a specific VN.  Meta data
   could include such information as ACLs, QoS settings, etc.  The VN
   Profile contains parameters that apply to the VN as a whole.
   Control protocols between the NVE and NVA (Network Virtualization
   Authority) could use the VN ID or VN Name to obtain the VN Profile.

   VSI: Virtual Station Interface [IEEE 802.1Qbg].

   VDP: VSI Discovery and Configuration Protocol [IEEE 802.1Qbg].

1.2 Target Scenarios

   In the Split-NVE architecture, an external NVE can offload from the
   End Device (e.g., hypervisor) the encapsulation/decapsulation
   functions, network policy enforcement, and the VN overlay protocol
   overhead.  This offloading may improve performance and/or save
   resources in the End Device using the external NVE.

   The following figures give example scenarios of a Split-NVE
   architecture.
        Hypervisor                 Access Switch
      +------------------+        +-----+-------+
      | +--+    +-------+|        |     |       |
      | |VM|----|       || VLAN   |     |       |
      | +--+    | tNVE  |+--------+nNVE |       +--- Underlying
      | +--+    |       || Trunk  |     |       |    Network
      | |VM|----|       ||        |     |       |
      | +--+    +-------+|        |     |       |
      +------------------+        +-----+-------+

           Figure 2 Hypervisor with an External NVE

        Hypervisor              L2 Switch
      +---------------+        +-----+        +----+---+
      | +--+   +----+ |        |     |        |    |   |
      | |VM|---|    | | VLAN   |     | VLAN   |    |   |
      | +--+   |tNVE|-+--------+     +--------+nNVE|   +--- Underlying
      | +--+   |    | | Trunk  |     | Trunk  |    |   |    Network
      | |VM|---|    | |        |     |        |    |   |
      | +--+   +----+ |        |     |        |    |   |
      +---------------+        +-----+        +----+---+

           Figure 3 Hypervisor with an External NVE
           connected through an Ethernet Access Switch

       Network Service Appliance         Access Switch
      +--------------------------+      +-----+-------+
      | +------------+    |\     |      |     |       |
      | |Net Service |----| \    |      |     |       |
      | |Instance    |    |  \   | VLAN |     |       |
      | +------------+    |tNVE| |------+nNVE |       +--- Underlying
      | +------------+    |   | | Trunk |     |       |    Network
      | |Net Service |----|  /   |      |     |       |
      | |Instance    |    | /    |      |     |       |
      | +------------+    |/     |      |     |       |
      +--------------------------+      +-----+-------+

      Figure 4 Physical Network Service Appliance with an External NVE

   Tenant Systems connect to external NVEs via a Tenant System
   Interface (TSI).  The TSI logically connects to the external NVE
   via a Virtual Access Point (VAP) [RFC8014].  The external NVE may
   provide Layer 2 or Layer 3 forwarding.  In the Split-NVE
   architecture, the external NVE may be able to reach multiple MAC
   and IP addresses via a TSI.  For example, Tenant Systems that are
   providing network services (such as a transparent firewall, load
   balancer, or VPN gateway) are likely to have a complex address
   hierarchy.  This implies that if a given TSI disassociates from one
   VN, all the MAC and/or IP addresses are also disassociated.
   There is no need to signal the deletion of every MAC or IP address
   when the TSI is brought down or deleted.  In the majority of cases,
   a VM will be acting as a simple host that will have a single TSI
   and a single MAC and IP address visible to the external NVE.

   Figures 2 through 4 show the use of VLANs to separate traffic for
   multiple VNs between the tNVE and nNVE; VLANs are not strictly
   necessary if only one VN is involved, but multiple VNs are expected
   in most cases.  Hence, this document assumes the presence of VLANs.

2. VM Lifecycle

   Figure 2 of [RFC7666] shows the state transitions of a VM.  Some of
   the VM states are of interest to the external NVE.  This section
   illustrates the relevant phases and events in the VM lifecycle.
   Note that the following subsections do not give an exhaustive
   traversal of VM lifecycle states.  They are intended as
   illustrative examples relevant to the Split-NVE architecture, not
   as prescriptive text; the goal is to capture sufficient detail to
   set a context for the signaling protocol functionality and
   requirements described in the following sections.

2.1 VM Creation Event

   The VM creation event causes the VM state to transition from
   Preparing to Shutdown and then to Running [RFC7666].  The end
   device allocates and initializes local virtual resources, like
   storage, in the VM Preparing state.  In the Shutdown state, the VM
   has everything ready except that CPU execution is not scheduled by
   the hypervisor and the VM's memory is not resident in the
   hypervisor.  The transition from the Shutdown state to the Running
   state normally requires human action or a system-triggered event.
   The Running state indicates that the VM is in the normal execution
   state.  As part of transitioning the VM to the Running state, the
   hypervisor must also provision network connectivity for the VM's
   TSI(s) so that Ethernet frames can be sent and received correctly.
   Initially, when Running, no ongoing migration, suspension, or
   shutdown is in process.

   In the VM creation phase, the VM's TSI has to be associated with
   the external NVE.  Association here indicates that the hypervisor
   and the external NVE have signaled each other and reached some
   agreement, and that the relevant networking parameters and
   information have been provisioned properly.  The external NVE
   should be informed of the VM's TSI MAC address and/or IP address.
   In addition to external network connectivity, the hypervisor may
   provide local network connectivity between the VM's TSI and the
   TSIs of other VMs that are co-resident on the same hypervisor.
   When the intra- or inter-hypervisor connectivity is extended to the
   external NVE, a locally significant tag, e.g., a VLAN ID, should be
   used between the hypervisor and the external NVE to differentiate
   each VN's traffic.  Both the hypervisor and external NVE sides must
   agree on that tag value for traffic identification, isolation, and
   forwarding.

   The external NVE may need to do some preparation before it signals
   successful association with the TSI.  Such preparation may include
   locally saving the state and binding information of the tenant
   system interface and its VN, communicating with the NVA for network
   provisioning, etc.

   Tenant System interface association should be performed before the
   VM enters the Running state, preferably in the Shutdown state.  If
   association with an external NVE fails, the VM should not go into
   the Running state.

2.2 VM Live Migration Event

   Live migration is sometimes referred to as "hot" migration in that,
   from an external viewpoint, the VM appears to continue to run while
   being migrated to another server (e.g., TCP connections generally
   survive this class of migration).  In contrast, "cold" migration
   consists of shutting down VM execution on one server and restarting
   it on another.
   For simplicity, the following abstract summary of live migration
   assumes shared storage, so that the VM's storage is accessible to
   the source and destination servers.  Assume that the VM live
   migrates from hypervisor 1 to hypervisor 2.  Such a migration event
   involves state transitions on both source hypervisor 1 and
   destination hypervisor 2.  The VM state on source hypervisor 1
   transitions from Running to Migrating and then to Shutdown
   [RFC7666].  The VM state on destination hypervisor 2 transitions
   from Shutdown to Migrating and then Running.

   The external NVE connected to destination hypervisor 2 has to
   associate the migrating VM's TSI with it by discovering the TSI's
   MAC and/or IP addresses, its VN, and the locally significant VLAN
   ID (if any), and by provisioning other network-related parameters
   of the TSI.  The external NVE may be informed about the VM's peer
   VMs, storage devices, and other network appliances with which the
   VM needs to communicate or is communicating.  The migrated VM on
   destination hypervisor 2 should not go to the Running state until
   all the network provisioning and binding has been done.

   The migrating VM should not be in the Running state at the same
   time on the source hypervisor and destination hypervisor during
   migration.  The VM on the source hypervisor does not transition
   into the Shutdown state until the VM successfully enters the
   Running state on the destination hypervisor.  It is possible that
   the VM on the source hypervisor stays in the Migrating state for a
   while after the VM on the destination hypervisor enters the Running
   state.

2.3 VM Termination Event

   A VM termination event is also referred to as "powering off" a VM.
   A VM termination event leads to its state becoming Shutdown.  There
   are two possible causes of VM termination [RFC7666].
   One is the normal "power off" of a running VM; the other is that
   the VM has been migrated to another hypervisor and the VM image on
   the source hypervisor has to stop executing and be shut down.

   In VM termination, the external NVE connecting to that VM needs to
   deprovision the VM, i.e., delete the network parameters associated
   with that VM.  In other words, the external NVE has to disassociate
   the VM's TSI.

2.4 VM Pause, Suspension and Resumption Events

   A VM pause event leads to the VM transiting from the Running state
   to the Paused state.  The Paused state indicates that the VM is
   resident in memory but is no longer scheduled to execute by the
   hypervisor [RFC7666].  The VM can easily be re-activated from the
   Paused state to the Running state.

   A VM suspension event leads to the VM transiting from the Running
   state to the Suspended state.  A VM resumption event leads to the
   VM transiting from the Suspended state to the Running state.  The
   Suspended state means that the memory and CPU execution state of
   the virtual machine are saved to persistent storage.  During this
   state, the virtual machine is not scheduled to execute by the
   hypervisor [RFC7666].

   In the Split-NVE architecture, the external NVE SHOULD NOT
   disassociate the paused or suspended VM, as the VM can return to
   the Running state at any time.

3. Hypervisor-to-NVE Control Plane Protocol Functionality

   The following subsections show illustrative examples of the state
   transitions of an external NVE which are relevant to
   Hypervisor-to-NVE signaling protocol functionality.  It should be
   noted that this is not prescriptive text for the full state
   machine.

3.1 VN Connect and Disconnect

   In the Split-NVE scenario, a protocol is needed between the End
   Device (e.g., Hypervisor) and the external NVE it is using in order
   to make the external NVE aware of the changing VN membership
   requirements of the Tenant Systems within the End Device.
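   The VN membership tracking that this protocol drives on the
   external NVE can be sketched in a few lines of Python.  The class
   and method names below are hypothetical illustrations of the
   required behavior (VN connect allocates a locally significant tag;
   VN disconnect releases it), not an API or wire format defined by
   this document:

```python
class VNMembership:
    """Illustrative sketch of per-VN state kept by an external NVE.

    The names (vn_connect/vn_disconnect, local tags) mirror this
    document's terminology; the data layout is hypothetical.
    """

    def __init__(self, tag_pool):
        self.connected = {}             # VN_ID -> locally significant tag
        self.free_tags = list(tag_pool)

    def vn_connect(self, vn_id):
        """Handle a VN_connect (explicit, or implied by a first TSI associate)."""
        if vn_id in self.connected:     # already connected to this VN
            return self.connected[vn_id]
        if not self.free_tags:
            raise RuntimeError("no local tag available for VN %s" % vn_id)
        tag = self.free_tags.pop(0)     # e.g., an [IEEE 802.1Q] VLAN ID
        # A real NVE would also contact the NVA here to obtain the
        # VN Context for this VN ID or VN Name.
        self.connected[vn_id] = tag
        return tag

    def vn_disconnect(self, vn_id):
        """Release all resources for the VN; the tag may be reused."""
        tag = self.connected.pop(vn_id, None)
        if tag is not None:
            self.free_tags.append(tag)
```

   Note that, as discussed below, a released tag only has significance
   on the access link or switch between the End Device and the
   external NVE, so reclaiming it for another VN is a purely local
   decision.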
   A key driver for using a protocol rather than static configuration
   of the external NVE is that the VN connectivity requirements can
   change frequently as VMs are brought up, moved, and brought down on
   various hypervisors throughout the data center or external cloud.

   +---------------+   Receive VN_connect;     +-------------------+
   |VN_Disconnected|   return Local_Tag value  |VN_Connected       |
   +---------------+   for VN if successful;   +-------------------+
   |VN_ID;         |-------------------------->|VN_ID;             |
   |VN_State=      |                           |VN_State=connected;|
   |disconnected;  |                           |Num_TSI_Associated;|
   |               |<--Receive VN_disconnect---|Local_Tag;         |
   +---------------+                           |VN_Context;        |
                                               +-------------------+

          Figure 5. State Transition Example of a VAP Instance
                         on an External NVE

   Figure 5 shows the state transition for a VAP on the external NVE.
   An NVE that supports the Hypervisor-to-NVE control plane protocol
   should support one instance of the state machine for each active
   VN.  The state transition on the external NVE is normally triggered
   by events and behaviors on the hypervisor-facing side.  Some of the
   interleaved interactions between the NVE and NVA are illustrated to
   better explain the whole procedure; others may not be shown.

   The external NVE MUST be notified when an End Device requires
   connection to a particular VN and when it no longer requires
   connection.  In addition, the external NVE must provide a local tag
   value for each connected VN to the End Device to use for exchanging
   packets between the End Device and the external NVE (e.g., a
   locally significant [IEEE 802.1Q] tag value).  How "local" the
   significance is depends on whether the Hypervisor has a direct
   physical connection to the external NVE (in which case the
   significance is local to the physical link), or whether there is an
   Ethernet switch (e.g.,
   a blade switch) connecting the Hypervisor to the NVE (in which case
   the significance is local to the intervening switch and all the
   links connected to it).

   These VLAN tags are used to differentiate between different VNs as
   packets cross the shared access network to the external NVE.  When
   the external NVE receives packets, it uses the VLAN tag to identify
   the VN of packets coming from a given TSI, strips the tag, adds the
   appropriate overlay encapsulation for that VN, and sends the
   packets towards the corresponding remote NVE across the underlying
   IP network.

   The identification of the VN in this protocol could be through
   either a VN Name or a VN ID.  A globally unique VN Name facilitates
   portability of a tenant's Virtual Data Center.  Once an external
   NVE receives a VN_connect indication, the NVE needs a way to get a
   VN Context allocated (or to receive the already allocated VN
   Context) for a given VN Name or ID (as well as any other
   information needed to transmit encapsulated packets).  How this is
   done is the subject of the NVE-to-NVA protocol, which is part of
   work items 1 and 2 in [RFC7364].

   The VN_connect message can be explicit or implicit.  Explicit means
   the hypervisor sends a request message explicitly requesting
   connection to a VN.  Implicit means the external NVE receives other
   messages, e.g., the very first TSI associate message (see the next
   subsection) for a given VN, that implicitly indicate its interest
   in connecting to the VN.

   A VN_disconnect message indicates that the NVE can release all the
   resources for that disconnected VN and transition to the
   VN_Disconnected state.  The local tag assigned for that VN can then
   possibly be reclaimed for use by another VN.

3.2 TSI Associate and Activate

   Typically, a TSI is assigned a single MAC address, and all frames
   transmitted and received on that TSI use that single MAC address.
   As mentioned earlier, it is also possible for a Tenant System to
   exchange frames using multiple MAC addresses or packets with
   multiple IP addresses.

   Particularly in the case of a TS that is forwarding frames or
   packets from other TSs, the external NVE will need to communicate
   to the NVA the mapping between the NVE's IP address on the
   underlying network and ALL the addresses the TS is forwarding on
   behalf of the corresponding VN.

   The NVE has two ways it can discover the tenant addresses for which
   frames are to be forwarded to a given End Device (and ultimately to
   the TS within that End Device):

   1. It can glean the addresses by inspecting the source addresses in
   packets it receives from the End Device.

   2. The hypervisor can explicitly signal the address associations of
   a TSI to the external NVE.  An address association includes all the
   MAC and/or IP addresses possibly used as source addresses in a
   packet sent from the hypervisor to the external NVE.  The external
   NVE may further use this information to filter future traffic from
   the hypervisor.

   To use the second approach above, the Hypervisor-to-NVE protocol
   must support End Devices communicating new tenant address
   associations for a given TSI within a given VN.

   Figure 6 shows an example of the state transitions for a TSI
   connecting to a VAP on the external NVE.  An NVE that supports the
   Hypervisor-to-NVE control plane protocol may support one instance
   of the state machine for each TSI connecting to a given VN.
        disassociate       +--------+       disassociate
   +---------------------->|  Init  |<------------------------+
   |                       +--------+                         |
   |                         |    |                           |
   |                         |    |                           |
   |          associate      |    |      activate             |
   |       +-----------------+    +------------------+        |
   |       |                                          |       |
   |       |                                          |       |
   |      \|/                                        \|/      |
   +--------------------+                 +---------------------+
   |     Associated     |                 |      Activated      |
   +--------------------+                 +---------------------+
   |TSI_ID;             |                 |TSI_ID;              |
   |Port;               |----activate---->|Port;               |
   |VN_ID;              |                 |VN_ID;               |
   |State=associated;   |                 |State=activated;     |-+
 +-|Num_Of_Addr;        |<---deactivate---|Num_Of_Addr;         | |
 | |List_Of_Addr;       |                 |List_Of_Addr;        | |
 | +--------------------+                 +---------------------+ |
 |          /|\                                    /|\            |
 |           |                                      |             |
 +-----------+                                      +-------------+
   add/remove/updt addr;                  add/remove/updt addr;
      or update port;                        or update port;

          Figure 6 State Transition Example of a TSI Instance
                        on an External NVE

   The Associated state of a TSI instance on an external NVE indicates
   that all the addresses for that TSI have already been associated
   with the VAP of the external NVE on a given port (e.g., port p) for
   a given VN, but no real traffic to and from the TSI is expected or
   allowed to pass through.  The NVE has reserved all the necessary
   resources for that TSI.  An external NVE may report the mappings of
   its underlay IP address and the associated TSI addresses to the
   NVA, and relevant network nodes may save such information in their
   mapping tables but not their forwarding tables.  An NVE may create
   ACL or filter rules based on the associated TSI addresses on that
   attached port p, but not enable them yet.  The local tag for the VN
   corresponding to the TSI instance should be provisioned on port p
   to receive packets.

   The VM migration event (discussed in Section 2) may cause the
   hypervisor to send an associate message to the NVE connected to the
   destination hypervisor of the migration.
   A VM creation event may also lead to the same practice.

   The Activated state of a TSI instance on an external NVE indicates
   that all the addresses for that TSI are functioning correctly on a
   given port (e.g., port p), and traffic can be received from and
   sent to that TSI via the NVE.  The mappings of the NVE's underlay
   IP address and the associated TSI addresses should be put into the
   forwarding table, rather than the mapping table, on relevant
   network nodes.  ACL or filter rules based on the associated TSI
   addresses on the attached port p in the NVE are enabled.  The local
   tag for the VN corresponding to the TSI instance MUST be
   provisioned on port p to receive packets.

   The Activate message makes the state transition from Init or
   Associated to Activated.  VM creation, VM migration, and VM
   resumption events discussed in Section 2 may trigger sending the
   Activate message from the hypervisor to the external NVE.

   TSI information may be updated in either the Associated or
   Activated state.  The following are considered updates to the TSI
   information: adding or removing the associated addresses, updating
   the current associated addresses (for example, updating the IP
   address for a given MAC address), and updating the NVE port
   information based on where the NVE receives messages.  Such updates
   do not change the state of the TSI.  When any address associated
   with a given TSI changes, the NVE should inform the NVA to update
   the mapping information between the NVE's underlying address and
   the associated TSI addresses.  The NVE should also change its local
   ACL or filter settings accordingly for the relevant addresses.
   Port information updates will cause the provisioning of the local
   tag for the VN corresponding to the TSI instance on the new port
   and its removal from the old port.

3.3 TSI Disassociate and Deactivate

   Disassociate and deactivate behaviors are conceptually the reverse
   of associate and activate.
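   The forward transitions of Section 3.2 and their reverses can be
   summarized as a simple transition table.  This sketch uses
   hypothetical event-name encodings mirroring Figure 6; the protocol
   itself does not mandate any particular representation:

```python
# Illustrative transition table for the TSI instance state machine of
# Figure 6 (hypothetical encoding, for illustration only).
TSI_TRANSITIONS = {
    ("Init",       "associate"):    "Associated",
    ("Init",       "activate"):     "Activated",
    ("Associated", "activate"):     "Activated",
    ("Activated",  "deactivate"):   "Associated",
    ("Associated", "disassociate"): "Init",
    ("Activated",  "disassociate"): "Init",
}

def tsi_next_state(state, event):
    """Return the next TSI state for an event.

    In-place updates (add/remove/update addresses, update port) are
    permitted in Associated and Activated and do not change the state.
    """
    if event in ("update_addr", "update_port") and state in ("Associated", "Activated"):
        return state
    try:
        return TSI_TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not permitted in state {state!r}")
```

   For example, a VM suspension event maps to a deactivate
   (Activated -> Associated), while a VM shutdown event maps to a
   disassociate (Activated -> Init), matching the per-event behavior
   described below.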
   In moving from the Activated state to the Associated state, the
   external NVE needs to make sure that the resources are still
   reserved but that the addresses associated with the TSI are not
   functioning.  No traffic to or from the TSI is expected or allowed
   to pass through.  For example, the NVE needs to tell the NVA to
   remove the relevant address mapping information from forwarding and
   routing tables.  ACL and filtering rules regarding the relevant
   addresses should be disabled.

   In moving from the Associated or Activated state to the Init state,
   the NVE releases all the resources relevant to TSI instances.  The
   NVE should also inform the NVA to remove the relevant entries from
   its mapping table.  ACL or filtering rules regarding the relevant
   addresses should be removed.  Local tag provisioning on the
   connecting port on the NVE SHOULD be cleared.

   A VM suspension event (discussed in Section 2) may cause the
   relevant TSI instance(s) on the NVE to transition from the
   Activated state to the Associated state.

   A VM pause event normally does not affect the state of the relevant
   TSI instance(s) on the NVE, as the VM is expected to run again
   soon.

   A VM shutdown event will normally cause the relevant TSI
   instance(s) on the NVE to transition from the Activated state to
   the Init state.  All resources should be released.

   A VM migration will cause the TSI instance on the source NVE to
   leave the Activated state.  When a VM migrates to another
   hypervisor connected to the same NVE, i.e., the source and
   destination NVEs are the same, the NVE should use the TSI_ID and
   incoming port to differentiate the two TSI instances.

   Although the triggering messages for the state transitions shown in
   Figure 6 do not indicate the difference between a VM
   creation/shutdown event and a VM migration arrival/departure event,
   the external NVE can make optimizations if it is given such
   information.
For example, if the NVE knows that an incoming Activate 654 message is caused by migration rather than VM creation, some 655 mechanisms may be employed or triggered to make sure the dynamic 656 configuration or provisioning on the destination NVE is the same 657 as that on the source NVE for the migrated VM. For example, an IGMP 658 query [RFC2236] can be sent by the destination external NVE to 659 the migrated VM so that the VM is forced to send an IGMP report to the 660 multicast router. The multicast router can then correctly route the 661 multicast traffic to the new external NVE for those multicast groups 662 the VM joined before the migration. 664 4. Hypervisor-to-NVE Control Plane Protocol Requirements 666 Req-1: The protocol MUST support a bridged network connecting End 667 Devices to the External NVE. 669 Req-2: The protocol MUST support multiple End Devices sharing the 670 same External NVE via the same physical port across a bridged 671 network. 673 Req-3: The protocol MAY support an End Device using multiple external 674 NVEs simultaneously, but only one external NVE for each VN. 676 Req-4: The protocol MAY support an End Device using multiple external 677 NVEs simultaneously for the same VN. 679 Req-5: The protocol MUST allow the End Device to initiate a request 680 to its associated External NVE to be connected to or disconnected from a 681 given VN. 683 Req-6: The protocol MUST allow an External NVE to initiate a request 684 to its connected End Devices to be disconnected from a given VN. 686 Req-7: When a TS attaches to a VN, the protocol MUST allow for an End 687 Device and its external NVE to negotiate one or more locally- 688 significant tag(s) for carrying traffic associated with a specific VN 689 (e.g., [IEEE 802.1Q] tags). 691 Req-8: The protocol MUST allow an End Device to initiate a request to 692 associate/disassociate and/or activate/deactivate some or all 693 address(es) of a TSI instance to a VN on an NVE port.
695 Req-9: The protocol MUST allow the External NVE to initiate a request 696 to disassociate and/or deactivate some or all address(es) of a TSI 697 instance to a VN on an NVE port. 699 Req-10: The protocol MUST allow an End Device to initiate a request to 700 add, remove, or update address(es) associated with a TSI instance on 701 the external NVE. Addresses can be expressed in different formats, 702 for example, MAC, IP, or a pair of IP and MAC. 704 Req-11: The protocol MUST allow the External NVE to authenticate the 705 connected End Device. 707 Req-12: The protocol MUST be able to run over L2 links between the 708 End Device and its External NVE. 710 Req-13: The protocol SHOULD support the End Device indicating whether an 711 associate or activate request from it is the result of a VM hot 712 migration event. 714 5. VDP Applicability and Enhancement Needs 716 The Virtual Station Interface (VSI) Discovery and Configuration Protocol 717 (VDP) [IEEE 802.1Qbg] can serve as the control plane protocol running 718 between the hypervisor and the external NVE. Appendix A illustrates 719 VDP for the reader's information. 721 VDP facilitates the automatic discovery and configuration of Edge 722 Virtual Bridging (EVB) stations and Edge Virtual Bridging (EVB) 723 bridges. An EVB station is normally an end station running multiple 724 VMs. It is conceptually equivalent to a hypervisor in this document. 725 An EVB bridge is conceptually equivalent to the external NVE. 727 VDP is able to pre-associate/associate/de-associate a VSI on an EVB 728 station with a port on the EVB bridge. A VSI is approximately equivalent to 729 the virtual port by which a VM connects to the hypervisor in 730 this document's context. The EVB station and the EVB bridge can reach 731 agreement on the VLAN ID(s) assigned to a VSI via VDP message exchange. 732 Other configuration parameters can be exchanged via VDP as well.
VDP 733 is carried over the Edge Control Protocol (ECP) [IEEE 802.1Qbg], which 734 provides reliable transport over a layer 2 network. 736 The VDP protocol needs some extensions to fulfill the requirements listed 737 in this document. Table 1 shows the needed extensions and/or 738 clarifications in the NVO3 context. 740 +------+-----------+-----------------------------------------------+ 741 | Req | Supported | remarks | 742 | | by VDP? | | 743 +------+-----------+-----------------------------------------------+ 744 | Req-1| | | 745 +------+ |Needs extension. Must be able to send to a | 746 | Req-2| |specific unicast MAC and should be able to send| 747 +------+ Partially |to a non-reserved well known multicast address | 748 | Req-3| |other than the nearest customer bridge address.| 749 +------+ | | 750 | Req-4| | | 751 +------+-----------+-----------------------------------------------+ 752 | Req-5| Yes |VN is indicated by GroupID | 753 +------+-----------+-----------------------------------------------+ 754 | Req-6| Yes |Bridge sends De-Associate | 755 +------+-----------+------------------------+----------------------+ 756 | | |VID==NULL in request and bridge returns the | 757 | Req-7| Yes |assigned value in response or specify GroupID | 758 | | |in request and get VID assigned in returning | 759 | | |response.
Multiple VLANs per group are allowed.| 760 +------+-----------+------------------------+----------------------+ 761 | | | requirements | VDP equivalence | 762 | | +------------------------+----------------------+ 763 | | | associate/disassociate|pre-asso/de-associate | 764 | Req-8| Partially | activate/deactivate |associate/de-associate| 765 | | +------------------------+----------------------| 766 | | |Needs extension to allow associate->pre-assoc | 767 +------+-----------+------------------------+----------------------+ 768 | Req-9| Yes | VDP bridge initiates de-associate | 769 +------+-----------+-----------------------------------------------+ 770 |Req-10| Partially |Needs extension for IPv4/IPv6 address. Add a | 771 | | |new "filter info format" type. | 772 +------+-----------+-----------------------------------------------+ 773 |Req-11| No |Out-of-band mechanism is preferred, e.g. MACSec| 774 | | |or 802.1X. | 775 +------+-----------+-----------------------------------------------+ 776 |Req-12| Yes |L2 protocol naturally | 777 +------+-----------+-----------------------------------------------+ 778 | | |M bit for migrated VM on destination hypervisor| 779 | | |and S bit for that on source hypervisor. | 780 |Req-13| Partially |It is indistinguishable when M/S is 0 between | 781 | | |no guidance and events not caused by migration | 782 | | |where NVE may act differently. Needs | 783 | | |new bits for migration indication in a new | 784 | | |"filter info format" type. | 785 +------+-----------+-----------------------------------------------+ 786 Table 1: Comparison of VDP with the requirements 788 By simply adding the ability to carry layer 3 addresses, VDP can serve 789 the Hypervisor-to-NVE control plane functions well. The other 790 extensions improve the protocol capabilities for a 791 better fit in an NVO3 network. 793 6.
Security Considerations 795 NVEs must ensure that only properly authorized Tenant Systems are 796 allowed to join and become a part of any particular Virtual Network. 797 In addition, NVEs will need appropriate mechanisms to ensure that any 798 hypervisor wishing to use the services of an NVE is properly 799 authorized to do so. One design point is whether the hypervisor 800 should supply the NVE with necessary information (e.g., VM addresses, 801 VN information, or other parameters) that the NVE uses directly, or 802 whether the hypervisor should only supply a VN ID and an identifier 803 for the associated VM (e.g., its MAC address), with the NVE using 804 that information to obtain the information needed to validate the 805 hypervisor-provided parameters or obtain related parameters in a 806 secure manner. 808 7. IANA Considerations 810 No IANA action is required. RFC Editor: please delete this section 811 before publication. 813 8. Acknowledgements 814 This document was initiated based on the merger of the drafts 815 draft-kreeger-nvo3-hypervisor-nve-cp, draft-gu-nvo3-tes-nve- 816 mechanism, and draft-kompella-nvo3-server2nve. Thanks to all the co- 817 authors and contributing members of those drafts. 819 The authors would like to specially thank Lucy Yong and Jon Hudson 820 for their generous help in improving this document. 822 9. References 824 9.1 Normative References 826 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 827 Requirement Levels", BCP 14, RFC 2119, March 1997. 829 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 830 2119 Key Words", BCP 14, RFC 8174, May 2017. 832 9.2 Informative References 834 [RFC2236] Fenner, W., "Internet Group Management Protocol, Version 835 2", RFC 2236, November 1997. 837 [RFC4122] Leach, P., Mealling, M., and R. Salz, "A Universally 838 Unique IDentifier (UUID) URN Namespace", RFC 4122, July 839 2005. 841 [RFC7364] Narten, T., Gray, E., Black, D., Fang, L., Kreeger, L., and 842 M.
Napierala, "Problem Statement: Overlays for Network 843 Virtualization", RFC 7364, October 2014. 845 [RFC7365] Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. 846 Rekhter, "Framework for DC Network Virtualization", 847 RFC 7365, October 2014. 849 [RFC8014] Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T. 850 Narten, "An Architecture for Data-Center Network 851 Virtualization over Layer 3 (NVO3)", RFC 8014, December 2016. 853 [RFC7666] Asai, H., MacFaden, M., Schoenwaelder, J., Shima, K., and 854 T. Tsou, "Management Information Base for Virtual Machines 855 Controlled by a Hypervisor", RFC 7666, October 2015. 857 [IEEE 802.1Qbg] IEEE, "Media Access Control (MAC) Bridges and Virtual 858 Bridged Local Area Networks - Amendment 21: Edge Virtual 859 Bridging", IEEE Std 802.1Qbg, 2012. 861 [IEEE 802.1Q] IEEE, "Media Access Control (MAC) Bridges and Virtual 862 Bridged Local Area Networks", IEEE Std 802.1Q-2014, 863 November 2014. 865 Appendix A. IEEE 802.1Qbg VDP Illustration (For information only) 867 VDP (the VSI Discovery and Configuration Protocol [IEEE 868 802.1Qbg]) can be considered a control protocol running between 869 the hypervisor and the external bridge. The VDP association TLV structure 870 is formatted as shown in Figure A.1. 872 +--------+--------+------+-----+--------+------+------+-------+------+ 873 |TLV type|TLV info|Status|VSI |VSI Type|VSI ID|VSI ID|Filter |Filter| 874 | |string | |Type |Version |Format| |Info |Info | 875 | |length | |ID | | | |format | | 876 +--------+--------+------+-----+--------+------+------+-------+------+ 877 | | |<----VSI type&instance----->|<--Filter---->| 878 | | |<-------------VSI attributes-------------->| 879 |<--TLV header--->|<-----------TLV information string -------------->| 881 Figure A.1: VDP association TLV 883 There are basically four TLV types. 885 1. Pre-associate: Pre-associate is used to pre-associate a VSI instance 886 with a bridge port. The bridge validates the request and returns a 887 failure Status in case of errors.
A successful pre-associate does not 888 imply that the indicated VSI Type or provisioning will be applied to any 889 traffic flowing through the VSI. The pre-associate enables a faster 890 response to an associate request by allowing the bridge to obtain the VSI Type 891 prior to an association. 893 2. Pre-associate with resource reservation: Pre-associate with Resource 894 Reservation involves the same steps as Pre-associate, but on success it 895 also reserves resources in the bridge to prepare for a subsequent 896 Associate request. 898 3. Associate: Associate creates and activates an association between a 899 VSI instance and a bridge port. The bridge allocates any required bridge 900 resources for the referenced VSI. The bridge activates the configuration 901 for the VSI Type ID. This association is then applied to the traffic 902 flow to/from the VSI instance. 904 4. De-associate: De-associate is used to remove an association 905 between a VSI instance and a bridge port. Pre-associated and associated 906 VSIs can be de-associated. De-associate releases any resources that were 907 reserved as a result of prior Associate or Pre-associate operations for 908 that VSI instance. 910 De-associate can be initiated by either side, while the other types can 911 only be initiated by the server side. 913 Some important flag values in the VDP Status field are: 915 1. M-bit (Bit 5): Indicates that the user of the VSI (e.g., the VM) is 916 migrating (M-bit = 1) or provides no guidance on the migration of the 917 user of the VSI (M-bit = 0). The M-bit is used as an indicator relative 918 to the VSI that the user is migrating to. 920 2. S-bit (Bit 6): Indicates that the VSI user (e.g., the VM) is 921 suspended (S-bit = 1) or provides no guidance as to whether the user of 922 the VSI is suspended (S-bit = 0). A keep-alive Associate request with 923 S-bit = 1 can be sent when the VSI user is suspended. The S-bit is used 924 as an indicator relative to the VSI that the user is migrating from.
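The M-bit and S-bit semantics above can be sketched as simple bit tests on the Status octet. This Python sketch is illustrative only: it assumes "Bit 5" and "Bit 6" denote zero-indexed bit positions (values 1<<5 and 1<<6), and the helper name is this sketch's own, not taken from IEEE 802.1Qbg; consult the standard for the authoritative bit layout.

```python
# Illustrative sketch (assumed zero-indexed bit positions): interpreting the
# migration-related flags of a VDP Status octet.

M_BIT = 1 << 5  # set on the destination side: the VSI user is migrating in
S_BIT = 1 << 6  # set on the source side: the VSI user is suspended/migrating out

def migration_hints(status: int) -> dict:
    """Decode the M/S guidance from a Status octet.

    A value of 0 for either bit gives *no guidance*: it may mean "not a
    migration" or simply "no information". This ambiguity is exactly what
    Req-13 in Section 4 asks an extended protocol to resolve.
    """
    return {
        "migrating_in": bool(status & M_BIT),
        "migrating_out": bool(status & S_BIT),
    }
```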
926 The filter information format currently defines 4 types. Each 927 filter information format is shown in detail as follows. 929 1. VID Filter Info format 930 +---------+------+-------+--------+ 931 | #of | PS | PCP | VID | 932 |entries |(1bit)|(3bits)|(12bits)| 933 |(2octets)| | | | 934 +---------+------+-------+--------+ 935 |<--Repeated per entry->| 937 Figure A.2 VID Filter Info format 939 2. MAC/VID Filter Info format 940 +---------+--------------+------+-------+--------+ 941 | #of | MAC address | PS | PCP | VID | 942 |entries | (6 octets) |(1bit)|(3bits)|(12bits)| 943 |(2octets)| | | | | 944 +---------+--------------+------+-------+--------+ 945 |<--------Repeated per entry---------->| 947 Figure A.3 MAC/VID filter format 949 3. GroupID/VID Filter Info format 950 +---------+--------------+------+-------+--------+ 951 | #of | GroupID | PS | PCP | VID | 952 |entries | (4 octets) |(1bit)|(3bits)|(12bits)| 953 |(2octets)| | | | | 954 +---------+--------------+------+-------+--------+ 955 |<--------Repeated per entry---------->| 957 Figure A.4 GroupID/VID filter format 959 4. GroupID/MAC/VID Filter Info format 960 +---------+----------+-------------+------+-----+--------+ 961 | #of | GroupID | MAC address | PS | PCP | VID | 962 |entries |(4 octets)| (6 octets) |(1bit)|(3b )|(12bits)| 963 |(2octets)| | | | | | 964 +---------+----------+-------------+------+-----+--------+ 965 |<-------------Repeated per entry------------->| 966 Figure A.5 GroupID/MAC/VID filter format 968 The null VID can be used in a VDP Request sent from the station to the 969 external bridge. Use of the null VID indicates that the set of VID 970 values associated with the VSI is expected to be supplied by the bridge. 971 The set of VID values is returned to the station via the VDP Response. 972 The returned VID value can be a locally significant value. When the GroupID 973 is used, it is equivalent to the VN ID in NVO3. The GroupID is provided 974 by the station to the bridge.
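The field widths in the figures above fully determine the wire layout; as a worked example, the MAC/VID Filter Info format of Figure A.3 (a 2-octet entry count followed by, per entry, a 6-octet MAC address and a 2-octet PS/PCP/VID field) can be packed and parsed as shown in the following Python sketch. The function names are this sketch's own; only the field widths come from the figure.

```python
import struct

# Illustrative sketch of the MAC/VID Filter Info format (Figure A.3).
# Per entry: 6-octet MAC, then one 16-bit word holding PS (1 bit),
# PCP (3 bits), and VID (12 bits), most significant bit first.

def pack_mac_vid(entries):
    """entries: list of (mac_bytes, ps, pcp, vid) tuples -> wire bytes."""
    out = struct.pack("!H", len(entries))  # 2-octet "#of entries"
    for mac, ps, pcp, vid in entries:
        assert len(mac) == 6 and ps < 2 and pcp < 8 and vid < 4096
        out += mac + struct.pack("!H", (ps << 15) | (pcp << 12) | vid)
    return out

def unpack_mac_vid(data):
    """Wire bytes -> list of (mac_bytes, ps, pcp, vid) tuples."""
    (count,) = struct.unpack_from("!H", data)
    entries, off = [], 2
    for _ in range(count):
        mac = data[off:off + 6]
        (word,) = struct.unpack_from("!H", data, off + 6)
        entries.append((mac, word >> 15, (word >> 12) & 0x7, word & 0xFFF))
        off += 8  # each entry occupies 6 + 2 octets
    return entries
```

The other three formats differ only in the presence of the 4-octet GroupID and/or the MAC address, so the same approach applies with adjusted per-entry layouts.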
The bridge maps the GroupID to a locally 975 significant VLAN ID. 977 The VSI ID in the VDP association TLV that identifies a VM can be in one of the 978 following formats: IPv4 address, IPv6 address, MAC address, UUID 979 [RFC4122], or locally defined. 981 Authors' Addresses 983 Yizhou Li 984 Huawei Technologies 985 101 Software Avenue, 986 Nanjing 210012 987 China 989 Phone: +86-25-56625409 990 EMail: liyizhou@huawei.com 992 Donald Eastlake 993 Huawei R&D USA 994 155 Beaver Street 995 Milford, MA 01757 USA 997 Phone: +1-508-333-2270 998 EMail: d3e3e3@gmail.com 1000 Lawrence Kreeger 1001 Arrcus, Inc 1003 Email: lkreeger@gmail.com 1005 Thomas Narten 1006 IBM 1008 Email: narten@us.ibm.com 1009 David Black 1010 EMC 1012 Email: david.black@emc.com