NVO3 Working Group                                                 Y. Li
INTERNET-DRAFT                                               D. Eastlake
Intended Status: Informational                       Huawei Technologies
                                                              L. Kreeger
                                                             Arrcus, Inc
                                                               T. Narten
                                                                     IBM
                                                                D. Black
                                                                Dell EMC
Expires: August 29, 2018                               February 25, 2018

 Split Network Virtualization Edge (Split-NVE) Control Plane Requirements
                    draft-ietf-nvo3-hpvr2nve-cp-req-16

Abstract

   In a Split Network Virtualization Edge (Split-NVE) architecture, the
   functions of the NVE (Network Virtualization Edge) are split across
   a server and a piece of external network equipment called an
   external NVE.  The server-resident control plane functionality
   resides in control software, which may be part of hypervisor or
   container management software; for simplicity, this document refers
   to the hypervisor as the location of this software.

   Control plane protocol(s) between a hypervisor and its associated
   external NVE(s) are used by the hypervisor to distribute its virtual
   machine networking state to the external NVE(s) for further
   handling.  This document illustrates the functionality required by
   this type of control plane signaling protocol and outlines the
   high-level requirements.  Virtual machine states as well as state
   transitioning are summarized to help clarify the protocol
   requirements.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1 Terminology
      1.2 Target Scenarios
   2. VM Lifecycle
      2.1 VM Creation Event
      2.2 VM Live Migration Event
      2.3 VM Termination Event
      2.4 VM Pause, Suspension and Resumption Events
   3. Hypervisor-to-NVE Control Plane Protocol Functionality
      3.1 VN Connect and Disconnect
      3.2 TSI Associate and Activate
      3.3 TSI Disassociate and Deactivate
   4. Hypervisor-to-NVE Control Plane Protocol Requirements
   5. VDP Applicability and Enhancement Needs
   6. Security Considerations
   7. IANA Considerations
   8. Acknowledgements
   9. References
      9.1 Normative References
      9.2 Informative References
   Appendix A. IEEE 802.1Q VDP Illustration (For information only)
   Authors' Addresses

1. Introduction

   In the Split-NVE architecture shown in Figure 1, the functionality
   of the NVE (Network Virtualization Edge) is split across an end
   device supporting virtualization and an external network device
   called an external NVE.  The portion of the NVE functionality
   located on the end device is called the tNVE (terminal-side NVE),
   and the portion located on the external NVE is called the nNVE
   (network-side NVE) in this document.  Overlay encapsulation/
   decapsulation functions are normally off-loaded to the nNVE on the
   external NVE.

   The tNVE is normally implemented as part of the hypervisor,
   container, and/or virtual switch in a virtualized end device.  This
   document uses the term "hypervisor" throughout when describing the
   Split-NVE scenario where part of the NVE functionality is off-loaded
   to a separate device from the "hypervisor" that contains a VM
   (Virtual Machine) connected to a VN (Virtual Network).  In this
   context, the term "hypervisor" is meant to cover any device type
   where part of the NVE functionality is off-loaded in this fashion,
   e.g., a Network Service Appliance or a Linux Container.

   The NVO3 problem statement [RFC7364] discusses the need for a
   control plane protocol (or protocols) to populate each NVE with the
   state needed to perform the required functions.
   In one scenario, an
   NVE provides overlay encapsulation/decapsulation packet forwarding
   services to Tenant Systems (TSs) that are co-resident with the NVE
   on the same End Device (e.g., when the NVE is embedded within a
   hypervisor or a Network Service Appliance).  In such cases, there is
   no need for a standardized protocol between the hypervisor and NVE,
   as the interaction is implemented via software on a single device.
   In contrast, in the Split-NVE architecture scenarios shown in
   Figures 2 through 4, control plane protocol(s) between a hypervisor
   and its associated external NVE(s) are required for the hypervisor
   to distribute the virtual machine's networking state to the external
   NVE(s) for further handling.  The protocol is an NVE-internal
   protocol and runs between the tNVE and nNVE logical entities.  This
   protocol is mentioned in the NVO3 problem statement [RFC7364] and
   appears as the third work item.

   Virtual machine states and state transitioning are summarized in
   this document, showing events where the NVE needs to take specific
   actions.  Such events might correspond to actions the control plane
   signaling protocol(s) need to take between the tNVE and nNVE in the
   Split-NVE scenario.  The high-level requirements to be fulfilled are
   stated.

                        +------------ Split-NVE ---------+
                        |                                |
                        |                                |
      +-----------------|-----+                          |
      | +---------------|----+|                          |
      | | +--+         \|/   ||                          |
      | | |V |TSI  +-------+ ||               +----------|---------+
      | | |M |-----+       | ||               |         \|/        |
      | | +--+     |       | ||               | +--------+         |
      | | +--+     | tNVE  | ||---------------| |        |         |
      | | |V |TSI  |       | ||               | | nNVE   |         |
      | | |M |-----|       | ||               | |        |         |
      | | +--+     +-------+ ||               | +--------+         |
      | |                    ||               +--------------------+
      | +-----Hypervisor-----+|
      +-----------------------+
        End Device                              External NVE

                       Figure 1 Split-NVE structure

   This document uses VMs as an example of Tenant Systems (TSs) in
   order to describe the requirements, even though a VM is just one
   type of Tenant System that may connect to a VN.  For example, a
   service instance within a Network Service Appliance is another type
   of TS, as are systems running OS-level virtualization technologies
   such as containers.  The fact that VMs have lifecycles (e.g., they
   can be created and destroyed, moved, and started or stopped) results
   in a general set of protocol requirements, most of which are
   applicable to other forms of TSs, although not all of the
   requirements are applicable to all forms of TSs.

   Section 2 describes VM states and state transitioning in the VM's
   lifecycle.  Section 3 introduces Hypervisor-to-NVE control plane
   protocol functionality derived from VM operations and network
   events.  Section 4 outlines the requirements of the control plane
   protocol to achieve the required functionality.

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   This document uses the same terminology as found in [RFC7365].  This
   section defines additional terminology used by this document.

   Split-NVE: a type of NVE (Network Virtualization Edge) where the
   functionality is split across an end device supporting
   virtualization and an external network device.

   tNVE: the portion of Split-NVE functionality located on the end
   device supporting virtualization.  It interacts with a Tenant System
   through an internal interface in the end device.

   nNVE: the portion of Split-NVE functionality located on the network
   device that is directly or indirectly connected to the end device
   holding the corresponding tNVE.  The nNVE normally performs
   encapsulation to and decapsulation from the overlay network.

   External NVE: the physical network device holding the nNVE.

   Hypervisor: the logical collection of software, firmware, and/or
   hardware that allows the creation and running of server or service
   appliance virtualization.  The tNVE is located under a hypervisor.
   "Hypervisor" is used loosely in this document to refer to the end
   device supporting the virtualization.

   Container: refer to Hypervisor.  For simplicity, this document uses
   the term "hypervisor" to represent both hypervisor and container.

   VN Profile: metadata associated with a VN (Virtual Network) that is
   applied to any attachment point to the VN.  That is, VAP (Virtual
   Access Point) properties that are applied to all VAPs associated
   with a given VN and used by an NVE when ingressing/egressing packets
   to/from a specific VN.  Metadata could include such information as
   ACLs, QoS settings, etc.  The VN Profile contains parameters that
   apply to the VN as a whole.  Control protocols between the NVE and
   NVA (Network Virtualization Authority) could use the VN ID or VN
   Name to obtain the VN Profile (see the informal sketch at the end of
   this section).

   VSI: Virtual Station Interface [IEEE 802.1Q].

   VDP: VSI Discovery and Configuration Protocol [IEEE 802.1Q].
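
   As an informal illustration of the VN Profile definition above, the
   following Python sketch shows one way an NVE could look up per-VN
   metadata by VN Name or VN ID.  All class, field, and method names
   here are invented for illustration; the NVE-to-NVA protocol itself
   is outside the scope of this document.

      # Illustrative only: a VN Profile carries per-VN metadata (e.g.,
      # ACLs and QoS settings) applied to every VAP attached to that
      # VN.  Names and fields are assumptions, not defined by the draft.
      from dataclasses import dataclass, field
      from typing import Dict, List, Optional, Union

      @dataclass
      class VNProfile:
          vn_id: int               # VN ID, unique within the NVO3 domain
          vn_name: str             # globally unique VN Name
          acls: List[str] = field(default_factory=list)
          qos_class: Optional[str] = None

      class NVAClient:
          """Stand-in for the NVE-to-NVA control protocol (work items 1
          and 2 of RFC 7364); only the profile lookup is sketched."""
          def __init__(self) -> None:
              self._by_name: Dict[str, VNProfile] = {}
              self._by_id: Dict[int, VNProfile] = {}

          def register(self, p: VNProfile) -> None:
              self._by_name[p.vn_name] = p
              self._by_id[p.vn_id] = p

          def profile_for(self, key: Union[str, int]) -> Optional[VNProfile]:
              # An NVE may query by VN Name (str) or VN ID (int).
              table = self._by_name if isinstance(key, str) else self._by_id
              return table.get(key)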

1.2 Target Scenarios

   In the Split-NVE architecture, an external NVE can provide an
   offload of the encapsulation/decapsulation functions and network
   policy enforcement, as well as the VN overlay protocol overhead.
   This offloading may improve performance and/or save resources in the
   End Device (e.g., hypervisor) using the external NVE.

   The following figures give example scenarios of a Split-NVE
   architecture.

           Hypervisor                        Access Switch
      +------------------+              +-----+-------+
      | +--+   +-------+ |              |     |       |
      | |VM|---|       | |    VLAN      |     |       |
      | +--+   | tNVE  |----------------+ nNVE|       +--- Underlying
      | +--+   |       | |    Trunk     |     |       |    Network
      | |VM|---|       | |              |     |       |
      | +--+   +-------+ |              |     |       |
      +------------------+              +-----+-------+

             Figure 2 Hypervisor with an External NVE

           Hypervisor                L2 Switch
      +---------------+              +-----+            +----+---+
      | +--+   +----+ |              |     |            |    |   |
      | |VM|---|    | |   VLAN       |     |   VLAN     |    |   |
      | +--+   |tNVE|----------------+     +------------+nNVE|   +--- Underlying
      | +--+   |    | |   Trunk      |     |   Trunk    |    |   |    Network
      | |VM|---|    | |              |     |            |    |   |
      | +--+   +----+ |              |     |            |    |   |
      +---------------+              +-----+            +----+---+

             Figure 3 Hypervisor with an External NVE
          connected through an Ethernet Access Switch

       Network Service Appliance                  Access Switch
      +---------------------------+            +-----+-------+
      | +------------+       |\   |            |     |       |
      | |Net Service |-------| \  |            |     |       |
      | |Instance    |       |  \ |    VLAN    |     |       |
      | +------------+       |tNVE|------------+nNVE |       +--- Underlying
      | +------------+       |    |    Trunk   |     |       |    Network
      | |Net Service |-------|  / |            |     |       |
      | |Instance    |       | /  |            |     |       |
      | +------------+       |/   |            |     |       |
      +---------------------------+            +-----+-------+

       Figure 4 Physical Network Service Appliance with an External NVE

   Tenant Systems connect to external NVEs via a Tenant System
   Interface (TSI).  The TSI logically connects to the external NVE via
   a Virtual Access Point (VAP) [RFC8014].  The external NVE may
   provide Layer 2 or Layer 3 forwarding.  In the Split-NVE
   architecture, the external NVE may be able to reach multiple MAC and
   IP addresses via a TSI.  An IP address can be in either IPv4 or IPv6
   format.  For example, Tenant Systems that are providing network
   services (such as a transparent firewall, load balancer, or VPN
   gateway) are likely to have a complex address hierarchy.  This
   implies that if a given TSI disassociates from one VN, all the MAC
   and/or IP addresses are also disassociated.  There is no need to
   signal the deletion of every MAC or IP when the TSI is brought down
   or deleted.  In the majority of cases, a VM will be acting as a
   simple host that will have a single TSI and a single MAC and IP
   visible to the external NVE.

   Figures 2 through 4 show the use of VLANs to separate traffic for
   multiple VNs between the tNVE and nNVE; VLANs are not strictly
   necessary if only one VN is involved, but multiple VNs are expected
   in most cases.  Hence, this document assumes the presence of VLANs.
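
   The following Python sketch (illustrative only; the data model and
   names are assumptions, not protocol elements) captures the point
   above: a single TSI may present many MAC and/or IPv4/IPv6 addresses
   to the external NVE, and disassociating the TSI from a VN implicitly
   drops the whole address set without per-address signaling.

      # Illustrative data model: one TSI, many MAC/IP addresses,
      # all torn down together when the TSI goes away.
      from dataclasses import dataclass, field
      from ipaddress import ip_address
      from typing import Set, Tuple

      @dataclass
      class TSI:
          tsi_id: str
          vn_name: str
          addresses: Set[Tuple[str, str]] = field(default_factory=set)

          def associate_address(self, mac: str, ip: str) -> None:
              ip_address(ip)  # validates both IPv4 and IPv6 literals
              self.addresses.add((mac, ip))

          def disassociate_from_vn(self) -> int:
              """Bringing the TSI down removes every associated address
              in one step; no per-address deletion is signaled."""
              removed = len(self.addresses)
              self.addresses.clear()
              return removed

      # e.g., a load-balancer TS forwarding for many tenant addresses:
      lb = TSI("tsi-1", "vn-blue")
      lb.associate_address("00:11:22:33:44:55", "192.0.2.10")
      lb.associate_address("00:11:22:33:44:55", "2001:db8::10")
      assert lb.disassociate_from_vn() == 2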

2. VM Lifecycle

   Figure 2 of [RFC7666] shows the state transitions of a VM.  Some of
   the VM states are of interest to the external NVE.  This section
   illustrates the relevant phases and events in the VM lifecycle.
   Note that the following subsections do not give an exhaustive
   traversal of VM lifecycle states.  They are intended as illustrative
   examples relevant to the Split-NVE architecture, not as prescriptive
   text; the goal is to capture sufficient detail to set a context for
   the signaling protocol functionality and requirements described in
   the following sections.

2.1 VM Creation Event

   A VM creation event causes the VM state to transition from Preparing
   to Shutdown and then to Running [RFC7666].  The end device allocates
   and initializes local virtual resources like storage in the VM
   Preparing state.  In the Shutdown state, the VM has everything ready
   except that CPU execution is not scheduled by the hypervisor and the
   VM's memory is not resident in the hypervisor.  The transition from
   the Shutdown state to the Running state normally requires human
   action or a system-triggered event.  The Running state indicates
   that the VM is in the normal execution state.  As part of
   transitioning the VM to the Running state, the hypervisor must also
   provision network connectivity for the VM's TSI(s) so that Ethernet
   frames can be sent and received correctly.  Initially, when Running,
   no ongoing migration, suspension, or shutdown is in process.

   In the VM creation phase, the VM's TSI has to be associated with the
   external NVE.  Association here indicates that the hypervisor and
   the external NVE have signaled each other and reached some
   agreement, and that the relevant networking parameters or
   information have been provisioned properly.  The external NVE should
   be informed of the VM's TSI MAC address and/or IP address.  In
   addition to external network connectivity, the hypervisor may
   provide local network connectivity between the VM's TSI and the TSIs
   of other VMs that are co-resident on the same hypervisor.  When the
   intra- or inter-hypervisor connectivity is extended to the external
   NVE, a locally significant tag, e.g., a VLAN ID, should be used
   between the hypervisor and the external NVE to differentiate each
   VN's traffic.  Both the hypervisor and external NVE sides must agree
   on that tag value for traffic identification, isolation, and
   forwarding.

   The external NVE may need to do some preparation before it signals
   successful association with the TSI.  Such preparation may include
   locally saving the states and binding information of the Tenant
   System Interface and its VN, communicating with the NVA for network
   provisioning, etc.

   Tenant System Interface association should be performed before the
   VM enters the Running state, preferably in the Shutdown state.  If
   association with an external NVE fails, the VM should not go into
   the Running state.

2.2 VM Live Migration Event

   Live migration is sometimes referred to as "hot" migration in that,
   from an external viewpoint, the VM appears to continue to run while
   being migrated to another server (e.g., TCP connections generally
   survive this class of migration).  In contrast, "cold" migration
   consists of shutting down VM execution on one server and restarting
   it on another.  For simplicity, the following abstract summary of
   live migration assumes shared storage, so that the VM's storage is
   accessible to the source and destination servers.  Assume that a VM
   live migrates from hypervisor 1 to hypervisor 2.  Such a migration
   event involves state transitions on both the source hypervisor 1 and
   the destination hypervisor 2.  The VM state on the source
   hypervisor 1 transitions from Running to Migrating and then to
   Shutdown [RFC7666].  The VM state on the destination hypervisor 2
   transitions from Shutdown to Migrating and then Running.

   The external NVE connected to the destination hypervisor 2 has to
   associate the migrating VM's TSI with it by discovering the TSI's
   MAC and/or IP addresses, its VN, the locally significant VLAN ID if
   any, and provisioning other network-related parameters of the TSI.
   The external NVE may be informed about the VM's peer VMs, storage
   devices, and other network appliances with which the VM needs to
   communicate or is communicating.  The migrated VM on the destination
   hypervisor 2 should not go to the Running state until all the
   network provisioning and binding has been done.

   The VM states on both the source and destination hypervisors are
   Migrating during the transfer of migration execution.  The migrating
   VM should not be in the Running state on the source and destination
   hypervisors at the same time during the migration.  The VM on the
   source hypervisor does not transition into the Shutdown state until
   the VM successfully enters the Running state on the destination
   hypervisor.  It is possible that the VM on the source hypervisor
   stays in the Migrating state for a while after the VM on the
   destination hypervisor enters the Running state.

2.3 VM Termination Event

   A VM termination event is also referred to as "powering off" a VM.
   A VM termination event leads to its state becoming Shutdown.  There
   are two possible causes of VM termination [RFC7666]: one is the
   normal "power off" of a running VM; the other is that the VM has
   been migrated to another hypervisor and the VM image on the source
   hypervisor has to stop executing and be shut down.

   In VM termination, the external NVE connecting to that VM needs to
   deprovision the VM, i.e., delete the network parameters associated
   with that VM.  In other words, the external NVE has to de-associate
   the VM's TSI.

2.4 VM Pause, Suspension and Resumption Events

   A VM pause event leads to the VM transitioning from the Running
   state to the Paused state.  The Paused state indicates that the VM
   is resident in memory but is no longer scheduled to execute by the
   hypervisor [RFC7666].  The VM can be easily re-activated from the
   Paused state to the Running state.

   A VM suspension event leads to the VM transitioning from the Running
   state to the Suspended state, and a VM resumption event leads to the
   VM transitioning from the Suspended state to the Running state.  In
   the Suspended state, the memory and CPU execution state of the
   virtual machine are saved to persistent store.  During this state,
   the virtual machine is not scheduled to execute by the hypervisor
   [RFC7666].

   In the Split-NVE architecture, the external NVE should not
   disassociate the paused or suspended VM, as the VM can return to the
   Running state at any time.
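
   The following non-prescriptive Python sketch summarizes the VM
   states of [RFC7666] used in this section and the lifecycle rule that
   matters most to the external NVE: a VM must not enter the Running
   state until TSI association has succeeded.  The transition table and
   the callback interface are illustrative assumptions.

      # A sketch of the RFC 7666 VM states discussed in Section 2, with
      # the hook where the hypervisor must have signaled the external
      # NVE.  Not an exhaustive model of the RFC 7666 state machine.
      from enum import Enum, auto

      class VMState(Enum):
          PREPARING = auto()
          SHUTDOWN = auto()
          RUNNING = auto()
          MIGRATING = auto()
          PAUSED = auto()
          SUSPENDED = auto()

      # Transitions discussed in Sections 2.1 through 2.4.
      ALLOWED = {
          (VMState.PREPARING, VMState.SHUTDOWN),  # creation: resources ready
          (VMState.SHUTDOWN, VMState.RUNNING),    # start; TSI associated first
          (VMState.RUNNING, VMState.MIGRATING),   # live migration, source
          (VMState.MIGRATING, VMState.SHUTDOWN),  # source, after hand-off
          (VMState.SHUTDOWN, VMState.MIGRATING),  # live migration, destination
          (VMState.MIGRATING, VMState.RUNNING),   # destination, after hand-off
          (VMState.RUNNING, VMState.SHUTDOWN),    # termination ("power off")
          (VMState.RUNNING, VMState.PAUSED),      # pause
          (VMState.PAUSED, VMState.RUNNING),      # un-pause
          (VMState.RUNNING, VMState.SUSPENDED),   # suspension
          (VMState.SUSPENDED, VMState.RUNNING),   # resumption
      }

      class VM:
          def __init__(self, name: str, tsi_associated) -> None:
              self.name = name
              self.state = VMState.PREPARING
              self.tsi_associated = tsi_associated  # callable() -> bool

          def transition(self, new: VMState) -> None:
              if (self.state, new) not in ALLOWED:
                  raise ValueError(f"illegal transition {self.state} -> {new}")
              if new is VMState.RUNNING and not self.tsi_associated():
                  # Section 2.1: a VM should not enter Running until TSI
                  # association with the external NVE has succeeded.
                  raise RuntimeError("TSI not associated with external NVE")
              self.state = new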

3. Hypervisor-to-NVE Control Plane Protocol Functionality

   The following subsections show illustrative examples of the state
   transitions of an external NVE that are relevant to Hypervisor-to-
   NVE signaling protocol functionality.  It should be noted that this
   is not prescriptive text for the full state machine.

3.1 VN Connect and Disconnect

   In the Split-NVE scenario, a protocol is needed between the End
   Device (e.g., hypervisor) and the external NVE it is using in order
   to make the external NVE aware of the changing VN membership
   requirements of the Tenant Systems within the End Device.

   A key driver for using a protocol rather than static configuration
   of the external NVE is that the VN connectivity requirements can
   change frequently as VMs are brought up, moved, and brought down on
   various hypervisors throughout the data center or external cloud.

   +---------------+  Receive VN_connect;      +-------------------+
   |VN_Disconnected|  return Local_Tag value   |VN_Connected       |
   +---------------+  for VN if successful;    +-------------------+
   |VN_ID;         |-------------------------->|VN_ID;             |
   |VN_State=      |                           |VN_State=connected;|
   |disconnected;  |                           |Num_TSI_Associated;|
   |               |<--Receive VN_disconnect---|Local_Tag;         |
   +---------------+                           |VN_Context;        |
                                               +-------------------+

           Figure 5. State Transition Example of a VAP Instance
                           on an External NVE

   Figure 5 shows the state transitions for a VAP on the external NVE.
   An NVE that supports the Hypervisor-to-NVE control plane protocol
   should support one instance of the state machine for each active VN.
   The state transitions on the external NVE are normally triggered by
   events and behaviors on the hypervisor-facing side.  Some of the
   interleaved interactions between the NVE and NVA are illustrated
   here to better explain the whole procedure; others are not shown.

   The external NVE must be notified when an End Device requires
   connection to a particular VN and when it no longer requires that
   connection.  Connection clean-up for failed devices should be
   employed; this is out of the scope of the protocol specified in this
   document.

   In addition, the external NVE should provide a local tag value for
   each connected VN to the End Device to use for exchanging packets
   between the End Device and the external NVE (e.g., a locally
   significant [IEEE 802.1Q] tag value).  How "local" the significance
   is depends on whether the hypervisor has a direct physical
   connection to the external NVE (in which case the significance is
   local to the physical link), or whether there is an Ethernet switch
   (e.g., a blade switch) connecting the hypervisor to the NVE (in
   which case the significance is local to the intervening switch and
   all the links connected to it).

   These VLAN tags are used to differentiate between different VNs as
   packets cross the shared access network to the external NVE.  When
   the external NVE receives packets, it uses the VLAN tag to identify
   the VN of packets coming from a given TSI, strips the tag, adds the
   appropriate overlay encapsulation for that VN, and sends the packet
   towards the corresponding remote NVE across the underlying IP
   network.

   The identification of the VN in this protocol could be through
   either a VN Name or a VN ID.  A globally unique VN Name facilitates
   portability of a tenant's Virtual Data Center.  Once an external NVE
   receives a VN_connect indication, the NVE needs a way to get a VN
   Context allocated (or to receive the already allocated VN Context)
   for a given VN Name or ID (as well as any other information needed
   to transmit encapsulated packets).  How this is done is the subject
   of the NVE-to-NVA protocol, which is part of work items 1 and 2 in
   [RFC7364].  The external NVE needs to synchronize the mapping
   between the local tag and the VN Name or VN ID with the NVA.

   The VN_connect message can be explicit or implicit.  Explicit means
   the hypervisor sends a request message explicitly requesting the
   connection to a VN.  Implicit means the external NVE receives other
   messages, e.g., the very first TSI associate message (see the next
   subsection) for a given VN, that implicitly indicate its interest in
   connecting to the VN.

   A VN_disconnect message indicates that the NVE can release all the
   resources for that disconnected VN and transition to the
   VN_Disconnected state.  The local tag assigned for that VN can then
   be reclaimed for use by another VN.
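
   A minimal Python sketch of the per-VN VAP state machine of Figure 5
   follows.  Local tag allocation and the NVA interaction are reduced
   to comments; all names are illustrative assumptions rather than
   protocol elements.

      # Sketch of Figure 5: one VAP instance per active VN on the
      # external NVE.  Tag handling and NVA exchange are simplified.
      from typing import Dict, List, Optional

      class VAPInstance:
          def __init__(self, vn_id: str) -> None:
              self.vn_id = vn_id
              self.state = "VN_Disconnected"
              self.local_tag: Optional[int] = None  # e.g., 802.1Q VLAN ID
              self.num_tsi_associated = 0

      class ExternalNVE:
          def __init__(self) -> None:
              self.vaps: Dict[str, VAPInstance] = {}
              self._free_tags: List[int] = list(range(2, 4095))

          def vn_connect(self, vn_id: str) -> int:
              """Explicit VN_connect, or implicit via a first TSI
              associate message for the VN; returns the local tag."""
              vap = self.vaps.setdefault(vn_id, VAPInstance(vn_id))
              if vap.state == "VN_Disconnected":
                  vap.local_tag = self._free_tags.pop(0)
                  # Obtain the VN Context for vn_id from the NVA and
                  # sync the tag <-> VN Name/ID mapping (out of scope).
                  vap.state = "VN_Connected"
              return vap.local_tag

          def vn_disconnect(self, vn_id: str) -> None:
              """Release all resources for the VN; the local tag may
              then be reclaimed for use by another VN."""
              vap = self.vaps.pop(vn_id, None)
              if vap is not None and vap.local_tag is not None:
                  self._free_tags.append(vap.local_tag)

      nve = ExternalNVE()
      tag = nve.vn_connect("vn-blue")  # e.g., first TSI associate seen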

3.2 TSI Associate and Activate

   Typically, a TSI is assigned a single MAC address, and all frames
   transmitted and received on that TSI use that single MAC address.
   As mentioned earlier, it is also possible for a Tenant System to
   exchange frames using multiple MAC addresses or packets with
   multiple IP addresses.

   Particularly in the case of a TS that is forwarding frames or
   packets from other TSs, the external NVE will need to communicate to
   the NVA the mapping between the NVE's IP address on the underlying
   network and ALL the addresses the TS is forwarding on behalf of for
   the corresponding VN.

   The NVE has two ways it can discover the tenant addresses for which
   frames are to be forwarded to a given End Device (and ultimately to
   the TS within that End Device):

   1. It can glean the addresses by inspecting the source addresses in
   packets it receives from the End Device.

   2. The hypervisor can explicitly signal the address associations of
   a TSI to the external NVE.  An address association includes all the
   MAC and/or IP addresses possibly used as source addresses in a
   packet sent from the hypervisor to the external NVE.  The external
   NVE may further use this information to filter future traffic from
   the hypervisor.

   To use the second approach above, the Hypervisor-to-NVE protocol
   must support End Devices communicating new tenant address
   associations for a given TSI within a given VN.

   Figure 6 shows an example of the state transitions of a TSI
   connecting to a VAP on the external NVE.  An NVE that supports the
   Hypervisor-to-NVE control plane protocol may support one instance of
   the state machine for each TSI connecting to a given VN.

        disassociate    +--------+    disassociate
    +------------------>|  Init  |<---------------------------------+
    |                   +--------+                                  |
    |                     |    |                                    |
    |                     |    |                                    |
    |          associate  |    |  activate                          |
    |        +------------+    +-------------------+                |
    |        |                                     |                |
    |        |                                     |                |
    |       \|/                                   \|/               |
    | +--------------------+              +---------------------+   |
    | |     Associated     |              |      Activated      |   |
    | +--------------------+              +---------------------+   |
    | |TSI_ID;             |              |TSI_ID;              |   |
    | |Port;               |---activate-->|Port;                |   |
    | |VN_ID;              |              |VN_ID;               |   |
    | |State=associated;   |              |State=activated;     |---+
    +-|Num_Of_Addr;        |<-deactivate--|Num_Of_Addr;         |
      |List_Of_Addr;       |              |List_Of_Addr;        |
      +--------------------+              +---------------------+
               /|\                                 /|\
                |                                   |
        add/remove/updt addr;               add/remove/updt addr;
        or update port;                     or update port;

           Figure 6 State Transition Example of a TSI Instance
                           on an External NVE

   The Associated state of a TSI instance on an external NVE indicates
   that all the addresses for that TSI have already been associated
   with the VAP of the external NVE on a given port, e.g., on port p,
   for a given VN, but that no real traffic to or from the TSI is
   expected or allowed to pass through.  The NVE has reserved all the
   necessary resources for that TSI.  An external NVE may report the
   mappings between its underlay IP address and the associated TSI
   addresses to the NVA, and relevant network nodes may save such
   information in their mapping tables but not in their forwarding
   tables.
   An NVE may create ACL or filter rules based on the associated TSI
   addresses on that attached port p, but not enable them yet.  The
   local tag for the VN corresponding to the TSI instance should be
   provisioned on port p to receive packets.

   A VM migration event (discussed in Section 2) may cause the
   hypervisor to send an associate message to the NVE connected to the
   destination hypervisor of the migration.  A VM creation event may
   also cause the same behavior.

   The Activated state of a TSI instance on an external NVE indicates
   that all the addresses for that TSI are functioning correctly on a
   given port, e.g., port p, and that traffic can be received from and
   sent to that TSI via the NVE.  The mappings between the NVE's
   underlay IP address and the associated TSI addresses should be put
   into the forwarding table rather than the mapping table on relevant
   network nodes.  ACL or filter rules based on the associated TSI
   addresses on the attached port p in the NVE are enabled.  The local
   tag for the VN corresponding to the TSI instance must be provisioned
   on port p to receive packets.

   The Activate message makes the state transition from Init or
   Associated to Activated.  VM creation, VM migration, and VM
   resumption events discussed in Section 2 may trigger sending the
   Activate message from the hypervisor to the external NVE.

   TSI information may get updated in either the Associated or
   Activated state.  The following are considered updates to the TSI
   information: add or remove the associated addresses, update the
   current associated addresses (for example, updating the IP for a
   given MAC), and update the NVE port information based on where the
   NVE receives messages.  Such updates do not change the state of the
   TSI.  When any address associated with a given TSI changes, the NVE
   should inform the NVA to update the mapping information between the
   NVE's underlying address and the associated TSI addresses.  The NVE
   should also change its local ACL or filter settings accordingly for
   the relevant addresses.  Port information updates will cause the
   local tag for the VN corresponding to the TSI instance to be
   provisioned on the new port and removed from the old port.
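
   The following Python sketch (illustrative, not prescriptive) models
   the TSI instance states of Figure 6 (Init, Associated, and
   Activated) and the effect of each triggering message described
   above.  ACL programming and NVA updates are represented only as
   comments.

      # Sketch of the Figure 6 TSI-instance states on an external NVE.
      from dataclasses import dataclass, field
      from typing import Set

      @dataclass
      class TSIInstance:
          tsi_id: str
          vn_id: str
          port: str
          state: str = "Init"
          addresses: Set[str] = field(default_factory=set)  # MACs/IPs

          def associate(self, addresses: Set[str]) -> None:
              # Reserve resources; install but do not enable ACL/filter
              # rules; report mappings to the NVA mapping table only.
              self.addresses |= addresses
              self.state = "Associated"

          def activate(self) -> None:
              if self.state not in ("Init", "Associated"):
                  raise RuntimeError("activate only from Init/Associated")
              # Enable ACLs; move NVE-underlay/TSI-address mappings into
              # forwarding tables; traffic may now flow.
              self.state = "Activated"

          def deactivate(self) -> None:
              if self.state != "Activated":  # e.g., on VM suspension
                  raise RuntimeError("deactivate only from Activated")
              self.state = "Associated"

          def update_addresses(self, addresses: Set[str]) -> None:
              # Allowed in Associated or Activated; state is unchanged,
              # but the NVA and local ACL/filter settings are refreshed.
              self.addresses = set(addresses)

          def disassociate(self) -> None:
              # Release all resources; the NVA removes relevant entries.
              self.addresses.clear()
              self.state = "Init"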

3.3 TSI Disassociate and Deactivate

   Disassociate and deactivate behaviors are conceptually the reverse
   of associate and activate.

   Transitioning from the Activated state to the Associated state, the
   external NVE needs to make sure that the resources are still
   reserved but that the addresses associated with the TSI are no
   longer functioning.  No traffic to or from the TSI is expected or
   allowed to pass through.  For example, the NVE needs to tell the NVA
   to remove the relevant address mapping information from forwarding
   and routing tables.  ACL and filtering rules regarding the relevant
   addresses should be disabled.

   Transitioning from the Associated or Activated state to the Init
   state, the NVE releases all the resources relevant to the TSI
   instance.  The NVE should also inform the NVA to remove the relevant
   entries from the mapping table.  ACL or filtering rules regarding
   the relevant addresses should be removed.  The local tag
   provisioning on the connecting port on the NVE should be cleared.

   A VM suspension event (discussed in Section 2) may cause the
   relevant TSI instance(s) on the NVE to transition from the Activated
   state to the Associated state.

   A VM pause event normally does not affect the state of the relevant
   TSI instance(s) on the NVE, as the VM is expected to run again soon.

   A VM shutdown event will normally cause the relevant TSI instance(s)
   on the NVE to transition from the Activated state to the Init state.
   All resources should be released.

   A VM migration will cause the TSI instance on the source NVE to
   leave the Activated state.  When a VM migrates to another hypervisor
   connected to the same NVE, i.e., when the source and destination
   NVEs are the same, the NVE should use the TSI_ID and the incoming
   port to differentiate the two TSI instances.

   Although the triggering messages for the state transitions shown in
   Figure 6 do not indicate the difference between a VM
   creation/shutdown event and a VM migration arrival/departure event,
   the external NVE can make optimizations if it is given such
   information.  For example, if the NVE knows that an incoming
   activate message is caused by migration rather than by VM creation,
   some mechanisms may be employed or triggered to make sure the
   dynamic configurations or provisionings on the destination NVE are
   the same as those on the source NVE for the migrated VM.  For
   example, an IGMP query [RFC2236] can be triggered by the destination
   external NVE to the migrated VM so that the VM is forced to send an
   IGMP report to the multicast router.  A multicast router can then
   correctly route the multicast traffic to the new external NVE for
   those multicast groups the VM had joined before the migration.

4. Hypervisor-to-NVE Control Plane Protocol Requirements

   Req-1:  The protocol MUST support a bridged network connecting End
   Devices to the external NVE.

   Req-2:  The protocol MUST support multiple End Devices sharing the
   same external NVE via the same physical port across a bridged
   network.

   Req-3:  The protocol MAY support an End Device using multiple
   external NVEs simultaneously, but only one external NVE for each VN.

   Req-4:  The protocol MAY support an End Device using multiple
   external NVEs simultaneously for the same VN.

   Req-5:  The protocol MUST allow the End Device to initiate a request
   to its associated external NVE to be connected to / disconnected
   from a given VN.

   Req-6:  The protocol MUST allow an external NVE to initiate a
   request to its connected End Devices to be disconnected from a given
   VN.

   Req-7:  When a TS attaches to a VN, the protocol MUST allow for an
   End Device and its external NVE to negotiate one or more locally
   significant tag(s) for carrying traffic associated with a specific
   VN (e.g., [IEEE 802.1Q] tags).

   Req-8:  The protocol MUST allow an End Device to initiate a request
   to associate/disassociate and/or activate/deactivate some or all
   address(es) of a TSI instance to a VN on an NVE port.

   Req-9:  The protocol MUST allow the external NVE to initiate a
   request to disassociate and/or deactivate some or all address(es) of
   a TSI instance to a VN on an NVE port.

   Req-10: The protocol MUST allow an End Device to initiate a request
   to add, remove, or update address(es) associated with a TSI instance
   on the external NVE.  Addresses can be expressed in different
   formats, for example, MAC, IP, or a pair of IP and MAC.

   Req-11: The protocol MUST allow the external NVE to authenticate the
   connected End Device.

   Req-12: The protocol MUST be able to run over L2 links between the
   End Device and its external NVE.

   Req-13: The protocol SHOULD support the End Device indicating
   whether an associate or activate request from it is the result of a
   VM hot migration event.
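
   As a non-normative illustration, the message set sketched below
   would satisfy Req-5 through Req-10 and Req-13 (Req-7 is simplified
   to a response carrying the negotiated tags).  This document defines
   no message encoding; every field name here is invented for
   illustration.

      # Non-normative sketch of one possible message set.
      from dataclasses import dataclass, field
      from typing import List, Literal, Optional

      @dataclass
      class VNRequest:                # Req-5 (End Device -> external NVE)
          op: Literal["connect", "disconnect"]
          vn: str                     # VN Name or VN ID

      @dataclass
      class VNConnectResponse:        # Req-7: NVE returns the locally
          vn: str                     # significant tag(s) for the VN
          local_tags: List[int] = field(default_factory=list)

      @dataclass
      class VNDisconnectRequest:      # Req-6 (external NVE -> End Device)
          vn: str

      @dataclass
      class TSIAddress:               # Req-10: MAC, IP, or an IP/MAC pair
          mac: Optional[str] = None
          ip: Optional[str] = None

      @dataclass
      class TSIRequest:               # Req-8 (End Device) / Req-9 (NVE:
          op: Literal["associate", "disassociate",  # disassociate and
                      "activate", "deactivate"]     # deactivate only)
          tsi_id: str
          vn: str
          addresses: List[TSIAddress] = field(default_factory=list)
          hot_migration: bool = False  # Req-13: caused by hot migration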

5. VDP Applicability and Enhancement Needs

   The VSI (Virtual Station Interface) Discovery and Configuration
   Protocol (VDP) [IEEE 802.1Q] can be the control plane protocol
   running between the hypervisor and the external NVE.  Appendix A
   illustrates VDP for the reader's information.

   VDP facilitates the automatic discovery and configuration of Edge
   Virtual Bridging (EVB) stations and EVB bridges.  An EVB station is
   normally an end station running multiple VMs; it is conceptually
   equivalent to a hypervisor in this document.  An EVB bridge is
   conceptually equivalent to the external NVE.

   VDP is able to pre-associate/associate/de-associate a VSI on an EVB
   station with a port on the EVB bridge.  A VSI corresponds
   approximately to the virtual port by which a VM connects to the
   hypervisor in this document's context.  The EVB station and the EVB
   bridge can reach agreement on the VLAN ID(s) assigned to a VSI via
   VDP message exchange.  Other configuration parameters can be
   exchanged via VDP as well.  VDP is carried over the Edge Control
   Protocol (ECP) [IEEE 802.1Q], which provides reliable transport over
   a Layer 2 network.

   The VDP protocol needs some extensions to fulfill the requirements
   listed in this document.  Table 1 shows the needed extensions and/or
   clarifications in the NVO3 context.

   +------+-----------+-----------------------------------------------+
   | Req  | Supported |                    Remarks                    |
   |      |  by VDP?  |                                               |
   +------+-----------+-----------------------------------------------+
   | Req-1|           |                                               |
   +------+           |Needs extension.  Must be able to send to a    |
   | Req-2|           |specific unicast MAC and should be able to send|
   +------+ Partially |to a non-reserved well-known multicast address |
   | Req-3|           |other than the nearest customer bridge address.|
   +------+           |                                               |
   | Req-4|           |                                               |
   +------+-----------+-----------------------------------------------+
   | Req-5| Yes       |VN is indicated by GroupID.                    |
   +------+-----------+-----------------------------------------------+
   | Req-6| Yes       |Bridge sends De-Associate.                     |
   +------+-----------+-----------------------------------------------+
   |      |           |VID == NULL in request and bridge returns the  |
   | Req-7| Yes       |assigned value in response, or specify GroupID |
   |      |           |in request and get the VID assigned in the     |
   |      |           |returned response.  Multiple VLANs per group   |
   |      |           |are allowed.                                   |
   +------+-----------+------------------------+----------------------+
   |      |           |      requirements      |   VDP equivalence    |
   |      |           +------------------------+----------------------+
   |      |           | associate/disassociate|pre-asso/de-associate  |
   | Req-8| Partially |  activate/deactivate   |associate/de-associate|
   |      |           +------------------------+----------------------+
   |      |           |Needs extension to allow associate->pre-assoc. |
   +------+-----------+-----------------------------------------------+
   | Req-9| Yes       |VDP bridge initiates de-associate.             |
   +------+-----------+-----------------------------------------------+
   |Req-10| Partially |Needs extension for IPv4/IPv6 addresses.  Add a|
   |      |           |new "filter info format" type.                 |
   +------+-----------+-----------------------------------------------+
   |Req-11| No        |Out-of-band mechanism is preferred, e.g. MACsec|
   |      |           |or 802.1X.                                     |
   +------+-----------+-----------------------------------------------+
   |Req-12| Yes       |L2 protocol naturally.                         |
   +------+-----------+-----------------------------------------------+
   |      |           |M bit for migrated VM on destination hypervisor|
   |      |           |and S bit for that on source hypervisor.  It is|
   |Req-13| Partially |indistinguishable when M/S is 0 between no     |
   |      |           |guidance and events not caused by migration,   |
   |      |           |where the NVE may act differently.  Needs new  |
   |      |           |bits for migration indication in a new "filter |
   |      |           |info format" type.                             |
   +------+-----------+-----------------------------------------------+

            Table 1 Comparison of VDP with the requirements

   By simply adding the ability to carry Layer 3 addresses, VDP can
   serve the Hypervisor-to-NVE control plane functions reasonably well.
   The other extensions are improvements of the protocol capabilities
   for a better fit in an NVO3 network.

6. Security Considerations

   External NVEs must ensure that only properly authorized Tenant
   Systems are allowed to join and become a part of any particular
   Virtual Network.  In some cases, a tNVE may want to connect to an
   authenticated nNVE for provisioning purposes; mutual authentication
   between the tNVE and nNVE is then required.  If a secure channel is
   required between the tNVE and nNVE to carry encrypted Split-NVE
   control plane protocol payloads, existing mechanisms like MACsec
   [IEEE 802.1AE] can be used.

   In addition, external NVEs will need appropriate mechanisms to
   ensure that any hypervisor wishing to use the services of an NVE is
   properly authorized to do so.  One design point is whether the
   hypervisor should supply the external NVE with necessary information
   (e.g., VM addresses, VN information, or other parameters) that the
   external NVE uses directly, or whether the hypervisor should only
   supply a VN ID and an identifier for the associated VM (e.g., its
   MAC address), with the external NVE using that information to obtain
   the information needed to validate the hypervisor-provided
   parameters or obtain related parameters in a secure manner.  The
   former approach can be used in a trusted environment, so that the
   external NVE can directly use all the information retrieved from the
   hypervisor for local configuration; it saves the external NVE the
   effort of information retrieval and/or validation.  The latter
   approach gives more reliable information, as the external NVE
   retrieves it from a management system database; in particular, some
   network-related parameters like VLAN IDs can be passed back to the
   hypervisor to be used as more authoritative provisioning.  However,
   in certain cases it is difficult or inefficient for an external NVE
   to have access to, or to query for, some information in those
   management systems.  The external NVE then has to obtain that
   information from the hypervisor.

7. IANA Considerations

   No IANA action is required.

8. Acknowledgements

   This document was initiated based on the merger of the drafts
   draft-kreeger-nvo3-hypervisor-nve-cp, draft-gu-nvo3-tes-nve-
   mechanism, and draft-kompella-nvo3-server2nve.  Thanks to all the
   co-authors and contributing members of those drafts.

   The authors would like to specially thank Lucy Yong and Jon Hudson
   for their generous help in improving this document.

9. References

9.1 Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC7365] Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
             Rekhter, "Framework for Data Center (DC) Network
             Virtualization", RFC 7365, October 2014.

   [RFC7666] Asai, H., MacFaden, M., Schoenwaelder, J., Shima, K., and
             T. Tsou, "Management Information Base for Virtual Machines
             Controlled by a Hypervisor", RFC 7666, October 2015.

   [RFC8014] Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T.
             Narten, "An Architecture for Data-Center Network
             Virtualization over Layer 3 (NVO3)", RFC 8014, December
             2016.

   [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
             2119 Key Words", BCP 14, RFC 8174, May 2017.

   [IEEE 802.1Q] IEEE, "Media Access Control (MAC) Bridges and Virtual
             Bridged Local Area Networks", IEEE Std 802.1Q-2014,
             November 2014.

9.2 Informative References

   [RFC2236] Fenner, W., "Internet Group Management Protocol, Version
             2", RFC 2236, November 1997.

   [RFC4122] Leach, P., Mealling, M., and R. Salz, "A Universally
             Unique IDentifier (UUID) URN Namespace", RFC 4122, July
             2005.

   [RFC7364] Narten, T., Gray, E., Black, D., Fang, L., Kreeger, L.,
             and M. Napierala, "Problem Statement: Overlays for Network
             Virtualization", RFC 7364, October 2014.

   [IEEE 802.1AE] IEEE, "MAC Security (MACsec)", IEEE Std 802.1AE-2006,
             August 2006.

Appendix A. IEEE 802.1Q VDP Illustration (For information only)

   VDP (the VSI Discovery and Configuration Protocol, clause 41 of
   [IEEE 802.1Q]) can be considered a controlling protocol running
   between the hypervisor and the external bridge.  The VDP association
   TLV structure is formatted as shown in Figure A.1.

   +--------+--------+------+-----+--------+------+------+------+------+
   |TLV type|TLV info|Status|VSI  |VSI Type|VSI ID|VSI ID|Filter|Filter|
   |        |string  |      |Type |Version |Format|      |Info  |Info  |
   |        |length  |      |ID   |        |      |      |format|      |
   +--------+--------+------+-----+--------+------+------+------+------+
                            |<----VSI type&instance----->|<--Filter--->|
                            |<-------------VSI attributes------------->|
   |<--TLV header--->|<-----------TLV information string ------------->|

                     Figure A.1: VDP association TLV

   There are basically four TLV types:

   1. Pre-associate: Pre-associate is used to pre-associate a VSI
   instance with a bridge port.  The bridge validates the request and
   returns a failure Status in case of errors.  A successful
   pre-associate does not imply that the indicated VSI Type or
   provisioning will be applied to any traffic flowing through the VSI.
   The pre-associate enables faster response to an associate by
   allowing the bridge to obtain the VSI Type prior to an association.

   2. Pre-associate with resource reservation: Pre-associate with
   resource reservation involves the same steps as Pre-associate, but
   on success it also reserves resources in the bridge to prepare for a
   subsequent Associate request.

   3. Associate: Associate creates and activates an association between
   a VSI instance and a bridge port.  A bridge allocates any required
   bridge resources for the referenced VSI.  The bridge activates the
   configuration for the VSI Type ID.  This association is then applied
   to the traffic flow to/from the VSI instance.

   4. De-associate: De-associate is used to remove an association
   between a VSI instance and a bridge port.  Pre-associated and
   associated VSIs can be de-associated.  De-associate releases any
   resources that were reserved as a result of prior Associate or
   Pre-associate operations for that VSI instance.

   De-associate can be initiated by either side; the other types can
   only be initiated by the server side.

   Some important flag values in the VDP Status field:

   1. M-bit (Bit 5): Indicates that the user of the VSI (e.g., the VM)
   is migrating (M-bit = 1) or provides no guidance on the migration of
   the user of the VSI (M-bit = 0).  The M-bit is used as an indicator
   relative to the VSI that the user is migrating to.

   2. S-bit (Bit 6): Indicates that the VSI user (e.g., the VM) is
   suspended (S-bit = 1) or provides no guidance as to whether the user
   of the VSI is suspended (S-bit = 0).  A keep-alive Associate request
   with S-bit = 1 can be sent when the VSI user is suspended.  The
   S-bit is used as an indicator relative to the VSI that the user is
   migrating from.
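
   As a small illustration of the M-bit and S-bit usage described
   above, the following Python sketch builds the Status flags for a VDP
   Associate request around a hot migration.  The 0-indexed bit
   positions are an assumption based on the bit numbers quoted above;
   consult [IEEE 802.1Q] clause 41 for the authoritative layout.

      # Illustrative only: Status flag construction around a VM hot
      # migration.  Bit positions are assumed, not verified here.
      M_BIT = 1 << 5   # VSI user is migrating (destination side)
      S_BIT = 1 << 6   # VSI user is suspended (source side)

      def vdp_status(migrating_in: bool = False,
                     suspended: bool = False) -> int:
          """Status flags for an Associate request (sketch)."""
          status = 0
          if migrating_in:   # destination hypervisor of the migration
              status |= M_BIT
          if suspended:      # source-side keep-alive while VM is frozen
              status |= S_BIT
          return status

      # Destination hypervisor announcing a migrated-in VM (cf. Req-13):
      assert vdp_status(migrating_in=True) == M_BIT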

   The filter information format currently defines four types.  Each
   filter information type is shown in detail as follows.

   1. VID Filter Info format

      +---------+------+-------+--------+
      |   #of   |  PS  |  PCP  |  VID   |
      | entries |(1bit)|(3bits)|(12bits)|
      |(2octets)|      |       |        |
      +---------+------+-------+--------+
                |<--Repeated per entry->|

            Figure A.2 VID Filter Info format

   2. MAC/VID Filter Info format

      +---------+--------------+------+-------+--------+
      |   #of   | MAC address  |  PS  |  PCP  |  VID   |
      | entries |  (6 octets)  |(1bit)|(3bits)|(12bits)|
      |(2octets)|              |      |       |        |
      +---------+--------------+------+-------+--------+
                |<--------Repeated per entry---------->|

            Figure A.3 MAC/VID filter format

   3. GroupID/VID Filter Info format

      +---------+--------------+------+-------+--------+
      |   #of   |   GroupID    |  PS  |  PCP  |  VID   |
      | entries |  (4 octets)  |(1bit)|(3bits)|(12bits)|
      |(2octets)|              |      |       |        |
      +---------+--------------+------+-------+--------+
                |<--------Repeated per entry---------->|

            Figure A.4 GroupID/VID filter format

   4. GroupID/MAC/VID Filter Info format

      +---------+----------+-------------+------+-----+--------+
      |   #of   | GroupID  | MAC address |  PS  | PCP |  VID   |
      | entries |(4 octets)| (6 octets)  |(1bit)|(3b )|(12bits)|
      |(2octets)|          |             |      |     |        |
      +---------+----------+-------------+------+-----+--------+
                |<-------------Repeated per entry------------->|

            Figure A.5 GroupID/MAC/VID filter format

   The null VID can be used in the VDP Request sent from the station to
   the external bridge.  Use of the null VID indicates that the set of
   VID values associated with the VSI is expected to be supplied by the
   bridge.  The set of VID values is returned to the station via the
   VDP Response.  The returned VID value can be a locally significant
   value.  When the GroupID is used, it is equivalent to the VN ID in
   NVO3.  The GroupID is provided by the station to the bridge, and the
   bridge maps the GroupID to a locally significant VLAN ID.

   The VSI ID in the VDP association TLV that identifies a VM can be in
   one of the following formats: IPv4 address, IPv6 address, MAC
   address, UUID [RFC4122], or locally defined.
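
   To make the filter formats concrete, the following Python sketch
   packs a MAC/VID Filter Info field in the layout of Figure A.3.  It
   assumes the PS, PCP, and VID subfields occupy one 16-bit word from
   the most significant bit down; that bit packing is an assumption to
   be checked against [IEEE 802.1Q].

      # Illustrative packing of a MAC/VID Filter Info field (Fig. A.3).
      import struct
      from typing import List, Tuple

      def pack_mac_vid(entries: List[Tuple[str, int, int, int]]) -> bytes:
          """entries: (mac like 'aa:bb:cc:dd:ee:ff', ps, pcp, vid)."""
          out = struct.pack("!H", len(entries))   # 2-octet entry count
          for mac, ps, pcp, vid in entries:
              mac_bytes = bytes(int(b, 16) for b in mac.split(":"))
              if len(mac_bytes) != 6 or ps > 1 or pcp > 7 or vid > 0xFFF:
                  raise ValueError("bad entry")
              word = (ps << 15) | (pcp << 12) | vid  # assumed layout
              out += mac_bytes + struct.pack("!H", word)
          return out

      # A null (zero) VID asks the bridge to assign VID(s) in response:
      payload = pack_mac_vid([("00:11:22:33:44:55", 0, 0, 0)])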

Authors' Addresses

   Yizhou Li
   Huawei Technologies
   101 Software Avenue,
   Nanjing 210012
   China

   Phone: +86-25-56625409
   EMail: liyizhou@huawei.com

   Donald Eastlake
   Huawei R&D USA
   155 Beaver Street
   Milford, MA 01757 USA

   Phone: +1-508-333-2270
   EMail: d3e3e3@gmail.com

   Lawrence Kreeger
   Arrcus, Inc

   Email: lkreeger@gmail.com

   Thomas Narten
   IBM

   Email: narten@us.ibm.com

   David Black
   Dell EMC
   176 South Street,
   Hopkinton, MA 01748 USA

   Email: david.black@dell.com