Network Working Group                                              Y. Gu
Internet-Draft                                                     Y. Li
Intended status: Standards Track                                  Huawei
Expires: April 22, 2013                                     Oct 19, 2012


            The mechanism and signalling between TES and NVE
                   draft-gu-nvo3-tes-nve-mechanism-01

Abstract

   This draft introduces the interaction required between a TES and an
   NVE when the NVE is located in a box external to the TES.  The
   signalling between TES and NVE has to be designed carefully to
   reflect all the interaction requirements.  This document describes
   the relevant considerations for such a design and also provides a
   basic analysis of potentially reusable protocols.  Currently this
   draft focuses on the general interaction procedures with relevant
   parameters and on the signalling design considerations.  It may be
   extended to give more detailed signalling design and/or solution
   recommendations in the future as NVO3's work progresses.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 22, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminologies and concepts
   3.  TES to NVE Interaction
     3.1.  Interaction Intentions
     3.2.  VM Lifetime Events
       3.2.1.  VM Creation
       3.2.2.  VM Pre-associate with NVE
       3.2.3.  VM Associate with NVE
       3.2.4.  VM Suspension
       3.2.5.  VM Resume
       3.2.6.  VM Migration
       3.2.7.  VM Termination
       3.2.8.  VM Full Lifecycle Sketch
     3.3.  Events, Interaction and Parameters
       3.3.1.  VM Pre-association
       3.3.2.  VM Association
       3.3.3.  VM Suspension
       3.3.4.  VM Resume
       3.3.5.  VM Emigration
       3.3.6.  VM Immigration
       3.3.7.  VM Termination
       3.3.8.  Keep-alive
       3.3.9.  NVE Local Changes
     3.4.  Signalling Design Considerations
       3.4.1.  General Requirements
       3.4.2.  Consideration
       3.4.3.  Signalling States Machine
   4.  Security Considerations
   5.  Appendix 1: Mechanism Analysis
     5.1.  IEEE 802.1Qbg
       5.1.1.  Brief Introduction
     5.2.  BGP
     5.3.  External Controller
   6.  References
     6.1.  Normative Reference
     6.2.  Informative Reference
   Authors' Addresses

1.  Introduction

   A Tenant End System (TES) is the physical host on which a tenant
   deploys its applications.  A tenant's applications can be deployed
   directly on a physical server or on a virtual machine residing on a
   physical server.  A tenant's virtual network, or virtual data
   center, is an overlay network that is built on the underlying
   network but is logically independent of it.  A Network
   Virtualization Edge (NVE) implements virtualization functions that
   encapsulate and decapsulate a tenant's packets, allowing for L2
   and/or L3 tenant separation and for hiding tenant addressing
   information (MAC and IP addresses).  A Tenant End System attaches to
   an NVE node, either directly or via a switched network (typically
   Ethernet).  TES and NVE can be on the same physical server or on
   separate devices.  Figures 1 to 3 show the different NVE location
   cases.
   When TES and NVE are on the same physical server, they interact
   through a proprietary internal interface, which does not require a
   standard signalling protocol.  Such a scenario is therefore not the
   target of this document.  For all other scenarios, as long as the
   signalling between TES and NVE is visible to network developers, it
   is in the scope of this draft.  We examined the different locations
   of the NVE to make sure that the signalling interaction between NVE
   and TES covers as many scenarios as possible.

   o  (NVE Location 1) NVE and TES are co-located in a physical server.
      The VM connects to the NVE on the Hypervisor.  In this case,
      there should be some mechanism that lets the Hypervisor learn of
      VM changes, including addition, deletion and migration.  The VM
      and the Hypervisor, as well as any network service appliance, are
      all controlled by the VM Manager.  The VM Manager is aware of
      every VM identity and event, hence it can easily notify the NVE
      of this information through some internal interface.  A publicly
      available standard protocol is not necessary in this case.  Refer
      to Figure 1.

                    +-------------+------------+
                    |  +--------------------+  |
                    |  |  +--------------+  |  |
                    |  |  |Overlay Module|  |  |
                    |  |  +----+---------+  |  |
                    |  |       | VN context |  |
                    |  | +-----+-------+    |  |
                    |  | |     VNI     |    |  |
                    |  | +-+---------+-+    |  |
                    |  |   |  VAPs   |      |  |
                    |  +---+---------+------+  |
                    |      |         |         |
                    |   +--+---------+---+     |
                    |   |       VM       |     |
                    |   +----------------+     |
                    |                          |
                    +--------------------------+
                         Tenant End Systems

                                Figure 1

   o  (NVE Location 2) The TES connects to an NVE on an external
      network entity next to it (Figure 2).  The VM is controlled by
      the VM Manager, while the NVE is controlled by some other
      management entity, such as a network management system.  Hence a
      proprietary protocol between TES and NVE may not fit all the
      scenarios.  A standard protocol to signal between TES and NVE is
      mandatory in this case.
      Refer to Figure 2.

                   +------- L3 Network --------+
                   |                           |
                   |      Tunnel Overlay       |
      +------------+---------+       +---------+------------+
      | +----------+-------+ |       | +-------+----------+ |
      | |  Overlay Module  | |       | |  Overlay Module  | |
      | +---------+--------+ |       | +---------+--------+ |
      |           |VN context|       |           |VN context|
      |           |          |       |           |          |
      |  +--------+-------+  |       |  +--------+-------+  |
      |  |      VNI       |  |       |  |      VNI       |  |
 NVE1 |  +-+------------+-+  |       |  +-+------------+-+  | NVE2
      |    |    VAPs    |    |       |    |    VAPs    |    |
      +----+------------+----+       +----+------------+----+
           |            |                 |            |
    -------+------------+-----------------+------------+-------
           |            |     Tenant      |            |
           |            |   Service IF    |            |
      +----+------------+--------+   +----+------------+--------+
      |  +----------------+      |   |  +----------------+      |
      |  |   Hypervisor   |      |   |  |   Hypervisor   |      |
      |  +--------+-------+      |   |  +--------+-------+      |
      |           |              |   |           |              |
      |   +-------+------+       |   |   +-------+------+       |
      |   |      VM      |       |   |   |      VM      |       |
      |   +--------------+       |   |   +--------------+       |
      |                          |   |                          |
      +--------------------------+   +--------------------------+
           Tenant End Systems             Tenant End Systems

     Figure 2: NVE Location 2: VM connects to NVE on an external
                             network entity

   o  (NVE Location 3) TES and NVE are indirectly connected.  Refer to
      Figure 3.

                  +------- L3 Network ------+
                  |                         |
                  |     Tunnel Overlay      |
     +------------+--------+       +--------+------------+
     | +----------+------+ |       | +------+----------+ |
     | | Overlay Module  | |       | | Overlay Module  | |
     | +--------+--------+ |       | +--------+--------+ |
     |          |VN Context|       |          |VN Context|
     |          |          |       |          |          |
     |  +-------+-------+  |       |  +-------+-------+  |
     |  |      VNI      |  |       |  |      VNI      |  |
NVE1 |  +-+-----------+-+  |       |  +-+-----------+-+  | NVE2
     |    |   VAPs    |    |       |    |   VAPs    |    |
     +----+-----------+----+       +----+-----------+----+  /\
          |           |                 |           |       |
       ...................           ...................    |
  -----: switched network:           : switched network:    |signalling
       ...................           ...................    |
          |           |     Tenant      |           |       |
          |           |   Service IF    |           |       \/
       Tenant End Systems            Tenant End Systems

       Figure 3: Reference model when TES and NVE are indirectly
                               connected

   In the mailing list discussion, more than one mechanism for use
   between TES and NVE was discussed, including VDP (VSI Discovery and
   Configuration Protocol), BGP and others.  This draft does not assert
   which protocol is better.  We believe that each candidate protocol
   can, with some revision or updating, be used to exchange the
   necessary events and information between TES and NVE.  The final
   decision on which one to use depends not only on functionality but
   also on other aspects, e.g. how lightweight it is to implement on a
   server, how widely it is deployed in the industry, and its
   efficiency and performance.

   This draft first presents the recommended procedures of the TES and
   NVE signalling, the key parameters of each step, and the issues that
   need to be addressed.  Then a set of signalling design
   considerations is provided, which can be used as design requirements
   for the future signalling definition.  In the appendix, we give a
   brief analysis of two existing protocols and also show how they can
   be revised to suit TES and NVE signalling.

2.  Terminologies and concepts

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The document uses terms defined in [framework].

   VN: Virtual Network.  This is a virtual L2 or L3 domain that belongs
   to a tenant.

   VNI: Virtual Network Instance.  This is one instance of a virtual
   overlay network.  Two Virtual Networks are isolated from one another
   and may use overlapping addresses.

   Virtual Network Context or VN Context: Field that is part of the
   overlay encapsulation header which allows the encapsulated frame to
   be delivered to the appropriate virtual network endpoint by the
   egress NVE.  The egress NVE uses this field to determine the
   appropriate virtual network context in which to process the packet.
   This field MAY be an explicit, unique (to the administrative domain)
   virtual network identifier (VNID) or MAY express the necessary
   context information in other ways (e.g. a locally significant
   identifier).

   VNID: Virtual Network Identifier.  In the case where the VN context
   has global significance, this is the ID value that is carried in each
   data packet in the overlay encapsulation that identifies the Virtual
   Network the packet belongs to.

   NVE: Network Virtualization Edge.  It is a network entity that sits
   on the edge of the NVO3 network.  It implements network
   virtualization functions that allow for L2 and/or L3 tenant
   separation and for hiding tenant addressing information (MAC and IP
   addresses).  An NVE could be implemented as part of a virtual switch
   within a hypervisor, a physical switch or router, a Network Service
   Appliance or even be embedded within an End Station.

   Underlay or Underlying Network: This is the network that provides the
   connectivity between NVEs.  The Underlying Network can be completely
   unaware of the overlay packets.  Addresses within the Underlying
   Network are also referred to as "outer addresses" because they exist
   in the outer encapsulation.  The Underlying Network can use a
   completely different protocol (and address family) from that of the
   overlay.

   Data Center (DC): A physical complex housing physical servers,
   network switches and routers, Network Service Appliances and
   networked storage.  The purpose of a Data Center is to provide
   application and/or compute and/or storage services.
   One such service
   is virtualized data center services, also known as Infrastructure as
   a Service.

   VM: Virtual Machine.  Several Virtual Machines can share the
   resources of a single physical computer server using the services of
   a Hypervisor (see the definition below).

   Hypervisor: Server virtualization software running on a physical
   compute server that hosts Virtual Machines.  The hypervisor provides
   shared compute/memory/storage and network connectivity to the VMs
   that it hosts.  Hypervisors often embed a Virtual Switch (see below).

   Virtual Switch: A function within a Hypervisor (typically implemented
   in software) that provides similar services to a physical Ethernet
   switch.  It switches Ethernet frames between VMs' virtual NICs within
   the same physical server, or between a VM and a physical NIC
   connecting the server to a physical Ethernet switch.  It also
   enforces network isolation between VMs that should not communicate
   with each other.

   Tenant: A customer who consumes virtualized data center services
   offered by a cloud service provider.  A single tenant may consume one
   or more Virtual Data Centers hosted by the same cloud service
   provider.

   Tenant End System: An end system of a particular tenant, which can
   be, for instance, a virtual machine (VM), a non-virtualized server,
   or a physical appliance.

   Virtual Access Points (VAPs): Tenant End Systems are connected to the
   Tenant Instance through Virtual Access Points (VAPs).  In practice,
   the VAPs can be physical ports on a ToR or virtual ports identified
   through logical interface identifiers (VLANs, internal VSwitch
   Interface ID leading to a VM).

   VN Name: A globally unique name for a VN.  The VN Name is not carried
   in data packets originating from End Stations, but must be mapped
   into an appropriate VN-ID for a particular encapsulating technology.
   Using VN Names rather than VN-IDs to identify VNs in configuration
   files and control protocols increases the portability of a VDC and
   its associated VNs when moving among different administrative domains
   (e.g. switching to a different cloud service provider).

   VSI: Virtual Station Interface.  Typically, a VSI is a virtual NIC
   connected directly with a VM.  [Qbg]

3.  TES to NVE Interaction

3.1.  Interaction Intentions

   When the TES is a non-virtualized physical server, a single physical
   interface on the NVE is exclusively attached to a single tenant and
   the attachment does not change very frequently.  In this case, the
   NVE can be pre-configured with the tenant's network properties and
   policies to execute the appropriate packet processing.  When a
   physical server moves, i.e. changes its attachment point to the
   network, the new NVE to which the server will attach in the new
   location can also be pre-configured.  In this case, no signalling
   between TES and NVE is needed.

   When the TES is a virtualized server with multiple VMs, interaction
   between TES and NVE becomes necessary.  A physical interface on the
   NVE can be attached to multiple VMs, which may belong to the same or
   to different tenants, and VMs can be moved to new locations without
   a physical shutdown, which means that the NVE cannot learn of a VM's
   attachment or detachment by checking the physical port.  As
   described in [framework], the NVE needs to establish a Virtual
   Network Instance for each tenant virtual network attached to it
   through a physical interface, so the NVE must know which tenants are
   attached to it and which VMs belong to each tenant.  The NVE must
   therefore be able to 1) identify and distinguish the VMs attached to
   it through the same physical interface; 2) identify which tenant a
   VM belongs to; 3) get the network policies that are associated with
   the tenant.
   That is why interaction signalling between TES and NVE is needed.
   The signalling between TES and NVE is of course not limited to the
   above intentions.  A closer look at the detailed processing of VM
   events reveals more signalling functions and processing on TES and
   NVE.

3.2.  VM Lifetime Events

   Not every VM has to pass through all the listed VM lifetime events.
   A VM goes through at least two of the following events, in some
   combination.

3.2.1.  VM Creation

   The VM Manager instructs the hypervisor to schedule resources on the
   server for a particular VM, including CPU, memory, storage and
   network resources.  After the VM is created on the server, it has
   the necessary resources and is ready to be launched.  Creating a VM
   does not necessarily mean that the VM is running: a VM can be
   created and left unlaunched for as long as the manager wishes, or it
   can be created and launched at once.  Launching a VM is like
   starting up a physical computer.

   Although VM creation is a very important event for a VM, the
   attached NVE need not be aware of it.

3.2.2.  VM Pre-associate with NVE

   The VM Manager decides when to launch a VM and connect it to the
   network.  Before a VM connects to the network, the operator needs to
   provision the VM's network properties and policies to the NVE that
   the VM is attached to.  Examples of network properties are the VM
   MAC address and the tenant virtual network identifier.  Examples of
   policies are ACL and QoS.  These properties and policies are not
   activated on the NVE until the VM Manager instructs the VM to
   connect to the network.  This is called Pre-association.
   Pre-association is an optional event.

3.2.3.  VM Associate with NVE

   This event means the VM is going to connect to the network.  The NVE
   has to get the VM's network properties and policies, assign
   resources and install these properties and policies.
   If there is a Pre-association before the Association, the NVE can
   reduce the time needed for Association.  Once a VM is associated, it
   can use network resources as a physical server does.

   Association can happen with or without Pre-association.  If there is
   a Pre-association before the Association, the NVE already has the
   network properties and policies stored, or even installed.  If the
   network properties and policies in the Association message are the
   same as in the Pre-association, the NVE can activate the installed
   network properties and policies.  If they are different, the
   previously reserved resources should be released, and the new
   network properties and policies are installed and activated.

3.2.4.  VM Suspension

   Creating and terminating a VM may take a considerable amount of
   time.  Instead of performing these operations, operators can suspend
   a virtual machine for the required time and quickly resume it later.
   Suspending a VM is similar to putting a real computer into sleep
   mode.  When a VM is suspended, its current state (including the
   state of all applications and processes running in the VM) is
   stored.  When the suspended virtual machine is resumed, it continues
   operating from the point at which it was suspended.

3.2.5.  VM Resume

   This event activates a suspended VM.  The suspended applications
   start again in the state in which the VM was suspended.  It is not
   always predictable when a suspended VM will be resumed.

3.2.6.  VM Migration

   There are two kinds of VM migration: hot migration (or live
   migration) and offline migration.  The processing of offline
   migration is similar to terminating the VM on one server and
   creating it on another server.  The applications running on the VM
   are interrupted and then restarted at the new location.
   In live migration, a running VM is migrated from one location to
   another, and the running applications should not be visibly
   disrupted.  There is no termination or creation during live
   migration, so it is highly important to make the NVE aware of the
   migration so that the corresponding network properties and policies
   can be correctly obtained, installed and activated at the new
   location, and removed from the old location.  Otherwise, there might
   be a security risk, and running applications may be affected or even
   interrupted.

   There are two sub-types of VM migration event: VM emigration and VM
   immigration.

   o  VM Emigrating: The VM is emigrating from this server.  Hence, all
      the relevant resources on the server and the attached NVE are
      disabled, but not yet removed, and are ready to be removed once
      the VM has successfully migrated.  If the VM fails to immigrate
      at the new location, it has to be resumed at the old location
      with the states and policies that were disabled by the old NVE.

   o  VM Immigrating: The VM is immigrating to this server.  The server
      and the attached NVE have prepared the necessary resources and
      are ready to enable the VM's properties and policies once the VM
      has successfully migrated.

3.2.7.  VM Termination

   All applications and processing on the VM are terminated.  All of
   the VM's resources on the server, including CPU, memory, storage and
   network resources, are released.  The VM no longer exists.

3.2.8.  VM Full Lifecycle Sketch

   Not every VM has to pass through all the lifetime events enumerated
   above.  The simplest VM life has only VM Creation, VM Association
   with NVE and VM Termination.  The most complex VM life has all the
   events listed above.  This section shows a sketch of a VM's full
   lifecycle with all the listed events.  This is helpful for the
   future signalling design.
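
   The lifecycle described in Sections 3.2.1 to 3.2.7 can also be
   written down as a transition table.  The following Python sketch is
   one possible reading of those sections, not a definition made by
   this draft: the state names, the event names and the choice to
   model a successful emigration as termination on the old server are
   all assumptions made for illustration, and the failure paths of
   Pre-association and Association are left out for brevity.

```python
from enum import Enum, auto


class State(Enum):
    """Hypothetical per-VM states, named after the events in Section 3.2."""
    CREATED = auto()
    PRE_ASSOCIATED = auto()
    ASSOCIATED = auto()
    SUSPENDED = auto()
    EMIGRATING = auto()
    IMMIGRATING = auto()
    TERMINATED = auto()


# (current state, event) -> next state.  Event names are illustrative only.
TRANSITIONS = {
    (State.CREATED, "pre_associate"): State.PRE_ASSOCIATED,
    # Pre-association is optional, so a created VM may associate directly.
    (State.CREATED, "associate"): State.ASSOCIATED,
    (State.PRE_ASSOCIATED, "associate"): State.ASSOCIATED,
    (State.ASSOCIATED, "suspend"): State.SUSPENDED,
    (State.SUSPENDED, "resume"): State.ASSOCIATED,
    (State.ASSOCIATED, "emigrate"): State.EMIGRATING,
    # A failed immigration elsewhere resumes the VM at the old location.
    (State.EMIGRATING, "immigration_failed"): State.ASSOCIATED,
    (State.EMIGRATING, "immigration_succeeded"): State.TERMINATED,
    (State.IMMIGRATING, "immigration_succeeded"): State.ASSOCIATED,
    (State.IMMIGRATING, "immigration_failed"): State.TERMINATED,
}


def next_state(state, event):
    """Return the state reached from `state` on `event`, or raise ValueError."""
    # Assumption: a VM that still exists can be terminated from any state.
    if event == "terminate" and state is not State.TERMINATED:
        return State.TERMINATED
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in {state.name}") from None


def run(events, start=State.CREATED):
    """Apply a sequence of events to a freshly created VM."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state
```

   For example, run(["associate", "terminate"]) walks through the
   simplest VM life mentioned above, while an event that the table does
   not allow, such as "resume" on a VM that was never suspended, is
   rejected with an error.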

/~~~~~~~~~~~~\             /~~~~~\
|VM Terminate|--Aged out-->|NULL |
\~~~~~~~~~~~~/             \~~~~~/
   ^                          |
VM Terminate                  v
   |                 /~~~~~~~~~~~\
   +-----------------|VM Creation|<---------.
   |                 \~~~~~~~~~~~/          |
   |                       |              Fail
   |                       v                |
   |              /~~~~~~~~~~~~~~~~\        |
   +--------------|VM Pre-Associate|--------.
   |              |with NVE        |<-------.
   |              \~~~~~~~~~~~~~~~~/        |
   |                       |              Fail
   |                       v                |
   +----------------/~~~~~~~~~~~~~\<--------|-----------------.
   |   .----------->|VM Associate |---------.                 |
   |   |            |with NVE     |<--------.                 |
   |   |            \~~~~~~~~~~~~~/         |     Successful Immigration
   |VM Resume         | or |  or |          |         to this server
   |   |              |    .---. .---.      |                 |
   |   |              v        |     |      |           /~~~~~~~~~~~~~~\
   +---|-----/~~~~~~~~~~~~~\   |     .------|---------->|VM Immigrating|
   |   .-----|VM Suspension|   |            |           \~~~~~~~~~~~~~~/
   |         \~~~~~~~~~~~~~/   |            |                 |
   |                           |   Failed Immigration         |
   |                           |     to other server          |
   |                           v            |                 |
   |                    /~~~~~~~~~~~~~\     |        Failed Immigration
   +--------------------|VM Emigrating|-----.          to this server
   |                    \~~~~~~~~~~~~~/                       |
   |                           |                              |
   |        Successful Immigration to other server            |
   |                           |                              |
   +---------------------------.                              |
   |                                                          |
   +----------------------------------------------------------.

                  Figure 4: VM Full Lifecycle Sketch

3.3.  Events, Interaction and Parameters

   This section describes the interaction, the parameters and the
   special concerns for each VM event.  The interaction is closely
   related to the VM lifetime events, but the mapping is not
   one-to-one; for example, there is no interaction for VM Creation.
   For VM events, the interaction is initiated by the hypervisor on
   behalf of a VM and sent to the VNI on the attached NVE.  This is not
   always the case, however, since the NVE may also initiate the
   interaction if changes happen on the NVE that must be learned by
   particular VMs.

3.3.1.  VM Pre-association

   o  Interaction: This event triggers the Hypervisor to compose a
      Pre-association message and send it to the NVE.  On receiving
      the Pre-association message, the NVE needs to authorize the VM
      and/or Hypervisor, obtain the VM's network properties and
      policies, and install the properties and policies on the NVE.

   o  Parameters: The signalling from TES to NVE should at least include
      the following mandatory parameters.

      *  Operation, i.e. Pre-association.

      *  VMID, a globally unique ID in the Data Center for a VM.  A VM
         can have more than one MAC address and belong to more than one
         VNID, so a VMID is necessary for the NVE to associate the
         VNIDs and MACs with the particular VM.

      *  VNID(s), a globally unique ID in the Data Center for a tenant's
         virtual network.

      *  MAC addresses.  A VM may have more than one MAC address and
         may also belong to more than one virtual network, so the MAC
         address(es) and VNID(s) should be presented in a way that lets
         the NVE identify which MAC addresses belong to which VNID.

      *  Policies, including ACL, QoS, Priority, etc.  If more than one
         VNID is associated with the VM, it should be explicitly
         indicated which VNID each policy belongs to.

   o  Response: After the NVE processes the Pre-association message, it
      responds to the TES with the processing result.  The response can
      be SUCCESS or FAIL, with a reason such as FAILED AUTHORIZATION,
      CONFLICT POLICIES (e.g. the provisioned policies conflict with
      other policies existing on the NVE) or NON-SUFFICIENT RESOURCES
      (e.g. the NVE does not have enough resources to install the
      provisioned policies).

3.3.2.  VM Association

   o  Interaction: This event triggers the Hypervisor to compose an
      Association message and send it to the NVE.  Association can
      happen with or without a preceding Pre-association message.

      *  If there is a Pre-association message before the Association,
         the NVE needs to compare the information provided by the
         Pre-association and the Association.  If they are the same,
         the NVE can activate the pre-installed resources.  If they are
         different, the NVE needs to do some additional work, depending
         on what information has changed between Pre-association and
         Association.  For example, if a policy or VNID has changed,
         the NVE needs to update its stored information.

      *  If there is no Pre-association message before the Association,
         the NVE needs to perform authorization, obtain the VM's
         network properties and policies, and install and activate the
         properties and policies on the NVE.

      *  If there is another successful Association message before this
         Association, the NVE needs to compare the information
         provisioned by the previous Association with this Association.
         If everything is the same, the NVE does nothing except update
         the VM's timer.  If there is a difference, the NVE needs to do
         some additional work, depending on what information has
         changed.  For example, if a policy or VNID has changed, the
         NVE needs to update its stored information.

   o  Parameters: The signalling from TES to NVE should at least include
      the following mandatory parameters.

      *  Operation, i.e. Association.

      *  VMID

      *  VNID(s)

      *  MAC addresses

      *  Policies

   o  Response: After the NVE processes the Association message, it
      responds to the TES with the processing result.  The response can
      be SUCCESS or FAIL, with a reason such as FAILED AUTHORIZATION,
      CONFLICT POLICIES (e.g. the provisioned policies conflict with
      other policies existing on the NVE) or NON-SUFFICIENT RESOURCES
      (e.g. the NVE does not have enough resources to install the
      provisioned policies).

3.3.3.  VM Suspension

   o  Interaction: This event triggers the Hypervisor to compose a
      Suspension message, or an Association message with a Suspension
      indication, and send it to the NVE.
      Suspension must happen after a successful Association.  On
      receiving a Suspension message, the NVE deactivates, but does not
      remove, the VM's resources and prepares for the next Resume
      message.  In the suspended state, the NVE behaves much as it does
      in the Pre-association state.  The FDB can be aged out during VM
      suspension.

   o  Parameters: The signalling from TES to NVE should at least include
      the following mandatory parameters.

      *  Operation, i.e. Suspension or an Association message with a
         Suspension indication.

      *  VMID

   o  Response: After the NVE processes the Suspension message, it
      responds to the TES with the processing result.  The response can
      be SUCCESS or FAIL.  If it is FAIL, this may be because the NVE
      is too busy to process the message.

3.3.4.  VM Resume

   o  Interaction: This event triggers the Hypervisor to compose a
      Resume message, or an Association message with a Resume
      indication, and send it to the NVE.  Resume is supposed to happen
      after a successful Suspension message; otherwise, the NVE
      responds with a SUCCESS message and takes no further action.  On
      receiving a Resume message, the NVE activates the VM's resources.

   o  Parameters: The signalling from TES to NVE should at least include
      the following mandatory parameters.

      *  Operation, i.e. Resume or an Association message with a Resume
         indication.

      *  VMID

   o  Response: After the NVE processes the Resume message, it responds
      to the TES with the processing result.  The response can be
      SUCCESS or FAIL.  If it is FAIL, this may be because the NVE is
      too busy to process the message.

3.3.5.  VM Emigration

   o  Interaction: This event triggers the Hypervisor to compose an
      Emigration message, or an Association message with an Emigration
      indication, and send it to the NVE.  Emigration can happen after
      Pre-association, Association, Suspension or Resume.
   o  On receiving a VM Emigration message or indication, NVE
      deactivates the VM's resources.  However, NVE does not
      immediately remove the VM's resources and state, because an
      emigration may fail if the immigration on the remote server or
      NVE fails.  In that case, the emigrating VM may need to continue
      its work on the current server.  NVE waits for a subsequent
      Termination message before removing the VM's resources and state
      on NVE.

   o  Parameters: The signalling from TES to NVE should at least include
      the following mandatory parameters.

      *  Operation, i.e. Emigration or an Association message with
         Emigration indication

      *  VMID

   o  Response: After NVE processes the VM Emigration message, it
      responds to TES with the processing result.  The response can be
      SUCCESS or FAIL.  A FAIL may occur because NVE is too busy to
      process the message.

3.3.6.  VM Immigration

   o  Interaction: This event will trigger Hypervisor to compose an
      Immigration message, or a Pre-association/Association message
      with Immigration indication, referred to as
      Immigration(Pre-Asso) and Immigration(Asso) respectively.  NVE's
      reaction to VM Immigration is similar to its reaction to
      Pre-association or Association.  If the result of Immigration
      processing is FAIL, the VM will not migrate to the new location
      and continues its work on the old server.  The VM Manager may
      have to find another location for the VM to migrate to.

   o  It is meaningful to distinguish Immigration from Pre-association
      and Association.  [statemigration-framework] describes the
      problem of migrating a VM's flow-coupled state during VM live
      migration.  The Immigration message can be an indication of, or
      a trigger for, the flow-coupled state migration on middleboxes.

   o  Parameters: The signalling from TES to NVE should at least include
      the following mandatory parameters.

      *  Operation, i.e. Immigration or a (Pre-)Association message
         with Immigration indication.
      *  VMID

      *  VNID(s)

      *  MAC addresses

      *  Policies

   o  Response: After NVE processes the Immigration message, it responds
      to TES with the processing result.  The response can be SUCCESS or
      FAIL, with a reason such as FAILED AUTHORIZATION, CONFLICT
      POLICIES (e.g. the provisioned policies conflict with other
      policies already existing on NVE), or NON-SUFFICIENT RESOURCES
      (e.g. NVE does not have enough resources to install the
      provisioned policies).

3.3.7.  VM Termination

   o  Interaction: This event will trigger Hypervisor to compose a
      Termination message.  NVE will release the VM's resources on NVE
      and remove all state about this VM.

   o  Parameters: The signalling from TES to NVE should at least include
      the following mandatory parameters.

      *  Operation, i.e. Termination

      *  VMID

   o  Response: After NVE processes the Termination message, it
      responds to TES with the processing result.  The response can be
      SUCCESS or FAIL.  A FAIL may occur because NVE is too busy to
      process the Termination message; however, the VM can be
      terminated on the server anyway.

3.3.8.  Keep-alive

   This is not a VM lifetime event.  Since the resources on NVE are
   precious, if an associated, pre-associated or suspended VM stays
   idle for a pre-defined time, NVE will remove the VM's resources so
   that NVE can serve other active VMs.  To keep the VM's resources on
   NVE, Hypervisor has to send a Keep-alive message, or a
   Pre-association/Association message with Keep-alive indication.
   NVE updates the VM's timer upon receiving the Keep-alive message.

   Parameters: The signalling from TES to NVE should at least include
   the following mandatory parameters.

   o  Operation, i.e. Keep-alive or a (Pre-)Association message with
      Keep-alive indication.

   o  VMID

3.3.9.  NVE Local Changes

   When a VM associates with a VNID on NVE, NVE will generate locally
   significant indicators for the VM and VNIDs, e.g. a VID.
   If the indicators were sent to Hypervisor in a previous response and
   change later on, NVE needs to create an Association message or a
   dedicated message carrying the changed indicators and send it to
   Hypervisor, and Hypervisor will respond with the processing result.

   Note: Although this section uses the names of the VM lifetime events
   as the names of messages, this does not mean that the future
   signalling needs a dedicated message for each event.  Some of the
   events can be carried in one signalled message with different
   operation types, for example an Association message with Immigration
   indication or an Association message with Suspension indication.

3.4.  Signalling Design Considerations

3.4.1.  General Requirements

3.4.1.1.  Basic Requirements

   REQUIREMENT-1:  The TNS (TES to NVE Signalling) MUST allow TES to
      notify NVE about the VM's events, including but not limited to
      Pre-Association, Association, Emigration, Immigration and
      Termination.

   REQUIREMENT-2:  The TNS MUST allow TES to notify NVE about the VM's
      VNID, which can be one identifier or a combination of several
      identifiers.

   REQUIREMENT-3:  The TNS MUST allow TES to notify NVE about the VM's
      address.  The address MUST include the MAC address of the VM's
      virtual NIC, the VM's IP address, or both, and it SHOULD be
      extensible to carry new address types.

   REQUIREMENT-4:  The TNS MUST allow NVE to notify TES about the VM's
      local tag.  The local tag types supported by the TNS MUST include
      the IEEE 802.1Q tag, and the TNS SHOULD be extensible to carry
      other types of local tag.

3.4.1.2.  Extension Requirements

   REQUIREMENT-5:  The TNS SHOULD allow NVE to notify TES about the
      VM's traffic PCP value.

   In a typical DC, where a physical server connects to an adjacent
   bridge, a data frame from the server can be tagged with PCP or
   untagged.  If a data frame is untagged, it can be tagged with PCP on
   the adjacent bridge.
   In a virtualized DC, by contrast, the adjacent bridge is Hypervisor.
   There are three options for handling the PCP tag: 1) the data frame
   is tagged with PCP by the VM, 2) the data frame is tagged with PCP
   by Hypervisor, and 3) the data frame is tagged with PCP by NVE.

   In a cloud service, a VM can belong to anybody, and it may want a
   higher priority than it is entitled to.  The VM can tag its data
   frames with a higher PCP value and get better service.  Based on the
   assumption that the PCP provided by a VM is not reliable, it is more
   reasonable to let the network define the PCP value based on the VM's
   priority and have the bridges tag the PCP value, as in 2) or 3).

   This problem is similar to that of the local VID, which can be
   tagged either by Hypervisor or by NVE.  The benefit of tagging PCP
   in Hypervisor is that it reduces the load on NVE.

3.4.2.  Consideration

   To be added.

3.4.3.  Signalling State Machine

   The interaction should be stateful.  Both Hypervisor and NVE need to
   record the state of their signalling.  The main states are
   Pre-association, Association, Suspension, and Termination.  Figure 5
   shows the state machine of TES to NVE signalling.  Only reasonable
   situations are listed in the diagram.  In the future, more
   situations will be added to the state machine.
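   As a complement to Figure 5, the transitions can be written down as
   a lookup table.  The following is a minimal Python sketch of one
   reading of the diagram, not a normative definition.  The names
   State, INIT, TRANSITIONS and next_state are hypothetical and
   introduced only for illustration; INIT stands for the unnamed start
   node of the figure, and the unlabelled edge from Termination back to
   the start node is left out.

```python
from enum import Enum


class State(Enum):
    INIT = "Init"  # assumption: name for the unnamed start node in Figure 5
    PRE_ASSOCIATION = "Pre-Association"
    ASSOCIATION = "Association"
    SUSPENSION = "Suspension"
    TERMINATION = "Termination"


# (current state, message or event) -> next state, as read from Figure 5.
TRANSITIONS = {
    (State.INIT, "Pre-Asso"): State.PRE_ASSOCIATION,
    (State.INIT, "Immigration(Pre-Asso)"): State.PRE_ASSOCIATION,
    (State.INIT, "Asso"): State.ASSOCIATION,
    (State.INIT, "Immigration(Asso)"): State.ASSOCIATION,
    (State.PRE_ASSOCIATION, "Asso"): State.ASSOCIATION,
    (State.PRE_ASSOCIATION, "Immigration(Asso)"): State.ASSOCIATION,
    (State.PRE_ASSOCIATION, "Aged out"): State.TERMINATION,
    (State.ASSOCIATION, "Suspension"): State.SUSPENSION,
    (State.ASSOCIATION, "Emigration"): State.ASSOCIATION,
    (State.ASSOCIATION, "Immigration(Asso)"): State.ASSOCIATION,
    (State.ASSOCIATION, "Aged out"): State.TERMINATION,
    (State.ASSOCIATION, "Termination"): State.TERMINATION,
    (State.SUSPENSION, "Resume"): State.ASSOCIATION,
    (State.SUSPENSION, "Aged out"): State.TERMINATION,
    (State.SUSPENSION, "Termination"): State.TERMINATION,
}


def next_state(state: State, event: str) -> State:
    """Return the state reached from `state` on `event`.

    Raises ValueError for combinations that Figure 5 does not list.
    """
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not listed for state {state.value}")
```

   A table keeps the "only reasonable situations are listed" property
   explicit: any pair that is absent is rejected, and new situations
   can be added as new entries.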
         |------------------->/```\----------------------|
         |                    \~~~/                      |
         |                      |Pre-Asso                |
         |                      |or                      |
         |                      |Immigration(Pre-Asso)   |
   /~~~~~~~~~~~\  Aged out      v                        |
   |Termination|<----|  /~~~~~~~~~~~~~~~~\              Asso
   \~~~~~~~~~~~/<-\  ---|Pre-Association |               or
         ^         \    \~~~~~~~~~~~~~~~~/       Immigration(Asso)
         |          \           |                        |
     Aged out     Aged out      |Asso                    |
        or           or         |or                      |
    Termination  Termination    |Immigration(Asso)       |
         |            \----|    v                        |
   /~~~~~~~~~~~\Suspension/~~~~~~~~~~~~~\                |
   |Suspension |<---------| Association |<---------------|
   \~~~~~~~~~~~/--------->\~~~~~~~~~~~~~/
                 Resume      /        ^
                            /          \
     /~~~\                 |            |
     \~~~/ States          |-Emigration-|
                                 or
                         Immigration(Asso)
     ------ Message

            Figure 5: TES to NVE signalling State Machine

4.  Security Considerations

   There are some considerations on security in [overlay-cp].  Most of
   them concern the mechanism between NVE and an external controller
   and attacks on the underlying network, which cannot be resolved by
   the mechanism between TES and NVE alone.  One security issue related
   to the mechanism between TES and NVE is the authentication of a VM
   that announces that it wants to associate with a particular VN.
   There is a hypervisor between VMs and NVEs, and neither VMs nor the
   hypervisor are always reliable.  For example, a poisoned hypervisor
   may modify the VN Name, or an identifier serving a similar purpose,
   in order to associate with a VN that it does not belong to.

5.  Appendix 1: Mechanism Analysis

5.1.  IEEE 802.1Qbg

5.1.1.  Brief Introduction

   VDP has four basic TLV types.

   o  Pre-Associate: Pre-Associate is used to pre-associate a VSI
      instance with a bridge port.  The bridge validates the request and
      returns a failure Status in case of errors.  Successful pre-
      association does not imply that the indicated VSI Type will be
      applied to any traffic flowing through the VSI.
      The pre-associate enables faster response to an associate, by
      allowing the bridge to obtain the VSI Type prior to an
      association.

   o  Pre-Associate with resource reservation: Pre-Associate with
      Resource Reservation involves the same steps as Pre-Associate, but
      on successful pre-association also reserves resources in the
      Bridge to prepare for a subsequent Associate request.

   o  Associate: The Associate TLV Type creates and activates an
      association between a VSI instance and a bridge port.  The Bridge
      allocates any required bridge resources for the referenced VSI.
      The Bridge activates the configuration for the VSI Type ID.  This
      association is then applied to the traffic flow to/from the VSI
      instance.

   o  Deassociate: The De-associate TLV Type is used to remove an
      association between a VSI instance and a bridge port.  Pre-
      Associated and Associated VSIs can be de-associated.  De-associate
      releases any resources that were reserved as a result of prior
      Associate or Pre-Associate operations for that VSI instance.
|1       |2        |3       |4       |7       |8     |9      |25         |26          |25+M
|--------+---------+--------+--------+--------+------+-------+-----------+------------|
|TLV type|TLV info | Status |VSI Type|VSI Type|VSIID |VSIID  |Filter Info|Filter Info |
|(7bits) |strlength|(1octet)|   ID   |version |format|(16oct)|  format   | (M octets) |
|        | (9bits) |        | (3oct) | (1oct) |(1oct)|       | (1 octet) |            |
|--------+---------+--------+--------+--------+------+-------+-----------+------------|
|                           |<-------VSI type&instance------>|<-------Filter--------->|
|                           |<--------------------VSI attributes--------------------->|
|<----TLV header-->|<-------------TLV information string = 23+M octets-------------->|

                      Figure 6: VDP TLV definitions

   Some important flag values in a VDP request:

   o  M-bit (Bit 5): Indicates that the user of the VSI (e.g., the VM)
      is migrating (M-bit = 1) or provides no guidance on the migration
      of the user of the VSI (M-bit = 0).  The M-bit is used as an
      indicator relative to the VSI that the user is migrating to.

   o  S-bit (Bit 6): Indicates that the VSI user (e.g., the VM) is
      suspended (S-bit = 1) or provides no guidance as to whether the
      user of the VSI is suspended (S-bit = 0).  A keep-alive Associate
      request with S-bit = 1 can be sent when the VSI user is suspended.
      The S-bit is used as an indicator relative to the VSI that the
      user is migrating from.
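   The two flags above can be tested with simple bit masks.  The
   following Python sketch is illustrative only.  It assumes that bits
   are numbered from 1 starting at the least-significant bit of the
   octet that carries the flags, so that Bit 5 is 0x10 and Bit 6 is
   0x20; this document does not state the numbering or which octet is
   meant, so both are assumptions.  The names M_BIT, S_BIT and
   decode_vdp_flags are hypothetical.

```python
# Assumption: bits are numbered 1..8 with bit 1 the least-significant bit.
M_BIT = 1 << (5 - 1)  # "Bit 5": the VSI user is migrating
S_BIT = 1 << (6 - 1)  # "Bit 6": the VSI user is suspended


def decode_vdp_flags(flags_octet: int) -> dict:
    """Extract the M-bit and S-bit from a single flags octet."""
    if not 0 <= flags_octet <= 0xFF:
        raise ValueError("flags must fit in one octet")
    return {
        "migrating": bool(flags_octet & M_BIT),
        "suspended": bool(flags_octet & S_BIT),
    }
```

   A value of False means only that the request gives no guidance, in
   line with the meaning of M-bit = 0 and S-bit = 0 described above.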
   The filter information field supports the following formats:

   o  VID

      +---------+------+-------+--------+
      |  #of    |  PS  |  PCP  |  VID   |
      |entries  |(1bit)|(3bits)|(12bits)|
      |(2octets)|      |       |        |
      +---------+------+-------+--------+
                |<--Repeated per entry->|

                         Figure 7

   o  MAC/VID

      +---------+--------------+------+-------+--------+
      |  #of    | MAC address  |  PS  |  PCP  |  VID   |
      |entries  |  (6 octets)  |(1bit)|(3bits)|(12bits)|
      |(2octets)|              |      |       |        |
      +---------+--------------+------+-------+--------+
                |<--------Repeated per entry---------->|

                         Figure 8

   o  GroupID/VID

      +---------+--------------+------+-------+--------+
      |  #of    |   GroupID    |  PS  |  PCP  |  VID   |
      |entries  |  (4 octets)  |(1bit)|(3bits)|(12bits)|
      |(2octets)|              |      |       |        |
      +---------+--------------+------+-------+--------+
                |<--------Repeated per entry---------->|

                         Figure 9

   o  GroupID/MAC/VID

      +---------+-----------+-------------+------+-------+--------+
      |  #of    |  GroupID  | MAC address |  PS  |  PCP  |  VID   |
      |entries  |(4 octets) | (6 octets)  |(1bit)|(3bits)|(12bits)|
      |(2octets)|           |             |      |       |        |
      +---------+-----------+-------------+------+-------+--------+
                |<--------------Repeated per entry--------------->|

                         Figure 10

   In each format, the null VID can be used in the VDP Request.  In this
   case, the Bridge is expected to supply the corresponding local VID
   value in the VDP Response.

   The VSIID in a VDP request, which identifies a VM, can be in one of
   the following formats: IPv4 address, IPv6 address, MAC address, UUID
   or locally defined.
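   To make the filter information layouts in Figures 7 and 8 concrete,
   the following Python sketch packs a MAC/VID filter.  It is a sketch
   under stated assumptions, not an implementation of the IEEE
   encoding: it assumes network byte order, that PS, PCP and VID share
   one 16-bit field in the left-to-right order of the figures (PS in
   the most significant bit), and that the null VID is encoded as 0.
   The function names are hypothetical.

```python
import struct


def pack_vid_entry(ps: int, pcp: int, vid: int) -> bytes:
    """Pack PS (1 bit), PCP (3 bits) and VID (12 bits) into two octets."""
    if ps not in (0, 1):
        raise ValueError("PS is a single bit")
    if not 0 <= pcp <= 7:
        raise ValueError("PCP is a 3-bit value")
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VID is a 12-bit value")
    # Assumed layout: PS | PCP | VID, most significant bit first.
    return struct.pack("!H", (ps << 15) | (pcp << 12) | vid)


def pack_mac_vid_filter(entries) -> bytes:
    """Pack a MAC/VID filter: a 2-octet entry count, then the entries.

    `entries` is a list of (mac, ps, pcp, vid) tuples, where `mac` is a
    6-octet bytes object.
    """
    out = struct.pack("!H", len(entries))
    for mac, ps, pcp, vid in entries:
        if len(mac) != 6:
            raise ValueError("MAC address must be 6 octets")
        out += mac + pack_vid_entry(ps, pcp, vid)
    return out
```

   For example, a request that wants the Bridge to choose the local VID
   could pass vid=0 (the null VID under the assumption above) for each
   entry and read the assigned VID from the response.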
   +--------------------------------------------------+----------------+
   | VDP features                                     | Requirements   |
   |                                                  | Matching       |
   +--------------------------------------------------+----------------+
   | Pre-Associate/ Pre-Associate with resource       | Requirement-1  |
   | reservation/ Associate/ Deassociate              |                |
   | M-bit/S-bit                                      | Requirement-1  |
   | VSI type&instance in VDP request                 | Requirement-2  |
   | Filter Info                                      | Requirement-3  |
   | VID info in VDP response                         | Requirement-4  |
   | PCP in VDP response                              | Requirement-5  |
   +--------------------------------------------------+----------------+

                             VDP TLV types

5.2.  BGP

   [server2nve] gives a brief analysis of how BGP can be reused for TES
   to NVE signalling.  Please refer to it for more information.

5.3.  External Controller

6.  References

6.1.  Normative Reference

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", March 1997.

   [Qbg]      "IEEE P802.1Qbg Edge Virtual Bridging".

6.2.  Informative Reference

   [framework]
              Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
              Rekhter, "draft-ietf-nvo3-framework-00", September 2012.

   [overlay-cp]
              Kreeger, L., Dutt, D., Narten, T., Black, D., and M.
              Sridharan, "draft-kreeger-nvo3-overlay-cp-00", Jan 2012.

   [server2nve]
              Kompella, K.,
              "draft-dunbar-nvo3-overlay-mobility-issues-00", July 2012.

   [statemigration-framework]
              Gu, Y., Shore, M., and S. Sivakumar, "A Framework and
              Problem Statement for Flow-associated Middlebox State
              Migration", October 2012.

Authors' Addresses

   Gu Yingjie
   Huawei
   No. 101 Software Avenue
   Nanjing, Jiangsu Province  210001
   P.R.China

   Phone: +86-25-56625392
   Email: guyingjie@huawei.com

   Yizhou Li
   Huawei
   No. 101 Software Avenue
   Nanjing, Jiangsu Province  210001
   P.R.China

   Phone:
   Email: liyizhou@huawei.com