NVO3 working group                                        A. Ghanwani
Internet Draft                                                   Dell
Intended status: Informational                              L. Dunbar
Expires: August 9, 2016                                    M. McBride
                                                               Huawei
                                                            V. Bannai
                                                               Google
                                                          R. Krishnan
                                                                 Dell
                                                    February 10, 2016


                   A Framework for Multicast in NVO3
                  draft-ietf-nvo3-mcast-framework-02


Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79. This document may not be modified,
   and derivative works of it may not be created, except to publish it
   as an RFC and to translate it into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on August 9, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   This document discusses a framework for supporting multicast traffic
   in a network that uses Network Virtualization Overlays over Layer 3
   (NVO3). Both infrastructure multicast and application-specific
   multicast are discussed. It describes the various mechanisms that
   can be used for delivering such traffic as well as the data plane
   and control plane considerations for each of the mechanisms.

Table of Contents

   1. Introduction
      1.1. Infrastructure multicast
      1.2. Application-specific multicast
      1.3. Terminology clarification
   2. Acronyms
   3. Multicast mechanisms in networks that use NVO3
      3.1. No multicast support
      3.2. Replication at the source NVE
      3.3. Replication at a multicast service node
      3.4. IP multicast in the underlay
      3.5. Other schemes
   4. Simultaneous use of more than one mechanism
   5. Other issues
      5.1. Multicast-agnostic NVEs
      5.2. Multicast membership management for DC with VMs
   6. Summary
   7. Security Considerations
   8. IANA Considerations
   9. References
      9.1. Normative References
      9.2. Informative References
   10. Acknowledgments

1. Introduction

   Network Virtualization using Overlays over Layer 3 (NVO3) is a
   technology that is used to address issues that arise in building
   large, multitenant data centers that make extensive use of server
   virtualization [RFC7364].

   This document provides a framework for supporting multicast traffic
   in a network that uses NVO3. Both infrastructure multicast (ARP/ND,
   DHCP, mDNS, etc.) and application-specific multicast are considered.
   It describes the various mechanisms that can be used for delivering
   such traffic in networks that use NVO3, along with the
   considerations that apply to each.

   The reader is assumed to be familiar with the terminology defined in
   the NVO3 Framework document [RFC7365] and the NVO3 Architecture
   document [NVO3-ARCH].

1.1. Infrastructure multicast

   Infrastructure multicast includes protocols such as ARP/ND, DHCP,
   and mDNS. It is possible to provide solutions for these that do not
   involve multicast in the underlay network. In the case of ARP/ND,
   an NVA can be used to distribute the IP-address-to-MAC-address
   mappings to all NVEs, and each NVE can then respond to ARP/ND
   messages from the TSs attached to it in a way that is similar to
   proxy ARP. In the case of DHCP, the NVE can be configured to forward
   these messages using a helper function.

   Of course, it is possible to support all of these infrastructure
   multicast protocols natively if the underlay provides multicast
   transport. However, even in the presence of multicast transport, it
   may be beneficial to use the optimizations mentioned above to reduce
   the amount of such traffic in the network.

1.2. Application-specific multicast

   Application-specific multicast traffic, which may be either
   Source-Specific Multicast (SSM) or Any-Source Multicast (ASM)
   [RFC3569], has the following characteristics:

   1. Receiver hosts are expected to subscribe to multicast content
      using protocols such as IGMP [RFC3376] (IPv4) or MLD (IPv6).
      Multicast sources and listeners participate in these protocols
      using addresses that are in the Tenant System address domain.

   2. The list of multicast listeners for each multicast group is not
      known in advance. Therefore, it may not be possible for an NVA
      to get the list of participants for each multicast group ahead
      of time.

1.3. Terminology clarification

   In this document, the terms host, tenant system (TS), and virtual
   machine (VM) are used interchangeably to represent an end station
   that originates or consumes data packets.

2. Acronyms

   ASM: Any-Source Multicast

   LISP: Locator/ID Separation Protocol

   MSN: Multicast Service Node

   NVA: Network Virtualization Authority

   NVE: Network Virtualization Edge

   NVGRE: Network Virtualization using GRE

   SSM: Source-Specific Multicast

   STT: Stateless Transport Tunneling

   TS: Tenant System

   VM: Virtual Machine

   VXLAN: Virtual eXtensible LAN

3. Multicast mechanisms in networks that use NVO3

   In NVO3 environments, traffic between NVEs is transported using an
   encapsulation such as VXLAN [RFC7348], NVGRE [RFC7637], or STT
   [STT].

   Besides the need to support the Address Resolution Protocol (ARP)
   and Neighbor Discovery (ND), there are several applications that
   require the support of multicast and/or broadcast in data centers
   [DC-MC]. With NVO3, there are many possible ways that multicast may
   be handled in such networks. We discuss some of the attributes of
   the following four methods:

   1. No multicast support.

   2. Replication at the source NVE.

   3. Replication at a multicast service node.

   4. IP multicast in the underlay.

   These mechanisms are briefly mentioned in the NVO3 Framework
   [RFC7365] and NVO3 Architecture [NVO3-ARCH] documents.
   This document attempts to provide more details about each of these
   mechanisms and discusses the issues and tradeoffs of each.

   We note that other methods are also possible, such as [EDGE-REP],
   but we focus on the above four because they are the most common.

3.1. No multicast support

   In this scenario, there is no support whatsoever for multicast
   traffic when using the overlay. This method can only work if the
   following conditions are met:

   1. All of the application traffic in the network is unicast
      traffic, and the only multicast/broadcast traffic is from ARP/ND
      protocols.

   2. A network virtualization authority (NVA) is used by the NVEs to
      determine the mapping of a given Tenant System's MAC/IP address
      to its NVE. In other words, there is no data plane learning.
      Address resolution requests via ARP/ND that are issued by the
      Tenant Systems must be resolved by the NVE that they are
      attached to.

   With this approach, it is not possible to support
   application-specific multicast. However, certain multicast/broadcast
   applications such as DHCP can be supported by use of a helper
   function in the NVE.

   The main drawback of this approach, even for unicast traffic, is
   that it is not possible to initiate communication with a Tenant
   System for which a mapping to an NVE does not already exist at the
   NVA. This is a problem in the case where the NVE is implemented in
   a physical switch and the Tenant System is a physical end station
   that has not registered with the NVA.

3.2. Replication at the source NVE

   With this method, the overlay attempts to provide a multicast
   service without requiring any specific support from the underlay,
   other than that of a unicast service.
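   As a minimal illustration of providing multicast over a unicast-only
   underlay, the Python sketch below turns one tenant frame into one
   unicast tunnel copy per destination NVE. The UnicastCopy record and
   the replicate_at_source() function are hypothetical, as is the
   assumption that the list of destination NVEs is handed to the source
   NVE (for example, by an NVA); addresses are taken from the
   192.0.2.0/24 documentation range.

```python
# Hypothetical sketch of replication at the source NVE (head-end
# replication) over a unicast-only underlay. Not an NVO3-defined API.
from dataclasses import dataclass


@dataclass(frozen=True)
class UnicastCopy:
    outer_src: str      # underlay IP address of the source NVE
    outer_dst: str      # underlay IP address of one destination NVE
    vni: int            # virtual network identifier carried in the header
    inner_frame: bytes  # tenant frame, carried unchanged


def replicate_at_source(local_nve, vni, inner_frame, replication_list):
    """Return one unicast-encapsulated copy per remote NVE.

    replication_list is assumed to be supplied to the source NVE (e.g.,
    by an NVA) and to contain underlay addresses of destination NVEs.
    """
    copies = []
    for remote_nve in sorted(set(replication_list)):  # drop duplicates
        if remote_nve == local_nve:
            continue  # the source NVE does not send a copy to itself
        copies.append(UnicastCopy(local_nve, remote_nve, vni, inner_frame))
    return copies


if __name__ == "__main__":
    # A tenant subnet spread over the source NVE plus 50 remote NVEs.
    nves = [f"192.0.2.{i}" for i in range(1, 52)]
    copies = replicate_at_source("192.0.2.1", 5001, b"tenant-frame", nves)
    print(len(copies))  # 50 unicast copies leave the source NVE
```

   The work done by the source NVE grows linearly with the number of
   destination NVEs, which is the forwarding-performance concern this
   section raises for a tenant subnet spread across 50 NVEs.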

   A multicast or broadcast transmission is achieved by replicating the
   packet at the source NVE, making one copy for each destination NVE
   that the multicast packet must be sent to.

   For this mechanism to work, the source NVE must know, a priori, the
   IP addresses of all destination NVEs that need to receive the
   packet. For the purpose of ARP/ND, this would involve knowing the
   IP addresses of all the NVEs that have Tenant Systems in the virtual
   network instance (VNI) of the Tenant System that generated the
   request. For the support of application-specific multicast traffic,
   a method similar to the receiver-site registration for a particular
   multicast group described in [LISP-Signal-Free] can be used. The
   registrations from different receiver sites can be merged at the
   NVA, which can construct a multicast replication list inclusive of
   all NVEs to which receivers for a particular multicast group are
   attached. The replication list for each specific multicast group is
   maintained by the NVA.

   Receiver-site registration is achieved by egress NVEs performing
   IGMP/MLD snooping to maintain state for which attached Tenant
   Systems have subscribed to a given IP multicast group. When the
   members of a multicast group are outside the NVO3 domain, it is
   necessary for NVO3 gateways to keep track of the remote members of
   each multicast group. The NVEs and NVO3 gateways then communicate
   the multicast groups that are of interest to the NVA. If the
   membership is not communicated to the NVA, and if it is necessary
   to prevent hosts attached to an NVE that have not subscribed to a
   multicast group from receiving the multicast traffic, the NVE would
   need to maintain multicast group membership information.

   In multi-homing environments, i.e., when more than one NVE can reach
   a specific TS, the NVA would be expected to provide all the NVEs
   that can reach the given TS. The ingress NVE can choose any one of
   the egress NVEs for the data frames destined towards the TS.

   In the absence of IGMP/MLD snooping, the traffic would be delivered
   to all hosts that are part of the VNI.

   This method requires sending multiple copies of the same packet, one
   to each NVE that participates in the VN. If, for example, a tenant
   subnet is spread across 50 NVEs, the packet would have to be
   replicated 50 times at the source NVE. This also places a burden on
   the forwarding performance of the source NVE.

   Note that this method is similar to what was used in VPLS [RFC4762]
   prior to support of MPLS multicast [RFC7117]. While there are some
   similarities between MPLS VPN and the NVO3 overlay, there are some
   key differences:

   - The CE-to-PE attachment in VPNs is somewhat static, whereas in a
     DC that allows VMs to migrate anywhere, the TS attachment to NVE
     is much more dynamic.

   - The number of PEs to which a single VPN customer is attached in
     an MPLS VPN environment is normally far less than the number of
     NVEs to which a VNI's VMs are attached in a DC.

   When a VPN customer has multiple multicast groups, "Multicast VPN"
   [RFC6513] combines all those multicast groups within each VPN client
   into one single multicast group in the MPLS (or VPN) core. The
   result is that messages from any of the multicast groups belonging
   to one VPN customer will reach all the PE nodes of the client. In
   other words, any messages belonging to any multicast group under
   customer X will reach all PEs of customer X. When customer X is
   attached to only a handful of PEs, the use of this approach does
   not result in excessive wastage of bandwidth in the provider's
   network.
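   The cost of carrying all of a customer's groups on one aggregate
   tree can be quantified with a small counting sketch. The
   count_deliveries() helper and the membership data in it are
   hypothetical; "edge node" stands for a PE in the VPN case and an NVE
   in the DC case, and each (group, edge node) pair counts as one
   delivery.

```python
# Hypothetical sketch: deliveries on an aggregate (per-customer) tree
# versus per-group trees. Edge nodes are PEs (VPN) or NVEs (DC).


def count_deliveries(tenant_edges, membership, aggregate):
    """Return (sent, wasted) delivery counts.

    tenant_edges: set of all edge nodes to which the customer is attached.
    membership:   maps each multicast group to the set of edge nodes that
                  have at least one interested receiver for that group.
    aggregate:    True if every group is carried on a single tree that
                  reaches all of tenant_edges.
    """
    sent = wasted = 0
    for receivers in membership.values():
        targets = tenant_edges if aggregate else receivers
        sent += len(targets)
        wasted += len(targets - receivers)  # copies nobody asked for
    return sent, wasted


if __name__ == "__main__":
    # VPN customer attached to a handful of PEs.
    pes = {f"PE{i}" for i in range(4)}
    vpn_groups = {"g1": {"PE0", "PE1"}, "g2": {"PE2", "PE3"}}
    print(count_deliveries(pes, vpn_groups, aggregate=True))   # (8, 4)

    # DC tenant whose VMs are spread over 200 NVEs, few of them receivers.
    nves = {f"NVE{i}" for i in range(200)}
    dc_groups = {"g1": {"NVE1", "NVE2", "NVE3"}, "g2": {"NVE4", "NVE5"}}
    print(count_deliveries(nves, dc_groups, aggregate=True))   # (400, 395)
    print(count_deliveries(nves, dc_groups, aggregate=False))  # (5, 0)
```

   With four PEs, the aggregate tree wastes only a couple of copies per
   packet, whereas with hundreds of NVEs almost every copy is wasted;
   per-group delivery avoids the waste at the price of more core state.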

   In a DC environment, a typical server/hypervisor-based virtual
   switch may only support tens of VMs (as of this writing). A subnet
   with N VMs may, in the worst case, be spread across N vSwitches.
   Using the "MPLS VPN multicast" approach in such a scenario would
   require the creation of a multicast group in the core for this VNI
   to reach all N NVEs. If only a small percentage of this client's VMs
   participate in application-specific multicast, a great number of
   NVEs will receive multicast traffic that is not forwarded to any of
   their attached VMs, resulting in considerable wastage of bandwidth.

   Therefore, the Multicast VPN solution may not scale in a DC
   environment with dynamic attachment of Virtual Networks to NVEs and
   a greater number of NVEs for each virtual network.

3.3. Replication at a multicast service node

   With this method, all multicast packets would be sent using a
   unicast tunnel encapsulation from the ingress NVE to a multicast
   service node (MSN). The MSN, in turn, would create multiple copies
   of the packet and would deliver a copy, using a unicast tunnel
   encapsulation, to each of the NVEs that are part of the multicast
   group for which the packet is intended.

   This mechanism is similar to that used by the ATM Forum's LAN
   Emulation (LANE) specification [LANE].

   The following are the possible ways for the MSN to get the
   membership information for each multicast group:

   - The MSN can obtain this information by snooping the IGMP/MLD
     messages from the Tenant Systems and/or sending query messages to
     the Tenant Systems. In order for the MSN to snoop the IGMP/MLD
     messages between TSs and their corresponding routers, the NVEs to
     which the TSs are attached have to encapsulate those messages with
     a special outer header, e.g., with the outer destination being the
     multicast service node. See Section 5.1 for details.

   - The MSN can obtain the membership information from the NVEs that
     snoop the IGMP/MLD messages. This can be done by having the MSN
     communicate with the NVEs, or by having the NVA obtain the
     information from the NVEs and in turn having the MSN communicate
     with the NVA.

   Unlike the method described in Section 3.2, there is no performance
   impact at the ingress NVE, nor are there any issues with multiple
   copies of the same packet from the source NVE to the multicast
   service node. However, there remain issues with multiple copies of
   the same packet on links that are common to the paths from the MSN
   to each of the egress NVEs. Additional issues that are introduced
   with this method include the availability of the MSN, methods to
   scale the services offered by the MSN, and the sub-optimality of the
   delivery paths.

   Finally, the IP address of the source NVE must be preserved in
   packet copies created at the multicast service node if data plane
   learning is in use. This could create problems if IP source address
   reverse path forwarding (RPF) checks are in use.

3.4. IP multicast in the underlay

   In this method, the underlay supports IP multicast, and the ingress
   NVE encapsulates the packet with the appropriate IP multicast
   address in the tunnel encapsulation header for delivery to the
   desired set of NVEs. The protocol in the underlay could be any
   variant of Protocol Independent Multicast (PIM), or a
   protocol-dependent multicast, such as [ISIS-Multicast].

   If an NVE connects to its attached TSs via a Layer 2 network, there
   are multiple ways for NVEs to support application-specific
   multicast:

   - The NVE only supports the basic IGMP/MLD snooping function and
     lets the TSs' routers handle the application-specific multicast.
     This scheme does not utilize the underlay IP multicast protocols.

   - The NVE can act as a pseudo multicast router for the directly
     attached VMs and support proper mapping of IGMP/MLD messages to
     the messages needed by the underlay IP multicast protocols.

   With this method, none of the replication issues of the methods
   described in Sections 3.2 and 3.3 arise.

   With PIM Sparse Mode (PIM-SM), the number of flows required would be
   (n*g), where n is the number of source NVEs that source packets for
   the group, and g is the number of groups. Bidirectional PIM
   (BIDIR-PIM) would offer better scalability, with the number of flows
   required being g.

   In the absence of any additional mechanism (e.g., using an NVA for
   address resolution), optimal delivery would require a separate group
   for each tenant, plus a separate group for each multicast address
   (used for multicast applications) within a tenant.

   Additional considerations are that only the low-order 23 bits of an
   IPv4 group address (or the low-order 32 bits of an IPv6 group
   address) are mapped to the outer MAC address, so if there is
   equipment that prunes multicasts at Layer 2, there will be some
   aliasing. Finally, a mechanism to efficiently provision such
   addresses for each group would be required.

   There are additional optimizations that are possible, but they come
   with their own restrictions. For example, a set of tenants may be
   restricted to some subset of NVEs, and they could all share the
   same outer IP multicast group address. This, however, introduces a
   problem of sub-optimal delivery: even if a particular tenant within
   the group of tenants does not have a presence on one of the NVEs
   where another one does, the former's multicast packets would still
   be delivered to that NVE.
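   The sub-optimal delivery described above can be made concrete with
   the following sketch, which computes, for each tenant sharing one
   outer group, the NVEs that receive its multicast traffic without
   hosting any of its members. The function name and the example
   tenant placement are hypothetical.

```python
# Hypothetical sketch: tenants that share one outer IP multicast group
# are delivered to the union of all their NVEs.


def shared_group_waste(tenant_nves):
    """Map each tenant to the NVEs that get its traffic needlessly.

    tenant_nves maps a tenant name to the set of NVEs where that tenant
    has a presence. All tenants share a single outer group, so the
    underlay delivers every tenant's packets to the union of those sets.
    """
    group_nves = set().union(*tenant_nves.values())
    return {tenant: group_nves - nves for tenant, nves in tenant_nves.items()}


if __name__ == "__main__":
    placement = {
        "tenant-A": {"NVE1", "NVE2"},
        "tenant-B": {"NVE2", "NVE3"},
        "tenant-C": {"NVE1", "NVE2", "NVE3"},
    }
    for tenant, extra in sorted(shared_group_waste(placement).items()):
        print(tenant, sorted(extra))
    # tenant-A ['NVE3']
    # tenant-B ['NVE1']
    # tenant-C []
```

   A tenant whose footprint already covers the whole group (tenant-C
   here) loses nothing by sharing, which is why the grouping has to be
   chosen based on the NVEs the tenants share.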

   Sharing an outer group address in this way also introduces an
   additional network management burden to optimize which tenants
   should be part of the same tenant group (based on the NVEs they
   share), which somewhat dilutes the value proposition of NVO3: to
   completely decouple the overlay and physical network design,
   allowing complete freedom of placement of VMs anywhere within the
   data center.

   Multicast schemes such as BIER (Bit Index Explicit Replication)
   [BIER-ARCH] may be able to provide optimizations by allowing the
   underlay network to provide optimum multicast delivery without
   requiring routers in the core of the network to maintain
   per-multicast-group state.

3.5. Other schemes

   There are still other mechanisms that attempt to combine some of
   the advantages of the above methods by offering multiple replication
   points, each with a limited degree of replication [EDGE-REP]. Such
   schemes offer a trade-off between performing some replication at
   intermediate nodes (routers) and performing all of the replication
   at the source NVE or at a multicast service node.

4. Simultaneous use of more than one mechanism

   Although the mechanisms in the previous section have been discussed
   individually, it is possible for implementations to rely on more
   than one of them. For example, the method of Section 3.1 could be
   used for minimizing ARP/ND, while at the same time, multicast
   applications may be supported by one, or a combination of, the
   other methods. For small multicast groups, the methods of source NVE
   replication or the use of a multicast service node may be
   attractive, while for larger multicast groups, the use of multicast
   in the underlay may be preferable.

5. Other issues

5.1. Multicast-agnostic NVEs

   Some hypervisor-based NVEs do not process or recognize IGMP/MLD
   frames, i.e., those NVEs simply encapsulate the IGMP/MLD messages
   in the same way as they do for regular data frames.

   By default, the TSs' router periodically sends IGMP/MLD query
   messages to all the hosts in the subnet to trigger the hosts that
   are interested in a multicast stream to send back IGMP/MLD reports.
   In order for the MSN to get the updated multicast group information,
   the MSN can also send an IGMP/MLD query message comprising a
   client-specific multicast address, encapsulated in an overlay
   header, to all the NVEs to which the TSs in the VN are attached.

   However, the MSN may not always be aware of the client-specific
   multicast addresses. In order to perform multicast filtering, the
   MSN has to snoop the IGMP/MLD messages between TSs and their
   corresponding routers to maintain the multicast membership. In
   order for the MSN to snoop the IGMP/MLD messages between TSs and
   their router, the NVA needs to configure the NVE to send copies of
   the IGMP/MLD messages to the MSN in addition to the default behavior
   of sending them to the TSs' routers; e.g., the NVA has to instruct
   the NVEs to send frames with destination address 224.0.0.22 (the
   destination address of IGMPv3 reports [RFC3376]) to both the TSs'
   router and the MSN.

   This process is similar to "Replication at the source NVE"
   described in Section 3.2, except that the NVEs only replicate the
   message to the TSs' router and the MSN.

5.2. Multicast membership management for DC with VMs

   For data centers with virtualized servers, VMs can be added,
   deleted, or moved very easily. When VMs are added, deleted, or
   moved, the NVEs to which the VMs are attached change.

   When a VM is deleted from an NVE or a new VM is added to an NVE, the
   VM management system should notify the MSN to send the IGMP/MLD
   query messages to the relevant NVEs, so that the multicast
   membership can be updated promptly.
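   A minimal sketch of this notification-driven refresh follows,
   assuming that the VM management system reports VM add, delete, and
   move events for a VNI to the MSN, and that the MSN can send a query
   toward a given NVE. The MsnMembership class, its methods, and the
   send_query callback are hypothetical; aging out stale entries with
   the usual IGMP/MLD membership timers is not modeled. The group
   address is from the 233.252.0.0/24 documentation range.

```python
# Hypothetical sketch of an MSN refreshing multicast membership when the
# VM management system reports that a VM was added, deleted, or moved.


class MsnMembership:
    def __init__(self, send_query):
        self.send_query = send_query  # callable(nve, vni); transport not shown
        self.members = {}             # (vni, group) -> set of NVEs with receivers

    def on_report(self, nve, vni, group):
        """Record an IGMP/MLD report relayed from an NVE."""
        self.members.setdefault((vni, group), set()).add(nve)

    def on_vm_event(self, vni, old_nve=None, new_nve=None):
        """Query the NVEs affected by a VM add (new_nve only), delete
        (old_nve only), or move (both), instead of waiting for the next
        periodic query from the TSs' router."""
        affected = [nve for nve in (old_nve, new_nve) if nve is not None]
        for nve in dict.fromkeys(affected):  # de-duplicate, keep order
            self.send_query(nve, vni)
        # Entries for receivers that no longer answer would be removed by
        # the usual membership timers, which this sketch does not model.


if __name__ == "__main__":
    sent = []
    msn = MsnMembership(lambda nve, vni: sent.append((nve, vni)))
    msn.on_vm_event(5001, old_nve="NVE1", new_nve="NVE7")  # a VM moved
    msn.on_report("NVE7", 5001, "233.252.0.1")  # the moved VM re-reports
    print(sent)  # [('NVE1', 5001), ('NVE7', 5001)]
    print(sorted(msn.members[(5001, "233.252.0.1")]))  # ['NVE7']
```

   Querying both the old and the new NVE lets the MSN learn right away
   that a receiver now sits behind the new NVE, rather than waiting
   for the TSs' router's next periodic query.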

   Without such a notification, if the attachment of VMs to NVEs
   changes, multicast data may not reach the VM(s) that moved for up
   to the IGMP/MLD query interval configured on the TSs' routers.

6. Summary

   This document has identified various mechanisms for supporting
   application-specific multicast in networks that use NVO3. It
   highlights the basics of each mechanism and some of the issues with
   them. As solutions are developed, the protocols would need to
   consider the use of these mechanisms, and co-existence may be a
   consideration. It also highlights some of the requirements for
   supporting multicast applications in an NVO3 network.

7. Security Considerations

   This document does not introduce any new security considerations
   beyond what may be present in proposed solutions.

8. IANA Considerations

   This document requires no IANA actions. RFC Editor: Please remove
   this section before publication.

9. References

9.1. Normative References

   [RFC7365]  Lasserre, M., et al., "Framework for Data Center (DC)
              Network Virtualization", RFC 7365, October 2014.

   [RFC7364]  Narten, T., et al., "Problem Statement: Overlays for
              Network Virtualization", RFC 7364, October 2014.

   [NVO3-ARCH]
              Narten, T., et al., "An Architecture for Overlay Networks
              (NVO3)", work in progress, February 2014.

   [RFC3376]  Cain, B., et al., "Internet Group Management Protocol,
              Version 3", RFC 3376, October 2002.

   [RFC6513]  Rosen, E., et al., "Multicast in MPLS/BGP IP VPNs",
              RFC 6513, February 2012.

9.2. Informative References

   [RFC7348]  Mahalingam, M., et al., "Virtual eXtensible Local Area
              Network (VXLAN): A Framework for Overlaying Virtualized
              Layer 2 Networks over Layer 3 Networks", RFC 7348,
              August 2014.

   [RFC7637]  Garg, P. and Wang, Y. (Eds.), "NVGRE: Network
              Virtualization Using Generic Routing Encapsulation",
              RFC 7637, September 2015.

   [STT]      Davie, B. and Gross, J., "A Stateless Transport Tunneling
              Protocol for Network Virtualization", work in progress.

   [DC-MC]    McBride, M. and Lui, H., "Multicast in the Data Center
              Overview", work in progress.

   [ISIS-Multicast]
              Yong, L., et al., "ISIS Protocol Extension for Building
              Distribution Trees", work in progress.

   [RFC4762]  Lasserre, M. and Kompella, V. (Eds.), "Virtual Private
              LAN Service (VPLS) Using Label Distribution Protocol (LDP)
              Signaling", RFC 4762, January 2007.

   [RFC7117]  Aggarwal, R., et al., "Multicast in Virtual Private LAN
              Service (VPLS)", RFC 7117, February 2014.

   [LANE]     The ATM Forum, "LAN Emulation over ATM",
              af-lane-0021.000, January 1995.

   [EDGE-REP] Marques, P., et al., "Edge Multicast Replication for BGP
              IP VPNs", work in progress.

   [RFC3569]  Bhattacharyya, S., Ed., "An Overview of Source-Specific
              Multicast (SSM)", RFC 3569, July 2003.

   [LISP-Signal-Free]
              Moreno, V. and Farinacci, D., "Signal-Free LISP
              Multicast", work in progress.

   [BIER-ARCH]
              Wijnands, IJ., et al., "Multicast using Bit Index
              Explicit Replication", work in progress.

10. Acknowledgments

   Thanks are due to Dino Farinacci, Erik Nordmark, Lucy Yong, and
   Nicolas Bouliane for their comments and suggestions.

   This document was prepared using 2-Word-v2.0.template.dot.

Authors' Addresses

   Anoop Ghanwani
   Dell
   Email: anoop@alumni.duke.edu

   Linda Dunbar
   Huawei Technologies
   5340 Legacy Drive, Suite 1750
   Plano, TX 75024, USA
   Phone: (469) 277 5840
   Email: ldunbar@huawei.com

   Mike McBride
   Huawei Technologies
   Email: mmcbride7@gmail.com

   Vinay Bannai
   Google
   Email: vbannai@gmail.com

   Ram Krishnan
   Dell
   Email: ramkri123@gmail.com