NVO3 working group                                          A. Ghanwani
Internet Draft                                                     Dell
Intended status: Standards Track                              L. Dunbar
Expires: September 8, 2015                                       Huawei
                                                             M. McBride
                                                               Ericsson
                                                              V. Bannai
                                                                 Paypal
                                                                     R.
                                                               Krishnan
                                                                Brocade
                                                          March 9, 2015

                   A Framework for Multicast in NVO3
                 draft-ghanwani-nvo3-mcast-framework-00

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79. This document may not be modified,
   and derivative works of it may not be created, except to publish it
   as an RFC and to translate it into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on July 9, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.
Abstract

   This document discusses a framework for supporting multicast traffic
   in a network that uses Network Virtualization using Overlays over
   Layer 3 (NVO3). Both infrastructure multicast and
   application-specific multicast are discussed. It describes the
   various mechanisms that can be used for delivering such traffic, as
   well as the data plane and control plane considerations for each of
   the mechanisms.

Table of Contents

   1. Introduction
      1.1. Infrastructure multicast
      1.2. Application-specific multicast
   2. Acronyms
   3. Multicast mechanisms in networks that use NVO3
      3.1. No multicast support
      3.2. Replication at the source NVE
      3.3. Replication at a multicast service node
      3.4. IP multicast in the underlay
      3.5. Other schemes
   4. Simultaneous use of more than one mechanism
   5. Other issues
      5.1. Multicast-agnostic NVEs
      5.2. Multicast membership management for DC with VMs
   6. Summary
   7. Security Considerations
   8. IANA Considerations
   9. References
      9.1. Normative References
      9.2. Informative References
   10. Acknowledgments

1.
Introduction

   Network Virtualization using Overlays over Layer 3 (NVO3) is a
   technology that is used to address issues that arise in building
   large, multitenant data centers that make extensive use of server
   virtualization [PS].

   This document provides a framework for supporting multicast traffic
   in a network that uses NVO3. Both infrastructure multicast (ARP/ND,
   DHCP, mDNS, etc.) and application-specific multicast are considered.
   It describes the various mechanisms and considerations that can be
   used for delivering such traffic in networks that use NVO3.

   The reader is assumed to be familiar with the terminology as defined
   in the NVO3 Framework document [FW] and NVO3 Architecture document
   [NVO3-ARCH].

1.1. Infrastructure multicast

   Infrastructure multicast includes protocols such as ARP/ND, DHCP,
   and mDNS. It is possible to provide solutions for these that do not
   involve multicast in the underlay network. In the case of ARP/ND,
   an NVA can be used for distributing the mappings of IP address to
   MAC address to all NVEs, and the NVEs can respond to ARP messages
   from the TSs that are attached to them in a way that is similar to
   proxy-ARP. In the case of DHCP, the NVE can be configured to
   forward these messages using a helper function.

   It is also possible to support all of these infrastructure
   multicast protocols natively if the underlay provides multicast
   transport. However, even in the presence of multicast transport, it
   may be beneficial to use the optimizations mentioned above to reduce
   the amount of such traffic in the network.

1.2. Application-specific multicast

   Application-specific multicast traffic, which may be either
   Source-Specific Multicast (SSM) or Any-Source Multicast (ASM)
   [RFC 3569], has the following characteristics:

   1.
      Receiver hosts are expected to subscribe to multicast content
      using protocols such as IGMP [RFC3376] (IPv4) or MLD (IPv6).
      Multicast sources and listeners participate in these protocols
      using addresses that are in the Tenant System address domain.

   2. The list of multicast listeners for each multicast group is not
      known in advance. Therefore, it may not be possible for an NVA
      to get the list of participants for each multicast group ahead
      of time.

2. Acronyms

   ASM: Any-Source Multicast

   LISP: Locator/ID Separation Protocol

   NVA: Network Virtualization Authority

   NVE: Network Virtualization Edge

   NVGRE: Network Virtualization using GRE

   SSM: Source-Specific Multicast

   STT: Stateless Tunnel Transport

   VXLAN: Virtual eXtensible LAN

3. Multicast mechanisms in networks that use NVO3

   In NVO3 environments, traffic between NVEs is transported using an
   encapsulation such as VXLAN [VXLAN], NVGRE [NVGRE], STT [STT], etc.

   Besides the need to support the Address Resolution Protocol (ARP)
   and Neighbor Discovery (ND), there are several applications that
   require the support of multicast and/or broadcast in data centers
   [DC-MC]. With NVO3, there are many possible ways that multicast may
   be handled in such networks. We discuss some of the attributes of
   the following four methods:

   1. No multicast support.

   2. Replication at the source NVE.

   3. Replication at a multicast service node.

   4. IP multicast in the underlay.

   These mechanisms are briefly mentioned in the NVO3 Framework [FW]
   and NVO3 Architecture [NVO3-ARCH] documents. This document attempts
   to provide more details about the basics of each of these
   mechanisms and discusses the issues and tradeoffs of each.

   We note that other methods are also possible, such as [EDGE-REP],
   but we focus on the above four because they are the most common.

3.1.
No multicast support

   In this scenario, there is no support whatsoever for multicast
   traffic when using the overlay. This can only work if the following
   conditions are met:

   1. All of the traffic in the network is unicast, and the only
      multicast/broadcast traffic is from ARP/ND protocols and
      flooding of frames with an unknown MAC destination address.

   2. A network virtualization authority (NVA) is used by the NVEs to
      determine the mapping of a given Tenant System's MAC/IP address
      to its NVE. In other words, there is no data plane learning.
      Address resolution requests via ARP/ND that are issued by the
      Tenant Systems must be resolved by the NVE that they are
      attached to.

   With this approach, it is not possible to support
   application-specific multicast. However, certain
   multicast/broadcast applications such as DHCP can be supported by
   use of a helper function in the NVE.

   The main drawback of this approach, even for unicast traffic, is
   that it is not possible to initiate communication with a Tenant
   System for which a mapping to an NVE does not already exist with the
   NVA. This is a problem in the case where the NVE is implemented in
   a physical switch and the Tenant System is a physical end station
   that has not registered with the NVA.

3.2. Replication at the source NVE

   With this method, the overlay attempts to provide a multicast
   service without requiring any specific support from the underlay,
   other than that of a unicast service. A multicast or broadcast
   transmission is achieved by replicating the packet at the source
   NVE, making one copy for each destination NVE that the multicast
   packet must be sent to.

   For this mechanism to work, the source NVE must know, a priori, the
   IP addresses of all destination NVEs that need to receive the
   packet.
   For the purpose of ARP/ND, this would involve knowing the
   IP addresses of all the NVEs that have Tenant Systems in the virtual
   network instance (VNI) of the Tenant System that generated the
   request. For the support of application-specific multicast traffic,
   a method similar to that of receiver-sites registration for a
   particular multicast group described in [LISP-Signal-Free] can be
   used. The registrations from different receiver-sites can be merged
   at the NVA, which can construct a multicast replication-list
   inclusive of all NVEs to which receivers for a particular multicast
   group are attached. The replication-list for each specific multicast
   group is maintained by the NVA.

   The receiver-sites registration is achieved by egress NVEs
   performing IGMP/MLD snooping to maintain state about which attached
   Tenant Systems have subscribed to a given IP multicast group. When
   the members of a multicast group are outside the NVO3 domain, it is
   necessary for NVO3 gateways to keep track of the remote members of
   each multicast group. The NVEs then communicate these mappings to
   the NVA. Even if the membership is not communicated to the NVA, the
   NVE needs to maintain the multicast group membership if it is
   necessary to prevent attached hosts that have not subscribed to a
   multicast group from receiving the multicast traffic.

   In the absence of IGMP/MLD snooping, the traffic would be delivered
   to all hosts that are part of the VNI.

   This method requires sending multiple copies of the same packet,
   one to each NVE that participates in the VN. If, for example, a
   tenant subnet is spread across 50 NVEs, the packet would have to be
   replicated 50 times at the source NVE. This also creates an issue
   with the forwarding performance of the NVE.

   Note that this method is similar to what was used in VPLS [VPLS]
   prior to support of MPLS multicast [MPLS-MC].
   While there are some
   similarities between MPLS VPN and the NVO3 overlay, there are some
   key differences:

   - The CE-to-PE attachment in VPNs is somewhat static, whereas in a
     DC that allows VMs to migrate anywhere, the TS attachment to NVE
     is much more dynamic.

   - The number of PEs to which a single VPN customer is attached in
     an MPLS VPN environment is normally far less than the number of
     NVEs to which a VNI's VMs are attached in a DC.

     When a VPN customer has multiple multicast groups, "Multicast
     VPN" [RFC6513] combines all of those multicast groups within each
     VPN client into one single multicast group in the MPLS (or VPN)
     core. The result is that messages from any of the multicast
     groups belonging to one VPN customer will reach all the PE nodes
     of the client. In other words, any messages belonging to any
     multicast group under customer X will reach all PEs of
     customer X. When customer X is attached to only a handful of
     PEs, the use of this approach does not result in excessive wastage
     of bandwidth in the provider's network.

     In a DC environment, a typical server/hypervisor-based virtual
     switch may only support tens of VMs (as of this writing). A subnet
     with N VMs may be, in the worst case, spread across N vSwitches.
     Using the "MPLS VPN multicast" approach in such a scenario would
     require the creation of a multicast group in the core for this VNI
     to reach all N NVEs. If only a small percentage of this client's
     VMs participate in application-specific multicast, a great number
     of NVEs will receive multicast traffic that is not forwarded to
     any of their attached VMs, resulting in considerable wastage of
     bandwidth.

   Therefore, the Multicast VPN solution may not scale in a DC
   environment with dynamic attachment of Virtual Networks to NVEs and
   a greater number of NVEs for each virtual network.

3.3.
Replication at a multicast service node

   With this method, all multicast packets would be sent using a
   unicast tunnel encapsulation to a multicast service node (MSN). The
   MSN, in turn, would create multiple copies of the packet and would
   deliver a copy, using a unicast tunnel encapsulation, to each of the
   NVEs that are part of the multicast group for which the packet is
   intended.

   This mechanism is similar to that used by the ATM Forum's LAN
   Emulation (LANE) specification [LANE].

   The following are the possible ways for the MSN to get the
   membership information for each multicast group:

   - The MSN can obtain this information by snooping the IGMP/MLD
     messages from the Tenant Systems and/or sending query messages to
     the Tenant Systems. In order for the MSN to snoop the IGMP/MLD
     messages between TSs and their corresponding routers, the NVEs
     to which the TSs are attached have to encapsulate these messages
     with a special outer header, e.g. one whose outer destination is
     the multicast service node. See Section 5.1 for details.

   - The MSN can obtain the membership information from the NVEs that
     snoop the IGMP/MLD messages. This can be done by having the MSN
     communicate with the NVEs, or by having the NVA obtain the
     information from the NVEs, and in turn having the MSN communicate
     with the NVA.

   Unlike the method described in Section 3.2, there is no performance
   impact at the ingress NVE, nor are there any issues with multiple
   copies of the same packet from the source NVE to the multicast
   service node. However, there remain issues with multiple copies of
   the same packet on links that are common to the paths from the MSN
   to each of the egress NVEs. Additional issues that are introduced
   with this method include the availability of the MSN, methods to
   scale the services offered by the MSN, and the sub-optimality of the
   delivery paths.
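
   The replication behavior described above can be illustrated with a
   minimal sketch. It is not part of any NVO3 specification: the class
   MulticastServiceNode, its methods, and the helper ingress_copies
   are hypothetical names chosen for illustration, and the NVE
   addresses and group address are documentation examples. The sketch
   assumes membership is already known (by snooping or via the NVA)
   and only shows how many copies the ingress NVE has to send with and
   without an MSN.

```python
from dataclasses import dataclass, field


@dataclass
class MulticastServiceNode:
    """Hypothetical MSN model (not from the draft): tracks which NVEs
    have receivers for each (VNI, group) pair."""
    members: dict = field(default_factory=dict)

    def join(self, vni, group, nve_ip):
        # Membership could be learned by IGMP/MLD snooping or from the NVA.
        self.members.setdefault((vni, group), set()).add(nve_ip)

    def leave(self, vni, group, nve_ip):
        self.members.get((vni, group), set()).discard(nve_ip)

    def replicate(self, vni, group, source_nve, payload):
        """Return one unicast (destination NVE, payload) copy per member
        NVE, skipping the NVE the packet arrived from."""
        targets = self.members.get((vni, group), set()) - {source_nve}
        return [(nve, payload) for nve in sorted(targets)]


def ingress_copies(member_nves, source_nve, use_msn):
    """Copies the ingress NVE itself transmits: one to the MSN
    (Section 3.3) or one per remote member NVE (Section 3.2)."""
    remote = set(member_nves) - {source_nve}
    if not remote:
        return 0
    return 1 if use_msn else len(remote)


# Illustrative values only.
nves = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
msn = MulticastServiceNode()
for nve in nves:
    msn.join(5000, "233.252.0.1", nve)

copies = msn.replicate(5000, "233.252.0.1", "192.0.2.1", b"data")
print(len(copies), ingress_copies(nves, "192.0.2.1", use_msn=True))
```

   In this sketch the replication work moves from the ingress NVE to
   the MSN, but the MSN still emits one unicast copy per egress NVE,
   which is why duplicate packets can remain on links shared by the
   paths from the MSN.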

   Finally, the IP address of the source NVE must be preserved in
   packet copies created at the multicast service node if data plane
   learning is in use. This could create problems if IP source address
   reverse path forwarding (RPF) checks are in use.

3.4. IP multicast in the underlay

   In this method, the underlay supports IP multicast and the ingress
   NVE encapsulates the packet with the appropriate IP multicast
   address in the tunnel encapsulation header for delivery to the
   desired set of NVEs. The protocol in the underlay could be any
   variant of Protocol Independent Multicast (PIM), or protocol
   dependent multicast, such as [ISIS-Multicast].

   If an NVE connects to its attached TSs via a Layer 2 network, there
   are multiple ways for NVEs to support application-specific
   multicast:

   - The NVE only supports the basic IGMP/MLD snooping function,
     letting the TSs' routers handle the application-specific
     multicast. This scheme doesn't utilize the underlay IP multicast
     protocols.

   - The NVE can act as a pseudo multicast router for the directly
     attached VMs and support proper mapping of IGMP/MLD messages to
     the messages needed by the underlay IP multicast protocols.

   With this method, there are none of the issues with the method
   described in Section 3.2.

   With PIM Sparse Mode (PIM-SM), the number of flows required would be
   (n*g), where n is the number of source NVEs that source packets for
   the group, and g is the number of groups. Bidirectional PIM
   (BIDIR-PIM) would offer better scalability, with the number of flows
   required being g.

   In the absence of any additional mechanism, e.g. using an NVA for
   address resolution, for optimal delivery, there would have to be a
   separate group for each tenant, plus a separate group for each
   multicast address (used for multicast applications) within a tenant.
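
   The flow and group counts above can be worked through with a small
   sketch. The function names and the sample tenant data are
   assumptions made for illustration. The arithmetic follows the text:
   one underlay group per tenant plus one per tenant multicast
   address, n*g flows for PIM-SM, and g flows for BIDIR-PIM.

```python
def underlay_groups(app_groups_per_tenant):
    """Underlay groups needed for optimal delivery: one per tenant plus
    one per application multicast address used within that tenant."""
    return sum(1 + count for count in app_groups_per_tenant.values())


def pim_sm_flows(source_nves, groups):
    """PIM-SM flow count: n source NVEs times g groups."""
    return source_nves * groups


def bidir_pim_flows(groups):
    """BIDIR-PIM flow count: g, independent of the number of sources."""
    return groups


# Hypothetical deployment: two tenants, three application groups in total.
tenants = {"tenant-a": 3, "tenant-b": 0}
g = underlay_groups(tenants)
n = 4  # assumed number of source NVEs per group

print(g, pim_sm_flows(n, g), bidir_pim_flows(g))
```

   With these sample numbers the underlay needs 5 groups, which gives
   20 flows under PIM-SM and 5 under BIDIR-PIM.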

   An additional consideration is that only the lower 23 bits of an
   IPv4 multicast address (or the lower 32 bits of an IPv6 multicast
   address) are mapped to the outer MAC address, so if there is
   equipment that prunes multicasts at Layer 2, there will be some
   aliasing. Finally, a mechanism to efficiently provision such
   addresses for each group would be required.

   There are additional optimizations which are possible, but they come
   with their own restrictions. For example, a set of tenants may be
   restricted to some subset of NVEs and they could all share the same
   outer IP multicast group address. This, however, introduces a
   problem of sub-optimal delivery (even if a particular tenant within
   the group of tenants doesn't have a presence on one of the NVEs
   which another one does, the former's multicast packets would still
   be delivered to that NVE). It also introduces an additional network
   management burden to optimize which tenants should be part of the
   same tenant group (based on the NVEs they share). This somewhat
   dilutes the value proposition of NVO3, which is to completely
   decouple the overlay and physical network design, allowing complete
   freedom of placement of VMs anywhere within the data center.

   Multicast schemes such as BIER (Bit Index Explicit Replication) may
   be able to provide optimizations by allowing the underlay network to
   provide optimum multicast delivery without requiring routers in the
   core of the network to maintain per-multicast group state.

3.5. Other schemes

   There are still other mechanisms that may be used that attempt to
   combine some of the advantages of the above methods by offering
   multiple replication points, each with a limited degree of
   replication [EDGE-REP].
   Such schemes offer a trade-off between the
   amount of replication at an intermediate node (router) versus
   performing all of the replication at the source NVE or all of the
   replication at a multicast service node.

4. Simultaneous use of more than one mechanism

   While the mechanisms discussed in the previous section have been
   discussed individually, it is possible for implementations to rely
   on more than one of these. For example, the method of Section 3.1
   could be used for minimizing ARP/ND, while at the same time,
   multicast applications may be supported by one, or a combination of,
   the other methods. For small multicast groups, the methods of
   source NVE replication or the use of a multicast service node may be
   attractive, while for larger multicast groups, the use of multicast
   in the underlay may be preferable.

5. Other issues

5.1. Multicast-agnostic NVEs

   Some hypervisor-based NVEs do not process or recognize IGMP/MLD
   frames; i.e. those NVEs simply encapsulate the IGMP/MLD messages in
   the same way as they do for regular data frames.

   By default, the TSs' router periodically sends IGMP/MLD query
   messages to all the hosts in the subnet to trigger the hosts that
   are interested in the multicast stream to send back IGMP/MLD
   reports. In order for the MSN to get the updated multicast group
   information, the MSN can also send the IGMP/MLD query message
   comprising a client-specific multicast address, encapsulated in an
   overlay header, to all the NVEs to which the TSs in the VN are
   attached.

   However, the MSN may not always be aware of the client-specific
   multicast addresses. In that case, the MSN has to snoop the IGMP/MLD
   messages between TSs and their corresponding routers to maintain the
   multicast membership.
   In order for the MSN to snoop the IGMP/MLD
   messages between TSs and their router, the NVA needs to configure
   the NVE to send copies of the IGMP/MLD messages to the MSN in
   addition to the default behavior of sending them to the TSs'
   routers; e.g. the NVA has to inform the NVEs to encapsulate data
   frames with DA being 224.0.0.2 (destination address of IGMP report)
   to the TSs' router and the MSN.

   This process is similar to "Source Replication" described in Section
   3.2, except that the NVEs only replicate the message to the TSs'
   router and the MSN.

5.2. Multicast membership management for DC with VMs

   For data centers with virtualized servers, VMs can be added, deleted
   or moved very easily. When VMs are added, deleted or moved, the NVEs
   to which the VMs are attached change.

   When a VM is deleted from an NVE or a new VM is added to an NVE, the
   VM management system should notify the MSN to send the IGMP/MLD
   query messages to the relevant NVEs, so that the multicast
   membership can be updated promptly. Otherwise, if the attachment of
   VMs to NVEs changes, then for the duration of the configured
   default time interval that the TSs' routers use for IGMP/MLD
   queries, multicast data may not reach the VM(s) that moved.

6. Summary

   This document has identified various mechanisms for supporting
   application-specific multicast in networks that use NVO3. It
   highlights the basics of each mechanism and some of the issues with
   them. As solutions are developed, the protocols would need to
   consider the use of these mechanisms, and co-existence may be a
   consideration. It also highlights some of the requirements for
   supporting multicast applications in an NVO3 network.

7. Security Considerations

   This draft does not introduce any new security considerations beyond
   what may be present in proposed solutions.

8. IANA Considerations

   This document requires no IANA actions.
   RFC Editor: Please remove this section before publication.

9. References

9.1. Normative References

   [PS]        Narten, T. et al., "Problem statement: Overlays for
               network virtualization", work in progress, July 2013.

   [FW]        Lasserre, M. et al., "Framework for DC network
               virtualization", work in progress, January 2014.

   [NVO3-ARCH] Narten, T. et al., "An Architecture for Overlay Networks
               (NVO3)", work in progress, February 2014.

   [RFC3376]   Cain, B. et al., "Internet Group Management Protocol,
               Version 3", October 2002.

   [RFC6513]   Rosen, E. et al., "Multicast in MPLS/BGP IP VPNs",
               February 2012.

9.2. Informative References

   [VXLAN]     Mahalingam, M. et al., "VXLAN: A framework for overlaying
               virtualized Layer 2 networks over Layer 3 networks", work
               in progress. [AG: Replace with RFC.]

   [NVGRE]     Sridharan, M. et al., "NVGRE: Network virtualization
               using Generic Routing Encapsulation", work in progress.

   [STT]       Davie, B. and Gross, J., "A stateless transport tunneling
               protocol for network virtualization", work in progress.

   [DC-MC]     McBride, M. and Lui, H., "Multicast in the data center
               overview", work in progress.

   [ISIS-Multicast]
               Yong, L. et al., "ISIS Protocol Extension For Building
               Distribution Trees", work in progress, October 2013.

   [VPLS]      Lasserre, M. and Kompella, V. (Eds), "Virtual Private
               LAN Service (VPLS) using Label Distribution Protocol
               (LDP) signaling", RFC 4762, January 2007.

   [MPLS-MC]   Aggarwal, R. et al., "Multicast in VPLS", work in
               progress.

   [LANE]      "LAN emulation over ATM", The ATM Forum,
               af-lane-0021.000, January 1995.

   [EDGE-REP]  Marques, P. et al., "Edge multicast replication for BGP
               IP VPNs", work in progress, June 2012.

   [RFC 3569]  Bhattacharyya, S., Ed., "An Overview of Source-Specific
               Multicast (SSM)", July 2003.

   [LISP-Signal-Free]
               Moreno, V. and Farinacci, D., "Signal-Free LISP
               Multicast", work in progress, December 2014.

10.
Acknowledgments

   We want to thank Dino Farinacci for comments and suggestions to this
   draft.

   This document was prepared using 2-Word-v2.0.template.dot.

Authors' Addresses

   Anoop Ghanwani
   Dell
   Email: anoop@alumni.duke.edu

   Linda Dunbar
   Huawei Technologies
   5340 Legacy Drive, Suite 1750
   Plano, TX 75024, USA
   Phone: (469) 277 5840
   Email: ldunbar@huawei.com

   Mike McBride
   Ericsson
   Email: mike.mcbride@ericsson.com

   Vinay Bannai
   Paypal
   Email: vbannai@paypal.com

   Ramki Krishnan
   Brocade
   Email: ramk@brocade.com