idnits 2.17.1 draft-ghanwani-nvo3-app-mcast-framework-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- == There are 1 instance of lines with multicast IPv4 addresses in the document. If these are generic example addresses, they should be changed to use the 233.252.0.x range defined in RFC 5771 Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 116 has weird spacing: '...t group is...' -- The document date (January 23, 2015) is 3352 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Possible downref: Non-RFC (?) normative reference: ref. 'PS' -- Possible downref: Non-RFC (?) normative reference: ref. 'FW' -- Possible downref: Non-RFC (?) normative reference: ref. 'NVO3-ARCH' Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 4 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 NVO3 working group A. Ghanwani 2 Internet Draft Dell 3 Intended status: Standards Track L. Dunbar 4 Expires: July 23, 2015 Huawei 5 M. McBride 6 Ericsson 7 V. Bannai 8 Paypal 9 R. Krishnan 10 Brocade 11 January 23, 2015 13 Framework of Supporting Applications Specific Multicast in NVO3 14 draft-ghanwani-nvo3-app-mcast-framework-02 16 Status of this Memo 18 This Internet-Draft is submitted in full conformance with the 19 provisions of BCP 78 and BCP 79. 21 This Internet-Draft is submitted in full conformance with the 22 provisions of BCP 78 and BCP 79. This document may not be modified, 23 and derivative works of it may not be created, except to publish it 24 as an RFC and to translate it into languages other than English. 26 Internet-Drafts are working documents of the Internet Engineering 27 Task Force (IETF), its areas, and its working groups. Note that 28 other groups may also distribute working documents as Internet- 29 Drafts. 31 Internet-Drafts are draft documents valid for a maximum of six 32 months and may be updated, replaced, or obsoleted by other documents 33 at any time. It is inappropriate to use Internet-Drafts as 34 reference material or to cite them other than as "work in progress." 36 The list of current Internet-Drafts can be accessed at 37 http://www.ietf.org/ietf/1id-abstracts.txt 39 The list of Internet-Draft Shadow Directories can be accessed at 40 http://www.ietf.org/shadow.html 42 This Internet-Draft will expire on July23,2015. 44 Copyright Notice 46 Copyright (c) 2015 IETF Trust and the persons identified as the 47 document authors. All rights reserved. 49 This document is subject to BCP 78 and the IETF Trust's Legal 50 Provisions Relating to IETF Documents 51 (http://trustee.ietf.org/license-info) in effect on the date of 52 publication of this document. Please review these documents 53 carefully, as they describe your rights and restrictions with 54 respect to this document. Code Components extracted from this 55 document must include Simplified BSD License text as described in 56 Section 4.e of the Trust Legal Provisions and are provided without 57 warranty as described in the Simplified BSD License. 59 Abstract 61 This document discusses the framework of supporting application- 62 specific multicast traffic, i.e. multicast traffic other than 63 ARP/ND, in a network that uses Network Virtualization using Overlays 64 over Layer 3 (NVO3). It describes the various mechanisms and 65 considerations that can be used for delivering such traffic. 67 Table of Contents 69 1. Introduction...................................................3 70 2. Acronyms.......................................................3 71 3. Multicast mechanisms in networks that use NVO3.................4 72 3.1. No multicast support......................................5 73 3.2. Replication at the source NVE.............................5 74 3.3. Replication at a multicast service node...................7 75 3.4. IP multicast in the underlay..............................8 76 3.5. Other schemes............................................10 77 4. Simultaneous use of more than one mechanism...................10 78 5. Other issues..................................................10 79 5.1. Multicast-agnostic NVEs..................................11 80 5.2. Multicast membership management for DC with VMs..........11 81 6. Summary.......................................................12 82 7. Security Considerations.......................................12 83 8. IANA Considerations...........................................12 84 9. References....................................................12 85 9.1. Normative References.....................................12 86 9.2. Informative References...................................13 87 10. Acknowledgments..............................................13 89 1. Introduction 91 Network virtualization using Overlays over Layer 3 (NVO3) is a 92 technology that is used to address issues that arise in building 93 large, multitenant data centers that make extensive use of server 94 virtualization [PS]. 96 This document provides a framework for supporting application- 97 specific multicast traffic, i.e. multicast traffic other than 98 ARP/ND, in a network that uses Network Virtualization using Overlays 99 over Layer 3 (NVO3). It describes the various mechanisms and 100 considerations that can be used for delivering such traffic in 101 networks that use NVO3. 103 The reader is assumed to be familiar with the terminology as defined 104 in the NVO3 Framework document [FW] and NVO3 Architecture document 105 [NVO3-ARCH]. 107 Application-specific multicast traffic, which may be either Source- 108 Specific Multicast (SSM) or Any-Source Multicast (ASM)[RFC 3569], 109 has the following characteristics: 111 1. Receiver hosts are expected to subscribe to multicast content 112 using protocols such as IGMP [RFC3376] (IPv4) or MLD (IPv6). 113 Multicast sources and listeners participant in these protocols 114 using addresses that are in the Tenant System address domain. 116 2. The list of multicast listeners for each multicast group is 117 not known in advance. Therefore, it may not be possible for an 118 NVA to get the list of participants for each multicast group 119 ahead of time. 121 2. Acronyms 123 ASM: Any-Source Multicast 125 LISP: Locator/ID Separation Protocol 126 NVA: Network Virtualization Authority 128 NVE: Network Virtualization Edge 130 NVGRE: Network Virtualization using GRE 132 SSM: Source-Specific Multicast 134 STT: Stateless Tunnel Transport 136 VXLAN: Virtual eXtensible LAN 138 3. Multicast mechanisms in networks that use NVO3 140 In NVO3 environments, traffic between NVEs is transported using an 141 encapsulation such as VXLAN [VXLAN], NVGRE [NVGRE], STT [STT], etc. 143 Besides the need to support the Address Resolution Protocol (ARP) 144 and Neighbor Discovery (ND), there are several applications that 145 require the support of multicast and/or broadcast in data centers 146 [DC-MC]. With NVO3, there are many possible ways that multicast may 147 be handled in such networks. We discuss some of the attributes of 148 the following four methods: 1. No multicast support. 150 2. Replication at the source NVE. 152 3. Replication at a multicast service node. 154 4. IP multicast in the underlay. 156 These mechanisms are briefly mentioned in the NVO3 Framework [FW] 157 and NVO3 architecture [NVO3-ARCH] document. This document attempts 158 to provide more details about the basic mechanisms underlying each 159 of these mechanisms and discusses the issues and tradeoffs of each. 161 We note that other methods are also possible, such as [EDGE-REP], 162 but we focus on the above four because they are the most common. 164 3.1. No multicast support 166 In this scenario, there is no support whatsoever for multicast 167 traffic when using the overlay. This can only work if the following 168 conditions are met: 170 1. All of the traffic is unicast. traffic in the network and the 171 only multicast/broadcast traffic is from ARP/ND protocols and 172 flooding of frames with an unknown MAC destination address. 174 2. A network virtualization authority (NVA) is used by the NVEs to 175 determine the mapping of a given Tenant System's MAC/IP address 176 to its NVE. In other words, there is no data plane learning. 177 Address resolution requests via ARP/ND that are issued by the 178 Tenant Systems must be resolved by the NVE that they are 179 attached to. 181 With this approach, it is not possible to support application- 182 specific multicast. However, certain multicast/broadcast 183 applications such as DHCP can be supported by use of a helper 184 function in the NVE. 186 The main drawback of this approach, even for unicast traffic, is 187 that it is not possible to initiate communication with a Tenant 188 System for which a mapping to an NVE does not already exist with the 189 NVA. This is a problem in the case where the NVE is implemented in 190 a physical switch and the Tenant System is a physical end station 191 that has not registered with the NVA. 193 3.2. Replication at the source NVE 195 With this method, the overlay attempts to provide a multicast 196 service without requiring any specific support from the underlay, 197 other than that of a unicast service. A multicast or broadcast 198 transmission is achieved by replicating the packet at the source 199 NVE, and making copies, one for each destination NVE that the 200 multicast packet must be sent to. 202 For this mechanism to work, the source NVE must know, a priori, the 203 IP addresses of all destination NVEs that need to receive the 204 packet. For the purpose of ARP/ND, this would involve knowing the 205 IP addresses of all the NVEs that have Tenant Systems in the virtual 206 network instance (VNI) of the Tenant System that generated the 207 request. For the support of application-specific multicast traffic, 208 a method similar to that of receiver-sites registration for a 209 particular multicast group described in [LISP-Signal-Free] can be 210 used. The registrations from different receiver-sites can be merged 211 at the NVA, which can construct a multicast replication-list 212 inclusive of all NVEs to which receivers for a particular multicast 213 group are attached. The replication-list for each specific multicast 214 group is maintained either by the NVA. 216 The receiver-sites registration is achieved by egress NVEs 217 performing the IGMP/MLD snooping to maintain state for which 218 attached Tenant Systems have subscribed to a given IP multicast 219 group. When the members of a multicast group are outside the NVO3 220 domain, it is necessary for NVO3 gateways to keep track of the 221 remote members of each multicast group. The NVEs then communicate 222 these mappings to the NVA. Even if the membership is not 223 communicated to the NVA, if it is necessary to prevent hosts 224 attached to an NVE that have not subscribed to a multicast group 225 from receiving the multicast traffic, the NVE needs to maintain the 226 multicast group membership. 228 In the absence of IGMP/MLD snooping, the traffic would be delivered 229 to all hosts that are part of the VNI. 231 This method requires multiple copies of the same packet to all NVEs 232 that participate in the VN. If, for example, a tenant subnet is 233 spread across 50 NVEs, the packet would have to be replicated 50 234 times at the source NVE. This also creates an issue with the 235 forwarding performance of the NVE. 237 Note that this method is similar to what was used in VPLS [VPLS] 238 prior to support of MPLS multicast [MPLS-MC]. While there are some 239 similarities between MPLS VPN and the NVO3 overlay, there are some 240 key differences: 242 - The CE-to-PE attachment in VPNs is somewhat static, whereas in a 243 DC that allows VMs to migrate anywhere, the TS attachment to NVE 244 is much more dynamic. 246 - The number of PEs to which a single VPN customer is attached in 247 an MPLS VPN environment is normally far less than the number of 248 NVEs to which a VNI's VMs are attached in a DC. 250 When a VPN customer has multiple multicast groups, [RFC6513] 251 "Multicast VPN" combines all those multicast groups within each 252 VPN client to one single multicast group in the MPLS (or VPN) 253 core. The result is that messages from any of the multicast 254 groups belonging to one VPN customer will reach all the PE nodes 255 of the client. In other words, any messages belonging to any 256 multicast groups under customer X will reach all PEs of the 257 customer X. When the customer X is attached to only a handful of 258 PEs, the use of this approach does not result in excessive wastage 259 of bandwidth in the provider's network. 261 In a DC environment, a typical server/hypervisor based virtual 262 switch may only support 10's VMs (as of this writing). A subnet 263 with N VMs may be, in the worst case, spread across N vSwitches. 264 Using "MPLS VPN multicast" approach in a such a scenario would 265 require the creation of a Multicast group in the core for this VNI 266 to reach all N NVEs. If only small percentage of this client's VMs 267 participate in application specific multicast, a great number of 268 NVEs will receive multicast traffic that is not forwarded to any 269 of their attached VMs, resulting in considerable wastage of 270 bandwidth. 272 Therefore, the Multicast VPN solution may not scale in DC 273 environment with dynamic attachment of Virtual Networks to NVEs and 274 greater number of NVEs for each virtual network. 276 3.3. Replication at a multicast service node 278 With this method, all multicast packets would be sent using a 279 unicast tunnel encapsulation to a multicast service node (MSN). The 280 MSN, in turn, would create multiple copies of the packet and would 281 deliver a copy, using a unicast tunnel encapsulation, to each of the 282 NVEs that are part of the multicast group for which the packet is 283 intended. 285 This mechanism is similar to that used by the ATM Forum's LAN 286 Emulation [LANE] specification [LANE]. 288 The following are the possible ways for the MSN to get the 289 membership information for each multicast group: 291 - The MSN can obtain this information by snooping the IGMP/MLD 292 messages from the Tenant Systems and/or sending query messages to 293 the Tenant Systems. In order for MSN to snoop the IGMP/MLD 294 messages between TSs and their corresponding routers, the NVEs 295 that TSs are attached have to encapsulate a special outer header, 296 e.g. outer destination being the multicast server node. See 297 Section 3.3.2 for detail. 299 - The MSN can obtain the membership information from the NVEs that 300 snoop the IGMP/MLD messages. This can be done by having the MSN 301 communicate with the NVEs, or by having the NVA obtain the 302 information from the NVEs, and in turn have MSN communicate with 303 the NVA. 305 Unlike the method described in Section 3.2, there is no performance 306 impact at the ingress NVE, nor are there any issues with multiple 307 copies of the same packet from the source NVE to the multicast 308 service node. However there remain issues with multiple copies of 309 the same packet on links that are common to the paths from the MSN 310 to each of the egress NVEs. Additional issues that are introduced 311 with this method include the availability of the MSN, methods to 312 scale the services offered by the MSN, and the sub-optimality of the 313 delivery paths. 315 Finally, the IP address of the source NVE must be preserved in 316 packet copies created at the multicast service node if data plane 317 learning is in use. This could create problems if IP source address 318 reverse path forwarding (RPF) checks are in use. 320 3.4. IP multicast in the underlay 322 In this method, the underlay supports IP multicast and the ingress 323 NVE encapsulates the packet with the appropriate IP multicast 324 address in the tunnel encapsulation header for delivery to the 325 desired set of NVEs. The protocol in the underlay could be any 326 variant of Protocol Independent Multicast (PIM), or protocol 327 dependent multicast, such as [ISIS-Multicast]. 329 If an NVE connects to its attached TSs via Layer 2 network, there 330 are multiple ways for NVEs to support the application specific 331 multicast: 333 - The NVE only supports the basic IGMP/MLD snooping function, let 334 the TSs routers handling the application specific multicast. This 335 scheme doesn't utilize the underlay IP multicast protocols. 336 - 337 - The NVE can act as a pseudo multicast router for the directly 338 attached VMs and support proper mapping of IGMP/MLD's messages to 339 the messages needed by the underlay IP multicast protocols. 341 With this method, there are none of the issues with the methods 342 described in Sections 3.2. 344 With PIM Sparse Mode (PIM-SM), the number of flows required would be 345 (n*g), where n is the number of source NVEs that source packets for 346 the group, and g is the number of groups. Bidirectional PIM (BIDIR- 347 PIM) would offer better scalability with the number of flows 348 required being g. 350 In the absence of any additional mechanism, e.g. using an NVA for 351 address resolution, for optimal delivery, there would have to be a 352 separate group for each tenant, plus a separate group for each 353 multicast address (used for multicast applications) within a tenant. 355 Additional considerations are that only the lower 23 bits of the IP 356 address (regardless of whether IPv4 or IPv6 is in use) are mapped to 357 the outer MAC address, and if there is equipment that prunes 358 multicasts at Layer 2, there will be some aliasing. Finally, a 359 mechanism to efficiently provision such addresses for each group 360 would be required. 362 There are additional optimizations which are possible, but they come 363 with their own restrictions. For example, a set of tenants may be 364 restricted to some subset of NVEs and they could all share the same 365 outer IP multicast group address. This however introduces a problem 366 of sub-optimal delivery (even if a particular tenant within the 367 group of tenants doesn't have a presence on one of the NVEs which 368 another one does, the former's multicast packets would still be 369 delivered to that NVE). It also introduces an additional network 370 management burden to optimize which tenants should be part of the 371 same tenant group (based on the NVEs they share), which somewhat 372 dilutes the value proposition of NVO3 which is to completely 373 decouple the overlay and physical network design allowing complete 374 freedom of placement of VMs anywhere within the data center. 376 3.5. Other schemes 378 There are still other mechanisms that may be used that attempt to 379 combine some of the advantages of the above methods by offering 380 multiple replication points, each with a limited degree of 381 replication [EDGE-REP]. Such schemes offer a trade-off between the 382 amount of replication at an intermediate node (router) versus 383 performing all of the replication at the source NVE or all of the 384 replication at a multicast service node. 386 4. Simultaneous use of more than one mechanism 388 While the mechanisms discussed in the previous section have been 389 discussed individually, it is possible for implementations to rely 390 on more than one of these. For example, the method of Section 3.1 391 could be used for minimizing ARP/ND, while at the same time, 392 multicast applications may be supported by one, or a combination of, 393 the other methods. For small multicast groups, the methods of 394 source NVE replication or the use of a multicast service node may be 395 attractive, while for larger multicast groups, the use of multicast 396 in the underlay may be preferable. 398 5. Other issues 399 5.1. Multicast-agnostic NVEs 401 Some hypervisor-based NVEs do not process or recognize IGMP/MLD 402 frames; i.e. those NVEs simply encapsulate the IGMP/MLD messages in 403 the same way as they do for regular data frames. 405 By default, TSs router periodically sends IGMP/MLD query messages to 406 all the hosts in the subnet to trigger the hosts that are interested 407 in the multicast stream to send back IGMP/MLD reports. In order for 408 MSN get the updated multicast group information, the MSN can also 409 send the IGMP/MLD query message comprising a client specific 410 multicast address, encapsulated in an overlay header to all the NVEs 411 to which the TSs in the VN are attached. 413 However, MSN may not always be aware of the client specific 414 multicast addresses. Then MSN has to snoop the IGMP/MLD messages 415 between TSs and their corresponding routers to maintain the 416 multicast membership. In order for MSN to snoop the IGMP/MLD 417 messages between TSs and their router, NVA needs to configure the 418 NVE to send copies of the IGMP/MLD messages to the MSN in addition 419 to the default behavior of sending them to the TSs' routers; e.g. 420 the NVA has to inform the NVEs to encapsulate data frames with DA 421 being 224.0.0.2 (destination address of IGMP report) to TSs' router 422 and MSN. 424 This process is similar to "Source Replication" described in Section 425 3.2, except the NVEs only replicate the message to TS's router and 426 MSN. 428 5.2. Multicast membership management for DC with VMs 430 For data centers with virtualized servers, VMs can be added, deleted 431 or moved very easily. When VMs are added, deleted or moved, the NVEs 432 to which the VMs are attached are changed. 434 When a VM is deleted from an NVE or a new VM is added to an NVE, the 435 VM management system should notify the MSN to send the IGMP/MLD 436 query messages to the relevant NVEs, so that the multicast 437 membership can be updated promptly. Otherwise, if there are changes 438 of VMs attachment to NVEs, then for the duration of the configured 439 default time interval that the TSs routers use for IGMP/MLD queries, 440 multicast data may not reach the VM(s) that moved. 442 6. Summary 444 This document has identified various mechanisms for supporting 445 application specific multicast in networks that use NVO3. It 446 highlights the basics of each mechanism and some of the issues with 447 them. As solutions are developed, the protocols would need to 448 consider the use of these mechanisms and co-existence may be a 449 consideration. It also highlights some of the requirements for 450 supporting multicast applications in an NVO3 network. 452 7. Security Considerations 454 This draft does not introduce any new security considerations beyond 455 what may be present in proposed solutions 457 8. IANA Considerations 459 This document requires no IANA actions. RFC Editor: Please remove 460 this section before publication. 462 9. References 464 9.1. Normative References 466 [PS] Lasserre, M. et al., "Framework for DC network 467 virtualization", work in progress, January 2014. 469 [FW] Narten, T. et al., "Problem statement: Overlays for 470 network virtualization", work in progress, July 2013. 472 [NVO3-ARCH] Narten, T. et al.," An Architecture for Overlay Networks 473 (NVO3)", work in progress, Feb 2014 475 [RFC3376] B. Cain, et al, "Internet Group Management Protocol, 476 Version 3", Oct 2002. 478 [RFC6513] Rosen, E. et al., "Multicast in MPLS/BGP IP VPNs". Feb 479 2012. 481 9.2. Informative References 483 [VXLAN] Mahalingam, M. et al., "VXLAN: A framework for overlaying 484 virtualized Layer 2 networks over Layer 3 networks," work 485 in progress. [AG: Replace with RFC.] 487 [NVGRE] Sridharan, M. et al., "NVGRE: Network virtualization 488 using Generic Routing Encapsulation," work in progress. 490 [STT] Davie, B. and Gross J., "A stateless transport tunneling 491 protocol for network virtualization," work in progress. 493 [DC-MC] McBride M., and Lui, H., "Multicast in the data center 494 overview," work in progress. 496 [ISIS-Multicast] L. Yong, et al, "ISIS Protocol Extension For 497 Building Distribution Trees", work in progress. Oct 2013. 499 [VPLS] Lasserre, M., and Kompella, V. (Eds), "Virtual Private 500 LAN Service (VPLS) using Label Distribution Protocol (LDP) 501 signaling," RFC 4762, January 2007. 503 [MPLS-MC] Aggarwal, R. et al., "Multicast in VPLS," work in 504 progress. 506 [LANE] "LAN emulation over ATM," The ATM Forum, af-lane- 507 0021.000, January 1995. 509 [EDGE-REP] Marques P. et al., "Edge multicast replication for BGP 510 IP VPNs," work in progress, June 2012. 512 [RFC 3569] S. Bhattacharyya, Ed., "An Overview of Source-Specific 513 Multicast (SSM)", July 2003. 515 [LISP-Signal-Free] V. Moreno & D. Farinacci, "Signal-Free LISP 516 Multicast", work in progress. Dec 2014. 518 10. Acknowledgments 520 We want to thank Dino Farinacci for comments and suggestions to this 521 draft. 523 This document was prepared using 2-Word-v2.0.template.dot. 525 Authors' Addresses 527 Anoop Ghanwani 528 Dell 529 Email: anoop@alumni.duke.edu 531 Linda Dunbar 532 Huawei Technologies 533 5340 Legacy Drive, Suite 1750 534 Plano, TX 75024, USA 535 Phone: (469) 277 5840 536 Email: ldunbar@huawei.com 538 Mike McBride 539 Ericsson 540 mike.mcbride@ericsson.com 542 Vinay Bannai 543 Paypal 544 Email: vbannai@paypal.com 546 Ram Krishnan 547 Brocade 548 Email: ramk@brocade.com