INTERNET-DRAFT                                               A. Ghanwani
Intended Status: Informational                                      Dell
Expires: August 12, 2014                                       L. Dunbar
                                                                  Huawei
                                                               V. Bannai
                                                                  Paypal
                                                             R. Krishnan
                                                                 Brocade
                                                       February 13, 2014

                Multicast Issues in Networks Using NVO3
                  draft-ghanwani-nvo3-mcast-issues-01

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups.  Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

Copyright Notice

Copyright (c) 2014 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.

Abstract

This memo discusses issues with supporting multicast traffic in a
network that uses Network Virtualization using Overlays over Layer 3
(NVO3).  It describes the various mechanisms that may be used for
multicast and discusses some of the considerations in supporting
multicast applications in networks that use NVO3.

Table of Contents

1. Introduction
2. Multicast mechanisms in networks that use NVO3
   2.1 No multicast support
   2.2 Replication at the source NVE
   2.3 Replication at a multicast service node
   2.4 IP multicast in the underlay
   2.5 Other schemes
3. Simultaneous use of more than one mechanism
4. IP multicast applications in the overlay
5. Summary
6. Security Considerations
7. IANA Considerations
8. References
   8.1 Normative References
   8.2 Informative References
Authors' Addresses
1. Introduction

Network Virtualization using Overlays over Layer 3 (NVO3) is a
technology that is used to address issues that arise in building
large, multitenant data centers that make extensive use of server
virtualization [PS].

This document is focused specifically on the problem of supporting
multicast in networks that use NVO3.  Because of its requirement for
multi-destination delivery, multicast traffic poses some unique
challenges.

The reader is assumed to be familiar with the terminology defined in
the NVO3 Framework document [FW].

2. Multicast mechanisms in networks that use NVO3

In NVO3 environments, traffic between NVEs is transported using a
tunnel encapsulation such as VXLAN [VXLAN], NVGRE [NVGRE], or STT
[STT].

Besides the need to support the Address Resolution Protocol (ARP) and
Neighbor Discovery (ND), there are several applications that require
the support of multicast and/or broadcast in data centers [DC-MC].
With NVO3, there are many possible ways that multicast may be handled
in such networks.  We discuss some of the attributes of the following
four methods, but other methods are also possible.

1. No multicast support.
2. Replication at the source NVE.
3. Replication at a multicast service node.
4. IP multicast in the underlay.

These mechanisms are briefly mentioned in the NVO3 Framework document
[FW].  This document fills in more detail about the basic operation
of each of these methods and discusses their issues and tradeoffs.

2.1 No multicast support

In this scenario, there is no support whatsoever for multicast
traffic when using the overlay.  This can only work if the following
conditions are met:

1. All of the traffic is unicast.  In other words, there are no
   multicast applications in the network, and the only multicast
   traffic is due to ARP/ND and to flooding of frames with an
   unknown destination MAC address.

2. A network virtualization authority (NVA) is used at the NVE to
   determine the MAC address-to-NVE mapping and the MAC address-to-IP
   address bindings.  In other words, there is no data plane
   learning, and address resolution requests via ARP/ND that are
   issued by the VMs must be resolved by the NVE to which they are
   attached.

With this approach, certain multicast/broadcast applications such as
DHCP can be supported by use of a helper function in the NVE.

The main issue that needs to be addressed with this mechanism is the
handling of hosts for which a mapping does not already exist in the
NVA.  This issue can be particularly challenging if such end systems
are reachable through more than one NVE.
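As an illustration of the second condition above, the sketch below
(in Python) shows how an NVE might answer a tenant ARP request from
NVA-provided state instead of flooding it.  The cache layout and the
function names are purely illustrative assumptions; no NVO3
specification defines such an interface.

   # Hypothetical NVA-populated cache at the NVE (illustrative only).
   class NvaCache:
       def __init__(self):
           # (VN identifier, tenant IP) -> (tenant MAC, egress NVE IP)
           self.mappings = {}

       def lookup(self, vn_id, ip):
           return self.mappings.get((vn_id, ip))

   def handle_arp_request(cache, vn_id, target_ip):
       """Resolve a tenant ARP request locally instead of flooding."""
       entry = cache.lookup(vn_id, target_ip)
       if entry is not None:
           target_mac, _egress_nve = entry
           return {"op": "arp-reply", "ip": target_ip,
                   "mac": target_mac}
       # No mapping in the NVA: with no multicast support there is
       # no way to flood the request, so resolution fails.  This is
       # the problem case noted above.
       return None

   cache = NvaCache()
   cache.mappings[("vn-blue", "10.1.1.2")] = ("00:11:22:33:44:55",
                                              "192.0.2.10")
   assert handle_arp_request(cache, "vn-blue", "10.1.1.2") is not None
   assert handle_arp_request(cache, "vn-blue", "10.1.1.99") is None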
2.2 Replication at the source NVE

With this method, the overlay attempts to provide a multicast service
without requiring any specific support from the underlay other than a
unicast service.  A multicast or broadcast transmission is achieved
by replicating the packet at the source NVE and sending one copy,
with a unicast tunnel encapsulation, to each destination NVE that
must receive the packet.

For this mechanism to work, the source NVE must know, a priori, the
IP addresses of all destination NVEs that need to receive the packet.
For example, in the case of an ARP broadcast or an ND multicast, the
source NVE must know the IP addresses of all the remote NVEs where
there are members of the tenant subnet in question.

The obvious drawback of this method is that multiple copies of the
same packet traverse any links that are common to the paths toward
the destination NVEs.  If, for example, a tenant subnet is spread
across 50 NVEs, the packet would have to be replicated 50 times at
the source NVE.  This also creates an issue with the forwarding
performance of the NVE, especially if it is implemented in software.

Note that this method is similar to what was used in VPLS [VPLS]
prior to extensive support of MPLS multicast [MPLS-MC].
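The following sketch (in Python) illustrates ingress replication,
assuming the source NVE already knows (e.g., from an NVA) the IP
addresses of all egress NVEs for the virtual network.  The function
name and the use of a VXLAN-style UDP encapsulation are illustrative
assumptions only; any unicast tunnel encapsulation would do.

   import socket

   VXLAN_UDP_PORT = 4789  # assuming a VXLAN encapsulation

   def replicate_at_source(encapsulated_frame, egress_nve_ips):
       """Send one unicast-encapsulated copy per egress NVE.

       The cost is linear in the number of egress NVEs: a tenant
       subnet spread across 50 NVEs means 50 copies leave this NVE,
       and copies share any common links toward the destinations."""
       sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
       try:
           for nve_ip in egress_nve_ips:
               # Each copy travels in its own unicast tunnel packet.
               sock.sendto(encapsulated_frame,
                           (nve_ip, VXLAN_UDP_PORT))
       finally:
           sock.close()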
2.3 Replication at a multicast service node

With this method, all multicast packets would be sent using a unicast
tunnel encapsulation to a multicast service node.  The multicast
service node, in turn, would create multiple copies of the packet and
would deliver a copy, using a unicast tunnel encapsulation, to each
of the NVEs that are part of the multicast group for which the packet
is intended.

This mechanism is similar to that used by the ATM Forum's LAN
Emulation specification [LANE].

Unlike the method described in Section 2.2, there is no performance
impact at the ingress NVE, nor are there any issues with multiple
copies of the same packet from the source NVE to the multicast
service node.  However, issues remain with multiple copies of the
same packet on links that are common to the paths from the multicast
service node to each of the egress NVEs.  Additional issues
introduced by this method include the availability of the multicast
service node, methods to scale the services offered by the multicast
service node, and the sub-optimality of the delivery paths.

Finally, if data plane learning is in use, the IP address of the
source NVE must be preserved in the packet copies created at the
multicast service node.  This could create problems if IP source
address reverse path forwarding (RPF) checks are in use.

2.4 IP multicast in the underlay

In this method, the underlay supports IP multicast, and the ingress
NVE encapsulates the packet with the appropriate IP multicast address
in the tunnel encapsulation header for delivery to the desired set of
NVEs.  The protocol in the underlay could be any variant of Protocol
Independent Multicast (PIM).  The NVE would be required to
participate in the underlay as a host using IGMP/MLD in order for the
underlay to learn about the groups in which the NVE participates.

With this method, there are none of the issues with the methods
described in Sections 2.2 and 2.3.

With PIM Sparse Mode (PIM-SM), the number of flows required would be
(n*g), where n is the number of source NVEs that source packets for
the group and g is the number of groups.  Bidirectional PIM
(BIDIR-PIM) would offer better scalability, with the number of flows
required being g.

In the absence of any additional mechanism (e.g., an NVA used for
address resolution), optimal delivery would require a separate
underlay group for each tenant, plus a separate group for each
multicast address (used for multicast applications) within a tenant.
An additional consideration is that only the lower 23 bits of an IPv4
group address (or the lower 32 bits of an IPv6 group address) are
mapped to the outer MAC address, so if there is equipment that prunes
multicasts at Layer 2, there will be some aliasing (see the example
at the end of this section).  Finally, a mechanism to efficiently
provision such addresses for each group would be required.

Additional optimizations are possible, but they come with their own
restrictions.  For example, a set of tenants may be restricted to
some subset of NVEs, and they could all share the same outer IP
multicast group address.  This, however, introduces a problem of
sub-optimal delivery: even if a particular tenant within the group of
tenants does not have a presence on one of the NVEs that another one
does, the former's multicast packets would still be delivered to that
NVE.  It also introduces an additional network management burden:
deciding which tenants should be part of the same tenant group (based
on the NVEs they share).  This somewhat dilutes the value proposition
of NVO3, which is to completely decouple the overlay from the
physical network design, allowing complete freedom of placement of
VMs anywhere within the data center.
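The aliasing consideration noted above can be made concrete with a
short example.  The helper below (in Python) implements the standard
IPv4 group-to-MAC mapping (01:00:5e followed by the low-order 23 bits
of the group address); the function name is ours, but the mapping
itself is the standard one.

   def ipv4_group_to_mac(group):
       """Map an IPv4 multicast group to its Ethernet MAC address:
       01:00:5e followed by the low-order 23 bits of the group."""
       o = [int(x) for x in group.split(".")]
       return "01:00:5e:%02x:%02x:%02x" % (o[1] & 0x7f, o[2], o[3])

   # 239.1.1.1 and 238.129.1.1 differ only in the high-order 9 bits,
   # which are not carried in the MAC address, so they alias to the
   # same outer MAC and cannot be told apart by Layer 2 pruning.
   assert ipv4_group_to_mac("239.1.1.1") == "01:00:5e:01:01:01"
   assert ipv4_group_to_mac("238.129.1.1") == "01:00:5e:01:01:01"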
2.5 Other schemes

There are still other mechanisms that may be used that attempt to
combine some of the advantages of the above methods by offering
multiple replication points, each with a limited degree of
replication [EDGE-REP].  Such schemes offer a trade-off between the
amount of replication at an intermediate node (router) versus
performing all of the replication at the source NVE or all of the
replication at a multicast service node.

3. Simultaneous use of more than one mechanism

While the mechanisms discussed in the previous section have been
described individually, it is possible for implementations to rely on
more than one of them.  For example, the method of Section 2.1 could
be used for minimizing ARP/ND, while at the same time, multicast
applications may be supported by one, or a combination of, the other
methods.  For small multicast groups, the methods of source NVE
replication or the use of a multicast service node may be attractive,
while for larger multicast groups, the use of multicast in the
underlay may be preferable.

4. IP multicast applications in the overlay

When IP multicast is implemented in the overlay (i.e., the tenant
traffic is IP multicast), there are a few issues that need to be
addressed.

First, in all cases where L2 virtual network interfaces (VNIs) are
present, the NVE would need to support IGMP/MLD snooping in order to
prevent delivery of packets to tenant systems that are not interested
in receiving them.

Second, there is the issue of how the groups are set up and mapped to
tunnels in the underlay.  This can be accomplished entirely by an NVA
if the mechanisms described in Section 2.2 or Section 2.3 are used,
with the NVE just participating in snooping of IGMP messages from the
tenant systems.  If the method of Section 2.4 is used, then a
mechanism must be provided for mapping the tenant IP multicast
address to an IP multicast address for use in the underlay, and the
NVE would be required to translate the information from the snooped
IGMP/MLD messages from the tenant systems into corresponding requests
for the underlay.

Third, when using the scheme described in Section 2.3, it may be
useful to have the multicast service node support the IGMP querier
function.

Fourth, if the IP multicast traffic is contained within a single
virtual network (VN), then the schemes described herein are
sufficient.  If, on the other hand, the IP multicast traffic needs to
traverse VNs, then the routing mechanisms at the NVE need to offer IP
multicast forwarding.  Once again, depending on how the groups are
set up -- whether by an NVA or some other entity -- the forwarding
tables at an NVE that has L3 virtual network interfaces (VNIs) would
need to be set up by that entity.

5. Summary

This document has identified various mechanisms for supporting
multicast in networks that use NVO3.  It highlights the basics of
each mechanism and some of the issues with them.  As solutions are
developed, the protocols would need to consider the use of these
mechanisms, and coexistence among them may be a consideration.  It
also highlights some of the requirements for supporting multicast
applications in an NVO3 network.

6. Security Considerations

This is an informational document and, as such, does not introduce
any new security considerations beyond what may be present in
proposed solutions.

7. IANA Considerations

This draft does not have any IANA considerations.

8. References

8.1 Normative References

[PS]       Narten, T., et al., "Problem statement: Overlays for
           network virtualization", work in progress, July 2013.

[FW]       Lasserre, M., et al., "Framework for DC network
           virtualization", work in progress, January 2014.

8.2 Informative References

[VXLAN]    Mahalingam, M., et al., "VXLAN: A framework for overlaying
           virtualized Layer 2 networks over Layer 3 networks", work
           in progress.

[NVGRE]    Sridharan, M., et al., "NVGRE: Network virtualization
           using Generic Routing Encapsulation", work in progress.

[STT]      Davie, B. and Gross, J., "A stateless transport tunneling
           protocol for network virtualization", work in progress.

[DC-MC]    McBride, M. and Liu, H., "Multicast in the data center
           overview", work in progress.

[VPLS]     Lasserre, M. and Kompella, V. (Eds.), "Virtual Private LAN
           Service (VPLS) using Label Distribution Protocol (LDP)
           signaling", RFC 4762, January 2007.

[MPLS-MC]  Aggarwal, R., et al., "Multicast in VPLS", work in
           progress.

[LANE]     "LAN emulation over ATM", The ATM Forum, af-lane-0021.000,
           January 1995.

[EDGE-REP] Marques, P., et al., "Edge multicast replication for BGP
           IP VPNs", work in progress, June 2012.

Authors' Addresses

Anoop Ghanwani
Dell
Email: anoop@alumni.duke.edu

Linda Dunbar
Huawei
Email: ldunbar@huawei.com

Vinay Bannai
Paypal
Email: vbannai@paypal.com

Ram Krishnan
Brocade
Email: ramk@brocade.com