INTERNET-DRAFT                                                A. Ghanwani
Intended Status: Informational                                       Dell
Expires: May 29, 2013                                   November 30, 2012

               Multicast Issues in Networks Using NVO3
                  draft-ghanwani-nvo3-mcast-issues-00

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.

Abstract

   This memo discusses issues with supporting multicast traffic in a
   network that uses Network Virtualization using Overlays over Layer 3
   (NVO3).  It lists the mechanisms that may be used for multicast in
   such networks and discusses the tradeoffs of each.

Table of Contents

   1  Introduction
   2. Multicast mechanisms in networks that use NVO3
      2.1  No multicast support
      2.2  Replication at the source NVE
      2.3  Replication at a multicast service node
      2.4  IP multicast in the underlay
      2.5  Simultaneous use of more than one mechanism
   3  Summary
   4  Security Considerations
   5  IANA Considerations
   6  References
      6.1  Normative References
      6.2  Informative References
   Authors' Addresses

1 Introduction

   Network Virtualization using Overlays over Layer 3 (NVO3) is a
   technology that is used to address issues that arise in building
   large, multitenant data centers that make extensive use of server
   virtualization [PS].

   This document is focused specifically on the problem of supporting
   multicast in networks that use NVO3.  Because it requires
   multi-destination delivery, multicast traffic poses some unique
   challenges.

   The reader is assumed to be familiar with the terminology defined in
   the NVO3 Framework document [FW].

2. Multicast mechanisms in networks that use NVO3

   In NVO3 environments, traffic between NVEs is transported using a
   tunnel encapsulation such as VXLAN [VXLAN], NVGRE [NVGRE], or STT
   [STT].

   Besides the need to support the Address Resolution Protocol (ARP)
   and Neighbor Discovery (ND), there are several applications that
   require the support of multicast and/or broadcast in data centers
   [DC-MC].  In networks that use NVO3, there are four possible ways in
   which multicast may be handled:

   1. No multicast support.
   2. Replication at the source NVE.
   3. Replication at a multicast service node.
   4. IP multicast in the underlay.

   These mechanisms are briefly mentioned in the NVO3 Framework
   document [FW].  This document provides more detail on the basic
   operation of each of these mechanisms and discusses their issues and
   tradeoffs.

2.1 No multicast support

   In this scenario, there is no support whatsoever for multicast
   traffic when using the overlay.  This can only work if the following
   conditions are met:

   1. All of the traffic is unicast.
   2. An oracle is used at the NVE to determine both the MAC
      address-to-NVE mapping and the MAC address-to-IP address
      bindings.  In other words, there is no data-plane learning, and
      address resolution requests via ARP/ND that are issued by the VMs
      must be resolved by the NVE that they are attached to.

   With this approach, certain multicast/broadcast applications such as
   DHCP can be supported by use of a helper function in the NVE.

   The main issue that needs to be addressed with this mechanism is the
   handling of hosts for which a mapping does not already exist in the
   oracle.  This can be particularly challenging if such end systems
   are reachable through more than one NVE.
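   As a purely illustrative sketch (the class, table contents, and
   addresses below are hypothetical and not part of any NVO3
   specification), the following Python fragment shows how an NVE might
   answer a tenant VM's ARP request from an oracle-provided table
   instead of flooding it over the overlay.  The final lookup
   illustrates the problem case of a host for which the oracle has no
   mapping.

      # Illustrative only: an NVE answering ARP requests from an
      # oracle-provided table; there is no data-plane learning.

      class NveArpProxy:
          def __init__(self, oracle_bindings):
              # oracle_bindings: tenant IP address -> MAC address,
              # pre-populated by the oracle.
              self.ip_to_mac = dict(oracle_bindings)

          def handle_arp_request(self, target_ip):
              # Return the MAC for a locally generated ARP reply, or
              # None if the oracle has no binding for this IP.
              return self.ip_to_mac.get(target_ip)

      proxy = NveArpProxy({"10.1.1.2": "00:00:5e:00:53:01"})
      assert proxy.handle_arp_request("10.1.1.2") == "00:00:5e:00:53:01"
      assert proxy.handle_arp_request("10.1.1.9") is None  # unknown host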
2.2 Replication at the source NVE

   With this method, the overlay attempts to provide a multicast
   service without requiring any specific support from the underlay,
   other than that of a unicast service.  A multicast or broadcast
   transmission is achieved by replicating the packet at the source NVE
   and sending one copy, with a unicast tunnel encapsulation, to each
   destination NVE that the packet must be delivered to.

   For this mechanism to work, the source NVE must know, a priori, the
   IP addresses of all destination NVEs that need to receive the
   packet.  For example, in the case of an ARP broadcast or an ND
   multicast, the source NVE must know the IP addresses of all the
   remote NVEs that have members of the tenant subnet in question.

   The obvious drawback of this method is that multiple copies of the
   same packet will traverse any links that are common to the paths
   toward the destination NVEs.  If, for example, a tenant subnet is
   spread across 50 NVEs, the packet would have to be replicated 50
   times at the source NVE.  This also creates an issue for the
   forwarding performance of the NVE, especially if it is implemented
   in software.

   Note that this method is similar to what was used in VPLS [VPLS]
   prior to extensive support of MPLS multicast [MPLS-MC].
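   As a purely illustrative sketch (the function and callback names
   below are hypothetical), the following Python fragment shows the
   ingress replication loop: one unicast-encapsulated copy is produced
   for every destination NVE that the source NVE knows to have members
   of the tenant subnet.

      # Illustrative only: ingress replication at the source NVE.
      # encapsulate() and send_unicast() stand in for the actual tunnel
      # encapsulation (VXLAN, NVGRE, STT, ...) and underlay delivery.

      def replicate_at_source_nve(packet, dest_nve_ips, encapsulate,
                                  send_unicast):
          # dest_nve_ips must be known a priori (e.g., from an oracle).
          for nve_ip in dest_nve_ips:
              send_unicast(nve_ip, encapsulate(packet, outer_dst_ip=nve_ip))

      copies = []
      replicate_at_source_nve(
          b"tenant-broadcast-frame",
          ["192.0.2.1", "192.0.2.2", "192.0.2.3"],
          encapsulate=lambda pkt, outer_dst_ip: (outer_dst_ip, pkt),
          send_unicast=lambda nve_ip, tunneled: copies.append(tunneled))
      assert len(copies) == 3  # one copy per destination NVE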
2.3 Replication at a multicast service node

   With this method, all multicast packets are sent using a unicast
   tunnel encapsulation to a multicast service node.  The multicast
   service node, in turn, creates multiple copies of the packet and
   delivers a copy, using a unicast tunnel encapsulation, to each of
   the NVEs that are part of the multicast group for which the packet
   is intended.

   This mechanism is similar to that used by the ATM Forum's LAN
   Emulation (LANE) specification [LANE].

   Unlike the method described in Section 2.2, there is no performance
   impact at the ingress NVE, nor are there any issues with multiple
   copies of the same packet on the path from the source NVE to the
   multicast service node.  However, there remain issues with multiple
   copies of the same packet on links that are common to the paths from
   the multicast service node to each of the egress NVEs.  Additional
   issues introduced by this method include the availability of the
   multicast service node, methods to scale the services offered by the
   multicast service node, and the sub-optimality of the delivery
   paths.

   Finally, the IP address of the source NVE must be preserved in
   packet copies created at the multicast service node if data-plane
   learning is in use.  This could create problems if IP source address
   reverse path forwarding (RPF) checks are in use.

2.4 IP multicast in the underlay

   In this method, the underlay supports IP multicast, and the ingress
   NVE encapsulates the packet with the appropriate IP multicast
   address in the tunnel encapsulation header for delivery to the
   desired set of NVEs.  The protocol in the underlay could be any
   variant of Protocol Independent Multicast (PIM).

   With this method, there are none of the issues with multiple copies
   of the same packet that arise with the methods described in
   Sections 2.2 and 2.3.

   With PIM Sparse Mode (PIM-SM), the number of flows required would be
   (n*g), where n is the number of source NVEs that source packets for
   the group and g is the number of groups.  Bidirectional PIM
   (BIDIR-PIM) would offer better scalability, with the number of flows
   required being g.

   In the absence of any additional mechanism (e.g., the use of an
   oracle for address resolution), optimal delivery would require a
   separate group for each tenant, plus a separate group for each
   multicast address (used for multicast applications) within a tenant.
   Additional considerations are that only the lower 23 bits of the IP
   address (regardless of whether IPv4 or IPv6 is in use) are mapped to
   the outer MAC address, so if there is equipment that prunes
   multicasts at Layer 2, there will be some aliasing.  Finally, a
   mechanism to efficiently provision such addresses for each group
   would be required.
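   As a purely illustrative example of the aliasing point above, for
   the case where the outer address is an IPv4 multicast address, the
   following Python fragment computes the derived Ethernet multicast
   MAC address (01:00:5e followed by the lower 23 bits of the group
   address) and shows two distinct groups colliding on the same MAC.

      # Illustrative only: IPv4 multicast group address to Ethernet
      # multicast MAC address.  Only the lower 23 bits of the group
      # address survive the mapping, so distinct groups can alias.

      import ipaddress

      def ipv4_group_to_mac(group):
          low23 = int(ipaddress.IPv4Address(group)) & 0x7FFFFF
          return "01:00:5e:%02x:%02x:%02x" % (
              (low23 >> 16) & 0xFF, (low23 >> 8) & 0xFF, low23 & 0xFF)

      # 239.1.1.1 and 239.129.1.1 differ only in a bit that is not
      # carried into the MAC address, so Layer 2 pruning cannot tell
      # them apart.
      assert (ipv4_group_to_mac("239.1.1.1")
              == ipv4_group_to_mac("239.129.1.1")
              == "01:00:5e:01:01:01")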
   There are additional optimizations that are possible, but they come
   with their own restrictions.  For example, a set of tenants may be
   restricted to a subset of NVEs, and those tenants could all share
   the same outer IP multicast group address.  This, however,
   introduces a problem of sub-optimal delivery: even if a particular
   tenant within the group has no presence on an NVE where another
   tenant in the group does, the former's multicast packets would still
   be delivered to that NVE.  It also introduces an additional network
   management burden of optimizing which tenants should be part of the
   same tenant group (based on the NVEs they share), which somewhat
   dilutes the value proposition of NVO3, namely the complete
   decoupling of the overlay from the physical network design so that
   VMs may be placed anywhere within the data center.

2.5 Simultaneous use of more than one mechanism

   While the mechanisms in the previous sections have been described
   individually, it is possible for implementations to rely on more
   than one of them.  For example, the method of Section 2.1 could be
   used for minimizing ARP/ND traffic, while at the same time multicast
   applications may be supported by one of the other methods, or by a
   combination of them.  For small multicast groups, the methods of
   source NVE replication or the use of a multicast service node may be
   attractive, while for larger multicast groups, the use of multicast
   in the underlay may be preferable.

3 Summary

   This document has identified various mechanisms for supporting
   multicast in networks that use NVO3.  It highlights the basics of
   each mechanism and some of the issues with it.  As solutions are
   developed, the protocols will need to consider the use of these
   mechanisms, and their coexistence may be a consideration.

4 Security Considerations

   This is an informational document and, as such, does not introduce
   any new security considerations beyond those that may be present in
   proposed solutions.

5 IANA Considerations

   This document does not have any IANA considerations.

6 References

6.1 Normative References

   [PS]      Narten, T., et al., "Problem statement: Overlays for
             network virtualization", work in progress.

   [FW]      Lasserre, M., et al., "Framework for DC network
             virtualization", work in progress.

6.2 Informative References

   [VXLAN]   Mahalingam, M., et al., "VXLAN: A framework for overlaying
             virtualized Layer 2 networks over Layer 3 networks", work
             in progress.

   [NVGRE]   Sridharan, M., et al., "NVGRE: Network virtualization
             using Generic Routing Encapsulation", work in progress.

   [STT]     Davie, B. and Gross, J., "A stateless transport tunneling
             protocol for network virtualization", work in progress.

   [DC-MC]   McBride, M. and Liu, H., "Multicast in the data center
             overview", work in progress.

   [VPLS]    Lasserre, M. and Kompella, V. (Eds.), "Virtual Private LAN
             Service (VPLS) Using Label Distribution Protocol (LDP)
             Signaling", RFC 4762, January 2007.

   [MPLS-MC] Aggarwal, R., et al., "Multicast in VPLS", work in
             progress.

   [LANE]    The ATM Forum, "LAN Emulation over ATM", af-lane-0021.000,
             January 1995.

Authors' Addresses

   Anoop Ghanwani
   Dell
   350 Holger Way
   San Jose, CA 95134

   Phone: +1-408-571-3228
   Email: anoop@alumni.duke.edu