Internet Engineering Task Force                           E. Vyncke, Ed.
Internet-Draft                                                P. Thubert
Intended status: Informational                          E. Levy-Abegnoli
Expires: August 18, 2014                                  A. Yourtchenko
                                                                   Cisco
                                                       February 14, 2014

 Why Network-Layer Multicast is Not Always Efficient At Datalink Layer
                draft-vyncke-6man-mcast-not-efficient-01

Abstract

   Several IETF protocols (IPv6 Neighbor Discovery for example) rely on
   IP multicast in the hope of being efficient with respect to available
   bandwidth and of avoiding interrupts in the network nodes.
   On some datalink-layer networks, for example IEEE 802.11 WiFi, this
   is not the case because of limitations in the services offered by
   the datalink-layer network.  This document lists and explains the
   potential issues when using network-layer multicast over some
   datalink-layer networks.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 18, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Issue on Wired Ethernet Network
   3.  Issues on IEEE 802.11 Wireless Network
     3.1.  Multicast over Wireless
     3.2.  Host Sleep Mode
     3.3.  Low Power WiFi Clients
     3.4.  Vendor and Configuration Optimizations
     3.5.  Even Unicast NDP is not Optimum
   4.  Measuring the Amount of IPv6 Multicast
   5.  Acknowledgements
   6.  IANA Considerations
   7.  Security Considerations
   8.  Informative References
   Authors' Addresses

1.  Introduction

   Several IETF protocols rely on the use of link-local scoped IP
   multicast in the hope of reducing traffic over the underlying
   datalink network and generating fewer operating system interrupts
   for the receiving nodes.  For example, IPv6 Neighbor Discovery
   [RFC4861] uses link-local multicast to:

   o  advertise the presence of a router by sending router
      advertisements to the all-nodes link-local multicast address
      (LLMA) ff02::1, whose members are only the IPv6 nodes but, per
      [RFC4291] section 3, those messages must be forwarded on all
      ports.  This IPv6 LLMA is mapped to the Ethernet Multicast
      Address (EMA) 33:33:00:00:00:01;

   o  solicit the data-link layer address of an adjacent on-link node by
      sending a neighbor solicitation to the solicited-node multicast
      address corresponding to the target address, of the form
      ff02:0:0:0:0:1:ffXX:XXXX (where the last 24 bits are the last 24
      bits of the target address) as described in [RFC4291].  This IPv6
      LLMA is mapped to the EMA 33:33:ff:XX:XX:XX.

2.  Issue on Wired Ethernet Network

   Most switch vendors implement MLD snooping [RFC4541] in order to
   forward multicast frames only to switch ports where there is a member
   of the IPv6 multicast group.
   This optimization works by installing
   hardware forwarding states in the switch.  As there is a finite
   amount of memory in the switches, especially when the memory is used
   by the data plane forwarding, there is also a limit to the number of
   MLD optimization states, i.e. a limit to the number of IPv6 multicast
   groups that can be optimized by the switch; frames destined to groups
   without such a state are flooded on all ports in the same datalink
   domain, and generally the use of MLD snooping is reserved to groups
   with a scope wider than link local.

   With IPv6, all nodes usually have at least two IPv6 addresses: a
   link-local and a global address.  If both addresses are based on
   EUI-64, then they share the same 24 least-significant bits, hence
   there is only one solicited-node multicast address per node.
   Otherwise, there is a high probability that the 24 least-significant
   bits are different, hence requiring membership of two solicited-node
   multicast addresses.  If a switch uses MLD snooping to install
   hardware-optimized multicast forwarding states for LLMA, then the
   switch installs two hardware-optimized states per node, as EUI-64
   addresses are no longer commonly used.  If privacy extension
   addresses [RFC4941] are used, then every node can have multiple IPv6
   global addresses, most of which are not based on EUI-64; a large
   switch fabric will then have to support several times more states
   for multicast EMA than it does for unicast addresses, requiring more
   resources in each individual switch than can be built at an
   affordable price.

   Therefore, for cost reasons, the multicast optimization by MLD
   snooping of solicited-node LLMA is disabled on most Ethernet
   switches.
   This means wasting:

   o  the switch bandwidth, as it works as a full-duplex hub;

   o  the nodes' CPU, as all nodes have to receive the multicast
      frame (if their network adapter is not optimized to support MAC
      multicast) and quickly drop it.

   Special attention must be paid when a layer-2 domain includes legacy
   devices working at 10 Mbps half-duplex; for example, in hospitals
   with old equipment dating back to 1990.  In this case, it takes
   only 100 300-byte frames per second (240 kbit/s) to utilize 2.4 % of
   the medium, not to mention that the NIC and the processor have to
   process those frames and that the processor probably also dates from
   1990.

   It is unclear what the impact is on virtual machines with different
   MAC addresses and different IPv6 addresses connected to a virtual
   layer-2 switch hosted on a single physical server.  The MLD snooping
   done by the virtual switch consumes hypervisor CPU, hence also
   reducing the amount of CPU available for the virtual machines.

   Leveraging MLD snooping to save layer-2 switches from flooding
   link-local multicast messages carries additional challenges.
   Unsolicited MLD reports are usually sent once (when the link comes
   up) and are not acknowledged.  There exists a retransmission
   mechanism, but it is not generally deployed, and it does not
   guarantee that subsequent retransmissions won't also get lost.  The
   switch could easily end up with incomplete forwarding states for a
   given group, with some of the listeners' ports but not all (much
   worse than no state at all).  As the switch does not know that one of
   its forwarding entries is incomplete, it can't fall back to
   broadcasting.  Like ordinary MLD routers, the switch could query
   reports on a periodic basis.
   However, it is not
   practical for layer-2 access switches to send periodic general MLD
   queries to maintain forwarding state accuracy, for at least two
   reasons:

   o  The queries must be sourced with a link-local IPv6 address, one
      per link, and, for many practical reasons, layer-2 switches don't
      have such an address on each link (VLAN) they operate on.

   o  Since address resolution uses a multicast group, and may happen
      quite frequently on the link, in order to avoid black-holing
      resolution, the interval for a switch to issue MLD general queries
      would have to be very small (a few seconds).  These MLD queries
      are themselves sent to a multicast group that all nodes would need
      to receive.  That would completely defeat the purpose of reducing
      multicast traffic towards end nodes.

3.  Issues on IEEE 802.11 Wireless Network

3.1.  Multicast over Wireless

   Wireless networks are a shared half-duplex medium: when one station
   transmits, all others must be silent.  A multicast or broadcast
   transmission from an AP is physically transmitted to all WiFi clients
   (STAs) and no other node can use the wireless medium at that time.
   This is the first issue with the use of wireless for multicast: the
   medium access behaves like an Ethernet hub.

   Depending on distance and radio propagation, different wireless
   clients may use different transmission encodings and data rates.  A
   lower data rate effectively locks the medium for a longer time per
   bit.  In order to reach all nodes, and considering that multicast and
   broadcast frames are not protected by ARQ (retries), the AP is
   constrained to transmit all multicast or broadcast frames at the
   lowest rate possible, which in practice often translates to rates
   as low as 1 Mbps or 6 Mbps, even when the unicast rate can reach a
   hundred Mbps and above.
   As a result, sending a single
   multicast frame can consume as much bandwidth as dozens of unicast
   frames.  Table 1 provides some example values of the bandwidth
   used by multicast frames transmitted from the AP (i.e. not counting
   the original multicast frame transmitted by the WiFi client to the AP
   when the source is effectively wireless).

   +--------------+---------------+---------------+--------------------+
   | Lowest WiFi  | Highest WiFi  | Mcast frame   | WiFi Utilization   |
   | rate         | rate          | %-age         | by Mcast           |
   +--------------+---------------+---------------+--------------------+
   | 1 Mbps       | 11 Mbps       | 1 %           | 9 %                |
   | 6 Mbps       | 54 Mbps       | 1 %           | 9 %                |
   | 6 Mbps       | 54 Mbps       | 5 %           | 45 %               |
   | 6 Mbps       | 54 Mbps       | 10 %          | 90 %               |
   +--------------+---------------+---------------+--------------------+

                       Table 1: Multicast WiFi Usage

   If multiple APs cover the same wireless LAN, then the multicast
   frames must be transmitted by all APs to all their WiFi clients.

   Communication of a multicast frame by a WiFi client requires three
   steps:

   1.  The WiFi client sends a datalink unicast frame to the AP at its
       maximum possible rate.

   2.  The WiFi AP forwards this frame on its wired interface and
       broadcasts it (as explained above) to all its WiFi clients.  If
       there are multiple APs on the same datalink domain, then all APs
       also broadcast this multicast frame to their WiFi clients.

   3.  A WiFi NIC that implements the STA in the client filters the
       frames that are effectively expected by this device based on
       destination address.

   Another side effect of multicast frames is that there cannot be an
   acknowledgement mechanism (ARQ) similar to that used for unicast
   frames; therefore frames can be missed, and NDP does not take this
   non-negligible packet loss into account.
   This could have a negative
   impact on Duplicate Address Detection (DAD) if the multicast NS or
   the multicast NA with override is lost.  Assuming that 8% of frames
   are corrupted, there is an 8% chance of losing a complete frame,
   and hence roughly a 16% chance of not detecting a duplicate address,
   since losing either the NS or the NA is enough.

   For a well-distributed multicast group where relatively few devices
   actually participate in any given group, there should be no
   transmission at all if none of the clients expects the multicast
   destination address, and there should be very few, but fast, unicast
   transmissions to the limited set of interested STAs when there is
   effectively a match in the set of associated devices.  But there is
   no mechanism in place to ensure that functionality.

3.2.  Host Sleep Mode

   When a sleeping host is woken up by a user interaction, it cannot
   determine whether it has moved to another network (SSIDs are not
   unique); hence, it has to send a multicast Router Solicitation (which
   triggers a Router Advertisement message from all adjacent routers)
   and the mobile host has to do Duplicate Address Detection for its
   link-local and global addresses, which means transmitting at least
   two multicast Neighbor Solicitation messages, which will be repeated
   by the AP to all other WiFi clients.

   This process creates a lot of multicast packets:

   o  one multicast Router Solicitation from the WiFi client, which is
      received by the AP and, if the AP is not optimized, the Router
      Solicitation is broadcast again over the wireless link;

   o  one multicast Neighbor Solicitation for the host link-local
      address (LLA) from the WiFi client, which is received by the AP
      and, if the AP is not optimized, the message is transmitted back
      over the wireless link;

   o  per global address (usually 1 or 2 depending on whether privacy
      extension is active), same behavior as above.
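
   The list above can be tallied with a short sketch.  This is only an
   illustration under stated assumptions, not part of the original
   analysis: the function and parameter names are hypothetical, each
   solicitation from the client is counted as one frame on the wireless
   medium, and a non-optimized AP is assumed to send each of them back
   over the radio exactly once.

```python
# Hypothetical helper names; a sketch of the per-wake-up tally, not a
# measurement tool.

def wakeup_multicast_frames(global_addresses: int = 1,
                            ap_reflects: bool = True) -> int:
    """Count multicast-related frames on the WiFi medium for one wake-up."""
    # One Router Solicitation, one Neighbor Solicitation (DAD) for the
    # link-local address, and one Neighbor Solicitation per global address.
    sent_by_client = 1 + 1 + global_addresses
    # Assumption: a non-optimized AP transmits each frame back over the
    # wireless link once; an optimized AP does not.
    copies_on_air = 2 if ap_reflects else 1
    return sent_by_client * copies_on_air


def multicast_frames_per_second(clients: int, wake_cycle_s: float,
                                frames_per_wakeup: int) -> float:
    """Average frame rate when every client wakes up once per cycle."""
    return clients * frames_per_wakeup / wake_cycle_s


frames = wakeup_multicast_frames(global_addresses=1)
print(frames)                                          # 6
print(multicast_frames_per_second(100, 600, frames))   # 1.0
```

   With one global address and a non-optimized AP, the tally is 6
   frames per wake-up; with a privacy extension address in addition
   (two global addresses), the same function gives 8.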

   In conclusion, and in the good case of not having privacy extension,
   this means 6 WiFi broadcast packets plus the unicast replies on each
   wake-up of the device.  Assuming a packet size of 80 bytes, which
   translates into about 120 bytes once the WiFi frame format (larger
   than the usual Ethernet frame) is taken into account, Table 2 gives
   some results for the WiFi utilization caused just by the multicast
   part of the wake-up of sleeping devices.  This does not take into
   account the rest of the multicast utilization used by RS, RA, NS, NA,
   MLD, ... and the associated unicast traffic.

   +---------+---------+------------+----------+---------+-------------+
   | WiFi    | Wake-up | Mcast      | Mcast    | Lowest  | Mcast       |
   | Clients | Cycle   | packet/sec | bit/sec  | WiFi    | Utilization |
   |         |         |            |          | Rate    |             |
   +---------+---------+------------+----------+---------+-------------+
   | 100     | 600 sec | 1          | 960 bps  | 1 Mbps  | 0.1 %       |
   | 1 000   | 600 sec | 10         | 9600 bps | 1 Mbps  | 1.0 %       |
   | 5 000   | 600 sec | 50         | 48 kbps  | 1 Mbps  | 4.8 %       |
   | 5 000   | 300 sec | 100        | 96 kbps  | 1 Mbps  | 9.6 %       |
   +---------+---------+------------+----------+---------+-------------+

             Table 2: Multicast WiFi Usage by Sleeping Devices

3.3.  Low Power WiFi Clients

   In order to save their batteries, Low Power (LP) hosts go into radio
   sleep mode until there is a local need to send a wireless frame.
   Before going into radio sleep mode, the LP hosts signal to the AP
   that they are going to sleep; this allows the AP to store unicast
   and multicast frames destined for those sleeping LP clients.  LP
   clients wake up periodically to listen to the WiFi beacon frames
   transmitted periodically (by default every 100 ms) because the beacon
   frame contains a bit mask (Traffic Indication Map - TIM) indicating
   for which STAs there is unicast traffic waiting and whether there is
   multicast traffic waiting.
   If there is multicast traffic waiting,
   then ALL LP hosts must stay awake to receive all multicast frames
   sent immediately afterwards by the AP and process them.  If there is
   a bit indicating that unicast traffic is waiting for a specific LP
   host, then only this LP host will stay awake to poll the AP later to
   collect its traffic.  The TIM maximum length is 2008 bits and the
   complete beacon frame is less than 300 bytes long.

   Table 3 indicates the ratio of active/sleeping time for LP hosts
   when multicast is present.  In the absence of multicast traffic, the
   radio is active only 2.4 % of the time, while if there are 50
   multicast frames of 300 bytes per second, the radio is active
   14.4 % of the time, 6 times as often, with a battery life probably
   reduced by a factor of 6.

   +-------------+------------+---------------+------------+-----------+
   | Beacon      | Mcast      | Mcast frame   | Lowest     | Awake     |
   | frames/sec  | frames/sec | size (bytes)  | WiFi Rate  | time      |
   +-------------+------------+---------------+------------+-----------+
   | 10          | 0          | 300 bytes     | 1 Mbps     | 2.4 %     |
   | 10          | 5          | 300 bytes     | 1 Mbps     | 3.6 %     |
   | 10          | 10         | 300 bytes     | 1 Mbps     | 4.8 %     |
   | 10          | 50         | 300 bytes     | 1 Mbps     | 14.4 %    |
   +-------------+------------+---------------+------------+-----------+

             Table 3: Multicast WiFi Impact on Low Power Hosts

3.4.  Vendor and Configuration Optimizations

   Vendors have noticed the problem and have come up with several
   optimizations such as:

   o  LP hosts not waking up the main processor when they are not
      members of the multicast group;

   o  APs not transmitting received Router Solicitation multicast
      messages back over the radio;

   o  ...

   An AP can also work in 'AP isolation mode', where there is no direct
   traffic between WiFi clients.  This mode has a positive side effect
   when a WiFi client transmits a multicast frame, as this frame is
   transmitted at the highest possible rate over the WiFi medium and the
   AP will not retransmit it back to all other WiFi clients at the
   lowest rate.

3.5.  Even Unicast NDP is not Optimum

   While this is not directly related to the subject of this document,
   it is worth mentioning anyway as this is important for devices
   running on battery.

   The NDP cache needs to be maintained by refreshing the neighbor cache
   entries which are in the STALE state.  This requires yet another
   Neighbor Solicitation / Neighbor Advertisement round.  Even if the
   destination IP and MAC addresses are unicast, this traffic is
   generated and again wakes up mobile devices.

4.  Measuring the Amount of IPv6 Multicast

   There are basically three ways to measure the amount of IPv6
   multicast traffic:

   o  sniffing the traffic and generating statistics, which is somewhat
      overkill;

   o  exporting IPFIX data and doing aggregation on the ff02::/16
      link-local multicast prefix;

   o  using SNMP to query on the AP the IP-MIB [RFC4293] with commands
      such as:

      *  snmpwalk -c private -v 1 udp6:[2001:db8::1] -Ci -m IP-MIB
         ifDescr: to get the interface names and index;

      *  snmpwalk -c private -v 1 udp6:[2001:db8::1] -Ci -m IP-MIB
         ipIfStatsOutTransmits.ipv6: to get the global transmit counters
         (i.e. unicast and multicast as there is no broadcast in IPv6);

      *  snmpwalk -c private -v 1 udp6:[2001:db8::1] -Ci -m IP-MIB
         ipIfStatsOutMcastPkts.ipv6: to get the multicast packet
         counter.

5.  Acknowledgements

   The authors would like to thank Norman Finn, Michel Fontaine, Steve
   Simlo, Ole Troan, and Stig Venaas for their suggestions and comments.

6.  IANA Considerations

   This memo includes no request to IANA.

7.  Security Considerations

   The only security consideration about this document is that, by
   forcing a lot of traffic to be multicast, a denial of service
   (DoS) attack could be mounted on the available bandwidth and battery
   of some network nodes.

8.  Informative References

   [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
              Architecture", RFC 4291, February 2006.

   [RFC4293]  Routhier, S., "Management Information Base for the
              Internet Protocol (IP)", RFC 4293, April 2006.

   [RFC4541]  Christensen, M., Kimball, K., and F. Solensky,
              "Considerations for Internet Group Management Protocol
              (IGMP) and Multicast Listener Discovery (MLD) Snooping
              Switches", RFC 4541, May 2006.

   [RFC4861]  Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
              "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
              September 2007.

   [RFC4941]  Narten, T., Draves, R., and S. Krishnan, "Privacy
              Extensions for Stateless Address Autoconfiguration in
              IPv6", RFC 4941, September 2007.

   [packet_loss]
              Department of Computer Sciences, University of Wisconsin
              Madison, USA, "Diagnosing Wireless Packet Losses in
              802.11: Separating Collision from Weak Signal",
              .

Authors' Addresses

   Eric Vyncke (editor)
   Cisco
   De Kleetlaan, 6A
   Diegem  1831
   BE

   Phone: +32 2 778 4677
   Email: evyncke@cisco.com

   Pascal Thubert
   Cisco
   Batiment D, 45 Allee des Ormes
   MOUGINS, PROVENCE-ALPES-COTE D'AZUR  06250
   France

   Email: pthubert@cisco.com

   Eric Levy-Abegnoli
   Cisco
   Batiment D, 45 Allee des Ormes
   MOUGINS, PROVENCE-ALPES-COTE D'AZUR  06250
   France

   Email: elevyabe@cisco.com

   Andrew Yourtchenko
   Cisco
   De Kleetlaan, 6A
   Diegem  1831
   BE

   Phone: +32 2 704 5494
   Email: ayourtch@cisco.com