| < draft-mcbride-armd-mcast-overview-00.txt | draft-mcbride-armd-mcast-overview-01.txt > | |||
|---|---|---|---|---|
| Internet Engineering Task Force M. McBride | Internet Engineering Task Force M. McBride | |||
| Internet-Draft H. Lui | Internet-Draft H. Lui | |||
| Intended status: Informational Huawei Technologies | Intended status: Informational Huawei Technologies | |||
| Expires: September 4, 2012 March 3, 2012 | Expires: September 11, 2012 March 10, 2012 | |||
| Multicast in the Data Center Overview | Multicast in the Data Center Overview | |||
| draft-mcbride-armd-mcast-overview-00 | draft-mcbride-armd-mcast-overview-01 | |||
| Abstract | Abstract | |||
| There has been much interest in issues surrounding massive amounts of | There has been much interest in issues surrounding massive amounts of | |||
| hosts in the data center. There was a discussion, in ARMD, involving | hosts in the data center. There was a discussion, in ARMD, involving | |||
| the issues with address resolution for non ARP/ND multicast traffic | the issues with address resolution for non ARP/ND multicast traffic | |||
| in data centers with massive numbers of hosts. This document provides | in data centers with massive numbers of hosts. This document provides | |||
| a quick survey of multicast in the data center and should serve as an | a quick survey of multicast in the data center and should serve as an | |||
| aid to further discussion of issues related to large amounts of | aid to further discussion of issues related to large amounts of | |||
| multicast in the data center. | multicast in the data center. | |||
| skipping to change at page 1, line 35 ¶ | skipping to change at page 1, line 35 ¶ | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on September 4, 2012. | This Internet-Draft will expire on September 11, 2012. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2012 IETF Trust and the persons identified as the | Copyright (c) 2012 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
| to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
| include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
| the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
| described in the Simplified BSD License. | described in the Simplified BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2. Multicast Applications in the Data Center . . . . . . . . . . . 3 | 2. Multicast Applications in the Data Center . . . . . . . . . . 3 | |||
| 2.1. L3 Multicast Applications . . . . . . . . . . . . . . . . . 3 | 2.1. Client-Server Applications . . . . . . . . . . . . . . . . 3 | |||
| 2.2. L2 Multicast Applications . . . . . . . . . . . . . . . . . 4 | 2.2. Non Client-Server Multicast Applications . . . . . . . . . 4 | |||
| 3. L2 Multicast Protocols in the Data Center . . . . . . . . . . . 5 | 3. L2 Multicast Protocols in the Data Center . . . . . . . . . . 5 | |||
| 4. L3 Multicast solutions in the Data Center . . . . . . . . . . . 6 | 4. L3 Multicast solutions in the Data Center . . . . . . . . . . 6 | |||
| 5. Challenges of using multicast in the Data Center . . . . . . . 7 | 5. Challenges of using multicast in the Data Center . . . . . . . 7 | |||
| 6. Layer 3 / Layer 2 Topological Variations . . . . . . . . . . . 8 | 6. Layer 3 / Layer 2 Topological Variations . . . . . . . . . . . 8 | |||
| 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 9 | 7. Address Resolution . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 9 | 7.1. Solicited-node Multicast Addresses for IPv6 address | |||
| 9. Security Considerations . . . . . . . . . . . . . . . . . . . . 9 | resolution . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 10. Informative References . . . . . . . . . . . . . . . . . . . . 9 | 7.2. Direct Mapping for Multicast address resolution . . . . . 9 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 9 | 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10 | |||
| 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 | ||||
| 10. Security Considerations . . . . . . . . . . . . . . . . . . . 10 | ||||
| 11. Informative References . . . . . . . . . . . . . . . . . . . . 10 | ||||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 10 | ||||
| 1. Introduction | 1. Introduction | |||
| Data center servers often use IP Multicast to send data to clients or | Data center servers often use IP Multicast to send data to clients or | |||
| other application servers. IP Multicast is expected to help conserve | other application servers. IP Multicast is expected to help conserve | |||
| bandwidth in the data center and reduce the load on servers. | bandwidth in the data center and reduce the load on servers. | |||
| Increased reliance on multicast, in next generation data centers, | Increased reliance on multicast, in next generation data centers, | |||
| requires higher performance and capacity especially from the | requires higher performance and capacity especially from the | |||
| switches. If multicast is to continue to be used in the data center, | switches. If multicast is to continue to be used in the data center, | |||
| it must scale well within and between datacenters. There has been | it must scale well within and between datacenters. There has been | |||
| much interest in issues surrounding massive amounts of hosts in the | much interest in issues surrounding massive amounts of hosts in the | |||
| data center. There was a discussion, in ARMD, involving the issues | data center. There was a discussion, in ARMD, involving the issues | |||
| with address resolution for non ARP/ND multicast traffic in data | with address resolution for non ARP/ND multicast traffic in data | |||
| centers. This document provides a quick survey of multicast in the | centers. This document provides a quick survey of multicast in the | |||
| data center and should serve as an aid to further discussion of | data center and should serve as an aid to further discussion of | |||
| issues related to multicast in the data center. | issues related to multicast in the data center. | |||
| ARP/ND issues are not addressed in this document. ARP/ND issues are | ARP/ND issues are not addressed in this document except to explain | |||
| how address resolution occurs with multicast. ARP/ND issues are | ||||
| addressed in [I-D.armd-problem-statement]. | addressed in [I-D.armd-problem-statement]. | |||
| 2. Multicast Applications in the Data Center | 2. Multicast Applications in the Data Center | |||
| There are many data center operators who do not deploy Multicast in | There are many data center operators who do not deploy Multicast in | |||
| their networks for scalability and stability reasons. There are also | their networks for scalability and stability reasons. There are also | |||
| many operators for whom multicast is critical and is enabled on their | many operators for whom multicast is critical and is enabled on their | |||
| data center switches and routers. For this latter group, there are | data center switches and routers. For this latter group, there are | |||
| several uses of multicast in their data centers. An understanding of | several uses of multicast in their data centers. An understanding of | |||
| the uses of that multicast is important in order to properly support | the uses of that multicast is important in order to properly support | |||
| these applications in the ever evolving data centers. If, for | these applications in the ever evolving data centers. If, for | |||
| instance, the majority of the applications are discovering/signaling | instance, the majority of the applications are discovering/signaling | |||
| each other using multicast there may be better ways to support them | each other using multicast there may be better ways to support them | |||
| than using multicast. If, however, the multicasting of data is | than using multicast. If, however, the multicasting of data is | |||
| occurring in large volumes, there is a need for very good data center | occurring in large volumes, there is a need for very good data center | |||
| under/overlay multicast support. The applications either fall into | under/overlay multicast support. The applications either fall into | |||
| the category of those that leverage L2 multicast for discovery or of | the category of those that leverage L2 multicast for discovery or of | |||
| those that require L3 support and likely span multiple subnets. | those that require L3 support and likely span multiple subnets. | |||
| 2.1. L3 Multicast Applications | 2.1. Client-Server Applications | |||
| IPTV servers use multicast to deliver content from the data center to | IPTV servers use multicast to deliver content from the data center to | |||
| end users. IPTV is typically a one to many application where the | end users. IPTV is typically a one to many application where the | |||
| hosts are configured for IGMPv3, the switches are configured with | hosts are configured for IGMPv3, the switches are configured with | |||
| IGMP snooping, and the routers are running PIM-SSM mode. Often | IGMP snooping, and the routers are running PIM-SSM mode. Often | |||
| redundant servers are sending multicast streams into the network and | redundant servers are sending multicast streams into the network and | |||
| the network is forwarding the data across diverse paths. | the network is forwarding the data across diverse paths. | |||
| Windows Media servers send multicast streams to clients. Windows | Windows Media servers send multicast streams to clients. Windows | |||
| Media Services streams to an IP multicast address and all clients | Media Services streams to an IP multicast address and all clients | |||
| skipping to change at page 4, line 30 ¶ | skipping to change at page 4, line 31 ¶ | |||
| subscriber of a publication. With multicast publish/subscribe, only | subscriber of a publication. With multicast publish/subscribe, only | |||
| one message is sent, regardless of the number of subscribers. In a | one message is sent, regardless of the number of subscribers. In a | |||
| publish/subscribe system, client applications, some of which are | publish/subscribe system, client applications, some of which are | |||
| publishers and some of which are subscribers, are connected to a | publishers and some of which are subscribers, are connected to a | |||
| network of message brokers that receive publications on a number of | network of message brokers that receive publications on a number of | |||
| topics, and send the publications on to the subscribers for those | topics, and send the publications on to the subscribers for those | |||
| topics. The more subscribers there are in the publish/subscribe | topics. The more subscribers there are in the publish/subscribe | |||
| system, the greater the improvement to network utilization there | system, the greater the improvement to network utilization there | |||
| might be with multicast. | might be with multicast. | |||
| 2.2. Non Client-Server Multicast Applications | ||||
| With router redundancy protocols, such as VRRP, routers communicate | With router redundancy protocols, such as VRRP, routers communicate | |||
| among themselves using a multicast address. | among themselves using a multicast address. | |||
| Overlays may use IP multicast to virtualize L2 multicasts. VXLAN, | Overlays may use IP multicast to virtualize L2 multicasts. VXLAN, | |||
| for instance, is an encapsulation scheme to carry L2 frames over L3 | for instance, is an encapsulation scheme to carry L2 frames over L3 | |||
| networks. The VXLAN Tunnel End Point (VTEP) encapsulates frames | networks. The VXLAN Tunnel End Point (VTEP) encapsulates frames | |||
| inside an L3 tunnel. VXLANs are identified by a 24 bit VXLAN Network | inside an L3 tunnel. VXLANs are identified by a 24 bit VXLAN Network | |||
| Identifier (VNI). The VTEP maintains a table of known destination | Identifier (VNI). The VTEP maintains a table of known destination | |||
| MAC addresses, and stores the IP address of the tunnel to the remote | MAC addresses, and stores the IP address of the tunnel to the remote | |||
| VTEP to use for each. Unicast frames, between VMs, are sent directly | VTEP to use for each. Unicast frames, between VMs, are sent directly | |||
| to the unicast L3 address of the remote VTEP. Multicast frames are | to the unicast L3 address of the remote VTEP. Multicast frames are | |||
| sent to a multicast IP group associated with the VNI. Underlying IP | sent to a multicast IP group associated with the VNI. Underlying IP | |||
| Multicast protocols (PIM-SM/SSM/BIDIR) are used to forward multicast | Multicast protocols (PIM-SM/SSM/BIDIR) are used to forward multicast | |||
| data across the overlay. | data across the overlay. | |||
| 2.2. L2 Multicast Applications | ||||
| Applications such as Ganglia use multicast for distributed | Applications such as Ganglia use multicast for distributed | |||
| monitoring of computing systems such as clusters and grids. | monitoring of computing systems such as clusters and grids. | |||
| Windows Server cluster node exchange relies upon the use of | Windows Server cluster node exchange relies upon the use of | |||
| multicast heartbeats between servers. Only the other interfaces in | multicast heartbeats between servers. Only the other interfaces in | |||
| the same multicast group use the data. Unlike broadcast, multicast | the same multicast group use the data. Unlike broadcast, multicast | |||
| traffic does not need to be flooded throughout the network, reducing | traffic does not need to be flooded throughout the network, reducing | |||
| the chance that unnecessary CPU cycles are expended filtering traffic | the chance that unnecessary CPU cycles are expended filtering traffic | |||
| on nodes outside the cluster. As the number of nodes increases, the | on nodes outside the cluster. As the number of nodes increases, the | |||
| ability to replace several unicast messages with a single multicast | ability to replace several unicast messages with a single multicast | |||
| skipping to change at page 9, line 5 ¶ | skipping to change at page 9, line 5 ¶ | |||
| to restrain the unnecessary multicast stream flooding. | to restrain the unnecessary multicast stream flooding. | |||
| 6. Layer 3 / Layer 2 Topological Variations | 6. Layer 3 / Layer 2 Topological Variations | |||
| As discussed in [I-D.armd-problem-statement], there are a variety of | As discussed in [I-D.armd-problem-statement], there are a variety of | |||
| topological data center variations including L3 to Access Switches, | topological data center variations including L3 to Access Switches, | |||
| L3 to Aggregation Switches, and L3 in the Core only. Further | L3 to Aggregation Switches, and L3 in the Core only. Further | |||
| analysis is needed in order to understand how these variations affect | analysis is needed in order to understand how these variations affect | |||
| IP Multicast scalability. | IP Multicast scalability. | |||
| 7. Acknowledgements | 7. Address Resolution | |||
| 7.1. Solicited-node Multicast Addresses for IPv6 address resolution | ||||
| Solicited-node Multicast Addresses are used with IPv6 Neighbor | ||||
| Discovery to provide the same function as the Address Resolution | ||||
| Protocol (ARP) in IPv4. ARP uses broadcasts to send ARP | ||||
| Requests, which are received by all end hosts on the local link. | ||||
| Only the host being queried responds. However, the other hosts still | ||||
| have to process and discard the request. With IPv6, a host is | ||||
| required to join a Solicited-Node multicast group for each of its | ||||
| configured unicast or anycast addresses. Because a Solicited-node | ||||
| Multicast Address is a function of the last 24 bits of an IPv6 | ||||
| unicast or anycast address, the number of hosts that are subscribed | ||||
| to each Solicited-node Multicast Address would typically be one | ||||
| (there could be more because the mapping function is not a 1:1 | ||||
| mapping). Compared to ARP in IPv4, a host should not need to be | ||||
| interrupted as often to service Neighbor Solicitation requests. | ||||
| 7.2. Direct Mapping for Multicast address resolution | ||||
| With IPv4 unicast address resolution, the translation of an IP | ||||
| address to a MAC address is done dynamically by ARP. With multicast | ||||
| address resolution, the mapping from a multicast IP address to a | ||||
| multicast MAC address is derived from direct mapping. In IPv4, the | ||||
| mapping is done by assigning the low-order 23 bits of the multicast | ||||
| IP address to fill the low-order 23 bits of the multicast MAC | ||||
| address. When a host joins an IP multicast group, it instructs the | ||||
| data link layer to receive frames that match the MAC address that | ||||
| corresponds to the IP address of the multicast group. The data link | ||||
| layer filters the frames and passes frames with matching destination | ||||
| addresses to the IP module. Since the mapping from multicast IP | ||||
| address to a MAC address ignores 5 bits of the IP address, groups of | ||||
| 32 multicast IP addresses are mapped to the same MAC address. As a | ||||
| result a multicast MAC address cannot be uniquely mapped to a | ||||
| multicast IPv4 address. Planning is required within an organization | ||||
| to select IPv4 groups that are far enough apart that they do not | ||||
| map to the same L2 address. Any multicast address in | ||||
| the [224-239].0.0.x and [224-239].128.0.x ranges should not be | ||||
| considered. When sending IPv6 multicast packets on an Ethernet link, | ||||
| the corresponding destination MAC address is a direct mapping of the | ||||
| last 32 bits of the 128 bit IPv6 multicast address into the 48 bit | ||||
| MAC address. It is possible for more than one IPv6 Multicast address | ||||
| to map to the same 48 bit MAC address. | ||||
| 8. Acknowledgements | ||||
| The authors would like to thank the many individuals who contributed | The authors would like to thank the many individuals who contributed | |||
| opinions on the ARMD wg mailing list about this topic: Linda Dunbar, | opinions on the ARMD wg mailing list about this topic: Linda Dunbar, | |||
| Anoop Ghanwani, Peter Ashwoodsmith, David Allan, Aldrin Isaac, Igor | Anoop Ghanwani, Peter Ashwoodsmith, David Allan, Aldrin Isaac, Igor | |||
| Gashinsky, Michael Smith, Patrick Frejborg, Joel Jaeggli and Thomas | Gashinsky, Michael Smith, Patrick Frejborg, Joel Jaeggli and Thomas | |||
| Narten. | Narten. | |||
| 8. IANA Considerations | 9. IANA Considerations | |||
| This memo includes no request to IANA. | This memo includes no request to IANA. | |||
| 9. Security Considerations | 10. Security Considerations | |||
| No security considerations at this time. | No security considerations at this time. | |||
| 10. Informative References | 11. Informative References | |||
| [I-D.armd-problem-statement] | [I-D.armd-problem-statement] | |||
| Narten, T., Karir, M., and I. Foo, | Narten, T., Karir, M., and I. Foo, | |||
| "draft-ietf-armd-problem-statement", February 2012. | "draft-ietf-armd-problem-statement", February 2012. | |||
| [I-D.pim-umf-problem-statement] | [I-D.pim-umf-problem-statement] | |||
| Zhou, D., Deng, H., Shi, Y., Liu, H., and I. Bhattacharya, | Zhou, D., Deng, H., Shi, Y., Liu, H., and I. Bhattacharya, | |||
| "draft-dizhou-pim-umf-problem-statement", October 2010. | "draft-dizhou-pim-umf-problem-statement", October 2010. | |||
| [RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas, | [RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas, | |||
| End of changes. 12 change blocks. | ||||
| 24 lines changed or deleted | 74 lines changed or added | |||
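
The address mappings described in the new Section 7 can be illustrated with a short sketch. This is not part of either draft revision. It assumes the standard Ethernet prefixes 01:00:5e (IPv4 multicast, RFC 1112) and 33:33 (IPv6 multicast, RFC 2464) and the solicited-node prefix ff02::1:ff00:0/104 (RFC 4291), none of which are spelled out in the draft text. The function names are hypothetical and chosen only for illustration.

```python
import ipaddress


def _format_mac(value: int) -> str:
    """Render a 48-bit integer as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in value.to_bytes(6, "big"))


def ipv4_multicast_mac(group: str) -> str:
    """Copy the low-order 23 bits of an IPv4 group into the MAC address.

    Assumes the 01:00:5e prefix from RFC 1112 (not stated in the draft).
    """
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF
    return _format_mac(0x01005E000000 | low23)


def ipv6_multicast_mac(group: str) -> str:
    """Copy the last 32 bits of an IPv6 group into the MAC address.

    Assumes the 33:33 prefix from RFC 2464 (not stated in the draft).
    """
    addr = ipaddress.IPv6Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv6 multicast address")
    low32 = int(addr) & 0xFFFFFFFF
    return _format_mac(0x333300000000 | low32)


def solicited_node(address: str) -> str:
    """Derive the solicited-node group from the last 24 bits of an address.

    Assumes the ff02::1:ff00:0/104 prefix from RFC 4291.
    """
    addr = ipaddress.IPv6Address(address)
    low24 = int(addr) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return str(ipaddress.IPv6Address(base | low24))


if __name__ == "__main__":
    # Two different IPv4 groups that collide on the same MAC address,
    # because 5 bits of the group address are ignored by the mapping.
    print(ipv4_multicast_mac("224.1.1.1"))
    print(ipv4_multicast_mac("225.129.1.1"))
    # Solicited-node group for a unicast address, and its Ethernet MAC.
    group = solicited_node("2001:db8::abcd:1234")
    print(group, ipv6_multicast_mac(group))
```

Under these assumptions, 224.1.1.1 and 225.129.1.1 yield the same MAC address, which is the 32:1 overlap Section 7.2 warns about. Likewise 239.128.0.1 collides with 224.0.0.1, which is why the [224-239].0.0.x and [224-239].128.0.x ranges are best avoided.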