MBONED                                                        M. McBride
Internet-Draft                                                 Futurewei
Intended status: Informational                               O. Komolafe
Expires: January 24, 2020                                Arista Networks
                                                           July 23, 2019

              Multicast in the Data Center Overview
                 draft-ietf-mboned-dc-deploy-07
Abstract

The volume and importance of one-to-many traffic patterns in data
centers are likely to increase significantly in the future.  Reasons
for this increase are discussed and then attention is paid to the
manner in which this traffic pattern may be judiciously handled in
data centers.  The intuitive solution of deploying conventional IP
multicast within data centers is explored and evaluated.  Thereafter,
a number of emerging innovative approaches are described before a
[...]
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF).  Note that other groups may also distribute
working documents as Internet-Drafts.  The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 24, 2020.
Copyright Notice

Copyright (c) 2019 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
[...]
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents

1.  Introduction
    1.1.  Requirements Language
2.  Reasons for increasing one-to-many traffic patterns
    2.1.  Applications
    2.2.  Overlays
    2.3.  Protocols
    2.4.  Summary
3.  Handling one-to-many traffic using conventional multicast
    3.1.  Layer 3 multicast
    3.2.  Layer 2 multicast
    3.3.  Example use cases
    3.4.  Advantages and disadvantages
4.  Alternative options for handling one-to-many traffic
    4.1.  Minimizing traffic volumes
    4.2.  Head end replication
    4.3.  Programmable Forwarding Planes
    4.4.  BIER
    4.5.  Segment Routing
5.  Conclusions
6.  IANA Considerations
7.  Security Considerations
8.  Acknowledgements
9.  References
    9.1.  Normative References
    9.2.  Informative References
Authors' Addresses
1.  Introduction

The volume and importance of one-to-many traffic patterns in data
centers are likely to increase significantly in the future.  Reasons
for this increase include the nature of the traffic generated by
applications hosted in the data center, the need to handle broadcast,
unknown unicast and multicast (BUM) traffic within the overlay
technologies used to support multi-tenancy at scale, and the use of
certain protocols that traditionally require one-to-many control
message exchanges.

These trends, allied with the expectation that future highly
virtualized large-scale data centers must support communication
between potentially thousands of participants, may lead to the
natural assumption that IP multicast will be widely used in data
centers, specifically given the bandwidth savings it potentially
offers.  However, such an assumption would be wrong.  In fact, there
is widespread reluctance to enable conventional IP multicast in data
centers for a number of reasons, mostly pertaining to concerns about
its scalability and reliability.
This draft discusses some of the main drivers for the increasing
volume and importance of one-to-many traffic patterns in data
centers.  Thereafter, the manner in which conventional IP multicast
may be used to handle this traffic pattern is discussed and some of
the associated challenges highlighted.  Following this discussion, a
number of alternative emerging approaches are introduced, before
concluding by discussing key trends and making a number of
recommendations.
[...]
requirement for robustness, stability and predictability has meant
the TV broadcast industry has traditionally used TV-specific
protocols, infrastructure and technologies for transmitting video
signals between end points such as cameras, monitors, mixers,
graphics devices and video servers.  However, the growing cost and
complexity of supporting this approach, especially as the bit rates
of the video signals increase due to demand for formats such as
4K-UHD and 8K-UHD, means there is a consensus that the TV broadcast
industry will transition from industry-specific transmission formats
(e.g.  SDI, HD-SDI) over TV-specific infrastructure to using IP-based
infrastructure.  The development of pertinent standards by the
Society of Motion Picture and Television Engineers (SMPTE)
[SMPTE2110], along with the increasing performance of IP routers,
means this transition is gathering pace.  A possible outcome of this
transition will be the building of IP data centers in broadcast
plants.  Traffic flows in the broadcast industry are frequently one-
to-many and so if IP data centers are deployed in broadcast plants,
it is imperative that this traffic pattern is supported efficiently
in that infrastructure.  In fact, a pivotal consideration for
broadcasters considering transitioning to IP is the manner in which
these one-to-many traffic flows will be managed and monitored in a
data center with an IP fabric.
One of the few success stories in using conventional IP multicast has
been for disseminating market trading data.  For example, IP
multicast is commonly used today to deliver stock quotes from stock
exchanges to financial service providers and then to the stock
analysts or brokerages.  It is essential that the network
infrastructure delivers very low latency and high throughput,
especially given the proliferation of automated and algorithmic
trading, which means stock analysts or brokerages may gain an edge on
competitors simply by receiving an update a few milliseconds earlier.
As would be expected, in such deployments reliability is critical.
The network must be designed with no single point of failure and in
such a way that it can respond in a deterministic manner to failure.
Typically, redundant servers (in a primary/backup or live-live mode)
send multicast streams into the network, with diverse paths being
used across the network.  The stock exchange generating the one-to-
many traffic and the stock analysts/brokerages that receive the
traffic will typically have their own data centers.  Therefore, the
manner in which one-to-many traffic patterns are handled in these
data centers is extremely important, especially given the
requirements and constraints mentioned.
Another reason for the growing volume of one-to-many traffic patterns
in modern data centers is the increasing adoption of streaming
telemetry.  This transition is motivated by the observation that
traditional poll-based approaches for monitoring network devices are
usually inadequate in modern data centers.  These approaches
typically suffer from poor scalability, extensibility and
responsiveness.  In contrast, in streaming telemetry, network devices
in the data center stream highly granular real-time updates to a
telemetry collector/database.  This collector then collates,
normalizes and encodes this data for convenient consumption by
monitoring applications.  The monitoring applications can subscribe
to the notifications of interest, allowing them to gain insight into
pertinent state and performance metrics.  Thus, the traffic flows
associated with streaming telemetry are typically many-to-one between
the network devices and the telemetry collector and then one-to-many
from the collector to the monitoring applications.
The use of publish and subscribe applications is growing within data
centers, contributing to the rising volume of one-to-many traffic
flows.  Such applications are attractive as they provide a robust
low-latency asynchronous messaging service, allowing senders to be
decoupled from receivers.  The usual approach is for a publisher to
create and transmit a message to a specific topic.  The publish and
subscribe application will retain the message and ensure it is
delivered to all subscribers to that topic.  The flexibility in the
number of publishers and subscribers to a specific topic means such
applications cater for one-to-one, one-to-many and many-to-one
traffic patterns.
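The topic-based delivery model described above can be sketched in a
few lines of Python (illustrative only; the Broker class and its
methods are invented for this example and merely stand in for a real
messaging system):

```python
from collections import defaultdict


class Broker:
    """Toy topic-based publish/subscribe broker.

    Publishers and subscribers are decoupled: neither knows the
    other's identity.  A message published to a topic is delivered
    to every current subscriber of that topic, so a single publish
    becomes a one-to-many delivery when a topic has many subscribers.
    """

    def __init__(self):
        self.topics = defaultdict(list)  # topic -> subscriber callbacks

    def subscribe(self, topic, callback):
        self.topics[topic].append(callback)

    def publish(self, topic, message):
        # One send by the publisher, N deliveries by the broker.
        for deliver in self.topics[topic]:
            deliver(message)
```

With two subscribers to a "quotes" topic, a single publish results in
two deliveries; a topic with no subscribers silently discards the
message.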
2.2.  Overlays

Another key contributor to the rise in one-to-many traffic patterns
is the proposed architecture for supporting large-scale multi-tenancy
in highly virtualized data centers [RFC8014].  In this architecture,
a tenant's VMs are distributed across the data center and are
connected by a virtual network known as the overlay network.  A
number of different technologies have been proposed for realizing the
overlay network, including VXLAN [RFC7348], VXLAN-GPE
[I-D.ietf-nvo3-vxlan-gpe], NVGRE [RFC7637] and GENEVE
[I-D.ietf-nvo3-geneve].  The often fervent and arguably partisan
debate about the relative merits of these overlay technologies belies
the fact that, conceptually, these overlays mainly provide a means to
encapsulate and tunnel Ethernet frames from the VMs over the data
center IP fabric, thus emulating a Layer 2 segment between the VMs.
Consequently, the VMs believe and behave as if they are connected to
the tenant's other VMs by a conventional Layer 2 segment, regardless
of their physical location within the data center.

Naturally, in a Layer 2 segment, point-to-multipoint traffic can
result from handling BUM (broadcast, unknown unicast and multicast)
traffic.  And, compounding this issue within data centers, since the
tenant's VMs attached to the emulated segment may be dispersed
throughout the data center, the BUM traffic may need to traverse the
data center fabric.

Hence, regardless of the overlay technology used, due consideration
must be given to handling BUM traffic, forcing the data center
operator to pay attention to the manner in which one-to-many
communication is handled within the data center.  And this
consideration is likely to become increasingly important with the
anticipated rise in the number and importance of overlays.  In fact,
it may be asserted that the manner in which one-to-many
communications arising from overlays are handled is pivotal to the
performance and stability of the entire data center network.
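The encapsulation these overlays perform can be sketched in Python
(illustrative only; it shows the 8-byte VXLAN header of [RFC7348],
with the outer UDP/IP headers that would carry the result across the
fabric omitted):

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned destination port for VXLAN


def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the 8-byte VXLAN header to an inner Ethernet frame.

    Header layout (RFC 7348): 8 flag bits with the I bit set
    (VNI is valid), 24 reserved bits, the 24-bit VXLAN Network
    Identifier (VNI), and 8 final reserved bits.
    """
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    flags_word = 0x08 << 24          # I flag set, reserved bits zero
    header = struct.pack("!II", flags_word, vni << 8)
    return header + inner_frame
```

Each tenant segment is identified by its VNI, so the same fabric can
carry overlapping tenant MAC/IP address spaces without ambiguity.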
2.3.  Protocols

Conventionally, some key networking protocols used in data centers
require one-to-many communications for control messages.  Thus, the
data center operator must pay due attention to how these control
message exchanges are supported.

For example, ARP [RFC0826] and ND [RFC4861] use broadcast and
multicast messages within IPv4 and IPv6 networks respectively to
discover MAC address to IP address mappings.  Furthermore, when these
protocols are running within an overlay network, it is essential to
ensure the messages are delivered to all the hosts on the emulated
Layer 2 segment, regardless of physical location within the data
center.  The challenges associated with optimally delivering ARP and
ND messages in data centers have attracted lots of attention
[RFC6820].
Another example of a protocol that may necessitate one-to-many
traffic flows in the data center is IGMP [RFC2236], [RFC3376].  If
the VMs attached to the Layer 2 segment wish to join a multicast
group, they must send IGMP reports in response to queries from the
querier.  As these devices could be located at different locations
within the data center, there is the somewhat ironic prospect of IGMP
itself leading to an increase in the volume of one-to-many
communications in the data center.
2.4.  Summary

Section 2.1, Section 2.2 and Section 2.3 have discussed how trends in
the types of applications, the overlay technologies used and some of
the essential networking protocols result in an increase in the
volume of one-to-many traffic patterns in modern highly virtualized
data centers.  Section 3 explores how such traffic flows may be
handled using conventional IP multicast.
3.  Handling one-to-many traffic using conventional multicast

Faced with ever-increasing volumes of one-to-many traffic flows for
the reasons presented in Section 2, arguably the intuitive initial
course of action for a data center operator is to explore if and how
conventional IP multicast could be deployed within the data center.
This section introduces the key protocols, presents some example use
cases where they are deployed in data centers and discusses some of
the advantages and disadvantages of such deployments.
3.1.  Layer 3 multicast

PIM is the most widely deployed multicast routing protocol and so,
unsurprisingly, is the primary multicast routing protocol considered
for use in the data center.  There are three modes of PIM that may
potentially be used: PIM-SM [RFC4601], PIM-SSM [RFC4607] or PIM-BIDIR
[RFC5015].  It may be said that these different modes of PIM trade
off the optimality of the multicast forwarding tree against the
amount of multicast forwarding state that must be maintained at
routers.  SSM provides the most efficient forwarding between sources
[...]
With IPv4 unicast address resolution, the translation of an IP
address to a MAC address is done dynamically by ARP.  With multicast
address resolution, the mapping from a multicast IPv4 address to a
multicast MAC address is done by assigning the low-order 23 bits of
the multicast IPv4 address to fill the low-order 23 bits of the
multicast MAC address.  Each IPv4 multicast address has 28 unique
bits (the multicast address range is 224.0.0.0/4); therefore, mapping
a multicast IP address to a MAC address ignores 5 bits of the IP
address.  Hence, groups of 32 multicast IP addresses are mapped to
the same MAC address, and so a multicast MAC address cannot be
uniquely mapped to a multicast IPv4 address.  Therefore, IPv4
multicast addresses must be chosen judiciously in order to avoid
unnecessary address aliasing.  When sending IPv6 multicast packets
on an Ethernet link, the corresponding destination MAC address is a
direct mapping of the last 32 bits of the 128-bit IPv6 multicast
address into the 48-bit MAC address.  It is possible for more than
one IPv6 multicast address to map to the same 48-bit MAC address.
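These mapping rules are simple bit manipulations, as the following
Python sketch (illustrative only) shows; it also demonstrates the
address aliasing described above:

```python
import ipaddress


def ipv4_multicast_mac(ip: str) -> str:
    """Map an IPv4 multicast address to its Ethernet MAC address.

    The low-order 23 bits of the IP address fill the low-order 23
    bits of the fixed prefix 01:00:5e:00:00:00; the remaining 5
    significant IP bits are ignored, so 32 group addresses alias
    to each MAC address.
    """
    low23 = int(ipaddress.IPv4Address(ip)) & 0x7FFFFF
    mac = 0x01005E000000 | low23
    return ":".join(f"{(mac >> s) & 0xFF:02x}" for s in range(40, -1, -8))


def ipv6_multicast_mac(ip: str) -> str:
    """Map an IPv6 multicast address to its Ethernet MAC address.

    The last 32 bits of the IPv6 address fill the low-order 32 bits
    of the fixed prefix 33:33:00:00:00:00.
    """
    low32 = int(ipaddress.IPv6Address(ip)) & 0xFFFFFFFF
    mac = 0x333300000000 | low32
    return ":".join(f"{(mac >> s) & 0xFF:02x}" for s in range(40, -1, -8))


# 224.1.1.1 and 225.1.1.1 differ only in the ignored 5 bits,
# so both map to 01:00:5e:01:01:01.
print(ipv4_multicast_mac("224.1.1.1"))
print(ipv4_multicast_mac("225.1.1.1"))
print(ipv6_multicast_mac("ff02::1:ff00:1"))
```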
The default behaviour of many hosts (and, in fact, routers) is to
block multicast traffic.  Consequently, when a host wishes to join an
IPv4 multicast group, it sends an IGMP [RFC2236], [RFC3376] report to
the router attached to the Layer 2 segment and also instructs its
data link layer to receive Ethernet frames that match the
corresponding MAC address.  The data link layer filters the frames,
passing those with matching destination addresses to the IP module.
Similarly, hosts simply hand a multicast packet for transmission to
the data link layer, which adds the Layer 2 encapsulation, using the
MAC address derived in the manner previously discussed.
When this Ethernet frame with a multicast MAC address is received by
a switch configured to forward multicast traffic, the default
behaviour is to flood it to all the ports in the Layer 2 segment.
Clearly there may not be a receiver for this multicast group present
on each port, so IGMP snooping is used to avoid sending the frame out
of ports without receivers.
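As a minimal illustration of how snooping constrains flooding, the
following sketch (group addresses and port numbers invented) decides
the egress ports for a multicast frame:

```python
# Hypothetical per-VLAN snooping state: group -> ports with interested
# receivers, learnt by inspecting IGMP reports.
snooping_table = {"239.1.1.1": {2, 5}}
segment_ports = {1, 2, 3, 4, 5}
router_port = 1                      # learnt from IGMP queries

def egress_ports(group: str, ingress_port: int) -> set:
    members = snooping_table.get(group)
    if members is None:
        # No snooping state for this group: fall back to flooding.
        return segment_ports - {ingress_port}
    # Constrain forwarding to receivers plus the multicast router.
    return (members | {router_port}) - {ingress_port}

print(egress_ports("239.1.1.1", 3))   # {1, 2, 5}
print(egress_ports("239.9.9.9", 3))   # flooded: {1, 2, 4, 5}
```

Real switches also track group membership timeouts and
multicast-router ports dynamically; this sketch shows only the
forwarding decision itself.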
A switch running IGMP snooping listens to the IGMP messages exchanged
between hosts and the router in order to identify which ports have
active receivers for a specific multicast group, allowing the
forwarding of multicast frames to be suitably constrained.  Normally,
the multicast router will generate IGMP queries to which the hosts
send IGMP reports in response.  However, a number of optimizations in
associated with the VXLAN interface.
Another use case of PIM and IGMP in data centers is when IPTV servers
use multicast to deliver content from the data center to end users.
IPTV is typically a one-to-many application where the hosts are
configured for IGMPv3, the switches are configured with IGMP
snooping, and the routers are running PIM-SSM mode.  Often redundant
servers send multicast streams into the network and the network
forwards the data across diverse paths.
Windows Media servers send multicast streams to clients.  Windows
Media Services streams to an IP multicast address and all clients
subscribe to that address to receive the same stream.  This allows a
single stream to be played simultaneously by multiple clients, thus
reducing bandwidth utilization.
3.4.  Advantages and disadvantages

Arguably the biggest advantage of using PIM and IGMP to support one-
to-many communication in data centers is that these protocols are
relatively mature.  Consequently, PIM is available in most routers
and IGMP is supported by most hosts and routers.  As such, no
specialized hardware or relatively immature software is involved in
using these protocols in data centers.  Furthermore, the maturity of
these protocols means their behaviour and performance in operational
networks is well understood, with widely available best practices and
deployment guides for optimizing their performance.  For these
reasons, PIM and IGMP have been used successfully for supporting one-
to-many traffic flows within modern data centers, as discussed
earlier.
However, somewhat ironically, the relative disadvantages of PIM and
IGMP usage in data centers also stem mostly from their maturity.
Specifically, these protocols were standardized and implemented long
before the highly-virtualized multi-tenant data centers of today
existed.  Consequently, PIM and IGMP are neither optimally placed to
deal with the requirements of one-to-many communication in modern
data centers nor to exploit the idiosyncrasies of data centers.  For
example, there may be thousands of VMs participating in a multicast
session, with some of these VMs migrating to servers within the data
center, new VMs being continually spun up and wishing to join the
sessions while all the time other VMs are leaving.  In such a
scenario, the churn in the PIM and IGMP state machines, the volume of
control messages they would generate and the amount of state they
would necessitate within routers, especially if they were deployed
naively, would be untenable.  Furthermore, PIM is a relatively
complex protocol.  As such, PIM can be challenging to debug even in
significantly more benign deployments than those envisaged for future
data centers, a fact that has evidently had a dissuasive effect on
data center operators considering enabling it within the IP fabric.
4.  Alternative options for handling one-to-many traffic

Section 2 has shown that there is likely to be an increasing amount
of one-to-many communication in data centers for multiple reasons.
Section 3 has discussed how conventional multicast may be used to
handle this traffic, presenting some of the associated advantages and
disadvantages.  Unsurprisingly, as discussed in the remainder of
Section 4, there are a number of alternative options for handling
this traffic pattern in data centers.  Critically, it should be noted
that many of these techniques are not mutually exclusive; in fact
many deployments involve a combination of more than one of these
techniques.  Furthermore, as will be shown, introducing a centralized
controller or a distributed control plane typically makes these
techniques more potent.
4.1.  Minimizing traffic volumes

If handling one-to-many traffic flows in data centers is considered
onerous, then arguably the most intuitive solution is to aim to
minimize the volume of such traffic.
It was previously mentioned in Section 2 that the three main
contributors to one-to-many traffic in data centers are applications,
overlays and protocols.  Typically the applications running on VMs
are outside the control of the data center operator and thus,
relatively speaking, little can be done about the volume of one-to-
many traffic generated by applications.  Luckily, there is more scope
for attempting to reduce the volume of such traffic generated by
overlays and protocols (and often by protocols within overlays).
This reduction is possible by exploiting certain characteristics of
data center networks such as a fixed and regular topology, single
administrative control, consistent hardware and software, well-known
overlay encapsulation endpoints and systematic IP address allocation.
A way of minimizing the amount of one-to-many traffic that traverses
the data center fabric is to use a centralized controller.  For
example, whenever a new VM is instantiated, the hypervisor or
encapsulation endpoint can notify a centralized controller of this
new MAC address, the associated virtual network, IP address, etc.
The controller could subsequently distribute this information to
every encapsulation endpoint.  Consequently, when any endpoint
receives an ARP request from a locally attached VM, it could simply
consult its local copy of the information distributed by the
controller and reply.  Thus, the ARP request is suppressed and does
not result in one-to-many traffic traversing the data center IP
fabric.
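This workflow might be sketched as follows.  All names, addresses and
the controller interface are hypothetical; a real implementation
would also handle binding withdrawal and the actual replication of
state to each endpoint:

```python
# (virtual network, IP) -> MAC bindings, as distributed by the
# controller and held in a local copy at every encapsulation endpoint.
bindings = {}

def vm_instantiated(vni: int, ip: str, mac: str) -> None:
    """Controller side: record a new VM and push the binding out."""
    bindings[(vni, ip)] = mac

def handle_arp_request(vni: int, target_ip: str):
    """Endpoint side: answer an ARP request from the local copy."""
    mac = bindings.get((vni, target_ip))
    if mac is not None:
        return mac        # ARP suppressed: nothing crosses the fabric
    return None           # unknown: the request would have to be flooded

vm_instantiated(5001, "10.1.1.10", "52:54:00:aa:bb:cc")
print(handle_arp_request(5001, "10.1.1.10"))   # 52:54:00:aa:bb:cc
```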
Alternatively, the functionality supported by the controller can be
realized by a distributed control plane.  BGP-EVPN [RFC7432],
[RFC8365] is the most popular control plane used in data centers.
Typically, the encapsulation endpoints will exchange pertinent
information with each other by all peering with a BGP route reflector
(RR).  Thus, information such as local MAC addresses, MAC to IP
address mappings, virtual network identifiers, IP prefixes, and local
IGMP group membership can be disseminated.  Consequently, for
example, ARP requests from local VMs can be suppressed by the
encapsulation endpoint using the information learnt from the control
plane about the MAC to IP mappings at remote peers.  In a similar
fashion, encapsulation endpoints can use information gleaned from the
BGP-EVPN messages to proxy for both IGMP reports and queries for the
attached VMs, thus obviating the need to transmit IGMP messages
across the data center fabric.
4.2.  Head end replication

A popular option for handling one-to-many traffic patterns in data
centers is head end replication (HER).  With HER, the traffic is
duplicated and sent to each end point individually using conventional
IP unicast.  Obvious disadvantages of HER include traffic duplication
and the additional processing burden on the head end.  Nevertheless,
HER is especially attractive when overlays are in use as the
replication can be carried out by the hypervisor or encapsulation end
point.  Consequently, the VMs and IP fabric are unmodified and
unaware of how the traffic is delivered to the multiple end points.
Additionally, a number of approaches may be used for constructing and
disseminating the list of which endpoints should receive which
traffic.
For example, the reluctance of data center operators to enable PIM
within the data center fabric means VXLAN is often used with HER.
Thus, BUM traffic from each VNI is replicated and sent using unicast
to remote VTEPs with VMs in that VNI.  The list of remote VTEPs to
which the traffic should be sent may be configured manually on the
VTEP.  Alternatively, the VTEPs may transmit pertinent local state to
a centralized controller which in turn sends each VTEP the list of
remote VTEPs for each VNI.  Lastly, HER also works well when a
distributed control plane is used instead of the centralized
controller.  Again, BGP-EVPN may be used to distribute the
information needed to facilitate HER to the VTEPs.
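A minimal sketch of HER at a VTEP follows.  The VNI, the VTEP
addresses and the flood list contents are invented, and
`vxlan_encap` is a stand-in for the real VXLAN encapsulation;
whether the flood list comes from manual configuration, a controller
or BGP-EVPN is immaterial to the replication step itself:

```python
# Per-VNI flood list: the remote VTEPs with VMs in that VNI.
flood_lists = {5001: ["192.0.2.2", "192.0.2.3", "192.0.2.4"]}

def vxlan_encap(vni: int, frame: bytes, remote_vtep: str) -> tuple:
    # Stand-in for real VXLAN encapsulation: pair the inner frame
    # with its unicast outer destination and VNI.
    return (remote_vtep, vni, frame)

def replicate_bum(vni: int, frame: bytes) -> list:
    """Send one unicast copy of a BUM frame to each remote VTEP."""
    return [vxlan_encap(vni, frame, vtep)
            for vtep in flood_lists.get(vni, [])]

copies = replicate_bum(5001, b"\xff\xff\xff\xff\xff\xff payload")
print(len(copies))   # 3 copies, one per remote VTEP
```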
4.3.  Programmable Forwarding Planes

As discussed in Section 2, one of the main functions of PIM is to
build and maintain multicast distribution trees.  Such a tree
indicates the path a specific flow will take through the network.
Thus, in routers traversed by the flow, the information from PIM is
ultimately used to create a multicast forwarding entry for the
specific flow and insert it into the multicast forwarding table.  The
multicast forwarding table will have entries for each multicast flow
traversing the router, with the lookup key usually being a
concatenation of the source and group addresses.  Critically, each
entry will contain information such as the legal input interface for
the flow and a list of output interfaces to which matching packets
should be replicated.
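Such a forwarding entry and its lookup might be sketched as follows.
The interface names and addresses are illustrative only, and a real
forwarding plane would also handle TTL, statistics and RPF-failure
signalling:

```python
# Schematic multicast forwarding table keyed on (source, group); each
# entry holds the legal input interface and the output interface list.
mfib = {
    ("10.1.1.1", "239.1.1.1"): {"iif": "eth0", "oifs": ["eth1", "eth3"]},
}

def mcast_forward(src: str, group: str, in_if: str, pkt: bytes) -> list:
    entry = mfib.get((src, group))
    if entry is None or in_if != entry["iif"]:
        return []                                  # no state, or wrong iif
    return [(oif, pkt) for oif in entry["oifs"]]   # replicate per oif

print(mcast_forward("10.1.1.1", "239.1.1.1", "eth0", b"payload"))
# [('eth1', b'payload'), ('eth3', b'payload')]
```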
Viewed in this way, there is nothing remarkable about the multicast
forwarding state constructed in routers based on the information
gleaned from PIM.  In fact, it is perfectly feasible to build such
state in the absence of PIM.  Such prospects have been significantly
enhanced by the increasing popularity and performance of network
devices with programmable forwarding planes.  These devices are
attractive for use in data centers since they are amenable to being
programmed by a centralized controller.  If such a controller has a
global view of the sources and receivers for each multicast flow
(which can be provided by the devices attached to the end hosts in
the data center communicating with the controller) and an accurate
representation of the data center topology (which is usually well-
known), then it can readily compute the multicast forwarding state
that must be installed at each router to ensure the one-to-many
traffic flow is delivered properly to the correct receivers.  All
that is needed is an API to program the forwarding planes of all the
network devices that need to handle the flow appropriately.  Such
APIs do in fact exist and so, unsurprisingly, handling one-to-many
traffic flows using such an approach is attractive for data centers.
Being able to program the forwarding plane in this manner offers the
enticing possibility of introducing novel algorithms and concepts for
forwarding multicast traffic in data centers.  These schemes
typically aim to exploit the idiosyncrasies of the data center
network architecture to create ingenious, pithy and elegant encodings
of the information needed to facilitate multicast forwarding.
Depending on the scheme, this information may be carried in packet
headers, stored in the multicast forwarding table in routers, or a
combination of both.  The key characteristic is that the terseness of
the forwarding information means the volume of forwarding state is
significantly reduced.  Additionally, the overhead associated with
building and maintaining a multicast forwarding tree is eliminated.
The result of these reductions in the overhead associated with
multicast forwarding is a significant and impressive increase in the
effective number of multicast flows that can be supported within the
data center.
[Shabaz19] is a good example of such an approach and also presents a
comprehensive discussion of other schemes in its treatment of related
work.  Although a number of promising schemes have been proposed, no
consensus has yet emerged as to which approach is best, and in fact
what "best" means.  Even if a clear winner were to emerge, it would
face significant challenges in gaining the vendor and operator buy-in
needed to ensure it is widely deployed in data centers.
4.4.  BIER

As discussed in Section 3.4, PIM and IGMP face potential scalability
challenges when deployed in data centers.  These challenges are
typically due to the requirement to build and maintain a distribution
tree and the requirement to hold per-flow state in routers.  Bit
Index Explicit Replication (BIER) [RFC8279] is a new multicast
forwarding paradigm that avoids these two requirements.
When a multicast packet enters a BIER domain, the ingress router,
known as the Bit-Forwarding Ingress Router (BFIR), adds a BIER header
to the packet.  This header contains a bit string in which each bit
maps to an egress router, known as a Bit-Forwarding Egress Router
(BFER).  If a bit is set, then the packet should be forwarded to the
associated BFER.  The routers within the BIER domain, Bit-Forwarding
Routers (BFRs), use the BIER header in the packet and information in
the Bit Index Forwarding Table (BIFT) to carry out simple bit-wise
operations to determine how the packet should be replicated optimally
so it reaches all the appropriate BFERs.
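The bit-wise replication can be sketched as below.  The BIFT contents
are invented, and a real BIFT is organized per bit string length and
set identifier; the essential property shown is that each neighbor
receives exactly one copy, carrying only the bits it serves:

```python
# BIFT sketch: bit position of each BFER -> (neighbor toward it,
# forwarding bitmask of all BFER bits reachable via that neighbor).
bift = {
    1: ("nbrA", 0b0011),   # BFERs 1 and 2 are reached via neighbor A
    2: ("nbrA", 0b0011),
    3: ("nbrB", 0b0100),   # BFER 3 is reached via neighbor B
}

def bier_forward(bitstring: int) -> list:
    """Replicate so each neighbor gets one copy whose bit string is
    cleared of the bits served by other neighbors."""
    copies = []
    remaining = bitstring
    while remaining:
        low_bit = remaining & -remaining          # lowest set bit
        pos = low_bit.bit_length()
        neighbor, mask = bift[pos]
        copies.append((neighbor, remaining & mask))
        remaining &= ~mask                        # those BFERs are handled
    return copies

print(bier_forward(0b0111))   # [('nbrA', 3), ('nbrB', 4)]
```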
BIER is deemed to be attractive for facilitating one-to-many
communications in data centers [I-D.ietf-bier-use-cases].  The
deployment envisioned with overlay networks is that the encapsulation
endpoints would be the BFIRs.  Thus, knowledge about the actual
multicast groups does not reside in the data center fabric, improving
the scalability compared to conventional IP multicast.  Additionally,
a centralized controller or a BGP-EVPN control plane may be used with
BIER to ensure the BFIRs have the required information.  A challenge
associated with using BIER is that it requires changes to the
forwarding behaviour of the routers used in the data center IP
fabric.
4.5.  Segment Routing

Segment Routing (SR) [RFC8402] is a manifestation of the source
routing paradigm, so called because the path a packet takes through a
network is determined at the source.  The source encodes this
information in the packet header as a sequence of instructions.
These instructions are followed by intermediate routers, ultimately
resulting in the delivery of the packet to the desired destination.
In SR, the instructions are known as segments and a number of
different kinds of segments have been defined.  Each segment has an
identifier (SID) which is distributed throughout the network by newly
defined extensions to standard routing protocols.  Thus, using this
information, sources are able to determine the exact sequence of
segments to encode into the packet.  The manner in which these
instructions are encoded depends on the underlying data plane.

Segment Routing can be applied to the MPLS and IPv6 data planes.  In
the former, the list of segments is represented by the label stack
and in the latter it is represented as an IPv6 routing extension
header.  Advantages of segment routing include the reduction in the
amount of forwarding state routers need to hold and the removal of
the need to run a signaling protocol, thus improving network
scalability while reducing operational complexity.
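For the MPLS data plane, encoding a segment list amounts to building
a label stack.  The sketch below uses the standard 32-bit label stack
entry layout from RFC 3032 (Label: 20 bits, Traffic Class: 3 bits,
Bottom-of-Stack: 1 bit, TTL: 8 bits); the SID values are
illustrative only:

```python
def encode_label_stack(sids: list) -> bytes:
    """Encode a segment list as an MPLS label stack.

    Each 32-bit entry is Label(20) | TC(3) | S(1) | TTL(8); the
    bottom-of-stack (S) bit is set only on the final entry.
    """
    stack = b""
    for i, sid in enumerate(sids):
        bottom = 1 if i == len(sids) - 1 else 0
        entry = (sid << 12) | (bottom << 8) | 64    # TC=0, TTL=64
        stack += entry.to_bytes(4, "big")
    return stack

stack = encode_label_stack([16001, 16005, 16010])
print(len(stack))   # 12 bytes: three label stack entries
```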
The advantages of segment routing and the ability to run it over an
unmodified MPLS data plane mean that one of its anticipated use cases
is in BGP-based large-scale data centers [RFC7938].  The exact manner
in which multicast traffic will be handled in SR has not yet been
standardized, with a number of different options being considered.
For example, since segments are simply encoded as a label stack in
the MPLS data plane, the protocols traditionally used to create
point-to-multipoint LSPs could be reused to allow SR to support one-
to-many traffic flows.  Alternatively, a special SID may be defined
for a multicast distribution tree, with a centralized controller
being used to program routers appropriately to ensure the traffic is
delivered to the desired destinations, while avoiding the costly
process of building and maintaining a multicast distribution tree.
5.  Conclusions

As the volume and importance of one-to-many traffic in data centers
increases, conventional IP multicast is likely to become increasingly
unattractive to deploy in data centers for a number of reasons,
mostly pertaining to its relatively poor scalability and its
inability to exploit characteristics of data center network
architectures.  Hence, even though IGMP/MLD is likely to remain the
most popular manner in which end hosts signal interest in joining a
multicast group, it is unlikely that this multicast traffic will be
transported over the data center IP fabric using a multicast
distribution tree built and maintained by PIM.  Rather, approaches
which exploit the idiosyncrasies of data center network architectures
are better placed to deliver one-to-many traffic in data centers,
especially when judiciously combined with a centralized controller
and/or a distributed control plane, particularly one based on BGP-
EVPN.
6.  IANA Considerations

This memo includes no request to IANA.

7.  Security Considerations

No new security considerations result from this document.

8.  Acknowledgements
[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119,
           DOI 10.17487/RFC2119, March 1997,
           <https://www.rfc-editor.org/info/rfc2119>.

9.2.  Informative References

[I-D.ietf-bier-use-cases]
           Kumar, N., Asati, R., Chen, M., Xu, X., Dolganow, A.,
           Przygienda, T., Gulko, A., Robinson, D., Arya, V., and C.
           Bestler, "BIER Use Cases", draft-ietf-bier-use-cases-09
           (work in progress), January 2019.

[I-D.ietf-nvo3-geneve]
           Gross, J., Ganga, I., and T. Sridhar, "Geneve: Generic
           Network Virtualization Encapsulation", draft-ietf-
           nvo3-geneve-13 (work in progress), March 2019.

[I-D.ietf-nvo3-vxlan-gpe]
           Maino, F., Kreeger, L., and U. Elzur, "Generic Protocol
           Extension for VXLAN", draft-ietf-nvo3-vxlan-gpe-07 (work
           in progress), April 2019.

[RFC0826]  Plummer, D., "An Ethernet Address Resolution Protocol: Or
           Converting Network Protocol Addresses to 48.bit Ethernet
           Address for Transmission on Ethernet Hardware", STD 37,
           RFC 826, DOI 10.17487/RFC0826, November 1982,
           <https://www.rfc-editor.org/info/rfc826>.

[RFC2236]  Fenner, W., "Internet Group Management Protocol, Version
| 2", RFC 2236, DOI 10.17487/RFC2236, November 1997, | 2", RFC 2236, DOI 10.17487/RFC2236, November 1997, | |||
| <https://www.rfc-editor.org/info/rfc2236>. | <https://www.rfc-editor.org/info/rfc2236>. | |||
| [RFC2710] Deering, S., Fenner, W., and B. Haberman, "Multicast | [RFC2710] Deering, S., Fenner, W., and B. Haberman, "Multicast | |||
| Listener Discovery (MLD) for IPv6", RFC 2710, | Listener Discovery (MLD) for IPv6", RFC 2710, | |||
| DOI 10.17487/RFC2710, October 1999, | DOI 10.17487/RFC2710, October 1999, | |||
| <https://www.rfc-editor.org/info/rfc2710>. | <https://www.rfc-editor.org/info/rfc2710>. | |||
| skipping to change at page 14, line 20 ¶ | skipping to change at page 17, line 5 ¶ | |||
| [RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas, | [RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas, | |||
| "Protocol Independent Multicast - Sparse Mode (PIM-SM): | "Protocol Independent Multicast - Sparse Mode (PIM-SM): | |||
| Protocol Specification (Revised)", RFC 4601, | Protocol Specification (Revised)", RFC 4601, | |||
| DOI 10.17487/RFC4601, August 2006, | DOI 10.17487/RFC4601, August 2006, | |||
| <https://www.rfc-editor.org/info/rfc4601>. | <https://www.rfc-editor.org/info/rfc4601>. | |||
| [RFC4607] Holbrook, H. and B. Cain, "Source-Specific Multicast for | [RFC4607] Holbrook, H. and B. Cain, "Source-Specific Multicast for | |||
| IP", RFC 4607, DOI 10.17487/RFC4607, August 2006, | IP", RFC 4607, DOI 10.17487/RFC4607, August 2006, | |||
| <https://www.rfc-editor.org/info/rfc4607>. | <https://www.rfc-editor.org/info/rfc4607>. | |||
| [RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman, | ||||
| "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861, | ||||
| DOI 10.17487/RFC4861, September 2007, | ||||
| <https://www.rfc-editor.org/info/rfc4861>. | ||||
| [RFC5015] Handley, M., Kouvelas, I., Speakman, T., and L. Vicisano, | [RFC5015] Handley, M., Kouvelas, I., Speakman, T., and L. Vicisano, | |||
| "Bidirectional Protocol Independent Multicast (BIDIR- | "Bidirectional Protocol Independent Multicast (BIDIR- | |||
| PIM)", RFC 5015, DOI 10.17487/RFC5015, October 2007, | PIM)", RFC 5015, DOI 10.17487/RFC5015, October 2007, | |||
| <https://www.rfc-editor.org/info/rfc5015>. | <https://www.rfc-editor.org/info/rfc5015>. | |||
| [RFC6820] Narten, T., Karir, M., and I. Foo, "Address Resolution | [RFC6820] Narten, T., Karir, M., and I. Foo, "Address Resolution | |||
| Problems in Large Data Center Networks", RFC 6820, | Problems in Large Data Center Networks", RFC 6820, | |||
| DOI 10.17487/RFC6820, January 2013, | DOI 10.17487/RFC6820, January 2013, | |||
| <https://www.rfc-editor.org/info/rfc6820>. | <https://www.rfc-editor.org/info/rfc6820>. | |||
| skipping to change at page 15, line 23 ¶ | skipping to change at page 18, line 11 ¶ | |||
| Explicit Replication (BIER)", RFC 8279, | Explicit Replication (BIER)", RFC 8279, | |||
| DOI 10.17487/RFC8279, November 2017, | DOI 10.17487/RFC8279, November 2017, | |||
| <https://www.rfc-editor.org/info/rfc8279>. | <https://www.rfc-editor.org/info/rfc8279>. | |||
| [RFC8365] Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R., | [RFC8365] Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R., | |||
| Uttaro, J., and W. Henderickx, "A Network Virtualization | Uttaro, J., and W. Henderickx, "A Network Virtualization | |||
| Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365, | Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365, | |||
| DOI 10.17487/RFC8365, March 2018, | DOI 10.17487/RFC8365, March 2018, | |||
| <https://www.rfc-editor.org/info/rfc8365>. | <https://www.rfc-editor.org/info/rfc8365>. | |||
| [RFC8402] Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L., | ||||
| Decraene, B., Litkowski, S., and R. Shakir, "Segment | ||||
| Routing Architecture", RFC 8402, DOI 10.17487/RFC8402, | ||||
| July 2018, <https://www.rfc-editor.org/info/rfc8402>. | ||||
| [Shabaz19] | ||||
| Shahbaz, M., Suresh, L., Rexford, J., Feamster, N., | ||||
| Rottenstreich, O., and M. Hira, "Elmo: Source Routed | ||||
| Multicast for Public Clouds", ACM SIGCOMM 2019 Conference | ||||
| (SIGCOMM '19) ACM, DOI 10.1145/3341302.3342066, August | ||||
| 2019. | ||||
| [SMPTE2110] | ||||
| SMPTE, Society of Motion Picture and Television Engineers, | ||||
| "SMPTE2110 Standards Suite", | ||||
| <http://www.smpte.org/st-2110>. | ||||
| Authors' Addresses | Authors' Addresses | |||
| Mike McBride | Mike McBride | |||
| Futurewei | Futurewei | |||
| Email: michael.mcbride@futurewei.com | Email: michael.mcbride@futurewei.com | |||
| Olufemi Komolafe | Olufemi Komolafe | |||
| Arista Networks | Arista Networks | |||
| End of changes. 40 change blocks. | ||||
| 223 lines changed or deleted | 375 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||