Network Working Group                                          S. Brim
Request for Comments: DRAFT                         Cornell University
                                                            Y. Rekhter
                                T.J. Watson Research Center, IBM Corp.
                                                          October 1992

                 IP Multicast Communications Using BGP

Status of this Memo

This document reflects the current status of recommendations for supporting inter-domain multicast packet forwarding using BGP. This RFC specifies an IAB standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "IAB Official Protocol Standards" for the standardization state and status of this protocol. Distribution of this document is unlimited.

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts. Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress".

Abstract

This document, a major revision of the previous version, reflects the current status of recommendations for supporting inter-domain multicast packet forwarding using BGP. Research is underway on other methods for inter-domain multicasting, but only what can be done today is considered here.

Expiration Date March 1993

1 Introduction

Most communication in the Internet today is unicasting, where there is a single specific destination for every packet. On local area networks broadcasting is common, in which the destination of a packet is every node on the network. Multicasting is like broadcasting in that it supports multiple recipients for a single packet, but packets are intended for a specific group.
As examples, in a local area network environment multicasting is currently often used for communication between processors of loosely coupled systems, or for communication between routers or bridges. In these cases the sender wants to reach the members of a special group, but not every node on the network. Broadcasting is in fact a special case of multicasting in which the special group is all nodes.

Multicasting over wide areas, as opposed to just on local area networks, is an important capability the development of which has lagged far behind its need. Until recently there has only been one IP multicasting implementation which could be used on more than just a local area network [1], [9], but multicast packets had to be encapsulated in order to send them across autonomous system boundaries. Work is now in progress to develop support for wide-area multicasting using standard protocols and to make that support an integrated part of the Internet protocol suite. Part of that work is being done under the auspices of the IETF BGP Working Group, including this document which explores how multicasting can be supported between autonomous systems using the Border Gateway Protocol (BGP) [5], [7], [8].

The best introductory reference on multicast forwarding is by Deering [2]. It is highly recommended that this paper, plus some of its references, as well as the BGP RFCs be read before this one, since this one assumes a high degree of understanding of BGP and only summarizes some parts of Deering's presentation.

We speak in terms of multicast groups. In IP multicasting, multicast groups are known by their addresses, which are in the range from 224.0.0.0 to 239.255.255.255. Each group has a unique address. Each member of a group is known by one or more multicast addresses in addition to one or more unicast addresses.
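
As a small illustration of the address range quoted above, the following Python sketch tests whether an IPv4 address names a multicast group. The function name is_multicast_group is an assumption made for this example only; the range 224.0.0.0 through 239.255.255.255 is the same set of addresses as the prefix 224.0.0.0/4.

```python
import ipaddress

# 224.0.0.0 through 239.255.255.255, written as a single prefix.
MULTICAST_RANGE = ipaddress.ip_network("224.0.0.0/4")


def is_multicast_group(address: str) -> bool:
    """Return True if `address` lies in the IPv4 multicast group range.

    Hypothetical helper for illustration; not part of any protocol.
    """
    return ipaddress.ip_address(address) in MULTICAST_RANGE
```

A router could use a test of this kind to decide whether a destination address refers to a group rather than to a single host.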
RFC 1112 [1] defines methods for mapping between IP multicast group addresses and the level 2 address spaces of Ethernet, 802.3, all point-to-point protocols, and protocols with broadcast but no multicast capability (e.g. LocalTalk). Mappings are also defined for FDDI [4] and SMDS [6].

The general issues in wide-area multicasting are:

- Discovering where packets should be sent. The destination address for a multicast packet refers to a multicast group, whose members and locations change over time. On a local area network all destinations hear the same packet, and there is no need for forwarding. When a multicast packet must be forwarded to the members of a group not on the same local area network, we need mechanisms by which the members can make themselves known to the routers and the routers can ensure that every member of a group receives at least one and preferably only one copy of a packet addressed to that group.

- Establishing efficient routing paths for multicast packets. A packet with multiple recipients must be replicated and sent on multiple links, but copies of the packet should travel over as few links as possible. We need routing protocols defined for use both within and between autonomous systems.

A major issue which has received increasing attention recently is how well inter-domain multicast routing can scale, given that we are now thinking in terms of an Internet which should be able to support a billion domains.

This document assumes that hosts will inform routers of their membership in multicast groups, probably via the Internet Group Management Protocol [1]. It attempts to explore three remaining problems -- communication between routers of where multicast packets should be sent, efficient propagation of packets between autonomous systems, and interactions between intra-autonomous system and inter-autonomous system routing protocols in the support of multicasting.

2 Reverse Path Forwarding

The only approach to wide-area routing of multicast packets that has been implemented so far uses "reverse path forwarding" and is described in RFC 1075 and in [2]. This approach would fit well in the BGP environment, offering low overhead and excellent interaction with IGPs. Also the implemented method is directly applicable to BGP already. However, it may not allow the level of administrative control of routing paths to which some network administrators have become accustomed (see Section 4.1).

In every approach to forwarding multicast packets the problem faced by a particular router is to determine its position in the paths by which multicast packets from a particular source should be forwarded. A router needs to (1) determine whether to accept a particular multicast packet or to discard it, based on its originating source and immediate previous hop, and (2) once a packet has been accepted, decide which of its peers to forward it to, if any.

If a "source tree" defines the paths by which an end system sends unicast packets to all other end systems, then a "sink tree" defines how an end system is reached by unicast packets from all others. The goal in unicast routing is to make a destination reachable by packets from all sources; the goal in multicast routing is to ensure that a packet from a single source reaches multiple destinations. Obviously a set of paths that solves the first problem can be used to solve the second if we use it in the reverse direction.

The basic reverse path forwarding approach uses the fact that the propagation of unicast IP routing information already causes the formation of a sink tree -- the graph of how unicast packets should flow to that IP entity from all others.
Thus when multicast packets need to be routed from that network to multiple destinations, a broadcast tree with that source as the root has already been formed, and this approach simply arranges for the multicast packets to flow along certain branches of that tree, but in the opposite direction of the unicast packets.

This procedure establishes paths for efficient broadcast, but network bandwidth is still wasted by sending multicast packets along all branches of the sink tree even when there are no nodes on those branches interested in receiving them. Further mechanisms can be defined to dynamically ensure that multicast packets are only sent to those peers which are on paths leading to members of the destination multicast group, for example via the "prune" and "graft" messages described in RFC 1075. A prune message tells a BGP peer not to send the originator of the prune any packets from a particular source addressed to a particular multicast group. A graft message is sent to cancel that directive. Prune messages can be cached and timed out by the receiver, and repeated as necessary by the sender. A border router can maintain a table of which interfaces packets from a particular source to a particular target multicast group should and should not be forwarded on, depending on memory constraints and multicast activity in the Internet.

3 BGP and Reverse Path Forwarding

The BGP protocol itself does not have to be changed to support inter-domain multicasting, but implementation of IGMP "prune" and "graft" messages by the BGP speaker is required.

Functionally, in reverse path forwarding, if a border router receives a multicast packet on the link by which it would send a unicast packet to the originator of that multicast packet, then it will propagate that multicast packet to the other BGP peers which are using it to reach the originator and which have not sent a "prune" message for that {originator, group} combination.
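
The forwarding rule just described, together with cached prunes that time out, can be sketched in Python roughly as follows. This is only a minimal illustration under stated assumptions: the class RpfRouter, its fields, and the use of exact source names in place of a longest-prefix lookup in the BGP routing table are all invented for this example and are not taken from RFC 1075 or from any BGP implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class RpfRouter:
    """Toy border router; every name here is an illustrative assumption."""

    # Peer this router would use to send unicast packets to each source.
    next_hop_to: Dict[str, str]
    # Peers known to be using this router to reach each source.
    peers_using_us: Dict[str, Set[str]]
    # Cached prunes: (peer, source, group) -> time at which the prune expires.
    prunes: Dict[Tuple[str, str, str], float] = field(default_factory=dict)

    def prune(self, peer: str, source: str, group: str,
              now: float, lifetime: float) -> None:
        """Record a prune from `peer`; it times out after `lifetime`."""
        self.prunes[(peer, source, group)] = now + lifetime

    def graft(self, peer: str, source: str, group: str) -> None:
        """Cancel an earlier prune from `peer`, if one is cached."""
        self.prunes.pop((peer, source, group), None)

    def forward(self, source: str, group: str,
                arrived_from: str, now: float) -> List[str]:
        """Return the peers to which an arriving packet should be copied."""
        # Accept only packets arriving from the peer we would use to
        # reach the source with unicast traffic (the reverse path check).
        if self.next_hop_to.get(source) != arrived_from:
            return []
        targets = []
        for peer in sorted(self.peers_using_us.get(source, ())):
            expiry = self.prunes.get((peer, source, group))
            if expiry is not None and expiry > now:
                continue  # an unexpired prune suppresses this branch
            targets.append(peer)
        return targets
```

In this sketch a prune that has timed out is simply ignored, so forwarding toward that peer resumes until the peer repeats its prune.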

In all current implementations of the BGP protocol, a border router has an implicit confirmation of whether its external peers are using routes that it has offered to them through the "echo" inherent in the BGP update messages (as strongly encouraged in the BGP4 Internet Draft). This combined with prune messages can efficiently limit propagation of multicast packets to only those branches that want them.

However, without some extra features it is impossible for border routers to exchange prune and graft information across an autonomous system. A border router can use information obtained through examining LOC_PREF attributes and/or other means to detect if it is its own AS's exit point for sending unicast packets to a particular multicast source. If it is not, then the border router would never propagate multicast packets from that source into its AS or across its AS to others. However, if it is the AS's unicast exit point for a particular source, then without any way to gather further information it will have to forward multicast packets across its AS to all other AS border gateways, since (in reverse path forwarding) it has no way of knowing if there is a group member beyond one of the other border gateways or not.

There are two solutions to this problem which are reasonable. The first, which is recommended here, is to define new IGMP prune and graft messages. Prunes and grafts were originally meant to be messages from a router to one of its immediate neighbors, telling the neighbor whether it has recipients downstream from it with respect to a particular multicast {source, group} combination.
To solve the above problem we can create new IGMP prune and graft messages which would be advisory -- these messages would be sent, for example, from one AS border router to another, telling it that the originator has no recipients downstream from it with respect to a particular multicast source that the AS is reaching through the recipient border router.

Another possibility would be to require the establishment of multicast "tunnels", as used by mrouted [9], between multicast-capable border routers. The tunnels would be used for sending encapsulated IGMP prunes and grafts between the border routers, bypassing the AS's internal routing. If tunnels are used, it would be best to have the multicast data packets carried in the tunnels as well -- one copy of a packet would be multicast into the AS if there were group members in the AS, and other copies would be encapsulated and sent directly to the other border routers that had not sent a prune for that {source, group} combination. The tunnels could be automatically created when the BGP connection is created.

Neither of these solutions would require changes to BGP, but both would couple multicast routing to knowledge of BGP routing information in the border routers. While the proposed solutions are similar to each other, the first one has the advantage of not requiring the establishment of multicast "tunnels", thus simplifying the operation of the protocol.

4 Potential Problems with Reverse Path Forwarding

4.1 Asymmetric routes

As long as the path by which one node reaches another is the exact reverse of how the other node reaches the first (symmetric routes), unicast and RPF-based multicast packets will flow along the same paths. However, the Internet supports, and frequently has, asymmetric routes between ASs.
Network administrators currently set policies for how they want their networks to reach others. In reverse path forwarding, however, multicast packets flow according to how a node is reached, not according to how it reaches others. If routes are not symmetrical, the behavior of the multicast packets will therefore be controlled in the opposite way from what the network managers intended when they set up the controls for unicast traffic.

Discussions in IETF meetings suggest that while most network managers would not mind if multicast packets flowed from their ASs along the paths which others use to send unicast packets to them, there are some who would like to retain more control of how multicast packets flow through the Internet. There are ways to add source-based control, but they all add significant overhead either to protocol traffic or to network administration. The cost to everyone seems to outweigh the benefit gained by a few. We can probably set up a mechanism similar to that in the "unified" routing scheme [3], where the majority of traffic is taken care of by simple, low-overhead routing, and for the small number of cases where it is necessary more complex routing can be used.

4.2 Incremental Implementation

One consideration is how easy it will be to get from the current Internet to one that mostly supports multicasting (getting to an Internet which fully supports multicasting is not a reasonable goal). Since in reverse path forwarding multicast routing depends directly on unicast routing, incremental implementation in the Internet might be awkward. There is no way to detect which routers support multicast routing, and thus no way to know if multicast packets can get between any two points, directly from the network itself.
Tunnels may easily be set up, as described in RFC 1075, to reach between islands of multicast-supporting routers, but again with RPF there is no way of knowing when these (relatively inefficient) tunnels should be in place and when they are no longer necessary without frequent dialog between network operators. Once again this is not an extreme difficulty, and network administrators are careful enough that they will probably be aware of their tunnel topology and their neighbors' activities, and able to control them effectively.

Another proposal, which has been called "multicast fireworks" because of the way multicast packets would "explode", essentially says that one should not require multicast forwarding to ever be completely deployed Internet-wide, that the Internet will be in a hybrid state, with some tunnels connecting multicast-capable ASs, for a very long time, perhaps forever.

4.3 Scaling

Many people have valid concerns about the capability of any multicast routing algorithm to scale to support 10^9 autonomous systems. In the case of reverse path forwarding, some people wonder about the overhead involved in not propagating multicast group member locations in the first place, and essentially discovering them by sending data packets everywhere and using prune responses to clean up the forwarding tree after the fact. Under traditional RPF, every multicast group with global scope periodically sends at least one packet to every part of the world, regardless of whether there are group members there or not.

Since it is not necessary for a node to be a member of a group in order to send messages to that group, the alternative of propagating membership information (instead of using the "probe" data packets) would require propagating membership information for each group everywhere, to any node that might want to send to that group.
Propagating knowledge of group membership would require at least one packet for each member-containing network to be sent to every leaf of the Internet, each time that member-containing network transitioned between having zero and at least one member. On the other hand using data packets and prune messages would require one packet to be sent to every constituent of the Internet for the entire group, as opposed to sending one for each member-containing network. More data packets would be sent periodically, but the frequency would depend on the times specified in the prune messages. It is expected that these times will be long and that routers will use graft messages as necessary. Since a graft message will only be sent if data packets for a particular group are desired, graft messages are only incrementally more traffic than the data itself will be and are not significant as overhead. Thus, independent of the topology of the Internet, it is always cheaper to use the prune/graft approach than it is to propagate membership information.

Finally, prune messages need not apply to just the particular group and source for the packet that triggers them. It can be shown that if the source and group fields in the prune message are prefix-based, and prunes are sent which cover all unwanted groups and sources, essentially in anticipation of future data "probe" packets, then very few of these packets will ever be sent.

Since reverse path forwarding works with whatever address prefixes are in the route information base at any BGP node, and keeps only cached information about active multicast sources and groups, the amount of stored information required will continue to scale well as the Internet grows.
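
To make the idea of prefix-based prunes concrete, the following Python sketch shows how a single cached prune could cover many {source, group} pairs. The class PrefixPruneTable and its methods are hypothetical, and representing a prune as a pair of address prefixes is an assumption made for illustration; this document does not define a message format.

```python
import ipaddress


class PrefixPruneTable:
    """Toy cache of prefix-based prunes; all names are assumptions."""

    def __init__(self):
        # Each entry is a (source prefix, group prefix) pair.
        self._entries = []

    def add(self, source_prefix: str, group_prefix: str) -> None:
        """Cache a prune covering every source and group in the prefixes."""
        self._entries.append((ipaddress.ip_network(source_prefix),
                              ipaddress.ip_network(group_prefix)))

    def is_pruned(self, source: str, group: str) -> bool:
        """Return True if any cached prune covers this source and group."""
        src = ipaddress.ip_address(source)
        grp = ipaddress.ip_address(group)
        return any(src in s_net and grp in g_net
                   for s_net, g_net in self._entries)
```

For example, one entry for sources in 10.1.0.0/16 and groups in 239.0.0.0/8 would suppress packets from 10.1.2.3 to 239.1.1.1 without a separate prune for that exact pair, which is the effect the text attributes to prunes sent in anticipation of future "probe" packets.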

There is a draft document based on Tony Ballardie's ongoing thesis work, in which he and others propose "core-based trees", basically that members of a group form a tree based on a well-known set of "core" nodes, and that senders of packets to that group need know nothing about the membership; they should simply send their packets toward the core. When the packets hit the tree formed by the members they will begin following all branches of the tree from that point. This scheme seems to have great potential, in that it doesn't flood the Internet with either membership notifications or "probe" data packets, and thus it should scale well. Policies can be applied to some degree and traffic will flow from a source toward the tree basically according to the source's preferences. However, there is a chance that it might have significant overhead in maintaining trees, since participants must be sure that a particular "core" node is functioning, and adapt rapidly if it is not. The detailed mechanisms of actually making the scheme work robustly are still being explored and are a subject of future research.

5 Acknowledgments

The development of some of the ideas presented in this document was supported by the Defense Advanced Research Project Agency through grant NAG 2-593 from the NASA Ames Research Center. This work would not have been possible without the help of the IETF BGP Working Group, John Moy, and Steve Deering.

References

[1] S. Deering, "Host extensions for IP multicasting", RFC 1112, Network Information Center, Aug. 1989.

[2] S. Deering, "Multicast Routing in a Datagram Internetwork", PhD thesis, Electrical Engineering Dept., Stanford University, Dec. 1991.

[3] D. Estrin, Y. Rekhter, and S. Hotz, "A Unified Approach to Inter-Domain Routing", RFC 1322, Network Information Center, May 1991.

[4] D. Katz, "A Proposed Standard for the Transmission of IP Datagrams over FDDI Networks", RFC 1188, Network Information Center, Oct. 1990.

[5] K. Lougheed and Y. Rekhter, "A Border Gateway Protocol 3 (BGP-3)", RFC 1267, Network Information Center, Oct. 1991.

[6] D. Piscitello and J. Lawrence, "A Specification of the Transmission of IP Datagrams Over SMDS", RFC 1209, Network Information Center, Mar. 1991.

[7] Y. Rekhter and P. Gross, "Applications of the Border Gateway Protocol in the Internet", RFC 1268, Network Information Center, Oct. 1991.

[8] Y. Rekhter and T. Li, "A Border Gateway Protocol 4 (BGP-4)", Internet Draft, Network Information Center, June 1992.

[9] D. Waitzman, C. Partridge, and S. Deering, "Distance vector multicast routing protocol", RFC 1075, Network Information Center, Nov. 1988.

Security Considerations

Security issues are not discussed in this memo.

Authors' Addresses

Scott W. Brim
Cornell Information Technologies
143 Caldwell Hall
Cornell University
Ithaca, NY 14853 USA
Phone: +1-607-255-5510
EMail: Scott_Brim@cornell.edu

Yakov Rekhter
T.J. Watson Research Center
IBM Corporation
P.O. Box 218
Yorktown Heights, NY 10598
Phone: +1-914-945-3896
EMail: yakov@watson.ibm.com