IDMR Working Group                                        Steve Deering
INTERNET-DRAFT                                               Xerox PARC
Expires April 1994                                       Deborah Estrin
                                                                USC/ISI
                                                         Dino Farinacci
                                                          cisco Systems
                                                           Van Jacobson
                                                                    LBL
                                                       October 18, 1993

      IGMP Router Extensions for Routing to Sparse Multicast-Groups

Status of this Memo

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. (Note that other groups may also distribute working documents as Internet Drafts.)

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

Please check the I-D abstract listing contained in each Internet Draft directory to learn the current status of this or any other Internet Draft.

**************************************************************************
*********************P*L*E*A*S*E*****R*E*A*D******************************

***The October 18th version solves the problem of dead time between the time when a receiver's DR sets up S,G and the time when the associated join has been processed all the way upstream (the problem was that as soon as S,G was set up, packets arriving on *,G would be dropped because the incoming interface did not match). The fix is to associate a bit with the S,G entry; while the bit is cleared, packets that do not match the incoming interface are checked against *,G before being dropped. If they match *,G, they are forwarded accordingly. Once a router sees a packet that matches S,G (both longest match for source and incoming interface) it sets the bit associated with S,G, and from then on data packets must match the incoming interface for S,G or be dropped. A DR waits to send a prune up the RP tree until the SPT bit for the S,G entry is set.
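The SPT-bit check described in the note above can be pictured with a small sketch. The Python fragment below is illustrative only and is not part of the protocol specification. The names Entry and forward, the dictionary-based table, and the interface labels are assumptions introduced here, and an exact-match lookup on the source stands in for the longest-match lookup mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Entry:
    """Hypothetical forwarding entry for (S,G) or (*,G)."""
    iif: Optional[str]                      # expected incoming interface
    oifs: Set[str] = field(default_factory=set)
    spt_bit: bool = False                   # only meaningful for S,G entries


def forward(table: Dict[Tuple[str, str], Entry],
            source: str, group: str, arrival_if: str) -> Set[str]:
    """Return the interfaces to forward on; an empty set means drop."""
    sg = table.get((source, group))
    wc = table.get(("*", group))
    if sg is not None:
        if arrival_if == sg.iif:
            # Packet matched S,G on the right interface: set the bit.
            sg.spt_bit = True
            return set(sg.oifs) - {arrival_if}
        if sg.spt_bit:
            # Bit already set: a mismatch on the incoming interface is a drop.
            return set()
        # Bit still clear: fall through and check against *,G.
    if wc is not None and arrival_if == wc.iif:
        return set(wc.oifs) - {arrival_if}
    return set()
```

In this sketch a packet from S that still arrives over the shared tree is forwarded using *,G until the first packet from S arrives on the S,G incoming interface; after that, shared-tree copies are dropped.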
***DISCLAIMER: In my usual rush to get a revised version of this document to the IDMR list, Deering, Farinacci, and Jacobson did not get the chance to review the recent detailed changes to the document. I will be out of town and largely off-the-net from October 18 until IETF. Please send comments to the list anyway and hopefully someone will be able to respond in a timely manner. At the very least, we will collect them at IETF and respond to them in real time.

***At the Amsterdam IETF we thought that we could eliminate S,G state on the shared tree. After chasing our tails for a while we realized that in order to maintain our ability to do an incoming interface check on data packets, to detect multicast loops, we needed S,G specific state "offtree" of the shared tree (to use CBT terminology). However, to deal with sparse traffic and with data packets sent before the S,G state is established, we support encapsulation of data packets in registers. As a result, the S,G state might never be set up if it is not deemed necessary (data packets are sporadic and few), and at least data packets do not need to be dropped (or buffered) until the S,G state is set up.

***This brings up an important point: we believe it is critical to do incoming interface checks on all multicast data packets because of the potentially severe consequences of looping multicast packets. Any multicast protocol that we design should have this capability.

***The biggest open issue with respect to this scheme is the need to introduce an aggregation mechanism for S,G state and messages. Van has proposed something called proxy. It looks doubtful that something will get written up by IETF, but it will be the primary agenda item immediately after.

***The major changes to this document since the last release are:

   1. Added back the S,G state upstream of the RP on the shared tree.

   2. Added a mechanism to disambiguate which RP is being used in the multiple RP case (see additional check added to ESL-Join processing and additional check added to RP-reachability message processing).

   3. Data packets travel encapsulated in register packets until the state is established for S,G (e.g. the RP's join propagates upstream to the first hop router).

   4. Register packets are sent unicast to the RP, which sets up S,G and sends joins upstream towards S to set up S,G state between the source and RP.

   5. All *,G entries have as their incoming interface the interface used to reach the RP.

   6. Added option for Register and RP-reachability messages to carry source, mask information downstream so that S,G entries can be set up with appropriate (subnet) mask information.

***Note: The name ESL is not perfect; at least I (DE) do not think so. This is particularly true in light of the dense mode scheme which will really be part of the same protocol. But to avoid changing the name repeatedly we are sticking with ESL, and down the road when the protocols themselves stabilize, we will reopen the question of names...

**************************************************************************

1. Introduction and Motivation

This document describes a mechanism for efficiently routing to sparse multicast groups that span wide-area (and inter-domain) internets. The implementation of our approach is based on extensions to IGMP[RFC1112] and makes use of explicit source lists. We refer to the scheme as ESL.

The mechanism proposed here complements existing multicast routing mechanisms such as those implemented in MOSPF and DVMRP. These traditional multicast schemes are well suited for use within local or wide area regions where a group is widely represented.
However, when group members, and senders to those group members, are distributed SPARSELY across a wide area, these schemes are not efficient; data packets (in the case of DVMRP) or membership report information (in the case of MOSPF) are sent over many links that do NOT lead to receivers or senders, respectively. The Explicit Source List (ESL) extensions to IGMP, proposed here, efficiently establish multicast distribution trees to reach members of a multicast group in regions where members are NOT densely represented.

Stated simply, when group members densely populate the internet, it is efficient to assume that most networks or subnetworks contain members, and to prune off the exception networks explicitly. In contrast, when group members are sparse, it is efficient to assume that most networks or subnetworks do NOT contain members, and to join on the exception networks explicitly. Thus the basis of the scheme described in this document is an explicit joining mechanism. In another document, in preparation, we will describe a companion multicast routing protocol for dense groups that is based on implicit joining and explicit pruning.

The Core Based Tree (CBT) protocol supports sparse multicast groups with a shared (center-based) tree [CBT]. In contrast, the ESL protocol proposed here supports both center-based and source-specific distribution trees in order to provide higher quality data distribution, when needed.

In the remainder of this section we enumerate our design goals and then present necessary background on existing Reverse Path Multicasting (RPM) mechanisms. Section 2 summarizes basic protocol operation and limitations. Section 3 describes the protocol in detail (including packet formats). Section 4 addresses robustness, and Sections 5 and 6 address interoperation with networks that do not implement ESL-multicast. Section 7 briefly compares our approach to CBT [cbt93] and addresses several open design issues.
1.1 Design Goals

We had several design objectives in mind when designing this protocol:

1.1.1 Efficient Sparse Group Support

Our primary design goal is efficient support for sparsely distributed wide-area multicast groups ("skinny trees"). We define a sparse group as one in which a) the number of networks/domains with members is significantly smaller than the number of networks/domains in the Internet, so that traditional RPM or Link-State style multicast (e.g., MOSPF) is inefficient; b) group members span an area that is too large/wide to rely on scope control; and c) the internetwork spanned by the group is not sufficiently resource rich to ignore the overhead of current schemes. Sparse groups are not necessarily "small"; therefore we must support dynamic groups with large numbers of receivers.

1.1.2 High Quality Data Distribution

We wish to support low-delay data distribution when needed by the application. In particular, we avoid IMPOSING a single shared tree in which data packets are forwarded to receivers along a common tree, independent of their source. Source-specific trees are superior when (a) multiple sources send data simultaneously and would experience poor service when the traffic is all concentrated on a single shared tree, or (b) the path lengths between sources and destinations in the shortest path trees (SPTs) are significantly shorter than in the shared tree.

In particular, to support low-delay distribution for "continuous" media such as voice and video, it is desirable (and perhaps essential) to avoid concentrating the traffic of all sources onto a common distribution tree; it is preferable to spread the traffic load and have the data travel over source-specific trees. Moreover, for all types of traffic, if one wants to minimize path length, data packets should travel along a shortest path distribution tree rooted in the source; i.e., data packets are forwarded based on the {Multicast-Address, Source} tuple (as in MOSPF).
For some applications or contexts group trees are appropriate; for example, resource-discovery applications where large numbers of sources transmit packets intermittently. However, shared trees should not be imposed as the only delivery option. In the scheme presented here a shared tree is maintained as a rendezvous mechanism for new receivers and sources; however, in steady state, data can be delivered over a shortest path from receiver to sender, or over the shared tree. We use the term rendezvous points (RPs) in this document because although similar, the role and function of an RP is different from a Core router in the CBT scheme. The protocol described here is based on Reverse Path Forwarding. All RPF schemes construct "reverse shortest path" trees. When unicast routes are symmetric (i.e., the shortest path from a source to receiver is the same as the shortest path from the receiver to the source), reverse shortest path trees will provide equivalent paths to forward shortest path trees. However, when unicast routes are not symmetric, the reverse shortest path may be longer than the forward shortest path. Nevertheless, the reverse shortest path tree delays will still be superior, in general, to the use of a shared tree. 1.1.3 Routing Protocol Independent The protocol should rely on existing unicast routing functionality to adapt to topology changes, but at the same time be independent of the particular protocol employed. This is achievable if the multicast protocol makes use of the unicast routing tables, independent of how those tables are computed. 1.1.4 Interoperability We need interoperability with traditional RPM and link-state multicast routing, both intra- and inter-domain. For example, a single conversation should allow intra-domain distribution to be established by IP-style multicast (e.g. 
MOSPF) and inter-domain distribution to be established by ESL; it should also allow some inter-domain distribution to be established by ESL, and some by another inter-domain multicast approach designed for more-densely distributed groups. (This sidesteps the question of when to use which type of multicast routing approach.) However, to interoperate with some existing IGPs it will be necessary to impose some additional protocol or configuration overhead.

In support of interoperation with IP multicast, AND in support of groups with very large numbers of receivers, we also wish to maintain the logical separation of roles between receivers and senders. We may introduce some optimization to support senders that are also receivers (this is often appropriate to the application), but we do not want to impose the dual role/symmetry.

1.2. Reverse Path Forwarding

Before proceeding with a description of ESL we briefly describe existing RPM mechanisms. This provides more detailed motivation for the extensions proposed here, and provides background as to the existing mechanisms with which ESL must interoperate.

1.2.1 RPM mechanisms

Reverse Path Forwarding (RPF) is a technique used to forward multicast datagrams. A router will forward a multicast datagram, from a given source, only if it was received on the same interface that the router uses to forward unicast datagrams to that source. Hence, the reverse path is interrogated before forwarding is performed. RPF forwards the data packet out all interfaces except the incoming one.

Reverse Path Broadcasting (RPB) eliminates some of the duplicate packets generated by RPF by only sending packets out on a subset of the outgoing interfaces; this subset is based on a local database of parent-child information obtained from the unicast routing protocol. Truncated RPF or RPB detects leaf networks without members and stops forwarding multicast packets for that group.
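The basic RPF check can be sketched in a few lines. The Python fragment below is a minimal illustration, not an implementation of any particular router; the function names, the {prefix: interface} form of the unicast table, and the interface names are assumptions made for the example. It covers plain RPF only, not RPB or truncation.

```python
import ipaddress
from typing import Dict, List, Optional


def rpf_interface(unicast_table: Dict[str, str], source: str) -> Optional[str]:
    """Longest-prefix match of `source` in a {prefix: interface} table."""
    addr = ipaddress.ip_address(source)
    best = None
    for prefix, ifname in unicast_table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, ifname)
    return best[1] if best else None


def rpf_forward(unicast_table: Dict[str, str], interfaces: List[str],
                source: str, arrival_if: str) -> List[str]:
    """Plain RPF: accept only on the interface used to reach the source,
    then forward out every other interface. An empty list means drop."""
    if rpf_interface(unicast_table, source) != arrival_if:
        return []
    return [i for i in interfaces if i != arrival_if]
```

For example, with routes {"10.0.0.0/8": "eth0", "10.1.0.0/16": "eth1"}, a datagram from 10.1.2.3 is accepted only when it arrives on eth1 and is then sent out all other interfaces.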
Reverse Path Multicasting (RPM) has mechanisms for detecting and distributing prune information upstream of the truncated leaf networks to stop distribution of multicast packets to parts of the network that do not have members. RPM schemes periodically flush this prune information and revert back to RPF or RPB behavior. In addition, explicit graft messages may be used to undo pruning and thereby join new members into the distribution tree more quickly.

1.2.2 Router to Host interaction mechanisms

Hosts inform routers which multicast groups they are members of. Routers use this information to tell other routers that group members are (MOSPF), or are not (DVMRP), present.

IGMP is the protocol mechanism for Router to Host interaction for IP multicasting. In particular, Routers send IGMP Host Query packets periodically to ask hosts for group membership information. Hosts reply with IGMP Host Report packets for each multicast group they are a member of; however, when a host hears a membership join report from another host on the same LAN, it will suppress its join to avoid duplicates [IGMP]. Hosts do not have to inform routers explicitly when they want to send a multicast datagram.

ESL uses this same mechanism to identify the location of group members. ESL will also use a router-to-router IGMP Query packet so adjacent multicast routers can be identified. See section 5 for details.

1.2.3 Limitations of Existing Multicast Routing Mechanisms

Existing multicast routing mechanisms work efficiently when most networks have local members and therefore the default RPM treatment of data packets (as in DVMRP), or flooding of link state advertisements with local membership information (as in MOSPF), is appropriate. In this context the lack of local members is best treated as an exception by issuing a "Prune" message. However, when most networks do NOT have local members there is significant overhead associated with these schemes.
One of the major incentives to use multicast is efficient bandwidth usage (otherwise multicast routing support would not be needed to begin with). This is most critical in wide area and multi-domain internetworks where resources are not as uniformly available as in the local and campus network context.

1.2.4 Scope Control

One way of limiting the overhead of multicast is to define a maximum number of hops that all messages will traverse. In this way the multicast group information will only be distributed within a limited region. This is a perfect mechanism for local groups. However, for groups that span wider areas, the scope would have to be set so high that the reverse path multicasting of data packets, or the flooding of membership information, would once again consume excessive resources.

1.3 Directly connected ESL routers

A directly connected router is one that either a) shares a physical network with a receiver host (i.e., both are physically connected to a common multi-access network), or b) is in the same domain as a receiver and some other multicast routing protocol within the domain is used to distribute multicast packets and to signal membership. A Directly connected ESL router is the last hop ESL router, from the perspective of a receiver; it is the first hop ESL router from the perspective of a source.

2. Overview of the Explicit Source List (ESL) protocol design

In the remainder of this document we describe a multicast routing mechanism for sparse groups which can be realized with relatively simple extensions to IGMP[RFC1112].

2.1 ESL

We introduce two new IGMP message types to establish distribution trees between sources and receivers (group members); routers send ESL-Join messages upstream towards sources and routers send ESL-Register messages downstream from sources to the RPs.
An ESL-Join message contains both a join and prune list; the former enumerates the sources from which the downstream receivers wish to receive packets (via this router, i.e., this upstream router) and the latter indicates the sources from which the downstream receivers do not expect to receive packets (via this route). An ESL-Register message identifies the group and is sent directly to each of the RPs associated with the group. The ESL messages are sent to unicast addresses using raw IP.

We can summarize the operation of the ESL scheme as follows. One or more Rendezvous Points (RPs) are used INITIALLY to propagate data packets from sources to receivers. An RP may be an ESL-speaking router that is close to one of the members of the group, or it may be some other host or router in the network. A sparse mode group, i.e., one that the receiver's directly connected ESL router will join using ESL, is identified by the presence of RP address(es) associated with the group in question. The mapping information may be configured or may be learned through another protocol mechanism.

When sources start sending to a multicast group, the first hop ESL-router sends an ESL-Register message to the RP(s) for that group. When a receiver joins an ESL multicast group, its first hop ESL router sends an ESL-Join message towards one of the RPs. If source-specific distribution trees are desired, the first hop ESL router for each member (receiver) eventually joins the source-rooted distribution tree for each source by sending an ESL-Join message towards the source. After data packets are received on the new path, it sends an IGMP prune message toward the RP (assuming these represent different uplinks/branches).

The state maintained in routers is the same as the forwarding information that is currently maintained by routers running existing IP multicast protocols such as MOSPF, i.e., source (S), multicast address (G), outgoing interface (oif), incoming interface (iif).
We refer to this forwarding information as the multicast forwarding entry for (S,G). The ESL messages sent upstream by receivers include an explicit list of the sources known to the downstream receivers (thus the name).

2.2 Design Tradeoffs

Referring back to our design objectives, we selected the ESL approach over an alternative, source-initiated approach, where each receiver contacts each source and each source sends out a message to the routers to install a distribution tree from that source to all the listed receivers[ST-II]. The source-initiated approach is less desirable because it a) requires each source to deal with join requests from each receiver (or each receiver aggregate such as a domain), and b) requires that member-domains be listed explicitly. The last concern is the most significant because it means that the source-initiated scheme's overhead increases with the number of receivers (receiver domains) in the group.

One limitation of the ESL approach, mentioned earlier, occurs when there are asymmetric paths. This occurs when the unicast path from a given receiver is different from the multicast path from the source to that receiver. Since RPM is used, the path chosen is the one from receiver to source. It is our opinion that this route asymmetry problem is NOT critical. In the future, if routing protocols become more load-sensitive, and as a result more routes are asymmetric due to asymmetric traffic loading, we may need to rely on other aspects of the adaptive routing service to address this problem. [Footnote: For example, if unicast routing could provide a special QoS route whose characteristic was that it represented the preferred path FROM the indicated destination, instead of TO the destination, then the ESL messages used in our protocol could be sent using that QoS, and the deficiency described here would be avoided.]

ESL avoids explicit enumeration of receivers, but does require enumeration of sources.
If there are very large numbers of sources sending to a group but the sources' average data rates are low, then the group can be supported with a shared tree instead, which has less per-source overhead. If shortest path trees are used, then when the number of sources grows very large, some form of aggregation or proxy mechanism will be needed; see section 6.

We selected this tradeoff because in many existing and anticipated applications, the number of receivers is much larger than the number of sources. And when the number of sources is very large, the average data rate tends to be very low (e.g. resource discovery).

3. Protocol Description

Below is a description of the protocol steps and messages.

3.1 Overview

ESL-Join messages traveling up from receivers to the RP create an RP-rooted distribution tree that is used to distribute data packets from new sources to all receivers and from all sources to new receivers. ESL-Register messages traveling from sources to the RP cause the RP to send join messages upstream to the sources and thereby create distribution paths from the sources to the RP-rooted distribution tree. In this way, a shared tree is formed between the sources and the RP and the RP and receivers. If shortest path, source-specific, trees are to be used, then data packets from new sources will trigger ESL-Join messages to travel up from receivers via their shortest paths to sources.

          |                                          |
          |** MR-1 ************ MR-2 ******** MR-3 --|
          |    .                   .           *@    |-- Receiver-1-Ga
Source-1 -|    .                   .           *@    |
          |    .                   .           *@    |
          |    .                   .           *@    |
              RP ................. MR-8
               .                   .           *@
               .                   .           *@
               .                   .           *@    |
               .                   .           *@    |
          |@@ MR-4 @@@@@@@@@@@@ MR-5 @@@@@@@@ MR-6 --|
          |      \                            /      |-- Receiver-2-Ga
Source-2 -|       \                          /       |
          |        \                        /        |
          |         --------- MR-7 --------

          ...  RP-rooted distribution tree
          ***  Source-1 based distribution tree
          @@@  Source-2 based distribution tree
          MR   Multicast Router

3.2 Receiver/Upstream messages

This section describes the sequence of messages sent as receivers join a group, as well as the actions taken to establish distribution paths to the receivers.

1) Host sends IGMP-Report message identifying a particular group, G, in response to a directly-connected Router's IGMP-Query message. From this point on we refer to such a host as a receiver, R, (or member) of the group G.

2) When a designated router (DR) receives a report for a new group G it checks to see if it has RP address(es) associated with G. The mechanism for learning this mapping of G to RP(s) is somewhat orthogonal to the specification of this protocol; however, we require some mechanism in order for the protocol to work. At the very least this information must be manually configurable. In addition, as discussed in Section 7, we propose the use of a new IGMP-RP-report message that would allow hosts to inform their directly-connected ESL routers of G,RP(s) mappings. This is important for dynamic groups where hosts participate in special applications to advertise and learn of multicast addresses and their associated RP(s).

A DR will identify a new group (i.e., one for which it has no existing multicast entries) as needing ESL support by checking if there exists an RP mapping. If there is no RP mapping provided in IGMP report messages, and there is no mapping provided in the appropriate configuration file, then the router will assume that the group is NOT to be supported with ESL.

Even when a group has an associated RP, it may be that some outgoing and incoming interfaces do not require ESL, but are handled using a dense mode scheme such as MOSPF, DVMRP, or Dense mode ESL. In this case the router will flag individual interfaces as dense or sparse mode, to allow differential treatment of different interfaces.
For the sake of clarity, we will ignore these added complexities throughout most of the protocol description. For the remainder of this description we will also assume a single RP just for the sake of clarity. We describe the direct extensibility to operation with multiple RPs later in the document. 3) The DR creates a multicast forwarding cache for (*,G) . The RP address is included in a special record in the forwarding entry, so that it will be included in upstream join messages. The outgoing interface is set to that over which the IGMP report was received from the new member. The incoming interface is set to the interface used to send unicast packets to the RP. A wildcard (WC) bit is associated with this entry. The DR sets an RP-timer for this entry. The timer is reset each time an RP-reachable message is received for *,G (see 3.3). 4) The router creates an ESL-Join message with the RP address in its join list with the WC bit set; nothing is listed in its prune list. The WC bit indicates that the receiver expects to receive packets from new sources via this path and therefore upstream routers should create or add to *,G forwarding entries. The WC bit also indicates that the particular IP address is being used as an RP and that the router with that address should send an RP-reachability message downstream; these messages are effectively sent periodically in response to the receipt of periodic join messages. The message is sent as an IP packet addressed to the next hop router upstream towards the RP; the payload contains the IGMP information Multicast-Address=G, ESL-join={WCbit}, ESL-prune=NULL. 5) Each upstream router creates or updates its multicast forwarding entry for (*,G) when it receives an ESL-Join with the WC bit set. The interface on which the ESL-Join message arrived is added to the list of outgoing interfaces for (*,G). 
As a result each upstream router between the receiver and the RP sends an ESL-Join message in which the join list includes the RP and the WC bit. The messages are sent using IP addressed to the next hop router used to reach the RP. The payload IGMP packet contains Multicast-Address=G, ESL-join={WCbit}, ESL-prune=NULL.

The RP recognizes its own address and does not attempt to send join messages for this entry upstream. Because the RP recognizes itself as the RP it knows to send RP-reachability messages in response to the periodic join messages received from downstream. In addition, the incoming interface in the RP's *,G entry is set to null.

6) When an ESL-router has directly-connected members that want to join the group with shortest paths, the router notices data packets for G that are NOT sourced by an address for which it has a multicast forwarding entry. The router initiates a new multicast forwarding entry for (Sn,G), clears the "SPT-bit" for that entry, and sets a timer for the S,G entry. The router also triggers the generation of IGMP messages upstream. For example, an ESL-Join message will be sent upstream to the best next hop towards the new source, Sn, with Sn in the join list: Multicast-Address=G, ESL-join={Sn}, ESL-prune=NULL.

The ESL-Join message that gets sent upstream toward the RP will have Sn in the prune list (at the point where the two upstream paths diverge) when the SPT bit on the DR's S,G entry is set: Multicast-Address=G, ESL-join={RP,*}, ESL-prune={Sn}.

In order to avoid missing data packets the DR should send the ESL message toward the new Sn before sending the prune message toward the RP. The DR knows it is time to send the prune when it starts receiving new packets from Sn on the interface used to reach Sn. Therefore Sn is not included in the prune list sent toward the RP until the SPT bit is set for the S,G entry.

When the Sn,G entry is created, the outgoing interface list is copied from *,G.
In this way when a data packet from Sn arrives and matches on this entry, all receivers will continue to receive the source's packets along this path unless and until the receivers choose to prune themselves.

Note that a DR may adopt a policy of not setting up an S,G entry (and therefore not sending an ESL-Join message toward the source) until it has received m data packets from the source within some interval of n seconds. This would eliminate the overhead of S,G state upstream when small numbers of packets are sent sporadically. However, data packets distributed in this manner may be delivered over the suboptimal paths of the shared RP tree. The DR may also choose to remain on the RP-distribution tree indefinitely instead of moving to the shortest path tree.

7) In the steady state each router sends periodic refreshes of ESL messages upstream to each of the next hop routers that is en route to each source, S, for which it has a multicast forwarding entry (S,G); as well as for the RP listed in the (*,G) entry. These messages are sent periodically to capture state, topology, and membership changes. An ESL message is also sent on an event-triggered basis each time a new forwarding entry is established for some new (Sn,G) (note that some damping function may be applied, e.g., a merge time). Optionally the ESL message could contain only the incremental information about the new source and only be sent to the next hop toward that source. ESL messages are not sent reliably; lost packets will be recovered from at the next periodic refresh time.

The join list in an ESL-Join message sent to a neighboring router, X, includes an address for each source, S, for which:

   1) there is a multicast forwarding entry (S,G), or S is listed as the RP-entry for (*,G); AND

   2) X is the next-hop router used to send unicast packets to S (or if S is a directly connected host, then include S if X is the DR for S's LAN); AND

   3) the outgoing interface list in the forwarding entry is NOT null.
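The three join-list conditions above can be expressed as a simple filter. The following Python sketch is illustrative only; the names Entry and build_join_list are assumptions made for this example, as is the convention of keying the (*,G) entry by its RP address and folding the "DR for S's LAN" case into the next_hop map.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Entry:
    """Hypothetical per-group forwarding entry, keyed by source or RP address."""
    oifs: Set[str] = field(default_factory=set)
    wildcard: bool = False   # True for the (*,G) entry, keyed by its RP address


def build_join_list(entries: Dict[str, Entry],
                    next_hop: Dict[str, str],
                    neighbor: str) -> List[Tuple[str, bool]]:
    """Return (address, WC bit) pairs to list in a join sent to `neighbor`."""
    joins = []
    for addr in sorted(entries):          # condition 1: an entry exists
        entry = entries[addr]
        if next_hop.get(addr) != neighbor:
            continue                      # condition 2: neighbor is next hop
        if not entry.oifs:
            continue                      # condition 3: oif list is not null
        joins.append((addr, entry.wildcard))
    return joins
```

An entry whose outgoing interface list is null is left out of the join list; whether it appears in the prune list instead is governed by the separate prune-list rules.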
The prune list in an ESL-Join message sent to a neighboring router, Y, includes an address for each source, S, for which:

   1) there is an S,G multicast forwarding entry with the SPT bit set and a null outgoing interface list, and Y is the next hop to reach S (or, if S is a directly connected host, then include S if Y is the DR for S's LAN).

In addition, if Y is the next hop used to reach the RP, the prune list also includes an address for each source S for which:

   1) there is an S,G multicast forwarding entry, the SPT bit is set, and Y is not the next hop used to reach S.

ESL-Join messages are sent periodically and the join and prune lists are populated as specified above. In addition, four events will trigger ESL-Join messages:

   1) receipt of an IGMP report message for a new group (i.e., one for which the receiving router does not have any S,G or *,G entries) will trigger an ESL-Join message toward the RP with the RP address and WC bit set in the join list;

   2) receipt of an ESL-Join message for an S,G pair (including *,G) for which there is no current forwarding entry will trigger an ESL-Join message toward S (or RP) with S (or RP with WC bit set) in the join list;

   3) receipt of a packet on the NEW S,G entry over the appropriate incoming interface triggers a) setting of the SPT bit, and b) sending a prune message up the RP tree;

   4) when the outgoing interface list becomes null, indicating no more downstream receivers, a prune is sent upstream.

We do not trigger prunes based on data packets. Data packets that arrive on the wrong incoming interface are silently dropped.

Note that each source address listed in an ESL message may be a specific IP address, or may indicate a subnet or a general aggregate. To support this generality in the future each ESL entry is represented by a {mask length, Address} pair. The distribution of mask information is described in Section 3.3 where reachability messages are described.
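One way to picture the {mask length, Address} representation is as a prefix that a source address either falls inside or does not. The Python sketch below is a hypothetical illustration and not a message format; the function names and the example addresses are assumptions, and choosing the entry with the longest mask is borrowed from the "longest match for source" wording used earlier in this document.

```python
import ipaddress
from typing import List, Optional, Tuple

MaskedEntry = Tuple[int, str]   # (mask length, address), as in the text


def entry_covers(mask_len: int, address: str, source: str) -> bool:
    """True if `source` falls inside the prefix address/mask_len."""
    net = ipaddress.ip_network(f"{address}/{mask_len}", strict=False)
    return ipaddress.ip_address(source) in net


def longest_match(entries: List[MaskedEntry], source: str) -> Optional[MaskedEntry]:
    """Return the covering entry with the longest mask, or None."""
    matches = [e for e in entries if entry_covers(e[0], e[1], source)]
    return max(matches, key=lambda e: e[0]) if matches else None
```

A mask length of 32 then denotes a specific host, while shorter masks denote a subnet or a more general aggregate.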
The potential for using proxy or aggregate information is described briefly in Section 7.

8) Each router that receives an ESL message processes it as follows:

   a) notes the interface on which the ESL-Join message arrived, call it I.

   b) if one of the Si has the WC bit set, and a *,G forwarding entry already exists, add I to the *,G forwarding entry and set the timer. If I is a new interface in the *,G forwarding entry, add I to all other existing Si,G forwarding entries also, with the exception of those Si listed in the prune list. If the value of Si with the WC bit set is different from the RP-entry listed in the existing *,G forwarding entry then:

      i. if Si is greater than the listed RP-entry value, set RP-entry to Si,

      ii. if Si is less than the listed RP-entry value, leave the RP-entry as is. Do not reset the RP-entry timer.

      (These steps are taken so that in the case of multiple RPs, loops can be avoided in the RP-based shared tree. This is achieved by making sure that within any branch of the shared tree, routers will converge on using a single RP until it fails.)

      The incoming interface is set to the RPF interface to the RP in the *,G forwarding entries.

   c) for any Si without the WC bit set that is included in the ESL-join list, for which there is NO existing (Si,G) forwarding entry, the router initiates one. The outgoing interface is set to I, and the incoming interface is set to the interface used to send unicast packets to Si. If the interface used to reach Si is the same as the outgoing interface being built (i.e., the interface on which the ESL-Join message arrived), this represents an error and the join should not be processed.
   d) for any Si included in the ESL-join list for which there IS an existing (Si,G) forwarding entry, the router adds I to the list of outgoing interfaces, IF I is not the same as the existing incoming interface. If I is the same as the existing incoming interface, the existing incoming interface takes precedence and the join is dropped.

   e) for each Si included in the ESL-prune list for which there is an existing (Si,G) forwarding entry, the router deletes I from the list of outgoing interfaces. If the router has a current *,G forwarding entry, and if an Si,G entry also exists, then the forwarding entry is maintained for (Si,G) even if its outgoing interface list is NULL. If there is no (Si,G) entry, then one is created with the outgoing interface list copied from *,G, and the interface on which the prune was received is deleted. This acts as a negative cache so that packets from Si are not forwarded to the pruning receiver.

9) A timer is maintained for each outgoing interface listed in each S,G or *,G entry. The timer is set when the interface is added. The timer is reset each time an ESL-join message is received on that interface for that forwarding entry (i.e., S,G or *,G). When a timer expires, the corresponding outgoing interface is deleted from the outgoing interface list. When the outgoing interface list is null, a prune message is sent upstream and the entry is deleted after 3 times the refresh period (i.e., 180 seconds).

3.3 Source/Downstream messages

Two types of messages are sent downstream: Registers and RP-reachability messages.

3.3.1 Register messages

1) When a source, S, wishes to send to a multicast group, G, for the first time, S simply sends a data packet addressed to the group.

2) When a data packet from S addressed to G arrives at the first hop ESL designated router (DR), and the DR has no current forwarding entry for (S,G), the router looks up the RP(s) address(es) associated with G.
The RP information may be configured or may be provided by a new IGMP-RP-report message. If no RP information exists, then the router assumes the group is handled as a dense group and simply sends the data packets out all non-incoming interfaces. The RP mapping function is only performed by the first ESL router to see the source's packets before the *,G entry is established; i.e., the mapping is not performed by each ESL router on a distribution tree. The RP information should be cached for future use.

3) The router sends an ESL-Register message to the RP. The message indicates the group for which the source is registering, and has the WC bit set. Mask information for the source may be included. The original data packet is encapsulated inside the Register packet. The message is sent as a unicast packet to the RP; it is not processed by the intermediate routers. If there are multiple RPs associated with the multicast group, then the source sends a Register message to each of them. Subsequent data packets sent to the same group will trigger the same action until an S,G entry is set up in the first hop router in response to a join message received from downstream. The RP information should be cached so that multiple lookups can be avoided for subsequent data packets sent to the same group.

4) When a router (i.e., the RP) receives a Register message, the router

   a) decapsulates the data packet, and forwards it according to its local *,G forwarding entry, and

   b) sets up an S,G forwarding entry with the outgoing interface list copied from the *,G outgoing interface list. The S,G entry is set up using the mask information, if provided, in the Register message. A timer is set for the S,G entry.

The S,G entry causes the RP to send an ESL-Join message for the indicated group toward the source of the Register message.
The ESL-Join message includes the source's address and mask information; note the source here is the source of the Register message, i.e., the source-host's directly-connected ESL router, NOT the source host itself. This message is triggered and processed like any other ESL-Join message by the intermediate routers, which either create or augment the S,G forwarding state in exactly the same way as was described in 3.2: the ESL-Join message's incoming interface is added to the outgoing interface list, and the incoming interface for the entry is set to the interface used to reach the source.

Note that an RP may adopt a policy of not setting up an S,G entry (and therefore not sending an ESL-Join message toward the source) until it has received m Register messages (with encapsulated data packets) from the source within some interval of n seconds. This would eliminate the overhead of S,G state upstream of the RP when small numbers of packets are sent sporadically. However, data packets distributed in this manner may be delivered on very suboptimal paths because they travel all the way to the RP before being multicast.

5) Once the ESL-Join messages have propagated upstream from the RP, data packets from the source will follow the S,G distribution path state established. The packets will travel to the receivers via the distribution paths established by the ESL-Join messages sent upstream from receivers toward the RP. Multicast packets will arrive at some receivers before reaching the RP if the receivers and the source are both "upstream" of the RP. When the receivers initiate shortest-path distribution, additional outgoing interfaces will be added to the S,G entry and the data packets will be delivered via the shortest paths to receivers.

6) Data packets will continue to travel from the source to the RP(s) in order to reach new receivers. Similarly, receivers continue to receive some data packets via the RP tree in order to pick up new senders.
However, when source-specific tree distribution is used, most data packets will arrive at receivers over a shortest path distribution tree.

7) Data packets travel from the source via the reverse shortest path tree rooted in the source because routers between the source and the receiver have a multicast forwarding entry for (Sn,G) whose outgoing interface list includes all the interfaces on which the routers received ESL-Join messages from downstream receivers.

3.3.2 RP-Reachability Messages

1) A router starts sending periodic "RP-reachability" messages downstream when:

   (a) it receives an ESL-Join message with its own address AND WC-bit set in the join list, and

   (b) the incoming interface on its (*,G) entry is Null.

The first condition is to make sure that it is an RP. The second condition is to make sure that only the "dominant" RP will send RP-reachability messages, so the traffic can be minimized. This obviates the need to do any kind of special configuration of RPs; any router can be an RP since RP behavior is triggered by the protocol itself. A router is responsible for initiating RP-reachability messages to downstream nodes if it has a *,G entry with a NULL incoming interface.

2) The router sends the periodic RP-reachability messages out all the outgoing interfaces in the *,G entry. The period for this message is 90 seconds. The messages are addressed to the 224.0.0.1 class D address, and the message contents include the RP and G and an optional list of source, mask information.

3) When a router receives an RP-reachability message for a group G it must compare the RP address listed in the message to the RP address listed in the current *,G RP-entry.
If the RP listed in the message is greater than the RP listed in the *,G RP-entry, and if the next hop used to reach the listed RP is the same as the next hop used to reach the RP-entry, then the router replaces its current RP-entry with the RP address from the RP-reachability message. This is necessary to eliminate routing loops that can occur in some instances when downstream receivers select different upstream RPs and the RP-centered distribution trees overlap. If the RP listed in the message is less than the RP listed in the *,G RP-entry, OR if the RP-reachability message did not come in on the RPF interface to the RP listed in the message, then the message is not forwarded.

In more detail, when a router receives an RP-reachability message it does the following; assume that router X receives an RP-reachability message of RP1 from incoming interface I.

   1. Perform RPF check. If I is not the best next hop to RP1, drop this RP-reachability message.

   2. Else, if the incoming interface of (*,G) is not NULL and not I, drop the RP-reachability message.

   3. If the incoming interface of (*,G) is I, compare RP1 with the address in RP-entry, say RP2. If RP1 is larger than RP2, set RP-entry to RP1 and propagate the RP-reachability message downstream. Otherwise, drop the RP-reachability message.

   4. If the incoming interface of (*,G) is NULL and WC-bit is set then this router is currently acting as an RP for G. In this case, compare RP1 with X. If RP1 is larger than X, set RP-entry to RP1, set the incoming interface to the RPF interface used to reach RP1, and clear the WC-bit for that router. Also, propagate the RP-reachability message downstream. Otherwise, if RP1 is less than X, drop the RP-reachability message.

4) If there are any *,G entries the message is forwarded with the same class D address out the outgoing interfaces from the G entries. If a downstream router does not have any *,G entries then the packet is dropped.
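The four numbered checks above can be sketched as a single decision function. This is a minimal illustration, not a definitive implementation: the StarGEntry record, the rpf_interface callback, and the function name are hypothetical. Two points are assumptions of the sketch rather than statements of the text: a message naming the RP already recorded in the RP-entry is propagated (check 3 literally covers only "larger"; the preceding prose suppresses forwarding only for a smaller RP), and a message reaching a router whose incoming interface is NULL with the WC-bit clear is dropped.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Callable, Optional


@dataclass
class StarGEntry:
    """Hypothetical (*,G) entry; iif is None on a router acting as RP."""
    rp: IPv4Address
    iif: Optional[str]
    wc_bit: bool = False


def process_rp_reachability(entry: StarGEntry, my_addr: IPv4Address,
                            rp1: IPv4Address, arrived_on: str,
                            rpf_interface: Callable[[IPv4Address], str]) -> bool:
    """Apply checks 1-4; return True if the message is propagated downstream."""
    # 1. RPF check: the message must arrive on the interface toward RP1.
    if rpf_interface(rp1) != arrived_on:
        return False
    # 2. The entry points at some other, non-null incoming interface.
    if entry.iif is not None and entry.iif != arrived_on:
        return False
    # 3. Same incoming interface: a larger RP address replaces the RP-entry.
    if entry.iif == arrived_on:
        if rp1 < entry.rp:
            return False
        entry.rp = rp1  # no change when rp1 equals the current RP-entry
        return True
    # 4. Incoming interface is NULL: this router is acting as an RP.
    if entry.wc_bit and rp1 > my_addr:
        entry.rp = rp1
        entry.iif = rpf_interface(rp1)
        entry.wc_bit = False
        return True
    return False
```

IPv4Address values compare numerically, which matches the "larger address dominates" rule used throughout this section.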
When DRs with directly connected group members receive this message they reset their RP-timers on the RP-entry in *,G. This allows group-members' directly-connected ESL routers to detect when an RP becomes unreachable and trigger a join toward an alternate RP, if one exists.

5) The RP-reachability message may optionally contain Source/Mask information for (S,G) entries maintained by the RP. This mask information is optionally obtained via Register messages sent to the RP by sources' first hop routers. The masking information can be used by last-hop ESL routers to consolidate S,G entries, and consequently ESL-Join lists sent upstream.

3.4 Multicast Data Packet Processing

Data packets are processed in a similar manner to existing multicast schemes. An incoming interface check is performed and if it fails the packet is dropped; otherwise the packet is forwarded to all the interfaces listed in the outgoing interface list (whose timers have not expired).

Two exception actions are introduced so that packets are delivered continuously, even during the transition from a shared tree to a shortest path tree. First, when a data packet matches on an S,G entry with a cleared SPT bit, if the packet does not match the incoming interface for that entry, then the packet is forwarded according to the *,G entry; i.e., it is sent to the outgoing interfaces listed in *,G IF the incoming interface matches that of the *,G. Second, when a data packet matches on an S,G entry with a cleared SPT bit, AND the incoming interface of the packet matches that of the S,G entry, then the packet is forwarded and the SPT bit is set for that entry.

Data packets never trigger prunes directly. Data packets may trigger actions which in turn trigger prunes. In particular, data packets from a new source can trigger creation of a new S,G forwarding entry.
This causes S to be included in the prune list in a triggered ESL message toward the RP, just as it causes S to be included in the join list in a triggered ESL message toward the source.

3.5 Packet Types

RFC 1112 specifies two types of IGMP packets for hosts and routers to convey multicast group membership and reachability information. An IGMP Query packet is transmitted periodically by routers to ask hosts to report which multicast groups they are members of. An IGMP Report packet is transmitted by hosts in response to received Queries advertising group membership.

This document introduces new types of IGMP packets that are used by ESL routers. The following packet format is used:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version| Type  |     Code      |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Group Address                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

%%DE Dino, please add Version/Type/TOS/Code that you proposed in comments

   Version
      This memo specifies version 1 of IGMP. Version 0 is specified in RFC-988 and is now obsolete.

   Type
      The following types of IGMP messages are defined:

         1 = Host Membership Query
         2 = Host Membership Report
         3 = Router DVMRP Messages
         4 = Router ESL Messages

   Code
      Codes for specific message types. Used only by DVMRP and ESL. ESL codes are:

         0 = Query
         1 = Register
         2 = Join/Prune
         3 = RP-Reachable
         4 = Assert      (dense-mode ESL only)
         5 = Mode        (dual-mode ESL only)
         6 = Mode-Ack    (dual-mode ESL only)

   Checksum
      The checksum is the 16-bit one's complement of the one's complement sum of the entire IGMP message. For computing the checksum, the checksum field is zeroed.

   Group Address
      In a Host Membership Query message, the group address field is zeroed when sent, ignored when received. In a Host Membership Report message, the group address field holds the IP host group address of the group being reported.
      In a Register, Join/Prune, Query, and RP-Reachable message, the group address field is zeroed when sent, ignored when received.

3.5.1 ESL-Register, ESL-Join, and Assert messages.

The Register, Join/Prune and Assert messages have additional information appended to the fixed header:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Reserved    | Maddr Length  |  Addr Length  |  Num groups   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Multicast Group Address-1                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Number of Join Sources     |    Number of Prune Sources    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Join Source Address-1                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               .                               |
   |                               .                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Join Source Address-n                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Prune Source Address-1                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               .                               |
   |                               .                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Prune Source Address-n                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               .                               |
   |                               .                               |
   |                               .                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Multicast Group Address-n                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Number of Join Sources     |    Number of Prune Sources    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Join Source Address-1                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               .                               |
   |                               .                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Join Source Address-n                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Prune Source Address-1                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               .                               |
   |                               .                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Prune Source Address-n                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Reserved
      Unused field, zeroed when sent, ignored when received.

   Addr Length
      The length in bytes of the encoded source addresses in the Join and Prune lists.

   Maddr Length
      The length in bytes of the encoded multicast addresses.

   Num Groups
      The number of multicast group sets contained in the message.

   Multicast group address
      For IP, it is a 4-byte Class D address.

   Number of Join Sources
      Number of join source addresses listed for a given group.

   Join Source Address-1 - n
      This list contains the sources that the sending router will forward multicast datagrams for if received on the interface this message is sent on. The address 0.0.0.0 indicates a join for all sources. For a Register message, the source address specifies the address(es) of the Rendezvous Point(s).

   Number of Prune Sources
      Number of prune source addresses listed for a given group.

   Prune Source Address-1 - n
      This list contains the sources that the sending router does not want to forward multicast datagrams for when received on the interface this message is sent on. The address 0.0.0.0 indicates a prune for all sources.

In Router messages, all source addresses will have the following format:
      <WC-bit><Mask length><Address>

   <WC-bit> is a 1 bit value. If 1, packets should propagate to this address and a wildcard multicast entry should be built with interface information based on the receiving and sending interface for Router messages. If 0, the <Address> is a source address. The Address should be added to the address list associated with the wildcard multicast entry for the group.

   <Mask length> is 7 bits. The value is the number of contiguous bits left justified used as a mask which describes the <Address>.

   <Address> is the length indicated from the "Addr Length" field at the beginning of the header. The <Mask length> must be less than or equal to "Addr Length" * 8.

   A source address could be a host IP address:

      <0><32><192.1.1.17>

   A source address could be the RP's IP address:

      <1><32><131.108.13.111>

   A source address could be a subnet address:

      <0><28><192.1.1.16>

   A source address could be a general aggregate:

      <0><16><192.1.0.0>

ESL messages are always sent as unicast IP addressed packets. These messages are sent towards the direction of the Join and Prune source addresses. This is achieved by doing a route lookup for each source address and IP addressing it to the next-hop router along the path to the source. Each router along the way does this until the destination is reached.

%df - is this still true? Router messages may be data linked multicast when transmitted on subnetworks that support multicast.

3.5.2 RP-reachability message

The RP-reachable packet format is as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version| Type  |     Code      |           Checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Group Address                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RP Address                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Number of Entries                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Mask Length  |                  Address ...                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   .
                                   .
                                   .

   Group Address
      Group address associated with RP.

   RP Address
      The Rendezvous Point IP address of the sender.

   Number of Entries
      The number of {Mask Length, Address} entries in the message. This value is 0 if no addresses are included.

   Mask Length
      Number of bits in the network mask for the corresponding address.

   Address
      4-byte IP address. The corresponding zero bits in the mask should be set to zero in the address.
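As an illustration of the {mask length, address} encoding, the following sketch packs and unpacks one source-address entry as used in the Join and Prune lists, with the host bits zeroed as required above. It is a sketch under stated assumptions: this memo gives the field widths (1-bit WC, 7-bit mask length, "Addr Length" bytes of address) but not the bit order, so placing the WC bit in the high-order bit of the first octet is an assumption, as are the function names. Only 4-byte IP addresses are handled.

```python
import ipaddress
import struct
from typing import Tuple


def encode_source(wc_bit: int, mask_len: int, address: str) -> bytes:
    """Pack one source entry as <WC-bit><Mask length><Address> (5 bytes)."""
    if wc_bit not in (0, 1):
        raise ValueError("WC bit must be 0 or 1")
    if not 0 <= mask_len <= 32:
        raise ValueError("mask length must not exceed Addr Length * 8")
    # strict=False clears the address bits that fall outside the mask.
    net = ipaddress.IPv4Network(f"{address}/{mask_len}", strict=False)
    # Assumed layout: WC bit in the top bit, mask length in the low 7 bits.
    first_octet = (wc_bit << 7) | mask_len
    return struct.pack("!B4s", first_octet, net.network_address.packed)


def decode_source(data: bytes) -> Tuple[int, int, str]:
    """Unpack the 5-byte entry produced by encode_source."""
    first_octet, packed = struct.unpack("!B4s", data[:5])
    return first_octet >> 7, first_octet & 0x7F, str(ipaddress.IPv4Address(packed))
```

For example, encoding host 192.1.1.17 with a 28-bit mask yields the subnet entry <0><28><192.1.1.16> from the examples in section 3.5.1.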
Each RP will send RP-Reachable messages to all routers on its distribution tree for a particular group. These messages are sent so routers can detect that an RP is unreachable. Routers that have attached host members for a group will process the message. On a multi-homed stub subnetwork, the DR is responsible for processing the message. A router that processes the message is one that updates the RP address timer to indicate that the RP is still alive. Other routers that are on the RP-distribution tree propagate the message.

The RPs will address the RP-Reachable messages to 224.0.0.1. Routers that have state for the group with respect to the RP distribution tree will propagate the message. Otherwise, the message is discarded. If an RP address timer expires, the DR should attempt to send an ESL join message toward an alternate RP provided for that group if one is available.

3.5.3 Other message types

In a future version of this document we will specify a new IGMP message type that will allow hosts to advertise a list of 1 to n RP addresses associated with a particular group address.

3.5.4 Examples

Following are examples of source address encodings in Register, Join/Prune, and RP-reachability messages:

Register scenario:

A host (131.108.1.1) sends a multicast datagram to 224.1.1.1. The first hop designated router (131.108.1.2) for the attached LAN sends an ESL-Register message to the RP (131.108.10.2). The multicast datagram is encapsulated in the Register message.
   IP Header:
      Source IP address:       131.108.1.2
      Destination IP address:  131.108.10.2

   IGMP Header:
      IGMP Type:      ESL
      IGMP Code:      Register
      Group Address:  0.0.0.0

   ESL Header:
      Maddr Length:          4
      Addr Length:           4
      Num Groups:            1
      Multicast Group:       224.1.1.1
      # Join/Prune Sources:  1/0
      WC-bit:                0
      Mask Length:           24
      Address:               131.108.1.0

   Encapsulated datagram:
      Source IP address:       131.108.1.1
      Destination IP address:  224.1.1.1

Join scenario:

The RP (131.108.10.2) sends an ESL-Join upstream towards the source (131.108.1.0) in response to the above Register message received. The RP's next-hop to reach 131.108.1.0 is 131.108.20.2.

   IP Header:
      Source IP address:       131.108.10.2
      Destination IP address:  131.108.20.2

   IGMP Header:
      IGMP Type:      ESL
      IGMP Code:      Join/Prune
      Group Address:  0.0.0.0

   ESL Header:
      Maddr Length:          4
      Addr Length:           4
      Num Groups:            1
      Multicast Group:       224.1.1.1
      # Join/Prune Sources:  1/0
      WC-bit:                0
      Mask Length:           24
      Address:               131.108.1.0

RP-reachability scenario:

Now that 131.108.10.2 knows it is an RP (because it received a Register message), it must send ESL RP-Reachability messages downstream on the RP-distribution tree for 224.1.1.1. It will send the following packet on all outgoing interfaces of the (*, 224.1.1.1) entry. Each router on the path will build a new IP header with its own IP address as the source IP address.

   IP Header:
      Source IP address:       131.108.10.2
      Destination IP address:  224.0.0.1

   IGMP Header:
      IGMP Type:      ESL
      IGMP Code:      RP-Reachability
      Group Address:  224.1.1.1

   ESL Header:
      RP Address:         131.108.10.2
      Number of Entries:  1
      Mask Length:        24
      Address:            131.108.1.0

4. Robustness Features

4.1 Lost ESL messages

The protocol is fairly robust to lost control messages. If an ESL-Register message gets lost then data packets will continue to be encapsulated in subsequent ESL-Register messages until the RP initializes an S,G entry and the associated ESL-Join messages propagate up to the Source.
If an ESL-Join message is lost then for the remainder of the refresh period, packets will not be forwarded on the new path, or will continue to be forwarded until the refresh is sent.

%%Editorial note: we are not at all fixed on these timer values...

It is recommended that ESL messages be transmitted every 60 seconds. Information that is cached should be timed out after 3 times the transmission period if no ESL message for the entries has been received. When a forwarding entry has no more outgoing interfaces it is deleted and a prune can be sent upstream (or the router can wait until the next period when the ESL list will no longer include the Source for the deleted entry and the state will eventually be timed out upstream).

4.1 Multiple Rendezvous Points and RP failure scenarios

If there is one RP then there is no concern about sources and receivers actually being able to rendezvous, but there is a reliability issue. If there is more than one RP then each receiver still joins to a single RP, but each source must register to EACH and EVERY RP. In other words there are multiple RP distribution trees, and so long as each source sends its packets to all of them, receivers need only join to one.

When the RP fails or becomes unreachable by receivers, members who have already joined will continue to receive packets from sources that had previously sent to the group and for which the receivers had already switched to the SPT (assuming the SPT is not affected by the same failure that makes the RP unreachable). However, new members will send toward the unreachable RP and will NOT be successfully joined to the group unless their join packets reach existing SPTs of the sources before they reach the RP. New sources will attempt to register and send to the RP.
Their packets will either not arrive at the RP, in which case they will only be forwarded to receivers who are upstream of the RP with respect to the source, or their packets will get to the RP but will not reach downstream receivers. In the latter case, the SPT from the source to receivers will never be set up even if the paths that make up the SPT are available. This leads to the motivation for employing multiple RPs.

Unreachable RPs are detected using the RP reachability message. When a *,G entry is established by a router with local members, a timer is set. The timer is reset each time an RP reachability message is received. If this timer expires, the router looks up an alternate RP for the group and sends a join toward the new RP. A new *,G entry is established with the incoming interface set to the interface used to reach the new RP. The outgoing interface list includes only those interfaces on which IGMP Reports for the group were received. (Other outgoing interfaces may no longer be valid since the router in question may not be on the shortest path between the downstream branch and the new RP. If the router is on this shortest path as well, it will eventually receive an explicit join from that downstream branch as the last hop routers take the same action.)

When multiple RPs are used, each source registers and sends data packets towards each of the RPs, but receivers only join toward a single RP. If one of the RPs fails, receivers that joined to that RP will stop receiving RP-reachability messages and will start sending joins to one of the alternative RPs. Sources do not need to take special action. When an RP is unreachable it will not receive the source's Register messages and therefore will not respond with joins, and so the outgoing interfaces in *,G pointing toward the unreachable RP will time out without any explicit action on the part of the source.
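The RP timer behavior described above can be sketched as a small per-group tracker kept by a router with local members. This is a minimal sketch, not a definitive implementation: the RPMonitor class, its method names, the policy of taking the first other RP in the configured list as the alternate, and the timeout value passed by the caller are all assumptions, since this memo does not fix the RP timer value or the selection rule. Time is passed in explicitly so the logic can be exercised without a clock.

```python
from typing import List, Optional


class RPMonitor:
    """Hypothetical RP liveness tracker for one group on a last-hop router."""

    def __init__(self, rps: List[str], timeout: float, now: float) -> None:
        self.rps = list(rps)        # RP addresses known for the group
        self.current = self.rps[0]  # RP named in the *,G RP-entry
        self.timeout = timeout
        self.deadline = now + timeout  # timer set when *,G is established

    def on_reachability(self, rp: str, now: float) -> None:
        """Reset the timer when the current RP is heard from."""
        if rp == self.current:
            self.deadline = now + self.timeout

    def poll(self, now: float) -> Optional[str]:
        """Return the alternate RP to join if the timer expired, else None."""
        if now < self.deadline:
            return None
        alternates = [rp for rp in self.rps if rp != self.current]
        if not alternates:
            return None  # no alternate RP is known for this group
        self.current = alternates[0]
        self.deadline = now + self.timeout
        return self.current
```

When poll returns an address, the caller would send a join toward it and rebuild the *,G entry with only the interfaces on which IGMP Reports were received, as the text above describes.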
Because each receiver's directly connected router selects an RP independently, it is possible for routers on the same part of the distribution tree to specify different RPs while both are still available. This can lead to looping in some topologies. To avoid looping, RP address information carried in ESL-Join and RP-reachability messages is examined to converge to a common RP (the larger numbered RP dominates). 4.2 Unicast routing changes When unicast routing changes an RPF check is done and all affected expected incoming interfaces are updated. If the new incoming interface appears in the outgoing interface list, it is deleted from the outgoing list. The previous incoming interface may be added to the outgoing interface list by a subsequent join from downstream. Joins received on the current incoming interface are ignored. Joins received on new interfaces or existing outgoing interfaces are not ignored. Other outgoing interfaces are left as is until they are explicitly pruned by downstream routers or are timed out due to lack of appropriate join messages. The ESL-router must send an ESL-Join message out its new interface to inform upstream routers that it expects multicast datagrams over the interface. It must send an ESL-Prune message out the old interface, if the link is operational, to inform upstream routers that this part of the distribution tree is going away. If the unicast route goes unreachable, all multicast entries for (S,Gi) should be modified to have null outgoing interface lists, however the entries should not be deleted immediately; this causes periodic prunes to be sent and multicast packets to be discarded. The entry should be kept alive for the remainder of the timeout lifetime. This helps to eliminate transient multicast routing forwarding loops. If the unicast route has a new next-hop interface, the (S,Gi) entries must be updated. The following diagram shows how a multicast forwarding loop can be avoided. 
Assume all LANs have members for a given group. The arrows represent the expected interface each router will receive multicast datagrams on, as well as the outgoing interfaces with respect to Source. ---------- ---------- | | ^ ^ +----+ +----+ | R1 | >-----> | R2 | +----+ +----+ ^ v | | | | | | | | | | +----+ | | +--------> | R3 | >-| | +----+ | | | | | | Source --------+ If the path to Source changes, R1 and R3 may converge on the new path before R2, e.g., R1 uses its link to R3 as its expected incoming interface and R3 uses its new shortest path link to Source. In this state, ---------- ---------- | | ^ ^ +----+ +----+ | R1 | >-----> | R2 | +----+ +----+ | ^ | | | | | | | | | | +----+ | | +--------- | R3 | >-| | +----+ | | ^ | | | Source --------+ R1 and R3 were informed about a topology change for Source and changed their incoming interfaces. Both R1 and R3 send joins up their new incoming interfaces. R1 also deleted its outgoing interface to R3 because this interface is used as an incoming interface. R2, however has not been informed about the topology change. If R1 received a multicast datagram on its old expected interface, it would silently drop it. This would happen if upstream routers from R1 to the Source had old routing information. If upstream routers have converged on a new path all datagrams will enter this part of the network through R3, and it would forward appropriately to its LAN and to R1 which expects it. R1 would forward to its LAN as well as R2. R2, using out of date expected incoming interface, would also forward the packet. Once R2 is informed of the topology change, it will change its expected incoming interface to R3 and will send a prune to R1 and a join to R3. 
The final state would look like: ---------- ---------- | | ^ ^ +----+ +----+ | R1 | ------- | R2 | +----+ +----+ | ^ ^ | | | | | ^ | | | +----+ | | +--------< | R3 | >-| | +----+ | | ^ | | | Source --------+ More generally, if unicast routing changes and the router in question has not converged then one of two situations exists. In the first, one or more of the existing outgoing interfaces may no longer reach any receivers. In this case data packets are forwarded until they reach a router that has converged and finds that the incoming interface for the packet is not right; in which case the packet will be dropped. The cost of this transient condition is the continued sending of data packets down links that do not lead to receivers; this can occur for the duration of a refresh period. The second situation occurs when data packets begin to arrive over an incoming interface other than the one listed in the corresponding S,G entry. This occurs when upstream has converged and the router in question has not. In this case, data packets will be dropped instead of delivered for a time less than or equal to the convergence time + refresh period. 5. ESL Routers on multi-access subnetworks There are several multiaccess subnetwork configurations that require special consideration. 5.1 Designated Routers +----+ +----+ | R1 | | R2 | +----+ +----+ | | ------------------------- | | | H1a H2a H3b (Hxy - Host x is a member to group y) When there are multiple ESL routers on a multi-access network, only a single router must be responsible for the following actions: o Soliciting group membership from hosts and sending ESL-Join messages. o Sending Register messages on behalf of a connected source host when it sends a multicast packet. o Forwarding multicast packets onto the multi-access network. This is done with a simple Designated Router (DR) election. Neighboring routers send ESL Query packets to each other. The packets are sent to 224.0.0.2. 
The system with the largest IP address assumes the role of DR. The default transmission interval is 30 seconds. A router should consider the DR unreachable when it has not received a Query from it within 3 times the transmission interval. There will be one DR that supports all groups per multi-access network.

The DR sends periodic IGMP Host Query packets to 224.0.0.1, soliciting hosts to respond. The DR sends multicast packets using the data link address that is mapped from the IP Class D multicast address.

5.2 Multiaccess subnetwork as a transit network

The following diagram shows the case where a multi-access network is used as a transit network.

        |               |
        |               |
      +----+          +----+          |
      | R1 |          | R2 |          |
      +----+          +----+          |
        |               |             |  Downstream to group members
        |               v             |
    -------------------------         |
        |               |             |
        v               v             |
      +----+          +----+          V
      | R3 |          | R4 |
      +----+          +----+
        |               |
        v               v
        |               |

When a LAN is used as a transit network among routers, a single router must forward multicast packets to the downstream routers. This router is known as the Router-DR. A single Router-DR is chosen by the receipt of ESL messages unicast by each of the downstream routers. All routers that use the LAN as their incoming interface for multicast packets from a particular source will expect those packets from a single Router-DR. This router is the one they use for sending unicast packets to the source. All routers will select the same router. In the case of equal-cost unicast paths, the next hop with the largest IP address is used.

The Router-DR forwards multicast packets using the data link address that is mapped from the IP Class D multicast address. The multicast data packets will be seen by all other routers connected to the LAN. If a router has an entry for S,G or *,G and the LAN is the indicated incoming interface, then the router will forward the packet. If there is no such entry, or if the incoming interface is not the LAN, then the packet will be silently dropped.

In the above diagram, both R3 and R4 have downstream group members.
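The forwarding rule described above for routers attached to a transit LAN can be sketched as follows. This is a minimal illustration, not a definitive implementation: the table layout, the Entry class, and the function name forward are all hypothetical, and the sketch leaves out the SPT-bit refinement described in the notes at the top of this document.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    iif: str                                  # expected incoming interface
    oifs: set = field(default_factory=set)    # outgoing interface list


def forward(table, source, group, arrival_if):
    """Return the set of interfaces a data packet is forwarded on.

    table maps (source, group) or ("*", group) to an Entry.
    An empty set means the packet is silently dropped.
    """
    # Longest match: a source-specific entry wins over the wildcard entry.
    entry = table.get((source, group))
    if entry is None:
        entry = table.get(("*", group))
    # No entry, or packet arrived on the wrong interface: drop.
    if entry is None or entry.iif != arrival_if:
        return set()
    # Never send a packet back out of the interface it arrived on.
    return entry.oifs - {arrival_if}


# Hypothetical state for a downstream router whose incoming interface is the LAN.
table = {("*", "G"): Entry(iif="lan", oifs={"down0"})}
accepted = forward(table, "S", "G", "lan")      # forwarded on down0
dropped = forward(table, "S", "G", "down0")     # wrong interface, dropped
```

The incoming interface comparison is the step that lets every router on the LAN see the multicast frame while only those that expect it from the LAN pass it on.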
R3 and R4 will send their ESL-Join messages towards the RP, and therefore select either R1 or R2 to send the message to. Assume R2 is on the shorter path to the RP. Later, when multicast datagrams travel from the RP, they will come through R2 only, avoiding duplicates on the LAN. The same procedure takes place when the source-based distribution tree is built. Whichever router on the LAN is chosen is also responsible for delivering multicast packets to host members on the LAN, if any are present.

5.3 Parallel routers

The following diagram illustrates the behavior of routers that are in parallel and the interaction of DR and RP routers.

                        S
                        |
    ------------------------------------- LAN1
        |               |               |
      +----+          +----+          +----+ DR
      | R1 | RP       | R2 |          | R3 |
      +----+          +----+ DR       +----+
        |               |               |
    ------------------------------------- LAN2
                |               |
               H1a             H2a

Assume R1 is the RP for Ga, R3 is the DR for LAN1, R2 is the DR for LAN2, and LAN1 is the preferred path among routers to reach the RP.

If the receivers of Ga join first, R2 will send an ESL-Join to R1, the RP, out LAN1. R2 builds a multicast entry for (*,Ga) with incoming interface LAN1 and outgoing interface LAN2. R1 receives the ESL-Join message and builds a multicast entry for (*,Ga) with incoming interface set to {} (since it is the RP) and outgoing interface set to LAN1.

When S sends a multicast datagram, the DR for LAN1, R3, will encapsulate the data packet in a Register message and send it to R1, the RP, out LAN1. The RP, R1, decapsulates the data packet and forwards it onto LAN1, as indicated in the outgoing interface list of R1's *,Ga entry. The RP, R1, then processes the Register part of the message and sets up an S,Ga entry with LAN1 as the incoming interface and a null outgoing interface list; the outgoing interface list is copied from *,Ga, but LAN1 is NOT included in the S,Ga outgoing interface list because LAN1 is the incoming interface. R1 triggers a join toward S.
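The way the S,Ga entry is derived from the *,Ga entry can be sketched as below. The sketch assumes a simple dictionary-based forwarding table; the function name make_sg_entry and the table layout are hypothetical and serve only to illustrate the rule that the outgoing interface list is copied from *,G with the incoming interface left out.

```python
def make_sg_entry(table, source, group, iif):
    """Create an (S,G) entry and return it.

    The outgoing interface list is copied from the (*,G) entry if one
    exists; otherwise it starts out null. The incoming interface is
    never placed in the outgoing list.
    """
    star = table.get(("*", group))
    oifs = set(star["oifs"]) if star is not None else set()
    oifs.discard(iif)
    table[(source, group)] = {"iif": iif, "oifs": oifs}
    return table[(source, group)]


# R1 is the RP: its (*,Ga) entry has no incoming interface and forwards
# onto LAN1. S is reached via LAN1, so the new (S,Ga) entry ends up with
# a null outgoing interface list.
r1 = {("*", "Ga"): {"iif": None, "oifs": {"LAN1"}}}
r1_sg = make_sg_entry(r1, "S", "Ga", iif="LAN1")
```

Copying the list instead of sharing it keeps the *,Ga entry intact, so packets from other sources continue to be forwarded onto LAN1 by the RP.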
When R1 does a lookup on S it finds that S is on a directly connected LAN and sends the join to the DR for that LAN, i.e., R3. When the DR, R3, receives the join it builds an S,Ga entry with LAN1 as the incoming interface. Since there is no *,Ga entry, the outgoing interface list is set to null. Subsequent data packets from S for Ga that arrive from LAN1 will be silently dropped by R1 and R3 since their S,Ga entries have null outgoing interface lists.

For this example we assume that shortest path trees are desired. In this case, when R2 receives the multicast datagram from S and finds that the longest match is on *,Ga (i.e., there is no S,Ga entry in R2), R2 creates an S,Ga entry, sets the incoming interface to LAN1 (since that is the interface used to send packets to S), and copies the outgoing interface list from the existing *,Ga entry (in this case LAN2). R2 forwards this and subsequent multicast datagrams for S,Ga onto LAN2. When R2 creates S,Ga, it also triggers an ESL-Join message to the next hop to S; in this case S is directly connected, so the ESL-Join is sent to the DR for that LAN, R3. R3 receives the join but does not add LAN1 to its outgoing interface list for S,Ga because LAN1 is the incoming interface for S,Ga.

5.4 Leaf-Router Prunes

LAN-connected routers must also detect when there are no more downstream routers. The following protocol is used: when a router whose incoming interface is the LAN has all of its outgoing interfaces go to null, the router multicasts a prune message for S,G onto the LAN. All other routers hear this prune, and any router that has the LAN as its incoming interface for the same S,G and has a non-null outgoing interface list sends a join message onto the LAN to override the prune. The join should go to the single upstream router that is the right previous hop to the source or RP; however, at the same time we want the other routers to hear the join so that they suppress their own joins.
For this reason the join is multicast at the data link level, with the IP destination address set to the upstream router.

6. Interoperation with non-ESL networks/regions

A network or collection of networks should be able to choose whether to use ESL or traditional multicast to join a distribution tree, depending on the density of the membership in that region. If the density is high then there is no need to carry ESL messages and state overhead within the region; it is more efficient to use RPM or to flood membership reports, since in general most links will be on a path from some source to some destination and the overhead of these traditional IP multicast mechanisms is not a function of the number of sources. In addition, we wish to interoperate with networks that do not have hosts and routers modified to generate and interpret ESL-Join messages.

The basic problem of splicing these "IP clouds" onto ESL trees is identifying which border router (BR) for the IP cloud should be the entry point for data packets from a particular source, and therefore which sources individual border routers should put in their join and prune lists. This is analogous to the LAN case when there is more than one router serving it. The designated router is the one that takes responsibility for serving the members on the LAN.

If the border routers are running IBGP then they have the information necessary to determine which BR should include a particular host in its join list. Similarly, if the BRs are running OSPF then the information can be computed. However, if the domain is running DVMRP or some other scheme, some additional mechanism may need to be employed in the BRs. This is an open issue still to be resolved in order to achieve maximum interoperation with existing networks.

An additional problem arises when interoperating with a non-ESL cloud.
Namely, when a receiver decides to join a group inside a cloud in which there are no other members, the BRs of that cloud must be notified in order to trigger the sending of an ESL-Join message. In the case of MOSPF, new group membership is advertised to backbone routers but not necessarily to all BRs. In the case of DVMRP and most other distance vector IGPs, membership is not advertised at all. Therefore in both cases some additional mechanism is needed.

We can solve this problem in a manner similar to the multi-access LAN case.

   1.  Two internal (to the cloud) multicast groups are created: Multicast-Reporters (MR) and All-ESL-BRs.

   2a. If the cloud runs MOSPF, then one (or a small number, for reliability) of the backbone routers joins the MR group.

   2b. If the cloud runs DVMRP, then ALL internal routers that have the potential of being DRs for a network must join the MR group (i.e., any router that will process an IGMP report).

   3.  All BRs that speak ESL join the All-ESL-BRs group AND the MR group.

   4.  Members of All-ESL-BRs do a Designated BR (DBR) election among themselves.

   5.  The resulting DBR sends an IGMP-query to the MR group.

   6.  Members of MR respond by sending IGMP-report messages to the MR group. Members of MR listen to these reports and suppress sending reports for groups that have been reported by other routers.

As a result, all ESL-BRs hear of all groups for which internal members exist. Based on this information, and on information obtained from IBGP or OSPF, the BRs can determine which of them should send an ESL-Join message to the RP for each group for which there is a local member. Note that DBRs are specific to a (source, group) pair.

We will describe two scenarios to illustrate the interoperability issue: one case where the source of a multicast datagram is in the non-ESL cloud and the receivers in a group are outside of the cloud, and one case where the source is outside of the cloud and the receivers are inside the cloud.
       ---------------
      /               \
     /             BR1 \ -----...-------- RP
    |                   |                /
    |      S            |               /
    |                   |              /
    |                   |             .
     \             BR2 / -------------.
      \     BR3       /              .
       ---------------              /
              |                    /
              +---------...------------

S sends a multicast packet that gets to all border routers. Protocols such as MOSPF and DVMRP will cause the multicast packet to reach all border routers. If all border routers know of each other, the one with the shortest path to S is elected the Border DR. In case of a tie, the router with the largest IP address becomes Border DR. The Border DR sends the Register message to the RP. All others discard the multicast packet. The Border DR sends the multicast packets along the path to the RP after join messages for S,G propagate back to the DBR to establish S,G state.

In the receiver-in-the-cloud case:

       ---------------
      /               \
     /             BR1 \ -----...-------- RP
    |      H1a          |                /
    |                   |               /
    |      H2a          |              /
    |                   |             .
     \             BR2 / -------------.
      \     BR3       /              .
       ---------------              /
              |                    /
              +---------...------------

Border routers need to know which one of them sends ESL-Join messages to the RP for which groups. If MOSPF is running, all border routers know of each other and can determine which one is closest to the RP. The closest one sends the ESL-Join message. Similarly, IBGP provides the BRs with the information to determine whether each particular BR has the preferred route to the RP.

Similarly, once a data packet from a new source arrives at a BR, that BR must determine which border router is closest to that source. If the BR itself is the closest, it forwards the packet internally to the multicast group, sets up the source-specific forwarding entry, and sends an ESL-Join message toward the source. Otherwise, it encapsulates the packet and unicasts it to the correct border router, and that border router takes the same action.

In summary, border routers need to learn about each other and their respective routes to RPs or sources using one of the following:

   o OSPF or IS-IS

   o IBGP or IIDRP

   o A configured mesh of tunnels, with unicast routing running over the tunnels.

7. Design Issues

7.1 Comparison with Core Based Tree

CBT was proposed to address similar scaling problems; however, it differs in several ways. Some of the differences are functional and some are engineering tradeoffs.

7.1.1 Tree Types

The first major issue is that CBT imposes a single shared tree for each multicast group. We justified our desire to avoid this scenario earlier. CBT must rely on more "cores" in order to obtain efficient distribution paths. This means that the core(s) must be selected carefully to avoid distribution paths with excessively high delay. Even if the core is placed optimally, there is still the significant issue, for continuous media types, of concentrating all traffic onto a common data distribution tree.

In ESL, if some application does not want shortest path tree distribution, then a host does not have to add all new sources to its ESL. This fact must also be signaled to the routers so that they also operate in shared-tree mode. This will cause the RP-based tree to continue to be used as the distribution tree. In that way an application can choose a group tree instead of a shortest path tree. In fact, first hop routers can make this decision independently, and a host could even choose differently for different sources. However, if RP-based distribution is maintained in any of these cases, then the choice of RPs is more critical than when RPs are used only as a transition path to shortest path trees.

7.1.2 Group Specific State

There are also protocol engineering differences between the two. One of these is a tradeoff between requiring group specific state on the routers in between sources and the RP, vs. carrying an option in all DATA packets sent to the group. In CBT, data packets travel from the source to the CBT with an option attached. This allows the packets to be sent initially towards the core, by non-CBT routers, and then to be routed along the CBT once they hit a router that is on the core tree.
In ESL we have chosen to use a Register packet and establish explicit S,G forwarding entries so that data packets do not require as much processing.

7.1.3 Soft state vs. explicit reliability mechanism

CBT uses explicit hop-by-hop mechanisms to achieve reliable delivery of control messages. ESL uses periodic refreshes as its primary means of reliability. This approach reduces the complexity of the protocol and covers a wide range of protocol and network failures with a single simple mechanism. On the other hand, it can introduce additional protocol message overhead.

7.1.4 Effect on Host Service Model

CBT requires that hosts be modified to participate in the CBT protocol. ESL proposes to make use of optional new IGMP Report messages that include a list of zero to n RPs; however, hosts do not otherwise have to participate directly in the ESL protocol.

7.1.5 Incoming interface check on all multicast data packets

If multicast data packets loop, the result can be severe; unlike unicast packets, multicast packets fan out each time they loop. Therefore we assert that all multicast data packets should be subject to an incoming interface check comparable to the one performed by DVMRP and MOSPF. In order to do this check, *,G state can only be used downstream of the RP. As a consequence, in any particular router on the shared tree, a specific S,G entry must be maintained for sources that are upstream of the RP relative to that router.

7.2 Selecting and Identifying RPs

An RP for a particular multicast group can be any IP-addressable entity in the internet. However, it is most efficient and convenient for the RP to be the directly connected ESL router of members of the group. If an RP has local members of the group, then there is no wasted overhead associated with sources continually sending their data packets to the RP, since the packets need to be delivered there anyway for delivery to those members.
Nevertheless, we need not be overly concerned with the placement of RPs when shortest path trees are used, because the RP will not remain on the distribution path for most receivers unless it happens to be centrally located. Obviously, pathological cases should be avoided, such as putting the RP at the other end of a very narrow link whose capacity is exceeded by the data rate of the sources.

The RP address can be configured, or it can be dynamically discovered by mapping from the multicast address, by querying a directory service, or from information obtained via new ESL-RP-Report messages. The mapping of G to RP addresses should be cached.

While the mapping of multicast addresses to RP addresses is an open issue in the long term, in the short term we will implement two mechanisms. The first approach is simply to configure the mapping manually. The second approach is to allow hosts (both sources and receivers) to inform routers of the mapping using a new ESL-RP-Report message. The latter approach is needed to support dynamic groups that hosts advertise and discover by participating in a special application, e.g., the session directory (sd) tool developed by V. Jacobson. Advertising hosts will advertise RP addresses along with the multicast address, and other hosts that wish to send to or join the group will send an ESL-RP-Report message with the RP address(es) in response to IGMP Queries.

The DNS is not a general solution because it is not suited to advertising dynamic information quickly, as is needed for dynamic multicast groups. In the future, if the DNS is used for multicast address advertisement, RP addresses can be advertised along with the multicast addresses.

7.4 Separating receiver and sender roles

We chose to continue with the design philosophy of IP multicast for two reasons. The first is that in order to interoperate seamlessly with IP multicast we needed to maintain the separation between receivers and senders.
The second is that the separation allows us to build a protocol that has less overhead per receiver by introducing more overhead per source.

While some applications might like to have explicit information about all receivers in a group, the aggregation mechanisms proposed for very large groups would interfere with the utility of this information anyway; i.e., explicit receiver information would only tell which domains were receiving the packets, not which hosts within those domains. It seems that some other mechanism is needed if an application really wants to enforce access control on the multicast group. This is a subject for further study.

In many applications the source should be a receiver as well, in order to obtain feedback and facilitate debugging [Van]. For this reason we might add an optimization whereby an appropriately flagged IGMP Register message would be interpreted and processed as both an IGMP Register and an IGMP join message.

7.5 State overhead

State overhead is of considerable concern given the large number of multicast groups that will exist and the large number of potential sources that do exist. The ESL protocol described here entails the following state.

   1. On the RP downstream tree (the RP and routers downstream of the RP) there is *,G state for each group, and negative or positive cache information for each Si,G when SPTs are used.

   2. On the SPTs there is Si,G state for the subset of Si's whose SPTs pass through that particular router.

   3. On the upstream RP tree (between source and RP, what CBT calls offtree), there is also Si,G state for each source whose shortest path to the RP passes through the particular router.

In the periphery the number of sources with SPTs through a router is not so large. The number of groups may still be large, but it is not as large as in the center of the network.
However, a very large number of sources' SPTs pass through "central routers", and a very large number of groups have distribution trees that pass through the central routers as well.

Source specific state is unavoidable if you want SPTs. If you do not need SPTs, or do not need all of the tree to be SPT, use *,G instead. We should investigate a situation in which periphery routers switch to their SPT interfaces but central routers stick with the *,G RP tree entry. We need to answer two questions: what kind and quality of trees do we end up with, and what is the implication for traffic concentration in the center of the network, even given the greater aggregate bandwidth found there?

There remains the open issue of aggregation across groups as well. Scott Brim has proposed some mechanisms for dense mode operation.

CBT does avoid group specific state on the routers that lie between sources and the shared tree for that group, by employing and processing an option in all data packets sent to sparse multicast groups. For now, we wish to avoid interfering with data packet processing and pay with state. But we must do further studies to determine how many groups can be supported before the shared-tree mechanism or cross-group aggregation is mandated.

7.6 Aggregation of information in ESL

There are several motivations for aggregating source information beyond the subnet level supported in the current specification; the most important are ESL message size and the amount of memory used for forwarding entries.

One possibility is to use the highest level aggregate available for an address when setting up the multicast forwarding entry. This is optimal with respect to forwarding entry space. It is also optimal with respect to ESL message size.
However, ESL messages will then carry very coarse information, and when the messages arrive at routers closer to the source(s), where more specific routes exist, there will be a large fanout: ESL messages will travel toward all members of the aggregate, which would be inefficient in many cases.

If ESL is being used for inter-domain routing, and routers are able to map from an IP address to a domain identifier, then one possibility is to use the domain level aggregate for a source in ESL messages (AS numbers or RDIs). The ESL message will then travel to the BR(s) of the domain, and the BRs can use the internal multicast protocol's mechanism for propagating the join within the domain (e.g., send the appropriate LSA in MOSPF, or register a "local member" and do not prune in the case of RPF). However, this approach requires that it be both possible and efficient to map from an IP address to a domain address when processing data packets as well as control packets.

%%Editorial Note: the following is a very gross high level description
%%of Van's scheme. It is just a placeholder for the definitive paragraph
%%that I will eventually extract from him.

Another possibility is to use proxies, as suggested by V. Jacobson. In this case, within ESL clouds, ESL messages need only refer to proxies for sources outside the cloud. In this scheme BRs would join an ESL tree externally and inject themselves as sources internally. When a data packet arrived, it would be forwarded into the cloud and routers would see a new source. They would then need to determine which BR is the entry BR for the particular source and forward the packet on the multicast tree associated with that BR. The router could cache a forwarding entry for the new source in order to avoid repeating this step on each data packet.
To create efficient multicast distribution trees that do not generate duplicate packets, this scheme requires that internal routers be able to map from an IP address to the entry BR used by that IP address as a source. If such a mechanism is not available, approximations may be employed that map packets based on the previous hop router. This technique is currently being developed and would be deployable as an addition to the current protocol without affecting the protocol specification per se.

In the absence of aggregation or proxy techniques, when the number of sources reaches some threshold value (to be determined), receivers could compromise the quality of the distribution tree in exchange for accommodating large numbers of unaggregatable sources. In particular, receivers could continue to receive packets over the group tree instead of moving them off to a shortest path tree. For example, receivers could send a wildcard IGMP message to an RP to maintain distribution of all sources' packets to that multicast address via the RP. While this would result in a suboptimal distribution tree, it would avoid explicit enumeration of sources. Alternatively, the receiver could send a wildcard with explicit sources listed in the prune portion of the list. This would allow the receiver to get shortest path delivery from a subset of the sources.

%%DE added

One problem with leaving the selection of shared vs. shortest path trees to the receivers is that the burden of excessive S,G entries will most likely fall in the center of the network, far away from receivers. In this case routers should be able to act unilaterally to decline the requested establishment of new S,G entries. If a router does not process a join, then the downstream receivers will not receive packets over the shortest path.
Assuming a strategy is used whereby receivers do not prune the shared tree until packets arrive on the shortest path tree, receivers will simply remain on the shared tree until more state becomes available on the shortest path tree.

7.7 Interaction with policy based routing

ESL messages and data packets will travel over paths that include policy so long as the policy does not preclude them, to the same extent that unicast routing does. In addition, in the future we will construct a special ESL message type that embeds a Source Demand Route (SDRP route) and thereby causes the ESL message and the multicast forwarding state to be on an alternative distribution tree branch.

To obtain policy sensitive distribution of multicast packets we need to consider the paths chosen for forwarding ESL-Join and Register messages. If the path to reach the RP or some source is indicated as having the appropriate QOS and as being symmetric, then ESL routers can determine that if they forward joins upstream, the data packets will be allowed to travel downstream. This implies that BGP/IDRP should carry two QOS flags: a symmetry flag and a multicast willing flag. The former, if set, indicates that each AD hop has local route selection policies that allow data to flow in either direction. The latter flag indicates that each AD hop on the path has a local transit policy that allows multicast packets.

NOTE: there are two types of symmetry. The first indicates that it is not in violation of transit policies to allow data to flow in both directions, so even if route selection is not symmetric, multicast forwarding entries that point along the reverse route do not violate policy. The second type of symmetry indicates that packets are in fact routed symmetrically--i.e., if R1 forwards packets from S destined for D out over an interface to R2, then the route is truly symmetric if R2 forwards packets from D to S over its interface to R1.
For ESL we only need the former information.

If the generic route computed by hop-by-hop routing does not have the symmetry and mcast bits set, but there is an SDRP route that does, then the ESL message should be sent with an embedded SDRP route. This option needs to be added to ESL join messages. Its absence will indicate forwarding according to the router's unicast routing tables. Its presence will indicate forwarding according to the SDRP route. This implies that SDRP should also carry symmetry and mcast QOS bits AND that ESL should carry an optional SDRP route inside of it.

7.8 Interaction with Receiver Initiated reservation setup such as RSVP

Once the SP distribution tree has been established, RSVP reservation messages follow the reverse of the senders' path messages, and the senders' path messages travel according to the state that ESL installs. However, one wants to avoid switching reservation-oriented routes, so the receiver could initially receive all packets via the RP distribution tree and, after some delay, send ESL messages to establish the SP tree and then establish reservations over that tree. The source's path message would travel first via the RP path. To avoid setting up a reservation on the RP path, the receiver would send its IGMP message BEFORE it sends out its reservation message, and wait for another path message to travel over the new SP.

In summary, we expect that this receiver initiated routing is well suited to receiver initiated reservations, since if a reservation is blocked the previous router or the receiver can select an alternative reverse path to the particular source(s). This is also a subject for future work that will affect the use of the protocol, and not the protocol itself.

7.9 Dense Mode

We can use similar IGMP extensions to support a mode of multicasting that is more efficient than ESL-sparse for forwarding to receivers that densely populate a region.
Clouds might run this form of ESL internally as their internal multicast mechanism, or it could support dense inter-domain groups. In this model routers run RPF (forward out all interfaces except the incoming one, if the packet arrived on the outgoing interface used to get to the source). Directly connected routers run IGMP query and report, and when they have no members for a group and receive packets for it, they send IGMP prune messages that consist of ESL messages with Prune lists only. Similarly, when a router gets a packet on an interface that is NOT its outgoing interface to the source, that router sends an ESL message with prune information only. ESL records prune information and propagates it upwards if entire downstream branches prune themselves. Periodically the prune information is timed out, the packets are sent again, and downstream routers must resend the prune messages. A draft specification of dense mode ESL is available from Dino Farinacci.

Running dense mode internally with ESL sparse mode outside has all the same problems as running DVMRP internally: the BRs need to run BGP or tunnels to identify the appropriate BRs, and a mechanism is needed for alerting BRs to new group members.

7.10 Open Issues

The open issues associated with ESL are:

   1. Aggregation of source lists via use of proxies.

   2. Discovering RP addresses: new IGMP-Report-RP message.

   3. Aggregating group specific state along the shared tree.

   4. Dense to Sparse to Dense transition issues; see dense mode document.

   5. Deciding when to switch from shared to shortest path trees.

Acknowledgments

Tony Ballardie, Scott Brim, Jon Crowcroft, Paul Francis, Ching-Gung (Charley) Liu, Liming Wei and Lixia Zhang provided detailed comments on previous drafts. The authors of CBT and the membership of the IDMR WG provided many of the motivating ideas for this work and useful feedback on design details.