RMT Working Group Miriam Kadansky/Sun Microsystems Internet Engineering Task Force Brian Neil Levine/UMass INTERNET-DRAFT Dah Ming Chiu/Sun Microsystems draft-ietf-rmt-bb-tree-config-01.txt Brian Whetten/Talarian 22 November, 2000 Gursel Taskale/Talarian Expires 22 May 2001 Brad Cain/Mirror Image Dave Thaler/Microsoft Seok Joo Koh/ETRI/Korea Reliable Multicast Transport Building Block: Tree Auto-Configuration Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as a "work in progress". The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract The Reliable Multicast Transport Working Group has been chartered to standardize multicast transport services. This working group expects to initially standardize three protocol instantiations. This draft is concerned with the requirements of the tree-based ACK protocol. In particular, it is concerned with defining the building block for auto-configuration of the logical ACK-tree. According to the charter, a building block is "a coarse-grained modular component that is common to multiple protocols along with abstract APIs that define a building block's access methods and their arguments." For more information, see the Reliable Multicast Transport Building Blocks and Reliable Multicast Design Space documents [WLKHFL00][HWKFV00]. draft-ietf-rmt-bb-tree-config-01 [Page 1] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 Table of Contents 1. Introduction 1.1 Terminology 1.2 Assumptions 1.3 Requirements 1.4 Applicability Statement 2. Overview 3. Distance Metrics 3.1 Multicast Beacon 3.2 Mesh 3.3 Address 3.4 Static 3.5 GRA 4. Session Announcement 5. Service Node Discovery 5.1 Service Node Discovery Algorithms 5.1.1 Static 5.1.2 ERS 5.1.3 POC 5.1.4 Mesh 5.2 Service Node Selection 6. Tree Formation 7. Tree Maintenance 7.1 Monitoring Parents and Children 7.2 Optimizing the Tree 8. Messages 9. External Interfaces 9.1 Interfaces to the BB 9.1.1 findSNs 9.1.2 messageFromSN 9.1.3 lostSN 9.1.4 activateSN 9.1.5 setOptimization 9.1.6 setSN 9.1.7 acceptMember 9.2 Interfaces from the BB 9.2.1 SNlist 9.2.2 SNactive 10. References Authors' Addresses draft-ietf-rmt-bb-tree-config-01 [Page 2] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 1. Introduction The Reliable Multicast Transport (RMT) working group has been chartered to standardize IP multicast transport services. This draft is concerned with the requirements of the tree-based ACK protocol[WCP00]. In particular, this draft defines a building block for auto-configuration of a logical ACK-tree comprised of a single Sender, Service Nodes, and Receivers into a tree (called a Session Tree in this document). The design goals of this draft are motivated by the needs of TRACK-based protocols, however the trees it constructs are useful for other services. The process of session tree construction is difficult for IP multicast. The best session trees match the underlying (i.e. forwarding tree) multicast routing tree topology[LPG98], however the multicast service model[DEE89] does not provide explicit support for discovering routing tree topology. Furthermore, deployed multicast architectures can vary; for example, hosts may be restricted to multicast reception and not transmission with Source-Specific routing[HC00]; and routers may provide special extended routing services with Generic Router Assist [CST00]. The RMT charter does not restrict the use of any particular network service in constructing the tree, it only suggests preferred scenarios. Accordingly, there are several viable solutions for constructing a tree. The optimality of a tree may also depend on other factors, such as the need for load balancing, and the need to minimize the depth when used for collecting feedback information. The goal of this building block is to specify a distributed procedure for automatically constructing a tree that is loop-free, but only as good as possible depending on the information available. This draft describes a unified solution for tree construction in the presence of different multicast service models and routing protocols. In particular, it specifies a single procedure which may be used with various techniques for service node discovery and distance measurements, several of which are specified within this document. The difference in these techniques primarily affects the optimality of the tree. The unified algorithm ensures that different implementations can interoperate and construct a loop-free tree. In order to accommodate various multicast deployments, this document divides the tree building process into the following major components: 1. Several techniques for discovering neighboring service nodes. In particular: Static, Expanding Ring Search, Point-of-Contact, and Mesh. Discovering neighboring service nodes is a necessary condition for getting connected, so each node in the session must implement at least one of the above techniques. 2. Several algorithms for selecting neighboring service nodes. In particular, the measurement and use of distance-to-neighbor and distance-to-sender are described. These service node selection algorithms help produce a good tree. 3. A single distributed procedure for construction and maintenance of loop-free Session Trees. This draft is required to conform to the guidelines specified in draft-ietf-rmt-bb-tree-config-01 [Page 3] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 draft-ietf-rmt-author-guidelines-00, Author Guidelines for RMT Building Blocks and Protocol Instantiation Documents [KV00]. A future version of this draft will more explicitly describe this conformance. 1.1 Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. Session A session is used to distribute data over a multicast address. A tree is used to provide reliability service for a session. Sender The single sender of data on a multicast session. The Sender is the root of the Session Tree. Receiver A receiver receives data from the sender via the multicast session. Session Identifier A fixed-size number, chosen either by the application that creates the session or by the transport. Senders and Receivers use the Session Identifier to distinguish sessions. The size of this number is specified by the PI. Service Node (SN) A node within the tree which receives and retransmits data, and aggregates and forwards control information toward the Sender. The Sender operates as the root Service Node in any session tree. Service Nodes MAY be dedicated servers within a network designed to participate in multiple Sessions and support multiple trees, or they MAY be Receivers participating in an individual session. SNs MAY limit the number of Children they choose to service, and MAY also make other restrictions on the characteristics of each Child (distance, location, etc.). An SN that has accepted Children for a session is called a Parent. Session Tree (ST) The Session Tree is a spanning tree per Session, rooted at the Sender and consisting of a number of Service Nodes. An ST is constructed for the forwarding of control information back to the Sender as well as for the resending of missed data to the Receivers. The ST for a particular session may change over the course of the session. Parent A Parent is an SN or Receiver's predecessor in the ST on the path toward the Sender. Every SN or Receiver on the tree except the Sender itself has a parent. Each Parent communicates with its children using either an assigned multicast address or through unicast. If a multicast address is used, this may be the same draft-ietf-rmt-bb-tree-config-01 [Page 4] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 address used by the session, or one specifically assigned to the Parent. Children The set of Receivers and SNs for which an SN is providing repair and feedback services. Tree Level A number indicating the number of "generations" away from the root The sender is at TL=0. Those that use the sender as service node are at TL=1 and so on. When a receiver is not connected to the tree yet, it has a tree level value greater or equal to 128. The reason for reserving part of the space (of tree levels) for indicating "off- tree" is so that special measures can be used to prevent forming loops. The largest value is 256, so the range of off-tree levels are in the range 128 - 256. Initially, all receivers have a TL value of 128. Once a Node joins the tree, its Tree Level is updated to be one more than its Parent's level. Distance Metric There are several techniques to quantify distances between nodes (Receivers, SNs, and the Sender) in a session. Each type of quantification is called a distance metric. Several Distance Metrics are described in this draft. Sender Distance The distance from a node to the Sender. Neighbor Distance The distance from a node to a Neighbor. Neighbor A node's (Receiver or SN) Neighbors are SNs that are close to the node, according to the Distance Metric(s) used by the node. 1.2 Assumptions This document describes how to build trees under the following conditions: a. The multicast group has only a single sender. b. A single SN can serve multiple sessions. c. Sessions can take advantage of a pre-deployed infrastructure of SNs (ones that are not necessarily aware of a session before the receivers), or recruit Receivers to be SNs. d. Generic Router Assist[CST00] and Expanding Ring Search[YGS95] are not required of the network infrastructure, but if available they should be able to be utilized. 1.3 Requirements The following are specifically required: draft-ietf-rmt-bb-tree-config-01 [Page 5] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 a. While tree-building may take advantage of information from the routing layer, the mechanisms described are independent of the routing protocol(s) used by the underlying multicast tree. b. All trees constructed must be loop-free c. These mechanisms must support late joiners and tree optimization 1.4 Applicability Statement The authors recognize that automatic tree construction is a very difficult problem. Nonetheless, complete reliance on manual configuration is very user unfriendly and error prone as well. This building block describes a procedure for constructing loop-free trees when there is minimal manual configured information available. This is analogous to providing a system with default configurations that allow the system to work correctly, but not necessarily optimally. There are many possible criteria for tree optimality. This BB does not attempt to define a single optimality criterion, nor does it tries to produce an optimal tree. It is, however, the goal of the BB to construct better trees as more configuration and measurement data are introduced to the procedure. This BB describes only a subset of the relevant parameters for constructing a better tree, in particular sender distance and neighbor distance. There are many techniques for measuring these distances. Some of the techniques may not be applicable globally. Expanding ring search (ERS) is an effective technique in a local subnet or intranet (specially when the IP multicast routing protocol is dense-mode based). On the other hand, it is not practical in a multi-domain network; it is not effective when the routing protocol is sparse-mode based; and it can add a significant control traffic overhead. Generic Router Assist (GRA) can provide measurement hooks to determine SNs that are located along the path for multicast data distribution. But such facilities many not be available in all networks. The tree construction procedure does allow manual configuration and various distance measurement techniques to be selectively and independently applied for different subgroup of receivers and SNs, to achieve incremental improvement to the quality of the tree. 2. Overview The tree building process described within this document builds logical trees which consist of: 1. A root node which is the Sender 2. Intermediate nodes (Service Nodes or SNs) which may be either Receivers or nodes specifically allocated to the task of repair and aggregation 3. Leaf nodes which are Receivers only ACK trees are spanning trees rooted at the Sender. ACK trees can be used for the forwarding of control information (i.e. ACKs) towards draft-ietf-rmt-bb-tree-config-01 [Page 6] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 the root, or for forwarding of data (i.e. repairs) towards the leafs. ACK trees are constructed per Sender; each node wishing to join the tree discovers its neighboring SNs and then selects its best parent based on locally available information, such as the relative sender distances and neighbor distances. This document specifies several techniques for measuring distances. It is important to note that SNs may be actual Receivers (e.g. Receivers willing and able to also function as SNs) or pre-deployed "specialized" servers that are prodded into service by Receivers. We use the term Service Node to refer to either a Receiver or "server" which is participating as part of the logical tree formation. Tree construction, regardless of SN discovery and selection algorithm, proceeds generically as follows. 1. Session Announcement Receivers of a session use standard out-of-band mechanisms for discovery of a session's existence (e.g. Session Advertisement [HPW00], URL, etc). In this way, a Receiver discovers the multicast group address, the Sender's address, and other information necessary for logical tree construction. Sessions may be announced in two parts, the first part containing generic information about the session, such as the multicast address, and the second part, announced on the multicast address, containing additional information. 2. Measurements (optional) All SNs and Receivers that know about the session optionally determine their distance to the Sender. 3. Neighbor Discovery Meanwhile, each Receiver discovers nearby SNs for the Session using the neighbor discovery algorithm(s). 4. Service Node Selection Once a Receiver (or SN needing to join the tree) discovers a nearby SN, it obtains the SN's distance to the Sender as well as its distance to the Receiver, tree level, and other suitability values, if available. After discovery is complete, the best SN is selected. (Note, in the case when the POC algorithm is used for neighbor discovery, neighbor discovery and selection may become a combined step, performed at the POC). 5. Binding to Service Node The Receiver or SN then binds to the chosen SN. If a bind is unsuccessful, the Receiver or SN retries with another nearby SN, or starts the discovery process all over again. Once an SN receives a bind from a child, that SN must then also join the tree if it has not already, discovering an SN of its own, possibly using a different method than leaf Receivers. 6. Optimization and Fault Recovery (optional) During a session, a Receiver or SN may change to a different SN for a number of reasons described below, including fault tolerance. The Session Tree is maintained and optimized over time. draft-ietf-rmt-bb-tree-config-01 [Page 7] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 The building block provides mechanisms for maintaining and optimizing the tree, as well as tearing it down when the Sender is done with it. In the rest of this document, the term 'Node' denotes a Receiver or SN. +--------------------+ | | | 1. Session | | Advertisement | | | +--------------------+ | | Node receives tree-building parameters | V +--------------------+ | | | 2. Measurements |<-------------------------| | | | | (optional) | | +--------------------+ | | | | | | | V | +--------------------+ | | | | | 3. Neighbor | | | Discovery | | | | | +--------------------+ | | | | | | | V | +--------------------+ | | | | | 4. Service Node | | | Selection | | | | | +--------------------+ | | | | | | Node picks best Neighbor | | | V | +--------------------+ | | | | | 5. Binding to |--------------------------- | Service Node | 6. Optimization and | | Fault Tolerence +--------------------+ 3. Distance Measurement Different techniques can be used to determine distances between nodes in a session. These distances can be used in selecting a Service Node if several are discovered. draft-ietf-rmt-bb-tree-config-01 [Page 8] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 These techniques quantify distance differently. Each specific way of quantifying distance is called a metric. Different Metrics are not necessarily comparable. For example, if distance between A and B is X using metric m1, and distance between A and C is Y using metric m2, then X > Y does not necessarily imply B is farther from A than C. Only distances of the same metric should be compared and ranked. Metrics quantify how close two nodes are from each other. Though not all of them measure specific distances, we will refer to these measures as distances in this document. It is not necessary for all nodes to use the same metric to select their neighboring SN to connect to. 3.1 Multicast Beacon Periodically, a node sends a Beacon message on the Session's multicast address. The Beacon message includes the original time- to-live value set by the node. The distance to the node is then calculated as Beacon's original TTL - Beacon's current TTL One node is closer than another if its distance is a lower number. 3.2 Mesh This metric is just like the Multicast Beacon Metric, except that nodes query their local router to determine their distance to the sender. 3.3 Address For IPV6 addresses[HOD98], distance can be approximately determined by the number of aggregation levels one address has in common with another. For this metric, one node is closer than another if its address has more aggregation levels in common with the querying node's address. 3.4 Static The node's distance to other nodes is made available in some well- known location. One node's is closer than another if its distance is a lower number. 3.5 GRA The node's distance to the sender is determined by a GRA message that lists the set of GRA routers on the path from the source. Nodes (and POCs) can use these path-strings to determine relative locations of SNs. 4. Session Announcement The first step in the tree-building process is for a node to be informed of the session's existence. This can be done using some out-of-band method, such as Session Advertisement [HPW00], URL, e- mail, etc. draft-ietf-rmt-bb-tree-config-01 [Page 9] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 SNs do not necessarily receive these advertisements. If an SN is not a Receiver, it obtains the advertisement information once it is contacted by a Receiver. The advertisement includes the multicast address being used, the Sender's address, the Session Identifier, any specific port numbers to be used, and any global information useful for tree construction (such as a sender POC), to be described later. 5. Service Node Discovery Discovery is the process by which a node determines a suitable Service Node. During the discovery process, SNs are found, Sender distances are exchanged, and the best SN is selected. 5.1 Service Node Discovery Algorithms This draft describes four algorithms for discovering SNs: Static, Expanding Ring Search (ERS), Point-of-Contact (POC), and Mesh. Multiple algorithms may be used within a single session. In particular, one algorithm may be used by statically configured SNs in a session, while a different algorithm is used by Receivers of the session. For instance, SNs for a session may use the Mesh algorithm, while the receivers use static configuration to discover the SNs. The transport protocols request the following information from this BB by providing a unique identifier for the data session to join. This identifier, called a Session Identifier, is obtained from the Session Advertisement algorithm in the PI. Service Nodes: 1. ParentAddress: the address and port of the parent node to which the node should connect 2. UDPListenPort: the number of the port on which the node will listen for its children's control messages 3. RepairAddr: the multicast address, UDP port, and TTL on which this node sends control messages to its children. Receivers: 1. ParentAddress. Senders: 1. UDPListenPort 2. RepairAddr After the above information is obtained from auto-tree-config, the transport protocol may perform the necessary Bind operations for participating in the Session Tree. 5.1.1 Static A list of Neighbors is made available to a Node in some well-known location. The location of the Neighbor list, which could be a file name, a URL, or a DNS name, MAY be included in the session advertisement. A node MAY query these neighbors for their Sender distances and pick the best one, but it may simply pick one and try to bind to it. It is important to carefully configure static trees draft-ietf-rmt-bb-tree-config-01 [Page 10] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 without loops. If the tree-building process detects that binding two nodes would form a loop, the bind will not take place and the node will not be able to join the tree unless there are alternative static neighbors. 5.1.2 ERS Nodes look for Neighbors using a multicast QUERY message. The initial TTL value in the QUERY message is specified in the session announcement and may be as large as the session TTL. An SN that is able to handle additional Children SHOULD respond to a QUERY message by multicasting a ADVERTISE message. If the SN is not yet a Parent, the TTL used in this response is the same TTL used in the QUERY message. If the SN is a Parent, the TTL used is the greater of the QUERY TTL and the Parent's current ADVERTISE TTL. The Node listens for ADVERTISE messages after sending the QUERY message. If one or more ADVERTISE messages are received during a solicit period, the best SN among them is selected as described in section 5.2. If no ADVERTISE messages are received, the Node sends another multicast QUERY message with an incremented TTL. The process of sending the multicast QUERY message with an increasing TTL value continues until a response is received. The TTL value is limited by a value specified in the advertisement. If the TTL value required to reach the soliciting Node is greater than the TTL used to reach the SN, a ADVERTISE message may not reach the Node. However, if future QUERY messages have increased TTL values, the TTL may eventually be large enough for the ADVERTISE message to reach the Node. However, it is possible that the Node will not locate any SNs using Expanding Ring Search. It is advisable that a backup method, such as static, be available. SNs SHOULD suppress sending ADVERTISE messages in response to QUERY messages if one was sent with at least the QUERY's TTL within the last SolicitInterval (a multicast session parameter). Nodes MAY suppress sending QUERY messages for one interval when they first start in order to collect information from ADVERTISE messages already solicited by other nodes. 5.1.3 POC Nodes query using unicast a designated node, the POC, for Neighbors. The POC returns one or more choices of SN, from which the node chooses. The POC scheme is suited for scenarios where receivers lack the ability to use multicast for ERS, no mesh is deployed, and dynamic construction of the session tree is required. POCs may be well-known or discovered. POCs do not join the multicast tree of a session. Receivers and SNs inform the POC they wish to join the session tree and the POC recommends an SN to use. Local POCs contact the Sender's POC for interdomain connectivity. POCs are not necessarily specialized infrastructure. Any receiver, SN, or the Sender may be assigned the role at the start of the session, as long as all hosts have a unified view of who the local POC is. We expect an existing local SN may typically be appointed as the POC. draft-ietf-rmt-bb-tree-config-01 [Page 11] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 Alternatively, POCs could be specially deployed redirection servers similar to DNS redirect servers implementing the messages and interfaces described in this BB. The address of the Sender's POC MUST be included with the session advertisement. Local POCs SHOULD be used to configure trees within a single domain. Receivers and SNs may discover local POCs via a static configuration or other more dynamic mechanisms such as an anycast service [PTM93] or directory services. These local POCs MUST be told of the Sender's POC in order to connect all trees across domains. POCs also serve as an interface between interdomain and intradomain tree construction. SN discovery with POCs proceeds as follows: 1. The node joins the multicast session and learns of the Sender's POC or a local POC from the session announcement or directory services. 2. The node optionally discovers its distance from the Sender. Any metric described in Section 3 may be used. 3. The node sends a QUERY message to the POC for a parent. The request includes (optionally) the node's distance to the sender. 4. The POC chooses a parent for the node and sends the IP address (and port) of the parent in an ADVERTISE packet. 5. A SN wishing to accept children sends (unicasts) an ADVERTISE message to a POC. The POC used these ADVERTISE messages as replies to QUERY messages. 5.1.4 Mesh The job of an IP Multicast routing protocol is very similar to that of the auto tree configuration building block-to dynamically construct and maintain a set of distribution trees, where each router is a node in the tree. To do so, the multicast routing protocol makes use of a unicast routing protocol, which is responsible for discovering and maintaining a set of unicast routes between neighbors, such that any node in the interconnected mesh can be reached by any other. The candidates for those neighbors are statically configured as part of the router configuration, while host entities (senders and receivers) dynamically find their nearest SN using neighbor discovery algorithms described in this BB. The mesh approach to neighbor discovery and distance metrics is directly analogous to this scenario, providing functionality similar to what OSPF offers. The mesh approach requires that all of the SN's that will participate in any given tree construction process exist as persistent infrastructure entities, with statically configured routes among their neighbors. Each of these neighbor routes among the SNs in a mesh exchange unicast routing updates, using a standard unicast routing protocol such as OSPF or IS-IS, or even a very simple link- state routing protocol. The link for each route SHOULD be implemented as a TCP connection. The result of this unicast routing protocol is a state table at each SN, with a next-hop neighbor SN forwarding entry for each SN that is presently connected into the mesh. This next-hop neighbor is used as the selection for the best neighbor service node to bind to, for SN draft-ietf-rmt-bb-tree-config-01 [Page 12] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 to SN bindings. Receivers and senders locate a neighboring SN through any of the other mechanisms described in section 5. The following algorithm is used. 1. At the time they start running, each SN contacts its statically configured neighbors via TCP, and joins in the unicast routing protocol being run over all of the SNs in this mesh. 2. At the time a sender wishes to join a tree, it finds a SN that is part of the mesh, using any of the other neighbor discovery algorithms in section 5. The sender binds to this SN, and then advertises this SN's address in the session advertisement it sends to all receivers. 3. When a receiver wishes to join the tree, it finds a neighboring SN using any of the other neighbor discovery methods in section 5. 4. For neighbor selection, each SN looks up the sender's SN in its own forwarding table, and uses the primary next-hop neighbor in this table. 5. If that neighbor SN does not wish to join the tree, for example due to the load at that SN, in the Reject message of the bind it can pass back its next-hop neighbor to the sender's SN, which can be used to attempt to bind to instead. 6. Loop freedom is guaranteed by the algorithms in section 6. 5.2 Service Node Selection Once Neighbors have been discovered, a node selects the best one using whatever distance information is available. If there is no sender distance information to compare, the best SN is simply the one that is closest to the node, without a loop being formed by binding the node to the SN. If sender distances are available, there are two cases: Leaf nodes: For leaf nodes, the goal is to use the closest SN possible. SNs: For SNs joining the tree, it is important to pick an SN that is closer to the sender; neighbor distance is a secondary factor. Once an SN has been selected, the node tries to bind to it as described in Section 6. Loop prevention is done during the bind process using only Tree Level information. This algorithm is recommended because it assigns each node the closest SN, and does not require all nodes to measure their sender distance at the start of the session. Depending on the selected metric, multiple nodes measuring sender distance could cause message implosion, and delay tree construction. On the other hand, the SNs selected may actually be further from the Sender than their children are. However, it may be necessary to assign nodes to non-optimal SNs in order to get them on the tree, since it is possible that no SN closer to the Sender can accept any more children. More optimal trees can be constructed by using stricter rules for SN selection. For instance, a session MAY ensure loop-freedom by building the tree down from the Sender. In this case, nodes only draft-ietf-rmt-bb-tree-config-01 [Page 13] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 need to select SNs whose level is less than 128 to ensure the tree is loop-free. However, this makes tree-building take more time. Alternatively, nodes may be required to measure sender distance before selecting an SN in order to ensure that each parent is closer to the Sender than its children. Presumably, this results in a tree in which parents detect message loss before their children, minimizing repair requests. There are other criteria for SN selection based on load balancing and minimizing the number of levels. 6. Tree Formation The following is a detailed description of the tree formation process. All tree construction follows this pattern. The ONLY differences between instantiations of this building block lie in how nodes discover neighbors. Once an SN has been selected, the Node sends a BIND message to the SN. If the SN has an outstanding request to bind to another SN, it must refuse the incoming bind request in case it forms a loop. Otherwise, it MAY accept the Node as a child as long as selecting it would not cause a loop in the tree. Loop freedom is guaranteed by these two rules: 1. If the requesting node does not have children, the SN can accept it as a child as long as the SN has no outstanding bind requests. If it does have an outstanding bind request, the SN can accept the node as a child if it's address is less than the child's address. 2. If the requesting node has children, the SN can accept it as a child if a. the SN's level is 128, i.e., it is the top of a sub-tree not yet connected to the Sender, or b. the SN's level is less than 128, i.e., it is connected to the Sender. The second rule prevents a node from selecting one of its own children as its parent. Two nodes at level 128 are prevented from selecting each other using tie-breaking criteria described step 1 above. If the SN accepts the Node as a Child, it returns an ACCEPT message. If it does not accept the Node, it sends a REJECT message. If the Node does not receive a response after a reasonable number of tries and timeout period, it MAY select the next best Neighbor from its cached list, or else run the Service Node Discovery process again to determine an alternate SN to try. The REJECT message contains a reason code. If the code indicates that the node was rejected because the SN was not yet on the tree, the node MAY choose to retry that SN after a suitable timeout. The REJECT message may also include a list of alternative SNs for the node to try. The ACCEPT message MUST include the Parent's current Tree Level. The Node MUST set its Tree Level to one more than the Parent's level. If draft-ietf-rmt-bb-tree-config-01 [Page 14] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 the Node itself has Children, it MUST send a HEARTBEAT message containing the new tree level. The ACCEPT message MAY also indicate the starting sequence number of the message from which data reliability is assured. Service Nodes MAY limit the number of children they support depending on their capacity. Once an SN has accepted its maximum number of children, it stops accepting new children until a change in membership causes its count of children to go below this limit. If an SN limits the number of children it supports, it SHOULD reserve at least one child slot for other SNs. This guarantees the growth of the repair tree. A note about late joiners: a late joiner is a node seeking membership in the tree after the Sender has started to send data. In this case, it is possible that some of the data is no longer available within the tree. Nodes may need to have specific information about repair availability before selecting a Parent. In order to accommodate this requirement, the QUERY, BIND, ACCEPT, and HEARTBEAT messages may contain information about both a Node's requirements for repair data, and an SN's ability to provide that data. 7. Tree Maintenance Tree maintenance is an ongoing process active in every Node. Because the tree is based on the operation of SNs, as well as the various underlying metrics that may change over time, it is important that these dependencies be monitored for changes. Nodes MUST monitor Parents for liveliness and changes in tree level, and SHOULD continue to run the Neighbor Discovery and Selection process in order to optimize their choice of SN. Parents must also monitor Children for liveliness. 7.1 Monitoring Parents and Children Parents periodically send out HEARTBEAT messages unicast or on their assigned multicast addresses as a keep-alive to their Children. The HEARTBEAT message MUST contain the Parent's current Tree Level. Children MUST set their Tree Level to one more that their Parent's level. Children respond to their Parent's HEARTBEAT message with a HEARTBEATCONFIRM message. Both messages MAY be suppressed if other messages in the protocol instantiation have recently provided a keep-alive function. If a Child does not receive a HEARTBEAT message from its Parent for a period of time, it MUST attempt to bind with an alternate SN. A Child that is leaving a session MUST send a unicast LEAVE message to its Parent. The Parent SHOULD respond with a LEAVECONFIRM message. A Parent that is leaving the session MUST send a multicast LEAVE message to its Children indicating that they need to bind with an alternate SN. If a Parent does not hear a HEARTBEATCONFIRM message from a Child for a period of time, or it receives a LEAVE message from a Child, it draft-ietf-rmt-bb-tree-config-01 [Page 15] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 removes that Child from its list of Children. 7.2 Optimizing the Tree Implementations of this building block SHOULD continue to run the Neighbor Discovery and Selection process in order to optimize the choice of SN. This continuous process also keeps the distance information for the current Parent up- to-date. Whenever the process returns a better SN than the current one, the Node MAY bind to the new SN. Once the new SN is bound to, the Node MUST send a LEAVE message to the original Parent. A Parent with no Children MAY leave the session. draft-ietf-rmt-bb-tree-config-01 [Page 16] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 8. Messages These messages are required for implementations of this building block. The list below indicates which message contents are required by implementations. Implementations may also include other protocol-specific information in these messages. +------------------+---------------------------------+--------------------------+ | Message Name | Description | | | m-cast or u-cast | | Contents | |------------------+---------------------------------+--------------------------+ | QUERY | A message used to discover | Sender distance, TTL | | both | neighbors. The unicast version | (optional) | | | is sent to POCs; the multicast | | | | version is used in ERS. | | |------------------+---------------------------------+--------------------------+ | ADVERTISE | A message used to advertise a | IP address, port | | both | SN. The unicast version is sent | | | | from the POC; the multicast | | | | version is sent by SN themselves| | |------------------+---------------------------------+--------------------------+ | BIND | request to SN to join tree | | | unicast | | current Tree Level | |------------------+---------------------------------+--------------------------+ | ACCEPT | acceptance of BIND from SN | current Tree Level | | unicast | | | |------------------+---------------------------------+--------------------------+ | REJECT | rejection of BIND from SN | reject reason | | unicast | | alternate SN list | |------------------+---------------------------------+--------------------------+ | LEAVE | Child leaving Parent (u-cast), | | | both | or Parent leaving tree (m-cast) | | |------------------+---------------------------------+--------------------------+ | LEAVECONFIRM | Acknowledgement of LEAVE | | | unicast | message | | |------------------+---------------------------------+--------------------------+ | HEARTBEAT | Periodic keep-alive from Parent | current Tree Level | | both | to Children | Sender distance | | | | Advertisement (optional) | |------------------+---------------------------------+--------------------------+ | HEARTBEATCONFIRM | Acknowledgement of HEARTBEAT | | | unicast | from Child | | |------------------+---------------------------------+--------------------------+ draft-ietf-rmt-bb-tree-config-01 [Page 17] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 9. External Interfaces This section describes external interfaces for the building block. 9.1 Interfaces to this BB. These may be used by a PI, or by a higher- level BB. 9.1.1 findSNs findSNs instructs the BB to start the process of finding SN candidates for this node. findSNs may return immediately with a list of candidate SN, or may use the SNlist interface (see section 9.2.1) to return the list at a later time. 9.1.2 messageFromSN(message) messageFromSN passes along a message received from the SN. The PI or BB should pass messages down to this BB in case they contain information necessary for the BB. 9.1.3 lostSN lostSN notifies the BB that the connection to the SN was lost. 9.1.4 activateSN activateSN notifies the BB to begin SN operation if necessary. 9.1.5 setOptimization(boolean) setOptimization tells the BB to start or stop the tree optimization process. The upper BB or PI may want to control when tree optimization takes place. 9.1.6 setSN(SN) setSN informs the BB that the PI or upper BB has successfully bound to an SN. 9.1.7 acceptMember(node) acceptMember asks the BB to accept or reject the node as a member. The BB returns a boolean value in response. 9.2 Interfaces from this BB to the PI or a higher-level BB. 9.2.1 SNlist(list) SNlist returns to the upper PI or BB a list of SN candidates. 9.2.2 SNactive(boolean) SNactive informs the upper PI or BB that this node has become an SN or has ceased SN operation. draft-ietf-rmt-bb-tree-config-01 [Page 18] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 10. References [CST00] B. Cain, T. Speakman, D. Towsley, "Generic Router Assist (GRA) Building Block - Motivation and Architecture", Internet Draft, Internet Engineering Task Force, March 2000. [DEE89] Deering, S., "Host Extensions for IP Multicasting", RFC 1112. [ECTP00] ITU-T SG7/Q.13 and ISO/IEC JTC1/SC6/WG7, "Enhanced Communication Transport Protocol," ITU-T Draft Recommendation X.ectp and JTC1/SC6 Committee Draft 14476, June, 2000. [HC00] H. Holbrook, B. Cain, "Source-Specific Multicast for IP," Internet Draft, Internet Engineering Task Force, March 2000. [HOD98} R. Hinden, M. O'Dell, S. Deering, "An IPv6 Aggregatable Global Unicast Address Format," Request For Comments 2374, July 1998. [HPW00] M. Handley, C. Perkins, E. Whelan, "Session Announcement Protocol", Internet Draft, Internet Engineering Task Force, March 2000. [HWKFV00] M. Handley, B. Whetten, R. Kermode, S. Floyd, L. Vicisano, and M. Luby, "The Reliable Multicast Design Space for Bulk Data Transfer," Internet Draft, Internet Engineering Task Force, March 2000. [KV00] R. Kermode, L. Vicisano, "Author Guidelines for RMT Building Blocks and Protocol Instantiation Documents," Internet Draft, Internet Engineering Task Force, July, 2000. [LPG98] B.N. Levine, S. Paul, and J.J. Garcia-Luna-Aceves, "Organizing Multicast Receivers Deterministically According to Packet-Loss Correlation," Proc. Sixth ACM International Multimedia Conference (ACM Multimedia 98), Bristol, UK, September 1998. [PTM93] C. Partridge, T. Mendez, and W. Milliken. "Host anycasting service." Request For Comments 1546, November 1993. [WCP00] B. Whetten, D. Chiu, S. Paul, "TRACK Architecture, A Scalable Real-Time Reliable Multicast Protocol," Internet Draft, Internet Engineering Task Force, July 2000. [WLKHFL00] B. Whetten, L. Vicisano, R. Kermode, M. Handley, S. Floyd, and M. Luby, "Reliable Multicast Transport Building Blocks for One-to-Many Bulk-Data Transfer," Internet Draft, Internet Engineering Task Force, March 2000. [YGS95] R. Yavatkar, J. Griffioen and M. Sudan, "A Reliable Dissemination Protocol for Interactive Collaborative Applications", University of Kentucky, 1995. draft-ietf-rmt-bb-tree-config-01 [Page 19] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 Authors' Addresses Brad Cain brad.cain@mirror-image.com Mirror Image 49 Dragon Court Woburn, MA 01801 Dah Ming Chiu dahming.chiu@sun.com Miriam Kadansky miriam.kadansky@sun.com Sun Microsystems Laboratories 1 Network Drive Burlington, MA 01803 Seok Joo Koh sjkoh@pec.etri.re.kr ETRI/Korea 161 Kajong-dong Yusong-Gu, TAEJON, 305-350, KOREA Brian Neil Levine brian@cs.umass.edu Computer Science Department University of Massachusetts Amherst, MA 01003 Dave Thaler dthaler@microsoft.com Microsoft Corporation One Microsoft Way Redmond, WA 98052 Gursel Taskale gursel@talarian.com Brian Whetten whetten@talarian.com Talarian 333 Distel Circle Los Altos, CA 94022-1404 Full Copyright Statement Copyright (C) The Internet Society (2000). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet languages other than English. The limited permissions granted above are perpetual and will not be draft-ietf-rmt-bb-tree-config-01 [Page 20] INTERNET DRAFT draft-ietf-rmt-bb-tree-config-01.txt 20 November 2000 revoked by the Internet Society or its successors or assigns. This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE." draft-ietf-rmt-bb-tree-config-01 [Page 21]