INTERNET DRAFT Tie Liao Expires: 13 April 1999 INRIA 13 October 1998 Light-weight Reliable Multicast Protocol Specification draft-liao-lrmp-00.txt Status of this memo This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress". To view the entire list of current Internet-Drafts, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast). Abstract This document describes LRMP, the Light-weight Reliable Multicast Protocol. LRMP provides a minimum set of functions for end-to-end reliable network transport suitable for bulk data transfer to multiple receivers. LRMP is designed to work in heterogeneous network environments and support multiple data senders. A totally distributed control scheme is used for error recovery so that no prior configuration and no router support are required. LRMP also includes a selective feedback mechanism allowing to monitor the quality of service at receivers. In LRMP, flow and congestion control is performed based on NACK packets and congestion indication from receivers. Forward error correction is supported as an independent optional module. Liao [Page 1] INTERNET-DRAFT LRMP 13 October 1998 Table of Contents 1. Introduction ............................................ 3 1.1 Application Scope ....................................... 3 1.2 Overview ................................................ 4 1.3 Implementation .......................................... 6 2. Definitions ............................................. 7 3. Packet Types and Common Header Fields ................... 8 4. Data Transmission ....................................... 11 4.1 Reliable Data Transmission .............................. 11 4.2 Unreliable Data Transmission ............................ 12 5. Packet Loss Recovery .................................... 12 5.1 Organization of Subgroups ............................... 13 5.2 Enabling/disabling a Subgroup ........................... 14 5.3 Error Report ............................................ 15 5.4 Duplicate NACK Detection ................................ 17 5.5 Retransmission .......................................... 18 5.6 Reception Failure ....................................... 20 5.7 Mean Round Trip Time Estimation ......................... 21 6. Sender and Receiver Reports ............................. 23 6.1 Sender Report ........................................... 23 6.2 Selective Receiver Report ............................... 23 7. Flow and Congestion Control ............................. 25 7.1 Transmission Rate ....................................... 26 7.2 Congestion Control ...................................... 26 7.2.1 Adaptation to the Loss ................................. 26 7.2.2 Adaptation to the Congestion Indication from Receivers .. 27 7.2.3 Getting to the Equilibrium: Rate Increase ............... 28 7.3 The Case of Heterogeneous Receivers ..................... 28 8. Packet Formats .......................................... 29 8.1 Unreliable Data ......................................... 29 8.2 Reliable Data ........................................... 29 8.3 Retransmission Data ..................................... 30 8.4 FEC Redundant Data ...................................... 31 8.5 Error Report ............................................ 31 8.6 NACK Reply .............................................. 32 8.7 Sender Report ........................................... 33 8.8 Receiver Report Selection ............................... 34 8.9 Receiver Report ......................................... 35 9. Options for Forward Error Correction .................... 36 9.1 Parameters .............................................. 37 9.2 Packet Format ........................................... 37 10. Security Considerations ................................. 38 A. Requirements of RFC 2357 ................................ 40 B. References .............................................. 43 C. Acknowledgments ......................................... 45 D. Author's Address ........................................ 45 Liao [Page 2] INTERNET-DRAFT LRMP 13 October 1998 1. Introduction This document describes the Light-weight Reliable Multicast Protocol (LRMP) version 1 which is intended to provide reliable multicast transport service to applications. The reliable service means source ordered and reliable data delivery on an end-to-end basis. 1.1 Application Scope Applications that require group communication are continuously emerging. In group communication, data sent by one host should arrive at multiple receivers. These applications can benefit from IP Multicast [1] that allows one-to-many data delivery at network layer, making optimization possible by replication of data at routers. LRMP (version 1) is intended to be used for a subset of these applications that have the following characteristics: o large-scale communication group. The users participating in group communication are generally dispersed in a wide area network environment, thus their network links are heterogeneous, i.e., the end-to-end transmission delay and bandwidth differ a lot. o large-size communication group. The number of users participating in group communication may be potentially very large. o loose time constraint. This means that the transmission delay is not critical for applications. In another extreme, real time audio and video conference applications need low transmission delay in order for that audio and video data could be played in a synchronized fashion. o reasonable performance. These applications do not need to reach very high data throughput, instead they need to keep data transmission at a reasonable rate. o loosely coupled group. Applications may join or leave the group at any time and at any location. o requiring that data is reliably delivered to receivers. Typical applications include bulk data transfer (e.g., file transfer), multicast chat and some kinds of collaborative work. These characteristics allow to adopt conservative approaches when doing error recovery and flow control in order to keep the total traffic of a multicast session at a reasonable level with regard to data traffic. Liao [Page 3] INTERNET-DRAFT LRMP 13 October 1998 1.2 Overview IP Multicast [1] provides point-to-multipoint datagram delivery service to communicating hosts, i.e., transmission of datagrams from a sender to a group of simultaneous receivers. IP Multicast itself is not reliable, datagrams are not guaranteed to arrive at receivers and in the order they are sent. In unicast case, reliable data delivery is generally achieved by TCP [2] at the transport layer in the Internet environment. In multicast case, reliable multicast transport protocols are needed for applications requiring reliable data transfer. The Light-weight Reliable Multicast Protocol (LRMP) is intended to offer end-to-end reliable network transport services suitable for bulk data transfer to multiple receivers. While some TCP design principles can be applied to reliable multicast, the design philosophy for reliable multicast protocols is rather different. Reliable multicast protocols have to offer good scalability, otherwise they may be potentially dangerous to the Internet if used in a wide scale [5]. A common approach for scalability is to replace positive acknowledgements (ACK) by negative acknowledgements (NACK) [6]. This can reduce considerably the volume of control traffic under normal network conditions. However for large groups, this is insufficient for several reasons. First, the number of NACK packets could still be very high when packet loss occurs simultaneously at many receivers. Second, the resulted control traffic and retransmitted packets still go to every receiver in the group. In particular, heterogeneous environments make these problems much worse. Other techniques are necessary to further improve the scalability, such as duplicate NACK and repair suppression [6], and local error recovery. LRMP is intended to offer good scalability on the current infrastructure of Internet, that is, no routers involved in error recovery [10]. The generated control traffic and retransmissions should be kept at a low level and in a limited scope. In case of reception errors, error recovery is first restricted to its local area, i.e., local group. If hopefully it can be done within the local area, this will be transparent to the sender and receivers in other areas. Local error recovery technique allows to isolate reception errors in a limited region when possible. In this way, it avoids that NACK and retransmitted packets travel across the whole group. In LRMP, there are no central control points, namely representatives, every receiver plays a strictly equal role. Using centralized control places constraints on flexibility [9] [10] [11]. Unless the router Liao [Page 4] INTERNET-DRAFT LRMP 13 October 1998 assisted control scheme can be applied, decentralized control is more appropriate for a general use on the Internet. LRMP does not try to rebuild a tree structure from the multicast tree, instead a totally distributed control scheme is used. Local groups are formed implicitly for local error recovery. Therefore all packets to form local group and assign local group representatives are not necessary. LRMP is a receiver-reliable transport protocol. Each receiver is responsible for ensuring its correct reception of data while it may be cooperative by allowing to send repair packets to its neighbors. The failure at one receiver has generally not much to do with the quality of service (QoS) at other receivers. LRMP attempts to optimize the message exchange for error recovery. For duplicate NACK and repair suppression, some basic mechanisms of SRM [6] are adopted, including NACK based loss report, exponential back-off timers for sending NACK and missing (repair) packets. While there are no representatives in local recovery, a receiver can dynamically play this role for a repair process in a local group. This can further reduce the probability of duplicate NACK and repair packets. It is very difficult, if not impossible, to ensure both high scalability and full reliability in a large scale. Note that at present hosts on the Internet generally have very heterogeneous network links. While one encounters failure in reception, others may have perfectly received all packets. If the missing packets are repeatedly transmitted for a few receivers by the transport protocol while the session is going on, infinite buffer size is theoretically required. LRMP is not intended to be fully reliable for an arbitrary set of receivers, whereas applications could still employ some specific mechanisms to cope with this problem, for example, using a separate session for some particularly slow receivers. In LRMP, receivers either successfully receive all packets or generate reception error events in case of failure. Besides LRMP does not make effort to recover missing packets that are transmitted before a receiver joins a session. In order to get feedback information from receivers, a sender selective receiver report mechanism is included. That allows to trigger the transmission of receiver reports while the required bandwidth is totally under control. This technique is also called sampling. Receiver reports can be considered in some sense as positive ACKs and complement the NACK mechanism to make this protocol fully reliable when the number of receivers is low. LRMP uses a rate based flow and congestion control mechanism to control the traffic sent to the network. Congestion control is necessary to ensure that the network bandwidth is fairly shared among Liao [Page 5] INTERNET-DRAFT LRMP 13 October 1998 a number of competing data flows when network congestion occurs. The flow and congestion control mechanism included in LRMP is intended to be friendly and competitive with other data flows under flow and congestion control such as TCP. By the name, LRMP is intended to be light-weight. The protocol overhead and control traffic is relatively low to increase the efficiency in data transfer. LRMP is intended to be embeddable into an application. LRMP is designed, at least initially, not for use with real time continuous media applications. The performance provided by LRMP, e.g., the transmission delay and throughput, may not satisfy the requirements of these applications. While LRMP is not aimed at providing high performance, the data throughput and transmission delay should be reasonable compared with the quality of service provided by the underlying network. This document does not provide any mechanisms either to do membership control in group communications or to collect information about each session member. Applications are free to join and leave the session at any time and at any location. In addition this document does not include any mechanism for authentication and protection of information conveyed by the specified protocol. But it does provide a description on security constraints that should be taken into account by applications, see section 10 for more details. 1.3 Implementation This specification is based on the lower layer network services that provide identification of a transport end-point by a transport address, transfer of data from or to the given transport address with indication of received datagram length. A transport address is composed of a network address and a port number. Typical example of the target underlying network service is UDP/IP multicast. All traffic including data and control is carried on the group address and port unless otherwise noted. LRMP operates on a single group address/port pair. LRMP could be implemented as an independent module following the layered design principles or as an integrated part of an application following the principles of application level framing [7]. All integer fields are carried in network byte order, that is, most significant byte (octet) first. This byte order is commonly known as Liao [Page 6] INTERNET-DRAFT LRMP 13 October 1998 big-endian. The transmission order is described in detail in [3]. Unless otherwise noted, numeric constants are in decimal (base 10). All header data is aligned to its natural length, i.e., 16-bit fields are aligned on even offsets, 32-bit fields are aligned at offsets divisible by four, etc. Octets designated as padding have the value zero. Wallclock time (absolute time) is represented using the timestamp format of the Network Time Protocol (NTP), which is in seconds relative to 0h UTC on 1 January 1900 [4]. The full resolution NTP timestamp is a 64-bit unsigned fixed-point number with the integer part in the first 32 bits and the fractional part in the last 32 bits. In some fields where a more compact representation is appropriate, only the middle 32 bits are used; that is, the low 16 bits of the integer part and the high 16 bits of the fractional part. The high 16 bits of the integer part must be determined independently. 2. Definitions Application: either a higher layer protocol entity or end user program. It is the source and/or consumer of the data transported by LRMP. FEC: forward error correction, a mechanism that adds redundant packets in the transmitted data stream so that partial missing packets could be recovered without retransmission. Subgroup: the set of protocol entities in a local region which is reachable using a scope (TTL) value from a particular protocol entity. Local recovery: a scheme for requesting and sending missing packets in a subgroup, i.e., whose scope value is less than the session scope value. LRMP entity: or protocol entity, an implementation of LRMP that is able to send, receive and reply to LRMP data and control packets. LRMP receiver: or simply receiver, an LRMP entity that makes efforts to receive data packets sent by other LRMP entities. LRMP sender: or simply sender, an LRMP entity that sends both data and control packets. LRMP session: or simply session, the association among a set of LRMP entities. For each LRMP entity, the session is defined by a Liao [Page 7] INTERNET-DRAFT LRMP 13 October 1998 particular pair of destination transport addresses (one network address plus a port). The destination transport address pair may be common for all LRMP entities, as in the case of IP multicast, or may be different for each, as in the case of individual unicast network addresses plus a common port. NACK: negative acknowledgement used to report missing packets. Entity Identifier: a 32 bit integer that identifies the source of LRMP packets. It is carried in LRMP packet header so as not to be dependent upon the network address. All packets from a source form part of the same sequence number space, so a receiver groups packets by the entity identifier. An LRMP entity should use the same identifier during the lifetime of an LRMP session. TTL: time-to-live, a parameter used in sending IP multicast packets to limit the scope of propagation. 3. Packet Types and Common Header Fields This section defines a minimal set of packet types for data transport and control. Packets other than reliable data are sent in a best effort way, that is, there is no mechanism to recover them. The sender should retransmit these packets in need. All packets from a single protocol entity should use the same entity identifier. Otherwise they are considered coming from a different entity. 3.1 Common Fixed Header All LRMP packets must contain the following fixed header, followed by packet specific options: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |V=1|P| PT | Scope | Length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Entity ID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The fields have the following meaning: Version (V): 2 bits This field identifies the version of LRMP defined by the current specification which is one (1). Liao [Page 8] INTERNET-DRAFT LRMP 13 October 1998 Padding (P): 1 bit If this bit is set, the packet contains one or more additional padding octets at the end which are not part of the application data. The last octet of the packet contains a count of how many octets should be ignored. Packet Type (PT): 5 bits This field identifies the type of the packet. Scope: 8 bits This field indicates the time-to-live (TTL) value used to transmit the packet. Length: 16 bits This field indicates the total length of the packet in number of octets, including the header and data. Entity ID: 32 bits This field contains the entity identifier that uniquely identifies the sender of the packet. Additional packet specific options, if present, should immediately follow this fixed header and should be of an integer number of 32 bit words. 3.2 Packet Types The following packet type (PT) values are defined in this document (for backward compatibility, the type values are not continuous): Value Type Abbreviation 0 Reliable Data DATA 4 Retransmission Data R_DATA 8 Unreliable Data U_DATA 12 FEC Redundant Data F_DATA 17 NACK NACK 18 NACK Reply R_NACK 19 Sender Report SR 20 Receiver Report Selection RS 21 Receiver Report RR The values not included in the above list are reserved for future use. 3.3 Entity Identifier Every protocol entity is assigned with an entity identifier which is a randomly chosen 32 bit integer. The entity identifier must be globally unique within a particular session. Though the probability Liao [Page 9] INTERNET-DRAFT LRMP 13 October 1998 of identifier collision is rather low in theory, to further minimize collision errors, a protocol entity must detect whether or not there is a collision between the received identifiers and its own each time it receives a packet. In the case of collision, it's the receiver which must change its identifier if another entity is a sender. If the concerned protocol entities are receivers, they must all change their identifiers to another randomly chosen identifier. It is expected that the collision is resolved at the initialization phase for senders, i.e., a sender sends several control packets before data transmission to detect possible collisions. At receivers, if a protocol entity sees that two senders use the same identifier from different network locations, it should reject the one which was heard later. A protocol entity should use only one entity identifier during the lifetime of a session. If it changes the identifier during a session, it is considered as a completely new entity. In data transmission, the entity identifier is used to identify the data stream from a sender, i.e., the packets with the same entity identifier form a data stream ordered by continuous sequence numbers. So as to retransmissions by a third party, the original entity identifier is included. 3.4 Scope The scope field of an LRMP packet should be set to the same TTL value with that the packet is sent to the underlying network layer. In LRMP, packets may be transmitted using different TTL values due to local error recovery. The scope field is used by a receiver to approximatively estimate the distance to the packet sender. 3.5 Maximum Packet Length Inadequate data fragmentation may introduce inefficiency in the network and should be generally avoided [12]. Optimal solution requires the cooperation of applications. Therefore proper data fragmentation/reassembly are left to higher layer protocols to perform. LRMP places a upper bound on the maximum length of packets that can be transmitted, in other words, the maximum transmission unit (MTU). The MTU should be set to the minimum MTU of the network path. Modern internet routers support larger and larger MTU than a decade before. Consequently the maximum transmission unit in LRMP is set to 1400 octets, including packet header. Any packets larger than this length are considered not valid and should be dropped by receivers. This Liao [Page 10] INTERNET-DRAFT LRMP 13 October 1998 gives also the advantage to optimize the buffer management at receivers. 3.6 Multiplexing The length field is included in the packet header for multiplexing several data or control packets into one outgoing packet sent to the underlying network. It is not allowed to multiplex packets with different scope value in one compound packet for obvious reasons. LRMP entities need the indication of received data length from the underlying network protocol to correctly parse multiplexed packets, which is supported by most of UDP/IP implementations. 4. Data Transmission This section describes the data transmission mechanism that must be implemented by a protocol entity. LRMP provides a per packet selective reliability, i.e., application data can be sent either reliably or in a best effort way. This allows an application to use the same transport protocol to send reliable and unreliable data. Application must indicate the reliability (a boolean value) when sending data to LRMP. 4.1 Reliable Data Transmission In a normal operation scenario, a data source multicasts a number of DATA packets to a session. Each DATA packet is ordered by a sequence number. A receiver receives these data packets and delivers them to the application if they are in order. If packet loss is detected, the receiver sends a NACK packet to request the retransmission. Any protocol entity that holds the requested lost packets will try to send repair packets (see the next section). In general, data packets, the first time transmitted, are sent to the whole group, that is, the scope field should be set to the time-to-live value of the session accordingly. This scope field informs receivers the maximum TTL value which should be used to report loss and do repair. Each reliable data packet contains a sequence number and the sender's entity identifier. The sequence number is increased by one for each reliable data packet sent, except retransmission. At receivers, a data packet is identified by the entity identifier and the sequence number. All packets from a single source constitute a data stream. The sequence numbers are rounded to 32-bit unsigned integer, thus when overflow occurs, they must take a modulo to 2^32. A sender Liao [Page 11] INTERNET-DRAFT LRMP 13 October 1998 should generally choose a non-zero random number as the initial sequence number. A receiver learns the initial reception sequence number at the time it joins the session. Therefore LRMP implementors should be aware of that the initial outgoing sequence number and the initial incoming sequence number may be different for a particular data sender and receiver. At receivers, the sequence numbers are used to correctly order packets that may be received out of order and to eliminate duplicates. A protocol entity should implement the reordering of received packets before delivering them to the application. An entity should also provide applications with capabilities to configure this option out, so that packets are delivered in reception order. Reliable data transmission is subject to flow and congestion control which is described in section 7. Data sender can send FEC encoded redundant data with the original data stream, see section 9 for more details. 4.2 Unreliable Data Transmission Unreliable packets are not subject to flow control at the sender side and not subject to sequence control at the receiver side. They can be considered as out-of-band packets which are delivered in priority but without guarantee of arrival at receivers. Due to this particularity, unreliable packets are expected to be sent occasionally by an application. When an unreliable packet is received by a receiver, it should be delivered immediately to the application. In addition a receiver does not need to cache such packets in the buffer. 5. Packet Loss Recovery To avoid to generate unnecessary traffic in the whole group, the repair of lost packets is first carried out in local regions. To do this, a session is divided by a particular receiver into a number of subgroups. For a protocol entity, a subgroup represents a subregion limited by a distance radius. Packets sent to a subgroup can not reach other entities outside the subgroup. By limiting error recovery in a subgroup, reception errors could be isolated. Local recovery can be performed by alternating the IP scope value in transmission of packets. Liao [Page 12] INTERNET-DRAFT LRMP 13 October 1998 5.1 Organization of Subgroups Yet another way to define the distance between two hosts is to use the number of hops, in other words, the number of intermediate routers that data packets should transit from the source to the destination. The propagation of an IP datagram is controlled by a scope value, also known as the time-to-live parameter. This scope value can be thus used to control the radius of distance that a multicast packet can reach. In practice, a multicast session is always created with a scope value to indicate at which scope the session is addressed. There is a trade-off between the number of subgroup hierarchies and the efficiency of error recovery. More the number of hierarchies, finer the local error recovery but larger the error recovery time. It could be possible to configure the subgroup hierarchies at runtime by an application, a protocol entity uses by default the following rule for organization of subgroups. Given the time-to-live value (the nominal TTL) for a session, the session must be organized into the following hierarchical subgroups: o the whole group at TTL o subgroup at 63 if TTL > 63 o subgroup at 47 if TTL > 47 o subgroup at 15 if TTL > 15 The top level, i.e., the whole group, must be present in every session. A descendant subgroup is created only if the nominal TTL is greater than the subgroup TTL value. Therefore there could be at maximum four levels of subgroups. For example, a session with TTL value 63 has three levels. A session with TTL value less than and equal to 15 has only one level subgroup. A subgroup contains another subgroup if its ttl value is greater than the latter. Every LRMP packet carries a scope field equal to the ttl value with that the packet is sent to the underlying network. So when a receiver receives a packet (data or control), it knows that the packet sender is surely within a distance less than the scope value carried in the packet. The scope field allows a protocol entity to determine whether or not the packet sender belongs to one of its subgroups. It remains constant during transmission, it should not be confused with the scope field in IP datagrams. It is possible that the multicast route between two hosts is asymmetric, i.e., the number of hops from host A to host B may be different from that from host B to host A. As the gap of the subgroup ttl value is enough large, in a given subgroup, if host A can reach host B, it is most probable that host B can also reach host A. Liao [Page 13] INTERNET-DRAFT LRMP 13 October 1998 Error recovery should be performed within subgroups in increasing distance (ttl) order, i.e., first in the lowest level subgroup, then in the second level subgroup and finally in the whole group. In each subgroup, every receiver plays a strictly equal role, there is no subgroup leader. So no join or leave messages are necessary to explicitly form subgroups while they may still be implemented by applications in case of strong membership control. Each protocol entity is responsible for recovery of its own lost packets. However it should keep track of the control messages received from other protocol entities to eventually cancel the duplicates of control and repair packets. 5.2 Enabling/disabling a Subgroup To avoid unnecessary error report to a subgroup, first of all, a protocol entity should use the heuristic algorithm described here to determine whether or not an error could be recovered in a given subgroup. Basically if there are no other protocol entities within a subgroup, certainly no need to do error recovery in that subgroup. If there are other protocol entities in a subgroup, errors may or may not be recoverable in that subgroup. A receiver enables a subgroup if it considers that there are some chances to recover errors in that subgroup, or disables otherwise. The enable/disable mechanism allows to improve the error recovery time for missing packets. Enable conditions Given a subgroup, enable it if one of the following conditions is true: o at start-up. o a repair packet with the subgroup ttl is heard. o the disabled time is greater than the maximum disabled time, e.g., 3 minutes. The last condition is to prevent the deadlock, that is, all subgroup members are in the disabled state. In this case, no receivers will send error report packets within that subgroup. So a timeout mechanism is necessary to retrigger the error recovery in that subgroup. Disable condition Given a subgroup, disable it if the number of successive subgroup Liao [Page 14] INTERNET-DRAFT LRMP 13 October 1998 scoped error report packets without subgroup scoped reply is greater than or equal to a prefixed number, 5. The whole group must never be disabled since the sender is surely there and will reply to error report packets. If a subgroup is disabled for any reason, a receiver should jump to the next higher level subgroup to do error recovery. 5.3 Error Report The protocol makes no effort to recover packets that were sent before a receiver joins the session. The sequence number of the first DATA packet or extracted from a SR (sender report) packet is considered as the starting sequence number for a particular sender. A receiver detects packet loss either by checking if there is a gap between the sequence numbers of successive received packets or the sequence number given by a SR (sender report) packet does not correspond to the expected sequence number for that sender. 5.3.1 Algorithm When packet loss is detected, error report is carried out from the lowest level subgroup to the whole group until the loss is recovered or the number of maximum tries is reached. An exponential back-off timer is used for duplicate NACK suppression. The subgroup to start error recovery is the lowest level subgroup which is in the enabled state. Given the starting subgroup Gi, a receiver must use the following algorithm to report packet loss by sending a NACK packet, 1. Set the number of tries (N) to 0. 2. Schedules a random timer whose value is uniformly distributed in the range [t1, 2*t1], where t1 = MRTT*2^N and MRTT is the estimated mean round trip time to other protocol entities reachable in the subgroup Gi. 3. During the timer period, if a NACK packet which contains the local NACK is received (see section 5.3.3), goto step 5. 4. When the timer expires, if all lost packets have been received, the repair process terminates, otherwise send a NACK packet to the subgroup Gi. 5. Then check o if the subgroup level is not the highest level, the subgroup level is increased by one, i <= i + 1, goto step 1; Liao [Page 15] INTERNET-DRAFT LRMP 13 October 1998 o if the subgroup level is the highest level, the number of tries (N) is increased by one, N <= N + 1. If the number of tries is greater than or equal to 8, the repair process is abandoned. Otherwise goto step 2. The loss report should be sequential, that is, the lost packets with lower sequence number should be reported first. Unless the current repair process is terminated normally or abnormally, next loss report should not be started. In the worst case, an error report could be carried out once in each subgroup, and eight times in the whole group. 5.3.2 Timer Adjustment As the protocol uses a rate based flow control mechanism (see section 7), the repair packet sender will send data generally in regular time interval. When a NACK packet is received, it is possible that the first repair packet will be sent only after a transmission interval. To avoid to send excessive NACK packets, a transmission interval should be added to the timer value if a NACK packet was already sent or heard from the network. For estimation of the transmission interval used by a particular sender, see section 6.1. 5.3.3 Duplicate NACK Suppression A NACK packet is said containing another NACK packet if they are for the same sender and the reported lost sequence numbers are a superset of the latter. The former is called a super NACK of the latter. For example, given the following two NACKs (see section 8.5) for the same sender, NACK A: lowest seqno = 34567 bitmask = 0x000000ff NACK B: lowest seqno = 34570 bitmask = 0x0000000f NACK A contains NACK B. If a super NACK packet is heard from the network, a receiver does not need to send its NACK since its lost packets are reported by that NACK. Therefore the receiver must not send its NACK packet when the current timer expires, but should schedule the next NACK timer to keep the repair process active. When the next timer expires, the receiver must still suppress its NACK in the following cases, since Liao [Page 16] INTERNET-DRAFT LRMP 13 October 1998 the NACK sender will probably send it again if the lost packets have not been repaired: o Another super NACK was heard in this timer period. o The receiver did not send a NACK at previous timeouts in the current repair process. o Its entity identifier is higher than that of the super NACK packet heard in the last timer period. This procedure is intended to further reduce the probability of duplicate NACK packets at the following timeouts. Therefore when multiple receivers encounter the same lost packets, the receiver with the shortest timer value and with the lowest entity identifier will act as a representative in that loss repair process. To prevent the NACK sender leaving the group or behaving abnormally, a receiver should not stop the repair process when a super NACK packet is heard and should apply the above delay procedure for at most next two timer periods. 5.3.4 Other Delay Conditions When the NACK timer expires, a receiver should delay the transmission of NACK packet and schedules another timer in the following cases, o The first lost packet has been received, it is possible that succeeding repair packets are queued for transmission at the responder side and will arrive soon, except that a R_NACK was received that excludes this possibility. o A R_NACK was received during the timer period but the repair packets are not yet received. 5.4 Duplicate NACK Detection A newly received NACK packet is considered as duplicate if a super NACK packet (containing this one) was received just before. Duplicates may occur under one of the following conditions: o two (or more) protocol entities have a timeout time in an interval which is less than the packet propagation time from one to another. o variable packet propagation time. Liao [Page 17] INTERNET-DRAFT LRMP 13 October 1998 o due to packet loss, a receiver did not see the NACK packet sent by another entity. To distinguish duplicate NACK packets and not to deteriorate the error recovery time, a straightforward way is the following: given the time t1 when the last NACK packet was received and the time t2 when the same reporter will issue the next NACK if the loss is not repaired, all NACK packets received between t1 and t2 could be treated as duplicates. This way, at least a successive NACK sent by the same reporter will not be confused with duplicates. The time interval (t2 - t1) between sending two NACK packets by a same reporter is greater than twice the mean round trip time plus a transmission interval. Therefore a protocol entity should treat a newly received NACK packet as duplicate if there is a super NACK packet which was received within an interval equal to twice the mean round trip time plus a transmission interval. 5.5 Retransmission LRMP entities keep recently sent and received data packets in their buffer space. The total size of the buffer, depending on the used transmission rate, should be enough large to allow loss repair. It is recommended that the buffer size should correspond to 10 seconds - 1 minutes of data packets at the full specified transmission rate (e.g., between 80 KB and 480 KB at 64 kb/s). A NACK packet allows to report a number of adjacent lost packets and contains a scope field that should be used by the NACK responder to restrict the retransmissions to the same scope. Upon receipt of a NACK packet, a protocol entity is said eligible for sending repairs if it holds a part of the requested lost packets. An eligible entity must check first if the NACK is a duplicate for the purpose to eliminate possible duplicate retransmissions. If a NACK is duplicate, it should be ignored. 5.5.1 Algorithm If a NACK is not duplicate, an eligible protocol entity schedules a random timer whose value is uniformly distributed in the range [MRTT, 2*MRTT], where MRTT is the estimated mean round trip time to other protocol entities reachable in the corresponding subgroup. When the timer expires, if the lost packet(s) or a R_NACK (NACK reply) packet have not been heard from other protocol entities, the protocol entity sends first a R_NACK and then the lost packet(s). To further improve the performance and reduce the number of duplicate repair packets, the original sender applies a different algorithm. Liao [Page 18] INTERNET-DRAFT LRMP 13 October 1998 When the sender receives a NACK packet, it checks if this is a duplicate. If it is not, the sender should send immediately the repair packets since one and only one protocol entity will respond immediately. In addition, if the scope field of a NACK packet is the session TTL, all receivers must not reply to the NACK packet and expect that the original sender will send the repair(s) with no delay. This will eliminate duplicate repair packets at the session scope level. This scheme will get still better performance if there is one fixed protocol entity responsible for error control in each subgroup. This means if a unique entity can apply the retransmission algorithm of the original sender in a subgroup, for example, the router or a designated receiver, the error recovery time and duplicate repairs will be get even better. 5.5.2 Sending NACK reply Before sending the repair data (R_DATA) packets, the protocol entity should first send a NACK reply packet. This packet is used to identify the repair packet sender and also used to measure the round trip time between the NACK sender and the repair packet sender. NACK reply packets allow in addition fast partial repair. It is possible that a receiver holds only a part of the reported lost packets. In this case, it is desirable that this receiver will send this part while the remaining lost packets can be repaired by other entities or in a higher level subgroup. In the NACK reply packet, the LRSN field should be set to the sequence number of the first repair packet and the BFRP field to the bitmask of the following repair packets, see section 8.6. Upon reception of a NACK reply packet, the error reporter can determine which packets can not be repaired in the corresponding subgroup and can report them to a higher level subgroup. For example, suppose that the first lost sequence number and the first repair sequence number is the same, the bitmask of the remaining lost packets can be determined by, loss_bitmask &= ~repair_bitmask; 5.5.3 Sending Repairs If a receiver is able to send repair packets to its neighbors, it must keep the original data packets intact. Otherwise it should not respond to NACK packets. LRMP implementors should be aware that accepting packets retransmitted by a third-party represents an Liao [Page 19] INTERNET-DRAFT LRMP 13 October 1998 additional vulnerability to malicious attacks. Repair packets should be scoped within the same subgroup as the received NACK packet. They should be transmitted with higher priority than the transmission of new data. In addition, retransmissions are subject to the same flow and congestion control as in data transmission, see section 7. 5.5.4 Duplicate Repair Suppression In order to avoid duplicate repair packets, all receivers eligible to respond to the NACK packet should use NACK reply packets to detect possible multiple responses. Since an entity should send first a NACK reply before sending the repair packets, upon receipt of such a packet, other entities can determine whether there will be duplicate repairs if they participate in sending repairs. Thus when a NACK reply is received, if a receiver has not yet send a NACK reply, it should cancel its reply process. Otherwise if its entity identifier is higher than that of the NACK reply sender, it must cancel its response, even if they have started to send the repair packets. In absence of NACK reply packets, for example, due to loss, if repair packets are heard from the network, a receiver should still cancel its own retransmissions if its identifier is higher than that of the repair sender. In any case, if a NACK reply packet or repair packets are heard from the original sender, all other entities must cancel their reply and retransmissions. It is considered that the original sender has the highest priority in sending repair packets. 5.6 Reception Failure Reception failure is an event generated when a receiver can not synchronize with the data stream sent by a sender. Reception failures occur in case of network partition or congestion (a receiver behind a slow link), specifically under the following conditions: o when the number of timeouts for a loss event (NACK timer) reaches a threshold, repair packets have still not been received. This threshold depends on the number of error recovery subgroups and has a maximum value of 10. o there is a too large difference between the currently received sequence number and the expected sequence number so that the cache space is insufficient to hold all packets between this difference, which is required for error recovery. Liao [Page 20] INTERNET-DRAFT LRMP 13 October 1998 o there is a too large difference between the sequence number announced by a sender report packet and the expected sequence number. When a reception failure is encountered, a receiver should not try anymore to repair the current missing sequence number, instead it should try to synchronize with the current sequence number of the data stream. Otherwise the sender may not respond to its request for retransmission. In addition, a receiver should notify the application a reception failure event. Some applications may need this notification to drop incomplete data objects. It is desirable that higher layer protocols might include mechanism for retransmission of missing data objects according to the degree of importance. 5.7 Mean Round Trip Time Estimation The sender does not necessarily need to know the round trip time to the receivers for error recovery except for other purposes since it always immediately sends repair packets. Otherwise the round trip time plays an extremely important role in sending error reports and repairs. When sending error reports in a subgroup, a receiver generally does not know who will send repair packets. Thus it uses the mean round trip time to schedule error reports, so does the repair packet sender. All receivers should carefully estimate the mean round trip time values to make the error recovery algorithm work more efficiently. LRMP offers some mechanism for this purpose. At start-up, the initial mean round trip time for each subgroup should be set to a relatively conservative value using the following expression: MRTT = 200*(scope/63)^2 milliseconds where the upper bound is set to 800 milliseconds and the lower bound is set to 12 milliseconds. Consequently the groups at world, region and local site scope have approximatively the following initial mean round trip time values, o MRTT at scope <= 15: 12 milliseconds. o MRTT at scope 63: 200 milliseconds. o MRTT at scope >= 127: 800 milliseconds. The mean round trip time will get closer to its true value in the repair processes. NACK and NACK reply packets allow to measure the Liao [Page 21] INTERNET-DRAFT LRMP 13 October 1998 round trip time between the NACK sender and its responder. Each NACK packet carries a timestamp value (t1) which indicates the local time the NACK packet is sent. When a protocol entity sends the NACK reply packet, it puts this timestamp (t1) and the delay since the NACK packet was received into the reply packet. Upon reception of the NACK reply, the receiver can get a rough estimation of the round trip time as follows: NACK sender t1 (NACK) t2 (NACK_REPLY) ------|-----------------------|--------- \ / \ / ---------|-----------------|------------ NACK responder |<---- delay ---->| RTT = t2 - t1 - delay The mean round trip time in the corresponding subgroup is estimated using the following expression: MRTT = ALPHA*MRTT + (1 - ALPHA)*RTT where ALPHA is a smoothing factor set to 0.875. To avoid multiplication, the above formula can expressed as, MRTT = MRTT + ((RTT - MRTT) >> 3) There are two interesting side effects resulted from this scheme. When the network is not congested, the measured round trip time RTT is generally lower than the initial value that is relatively high. After a receiver sent a NACK packet and has received the repair packet, it will hold a mean round trip time lower than those never send a NACK packet. The next time packet loss occurs in that domain, it is most probably this receiver that will first timeout and send the NACK packet. This receiver will then get the mean round trip time even lower. In this way error report will tend to be done by a small number of receivers. This is a desirable behavior since there will be less concurrency in error report and the mean round trip time will get more realistic. When the network is congested, it is possible that the measured round trip time is larger than or around the initial value. In that case, error report will be carried out eventually by all receivers. It is more difficult to make the mean round trip time realistic. Liao [Page 22] INTERNET-DRAFT LRMP 13 October 1998 6. Sender and Receiver Reports 6.1 Sender Report Sender report packets are intended to provide receivers with the current state in data transmission such as the highest sequence number reached, the number of packets and that of octets sent since the beginning of the session. By analyzing a sender report, a receiver can figure out how many packets or octets it has missed and whether it is synchronized with the data transmission. A data sender should send some sender report packets in the following cases: o before the first time it will send data or it will begin to send data after a long silent time. This is to inform receivers the starting sequence number that will be used in the data transmission that follows. Thus a receiver can determine whether or not it has missed some starting packets. o periodically send a sender report during the data transmission to provide some statistics information to receivers. In this case, the sender sends either a report every two seconds or for every quarter of the send window data sent. o at the end of bulk data transfer. A sender report informs what is the end sequence number so that receivers can determine whether or not it has missed a portion of ending packets. When the data sender enters in idle state, i.e., it has no more data to send, the interval at which sender reports are sent should be doubled every time until the idle time reaches five minutes. The actual transmission rate at which the sender sends data can be computed from two successive sender reports. That is, the difference between the octet counts divided by the time interval. The later can be derived from the timestamp field. Similarly the transmission interval can also be calculated from two successive sender reports. That is, the difference between the packet counts divided by the time interval. 6.2 Selective Receiver Report Receiver reports are useful for both sender and receivers. They have multiple purposes. First they are used for estimation of the session population. It could be interesting to know how many participants are actually in a session. Based on this estimation, the sender could Liao [Page 23] INTERNET-DRAFT LRMP 13 October 1998 decide, for example, to go on or stop the session. Receiver reports are also used to get quality of service feedback from receivers, such as the loss rate, the number of packets and octets received. The feedback can be used to do flow and congestion control and adjust other transmission parameters, for example, the redundancy in the data stream. In addition they allow to know where there is a serious reception problem. By default receivers should not send receiver reports. A data sender sends a RS (receiver report selection) packet to tell receivers how to send receiver reports. The following ways are prescribed here for sending receiver reports: o a receiver sends once a report with a specified probability. This probability could be adapted by the sender to the number of receivers in a session. o a receiver sends a report periodically, but the total traffic (generated by receiver reports) is limited to a small percentage of the total bandwidth, i.e., 5%, like in RTP [23]. A receiver report selection packet contains a list of receivers which selects the receivers which should send receiver report(s). The list can be either a list of entity identifiers or a wild card address (0xffffffff). If it is a wild card address, that means that all receivers are selected. Otherwise only those whose identifier is among the list are selected. The selected receivers should use the probability and report interval parameters contained in the RS packet to control whether and how to send receiver report(s). If the probability field is set to zero, the selected receivers should send receiver reports periodically at the specified interval. However if the total traffic generated by control packets is above the bandwidth limit (5%), a receiver should calculate the report interval with respect to the bandwidth limit. If the probability field is greater than zero, the selected receivers should should use a random event generator (by means of a random value generator) to trigger whether or not they should send a report. If an event is generated with the given probability, the receiver should send only once a report. If both the probability and interval fields are set to zero, this packet must be interpreted as an unselect packet, that is, the selected receivers should stop sending receiver reports triggered by the previous receiver report selection packet. Liao [Page 24] INTERNET-DRAFT LRMP 13 October 1998 In all cases, the report interval should be randomized to avoid receiver report implosion. The randomized interval should be uniformly distributed in the range [I/4, I], where I is the interval given by the RS packet. The receiver report selection is sender specific, that is, a sender can select receivers to send reports just for itself, not for other senders. The settings carried in an RS packet remain valid until another RS packet modifies them. If it is absolutely necessary to modify current settings, the sender should send the new RS packet more than once to prevent possible loss. To have an approximative estimate of the session population, the probability field could be set to a non zero value and the list of receivers set to a wild card address. After the time corresponding to the interval field of the packet has elapsed, both the sender and receivers can estimate the number of receivers using the following expression, population = number of responders/probability. Note that due to possible packet loss, the real population is generally greater than this estimate. The round trip time RTT between a sender and a receiver can also be measured using a pair of sender report and receiver report, similar to the mechanism described in section 5.7. 7. Flow and Congestion Control Flow and congestion control constitutes a vital component of this protocol. First applications need flow control to adjust the transmission rate so that the data flow does not go above the processing capacity of the application. Otherwise packets may be lost at end hosts due to lack of processing power. Second the network can generally offer limited throughput. Even if there is a single data flow, it should still be adapted to the available network bandwidth, generally limited by the most bottleneck network path, to avoid congestion and achieve better performance. Third when there are several data flows in the network, it is extremely important that one flow does not shutdown the others. Instead the network bandwidth should be fairly shared among them. Besides a data flow should be competitive to ensure the responsiveness for applications. The rate-based control mechanism described here is intended to be friendly with other data flows under flow and congestion control such as TCP. Liao [Page 25] INTERNET-DRAFT LRMP 13 October 1998 7.1 Transmission Rate Data transmission and retransmission must be kept within a range specified by an application, in form of [Rmin, Rmax], where Rmin and Rmax are the lower and upper bounds of the data rate. If an application does not specify the rate range, the default range should be applied, for example, in [8 kbits/s, 64 kbits/s]. While the specification of rate range requires some experience in network communications, it is not a difficult task in most cases. At startup, the data rate R should be increased gradually up to the maximum rate, similar to the slow start in TCP [14]. The initial rate should be set to (Rmin + Rmax)/2. Then for every eighth of send window data sent, increase the rate by 0.125R if no contrary indications. Under normal network conditions, the data rate will reach the maximum rate at most after an amount of data corresponding to one and half send window have been sent. As a closed-loop approach is adopted, if the network is partitioned near the sender, no feedback will be returned from the network. In this case, it is desirable that the sender keeps the transmission at the minimum rate. 7.2 Congestion Control Flow and congestion control is performed at the sender side according to the feedback information gathered from the network. A network congestion problem can be reflected by the following parameters: o the information carried in NACK packets, such as the lowest lost sequence number and the bitmask. Congestion will cause more packets being lost, thus the number of retransmissions will increase. o indication from receivers. Due to local error recovery, additional traffic can be generated in local region that will not be seen by the sender. This may result in local congestion problem which needs to be reported to the sender. A sender generally maintains transmitted data in its buffer space, called send window, for possible retransmissions. Congestion control should function adaptively with regard to the above parameters and the send window size. To reduce unnecessary oscillations of the transmission rate, two adjustment operations should not be performed in too small time interval. It is recommended that adjustments are regularly carried out for every eighth of send window data sent. Liao [Page 26] INTERNET-DRAFT LRMP 13 October 1998 7.2.1 Adaptation to the Loss NACK packets can be used together with send window to limit the number of outstanding packets in the network, similar to ACK packets used in unicast. However the number of NACK packets heard from the network may not timely indicate a congestion problem due to NACK suppression and the fact that one NACK packet can report more than one lost packets. Congestion control that only makes use of this information will not respond quickly to congestion problem. But the lowest sequence number requested for retransmission by a NACK packet can be a good indication of congestion problem. If the lowest sequence number requested for retransmission is going out of the range maintained within the send window, that means that the sender is going ahead too fast and the error recovery will be soon impossible. If the sender does not slow down the data transmission, some receivers will not be able to synchronize with the data flow. To respond to this problem, a sender should calculate the difference between the current sequence number in transmission and the lowest sequence number requested for retransmission by a NACK packet. The rate should be adjusted as follows, given the current rate R: o If the difference is greater than half send window size, the transmission rate should be reduced to R/4. o If the difference is greater than one third of send window size, the transmission rate is reduced to R/2. o If the difference is greater than a quarter of send window size, the transmission rate is reduced to 3R/4. If the rate is not decreased according to the above criteria, it should not be increased at the next adjustment in presence of loss. 7.2.2 Adaptation to Congestion Indication from Receivers As described above, additional flow and congestion control is needed for the following reasons. First a receiver may not be able to synchronize with the data flow due to the fact that the application can not timely process received data. That may cause receive buffer overflow. Second the traffic introduced by local error recovery may cause local congestion problem. Congestion indication from receiver is intended to report possible local congestion problem that is Liao [Page 27] INTERNET-DRAFT LRMP 13 October 1998 useful for congestion avoidance. It was first proposed by [15]. The congestion flag in receiver report is used for this purpose. It should be set if the difference between the last sequence number delivered to the application and the highest sequence number heard from the sender is greater than half the receive window (buffer) size. This means that the number of outstanding packets is going to be too high so that the receiver will not be able to repair loss due to lack of buffer space. It is thus desirable for the sender to reduce the transmission rate in this case. Upon reception of a congestion indication, a sender should reduce the transmission rate to R/2 [22]. 7.2.3 Getting to the Equilibrium: Rate Increase If no decrease condition is true and no loss is reported, as described above, and one eighth of send window time has elapsed since the last decrease or increase, the rate should be increased by a factor of 0.125 (1/8) up to the maximum rate. With this control policy, one decrease to the half rate requires roughly six increases to reach the original rate. 7.3 The Case of Heterogeneous Receivers For large groups and in heterogeneous environments, it is often the case that a few receivers have very slow network connections than the others. In many cases, it is desirable that the slowest receivers do not block the data transmission for the majority of receivers. In particular, interactive applications require that new data is timely presented to the end users. Strictly applying the mechanism described above may lead the data rate always to the minimum in presence of very slow receivers. To avoid this to take place, several precautions should be taken: o the application should be advised to use an adequate minimum rate. But the minimum rate should be enough low to ensure the fairness of adaptive congestion control. o a sender can ignore the congestion signals from the slowest receivers. To do this, the sender needs to keep trace of the hosts that caused most often the rate decreases. In the case where the output queue is full most of time, the sender could make a heuristic decision to ignore further congestion signals from these hosts. Liao [Page 28] INTERNET-DRAFT LRMP 13 October 1998 o a receiver should leave the session if it encounters a high loss rate and the data rate is not reduced by the sender accordingly. For normal receivers, this is necessary to reject too aggressive data flows and possible traffic attacks. For very slow receivers, this is particularly necessary not to make their congested network links even worse and also not to block the session for the majority of receivers. The receiver may try to rejoin the session at later time. If the problem persists after several rejoin attempts, the receiver may definitely abandon this session. In particularly heterogeneous environments, layered multicast could be useful [24]. That is, an application can use multiple groups, each of them addresses a set of receivers with a particular level of QoS. 8. Packet Formats This section defines the packet formats that a protocol entity should implement to send data and control packets. A protocol entity should ignore those packets if their types are not recognized. 8.1 Unreliable Data Unreliable data packets have no additional options. They are transmitted in a best effort way and are not subject to sequence control. If they are lost, no efforts should be made for recovery. Therefore every unreliable data packet has a fixed header of 8 bytes. 8.2 Reliable Data Reliable data packets must contain the following options in addition to the common fixed header. In total, reliable data packets have a header of 16 bytes. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | timestamp | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | sequence number | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ | application data | | .... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The fields have the following meaning: Liao [Page 29] INTERNET-DRAFT LRMP 13 October 1998 timestamp: 32 bits This field indicates the time at which this packet is sent. It is encoded as the middle 32 bits of NTP time, i.e., in units of fraction of 1/65536 second. sequence number: 32 bits This field indicates the sequence number of this packet. Application data with variable length should follow after the data packet header. The length of application data and packet header must fit into the maximum transmission unit. By doing a difference between the timestamp fields of two successive data packets, receivers can get a rough estimate of the transmission interval. This interval is only meaningful if the data flow is continuous. 8.3 Retransmission Data Retransmission data packets are sent in response to NACK packets. They must contain the following options in addition to the common fixed header: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | original entity ID | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | sequence number | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ | application data | | .... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The fields have the following meaning: original entity ID: 32 bits This field contains the entity identifier of the original sender of the data packet. sequence number: 32 bits This field gives the sequence number of the packet which should be the same as received from the original sender. It is prescribed that retransmission data packets have the same header length as reliable data packets for two reasons. First, to minimize necessary operations for retransmissions. The NACK responder just needs to modify the first few fields in the original Liao [Page 30] INTERNET-DRAFT LRMP 13 October 1998 data packet. Second, to avoid the total packet length exceeding the maximum transmission unit by appending additional fields in the header. 8.4 FEC Redundant Data A protocol entity may implement the forward error correction mechanism which is optional. FEC data packet format is specified in a way such that the data stream can still be correctly received even if FEC is not supported by a particular implementation. See section 9 for details. 8.5 Error report Error report (NACK) packets are used to report a set of adjacent lost packets and request the retransmission. NACK packets must contain the following options in addition to the common fixed header: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | timestamp | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ | srcID 1 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | LLSN | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | BFLP | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ | (next srcID ...) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The fields have the following meaning: timestamp: 32 bits This field indicates the time at which this packet is sent. It is encoded as the middle 32 bits of NTP time, i.e., in units of fraction of 1/65536 second. srcID 1: 32 bits This field indicates the entity identifier of the first data source to which packet loss is reported. LLSN (Lowest Lost Sequence Number): 32 bits This field indicates the sequence number of the first lost packet. Liao [Page 31] INTERNET-DRAFT LRMP 13 October 1998 BFLP (Bitmask of the Following Lost Packets): 32 bits This field indicates the bitmask of the following lost packets. In the bitmask, the bit k (1 <= k <= 32) is set to 1 if the corresponding sequence number (LLSN + k) is lost. The value 0x0101 means that packets with sequence number LLSN, LLSN + 1, LLSN + 9 are lost. next srcID: 32 bits This field indicates the entity identifier of the next data source to which packet loss is reported, followed by LLSN and BFLP fields. The timestamp is included for the purpose to measure the round trip time between the error reporter and the NACK packet responder. 8.6 NACK Reply NACK reply packets are sent in response to NACK packets and before sending repairs. They must contain the following options in addition to the common fixed header: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | NACK Reporter | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | NACK timestamp | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | delay since received NACK (DSRN) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | NACK destinator | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | LRSN | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | BFRP | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ | (next NACK Reporter ...) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The fields have the following meaning: NACK Reporter: 32 bits the entity identifier of the entity which reported the packet loss. NACK timestamp: 32 bits This field is a copy of the timestamp field carried in the NACK packet. Liao [Page 32] INTERNET-DRAFT LRMP 13 October 1998 delay since received NACK (DSRN): 32 bits This field indicates the time passed from the reception of the NACK packet to the reply. It is encoded in the fractions of 1/65536 seconds. The NACK reporter can use the NACK timestamp and the delay to estimate the round trip time to the responder. NACK destinator: 32 bits the entity identifier of the original data sender to which the packet loss is reported. LRSN (Lowest Repair Sequence Number): 32 bits This field indicates the lowest sequence number of the repair packets that will be sent by the responder. BFRP (Bitmask of the Following Repair Packets): 32 bits This field indicates the bitmask of the following repair packets. In the bitmask, the bit k (1 <= k <= 32) is set to 1 if the corresponding sequence number (LRSN + k) will be sent as repair. The value 0x0101 means that packets with sequence number LRSN, LRSN + 1, LRSN + 9 will be sent. 8.7 Sender Report Sender reports must contain the following options in addition to the common fixed header: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | timestamp | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | next sequence number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | packet count | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | octet count | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The fields have the following meaning: timestamp: 32 bits This field indicates the time at which this packet is sent. It is encoded as the middle 32 bits of NTP time, i.e., in units of fraction of 1/65536 second. Liao [Page 33] INTERNET-DRAFT LRMP 13 October 1998 Next Sequence Number: 32 bits This field indicates the sequence number to be used in the next data packet. If the sender has not yet sent data packets (i.e., the packet count is 0), this field informs receivers the initial sequence number. Upon receipt of this packet, receivers could either set the initial sequence number for this sender or check that if they have received all packets till this number minus one. Packet Count: 32 bits This field indicates the total number of packets sent by the sender since the creation of the session. Octet Count: 32 bits This field indicates the total number of octets sent by the sender since the creation of the session. 8.8 Receiver Report Selection Receiver report selection packets must contain the following options in addition to the common fixed header: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | timestamp | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | probability | interval | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | list of receivers | | (variable length) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The fields have the following meaning: Timestamp: 32 bits This field indicates the time at which this packet is sent. It is encoded as the middle 32 bits of NTP time. Its purpose is to measure the round trip time between the sender and a receiver. Probability: 16 bit This field indicates the probability that a receiver should use to trigger the transmission of a receiver report. It is encoded in fractions of 1/65536. Interval: 16 bit This field indicates the time interval to use to randomize the report time. It is in number of seconds. Liao [Page 34] INTERNET-DRAFT LRMP 13 October 1998 List of receivers: variable length This field contains either a wild card address (0xffffffff) or a list of entity identifiers of the receivers that are selected to send receiver reports. 8.9 Receiver Report Receiver reports are sent only upon selection by the sender. While the report selection is sender specific, a receiver report packet may contain several receiver reports, each of them destinated to a specific sender. Receiver reports must contain the following options in addition to the common fixed header: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | srcID 1 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | RS timestamp | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | delay since RS (DSRS) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | expected sequence number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |C| loss rate | cumulative number of packets lost | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ | (next srcID ...) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The fields have the following meaning: srcID 1: 32 bits the entity identifier of the data source to which the report is addressed. RS timestamp: 32 bits This field is a copy of the timestamp field carried in the receiver report selection packet. delay since RS (DSRS): 32 bits This field indicates the delay between the reception of the receiver report selection packet and the reply. It is encoded in the fractions of 1/65536 seconds. The sender can use the timestamp and the delay to estimate the round trip time to the reporter. Liao [Page 35] INTERNET-DRAFT LRMP 13 October 1998 expected sequence number: 32 bits This field indicates the expected sequence number from the sender. It implies that all packets less than this number have been correctly received. Thus it can be considered as a positive acknowledgement to all packets till this number minus one. C (Congestion): 8 bits This field is the congestion flag. If set, it indicates that the network path is congested between the sender and the receiver. loss rate: 8 bits This field indicates the fraction of packets lost since last report or since the receiver joined the session if no report was sent before. It is expressed in fractions of 1/128. cumulative number of packets lost: 24 bits This field indicates the total number of packets lost since the receiver joined the session. 9. Options for Forward Error Correction This section describes how to integrate forward error correction (FEC) into the protocol. Proper use of FEC technique can substantially improve the scalability due to the fact that it reduces the number of feedback packets from receivers [16] [17] [18] [19]. FEC is particularly useful in very large groups and over asymmetric network links where no or little bandwidth is available on feedback channels. Since the use of FEC is optional, the FEC integration scheme described here allows that implementations without FEC support can correctly receive data stream by ignoring FEC redundant packets. There are many ways to send FEC encoded packets. For example, FEC encoded packets can be sent either automatically or upon reception of NACK packets as repairs. This document is not intended to define what transmission schemes should be used by the sender, rather it only specifies the necessary elements that allows to implement a generic receiver. Thus no restrictions are put on how FEC packets should be transmitted as long as they allow correct recovery at receivers. Implementations are free to choose an adequate transmission scheme. Liao [Page 36] INTERNET-DRAFT LRMP 13 October 1998 9.1 Parameters FEC is based on the principle of adding some redundant packets to the transmitted data stream so that the original data packets can be recovered without retransmission up to certain loss. This can avoid generating NACK packets [18] [19]. FEC can be implemented using erasure codes [16]. More specifically, the data and redundant packets are grouped into blocks. A block contains k data packets and (n-k) redundant packets, where n is the size of a block. The (n-k) redundant packets are calculated from k data packets. At reception if k out of n packets are received, the original data packets can be reconstructed. To increase the efficiency of recovery in case of burst loss, a spacing parameter is introduced. That is, let the base sequence number be b and the spacing between sequence numbers be s. The sequence numbers of original data packets in a block are composed of: b, b + s, b + 2s, b + 3s, ..., b + (k-1)s Larger spacing parameter and large block size (n) requires more buffer space at receivers. In addition, for large block size, receivers that newly joined the session may need longer time to be able to start decoding FEC packets. FEC packets that belong to a same block are referenced by the same base sequence number, b. In addition, they should contain the block size (n), the number of redundant packets (n-k) and the spacing parameter. 9.2 FEC Packet Format FEC encoded redundant packets have the same header length as DATA packets and must contain the following options in addition to the common fixed header: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | BSN | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | BS | NR | spacing | RC | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ | FEC data | | .... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Liao [Page 37] INTERNET-DRAFT LRMP 13 October 1998 The fields have the following meaning: BSN (Base Sequence Number): 32 bits This field indicates the starting sequence number of the original data packets from which the redundant packets are calculated. BS (Block Size): 8 bits This field is the size of the current block minus one. NR (Number of Redundant Packets): 8 bits This field is the number of redundant packets in the current block minus one. spacing: 8 bits This field is the spacing between the sequence numbers of DATA packets minus one. RC (Redundant packet Count): 8 bits This field indicates the relative redundant packet number, starting from 0, in this block, i.e., 0 <= RC < NR. FEC data: variable length This field contains the redundant data calculated from the original data packets. 10. Security Considerations LRMP suffers from the same security liabilities as unicast protocols, for example, an imposter can fake source or destination network addresses, or change the header or data payload. This section describes the security constraints present in this protocol in addition to unicast protocols. Multicast is in general more vulnerable to malicious attacks. Implementations should take these constraints into account in order to ensure the normal working of the protocol. It is also desirable that an implementation offers possibilities to log detailed trace in the case where a possible attack is detected. 10.1 Traffic Attacks A malicious or misbehaving site may send as much data as possible to a session. This can result in that the session will be flooded by excessive data traffic so that the intended data traffic can not be transmitted as expected. At the worst, it can lead to a denial of service attack. If a protocol entity is detected doing so, the packets received from that entity should be dropped before further processing. The host doing this attack should also be indicated to Liao [Page 38] INTERNET-DRAFT LRMP 13 October 1998 the network administrator so that possible prevention could be made. 10.2 Error Recovery A malicious or misbehaving site may continuously request unnecessary retransmissions by sending excessive NACK packets. Since the retransmission will take some useful bandwidth, this can slow down the transmission of new data. To avoid this problem, a protocol entity can place an upper bound to the number of NACKs that can be generated by a receiver, for example, in form of ratio of NACKs/DATA packets, as well as an upper bound to the number of retransmissions of a same data packet. A protocol entity should reject NACK packets received from a receiver if the number of NACKs received from this receiver is abnormally much higher than any other receivers. It is possible that a protocol entity modifies the data packets received from the original sender and sends them as repairs. In absence of low level authentication, it is recommended that applications use some mechanism to authenticate the information carried by the protocol. 10.3 Entity Identifier A malicious or misbehaving site may use an entity identifier already used by another entity to send data or erroneous control packets. In this case, it is important that a protocol entity correctly implements the mechanism described in section 3.3 to deny packets from a forged entity. A malicious or misbehaving site may also use a low entity identifier in repair process, either for error report or sending repairs. This entity may send or respond to NACK packets, then stop the repair process. If a protocol entity implements incorrectly the repair procedure, loss repair may depend on the entity with low entity identifier and thus can be blocked. A malicious or misbehaving site may change very frequently its entity identifier. If an implementation keeps the list of all entities heard in the session, that will make the size of the list continuously growing and consequently will increase the memory usage. To avoid this type of problems, an implementation can take two precautions. First it can limit the size of the list. If the number of entities exceeds the maximum size, old entries are removed. Second it can reuse an old entry that is from the same network address as the new entity and has not been heard since a long time. Liao [Page 39] INTERNET-DRAFT LRMP 13 October 1998 10.4 Round Trip Time A malicious or misbehaving site may try to make the estimation of round trip time inadequate by putting bad delay either in NACK reply or receiver report. To prevent this type of attacks, a protocol entity should place a lower bound and a upper bound to measured round trip time and limit the portion contributed by a single entity to the mean round trip time. 10.5 Congestion Control A malicious or misbehaving site may try to make the transmission rate too low by sending false NACK packets or congestion indication in the receiver reports. It is desirable to advise the users to put a reasonable lower bound for the transmission rate. In addition, implementations can ignore those NACK packets or congestion indications if they come too often from the same site. 10.6 Confidentiality There is no data encryption prescribed by this protocol. Implementors should be aware that as the information is transmitted in clear, all receivers can receive and eventually interpret the information. If an application wants that only the intended receivers decode the information, it should use an encryption algorithm to protect the information. 10.7 Level of Redundancy Transmission of redundant packets adds additional traffic to the data transmission. When packet loss occurs, redundant packets could be useful for recovering loss without feedback from receivers. However transmission of large amounts of redundant data could increase network congestion, leading to a contradictory effect of the use of forward error correction. In the worst case, it can lead to a denial of service attack. A protocol entity implementing FEC in data transmission should use the feedback from receivers such as the number of NACKs, round trip time variation to adjust the level of redundancy in the data stream. A. Requirements of RFC 2357 This section describes how well this specification satisfies the criteria of RFC 2357 [5] and future work. LRMP has been implemented as a reusable library and a number of tests have been carried out [26]. More performance evaluation results will be presented later. Liao [Page 40] INTERNET-DRAFT LRMP 13 October 1998 A.1 Scalability The basic error recovery scheme of LRMP is derived from SRM and thus has similar behavior as SRM which is well analyzed in [6] for different network topologies. The parameters that influence the scalability are mainly the number of duplicate NACKs and that of duplicate repairs in case of loss. From [6], these numbers remain enough stable with the group size. From our experiments [26], these numbers increase slightly with the group size due to various practical reasons, e.g., variable round trip time and transmission rate. But they remain enough reasonable. LRMP adds some overhead by adopting local error recovery. If an error could be recovered in a local group, LRMP will behave better than the original scheme of SRM. Otherwise several messages could be generated in the local groups. Since NACK packets are of small size (24 bytes), the additional traffic is not significant. In general, data packets can be considered much costly than NACK packets in terms of network resource usage. Besides the NACKs sent to the local groups without local repair are not useless, they have also a positive effect, called election phase. That is, they can be thought to elect the protocol entity which will report the packet loss to a higher level group. This can reduce the probability of duplicate NACKs in subgroups with larger TTL values. The scalability of the error recovery scheme is improved in case of burst loss due to the fact that a NACK packet can report up to 33 lost packets. Compared to one NACK per lost packet, the probability of duplicate NACK packets is considerably reduced. When there are multiple senders in a session, the packet loss for one sender is handled independently of the others. It is prescribed that several control packets for different senders can be multiplexed into one outgoing packet if they are sent within the same scope. In addition of the network bandwidth, the cost of multiple senders is the cache space for out-of-order packets. More senders require more cache space. Therefore LRMP can generally support a limited number of senders which depends on available system resources. A sender does not need to maintain the state about every receiver. It generally needs only to maintain several recently heard receivers for the purpose to detect the slowest receivers or possible attacks. Similarly receivers only need to maintain the sender state. Thus LRMP should be enough scalable to large groups in terms of system resource usage. Liao [Page 41] INTERNET-DRAFT LRMP 13 October 1998 A.2 Congestion Control The congestion control scheme works in function of the send window size and loss rate (which is reflected by NACKs). For reasonable send window size, e.g., 32 or 64, the congestion control scheme reacts rather quickly in case of congestion problems. Overall data rate can very quickly drop to the minimum as data retransmission is also subject to the same congestion control as transmission of new data. Further simulation and test are needed to see what fairness can be achieved with competing TCP-like data flows [25]. It should be noted that it is especially hard to behave the same way as TCP while a different algorithm is used. However it could be easier make the congestion control less aggressive than TCP. For large groups and heterogeneous environments, congestion control may often lead the data rate to the minimum due to a few slow receivers. A number of heuristic methods are proposed in section 7.3 to tackle this problem. In particular, a receiver should leave the session if it deduces that there is serious mismatch between the session data rate and the available bandwidth. It is worth noting that though LRMP was designed for heterogeneous environments, a significant part of users use it in well controlled environments such as private networks where receivers are rather homogeneous. A.3 Congestion Avoidance It is hard to do congestion avoidance when packet loss is not present. As indicated in [27], in lack of any congestion indication from the network, the only available information about the possible congested network path is the packet loss and propagation delay. The possibility to do rate adaptation according to the round trip time variance was explored. To make the rate adaptation pertinent, a sender must know the round trip times to all receivers. Yet these RTTs need to be enough fresh to be usable for congestion control. The need to frequently measure the RTTs to all receivers adds a lot of complexity and makes the scalability worse. Using the mean RTT to all receivers for rate adaptation leads too often to unnecessary adjustments and is not suitable for heterogeneous environments due to the diversity of round trip times to different receivers. Thus LRMP uses only the packet loss information for congestion control. LRMP can respond to congestion problem at very early stage Liao [Page 42] INTERNET-DRAFT LRMP 13 October 1998 once packet loss is detected. Again we need to look further into the consequences of this scheme to see the behavior at both sides (LRMP and TCP) and what is the bandwidth share. A.4 Security Concerns A number of security constraints are presented in section 10. They should be carefully taken into account by any implementation. The current specification does not handle the issues related to the authentication and encryption of information. It is recommended that an application should make use of adequate techniques to ensure the authentication, integrity, and confidentiality of the information transported by LRMP. B. References [1] S. Deering, "Host extensions for IP multicasting", RFC 1112, August 1989. [2] J. Postel, "Transmission Control Protocol", STD 7, RFC 793, USC/Information Sciences Institute, Sept. 1981. [3] J. Postel, "Internet Protocol", STD 5, RFC 791, USC/Information Sciences Institute, Sept. 1981. [4] D. Mills, "Network Time Protocol Version 3", RFC 1305, UDEL, March 1992. [5] A. Mankin, A. Romanow, S. Bradner, V. Paxson., "IETF Criteria for Evaluating Reliable Multicast Transport and Application Protocols", RFC 2357, June 1998. [6] S. Floyd, V. Jacobson, S. McCanne, C. Liu, L. Zhang, "A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing", ACM Transactions on Networking, Nov. 1996. [7] D. D. Clark and D. L. Tennenhouse, "Architectural considerations for a new generation of protocols," in SIGCOMM Symposium on Communications Architectures and Protocols, (Philadelphia, Pennsylvania), pp. 200-208, IEEE, Sept. 1990. Computer Communications Review, Vol. 20(4), Sept. 1990. [8] J. C. Lin and S. Paul, "RMTP: A Reliable Multicast Transport Protocol", Proceedings of IEEE INFOCOM '96, March 1996, pp.1414 - 1424. Liao [Page 43] INTERNET-DRAFT LRMP 13 October 1998 [9] B. Whetten et al., "The RMTP-II Protocol", Internet Draft, draft-whtten-rmtp-ii-00.txt, April 1998. [10] T. Speakman, D. Farinacci, S. Lin, and A. Tweedly, "Pragmatic Group Multicast (PGM) Transport Protocol Specification", Internet Draft, draft-speakman-pgm-spec-01.txt, Jan. 1998. [11] M. Hofmann, "Enabling Group Communication in Global Networks", Proc. of Global Networking '97, Volume II, pp321-330, Calgary, Alberta, Canada, June 1997. [12] C. A. Kent and J. C. Mogul, "Fragmentation Considered Harmful", Proc. of SIGCOMM '87, August 1987. [13] P. Karn and C. Partridge, "Improving Round-Trip Time Estimates in Reliable Transport Protocol", ACM Transactions on Computer Systems (TOCS), Vol. 9, No. 4, pp. 364-373, November 1991. [14] V. Jacobson, "Congestion Avoidance and Control", Computer Communications Review, vol.18, pp314-329, August 1988. [15] K.K. Ramakrishnan and R. Jain, "A Binary Feedback Scheme for Congestion Avoidance in Computer Networks", ACM Transactions on Computer Systems (TOCS), Vol. 8, No. 2, pp 158-181, May, 1990. [16] R. E. Blahut, "Theory and Practice of Error Control Codes", Addison Wesley, 1984. [17] C. Perkins and O. Hodson, "Options for Repair of Streaming Media", RFC 2354, UCL, Jube 1998. [18] L. Rizzo, "Effective erasure codes for reliable computer communication protocols", ACM Computer Communication Review, Vol.27, no.2, April 1997. [19] J. Nonnenmacher, E. W. Biersack and D. Towsley, "Parity-Based Loss Recovery for Reliable Multicast Transmission", Proc. of SIGCOMM '97, Sept. 1997. [20] D. Meyer, "Administratively Scoped IP Multicast", RFC 2365, University of Oregon, July 1998. [21] S. Floyd and K. Fall, "Promoting the Use of End-to-End Congestion Control in the Internet", submitted to IEEE/ACM Transactions on Networking, Feb. 1998. Liao [Page 44] INTERNET-DRAFT LRMP 13 October 1998 [22] Dah Ming Chiu, "Flow and Congestion Control in Reliable Multicast", Presentation at IRTF RM group meeting, UCL, July 1998, available from http://www.east.isi.edu/RMRG/. [23] H. Schulzrinne, S. Casner, R. Frederick and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889, January 1996. [24] L. Vicisano, L. Rizzo and J. Crowcroft, "TCP-like congestion control for layered multicast data transfer", IEEE Infocom '98, March 1998. [25] M. Handley, "Reference simulations for multicast congestion control", Presentation at IRTF RM group meeting, UCL, July 1998, available from http://www.east.isi.edu/RMRG/. [26] T. Liao, "Light-weight Reliable Multicast Protocol", Technical report, available at http://webcanal.inria.fr/lrmp/lrmp_paper.html, October 1998. [27] Jamal Golestani, "Fundamental Observations on Multicast Congestion Control in the Internet", Presentation at IRTF RM group meeting, UCL, July 1998, available at http://www.bell-labs.com/user/golestani/jamal1.ps. C. Acknowledgments Thanks to the IRTF RM group for useful ideas and specially to the following people for constructive suggestions and support: Jon Crowcroft (UCL), Jean-Francois Abramatic (INRIA/W3C), Yves Peynaud (INRIA). D. Author's Address Tie Liao INRIA Domaine Voluceau - BP 105 78153 Le Chesnay Cedex France Email: Tie.Liao@inria.fr Liao [Page 45]