idnits 2.17.1 draft-liao-lrmp-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-26) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** There are 4 instances of too long lines in the document, the longest one being 2 characters in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == Line 1010 has weird spacing: '... sender t1 ...' == Line 1428 has weird spacing: '... modify the f...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (13 October 1998) is 9327 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? '1' on line 2014 looks like a reference -- Missing reference section? '2' on line 2017 looks like a reference -- Missing reference section? '5' on line 2026 looks like a reference -- Missing reference section? '6' on line 2030 looks like a reference -- Missing reference section? '10' on line 2048 looks like a reference -- Missing reference section? '9' on line 2045 looks like a reference -- Missing reference section? '11' on line 2052 looks like a reference -- Missing reference section? '7' on line 2035 looks like a reference -- Missing reference section? '3' on line 2020 looks like a reference -- Missing reference section? '4' on line 2023 looks like a reference -- Missing reference section? '12' on line 2056 looks like a reference -- Missing reference section? '23' on line 2095 looks like a reference -- Missing reference section? 'Rmin' on line 1193 looks like a reference -- Missing reference section? 'Rmax' on line 1193 looks like a reference -- Missing reference section? '14' on line 2063 looks like a reference -- Missing reference section? '15' on line 2066 looks like a reference -- Missing reference section? '22' on line 2091 looks like a reference -- Missing reference section? '24' on line 2099 looks like a reference -- Missing reference section? '16' on line 2070 looks like a reference -- Missing reference section? '17' on line 2073 looks like a reference -- Missing reference section? '18' on line 2076 looks like a reference -- Missing reference section? '19' on line 2080 looks like a reference -- Missing reference section? '26' on line 2107 looks like a reference -- Missing reference section? '25' on line 2103 looks like a reference -- Missing reference section? '27' on line 2111 looks like a reference -- Missing reference section? '8' on line 2041 looks like a reference -- Missing reference section? '13' on line 2059 looks like a reference -- Missing reference section? '20' on line 2084 looks like a reference -- Missing reference section? '21' on line 2087 looks like a reference Summary: 8 errors (**), 0 flaws (~~), 3 warnings (==), 31 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 INTERNET DRAFT Tie Liao 3 Expires: 13 April 1999 INRIA 4 13 October 1998 6 Light-weight Reliable Multicast Protocol Specification 7 draft-liao-lrmp-00.txt 9 Status of this memo 11 This document is an Internet-Draft. Internet-Drafts are working 12 documents of the Internet Engineering Task Force (IETF), its areas, 13 and its working groups. Note that other groups may also distribute 14 working documents as Internet-Drafts. 16 Internet-Drafts are draft documents valid for a maximum of six months 17 and may be updated, replaced, or obsoleted by other documents at any 18 time. It is not appropriate to use Internet-Drafts as reference 19 material or to cite them other than as "work in progress". 21 To view the entire list of current Internet-Drafts, please check 22 the "1id-abstracts.txt" listing contained in the Internet-Drafts 23 Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net 24 (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au 25 (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu 26 (US West Coast). 28 Abstract 29 This document describes LRMP, the Light-weight Reliable Multicast 30 Protocol. LRMP provides a minimum set of functions for end-to-end 31 reliable network transport suitable for bulk data transfer to 32 multiple receivers. LRMP is designed to work in heterogeneous 33 network environments and support multiple data senders. A totally 34 distributed control scheme is used for error recovery so that no 35 prior configuration and no router support are required. LRMP also 36 includes a selective feedback mechanism allowing to monitor the 37 quality of service at receivers. In LRMP, flow and congestion 38 control is performed based on NACK packets and congestion indication 39 from receivers. Forward error correction is supported as an 40 independent optional module. 42 Table of Contents 44 1. Introduction ............................................ 3 45 1.1 Application Scope ....................................... 3 46 1.2 Overview ................................................ 4 47 1.3 Implementation .......................................... 6 48 2. Definitions ............................................. 7 49 3. Packet Types and Common Header Fields ................... 8 50 4. Data Transmission ....................................... 11 51 4.1 Reliable Data Transmission .............................. 11 52 4.2 Unreliable Data Transmission ............................ 12 53 5. Packet Loss Recovery .................................... 12 54 5.1 Organization of Subgroups ............................... 13 55 5.2 Enabling/disabling a Subgroup ........................... 14 56 5.3 Error Report ............................................ 15 57 5.4 Duplicate NACK Detection ................................ 17 58 5.5 Retransmission .......................................... 18 59 5.6 Reception Failure ....................................... 20 60 5.7 Mean Round Trip Time Estimation ......................... 21 61 6. Sender and Receiver Reports ............................. 23 62 6.1 Sender Report ........................................... 23 63 6.2 Selective Receiver Report ............................... 23 64 7. Flow and Congestion Control ............................. 25 65 7.1 Transmission Rate ....................................... 26 66 7.2 Congestion Control ...................................... 26 67 7.2.1 Adaptation to the Loss ................................. 26 68 7.2.2 Adaptation to the Congestion Indication from Receivers .. 27 69 7.2.3 Getting to the Equilibrium: Rate Increase ............... 28 70 7.3 The Case of Heterogeneous Receivers ..................... 28 71 8. Packet Formats .......................................... 29 72 8.1 Unreliable Data ......................................... 29 73 8.2 Reliable Data ........................................... 29 74 8.3 Retransmission Data ..................................... 30 75 8.4 FEC Redundant Data ...................................... 31 76 8.5 Error Report ............................................ 31 77 8.6 NACK Reply .............................................. 32 78 8.7 Sender Report ........................................... 33 79 8.8 Receiver Report Selection ............................... 34 80 8.9 Receiver Report ......................................... 35 81 9. Options for Forward Error Correction .................... 36 82 9.1 Parameters .............................................. 37 83 9.2 Packet Format ........................................... 37 84 10. Security Considerations ................................. 38 85 A. Requirements of RFC 2357 ................................ 40 86 B. References .............................................. 43 87 C. Acknowledgments ......................................... 45 88 D. Author's Address ........................................ 45 90 1. Introduction 92 This document describes the Light-weight Reliable Multicast Protocol 93 (LRMP) version 1 which is intended to provide reliable multicast 94 transport service to applications. The reliable service means source 95 ordered and reliable data delivery on an end-to-end basis. 97 1.1 Application Scope 99 Applications that require group communication are continuously 100 emerging. In group communication, data sent by one host should arrive 101 at multiple receivers. These applications can benefit from IP 102 Multicast [1] that allows one-to-many data delivery at network layer, 103 making optimization possible by replication of data at routers. LRMP 104 (version 1) is intended to be used for a subset of these applications 105 that have the following characteristics: 107 o large-scale communication group. The users participating in 108 group communication are generally dispersed in a wide area 109 network environment, thus their network links are heterogeneous, 110 i.e., the end-to-end transmission delay and bandwidth differ a 111 lot. 113 o large-size communication group. The number of users participating 114 in group communication may be potentially very large. 116 o loose time constraint. This means that the transmission delay is 117 not critical for applications. In another extreme, real time 118 audio and video conference applications need low transmission 119 delay in order for that audio and video data could be played in a 120 synchronized fashion. 122 o reasonable performance. These applications do not need to reach 123 very high data throughput, instead they need to keep data 124 transmission at a reasonable rate. 126 o loosely coupled group. Applications may join or leave the group 127 at any time and at any location. 129 o requiring that data is reliably delivered to receivers. 131 Typical applications include bulk data transfer (e.g., file 132 transfer), multicast chat and some kinds of collaborative work. These 133 characteristics allow to adopt conservative approaches when doing 134 error recovery and flow control in order to keep the total traffic 135 of a multicast session at a reasonable level with regard to data 136 traffic. 138 1.2 Overview 140 IP Multicast [1] provides point-to-multipoint datagram delivery 141 service to communicating hosts, i.e., transmission of datagrams from 142 a sender to a group of simultaneous receivers. IP Multicast itself is 143 not reliable, datagrams are not guaranteed to arrive at receivers and 144 in the order they are sent. In unicast case, reliable data delivery 145 is generally achieved by TCP [2] at the transport layer in the 146 Internet environment. In multicast case, reliable multicast transport 147 protocols are needed for applications requiring reliable data 148 transfer. 150 The Light-weight Reliable Multicast Protocol (LRMP) is intended to 151 offer end-to-end reliable network transport services suitable for 152 bulk data transfer to multiple receivers. While some TCP design 153 principles can be applied to reliable multicast, the design 154 philosophy for reliable multicast protocols is rather different. 155 Reliable multicast protocols have to offer good scalability, 156 otherwise they may be potentially dangerous to the Internet if used 157 in a wide scale [5]. 159 A common approach for scalability is to replace positive 160 acknowledgements (ACK) by negative acknowledgements (NACK) [6]. This 161 can reduce considerably the volume of control traffic under normal 162 network conditions. However for large groups, this is insufficient 163 for several reasons. First, the number of NACK packets could still 164 be very high when packet loss occurs simultaneously at many 165 receivers. Second, the resulted control traffic and retransmitted 166 packets still go to every receiver in the group. In particular, 167 heterogeneous environments make these problems much worse. Other 168 techniques are necessary to further improve the scalability, such as 169 duplicate NACK and repair suppression [6], and local error recovery. 171 LRMP is intended to offer good scalability on the current 172 infrastructure of Internet, that is, no routers involved in error 173 recovery [10]. The generated control traffic and retransmissions 174 should be kept at a low level and in a limited scope. In case of 175 reception errors, error recovery is first restricted to its local 176 area, i.e., local group. If hopefully it can be done within the 177 local area, this will be transparent to the sender and receivers in 178 other areas. Local error recovery technique allows to isolate 179 reception errors in a limited region when possible. In this way, it 180 avoids that NACK and retransmitted packets travel across the whole 181 group. 183 In LRMP, there are no central control points, namely representatives, 184 every receiver plays a strictly equal role. Using centralized control 185 places constraints on flexibility [9] [10] [11]. Unless the router 186 assisted control scheme can be applied, decentralized control is 187 more appropriate for a general use on the Internet. LRMP does not try 188 to rebuild a tree structure from the multicast tree, instead a 189 totally distributed control scheme is used. Local groups are formed 190 implicitly for local error recovery. Therefore all packets to form 191 local group and assign local group representatives are not necessary. 192 LRMP is a receiver-reliable transport protocol. Each receiver is 193 responsible for ensuring its correct reception of data while it may 194 be cooperative by allowing to send repair packets to its neighbors. 195 The failure at one receiver has generally not much to do with the 196 quality of service (QoS) at other receivers. 198 LRMP attempts to optimize the message exchange for error recovery. 199 For duplicate NACK and repair suppression, some basic mechanisms of 200 SRM [6] are adopted, including NACK based loss report, exponential 201 back-off timers for sending NACK and missing (repair) packets. While 202 there are no representatives in local recovery, a receiver can 203 dynamically play this role for a repair process in a local group. 204 This can further reduce the probability of duplicate NACK and repair 205 packets. 207 It is very difficult, if not impossible, to ensure both high 208 scalability and full reliability in a large scale. Note that at 209 present hosts on the Internet generally have very heterogeneous 210 network links. While one encounters failure in reception, others may 211 have perfectly received all packets. If the missing packets are 212 repeatedly transmitted for a few receivers by the transport protocol 213 while the session is going on, infinite buffer size is theoretically 214 required. LRMP is not intended to be fully reliable for an arbitrary 215 set of receivers, whereas applications could still employ some 216 specific mechanisms to cope with this problem, for example, using a 217 separate session for some particularly slow receivers. In LRMP, 218 receivers either successfully receive all packets or generate 219 reception error events in case of failure. Besides LRMP does not make 220 effort to recover missing packets that are transmitted before a 221 receiver joins a session. 223 In order to get feedback information from receivers, a sender 224 selective receiver report mechanism is included. That allows to 225 trigger the transmission of receiver reports while the required 226 bandwidth is totally under control. This technique is also called 227 sampling. Receiver reports can be considered in some sense as 228 positive ACKs and complement the NACK mechanism to make this 229 protocol fully reliable when the number of receivers is low. 231 LRMP uses a rate based flow and congestion control mechanism to 232 control the traffic sent to the network. Congestion control is 233 necessary to ensure that the network bandwidth is fairly shared among 234 a number of competing data flows when network congestion occurs. The 235 flow and congestion control mechanism included in LRMP is intended to 236 be friendly and competitive with other data flows under flow and 237 congestion control such as TCP. 239 By the name, LRMP is intended to be light-weight. The protocol 240 overhead and control traffic is relatively low to increase the 241 efficiency in data transfer. LRMP is intended to be embeddable into 242 an application. 244 LRMP is designed, at least initially, not for use with real time 245 continuous media applications. The performance provided by LRMP, 246 e.g., the transmission delay and throughput, may not satisfy the 247 requirements of these applications. While LRMP is not aimed at 248 providing high performance, the data throughput and transmission 249 delay should be reasonable compared with the quality of service 250 provided by the underlying network. 252 This document does not provide any mechanisms either to do membership 253 control in group communications or to collect information about each 254 session member. Applications are free to join and leave the session 255 at any time and at any location. 257 In addition this document does not include any mechanism for 258 authentication and protection of information conveyed by the 259 specified protocol. But it does provide a description on security 260 constraints that should be taken into account by applications, see 261 section 10 for more details. 263 1.3 Implementation 265 This specification is based on the lower layer network services that 266 provide identification of a transport end-point by a transport 267 address, transfer of data from or to the given transport address with 268 indication of received datagram length. A transport address is 269 composed of a network address and a port number. Typical example of 270 the target underlying network service is UDP/IP multicast. 272 All traffic including data and control is carried on the group 273 address and port unless otherwise noted. LRMP operates on a single 274 group address/port pair. 276 LRMP could be implemented as an independent module following the 277 layered design principles or as an integrated part of an application 278 following the principles of application level framing [7]. 280 All integer fields are carried in network byte order, that is, most 281 significant byte (octet) first. This byte order is commonly known as 282 big-endian. The transmission order is described in detail in [3]. 284 Unless otherwise noted, numeric constants are in decimal (base 10). 285 All header data is aligned to its natural length, i.e., 16-bit fields 286 are aligned on even offsets, 32-bit fields are aligned at offsets 287 divisible by four, etc. Octets designated as padding have the value 288 zero. 290 Wallclock time (absolute time) is represented using the timestamp 291 format of the Network Time Protocol (NTP), which is in seconds 292 relative to 0h UTC on 1 January 1900 [4]. The full resolution NTP 293 timestamp is a 64-bit unsigned fixed-point number with the integer 294 part in the first 32 bits and the fractional part in the last 32 295 bits. In some fields where a more compact representation is 296 appropriate, only the middle 32 bits are used; that is, the low 16 297 bits of the integer part and the high 16 bits of the fractional part. 298 The high 16 bits of the integer part must be determined 299 independently. 301 2. Definitions 303 Application: either a higher layer protocol entity or end user 304 program. It is the source and/or consumer of the data 305 transported by LRMP. 307 FEC: forward error correction, a mechanism that adds redundant 308 packets in the transmitted data stream so that partial missing 309 packets could be recovered without retransmission. 311 Subgroup: the set of protocol entities in a local region which is 312 reachable using a scope (TTL) value from a particular protocol 313 entity. 315 Local recovery: a scheme for requesting and sending missing packets 316 in a subgroup, i.e., whose scope value is less than the session 317 scope value. 319 LRMP entity: or protocol entity, an implementation of LRMP that is 320 able to send, receive and reply to LRMP data and control packets. 322 LRMP receiver: or simply receiver, an LRMP entity that makes efforts 323 to receive data packets sent by other LRMP entities. 325 LRMP sender: or simply sender, an LRMP entity that sends both data 326 and control packets. 328 LRMP session: or simply session, the association among a set of LRMP 329 entities. For each LRMP entity, the session is defined by a 330 particular pair of destination transport addresses (one network 331 address plus a port). The destination transport address pair may 332 be common for all LRMP entities, as in the case of IP multicast, 333 or may be different for each, as in the case of individual 334 unicast network addresses plus a common port. 336 NACK: negative acknowledgement used to report missing packets. 338 Entity Identifier: a 32 bit integer that identifies the source of 339 LRMP packets. It is carried in LRMP packet header so as not to 340 be dependent upon the network address. All packets from a source 341 form part of the same sequence number space, so a receiver 342 groups packets by the entity identifier. An LRMP entity should 343 use the same identifier during the lifetime of an LRMP session. 345 TTL: time-to-live, a parameter used in sending IP multicast packets 346 to limit the scope of propagation. 348 3. Packet Types and Common Header Fields 350 This section defines a minimal set of packet types for data transport 351 and control. Packets other than reliable data are sent in a best 352 effort way, that is, there is no mechanism to recover them. The 353 sender should retransmit these packets in need. 355 All packets from a single protocol entity should use the same entity 356 identifier. Otherwise they are considered coming from a different 357 entity. 359 3.1 Common Fixed Header 361 All LRMP packets must contain the following fixed header, followed by 362 packet specific options: 364 0 1 2 3 365 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 366 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 367 |V=1|P| PT | Scope | Length | 368 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 369 | Entity ID | 370 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 372 The fields have the following meaning: 374 Version (V): 2 bits 375 This field identifies the version of LRMP defined by the current 376 specification which is one (1). 378 Padding (P): 1 bit 379 If this bit is set, the packet contains one or more additional 380 padding octets at the end which are not part of the application 381 data. The last octet of the packet contains a count of how many 382 octets should be ignored. 384 Packet Type (PT): 5 bits 385 This field identifies the type of the packet. 387 Scope: 8 bits 388 This field indicates the time-to-live (TTL) value used to 389 transmit the packet. 391 Length: 16 bits 392 This field indicates the total length of the packet in number 393 of octets, including the header and data. 395 Entity ID: 32 bits 396 This field contains the entity identifier that uniquely 397 identifies the sender of the packet. 399 Additional packet specific options, if present, should immediately 400 follow this fixed header and should be of an integer number of 32 bit 401 words. 403 3.2 Packet Types 405 The following packet type (PT) values are defined in this document 406 (for backward compatibility, the type values are not continuous): 408 Value Type Abbreviation 409 0 Reliable Data DATA 410 4 Retransmission Data R_DATA 411 8 Unreliable Data U_DATA 412 12 FEC Redundant Data F_DATA 413 17 NACK NACK 414 18 NACK Reply R_NACK 415 19 Sender Report SR 416 20 Receiver Report Selection RS 417 21 Receiver Report RR 419 The values not included in the above list are reserved for future use. 421 3.3 Entity Identifier 423 Every protocol entity is assigned with an entity identifier which is 424 a randomly chosen 32 bit integer. The entity identifier must be 425 globally unique within a particular session. Though the probability 426 of identifier collision is rather low in theory, to further minimize 427 collision errors, a protocol entity must detect whether or not there 428 is a collision between the received identifiers and its own each time 429 it receives a packet. In the case of collision, it's the receiver 430 which must change its identifier if another entity is a sender. If 431 the concerned protocol entities are receivers, they must all change 432 their identifiers to another randomly chosen identifier. It is 433 expected that the collision is resolved at the initialization phase 434 for senders, i.e., a sender sends several control packets before data 435 transmission to detect possible collisions. 437 At receivers, if a protocol entity sees that two senders use the same 438 identifier from different network locations, it should reject the one 439 which was heard later. 441 A protocol entity should use only one entity identifier during the 442 lifetime of a session. If it changes the identifier during a session, 443 it is considered as a completely new entity. 445 In data transmission, the entity identifier is used to identify the 446 data stream from a sender, i.e., the packets with the same entity 447 identifier form a data stream ordered by continuous sequence numbers. 448 So as to retransmissions by a third party, the original entity 449 identifier is included. 451 3.4 Scope 453 The scope field of an LRMP packet should be set to the same TTL value 454 with that the packet is sent to the underlying network layer. In 455 LRMP, packets may be transmitted using different TTL values due to 456 local error recovery. The scope field is used by a receiver to 457 approximatively estimate the distance to the packet sender. 459 3.5 Maximum Packet Length 461 Inadequate data fragmentation may introduce inefficiency in the 462 network and should be generally avoided [12]. Optimal solution 463 requires the cooperation of applications. Therefore proper data 464 fragmentation/reassembly are left to higher layer protocols to 465 perform. 467 LRMP places a upper bound on the maximum length of packets that can 468 be transmitted, in other words, the maximum transmission unit (MTU). 469 The MTU should be set to the minimum MTU of the network path. Modern 470 internet routers support larger and larger MTU than a decade before. 471 Consequently the maximum transmission unit in LRMP is set to 1400 472 octets, including packet header. Any packets larger than this length 473 are considered not valid and should be dropped by receivers. This 474 gives also the advantage to optimize the buffer management at 475 receivers. 477 3.6 Multiplexing 479 The length field is included in the packet header for multiplexing 480 several data or control packets into one outgoing packet sent to 481 the underlying network. It is not allowed to multiplex packets 482 with different scope value in one compound packet for obvious 483 reasons. LRMP entities need the indication of received data length 484 from the underlying network protocol to correctly parse multiplexed 485 packets, which is supported by most of UDP/IP implementations. 487 4. Data Transmission 489 This section describes the data transmission mechanism that must be 490 implemented by a protocol entity. 492 LRMP provides a per packet selective reliability, i.e., application 493 data can be sent either reliably or in a best effort way. This allows 494 an application to use the same transport protocol to send reliable 495 and unreliable data. Application must indicate the reliability 496 (a boolean value) when sending data to LRMP. 498 4.1 Reliable Data Transmission 500 In a normal operation scenario, a data source multicasts a number of 501 DATA packets to a session. Each DATA packet is ordered by a sequence 502 number. A receiver receives these data packets and delivers them to 503 the application if they are in order. If packet loss is detected, 504 the receiver sends a NACK packet to request the retransmission. Any 505 protocol entity that holds the requested lost packets will try to 506 send repair packets (see the next section). 508 In general, data packets, the first time transmitted, are sent to 509 the whole group, that is, the scope field should be set to the 510 time-to-live value of the session accordingly. This scope field 511 informs receivers the maximum TTL value which should be used to 512 report loss and do repair. 514 Each reliable data packet contains a sequence number and the sender's 515 entity identifier. The sequence number is increased by one for each 516 reliable data packet sent, except retransmission. At receivers, 517 a data packet is identified by the entity identifier and the sequence 518 number. All packets from a single source constitute a data stream. 520 The sequence numbers are rounded to 32-bit unsigned integer, thus 521 when overflow occurs, they must take a modulo to 2^32. A sender 522 should generally choose a non-zero random number as the initial 523 sequence number. A receiver learns the initial reception sequence 524 number at the time it joins the session. Therefore LRMP implementors 525 should be aware of that the initial outgoing sequence number and the 526 initial incoming sequence number may be different for a particular 527 data sender and receiver. 529 At receivers, the sequence numbers are used to correctly order 530 packets that may be received out of order and to eliminate 531 duplicates. A protocol entity should implement the reordering of 532 received packets before delivering them to the application. An 533 entity should also provide applications with capabilities to 534 configure this option out, so that packets are delivered in 535 reception order. 537 Reliable data transmission is subject to flow and congestion control 538 which is described in section 7. Data sender can send FEC encoded 539 redundant data with the original data stream, see section 9 for 540 more details. 542 4.2 Unreliable Data Transmission 544 Unreliable packets are not subject to flow control at the sender side 545 and not subject to sequence control at the receiver side. They can be 546 considered as out-of-band packets which are delivered in priority but 547 without guarantee of arrival at receivers. Due to this particularity, 548 unreliable packets are expected to be sent occasionally by an 549 application. 551 When an unreliable packet is received by a receiver, it should be 552 delivered immediately to the application. In addition a receiver does 553 not need to cache such packets in the buffer. 555 5. Packet Loss Recovery 557 To avoid to generate unnecessary traffic in the whole group, the 558 repair of lost packets is first carried out in local regions. To do 559 this, a session is divided by a particular receiver into a number of 560 subgroups. For a protocol entity, a subgroup represents a subregion 561 limited by a distance radius. Packets sent to a subgroup can not 562 reach other entities outside the subgroup. By limiting error recovery 563 in a subgroup, reception errors could be isolated. Local recovery can 564 be performed by alternating the IP scope value in transmission of 565 packets. 567 5.1 Organization of Subgroups 569 Yet another way to define the distance between two hosts is to use 570 the number of hops, in other words, the number of intermediate 571 routers that data packets should transit from the source to the 572 destination. The propagation of an IP datagram is controlled by a 573 scope value, also known as the time-to-live parameter. This scope 574 value can be thus used to control the radius of distance that a 575 multicast packet can reach. In practice, a multicast session is 576 always created with a scope value to indicate at which scope the 577 session is addressed. 579 There is a trade-off between the number of subgroup hierarchies and 580 the efficiency of error recovery. More the number of hierarchies, 581 finer the local error recovery but larger the error recovery time. 582 It could be possible to configure the subgroup hierarchies at 583 runtime by an application, a protocol entity uses by default the 584 following rule for organization of subgroups. Given the time-to-live 585 value (the nominal TTL) for a session, the session must be organized 586 into the following hierarchical subgroups: 588 o the whole group at TTL 589 o subgroup at 63 if TTL > 63 590 o subgroup at 47 if TTL > 47 591 o subgroup at 15 if TTL > 15 593 The top level, i.e., the whole group, must be present in every 594 session. A descendant subgroup is created only if the nominal TTL is 595 greater than the subgroup TTL value. Therefore there could be at 596 maximum four levels of subgroups. For example, a session with TTL 597 value 63 has three levels. A session with TTL value less than and 598 equal to 15 has only one level subgroup. A subgroup contains another 599 subgroup if its ttl value is greater than the latter. 601 Every LRMP packet carries a scope field equal to the ttl value with 602 that the packet is sent to the underlying network. So when a receiver 603 receives a packet (data or control), it knows that the packet sender 604 is surely within a distance less than the scope value carried in the 605 packet. The scope field allows a protocol entity to determine whether 606 or not the packet sender belongs to one of its subgroups. It remains 607 constant during transmission, it should not be confused with the 608 scope field in IP datagrams. 610 It is possible that the multicast route between two hosts is 611 asymmetric, i.e., the number of hops from host A to host B may be 612 different from that from host B to host A. As the gap of the subgroup 613 ttl value is enough large, in a given subgroup, if host A can reach 614 host B, it is most probable that host B can also reach host A. 616 Error recovery should be performed within subgroups in increasing 617 distance (ttl) order, i.e., first in the lowest level subgroup, then in 618 the second level subgroup and finally in the whole group. In each 619 subgroup, every receiver plays a strictly equal role, there is no 620 subgroup leader. So no join or leave messages are necessary to 622 explicitly form subgroups while they may still be implemented by 623 applications in case of strong membership control. 625 Each protocol entity is responsible for recovery of its own lost 626 packets. However it should keep track of the control messages 627 received from other protocol entities to eventually cancel the 628 duplicates of control and repair packets. 630 5.2 Enabling/disabling a Subgroup 632 To avoid unnecessary error report to a subgroup, first of all, a 633 protocol entity should use the heuristic algorithm described here to 634 determine whether or not an error could be recovered in a given 635 subgroup. Basically if there are no other protocol entities within 636 a subgroup, certainly no need to do error recovery in that subgroup. 637 If there are other protocol entities in a subgroup, errors may or may 638 not be recoverable in that subgroup. A receiver enables a subgroup if 639 it considers that there are some chances to recover errors in that 640 subgroup, or disables otherwise. The enable/disable mechanism allows 641 to improve the error recovery time for missing packets. 643 Enable conditions 645 Given a subgroup, enable it if one of the following conditions is 646 true: 648 o at start-up. 650 o a repair packet with the subgroup ttl is heard. 652 o the disabled time is greater than the maximum disabled time, 653 e.g., 3 minutes. 655 The last condition is to prevent the deadlock, that is, all subgroup 656 members are in the disabled state. In this case, no receivers will 657 send error report packets within that subgroup. So a timeout 658 mechanism is necessary to retrigger the error recovery in that 659 subgroup. 661 Disable condition 663 Given a subgroup, disable it if the number of successive subgroup 664 scoped error report packets without subgroup scoped reply is greater 665 than or equal to a prefixed number, 5. 667 The whole group must never be disabled since the sender is surely 668 there and will reply to error report packets. If a subgroup is 669 disabled for any reason, a receiver should jump to the next higher 670 level subgroup to do error recovery. 672 5.3 Error Report 674 The protocol makes no effort to recover packets that were sent before 675 a receiver joins the session. The sequence number of the first DATA 676 packet or extracted from a SR (sender report) packet is considered 677 as the starting sequence number for a particular sender. 679 A receiver detects packet loss either by checking if there is a gap 680 between the sequence numbers of successive received packets or the 681 sequence number given by a SR (sender report) packet does not 682 correspond to the expected sequence number for that sender. 684 5.3.1 Algorithm 686 When packet loss is detected, error report is carried out from the 687 lowest level subgroup to the whole group until the loss is recovered 688 or the number of maximum tries is reached. An exponential back-off 689 timer is used for duplicate NACK suppression. The subgroup to start 690 error recovery is the lowest level subgroup which is in the enabled 691 state. Given the starting subgroup Gi, a receiver must use the 692 following algorithm to report packet loss by sending a NACK packet, 694 1. Set the number of tries (N) to 0. 696 2. Schedules a random timer whose value is uniformly distributed 697 in the range [t1, 2*t1], where t1 = MRTT*2^N and MRTT is the 698 estimated mean round trip time to other protocol entities 699 reachable in the subgroup Gi. 701 3. During the timer period, if a NACK packet which contains the 702 local NACK is received (see section 5.3.3), goto step 5. 704 4. When the timer expires, if all lost packets have been received, 705 the repair process terminates, otherwise send a NACK packet to 706 the subgroup Gi. 708 5. Then check 710 o if the subgroup level is not the highest level, the subgroup 711 level is increased by one, i <= i + 1, goto step 1; 713 o if the subgroup level is the highest level, the number of 714 tries (N) is increased by one, N <= N + 1. If the number of 715 tries is greater than or equal to 8, the repair process is 716 abandoned. Otherwise goto step 2. 718 The loss report should be sequential, that is, the lost packets with 720 lower sequence number should be reported first. Unless the current 721 repair process is terminated normally or abnormally, next loss report 722 should not be started. In the worst case, an error report could be 723 carried out once in each subgroup, and eight times in the whole 724 group. 726 5.3.2 Timer Adjustment 728 As the protocol uses a rate based flow control mechanism (see section 729 7), the repair packet sender will send data generally in regular time 730 interval. When a NACK packet is received, it is possible that the 731 first repair packet will be sent only after a transmission interval. 732 To avoid to send excessive NACK packets, a transmission interval 733 should be added to the timer value if a NACK packet was already sent 734 or heard from the network. 736 For estimation of the transmission interval used by a particular 737 sender, see section 6.1. 739 5.3.3 Duplicate NACK Suppression 741 A NACK packet is said containing another NACK packet if they are for 742 the same sender and the reported lost sequence numbers are a superset 743 of the latter. The former is called a super NACK of the latter. For 744 example, given the following two NACKs (see section 8.5) for the same 745 sender, 747 NACK A: lowest seqno = 34567 748 bitmask = 0x000000ff 750 NACK B: lowest seqno = 34570 751 bitmask = 0x0000000f 753 NACK A contains NACK B. 755 If a super NACK packet is heard from the network, a receiver does not 756 need to send its NACK since its lost packets are reported by that 757 NACK. Therefore the receiver must not send its NACK packet when the 758 current timer expires, but should schedule the next NACK timer to 759 keep the repair process active. When the next timer expires, the 760 receiver must still suppress its NACK in the following cases, since 761 the NACK sender will probably send it again if the lost packets have 762 not been repaired: 764 o Another super NACK was heard in this timer period. 766 o The receiver did not send a NACK at previous timeouts in the 767 current repair process. 769 o Its entity identifier is higher than that of the super NACK 770 packet heard in the last timer period. 772 This procedure is intended to further reduce the probability of 773 duplicate NACK packets at the following timeouts. Therefore when 774 multiple receivers encounter the same lost packets, the receiver with 775 the shortest timer value and with the lowest entity identifier will 776 act as a representative in that loss repair process. 778 To prevent the NACK sender leaving the group or behaving abnormally, 779 a receiver should not stop the repair process when a super NACK 780 packet is heard and should apply the above delay procedure for at 781 most next two timer periods. 783 5.3.4 Other Delay Conditions 785 When the NACK timer expires, a receiver should delay the transmission 786 of NACK packet and schedules another timer in the following cases, 788 o The first lost packet has been received, it is possible that 789 succeeding repair packets are queued for transmission at the 790 responder side and will arrive soon, except that a R_NACK was 791 received that excludes this possibility. 793 o A R_NACK was received during the timer period but the repair 794 packets are not yet received. 796 5.4 Duplicate NACK Detection 798 A newly received NACK packet is considered as duplicate if a super 799 NACK packet (containing this one) was received just before. 800 Duplicates may occur under one of the following conditions: 802 o two (or more) protocol entities have a timeout time in an 803 interval which is less than the packet propagation time from 804 one to another. 806 o variable packet propagation time. 808 o due to packet loss, a receiver did not see the NACK packet sent 809 by another entity. 811 To distinguish duplicate NACK packets and not to deteriorate the 812 error recovery time, a straightforward way is the following: given 813 the time t1 when the last NACK packet was received and the time t2 814 when the same reporter will issue the next NACK if the loss is not 815 repaired, all NACK packets received between t1 and t2 could be 816 treated as duplicates. This way, at least a successive NACK sent by 817 the same reporter will not be confused with duplicates. The time 818 interval (t2 - t1) between sending two NACK packets by a same 819 reporter is greater than twice the mean round trip time plus a 820 transmission interval. Therefore a protocol entity should treat a 821 newly received NACK packet as duplicate if there is a super NACK 822 packet which was received within an interval equal to twice the mean 823 round trip time plus a transmission interval. 825 5.5 Retransmission 827 LRMP entities keep recently sent and received data packets in their 828 buffer space. The total size of the buffer, depending on the used 829 transmission rate, should be enough large to allow loss repair. 830 It is recommended that the buffer size should correspond to 10 831 seconds - 1 minutes of data packets at the full specified 832 transmission rate (e.g., between 80 KB and 480 KB at 64 kb/s). 834 A NACK packet allows to report a number of adjacent lost packets and 835 contains a scope field that should be used by the NACK responder to 836 restrict the retransmissions to the same scope. 838 Upon receipt of a NACK packet, a protocol entity is said eligible for 839 sending repairs if it holds a part of the requested lost packets. An 840 eligible entity must check first if the NACK is a duplicate for the 841 purpose to eliminate possible duplicate retransmissions. If a NACK is 842 duplicate, it should be ignored. 844 5.5.1 Algorithm 846 If a NACK is not duplicate, an eligible protocol entity schedules a 847 random timer whose value is uniformly distributed in the range 848 [MRTT, 2*MRTT], where MRTT is the estimated mean round trip time to 849 other protocol entities reachable in the corresponding subgroup. When 850 the timer expires, if the lost packet(s) or a R_NACK (NACK reply) 851 packet have not been heard from other protocol entities, the protocol 852 entity sends first a R_NACK and then the lost packet(s). 854 To further improve the performance and reduce the number of duplicate 855 repair packets, the original sender applies a different algorithm. 857 When the sender receives a NACK packet, it checks if this is a 858 duplicate. If it is not, the sender should send immediately the 859 repair packets since one and only one protocol entity will respond 860 immediately. 862 In addition, if the scope field of a NACK packet is the session TTL, 863 all receivers must not reply to the NACK packet and expect that the 864 original sender will send the repair(s) with no delay. This will 865 eliminate duplicate repair packets at the session scope level. 867 This scheme will get still better performance if there is one fixed 868 protocol entity responsible for error control in each subgroup. This 869 means if a unique entity can apply the retransmission algorithm of 870 the original sender in a subgroup, for example, the router or a 871 designated receiver, the error recovery time and duplicate repairs 872 will be get even better. 874 5.5.2 Sending NACK reply 876 Before sending the repair data (R_DATA) packets, the protocol entity 877 should first send a NACK reply packet. This packet is used to 878 identify the repair packet sender and also used to measure the round 879 trip time between the NACK sender and the repair packet sender. 881 NACK reply packets allow in addition fast partial repair. It is 882 possible that a receiver holds only a part of the reported lost 883 packets. In this case, it is desirable that this receiver will send 884 this part while the remaining lost packets can be repaired by other 885 entities or in a higher level subgroup. In the NACK reply packet, the 886 LRSN field should be set to the sequence number of the first repair 887 packet and the BFRP field to the bitmask of the following repair 888 packets, see section 8.6. 890 Upon reception of a NACK reply packet, the error reporter can 891 determine which packets can not be repaired in the corresponding 892 subgroup and can report them to a higher level subgroup. For example, 893 suppose that the first lost sequence number and the first repair 894 sequence number is the same, the bitmask of the remaining lost 895 packets can be determined by, 897 loss_bitmask &= ~repair_bitmask; 899 5.5.3 Sending Repairs 901 If a receiver is able to send repair packets to its neighbors, it 902 must keep the original data packets intact. Otherwise it should 903 not respond to NACK packets. LRMP implementors should be aware that 904 accepting packets retransmitted by a third-party represents an 905 additional vulnerability to malicious attacks. 907 Repair packets should be scoped within the same subgroup as the 908 received NACK packet. They should be transmitted with higher priority 909 than the transmission of new data. In addition, retransmissions are 910 subject to the same flow and congestion control as in data 911 transmission, see section 7. 913 5.5.4 Duplicate Repair Suppression 915 In order to avoid duplicate repair packets, all receivers eligible to 916 respond to the NACK packet should use NACK reply packets to detect 917 possible multiple responses. Since an entity should send first a NACK 918 reply before sending the repair packets, upon receipt of such a 919 packet, other entities can determine whether there will be duplicate 920 repairs if they participate in sending repairs. Thus when a NACK 921 reply is received, if a receiver has not yet send a NACK reply, it 922 should cancel its reply process. Otherwise if its entity identifier 923 is higher than that of the NACK reply sender, it must cancel its 924 response, even if they have started to send the repair packets. 926 In absence of NACK reply packets, for example, due to loss, if 927 repair packets are heard from the network, a receiver should still 928 cancel its own retransmissions if its identifier is higher than that 929 of the repair sender. 931 In any case, if a NACK reply packet or repair packets are heard from 932 the original sender, all other entities must cancel their reply and 933 retransmissions. It is considered that the original sender has the 934 highest priority in sending repair packets. 936 5.6 Reception Failure 938 Reception failure is an event generated when a receiver can not 939 synchronize with the data stream sent by a sender. Reception 940 failures occur in case of network partition or congestion (a 941 receiver behind a slow link), specifically under the following 942 conditions: 944 o when the number of timeouts for a loss event (NACK timer) 945 reaches a threshold, repair packets have still not been received. 946 This threshold depends on the number of error recovery subgroups 947 and has a maximum value of 10. 949 o there is a too large difference between the currently received 950 sequence number and the expected sequence number so that the 951 cache space is insufficient to hold all packets between this 952 difference, which is required for error recovery. 954 o there is a too large difference between the sequence number 955 announced by a sender report packet and the expected sequence 956 number. 958 When a reception failure is encountered, a receiver should not try 959 anymore to repair the current missing sequence number, instead it 960 should try to synchronize with the current sequence number of the 961 data stream. Otherwise the sender may not respond to its request 962 for retransmission. 964 In addition, a receiver should notify the application a reception 965 failure event. Some applications may need this notification to drop 966 incomplete data objects. It is desirable that higher layer protocols 967 might include mechanism for retransmission of missing data objects 968 according to the degree of importance. 970 5.7 Mean Round Trip Time Estimation 972 The sender does not necessarily need to know the round trip time to 973 the receivers for error recovery except for other purposes since it 974 always immediately sends repair packets. Otherwise the round trip 975 time plays an extremely important role in sending error reports and 976 repairs. 978 When sending error reports in a subgroup, a receiver generally 979 does not know who will send repair packets. Thus it uses the mean 980 round trip time to schedule error reports, so does the repair packet 981 sender. All receivers should carefully estimate the mean round trip 982 time values to make the error recovery algorithm work more 983 efficiently. LRMP offers some mechanism for this purpose. 985 At start-up, the initial mean round trip time for each subgroup 986 should be set to a relatively conservative value using the following 987 expression: 989 MRTT = 200*(scope/63)^2 milliseconds 991 where the upper bound is set to 800 milliseconds and the lower bound 992 is set to 12 milliseconds. Consequently the groups at world, region 993 and local site scope have approximatively the following initial mean 994 round trip time values, 996 o MRTT at scope <= 15: 12 milliseconds. 997 o MRTT at scope 63: 200 milliseconds. 998 o MRTT at scope >= 127: 800 milliseconds. 1000 The mean round trip time will get closer to its true value in the 1001 repair processes. NACK and NACK reply packets allow to measure the 1002 round trip time between the NACK sender and its responder. Each NACK 1003 packet carries a timestamp value (t1) which indicates the local time 1004 the NACK packet is sent. When a protocol entity sends the NACK reply 1005 packet, it puts this timestamp (t1) and the delay since the NACK 1006 packet was received into the reply packet. Upon reception of the NACK 1007 reply, the receiver can get a rough estimation of the round trip time 1008 as follows: 1010 NACK sender t1 (NACK) t2 (NACK_REPLY) 1011 ------|-----------------------|--------- 1012 \ / 1013 \ / 1014 ---------|-----------------|------------ 1015 NACK responder |<---- delay ---->| 1017 RTT = t2 - t1 - delay 1019 The mean round trip time in the corresponding subgroup is estimated 1020 using the following expression: 1022 MRTT = ALPHA*MRTT + (1 - ALPHA)*RTT 1024 where ALPHA is a smoothing factor set to 0.875. To avoid 1025 multiplication, the above formula can expressed as, 1027 MRTT = MRTT + ((RTT - MRTT) >> 3) 1029 There are two interesting side effects resulted from this scheme. 1030 When the network is not congested, the measured round trip time RTT is 1031 generally lower than the initial value that is relatively high. After 1032 a receiver sent a NACK packet and has received the repair packet, it 1033 will hold a mean round trip time lower than those never send a NACK 1034 packet. The next time packet loss occurs in that domain, it is most 1035 probably this receiver that will first timeout and send the NACK 1036 packet. This receiver will then get the mean round trip time even 1037 lower. In this way error report will tend to be done by a small 1038 number of receivers. This is a desirable behavior since there will be 1039 less concurrency in error report and the mean round trip time will 1040 get more realistic. 1042 When the network is congested, it is possible that the measured round 1043 trip time is larger than or around the initial value. In that case, 1044 error report will be carried out eventually by all receivers. It is 1045 more difficult to make the mean round trip time realistic. 1047 6. Sender and Receiver Reports 1049 6.1 Sender Report 1051 Sender report packets are intended to provide receivers with the 1052 current state in data transmission such as the highest sequence 1053 number reached, the number of packets and that of octets sent since 1054 the beginning of the session. By analyzing a sender report, a 1055 receiver can figure out how many packets or octets it has missed and 1056 whether it is synchronized with the data transmission. 1058 A data sender should send some sender report packets in the following 1059 cases: 1061 o before the first time it will send data or it will begin to send 1062 data after a long silent time. This is to inform receivers the 1063 starting sequence number that will be used in the data 1064 transmission that follows. Thus a receiver can determine whether 1065 or not it has missed some starting packets. 1067 o periodically send a sender report during the data transmission 1068 to provide some statistics information to receivers. In this 1069 case, the sender sends either a report every two seconds or for 1070 every quarter of the send window data sent. 1072 o at the end of bulk data transfer. A sender report informs what 1073 is the end sequence number so that receivers can determine 1074 whether or not it has missed a portion of ending packets. 1076 When the data sender enters in idle state, i.e., it has no more data 1077 to send, the interval at which sender reports are sent should be 1078 doubled every time until the idle time reaches five minutes. 1080 The actual transmission rate at which the sender sends data can be 1081 computed from two successive sender reports. That is, the difference 1082 between the octet counts divided by the time interval. The later can 1083 be derived from the timestamp field. 1085 Similarly the transmission interval can also be calculated from two 1086 successive sender reports. That is, the difference between the packet 1087 counts divided by the time interval. 1089 6.2 Selective Receiver Report 1091 Receiver reports are useful for both sender and receivers. They have 1092 multiple purposes. First they are used for estimation of the session 1093 population. It could be interesting to know how many participants are 1094 actually in a session. Based on this estimation, the sender could 1095 decide, for example, to go on or stop the session. 1097 Receiver reports are also used to get quality of service feedback 1098 from receivers, such as the loss rate, the number of packets and 1099 octets received. The feedback can be used to do flow and congestion 1100 control and adjust other transmission parameters, for example, the 1101 redundancy in the data stream. In addition they allow to know where 1102 there is a serious reception problem. 1104 By default receivers should not send receiver reports. A data sender 1105 sends a RS (receiver report selection) packet to tell receivers how 1106 to send receiver reports. The following ways are prescribed here for 1107 sending receiver reports: 1109 o a receiver sends once a report with a specified probability. 1110 This probability could be adapted by the sender to the number of 1111 receivers in a session. 1113 o a receiver sends a report periodically, but the total traffic 1114 (generated by receiver reports) is limited to a small percentage 1115 of the total bandwidth, i.e., 5%, like in RTP [23]. 1117 A receiver report selection packet contains a list of receivers 1118 which selects the receivers which should send receiver report(s). 1119 The list can be either a list of entity identifiers or a wild card 1120 address (0xffffffff). If it is a wild card address, that means that 1121 all receivers are selected. Otherwise only those whose identifier 1122 is among the list are selected. The selected receivers should use 1123 the probability and report interval parameters contained in the RS 1124 packet to control whether and how to send receiver report(s). 1126 If the probability field is set to zero, the selected receivers 1127 should send receiver reports periodically at the specified interval. 1128 However if the total traffic generated by control packets is above 1129 the bandwidth limit (5%), a receiver should calculate the report 1130 interval with respect to the bandwidth limit. 1132 If the probability field is greater than zero, the selected receivers 1133 should should use a random event generator (by means of a random 1134 value generator) to trigger whether or not they should send a report. 1135 If an event is generated with the given probability, the receiver 1136 should send only once a report. 1138 If both the probability and interval fields are set to zero, this 1139 packet must be interpreted as an unselect packet, that is, the 1140 selected receivers should stop sending receiver reports triggered 1141 by the previous receiver report selection packet. 1143 In all cases, the report interval should be randomized to avoid 1144 receiver report implosion. The randomized interval should be 1145 uniformly distributed in the range [I/4, I], where I is the interval 1146 given by the RS packet. 1148 The receiver report selection is sender specific, that is, a sender 1149 can select receivers to send reports just for itself, not for other 1150 senders. The settings carried in an RS packet remain valid until 1151 another RS packet modifies them. If it is absolutely necessary to 1152 modify current settings, the sender should send the new RS packet 1153 more than once to prevent possible loss. 1155 To have an approximative estimate of the session population, the 1156 probability field could be set to a non zero value and the list of 1157 receivers set to a wild card address. After the time corresponding 1158 to the interval field of the packet has elapsed, both the sender 1159 and receivers can estimate the number of receivers using the 1160 following expression, 1162 population = number of responders/probability. 1164 Note that due to possible packet loss, the real population is 1165 generally greater than this estimate. 1167 The round trip time RTT between a sender and a receiver can also be 1168 measured using a pair of sender report and receiver report, similar 1169 to the mechanism described in section 5.7. 1171 7. Flow and Congestion Control 1173 Flow and congestion control constitutes a vital component of this 1174 protocol. First applications need flow control to adjust the 1175 transmission rate so that the data flow does not go above the 1176 processing capacity of the application. Otherwise packets may be lost 1177 at end hosts due to lack of processing power. Second the network can 1178 generally offer limited throughput. Even if there is a single data 1179 flow, it should still be adapted to the available network bandwidth, 1180 generally limited by the most bottleneck network path, to avoid 1181 congestion and achieve better performance. Third when there are 1182 several data flows in the network, it is extremely important that one 1183 flow does not shutdown the others. Instead the network bandwidth 1184 should be fairly shared among them. Besides a data flow should be 1185 competitive to ensure the responsiveness for applications. The 1186 rate-based control mechanism described here is intended to be 1187 friendly with other data flows under flow and congestion control 1188 such as TCP. 1190 7.1 Transmission Rate 1192 Data transmission and retransmission must be kept within a range 1193 specified by an application, in form of [Rmin, Rmax], where Rmin 1194 and Rmax are the lower and upper bounds of the data rate. If an 1195 application does not specify the rate range, the default range 1196 should be applied, for example, in [8 kbits/s, 64 kbits/s]. While the 1197 specification of rate range requires some experience in network 1198 communications, it is not a difficult task in most cases. 1200 At startup, the data rate R should be increased gradually up to the 1201 maximum rate, similar to the slow start in TCP [14]. The initial 1202 rate should be set to (Rmin + Rmax)/2. Then for every eighth of send 1203 window data sent, increase the rate by 0.125R if no contrary 1204 indications. Under normal network conditions, the data rate will 1205 reach the maximum rate at most after an amount of data corresponding 1206 to one and half send window have been sent. 1208 As a closed-loop approach is adopted, if the network is partitioned 1209 near the sender, no feedback will be returned from the network. In 1210 this case, it is desirable that the sender keeps the transmission 1211 at the minimum rate. 1213 7.2 Congestion Control 1215 Flow and congestion control is performed at the sender side according 1216 to the feedback information gathered from the network. A network 1217 congestion problem can be reflected by the following parameters: 1219 o the information carried in NACK packets, such as the lowest lost 1220 sequence number and the bitmask. Congestion will cause more 1221 packets being lost, thus the number of retransmissions will 1222 increase. 1224 o indication from receivers. Due to local error recovery, 1225 additional traffic can be generated in local region that will not 1226 be seen by the sender. This may result in local congestion 1227 problem which needs to be reported to the sender. 1229 A sender generally maintains transmitted data in its buffer space, 1230 called send window, for possible retransmissions. Congestion control 1231 should function adaptively with regard to the above parameters and 1232 the send window size. 1234 To reduce unnecessary oscillations of the transmission rate, two 1235 adjustment operations should not be performed in too small time 1236 interval. It is recommended that adjustments are regularly carried 1237 out for every eighth of send window data sent. 1239 7.2.1 Adaptation to the Loss 1241 NACK packets can be used together with send window to limit the 1242 number of outstanding packets in the network, similar to ACK packets 1243 used in unicast. However the number of NACK packets heard from the 1244 network may not timely indicate a congestion problem due to NACK 1245 suppression and the fact that one NACK packet can report more than 1246 one lost packets. Congestion control that only makes use of this 1247 information will not respond quickly to congestion problem. But the 1248 lowest sequence number requested for retransmission by a NACK packet 1249 can be a good indication of congestion problem. 1251 If the lowest sequence number requested for retransmission is going 1252 out of the range maintained within the send window, that means that 1253 the sender is going ahead too fast and the error recovery will be 1254 soon impossible. If the sender does not slow down the data 1255 transmission, some receivers will not be able to synchronize with the 1256 data flow. 1258 To respond to this problem, a sender should calculate the difference 1259 between the current sequence number in transmission and the lowest 1260 sequence number requested for retransmission by a NACK packet. The 1262 rate should be adjusted as follows, given the current rate R: 1264 o If the difference is greater than half send window size, the 1265 transmission rate should be reduced to R/4. 1267 o If the difference is greater than one third of send window 1268 size, the transmission rate is reduced to R/2. 1270 o If the difference is greater than a quarter of send window 1271 size, the transmission rate is reduced to 3R/4. 1273 If the rate is not decreased according to the above criteria, 1274 it should not be increased at the next adjustment in presence 1275 of loss. 1277 7.2.2 Adaptation to Congestion Indication from Receivers 1279 As described above, additional flow and congestion control is needed 1280 for the following reasons. First a receiver may not be able to 1281 synchronize with the data flow due to the fact that the application 1282 can not timely process received data. That may cause receive buffer 1283 overflow. Second the traffic introduced by local error recovery may 1284 cause local congestion problem. Congestion indication from receiver 1285 is intended to report possible local congestion problem that is 1286 useful for congestion avoidance. It was first proposed by [15]. 1288 The congestion flag in receiver report is used for this purpose. 1289 It should be set if the difference between the last sequence number 1290 delivered to the application and the highest sequence number heard 1291 from the sender is greater than half the receive window (buffer) 1292 size. This means that the number of outstanding packets is going to 1293 be too high so that the receiver will not be able to repair loss due 1294 to lack of buffer space. It is thus desirable for the sender to 1295 reduce the transmission rate in this case. 1297 Upon reception of a congestion indication, a sender should reduce 1298 the transmission rate to R/2 [22]. 1300 7.2.3 Getting to the Equilibrium: Rate Increase 1302 If no decrease condition is true and no loss is reported, as 1303 described above, and one eighth of send window time has elapsed since 1304 the last decrease or increase, the rate should be increased by a 1305 factor of 0.125 (1/8) up to the maximum rate. With this control 1306 policy, one decrease to the half rate requires roughly six increases 1307 to reach the original rate. 1309 7.3 The Case of Heterogeneous Receivers 1311 For large groups and in heterogeneous environments, it is often the 1312 case that a few receivers have very slow network connections than 1313 the others. In many cases, it is desirable that the slowest receivers 1314 do not block the data transmission for the majority of receivers. In 1315 particular, interactive applications require that new data is timely 1316 presented to the end users. Strictly applying the mechanism described 1317 above may lead the data rate always to the minimum in presence of 1318 very slow receivers. 1320 To avoid this to take place, several precautions should be taken: 1322 o the application should be advised to use an adequate minimum 1323 rate. But the minimum rate should be enough low to ensure the 1324 fairness of adaptive congestion control. 1326 o a sender can ignore the congestion signals from the slowest 1327 receivers. To do this, the sender needs to keep trace of the 1328 hosts that caused most often the rate decreases. In the case 1329 where the output queue is full most of time, the sender could 1330 make a heuristic decision to ignore further congestion signals 1331 from these hosts. 1333 o a receiver should leave the session if it encounters a high 1334 loss rate and the data rate is not reduced by the sender 1335 accordingly. For normal receivers, this is necessary to reject 1336 too aggressive data flows and possible traffic attacks. For very 1337 slow receivers, this is particularly necessary not to make their 1338 congested network links even worse and also not to block the 1339 session for the majority of receivers. The receiver may try to 1340 rejoin the session at later time. If the problem persists after 1341 several rejoin attempts, the receiver may definitely abandon this 1342 session. 1344 In particularly heterogeneous environments, layered multicast could 1345 be useful [24]. That is, an application can use multiple groups, 1346 each of them addresses a set of receivers with a particular level 1347 of QoS. 1349 8. Packet Formats 1351 This section defines the packet formats that a protocol entity should 1352 implement to send data and control packets. A protocol entity should 1353 ignore those packets if their types are not recognized. 1355 8.1 Unreliable Data 1357 Unreliable data packets have no additional options. They are 1358 transmitted in a best effort way and are not subject to sequence 1359 control. If they are lost, no efforts should be made for recovery. 1360 Therefore every unreliable data packet has a fixed header of 8 bytes. 1362 8.2 Reliable Data 1364 Reliable data packets must contain the following options in addition 1365 to the common fixed header. In total, reliable data packets have a 1366 header of 16 bytes. 1368 0 1 2 3 1369 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1370 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1371 | timestamp | 1372 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1373 | sequence number | 1374 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 1375 | application data | 1376 | .... | 1377 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1379 The fields have the following meaning: 1381 timestamp: 32 bits 1382 This field indicates the time at which this packet is sent. 1383 It is encoded as the middle 32 bits of NTP time, i.e., in units 1384 of fraction of 1/65536 second. 1386 sequence number: 32 bits 1387 This field indicates the sequence number of this packet. 1389 Application data with variable length should follow after the data 1390 packet header. The length of application data and packet header 1391 must fit into the maximum transmission unit. 1393 By doing a difference between the timestamp fields of two successive 1394 data packets, receivers can get a rough estimate of the transmission 1395 interval. This interval is only meaningful if the data flow is 1396 continuous. 1398 8.3 Retransmission Data 1400 Retransmission data packets are sent in response to NACK packets. 1401 They must contain the following options in addition to the common 1402 fixed header: 1404 0 1 2 3 1405 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1406 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1407 | original entity ID | 1408 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1409 | sequence number | 1410 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 1411 | application data | 1412 | .... | 1413 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1415 The fields have the following meaning: 1417 original entity ID: 32 bits 1418 This field contains the entity identifier of the original sender 1419 of the data packet. 1421 sequence number: 32 bits 1422 This field gives the sequence number of the packet which should 1423 be the same as received from the original sender. 1425 It is prescribed that retransmission data packets have the same 1426 header length as reliable data packets for two reasons. First, to 1427 minimize necessary operations for retransmissions. The NACK 1428 responder just needs to modify the first few fields in the original 1429 data packet. Second, to avoid the total packet length exceeding the 1430 maximum transmission unit by appending additional fields in the 1431 header. 1433 8.4 FEC Redundant Data 1435 A protocol entity may implement the forward error correction 1436 mechanism which is optional. FEC data packet format is specified in 1437 a way such that the data stream can still be correctly received even 1438 if FEC is not supported by a particular implementation. See section 1439 9 for details. 1441 8.5 Error report 1443 Error report (NACK) packets are used to report a set of adjacent lost 1444 packets and request the retransmission. NACK packets must contain the 1445 following options in addition to the common fixed header: 1447 0 1 2 3 1448 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1449 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1450 | timestamp | 1451 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 1452 | srcID 1 | 1453 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1454 | LLSN | 1455 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1456 | BFLP | 1457 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 1458 | (next srcID ...) | 1459 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1461 The fields have the following meaning: 1463 timestamp: 32 bits 1464 This field indicates the time at which this packet is sent. 1465 It is encoded as the middle 32 bits of NTP time, i.e., in units 1466 of fraction of 1/65536 second. 1468 srcID 1: 32 bits 1469 This field indicates the entity identifier of the first data 1470 source to which packet loss is reported. 1472 LLSN (Lowest Lost Sequence Number): 32 bits 1473 This field indicates the sequence number of the first lost 1474 packet. 1476 BFLP (Bitmask of the Following Lost Packets): 32 bits 1477 This field indicates the bitmask of the following lost packets. 1478 In the bitmask, the bit k (1 <= k <= 32) is set to 1 if the 1479 corresponding sequence number (LLSN + k) is lost. The value 1480 0x0101 means that packets with sequence number LLSN, LLSN + 1, 1481 LLSN + 9 are lost. 1483 next srcID: 32 bits 1484 This field indicates the entity identifier of the next data 1485 source to which packet loss is reported, followed by LLSN and 1486 BFLP fields. 1488 The timestamp is included for the purpose to measure the round trip 1489 time between the error reporter and the NACK packet responder. 1491 8.6 NACK Reply 1493 NACK reply packets are sent in response to NACK packets and before 1494 sending repairs. They must contain the following options in addition 1495 to the common fixed header: 1497 0 1 2 3 1498 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1499 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1500 | NACK Reporter | 1501 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1502 | NACK timestamp | 1503 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1504 | delay since received NACK (DSRN) | 1505 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1506 | NACK destinator | 1507 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1508 | LRSN | 1509 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1510 | BFRP | 1511 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 1512 | (next NACK Reporter ...) | 1513 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1515 The fields have the following meaning: 1517 NACK Reporter: 32 bits 1518 the entity identifier of the entity which reported the packet 1519 loss. 1521 NACK timestamp: 32 bits 1522 This field is a copy of the timestamp field carried in the NACK 1523 packet. 1525 delay since received NACK (DSRN): 32 bits 1526 This field indicates the time passed from the reception of the 1527 NACK packet to the reply. It is encoded in the fractions of 1528 1/65536 seconds. The NACK reporter can use the NACK timestamp 1529 and the delay to estimate the round trip time to the responder. 1531 NACK destinator: 32 bits 1532 the entity identifier of the original data sender to which the 1533 packet loss is reported. 1535 LRSN (Lowest Repair Sequence Number): 32 bits 1536 This field indicates the lowest sequence number of the repair 1537 packets that will be sent by the responder. 1539 BFRP (Bitmask of the Following Repair Packets): 32 bits 1540 This field indicates the bitmask of the following repair 1541 packets. In the bitmask, the bit k (1 <= k <= 32) is set to 1 if 1542 the corresponding sequence number (LRSN + k) will be sent as 1543 repair. The value 0x0101 means that packets with sequence 1544 number LRSN, LRSN + 1, LRSN + 9 will be sent. 1546 8.7 Sender Report 1548 Sender reports must contain the following options in addition to 1549 the common fixed header: 1551 0 1 2 3 1552 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1553 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1554 | timestamp | 1555 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1556 | next sequence number | 1557 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1558 | packet count | 1559 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1560 | octet count | 1561 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1563 The fields have the following meaning: 1565 timestamp: 32 bits 1566 This field indicates the time at which this packet is sent. It 1567 is encoded as the middle 32 bits of NTP time, i.e., in units 1568 of fraction of 1/65536 second. 1570 Next Sequence Number: 32 bits 1571 This field indicates the sequence number to be used in the next 1572 data packet. If the sender has not yet sent data packets (i.e., 1573 the packet count is 0), this field informs receivers the initial 1574 sequence number. Upon receipt of this packet, receivers could 1575 either set the initial sequence number for this sender or check 1576 that if they have received all packets till this number minus 1577 one. 1579 Packet Count: 32 bits 1580 This field indicates the total number of packets sent by the 1581 sender since the creation of the session. 1583 Octet Count: 32 bits 1584 This field indicates the total number of octets sent by the 1585 sender since the creation of the session. 1587 8.8 Receiver Report Selection 1589 Receiver report selection packets must contain the following options 1590 in addition to the common fixed header: 1592 0 1 2 3 1593 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1594 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1595 | timestamp | 1596 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1597 | probability | interval | 1598 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1599 | list of receivers | 1600 | (variable length) | 1601 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1603 The fields have the following meaning: 1605 Timestamp: 32 bits 1606 This field indicates the time at which this packet is sent. It 1607 is encoded as the middle 32 bits of NTP time. Its purpose is to 1608 measure the round trip time between the sender and a receiver. 1610 Probability: 16 bit 1611 This field indicates the probability that a receiver should use 1612 to trigger the transmission of a receiver report. It is encoded 1613 in fractions of 1/65536. 1615 Interval: 16 bit 1616 This field indicates the time interval to use to randomize 1617 the report time. It is in number of seconds. 1619 List of receivers: variable length 1620 This field contains either a wild card address (0xffffffff) 1621 or a list of entity identifiers of the receivers that are 1622 selected to send receiver reports. 1624 8.9 Receiver Report 1626 Receiver reports are sent only upon selection by the sender. While 1627 the report selection is sender specific, a receiver report packet 1628 may contain several receiver reports, each of them destinated to a 1629 specific sender. 1631 Receiver reports must contain the following options in addition to 1632 the common fixed header: 1634 0 1 2 3 1635 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1636 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1637 | srcID 1 | 1638 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1639 | RS timestamp | 1640 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1641 | delay since RS (DSRS) | 1642 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1643 | expected sequence number | 1644 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1645 |C| loss rate | cumulative number of packets lost | 1646 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 1647 | (next srcID ...) | 1648 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1650 The fields have the following meaning: 1652 srcID 1: 32 bits 1653 the entity identifier of the data source to which the report is 1654 addressed. 1656 RS timestamp: 32 bits 1657 This field is a copy of the timestamp field carried in the 1658 receiver report selection packet. 1660 delay since RS (DSRS): 32 bits 1661 This field indicates the delay between the reception of the 1662 receiver report selection packet and the reply. It is encoded 1663 in the fractions of 1/65536 seconds. The sender can use the 1664 timestamp and the delay to estimate the round trip time to 1665 the reporter. 1667 expected sequence number: 32 bits 1668 This field indicates the expected sequence number from the 1669 sender. It implies that all packets less than this number have 1670 been correctly received. Thus it can be considered as a 1671 positive acknowledgement to all packets till this number minus 1672 one. 1674 C (Congestion): 8 bits 1675 This field is the congestion flag. If set, it indicates that 1676 the network path is congested between the sender and the 1677 receiver. 1679 loss rate: 8 bits 1680 This field indicates the fraction of packets lost since last 1681 report or since the receiver joined the session if no report 1682 was sent before. It is expressed in fractions of 1/128. 1684 cumulative number of packets lost: 24 bits 1685 This field indicates the total number of packets lost since 1686 the receiver joined the session. 1688 9. Options for Forward Error Correction 1690 This section describes how to integrate forward error correction 1691 (FEC) into the protocol. Proper use of FEC technique can 1692 substantially improve the scalability due to the fact that it 1693 reduces the number of feedback packets from receivers [16] [17] 1694 [18] [19]. FEC is particularly useful in very large groups and over 1695 asymmetric network links where no or little bandwidth is available 1696 on feedback channels. 1698 Since the use of FEC is optional, the FEC integration scheme 1699 described here allows that implementations without FEC support can 1700 correctly receive data stream by ignoring FEC redundant packets. 1702 There are many ways to send FEC encoded packets. For example, FEC 1703 encoded packets can be sent either automatically or upon reception 1704 of NACK packets as repairs. This document is not intended to define 1705 what transmission schemes should be used by the sender, rather it 1706 only specifies the necessary elements that allows to implement a 1707 generic receiver. Thus no restrictions are put on how FEC packets 1708 should be transmitted as long as they allow correct recovery at 1709 receivers. Implementations are free to choose an adequate 1710 transmission scheme. 1712 9.1 Parameters 1714 FEC is based on the principle of adding some redundant packets to 1715 the transmitted data stream so that the original data packets can be 1716 recovered without retransmission up to certain loss. This can avoid 1717 generating NACK packets [18] [19]. 1719 FEC can be implemented using erasure codes [16]. More specifically, 1720 the data and redundant packets are grouped into blocks. A block 1721 contains k data packets and (n-k) redundant packets, where n is the 1722 size of a block. The (n-k) redundant packets are calculated from 1723 k data packets. At reception if k out of n packets are received, 1724 the original data packets can be reconstructed. 1726 To increase the efficiency of recovery in case of burst loss, a 1727 spacing parameter is introduced. That is, let the base sequence 1728 number be b and the spacing between sequence numbers be s. The 1729 sequence numbers of original data packets in a block are composed of: 1731 b, b + s, b + 2s, b + 3s, ..., b + (k-1)s 1733 Larger spacing parameter and large block size (n) requires more 1734 buffer space at receivers. In addition, for large block size, 1735 receivers that newly joined the session may need longer time to be 1736 able to start decoding FEC packets. 1738 FEC packets that belong to a same block are referenced by the same 1739 base sequence number, b. In addition, they should contain the block 1740 size (n), the number of redundant packets (n-k) and the spacing 1741 parameter. 1743 9.2 FEC Packet Format 1745 FEC encoded redundant packets have the same header length as DATA 1746 packets and must contain the following options in addition to the 1747 common fixed header: 1749 0 1 2 3 1750 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1751 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1752 | BSN | 1753 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1754 | BS | NR | spacing | RC | 1755 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 1756 | FEC data | 1757 | .... | 1758 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1760 The fields have the following meaning: 1762 BSN (Base Sequence Number): 32 bits 1763 This field indicates the starting sequence number of the 1764 original data packets from which the redundant packets are 1765 calculated. 1767 BS (Block Size): 8 bits 1768 This field is the size of the current block minus one. 1770 NR (Number of Redundant Packets): 8 bits 1771 This field is the number of redundant packets in the current 1772 block minus one. 1774 spacing: 8 bits 1775 This field is the spacing between the sequence numbers of 1776 DATA packets minus one. 1778 RC (Redundant packet Count): 8 bits 1779 This field indicates the relative redundant packet number, 1780 starting from 0, in this block, i.e., 0 <= RC < NR. 1782 FEC data: variable length 1783 This field contains the redundant data calculated from the 1784 original data packets. 1786 10. Security Considerations 1788 LRMP suffers from the same security liabilities as unicast protocols, 1789 for example, an imposter can fake source or destination network 1790 addresses, or change the header or data payload. This section 1791 describes the security constraints present in this protocol in 1792 addition to unicast protocols. Multicast is in general more 1793 vulnerable to malicious attacks. Implementations should take these 1794 constraints into account in order to ensure the normal working of 1795 the protocol. It is also desirable that an implementation offers 1796 possibilities to log detailed trace in the case where a possible 1797 attack is detected. 1799 10.1 Traffic Attacks 1801 A malicious or misbehaving site may send as much data as possible to 1802 a session. This can result in that the session will be flooded by 1803 excessive data traffic so that the intended data traffic can not be 1804 transmitted as expected. At the worst, it can lead to a denial of 1805 service attack. If a protocol entity is detected doing so, the 1806 packets received from that entity should be dropped before further 1807 processing. The host doing this attack should also be indicated to 1808 the network administrator so that possible prevention could be made. 1810 10.2 Error Recovery 1812 A malicious or misbehaving site may continuously request unnecessary 1813 retransmissions by sending excessive NACK packets. Since the 1814 retransmission will take some useful bandwidth, this can slow down 1815 the transmission of new data. To avoid this problem, a protocol 1816 entity can place an upper bound to the number of NACKs that can be 1817 generated by a receiver, for example, in form of ratio of NACKs/DATA 1818 packets, as well as an upper bound to the number of retransmissions 1819 of a same data packet. A protocol entity should reject NACK packets 1820 received from a receiver if the number of NACKs received from this 1821 receiver is abnormally much higher than any other receivers. 1823 It is possible that a protocol entity modifies the data packets 1824 received from the original sender and sends them as repairs. In 1825 absence of low level authentication, it is recommended that 1826 applications use some mechanism to authenticate the information 1827 carried by the protocol. 1829 10.3 Entity Identifier 1831 A malicious or misbehaving site may use an entity identifier already 1832 used by another entity to send data or erroneous control packets. In 1833 this case, it is important that a protocol entity correctly 1834 implements the mechanism described in section 3.3 to deny packets 1835 from a forged entity. 1837 A malicious or misbehaving site may also use a low entity identifier 1838 in repair process, either for error report or sending repairs. This 1839 entity may send or respond to NACK packets, then stop the repair 1840 process. If a protocol entity implements incorrectly the repair 1841 procedure, loss repair may depend on the entity with low entity 1842 identifier and thus can be blocked. 1844 A malicious or misbehaving site may change very frequently its entity 1845 identifier. If an implementation keeps the list of all entities heard 1846 in the session, that will make the size of the list continuously 1847 growing and consequently will increase the memory usage. To avoid 1848 this type of problems, an implementation can take two precautions. 1849 First it can limit the size of the list. If the number of entities 1850 exceeds the maximum size, old entries are removed. Second it can 1851 reuse an old entry that is from the same network address as the new 1852 entity and has not been heard since a long time. 1854 10.4 Round Trip Time 1856 A malicious or misbehaving site may try to make the estimation of 1857 round trip time inadequate by putting bad delay either in NACK reply 1858 or receiver report. To prevent this type of attacks, a protocol 1859 entity should place a lower bound and a upper bound to measured round 1860 trip time and limit the portion contributed by a single entity to the 1861 mean round trip time. 1863 10.5 Congestion Control 1865 A malicious or misbehaving site may try to make the transmission rate 1866 too low by sending false NACK packets or congestion indication in the 1867 receiver reports. It is desirable to advise the users to put a 1868 reasonable lower bound for the transmission rate. In addition, 1869 implementations can ignore those NACK packets or congestion 1870 indications if they come too often from the same site. 1872 10.6 Confidentiality 1874 There is no data encryption prescribed by this protocol. Implementors 1875 should be aware that as the information is transmitted in clear, all 1876 receivers can receive and eventually interpret the information. If an 1877 application wants that only the intended receivers decode the 1878 information, it should use an encryption algorithm to protect the 1879 information. 1881 10.7 Level of Redundancy 1883 Transmission of redundant packets adds additional traffic to the data 1884 transmission. When packet loss occurs, redundant packets could be 1885 useful for recovering loss without feedback from receivers. However 1886 transmission of large amounts of redundant data could increase 1887 network congestion, leading to a contradictory effect of the use of 1888 forward error correction. In the worst case, it can lead to a denial 1889 of service attack. A protocol entity implementing FEC in data 1890 transmission should use the feedback from receivers such as the 1891 number of NACKs, round trip time variation to adjust the level of 1892 redundancy in the data stream. 1894 A. Requirements of RFC 2357 1896 This section describes how well this specification satisfies the 1897 criteria of RFC 2357 [5] and future work. LRMP has been implemented 1898 as a reusable library and a number of tests have been carried out 1899 [26]. More performance evaluation results will be presented later. 1901 A.1 Scalability 1903 The basic error recovery scheme of LRMP is derived from SRM and 1904 thus has similar behavior as SRM which is well analyzed in [6] for 1905 different network topologies. The parameters that influence the 1906 scalability are mainly the number of duplicate NACKs and that of 1907 duplicate repairs in case of loss. From [6], these numbers remain 1908 enough stable with the group size. From our experiments [26], these 1909 numbers increase slightly with the group size due to various 1910 practical reasons, e.g., variable round trip time and transmission 1911 rate. But they remain enough reasonable. 1913 LRMP adds some overhead by adopting local error recovery. If 1914 an error could be recovered in a local group, LRMP will behave 1915 better than the original scheme of SRM. Otherwise several messages 1916 could be generated in the local groups. Since NACK packets are of 1917 small size (24 bytes), the additional traffic is not significant. 1918 In general, data packets can be considered much costly than NACK 1919 packets in terms of network resource usage. Besides the NACKs 1920 sent to the local groups without local repair are not useless, they 1921 have also a positive effect, called election phase. That is, they 1922 can be thought to elect the protocol entity which will report the 1923 packet loss to a higher level group. This can reduce the probability 1924 of duplicate NACKs in subgroups with larger TTL values. 1926 The scalability of the error recovery scheme is improved in case of 1927 burst loss due to the fact that a NACK packet can report up to 33 1928 lost packets. Compared to one NACK per lost packet, the probability 1929 of duplicate NACK packets is considerably reduced. 1931 When there are multiple senders in a session, the packet loss 1932 for one sender is handled independently of the others. It is 1933 prescribed that several control packets for different senders can 1934 be multiplexed into one outgoing packet if they are sent within the 1935 same scope. In addition of the network bandwidth, the cost of 1936 multiple senders is the cache space for out-of-order packets. More 1937 senders require more cache space. Therefore LRMP can generally 1938 support a limited number of senders which depends on available system 1939 resources. 1941 A sender does not need to maintain the state about every receiver. 1942 It generally needs only to maintain several recently heard receivers 1943 for the purpose to detect the slowest receivers or possible attacks. 1944 Similarly receivers only need to maintain the sender state. Thus 1945 LRMP should be enough scalable to large groups in terms of system 1946 resource usage. 1948 A.2 Congestion Control 1950 The congestion control scheme works in function of the send window 1951 size and loss rate (which is reflected by NACKs). For reasonable 1952 send window size, e.g., 32 or 64, the congestion control scheme 1953 reacts rather quickly in case of congestion problems. Overall data 1954 rate can very quickly drop to the minimum as data retransmission is 1955 also subject to the same congestion control as transmission of new 1956 data. 1958 Further simulation and test are needed to see what fairness can 1959 be achieved with competing TCP-like data flows [25]. It should be 1960 noted that it is especially hard to behave the same way as TCP while 1961 a different algorithm is used. However it could be easier make the 1962 congestion control less aggressive than TCP. 1964 For large groups and heterogeneous environments, congestion control 1965 may often lead the data rate to the minimum due to a few slow 1966 receivers. A number of heuristic methods are proposed in section 7.3 1967 to tackle this problem. In particular, a receiver should leave the 1968 session if it deduces that there is serious mismatch between the 1969 session data rate and the available bandwidth. 1971 It is worth noting that though LRMP was designed for heterogeneous 1972 environments, a significant part of users use it in well controlled 1973 environments such as private networks where receivers are rather 1974 homogeneous. 1976 A.3 Congestion Avoidance 1978 It is hard to do congestion avoidance when packet loss is not 1979 present. As indicated in [27], in lack of any congestion indication 1980 from the network, the only available information about the possible 1981 congested network path is the packet loss and propagation delay. 1982 The possibility to do rate adaptation according to the round trip 1983 time variance was explored. To make the rate adaptation pertinent, 1984 a sender must know the round trip times to all receivers. Yet these 1985 RTTs need to be enough fresh to be usable for congestion control. 1986 The need to frequently measure the RTTs to all receivers adds a 1987 lot of complexity and makes the scalability worse. 1989 Using the mean RTT to all receivers for rate adaptation leads too 1990 often to unnecessary adjustments and is not suitable for 1991 heterogeneous environments due to the diversity of round trip times 1992 to different receivers. 1994 Thus LRMP uses only the packet loss information for congestion 1995 control. LRMP can respond to congestion problem at very early stage 1996 once packet loss is detected. 1998 Again we need to look further into the consequences of this scheme 1999 to see the behavior at both sides (LRMP and TCP) and what is the 2000 bandwidth share. 2002 A.4 Security Concerns 2004 A number of security constraints are presented in section 10. They 2005 should be carefully taken into account by any implementation. 2006 The current specification does not handle the issues related to the 2007 authentication and encryption of information. It is recommended 2008 that an application should make use of adequate techniques to ensure 2009 the authentication, integrity, and confidentiality of the information 2010 transported by LRMP. 2012 B. References 2014 [1] S. Deering, "Host extensions for IP multicasting", RFC 1112, 2015 August 1989. 2017 [2] J. Postel, "Transmission Control Protocol", STD 7, RFC 793, 2018 USC/Information Sciences Institute, Sept. 1981. 2020 [3] J. Postel, "Internet Protocol", STD 5, RFC 791, 2021 USC/Information Sciences Institute, Sept. 1981. 2023 [4] D. Mills, "Network Time Protocol Version 3", RFC 1305, UDEL, 2024 March 1992. 2026 [5] A. Mankin, A. Romanow, S. Bradner, V. Paxson., "IETF Criteria 2027 for Evaluating Reliable Multicast Transport and Application 2028 Protocols", RFC 2357, June 1998. 2030 [6] S. Floyd, V. Jacobson, S. McCanne, C. Liu, L. Zhang, 2031 "A Reliable Multicast Framework for Light-weight Sessions and 2032 Application Level Framing", ACM Transactions on Networking, 2033 Nov. 1996. 2035 [7] D. D. Clark and D. L. Tennenhouse, "Architectural 2036 considerations for a new generation of protocols," in SIGCOMM 2037 Symposium on Communications Architectures and Protocols, 2038 (Philadelphia, Pennsylvania), pp. 200-208, IEEE, Sept. 1990. 2039 Computer Communications Review, Vol. 20(4), Sept. 1990. 2041 [8] J. C. Lin and S. Paul, "RMTP: A Reliable Multicast Transport 2042 Protocol", Proceedings of IEEE INFOCOM '96, March 1996, 2043 pp.1414 - 1424. 2045 [9] B. Whetten et al., "The RMTP-II Protocol", Internet Draft, 2046 draft-whtten-rmtp-ii-00.txt, April 1998. 2048 [10] T. Speakman, D. Farinacci, S. Lin, and A. Tweedly, "Pragmatic 2049 Group Multicast (PGM) Transport Protocol Specification", 2050 Internet Draft, draft-speakman-pgm-spec-01.txt, Jan. 1998. 2052 [11] M. Hofmann, "Enabling Group Communication in Global Networks", 2053 Proc. of Global Networking '97, Volume II, pp321-330, Calgary, 2054 Alberta, Canada, June 1997. 2056 [12] C. A. Kent and J. C. Mogul, "Fragmentation Considered Harmful", 2057 Proc. of SIGCOMM '87, August 1987. 2059 [13] P. Karn and C. Partridge, "Improving Round-Trip Time Estimates 2060 in Reliable Transport Protocol", ACM Transactions on Computer 2061 Systems (TOCS), Vol. 9, No. 4, pp. 364-373, November 1991. 2063 [14] V. Jacobson, "Congestion Avoidance and Control", Computer 2064 Communications Review, vol.18, pp314-329, August 1988. 2066 [15] K.K. Ramakrishnan and R. Jain, "A Binary Feedback Scheme for 2067 Congestion Avoidance in Computer Networks", ACM Transactions on 2068 Computer Systems (TOCS), Vol. 8, No. 2, pp 158-181, May, 1990. 2070 [16] R. E. Blahut, "Theory and Practice of Error Control Codes", 2071 Addison Wesley, 1984. 2073 [17] C. Perkins and O. Hodson, "Options for Repair of Streaming 2074 Media", RFC 2354, UCL, Jube 1998. 2076 [18] L. Rizzo, "Effective erasure codes for reliable computer 2077 communication protocols", ACM Computer Communication Review, 2078 Vol.27, no.2, April 1997. 2080 [19] J. Nonnenmacher, E. W. Biersack and D. Towsley, "Parity-Based 2081 Loss Recovery for Reliable Multicast Transmission", Proc. of 2082 SIGCOMM '97, Sept. 1997. 2084 [20] D. Meyer, "Administratively Scoped IP Multicast", RFC 2365, 2085 University of Oregon, July 1998. 2087 [21] S. Floyd and K. Fall, "Promoting the Use of End-to-End 2088 Congestion Control in the Internet", submitted to IEEE/ACM 2089 Transactions on Networking, Feb. 1998. 2091 [22] Dah Ming Chiu, "Flow and Congestion Control in Reliable 2092 Multicast", Presentation at IRTF RM group meeting, UCL, July 2093 1998, available from http://www.east.isi.edu/RMRG/. 2095 [23] H. Schulzrinne, S. Casner, R. Frederick and V. Jacobson, "RTP: 2096 A Transport Protocol for Real-Time Applications", RFC 1889, 2097 January 1996. 2099 [24] L. Vicisano, L. Rizzo and J. Crowcroft, "TCP-like congestion 2100 control for layered multicast data transfer", IEEE Infocom '98, 2101 March 1998. 2103 [25] M. Handley, "Reference simulations for multicast congestion 2104 control", Presentation at IRTF RM group meeting, UCL, July 2105 1998, available from http://www.east.isi.edu/RMRG/. 2107 [26] T. Liao, "Light-weight Reliable Multicast Protocol", Technical 2108 report, available at 2109 http://webcanal.inria.fr/lrmp/lrmp_paper.html, October 1998. 2111 [27] Jamal Golestani, "Fundamental Observations on Multicast 2112 Congestion Control in the Internet", Presentation at IRTF RM 2113 group meeting, UCL, July 1998, available at 2114 http://www.bell-labs.com/user/golestani/jamal1.ps. 2116 C. Acknowledgments 2118 Thanks to the IRTF RM group for useful ideas and specially to the 2119 following people for constructive suggestions and support: 2120 Jon Crowcroft (UCL), Jean-Francois Abramatic (INRIA/W3C), 2121 Yves Peynaud (INRIA). 2123 D. Author's Address 2125 Tie Liao 2126 INRIA 2127 Domaine Voluceau - BP 105 2128 78153 Le Chesnay Cedex 2129 France 2131 Email: Tie.Liao@inria.fr