idnits 2.17.1 draft-ietf-rmt-design-space-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard == It seems as if not all pages are separated by form feeds - found 0 form feeds but 19 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 309 instances of weird spacing in the document. Is it really formatted ragged-right, rather than justified? ** There are 8 instances of too long lines in the document, the longest one being 3 characters in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 14 has weird spacing: '...This document...' == Line 22 has weird spacing: '...and may be ...' == Line 38 has weird spacing: '...lticast is r...' == Line 39 has weird spacing: '...lutions havin...' == Line 40 has weird spacing: '...atively small...' == (304 more instances...) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (June 1999) is 9081 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'KCW98' is defined on line 763, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'BC94' -- Possible downref: Non-RFC (?) normative reference: ref. 'CDT99' -- Possible downref: Non-RFC (?) normative reference: ref. 'FLST98' -- Possible downref: Non-RFC (?) normative reference: ref. 'FJM95' -- Possible downref: Non-RFC (?) normative reference: ref. 'Ha99' -- Possible downref: Non-RFC (?) normative reference: ref. 'HC97' -- Possible downref: Non-RFC (?) normative reference: ref. 'KCW98' -- Possible downref: Non-RFC (?) normative reference: ref. 'PPV98' -- Possible downref: Non-RFC (?) normative reference: ref. 'Ri97' -- Possible downref: Non-RFC (?) normative reference: ref. 'RV97' -- Possible downref: Non-RFC (?) normative reference: ref. 'RMWT98' -- Possible downref: Non-RFC (?) normative reference: ref. 'WHA98' -- Possible downref: Non-RFC (?) normative reference: ref. 'WKM94' -- Possible downref: Non-RFC (?) normative reference: ref. 'WGL97' Summary: 8 errors (**), 0 flaws (~~), 10 warnings (==), 16 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet Engineering Task Force RMT WG 2 INTERNET-DRAFT M. Handley, ACIRI 3 draft-ietf-rmt-design-space-00.txt B. Whetten, Talarian 4 R. Kermode, Motorola 5 S. Floyd, ACIRI 6 L. Vicisano, Cisco 7 16th June 1999 8 Expires: Dec 1999 10 The Reliable Multicast Design Space for Bulk Data Transfer 12 Status of this Memo 14 This document is an Internet-Draft and is in full conformance with all 15 provisions of Section 10 of RFC2026. 17 Internet-Drafts are working documents of the Internet Engineering Task 18 Force (IETF), its areas, and its working groups. Note that other groups 19 may also distribute working documents as Internet-Drafts. 21 Internet-Drafts are draft documents valid for a maximum of six months 22 and may be updated, replaced, or obsoleted by other documents at any 23 time. It is inappropriate to use Internet- Drafts as reference material 24 or to cite them other than as "work in progress." 26 The list of current Internet-Drafts can be accessed at 27 http://www.ietf.org/ietf/1id-abstracts.txt 29 The list of Internet-Draft Shadow Directories can be accessed at 30 http://www.ietf.org/shadow.html. 32 Copyright Notice 34 Copyright (C) The Internet Society (1999). All Rights Reserved. 36 Abstract 38 The design space for reliable multicast is rich, with many possible 39 solutions having been devised. However, application requirements serve 40 to constrain this design space to a relatively small solution space. 41 This document provides an overview of the design space and the ways in 42 which application constraints affect possible solutions. 44 1. Introduction 46 The term ``general purpose reliable multicast protocol'' is something of 47 an oxymoron. Different applications have different requirements of a 48 reliable multicast protocol, and these requirements constrain the design 49 space in ways that two applications with differing requirements often 50 cannot share a single solution. There are however many successful reli- 51 able multicast protocol designs that serve more special purpose require- 52 ments well. 54 In this document we attempt to review the design space for reliable mul- 55 ticast protocols intended for bulk data transfer. The term bulk data 56 transfer should be taken as having broad meaning - the main limitations 57 are that the data stream is continuous and long lived - constraints nec- 58 essary for the forms of congestion control we currently understand. The 59 purpose of this review is to gather together an overview of the field 60 and to make explicit the constraints imposed by particular mechanisms. 61 The aim is to provide guidance to the standardization process for proto- 62 cols and protocol building blocks. In doing this, we cluster potential 63 solutions into a number of loose categories - real protocols may be com- 64 posed of mechanisms from more than one of these clusters. 66 The main constraint on solutions is imposed by the need to scale to 67 large receiver sets. For small receiver sets the design space is much 68 less restricted. 70 2. Application Constraints 72 Application requirements for reliable multicast (RM) are as broad and 73 varied as the applications themselves. However, there are a set of 74 requirements that significantly affect the design of an RM protocol. A 75 brief list includes: 77 o Does the application need to know that everyone received the data? 79 o Does the application need to constrain differences between 80 receivers? 82 o Does the application need to scale to large numbers of receivers? 84 o Does the application need to be totally reliable? 86 o Does the application need to provide low-delay delivery? 88 o Does the application need to provide time-bounded delivery? 89 o Does the application need many interacting senders? 91 o Is the application data flow intermittent? 93 o Does the application need to work in the public Internet? 95 o Does the application need to work without a return path (eg satel- 96 lite)? 98 o Does the application need to provide secure delivery? 100 In the context of standardizing bulk data transfer protocols, we can 101 rule out applications with multiple interacting senders and intermittent 102 data flows. It is not that these applications are unimportant, but that 103 we do not yet have effective congestion control for such applications. 105 2.1. Did everyone receive the data? 107 Some applications have a strong requirement for confirmation that all 108 the receivers got the data, or if not, to be informed of which specific 109 receivers failed to receive all the data. Examples include applications 110 where receivers pay for data, and reliable file-system replication. 112 Other applications do not have such a requirement. An example is the 113 distribution of free software. 115 If the application does need to know that everyone got the data, then a 116 positive acknowledgment must be received from every receiver, although 117 it may be possible to aggregate these acknowledgments. If the applica- 118 tion needs to know precisely which receivers failed to get the data, 119 additional constraints are placed on acknowledgment aggregation. 121 Note that this requirement is on application data unit (ADU) acknowledg- 122 ment and not necessarily on packet acknowledgment. 124 2.2. Constraining differences 126 Some applications need to constrain differences between receivers so 127 that the data reception characteristics for all receivers falls within 128 some range. An example is a stock price feed, where it is unacceptable 129 for a receiver to suffer delivery that is delayed significantly more 130 than any other receiver. 132 This requirement is difficult to satisfy without harming performance. 133 Typically solutions involve not sending more than a limited amount of 134 new data until positive acknowledgments have been received from all the 135 receivers. Such a solution does not cope with network and end-system 136 failures well. 138 2.3. Receiver Set Scaling 140 There are many applications for RM that do not need to scale to large 141 numbers of receivers. For such applications, a range of solutions may 142 be available that are not available for applications where scaling to 143 large receiver sets is a requirement. 145 Receiver set scaling is one of the most important constraining require- 146 ments, because it places strict limits on the mechanisms used to achieve 147 reliability. 149 In a very small system, it may be acceptable to have the receivers 150 acknowledge every packet. This approach provides the sender with the 151 maximum amount of information about reception conditions at all the 152 receivers. 154 For larger systems, such ``flat ACK'' schemes cause acknowledge implo- 155 sions at the sender. Attempts have been made to reduce this problem by 156 sending aggregate ACKs infrequently [RMWT98, BC94], but it is very dif- 157 ficult to incorporate effective congestion control into such protocols 158 because of the spareness of feedback. 160 Using negative acknowledgments (NACKs) instead of ACKs reduces this 161 problem to one of NACK implosion (only from the receivers missing the 162 packets), and because the sender really only needs to know that at least 163 one receiver is missing data, various NACK suppression mechanisms can be 164 applied. 166 An alternative to NACKs is ACK aggregation, which can be done by arrang- 167 ing the receivers into a logical tree, so that each leaf sends ACKs to 168 its parent which aggregates them, and passes them on up the tree. Tree- 169 based protocols scale well, but tree formation can be problematic. 171 Other ACK topologies such as rings are also possible, but are often more 172 difficult to form and maintain than trees are. An alternative strategy 173 is to add mechanisms to routers so that they can help out in achieving 174 reliability or in reducing the cost of achieving reliability. 176 All these solutions improve receiver set scaling, but they all have lim- 177 its of one form or another. One class of solutions scales to an infi- 178 nite number of receivers by having no feedback channel whatsoever. 179 These solutions take the initial data and encode it using an FEC-style 180 mechanism. This encoded data is transmitted in a continuous stream. 181 Receivers then join the session and receive packets until they have 182 sufficient packets to decode the original data, at which point they 183 leave the session. 185 Thus, it is clear that the intended scale of the session constrains the 186 possible solutions. All solutions will work for very small sessions, 187 but as the intended receive set increases, the range of possible solu- 188 tions that can be deployed safely decreases. 190 2.4. Total vs Semi-reliable 192 Many applications require delivery to be totally reliable; if any of the 193 data is missing, none of the data is useful. File transfer applications 194 are a good example of applications requiring total reliability. 196 However, some applications do not need total reliability. An example is 197 audio broadcasting, where missing packets reduce the quality of the 198 received audio but do not render it unusable. Such applications can 199 sometimes get by without any additional reliability over native IP reli- 200 ability, but often having a semi-reliable multicast protocol is desir- 201 able. 203 2.5. Time-bounded Delivery 205 Many applications just require data to be delivered to the receivers as 206 fast as possible. They have no absolute deadline for delivery. 208 However, some applications have hard delivery constraints - if the data 209 does not arrive at the receiver by a certain time, there is no point in 210 delivering it at all. Such time-boundedness may be as a result of real- 211 time constraints such as with audio or video streaming, or as the result 212 of new data superseding old data. In both cases, the requirement is for 213 the application to have a greater degree of control over precisely what 214 the application sends at which time than might be required with applica- 215 tions such as file transfer. 217 Time-bounded delivery usually also implies a semi-reliable protocol, but 218 the converse does not necessarily hold. 220 3. Network Constraints 222 The properties of the network in which the application is being deployed 223 may themselves constrain the reliable multicast design space. 225 3.1. Internet vs Intranet 227 In principle the Internet and intranets are the same. In practice how- 228 ever, the fact that an intranet is under one administration might allow 229 for solutions to be configured that can not easily be done in the public 230 Internet. Thus, if the data is of very high value, it might be appro- 231 priate to enhance the routers to provide assistance to a reliable multi- 232 cast transport protocol. In the public Internet, it is unlikely that 233 the additional expense required to support this state in the routers 234 would be acceptable. 236 3.2. Return Path 238 In principle, when feedback is required from receivers, this feedback 239 can be multicast or unicast. Multicast feedback has advantages, espe- 240 cially in NACK-based protocols where it is valuable for NACK suppres- 241 sion. However, it is not clear at this time whether all ISPs will allow 242 all members of a session to send to that session. If multicast feedback 243 is not allowed, then unicast feedback can almost always be substituted, 244 although often at the expense of additional messages and mechanisms. 246 Some networks may not allow any form of feedback however. The primary 247 example of this occurs with satellite broadcasts where the back channel 248 may be very narrow or even non-existent. For such networks the solution 249 space is very constrained - only FEC-based encodings have any real 250 chance of working. If the receivers are direct satellite receivers, 251 then no congestion control is needed, but it is dangerous to make such 252 assumptions because it is possible for a satellite hop to feed down- 253 stream networks. Thus, congestion control still needs to be considered 254 with solutions that do not have a return path. 256 3.3. Network Assistance 258 A reliable multicast protocol must involve mechanisms running in end 259 hosts, and must involve routers forwarding multicast packets. However 260 under some circumstances, it is possible to rely on some additional 261 degree of assistance from network elements. Broadly speaking we can 262 cluster RM protocols into four classes depending on the degree of sup- 263 port received from other network elements. 265 No Additional Support 266 The routers merely forward packets, and only the sender and 267 receivers have any reliable multicast protocol state. 269 Layered Approaches 270 Data is split across multiple multicast groups. Receivers join 271 appropriate groups to receive only the traffic they require. This 272 may in some cases require fast join or leave functionality from the 273 routers, and may require more forwarding state in the routers. 275 Server-based Approaches 276 Additional nodes are used to assist with data delivery or feedback 277 aggregation. These additional nodes might not be normal senders or 278 receivers, and may be present on the distribution or feedback tree 279 only to provide assistance to the reliable multicast protocol. They 280 would not otherwise receive the multicast traffic. 282 Router-based Approaches 283 With router-based approaches, routers on the normal data distribu- 284 tion tree from the sender to the receivers assist in the delivery of 285 data or feedback aggregation or suppression. As routers can 286 directly influence multicast routing, they have more control over 287 which traffic goes to which group members than server-based 288 approaches. However routers do not normally have a large amount of 289 spare memory or processing power, which restricts how much function- 290 ality can be placed in the routers. In addition, router code is 291 normally more difficult to upgrade than application code, so router- 292 based approaches need to be very general as they are more difficult 293 to deploy and to change. 295 4. The RM Solution Space 297 RM solutions can be roughly categorized as using one or more of the fol- 298 lowing techniques: 300 o Data packet acknowledgment. 302 o Negative acknowledgment of missing data packets. 304 o Redundancy allowing not all packets to be received. 306 These techniques themselves can be usefully subdivided, so that we can 307 examine the parts of the requirement space in which each mechanism can 308 be deployed. 310 4.1. ACK-based Mechanisms 312 The simplest ACK-based mechanism involves every receiver sending an ACK 313 packet for every data packet it receives. Such mechanisms are limited 314 to very small receiver groups by the implosion of ACKs received at the 315 sender, and for this reason they are impractical for most applications. 317 Putting multiple ACKs into a single data packet [RMWT98] reduces the 318 implosion problem by a constant amount, allowing slightly larger 319 receiver groups. However a limit is soon reached whereby feedback to 320 the sender is too infrequent for sender-based congestion control mecha- 321 nisms to work reliably. 323 Arranging the receivers into a ring [WKM94] whereby an ``ACK-token'' is 324 passed around the ring prevents the implosion problem for data. However 325 ring creation and maintenance may itself be problematic. Also if ring 326 creation does not take into account network topology (something which is 327 difficult to achieve in practice), then the number of ACK packets cross- 328 ing the network backbone for each data packet sent may increase O(n) 329 with the number of receivers. 331 4.1.1. Tree-based ACK Mechanisms 333 Arranging the receivers into a tree [MWB+98, KCW98] whereby receivers 334 generate ACKs to a parent node, which aggregates those ACKs to its par- 335 ent in turn, is both more robust and more easily configured than a ring. 336 The ACK-tree is typically only used for ACK-aggregation - data packets 337 are multicast from the sender to the receivers as normal. Trees are 338 easier to construct than rings because more local information can be 339 used in their construction. Also they can be more fault tolerant than 340 rings because node failures only affect a subset of receivers, each of 341 which can easily and locally decide to by-pass its parent and report 342 directly to the node one level higher in the tree. With good ACK-tree 343 formation, tree-based ACK mechanisms have the potential to be one of the 344 most scalable RM solutions. 346 To be simple to deploy, tree-based protocols must be self-organizing - 347 the receivers must form the tree themselves using local information in a 348 scalable manner. Such mechanisms are possible, but are not trivial. 349 The main scaling limitations of tree-based protocols therefore come from 350 the tree formation and maintenance mechanisms rather than from the use 351 of ACKs. Without such a scalable and automatic tree-formation mecha- 352 nism, tree-based protocols must rely on manual configuration, which sig- 353 nificantly limits their applicability (often to intranets) and (due to 354 the complexity of configuration) their scalability. 356 Orthogonal to the issue of tree formation is the issue of subtree 357 retransmission. With appropriate router mechanisms, or the use of mul- 358 tiple multicast groups, it is possible to allow the intermediate tree 359 nodes to retransmit missing data to the nodes below them in the tree 360 rather than relying on the original sender to retransmit the data. This 361 relies on there being a good correlation at the point of the intermedi- 362 ate node between the ACK tree and the actual data tree, as well as there 363 being a mechanism to constrain the retransmission to the subtree. A 364 good automatic tree formation mechanism combined with the use of admin- 365 istrative scoped multicast groups might provide such a solution. Without 366 such tree formation mechanisms, subtree retransmission is difficult to 367 deploy in large groups in the public internet. This could also be 368 solved by the use of transport-level router mechanisms to assist or per- 369 form retransmission, although existing router mechanisms [FLST98] sup- 370 port NACK-based rather than ACK-based protocols. 372 Another important issue is the nature of the aggregation performed at 373 interior nodes on the ACK-tree. Such nodes could: 375 1. aggregate ACKs by sending a single ACK when all their children have 376 ACKed, 378 2. aggregate ACKs by listing all the children that have ACKed, 380 3. send an aggregated ACK with a NACK-like exception list. 382 For data packets, 1. is clearly more scalable, and should be preferred. 383 However if the sender needs to know exactly which receivers received the 384 data, 2. and 3. provide this information. Fortunately, there is usually 385 no need to do this on a per-packet basis, but rather on a per-ADU basis. 386 Doing 1. on a per packet basis, and 3. on a per ADU basis is the most 387 scalable solution for applications that need this information, and suf- 388 fers virtually no disadvantage compared to the other solutions used on a 389 per-packet basis. 391 4.2. NACK-based mechanisms 393 Instead of sending an ACK for every data packet received, receivers can 394 send a negative acknowledgment (NACK) for every data packet they dis- 395 cover they did not receive. This has a number of advantages over ACK- 396 based mechanisms: 398 o The sender no longer needs to know exactly how many receivers there 399 are. This removes the topology-building phase needed for ring- or 400 tree-style ACK-based algorithms. 402 o Fault-tolerance is made somewhat simpler by making receivers respon- 403 sible for reliability. 405 o Sender state can be significantly reduced because the sender does 406 not need to keep track of the receivers state. 408 o Only a single NACK is needed from any receiver to indicate a packet 409 that is missing by any number of receivers. Thus NACK suppression 410 is possible. 412 The disadvantages are that it is more difficult for the sender to know 413 that it can free transmission buffers, and that additional session level 414 mechanisms are needed if the sender really needs to know if a particular 415 receiver actually received all the data. However for many applications, 416 neither of these is an issue. 418 4.2.1. NACK Suppression 420 The key differences between NACK-based protocols is in how NACK-suppres- 421 sion is performed. The goal is for only one NACK to reach the sender 422 (or a node that can resend the missing data) as soon as possible after 423 the loss is first noticed, and for only one copy of the missing data to 424 be received by those nodes needing retransmission. 426 Different mechanisms come close to satisfying these goals in different 427 ways. 429 o SRM [FJM95] uses random timers weighted by the round trip time 430 between the sender and each node missing the data. This is effec- 431 tive, but requires computing the RTT to each receiver before sup- 432 pression works properly. 434 o NTE [HC97] uses a sender-triggered mechanism based on random keys 435 and sliding masks. This does not require random timers, and works 436 for very large sessions, but makes it difficult to provide the con- 437 stant low-level stream of feedback needed to perform congestion con- 438 trol. 440 o AAP [Ha99] uses exponentially distributed random timers and is 441 effective for large sessions without needing to compute the RTT to 442 each receiver. 444 o PGM [FLST98] and LMS [PPV98] use additional mechanisms in routers to 445 suppress duplicate NACKs. 447 The most general of these mechanisms is probably exponentially weighted 448 random timers. Although SRM style timers can reduce feedback delay, 449 they are harder to use correctly in situations where all the RTTs are 450 not known, or where the number of respondees is unknown. In contrast, 451 exponentially weighted random timers work well across a large range of 452 session sizes with good worst case delay characteristics. 454 Either form of random timer based mechanism can be supplemented by 455 router-support where it is available. Sender triggered NACK mechanisms 456 are more difficult to integrate with router-based support mechanisms. 458 4.3. Redundancy 460 Some RM protocols can be designed so as to not need explicit reliability 461 mechanisms except in comparatively rare cases. An example is in a mul- 462 ticast game, where the position of a moving object is continuously mul- 463 ticast. This positional stream does not require additional reliability 464 because a new position superseding the old one will be sent before any 465 retransmission could take place. However, when the moving object inter- 466 acts with other objects or stops moving, then an explicit reliability 467 mechanism is required to reliably send the interaction information or 468 last position. 470 It is not just games that can be built in this manner - the NTE shared 471 text editor[HC97] uses just such a mechanism with changes to a line of 472 text. For every change the whole line is sent, and so long as the user 473 keeps typing no explicit reliability mechanism is needed. The major 474 advantage of redundancy is that it is not susceptible to spatially 475 uncorrelated packet loss. With a traditional ACK or NACK based proto- 476 col, the probability of any particular packet being received by all the 477 receivers in a large group can be very low. This leads to high retrans- 478 mission rates. In contrast, redundant streams do not suffer as the size 479 of the receiver group increases - different receivers lose different 480 packets, but this does not increase network traffic. 482 4.4. Packet-level Forward Error Correction 484 Forward Error Correction (FEC) is a well known technique for protecting 485 data against corruption. For reliable multicast it is most useful in 486 the form of erasure codes. 488 The simplest form of packet-level FEC is to take a group of packets that 489 is to be sent, and to XOR the packets together to form a new packet 490 which is also sent. If there were three original packets plus the XOR 491 packet sent, then if a receiver is missing any one of the original data 492 packets, but receives the XOR packet, then it can reproduce the missing 493 original packet. 495 More general erasure codes exist[Ri97] that allow the generation of n 496 packets from k original data packets. In such cases, so long as at 497 least k of the n packets are received, then the original data can be 498 reproduced. 500 Using erasure codes to repair packet loss is a significant improvement 501 over simple retransmission because the dependency on which packets have 502 been lost is removed. Thus, the amount of repair traffic required to 503 repair spatially uncorrelated packet loss is considerably lessened. 505 We can divide packet-level FEC schemes into two categories: pro-active 506 FEC and reactive FEC. The difference between the two is that the latter 507 relies on feedback from the receivers as to how many packets were lost 508 by the worst case receiver, and then only that number of FEC packets are 509 sent. These FEC packets will then also serve to repair loss at the 510 other receivers that are missing fewer packets. 512 To apply reactive FEC the sender must group data packets into rounds, 513 and the receivers report via ACKs or NACKs how many packets are missing 514 from each round. With NACKs, only the receiver missing the most packets 515 need send a NACK for this round, so this is used to weight the random 516 timers in the NACK calculation. 518 Reactive FEC is very effective at reducing the repair traffic for packet 519 loss. However, it requires that the data to be sent is split into 520 rounds, which can add to end-to-end latency. For bulk-data applications 521 this is typically not a problem, but this may be an issue for interac- 522 tive applications where redundancy may be a better solution. 524 4.5. Layered FEC 526 An alternative use of packet level FEC is possible when data is spread 527 across several multicast groups[RV97]. In such cases, the original k 528 data packets are used to generate n FEC packets, where n is much larger 529 than k. The n encoded packets are then striped across multiple multi- 530 cast groups, and repeated transmitted. When a receiver wishes to 531 receive the original data it joins one or more of the multicast groups, 532 and receives the FEC packets. Once it has received k different FEC 533 packets, the receiver can then leave all the multicast groups and recon- 534 struct the original data. 536 The primary importance of such a layering is that it allows different 537 receivers to be able to receive the traffic at different rates according 538 to the available capacity. Such schemes do not require any form of 539 feedback from the receivers to the sender, and therefore the need for 540 reliability does not constrain the size of the receiver set. However, 541 to perform adequate network congestion control using receiver joins and 542 leaves in this manner may require coordination between members that are 543 behind the same congested link from the sender. As a result, although 544 such schemes do not require feedback to the sender, they are not 545 entirely free of feedback. 547 5. Congestion Control Mechanisms 549 The basic delivery model of the Internet is best-effort service. No 550 guarantees are given as to throughput, delay or packet loss. End-sys- 551 tems are expected to be adaptive, and to reduce their transmission rate 552 to a level appropriate for the congestion state of the network. 553 Although increasingly the Internet will start to support reserved band- 554 width and differentiated service classes for specialist applications, 555 unless an end-system knows explicitly that it has reserved bandwidth, it 556 must still perform congestion control. 558 Broadly speaking, there are five classes of single-sender multicast con- 559 gestion control solution: 561 o Sender-controlled, one group. 563 A single multicast group is used for data distribution. Feedback 564 from the group members is used to control the rate of this group. 565 The goal is to transmit at a rate dictated by the slowest receiver. 567 o Sender-controlled, multiple groups. 569 One initial multicast group is adaptively subdivided into multiple 570 subgroups with subdivisions centered on congestion points in the 571 network. Application-level relays buffer data from a group nearer 572 the original sender, and retransmit it at a slower rate into a group 573 further from the original sender. In this way, different receivers 574 can receiver the data at different rates. Sender-based congestion 575 control takes place between the members of a subgroup and their 576 relay. 578 o Receiver-controlled, one group. 580 A single multicast group is used for data distribution. The 581 receivers determine if the sender is transmitting too rapidly for 582 the current congestion state of the network, and they leave the 583 group if this is the case. 585 o Receiver-controlled, layered organization. 587 The sender stripes data across multiple multicast groups simultane- 588 ously. Receivers join and leave these layered groups depending on 589 their measurements of the congestion state of the network, so that 590 the amount of data being received is always appropriate. 592 o Router-based congestion control. 594 It is possible to add additional mechanisms to multicast routers to 595 assist in multicast congestion control. Such mechanisms could 596 include: 598 o Conditional joins (a multicast join that specifies a loss rate 599 above which it is acceptable for the router to reject the join). 601 o Router filtering of traffic that exceeds a reasonable rate. 603 o Fair queuing schemes combined with end-to-end adaptation. 605 Router-based schemes generally require more state in network routers 606 than has traditionally been acceptable for backbone routers. Thus, 607 in the near-term, such schemes are only likely to be applicable for 608 intranet solutions. 610 For reliable multicast protocols, it is important to consider congestion 611 control at the same time as reliability is being considered. The same 612 feedback mechanisms that are required to provide reliability will nor- 613 mally be used to provide feedback for congestion control, although often 614 the timing constraints for congestion control will determine the timing 615 of reliability feedback, or will require additional feedback messages to 616 be sent. 618 In the case of receiver-based congestion control, FEC is the likely 619 choice for reliability for bulk-data transfer as receiver-based conges- 620 tion-control operates without feedback from the receiver to the sender. 622 6. Security Considerations 624 Generally speaking, security considerations have relatively little 625 effect on constraining the design space for reliable multicast proto- 626 cols. The primary issues constraining the design space are all related 627 to receiver-set scaling. For authentication of the source and of data 628 integrity, receiver-set scaling is not a significant issue. However, 629 for data encryption, key distribution and particularly re-keying may be 630 significantly affected by receiver-set scaling. Tree and graph based 631 re-keying solutions[WHA98,WGL97] would appear to be appropriate solu- 632 tions to these problems. It is not clear however that such re-keying 633 solutions need to directly affect the design of the data distribution 634 part of a reliable multicast protocol. 636 The primary question to consider for the security of reliable multicast 637 protocols is the role of third-parties. If nodes other than the origi- 638 nal source of the data are allowed to send or resend data packets, then 639 the security model for the protocol must take this into account. In 640 particular, it must be clear whether such third parties are trusted or 641 untrusted. A requirement for trusted third parties can make protocols 642 difficult to deploy on the Internet. 644 Untrusted third parties (such as receivers that retransmit the data) may 645 be used so long as the data authentication mechanisms take this into 646 account. Typically this means that the original sender digitally signs 647 and timestamps the data, and that the third parties resend this signed 648 timestamped payload unmodified. 650 Unlike unicast protocols, denial-of-service attacks on multicast trans- 651 port state are easy if the protocol design does not take such attacks 652 into account. This is because any receiver can join the session, and 653 can then produce feedback that influences the progress of a session 654 involving many other receivers. Hence protection against denial-of-ser- 655 vice attacks on reliable multicast protocols must be carefully consid- 656 ered. A receiver that requests retransmission of every packet, or that 657 refuses to acknowledge packets in an ACK-based protocol can potentially 658 bring a reliable multicast session to a standstill. Senders must have 659 appropriate policy to deal with such conditions, and if necessary, evict 660 the receiver from the group. A single receiver masquerading as a large 661 number of receivers may still be an issue under such circumstances with 662 protocols that support NACK-like functionality. Providing unique 663 ``keys'' to each NACKer when they first NACK using a unicast response 664 might potentially prevent such attacks. 666 Denial-of-service attacks caused by traffic flooding are however some- 667 what easier to protect against than with unicast. Unwanted senders can 668 simply be pruned from the distribution tree using the mechanisms imple- 669 mented in IGMP v3[CDT99]. 671 7. Conclusions 673 In this document we present an overview of the design space for reliable 674 multicast within the context of one-to-many bulk-data transfer. Other 675 flavors of multicast application are not considered in this document, 676 and hence the overview given should not be considered inclusive of the 677 design space for protocols that fall outside the context of one-to-many 678 bulk-data transfer. During the course of this overview, we have reaf- 679 firmed the notion that the process of reliable multicast protocol design 680 is affected by a number of factors that render the generation of a "one 681 size fits all solution" moot. These factors are then described to show 682 how an application's needs serve to constrain the set of available tech- 683 niques that may be used to create a reliable multicast protocol. We 684 examined a number of basic techniques and to show how well they can meet 685 the needs of certain types of applications. 687 This document is intended to provide guidance to the IETF community 688 regarding the standardization of reliable multicast protocols for bulk- 689 data transfer. Given the degree to which application requirements con- 690 strain reliable multicast solutions, and the diverse set of applications 691 that need to be supported, it should be clear that any standardization 692 work should take great pains to be future-proof. This would seem to 693 imply not standardizing complete reliable multicast transport protocols 694 in one pass, but rather examining the degree to which such protocols are 695 separable into functional building blocks, and standardizing these 696 blocks separately to the maxmimum degree that makes sense. Such an 697 approach allows for protocol evolution, and allows applications with new 698 constraints to be supported with maximal reuse of existing and tested 699 mechanisms. 701 8. Acknowledgments 703 This document represents an overview of the reliable multicast design 704 space. The ideas presented are not those of the authors, but are col- 705 lected from the varied presentations and discussions in the IRTF Reli- 706 able Multicast Research Group. Although they are too numerous to list 707 here, we thank everyone who has participated in these discussions for 708 their contributions. 710 9. Author's Addresses 712 Mark Handley, Sally Floyd 713 ATT Center for Internet Research at ICSI, 714 International Computer Science Institute, 715 1947 Center Street, Suite 600, 716 Berkeley, CA 94704, USA 717 mjh@aciri.org, floyd@aciri.org 719 Brian Whetten 720 Talarian Corporation, 721 333 Distel Circle, 722 Los Altos, CA 94022, USA 723 whetten@talarian.com 724 Roger Kermode 725 Motorola Australian Research Centre 726 Level 3, 12 Lord St, 727 Botany NSW 2019, 728 Australia. 729 ark008@email.mot.com 731 Lorenzo Vicisano 732 Cisco Systems, 733 170 West Tasman Dr. 734 San Jose, CA 95134, USA 735 lorenzo@cisco.com 737 10. References 739 [BC94] K. Birman, T. Clark. ``Performance of the Isis Distributed Com- 740 puting Toolkit.'' Technical Report TR-94-1432, Dept. of Computer 741 Science, Cornell University. 743 [CDT99] B. Cain, S. Deering, A. Thyagarajan, ``Internet Group Management 744 Protocol, Version 3'', Internet Draft, Work-in-progress, Feb 1999. 746 [FLST98] D. Farinacci, A. Lin, T. Speakman, and A. Tweedly, ``PGM reli- 747 able transport protocol specification,'' Internet Draft, Internet 748 Engineering Task Force, Aug. 1998. Work in progress. 750 [FJM95] S. Floyd, V. Jacobson, S. McCanne, ``A Reliable Multicast Frame- 751 work for Light-weight Sessions and Application Level Framing'', 752 Proc ACM SIGCOMM 95, Aug 1995 pp. 342-356. 754 [Ha99] M. Handley, ``Multicast address allocation protocol (AAP),'' 755 Internet Draft, Internet Engineering Task Force, Jun 1999. Work in 756 progress. 758 [HC97] M. Handley and J. Crowcroft, ``Network text editor (NTE) a scal- 759 able shared text editor for MBone,'' ACM Computer Communication 760 Review, vol. 27, pp. 197-208, Oct. 1997. ACM SIGCOMM'97, Sept. 761 1997. 763 [KCW98] M. Kadansky, D. Chiu, and J. Wesley, ``Tree-based reliable mul- 764 ticast (TRAM),'' Internet Draft, Internet Engineering Task Force, 765 Nov. 1998. Work in progress. 767 [MWB+98] T. Montgomery, B. Whetten, M. Basavaiah, S. Paul, N. Rastogi, 768 J. Conlan, and T. Yeh, ``THE RMTP-II PROTOCOL,'' Internet Draft, 769 Internet Engineering Task Force, Apr. 1998. Work in progress 771 [PPV98] C. Papadopoulos, G. Parulkar, and G. Varghese, ``An error con- 772 trol scheme for large-scale multicast applications,'' in Proceed- 773 ings of the Conference on Computer Communications (IEEE Infocom), 774 (San Francisco, California), p. 1188, March/April 1998. 776 [Ri97] L. Rizzo, ``Effective erasure codes for reliable computer commu- 777 nication protocols,'' ACM Computer Communication Review, vol. 27, 778 pp. 24-36, Apr. 1997. 780 [RV97] L. Rizzo, L. Vicisano, ``A Reliable Multicast data Distribution 781 Protocol based on software FEC techniques'', Proc. of The Fourth 782 IEEE Workshop on the Architecture and Implementation of High Per- 783 formance Communication Systems (HPCS'97), Sani Beach, Chalkidiki, 784 Greece June 23-25, 1997. 786 [RMWT98] K. Robertson, K. Miller, M. White, and A. Tweedly, ``StarBurst 787 multicast file transfer protocol (MFTP) specification,'' Internet 788 Draft, Internet Engineering Task Force, Apr. 1998. Work in 789 progress. 791 [WHA98] D. Wallner, E. Hardler, R. Agee, ``Key Management for Multi- 792 cast: Issues and Architectures'', Internet Draft, Work- 793 in-progress, Sept 1998. 795 [WKM94] Brian Whetten, Simon Kaplan, and Todd Montgomery, ``A high per- 796 formance totally ordered multicast protocol,'' research memorandum, 797 Aug. 1994. 799 [WGL97] C.K. Wong, M. Gouda, S. Lam, ``Secure Group Communications Using 800 Key Graphs,'' Technical Report TR 97-23, Department of Computer 801 Sciences, The University of Texas at Austin, July 1997. 803 11. Full Copyright Notice 805 Copyright (C) The Internet Society (1998). All Rights Reserved. 807 This document and translations of it may be copied and furnished to oth- 808 ers, and derivative works that comment on or otherwise explain it or 809 assist in its implementation may be prepared, copied, published and dis- 810 tributed, in whole or in part, without restriction of any kind, provided 811 that the above copyright notice and this paragraph are included on all 812 such copies and derivative works. However, this document itself may not 813 be modified in any way, such as by removing the copyright notice or ref- 814 erences to the Internet Society or other Internet organizations, except 815 as needed for the purpose of developing Internet standards in which case 816 the procedures for copyrights defined in the Internet languages other 817 than English. 819 The limited permissions granted above are perpetual and will not be 820 revoked by the Internet Society or its successors or assigns. 822 This document and the information contained herein is provided on an "AS 823 IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK 824 FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT 825 LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT 826 INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FIT- 827 NESS FOR A PARTICULAR PURPOSE."