idnits 2.17.1 draft-ietf-rmt-bb-norm-09.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.5 on line 1540. ** Found boilerplate matching RFC 3978, Section 5.4, paragraph 1 (on line 1532), which is fine, but *also* found old RFC 2026, Section 10.4C, paragraph 1 text on line 29. ** The document claims conformance with section 10 of RFC 2026, but uses some RFC 3978/3979 boilerplate. As RFC 3978/3979 replaces section 10 of RFC 2026, you should not claim conformance with it if you have changed to using RFC 3978/3979 boilerplate. ** The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure Acknowledgement. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** The document seems to lack an RFC 3978 Section 5.4 Reference to BCP 78 -- however, there's a paragraph with a matching beginning. Boilerplate error? ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. ** The document seems to lack an RFC 3979 Section 5, para. 1 IPR Disclosure Acknowledgement. ** The document seems to lack an RFC 3979 Section 5, para. 2 IPR Disclosure Acknowledgement. ** The document seems to lack an RFC 3979 Section 5, para. 3 IPR Disclosure Invitation. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 30 longer pages, the longest (page 1) being 63 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 2 instances of too long lines in the document, the longest one being 2 characters in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 168 has weird spacing: '...ability to ca...' == Line 190 has weird spacing: '...ng upon appli...' == Line 191 has weird spacing: '...eliable multi...' == Line 192 has weird spacing: '... and/or poten...' == Line 207 has weird spacing: '... in the group...' == (18 more instances...) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (July 2004) is 7222 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Obsolete informational reference (is this intentional?): RFC 3452 (ref. '9') (Obsoleted by RFC 5052, RFC 5445) -- Obsolete informational reference (is this intentional?): RFC 2401 (ref. '20') (Obsoleted by RFC 4301) Summary: 13 errors (**), 0 flaws (~~), 9 warnings (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 RMT Working Group B. Adamson/NRL 3 INTERNET-DRAFT C. Bormann/Tellique 4 draft-ietf-rmt-bb-norm-09 M. Handley/ACIRI 5 Expires: January 2005 J. Macker/NRL 6 July 2004 8 NACK-Oriented Reliable Multicast (NORM) Building Blocks 10 Status of this Memo 12 This document is an Internet-Draft and is in full conformance with all 13 provisions of Section 10 of RFC2026. By submitting this Internet-Draft, 14 we certify that any applicable patent or other IPR claims of which we 15 are aware have been disclosed, and any of which we become aware will be 16 disclosed, in accordance with RFC 3668. 18 Internet-Drafts are working documents of the Internet Engineering Task 19 Force (IETF), its areas, and its working groups. Note that other groups 20 may also distribute working documents as Internet-Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference material 25 or to cite them other than as "work in progress." 27 Copyright Notice 29 Copyright (C) The Internet Society (2004). All Rights Reserved. 31 Abstract 33 This document discusses the creation the of negative-acknowledgment 34 (NACK)-oriented reliable multicast (NORM) protocols. The rationale for 35 NORM goals and assumptions are presented. Technical challenges for 36 NACK-oriented (and in some cases general) reliable multicast protocol 37 operation are identified. These goals and challenges are resolved into 38 a set of functional "building blocks" that address different aspects of 39 NORM protocol operation. It is anticipated that these building blocks 40 will be useful in generating different instantiations of reliable 41 multicast protocols. 43 Table of Contents 45 1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . 2 46 2. Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 47 2.1. Delivery Service Model . . . . . . . . . . . . . . . . . . . . 4 48 2.2. Group Membership Dynamics. . . . . . . . . . . . . . . . . . . 4 49 2.3. Sender/Receiver Relationships. . . . . . . . . . . . . . . . . 4 50 2.4. Group Size Scalability . . . . . . . . . . . . . . . . . . . . 5 51 2.5. Data Delivery Performance. . . . . . . . . . . . . . . . . . . 5 52 2.6. Network Environments . . . . . . . . . . . . . . . . . . . . . 5 53 2.7. Router/Intermediate System Assistance. . . . . . . . . . . . . 6 54 3. Functionality . . . . . . . . . . . . . . . . . . . . . . . . . . 6 55 3.1. NORM Sender Transmission . . . . . . . . . . . . . . . . . . . 7 56 3.2. NORM Repair Process. . . . . . . . . . . . . . . . . . . . . . 9 57 3.2.1. Receiver NACK Process Initiation. . . . . . . . . . . . . . 9 58 3.2.2. NACK Suppression. . . . . . . . . . . . . . . . . . . . . . 10 59 3.2.3. NACK Content. . . . . . . . . . . . . . . . . . . . . . . . 14 60 3.2.3.1. NACK and FEC Repair Strategies . . . . . . . . . . . . . 14 61 3.2.3.2. NACK Content Format. . . . . . . . . . . . . . . . . . . 16 62 3.2.4. Sender Repair Response. . . . . . . . . . . . . . . . . . . 18 63 3.3. NORM Receiver Join Policies and Procedures . . . . . . . . . . 20 64 3.4. Reliable Multicast Member Identification . . . . . . . . . . . 20 65 3.5. Data Content Identification. . . . . . . . . . . . . . . . . . 20 66 3.6. Forward Error Correction (FEC) . . . . . . . . . . . . . . . . 22 67 3.7. Round-trip Timing Collection . . . . . . . . . . . . . . . . . 23 68 3.7.1. One-to-Many Sender GRTT Measurement . . . . . . . . . . . . 23 69 3.7.2. One-to-Many Receiver RTT Measurement. . . . . . . . . . . . 25 70 3.7.3. Many-to-Many RTT Measurement. . . . . . . . . . . . . . . . 25 71 3.7.4. Sender GRTT Advertisement . . . . . . . . . . . . . . . . . 25 72 3.8. Group Size Determination/Estimation. . . . . . . . . . . . . . 26 73 3.9. Congestion Control Operation . . . . . . . . . . . . . . . . . 27 74 3.10. Router/Intermediate System Assistance . . . . . . . . . . . . 27 75 3.11. NORM Applicability. . . . . . . . . . . . . . . . . . . . . . 27 76 4. Security Considerations . . . . . . . . . . . . . . . . . . . . . 28 77 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . . 28 78 6. Acknowledgements. . . . . . . . . . . . . . . . . . . . . . . . . 28 79 7. References. . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 80 7.1. Normative References . . . . . . . . . . . . . . . . . . . . . 28 81 7.2. Informative References . . . . . . . . . . . . . . . . . . . . 28 82 8. Authors' Addresses. . . . . . . . . . . . . . . . . . . . . . . . 30 83 9. Full Copyright Statement. . . . . . . . . . . . . . . . . . . . . 30 85 1. Introduction 87 Reliable multicast transport is a desirable technology for the efficient 88 and reliable distribution of data to a group on the Internet. The 89 complexities of group communication paradigms necessitate different 90 protocol types and instantiations to meet the range of performance and 91 scalability requirements of different potential reliable multicast 92 applications and users [3]. This document addresses the creation of 93 negative-acknowledgment (NACK)-oriented reliable multicast (NORM) 94 protocols. While different protocol instantiations may be required to 95 meet specific application and network architecture demands [4], there 96 are a number of fundamental components that may be common to these 97 different instantiations. This document describes the framework and 98 common "building block" components relevant to multicast protocols based 99 primarily on NACK operation for reliable transport. While this document 100 discusses a large set of reliable multicast components and issues 101 relevant to NORM protocol design, it specifically addresses in detail 102 the following building blocks which are not addressed in other IETF 103 documents: 105 1) NORM sender transmission strategies, 107 2) NACK-oriented repair process with timer-based feedback 108 suppression, and 110 3) Round-trip timing for adapting NORM timers. 112 The potential relationships to other reliable multicast transport 113 building blocks (Forward Error Correction (FEC), congestion control) and 114 general issues with NORM protocols are also discussed. This document is 115 a product of the IETF RMT WG and follows the guidelines provided in RFC 116 3269 [5]. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", 117 "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and 118 "OPTIONAL" in this document are to be interpreted as described in BCP 119 14, RFC 2119 [1]. 121 2. Rationale 123 Each potential protocol instantiation using the building blocks 124 presented here (and in other applicable building block documents) will 125 have specific criteria that may influence individual protocol design. 126 To support the development of applicable building blocks, it is useful 127 to identify and summarize driving general protocol design goals and 128 assumptions. These are areas that each protocol instantiation will need 129 to address in detail. Each building block description in this document 130 will include a discussion of the impact of these design criteria. 131 The categories of design criteria considered here include: 133 1) Delivery Service Model, 134 2) Group Membership Dynamics, 135 3) Sender/receiver relationships, 136 4) Group Size Scalability, 137 5) Data Delivery Performance, 138 6) Network Environments, and 139 7) Router/Intermediate System Interactions. 141 All of these areas are at least briefly discussed. Additionally, other 142 reliable multicast transport building block documents such as have been 143 created to address areas outside of the scope of this document. NORM 144 protocol instantiations may depend upon these other building blocks as 145 well as the ones presented here. This document focuses on areas that 146 are unique to NORM but may be used in concert with the other building 147 block areas. In some cases, a building block may be able address a wide 148 range of assumptions, while in other cases there will be trade-offs 149 required to meet different application needs or operating environments. 150 Where necessary, building block features are designed to be parametric 151 to meet different requirements. Of course, an underlying goal will be 152 to minimize design complexity and to at least recommend default values 153 for any such parameters that meet a general purpose "bulk data transfer" 154 requirement in a typical Internet environment. 156 2.1. Delivery Service Model 158 The implicit goal of a reliable multicast transport protocol is the 159 reliable delivery of data among a group of members communicating using 160 IP multicast datagram service. However, the specific service the 161 application is attempting to provide can impact design decisions. A 162 most basic service model for reliable multicast transport is that of 163 "bulk transfer" which is a primary focus of this and other related RMT 164 working group documents. However, the same principals in protocol 165 design may also be applied to other services models (e.g. more 166 interactive exchanges of small messages such as with white-boarding or 167 text chat. Within these different models there are issues such as the 168 sender's ability to cache transmitted data (or state referencing it) 169 for retransmission or repair. The needs for ordering and/or causality 170 in the sequence of transmissions and receptions among members in the 171 group may be different depending upon data content. The group 172 communication paradigm differs significantly from the point-to-point 173 model in that, depending upon the data content type, some receivers may 174 complete reception of a portion of data content and be able to act upon 175 it before other members have received the content. This may be 176 acceptable (or even desirable) for some applications but not for others. 177 These varying requirements drive the need for a number of different 178 protocol instantiation designs. A significant challenge in developing 179 generally useful building block mechanisms is accommodating even a 180 limited range of these capabilities without defining specific 181 application-level details. 183 2.2. Group Membership Dynamics 185 One area where group communication can differ from point-to-point 186 communications is that even if the composition of the group changes, the 187 "thread" of communication can still exist. This contrasts with the 188 point-to-point communication model where, if either of the two parties 189 leave, the communication process (exchange of data) is terminated (or at 190 least paused). Depending upon application goals, senders and receivers 191 participating in a reliable multicast transport "session" may be able 192 to join late, leave, and/or potentially rejoin while the ongoing group 193 communication "thread" still remains functional and useful. Also note 194 that this can impact protocol message content. If "late joiners" are 195 supported, some amount of additional information may be placed in 196 message headers to accommodate this functionality. Alternatively, the 197 information may be sent in its own message (on demand or intermittently) 198 if the impact to the overhead of typical message transmissions is deemed 199 too great. Group dynamics can also impact other protocol mechanisms 200 such as NACK timing, congestion control operation, etc. 202 2.3. Sender/Receiver Relationships 204 The relationship of senders and receivers among group members requires 205 consideration. In some applications, there may be a single sender 206 multicasting to a group of receivers. In other cases, there may be more 207 one sender or the potential for everyone in the group to be a sender 208 _and_ receiver of data may exist. 210 2.4. Group Size Scalability 212 Native IP multicast [2] may scale to extremely large group sizes. It 213 may be desirable for some applications to scale along with the multicast 214 infrastructure's ability to scale. In its simplest form, there are 215 limits to the group size to which a NACK-oriented protocol can apply 216 without NACK implosion problems. Research suggests that NORM group 217 sizes on the order of tens of thousands of receivers may operate with 218 modest feedback to the sender using probabilistic, timer-based 219 suppression techniques [7]. However, the potential for router 220 assistance and/or other NACK suppression heuristics may enable these 221 protocols to scale to very large group sizes. In large scale cases, it 222 may be prohibitive for members to maintain state on all other members 223 (in particular, other receivers) in the group. The impact of group size 224 needs to be considered in the development of applicable building blocks. 226 2.5. Data Delivery Performance 228 There is a trade-off between scalability and data delivery latency when 229 designing NACK-oriented protocols. If probabilistic, timer-based NACK 230 suppression is to be used, there will be some delays built into the NACK 231 process to allow suppression to occur and for the sender of data to 232 identify appropriate content for efficient repair transmission. For 233 example, backoff timeouts can be used to ensure efficient NACK 234 suppression and repair transmission, but this comes at a cost of 235 increased delivery latency and increased buffering requirements for both 236 senders and receivers. The building blocks SHOULD allow applications to 237 establish bounds for data delivery performance. Note that application 238 designers must be aware of the scalability trade-off that is made when 239 such bounds are applied. 241 2.6. Network Environments 243 The Internet Protocol has historically assumed a role of providing 244 service across heterogeneous network topologies. It is desirable that a 245 reliable multicast protocol be capable of effectively operating across a 246 wide range of the networks to which general purpose IP service applies. 247 The bandwidth available on the links between the members of a single 248 group today may vary between low numbers of kbit/s for wireless links 249 and multiple Gbit/s for high speed LAN connections, with varying degrees 250 of contention from other flows. Recently, a number of asymmetric 251 network services including 56K/ADSL modems, CATV Internet service, 252 satellite and other wireless communication services have begun to 253 proliferate. Many of these are inherently broadcast media with 254 potentially large "fan-out" to which IP multicast service is highly 255 applicable. Additionally, policy and/or technical issues may result in 256 topologies where multicast connectivity is limited to a single source 257 (SSM) model from a specific source [8]. Receivers in the group may be 258 restricted to unicast feedback for NACKs and other messages. 259 Consideration must be given, in building block development and protocol 260 design, to the nature of the underlying networks. 262 2.7. Router/Intermediate System Assistance 264 While intermediate assistance from devices/systems with direct knowledge 265 of the underlying network topology may be used to leverage the 266 performance and scalability of reliable multicast protocols, there will 267 continue to be a number of instances where this is not available or 268 practical. Any building block components for NACK-oriented reliable 269 multicast SHALL be capable of operating without such assistance. 270 However, it is RECOMMENDED that such protocols also be consider 271 utilizing these features when available. 273 3. Functionality 275 The previous section has presented the role of protocol building blocks 276 and some of the criteria that may affect NORM building block 277 identification/design. This section describes different building block 278 areas applicable to NORM protocols. Some of these areas are specific to 279 NACK-oriented protocols. Detailed descriptions of such areas are 280 provided. In other cases, the areas (e.g., node identifiers, forward 281 error correction (FEC), etc) may be applicable to other forms of 282 reliable multicast. In those cases, the discussion below describes 283 requirements placed on those other general building block areas from the 284 standpoint of NACK-oriented reliable multicast. Where applicable, other 285 building block documents are referenced for possible contribution to 286 NORM protocols. 288 For each building block, a notional "interface description" is provided 289 to illustrate any dependencies of one building block component upon 290 another or upon other protocol parameters. A building block component 291 may require some form of "input" from another building block component 292 or other source to perform its function. Any "inputs" required by a 293 building block component and/or any resultant "output" provided will be 294 defined and described in each building block component's interface 295 description. Note that the set of building blocks presented here do not 296 fully satisfy each other's "input" and "output" needs. In some cases, 297 "inputs" for the building blocks here must come from other building 298 blocks external to this document (e.g., congestion control or FEC). In 299 other cases NORM building block "inputs" must be satisfied by the 300 specific protocol instantiation or implementation (e.g., application 301 data and control). 303 The following building block components relevant to NORM are identified: 305 (NORM-Specific) 306 1) NORM Sender Transmission 307 2) NORM Repair Process 308 3) NORM Receiver Join Policies 309 (General Purpose) 310 4) Node (member) Identification 311 5) Data Content Identification 312 6) Forward Error Correction (FEC) 313 7) Round-trip Timing Collection 314 8) Group Size Determination/Estimation 315 9) Congestion Control Operation 316 10) Router/Intermediate System Assistance 317 11) Ancillary Protocol Mechanisms 319 Figure 1 provides a pictorial overview of these building block areas and 320 some of their relationships. For example, the content of the data 321 messages that sender initially transmits depends upon the "Node 322 Identification", "Data Content Identification", and "FEC" components 323 while the rate of message transmission will generally depend upon the 324 "Congestion Control" component. Subsequently, the receivers' response 325 to these transmissions (e.g., NACKing for repair) will depend upon the 326 data message content and inputs from other building block components. 327 Finally, the sender's processing of receiver responses will feed back 328 into its transmission strategy. 330 The components on the left side of this figure are areas that may be 331 applicable beyond NORM. The most significant of these components are 332 discussed in other building block documents such as [9]. A brief 333 description of these areas and their role in the NORM protocol is given 334 below. The components on the right are seen as specific to NORM 335 protocols, most notably the NACK repair process. These areas are 336 discussed in detail below. Some other components (e.g., "Security") 337 impact many aspects of the protocol, and others such as "Router 338 Assistance" may be more transparent to the core protocol processing. 339 The sections below describe the "NORM Sender Transmission", "NORM Repair 340 Process", and "RTT Collection" building blocks in detail. The 341 relationships to and among the other building block areas are also 342 discussed, focusing on issues applicable to NORM protocol design. Where 343 applicable, specific technical recommendations are made for mechanisms 344 that will properly satisfy the goals of NORM transport for the Internet. 346 3.1. NORM Sender Transmission 348 NORM senders will transmit data content to the multicast session. The 349 data content will be application dependent. The sender will transmit 350 data content at a rate, and with message sizes, determined by 351 application and/or network architecture requirements. Any FEC encoding 352 of sender transmissions SHOULD conform with the guidelines of [9]. 353 When congestion control mechanisms are needed (REQUIRED for general 354 Internet operation), NORM transmission SHALL be controlled by the 355 congestion control mechanism. In any case, it is RECOMMENDED that all 356 data transmissions from NORM senders be subject to rate limitations 357 determined by the application or congestion control algorithm. The 358 sender's transmissions SHOULD make good utilization of the available 359 capacity (which may be limited by the application and/or by congestion 360 control). As a result, it is expected there will be overlap and 361 multiplexing of new data content transmission with repair content. 362 Other factors related to application operation may determine sender 363 transmission formats and methods. For example, some consideration needs 364 to be given to the sender's behavior during intermittent idle periods 365 when it has no data to transmit. 367 In addition to data content, other sender messages or commands may be 368 employed as part of protocol operation. These messages may occur 369 outside of the scope of application data transfer. In NORM protocols, 370 reliability of such protocol messages may be attempted by redundant 371 transmission when positive acknowledgement is prohibitive due to group 372 size scalability concerns. Note that protocol design SHOULD provide 373 mechanisms for dealing with cases where such messages are not received 374 by the group. As an example, a command message might be redundantly 375 transmitted by a sender to indicate that it is temporarily (or 376 permanently) halting transmission. At this time, it may be appropriate 377 for receivers to respond with NACKs for any outstanding repairs they 378 require following the rules of the NORM NACK procedure. For efficiency, 379 the sender should allow sufficient time between the redundant 380 transmissions to receive any NACK-oriented responses from the receivers 381 to this command. 383 In general, when there is any resultant NACK or other feedback 384 operation, the timing of redundant transmission of control messages 385 issued by a sender and other NORM protocol timeouts should be dependent 386 upon the group greatest round trip timing (GRTT) estimate and any 387 expected resultant NACK or other feedback operation. The NORM GRTT is 388 an estimate of the worst-case round-trip timing from a sender to any 389 receivers in the group. It is assumed that the GRTT interval is a 390 conservative estimate of the maximum span (with respect to delay) of the 391 multicast group across a network topology with respect to given sender. 392 NORM instantiations SHOULD be able to dynamically adapt to a wide range 393 of multicast network topologies. 395 Sender Transmission Interface Description 397 Inputs: 399 1) Application data and control 400 2) Sender node identifier 401 3) Data identifiers 402 4) Segmentation and FEC parameters 403 5) Transmission rate 404 6) Application controls 405 7) Receiver feedback messages (e.g., NACKs) 407 Outputs: 409 1) Controlled transmission of messages with headers uniquely 410 identifying data or repair content within the context of the 411 NORM session. 412 2) Commands indicating sender's status or other transport control 413 actions to be taken. 415 3.2. NORM Repair Process 417 A critical component of NORM protocols is the NACK repair process. This 418 includes the receiver's role in detecting and requesting repair needs, 419 and the sender's response to such requests. There are four primary 420 elements of the NORM repair process: 422 1) Receiver NACK process initiation, 424 3) NACK suppression, 426 2) NACK message content, 428 4) Sender NACK processing and response. 430 3.2.1. Receiver NACK Process Initiation 432 The NORM NACK process (cycle) will be initiated by receivers that detect 433 a need for repair transmissions from a specific sender to achieve 434 reliable reception. When FEC is applied, a receiver should initiate the 435 NACK process only when it is known its repair requirements exceed the 436 amount of pending FEC transmission for a given coding block of data 437 content. This can be determined at the end of the current transmission 438 block (if it is indicated) or upon the start of reception of a 439 subsequent coding block or transmission object. This implies the NORM 440 data content is marked to identify its FEC block number and that ordinal 441 relationship is preserved in order of transmission. 443 Alternatively, if the sender's transmission advertises the quantity of 444 repair packets it is already planning to send for a block, the receiver 445 may be able to initiate the NACK processor earlier. Allowing receivers 446 to initiate NACK cycles at any time they detect their repair needs have 447 exceeded pending repair transmissions may result in slightly quicker 448 repair cycles. However, it may be useful to limit NACK process 449 initiation to specific events such as at the end-of-transmission of an 450 FEC coding block or upon detection of subsequent coding blocks. This 451 can allow receivers to aggregate NACK content into a smaller number of 452 NACK messages and provide some implicit loose synchronization among the 453 receiver set to help facilitate effective probabilistic suppression of 454 NACK feedback. The receiver MUST maintain a history of data content 455 received from the sender to determine its current repair needs. When 456 FEC is employed, it is expected that the history will correspond to a 457 record of pending or partially-received coding blocks. 459 For probabilistic, timer-base suppression of feedback, the NACK cycle 460 should begin with receivers observing backoff timeouts. In conjunction 461 with initiating this backoff timeout, it is important that the receivers 462 record the current position in the sender's transmission sequence at 463 which they initiate the NACK cycle. When the suppression backoff 464 timeout expires, the receivers should only consider their repair needs 465 up to this recorded transmission position in making the decision to 466 transmit or suppress a NACK. Without this restriction, suppression is 467 greatly reduced as additional content is received from the sender during 468 the time a NACK message propagates across the network to the sender and 469 other receivers. 471 Receiver NACK Process Initiation Interface Description 473 Inputs: 475 1) Sender data content with sequencing identifiers from sender 476 transmissions. 478 2) History of content received from sender. 480 Outputs: 482 1) NACK process initiation decision 483 2) Recorded sender transmission sequence position. 485 3.2.2. NACK Suppression 487 An effective NORM feedback suppression mechanism is the use of random 488 backoff timeouts prior to NACK transmission by receivers requiring 489 repairs [10]. Upon expiration of the backoff timeout, a receiver will 490 request repairs unless its pending repair needs have been completely 491 superseded by NACK messages heard from other receivers (when receivers 492 are multicasting NACKs) or from some indicator from the sender. When 493 receivers are unicasting NACK messages, the sender may facilitate NACK 494 suppression by forwarding a representation of NACK content it has 495 received to the group at large or provide some other indicator of the 496 repair information it will be subsequently transmitting. 498 For effective and scalable suppression performance, the backoff timeout 499 periods used by receivers should be independently, randomly picked by 500 receivers with a truncated exponential distribution [6]. This results 501 in the majority of the receiver set holding off transmission of NACK 502 messages under the assumption that the smaller number of "early 503 NACKers" will supersede the repair needs of the remainder of the group. 504 The mean of the distribution should be determined as a function of the 505 current estimate of sender<->group GRTT and a group size estimate that 506 is determined by other mechanisms within the protocol or preset by the 507 multicast application. 509 A simple algorithm can be constructed to generate random backoff 510 timeouts with the appropriate distribution. Additionally, the algorithm 511 may be designed to optimize the backoff distribution given the number of 512 receivers (R) potentially generating feedback. This "optimization" 513 minimizes the number of feedback messages (e.g., NACK) in the worst-case 514 situation where all receivers generate a NACK. The maximum backoff 515 timeout (T_maxBackoff) can be set to control reliable delivery latency 516 versus volume of feedback traffic. A larger value of T_maxBackoff will 517 result in a lower density of feedback traffic for a given repair cycle. 518 A smaller value of T_maxBackoff results in shorter latency which also 519 reduces the buffering requirements of senders and receivers for reliable 520 transport. 522 Given the receiver group size (R), and maximum allowed backoff timeout 523 (T_maxBackoff), random backoff timeouts (t') with a truncated 524 exponential distribution can be picked with the following algorithm: 526 1) Establish an optimal mean (L) for the exponential backoff based on 527 the group size: 529 L = ln(R) + 1 531 2) Pick a random number (x) from a uniform distribution over a range 532 of: 534 L L L 535 -------------------- to -------------------- + ---------- 536 T_maxBackoff*(exp(L)-1) T_maxBackoff*(exp(L)-1) T_maxBackoff 538 3) Transform this random variate to generate the desired random 539 backoff time (t') with the following equation: 541 t' = T_maxBackoff/L * ln(x * (exp(L) - 1) * (T_maxBackoff/L)) 543 This C language function can be used to generate an appropriate random 544 backoff time interval: 546 double RandomBackoff(double maxTime, double groupSize) 547 { 548 double lambda = log(groupSize) + 1; 549 double x = UniformRand(lambda/maxTime) + 550 lambda / (maxTime*(exp(lambda)-1)); 551 return ((maxTime/lambda) * 552 log(x*(exp(lambda)-1)*(maxTime/lambda))); 553 } // end RandomBackoff() 555 where UniformRand(double max) returns random numbers with a uniform 556 distribution from the range of 0..max. For example, based on the POSIX 557 "rand()" function, the following C code can be used: 559 double UniformRand(double max) 560 { 561 return (max * ((double)rand()/(double)RAND_MAX)); 562 } 564 The number of expected NACK messages generated (N) within the first 565 round trip time for a single feedback event is approximately: 567 N = exp(1.2 * L / (2*T_maxBackoff/GRTT)) 569 Thus the maximum backoff time can be adjusted to tradeoff worst-case 570 NACK feedback volume versus latency. This is derived from [6] and 571 assumes T_maxBackoff >= GRTT, and L is the mean of the distribution 572 optimized for the given group size as shown in the algorithm above. 573 Note that other mechanisms within the protocol may work to reduce 574 redundant NACK generation further. It is suggested that T_maxBackoff be 575 selected as an integer multiple of the sender's current advertised GRTT 576 estimate such that: 578 T_maxBackoff = K * GRTT ;where K >= 1 580 For general Internet operation, a default value of K=4 is RECOMMENDED 581 for operation with multicast (to the group at large) NACK delivery and a 582 value of K=6 for unicast NACK delivery. Alternate values may be used to 583 for buffer utilization, reliable delivery latency and group size 584 scalability tradeoffs. 586 Given that (K*GRTT) is the maximum backoff time used by the receivers to 587 initiate NACK transmission, other timeout periods related to the NACK 588 repair process can be scaled accordingly. One of those timeouts is the 589 amount of time a receiver should wait after generating a NACK message 590 before allowing itself to initiate another NACK backoff/transmission 591 cycle (T_rcvrHoldoff). This delay should be sufficient for the sender 592 to respond to the received NACK with repair messages. An appropriate 593 value depends upon the amount of time for the NACK to reach the sender 594 and the sender to provide a repair response. This MUST include any 595 amount of sender NACK aggregation period during which possible multiple 596 NACKs are accumulated to determine an efficient repair response. These 597 timeouts are further discussed in the section below on "Sender NACK 598 Processing and Repair Response". 600 There are also secondary measures that can be applied to improve the 601 performance of feedback suppression. For example, the sender's data 602 content transmissions can follow an ordinal sequence of transmission. 603 When repairs for data content occur, the receiver can note that the 604 sender has "rewound" its data content transmission position by observing 605 the data object, FEC block number, and FEC symbol identifiers. Receivers 606 SHOULD limit transmission of NACKs to only when the sender's current 607 transmission position exceeds the point to which the receiver has 608 incomplete reception. This reduces premature requests for repair of data 609 the sender may be planning to provide in response to other receiver 610 requests. This mechanism can be very effective for protocol convergence 611 in high loss conditions when transmissions of NACKs from other receivers 612 (or indicators from the sender) are lost. Another mechanism 613 (particularly applicable when FEC is used) is for the sender to embed an 614 indication of impending repair transmissions in current packets sent. 615 For example, the indication may be as simple as an advertisement of the 616 number of FEC packets to be sent for the current applicable coding 617 block. 619 Finally, some consideration might be given to using the NACKing history 620 of receivers to weight their selection of NACK backoff timeout 621 intervals. For example, if a receiver has historically been 622 experiencing the greatest degree of loss, it may promote itself to 623 statistically NACK sooner than other receivers. Note this requires 624 there is correlation over successive intervals of time in the loss 625 experienced by a receiver. Such correlation MAY not be present in 626 multicast networks. This adjustment of backoff timeout selection may 627 require the creation of an "early NACK" slot for these historical 628 NACKers. This additional slot in the NACK backoff window will result in 629 a longer repair cycle process that may not be desirable for some 630 applications. The resolution of these trade-offs may be dependent upon 631 the protocol's target application set or network. 633 After the random backoff timeout has expired, the receiver will make a 634 decision on whether to generate a NACK repair request or not (i.e., it 635 has been suppressed). The NACK will be suppressed when any of the 636 following conditions has occurred: 638 1) The accumulated state of NACKs heard from other receivers (or 639 forwarding of this state by the sender) is equal to or supersedes 640 the repair needs of the local receiver. Note that the local 641 receiver should consider its repair needs only up to the sender 642 transmission position recorded at the NACK cycle initiation (when 643 the backoff timer was activated). 645 2) The sender's data content transmission position "rewinds" to a 646 point ordinally less than that of the lowest sequence position of 647 the local receiver's repair needs. (This detection of sender 648 "rewind" indicates the sender has already responded to other 649 receiver repair needs of which the local receiver may not have been 650 aware). This "rewind" event can occur any time between 1) when the 651 NACK cycle was initiated with the backoff timeout activation and 2) 652 the current moment when the backoff timeout has expired to suppress 653 the NACK. Another NACK cycle must be initiated by the receiver 654 when the sender's transmission sequence position exceeds the 655 receiver's lowest ordinal repair point. Note it is possible that 656 the local receiver may have had its repair needs satisfied as a 657 result of the sender's response to the repair needs of other 658 receivers and no further NACKing is required. 660 If these conditions have not occurred and the receiver still has pending 661 repair needs, a NACK message is generated and transmitted. The NACK 662 should consist of an accumulation of repair needs from the receiver's 663 lowest ordinal repair point up to the current sender transmission 664 sequence position. A single NACK message should be generated and the 665 NACK message content should be truncated if it exceeds the payload size 666 of single protocol message. When such NACK payload limits occur, the 667 NACK content SHOULD contain requests for the ordinally lowest repair 668 content needed from the sender. 670 NACK Suppression Interface Description 672 Inputs: 674 1) NACK process initiation decision. 675 2) Recorded sender transmission sequence position. 676 3) Sender GRTT. 677 4) Sender group size estimate. 678 5) Application-defined bound on backoff timeout period. 679 6) NACKs from other receivers. 680 7) Pending repair indication from sender (may be forwarded 681 NACKs). 682 8) Current sender transmission sequence position. 684 Outputs: 686 1) Yes/no decision to generate NACK message upon backoff timer 687 expiration. 689 3.2.3. NACK Content 691 The content of NACK messages generated by reliable multicast receivers 692 will include information detailing their current repair needs. The 693 specific information depends on the use and type of FEC in the NORM 694 repair process. The identification of repair needs is dependent upon 695 the data content identification (See Section 3.5 below). At the highest 696 level the NACK content will identify the sender to which the NACK is 697 addressed and the data transport object (or stream) within the sender's 698 transmission that needs repair. For the indicated transport entity, the 699 NACK content will then identify the specific FEC coding blocks and/or 700 symbols it requires to reconstruct the complete transmitted data. This 701 content may consist of FEC block erasure counts and/or explicit 702 indication of missing blocks or symbols (segments) of data and FEC 703 content. It should also be noted that NORM can be effectively 704 instantiated without a requirement for reliable NACK delivery using the 705 techniques discussed here. 707 3.2.3.1. NACK and FEC Repair Strategies 709 Where FEC-based repair is used, the NACK message content will minimally 710 need to identify the coding block(s) for which repair is needed and a 711 count of erasures (missing packets) for the coding block. An exact 712 count of erasures implies the FEC algorithm is capable of repairing 713 _any_ loss combination within the coding block. This count may need to 714 be adjusted for some FEC algorithms. Considering that multiple repair 715 rounds may be required to successfully complete repair, and erasure 716 count also implies that the quantity of unique FEC parity packets the 717 server has available to transmit is essentially unlimited (i.e., the 718 server will always be able to provide new, unique, previously unsent 719 parity packets in response to any subsequent repair requests for the 720 same coding block). Alternatively, the sender may "round-robin" 721 transmit through its available set of FEC symbols for a given coding 722 block, and eventually affect repair. For a most efficient repair 723 strategy, the NACK content will need to also _explicitly_ identify which 724 symbols (information and/or parity) the receiver requires to 725 successfully reconstruct the content of the coding block. This will be 726 particularly true of small to medium size block FEC codes (e.g., Reed 727 Solomon) that are capable of provided a limited number of parity symbols 728 per FEC coding block. 730 When FEC is not used as part of the repair process, or the protocol 731 instantiation is required to provide reliability even when the sender 732 has transmitted all available parity for a given coding block (or the 733 sender's ability to buffer transmission history is exceeded by the 734 delay*bandwidth*loss characteristics of the network topology), the NACK 735 content will need to contain _explicit_ coding block and/or segment 736 loss information so that the sender can provide appropriate repair 737 packets and/or data retransmissions. Explicit loss information in NACK 738 content may also potentially serve other purposes. For example, it may 739 be useful for decorrelating loss characteristics among a group of 740 receivers to help differentiate candidate congestion control bottlenecks 741 among the receiver set. 743 When FEC is used and NACK content is designed to contain explicit repair 744 requests, there is a strategy where the receivers can NACK for specific 745 content that will help facilitate NACK suppression and repair 746 efficiency. The assumptions for this strategy are that sender may 747 potentially exhaust its supply of new, unique parity packets available 748 for a given coding block and be required to explicitly retransmit some 749 data or parity symbols to complete reliable transfer. Another 750 assumption is that an FEC algorithm where any parity packet can fill any 751 erasure within the coding block (e.g., Reed Solomon) is used. The goal 752 of this strategy is to make maximum use of the available parity and 753 provide the minimal amount of data and repair transmissions during 754 reliable transfer of data content to the group. 756 When systematic FEC codes are used, the sender transmits the data 757 content of the coding block (and optionally some quantity of parity 758 packets) in its initial transmission. Note that a systematic FEC coding 759 block is considered to be logically made up of the contiguous set of 760 data vectors plus parity vectors for the given FEC algorithm used. For 761 example, a coding scheme that provides for 64 data symbols and 32 parity 762 symbols per coding block would contain FEC symbol identifiers in the 763 range of 0 to 95. 765 Receivers then can construct NACK messages requesting sufficient content 766 to satisfy their repair needs. For example, if the receiver has three 767 erasures in a given received coding block, it will request transmission 768 of the three lowest ordinal parity vectors in the coding block. In our 769 example coding scheme from the previous paragraph, the receiver would 770 explicitly request parity symbols 64 to 66 to fill its three erasures 771 for the coding block. Note that if the receiver's loss for the coding 772 block exceeds the available parity quantity (i.e., greater than 32 773 missing symbols in our example), the receiver will be required to 774 construct a NACK requesting all (32) of the available parity symbols 775 plus some additional portions of its missing data symbols in order to 776 reconstruct the block. If this is done consistently across the receiver 777 group, the resulting NACKs will comprise a minimal set of sender 778 transmissions to satisfy their repair needs. 780 In summary, the rule is to request the lower ordinal portion of the 781 parity content for the FEC coding block to satisfy the erasure repair 782 needs on the first NACK cycle. If the available number of parity 783 symbols is insufficient, the receiver will also request the subset of 784 ordinally highest missing data symbols to cover what the parity symbols 785 will not fill. Note this strategy assumes FEC codes such as Reed- 786 Solomon for which a single parity symbol can repair any erased symbol. 787 This strategy would need minor modification to take into account the 788 possibly limited repair capability of other FEC types. On subsequent 789 NACK repair cycles where the receiver may have received some portion of 790 its previously requested repair content, the receiver will use the same 791 strategy, but only NACK for the set of parity and/or data symbols it has 792 not yet received. Optionally, the receivers could also provide a count 793 of erasures as a convenience to the sender or intermediate systems 794 assisting NACK operation. 796 After receipt and accumulation of NACK messages during the aggregation 797 period, the sender can begin transmission of fresh (previously 798 untransmitted) parity symbols for the coding block based on the highest 799 receiver erasure count _if_ it has a sufficient quantity of parity 800 symbols that were _not_ previously transmitted. Otherwise, the sender 801 MUST resort to transmitting the explicit set of repair vectors 802 requested. With this approach, the sender needs to maintain very little 803 state on requests it has received from the group without need for 804 synchronization of repair requests from the group. Since all receivers 805 use the same consistent algorithm to express their explicit repair 806 needs, NACK suppression among receivers is simplified over the course of 807 multiple repair cycles. The receivers can simply compare NACKs heard 808 from other receivers against their own calculated repair needs to 809 determine whether they should transmit or suppress their pending NACK 810 messages. 812 3.2.3.2. NACK Content Format 814 The format of NACK content will depend on the protocol's data service 815 model and the format of data content identification the protocol uses. 816 This NACK format also depends upon the type of FEC encoding (if any) is 817 used. Figure 2 illustrates a logical, hierarchical transmission content 818 identification scheme, denoting that the notion of objects (or streams) 819 and/or FEC blocking is optional at the protocol instantiation's 820 discretion. Note that the identification of objects is with respect to 821 a given sender. It is recommended that transport data content 822 identification is done within the context of a sender in a given 823 session. Since the notion of session "streams" and "blocks" is optional, 824 the framework degenerates to that of typical transport data segmentation 825 and reassembly in its simplest form. 827 Session_ 828 \_ 829 Sender_ 830 \_ 831 [Object/Stream(s)]_ 832 \_ 833 [FEC Blocks]_ 834 \_ 835 Symbols 837 Fig. 2: NORM Data Content Identification Hierarchy 839 The format of NACK messages should meet the following goals: 841 1) Able to identify transport data unit transmissions required to 842 repair a portion of the received content, whether it is an entire 843 missing object/stream (or range), entire FEC coding block(s), or 844 sets of symbols, 846 2) Be simple to process for NACK aggregation and suppression, 848 3) Be capable of including NACKs for multiple objects, FEC coding 849 blocks and/or symbols in a single message, and 851 4) Have a reasonably compact format. 853 If the NORM transport object/stream is identified with an and 854 the FEC symbol being transmitted is identified with and , 855 the concatenation of comprises a basic 856 transport protocol data unit (TPDU) identifier for symbols from a given 857 source. NACK content can be composed of lists and/or ranges of these 858 TPDU identifiers to build up NACK messages to describe the receivers 859 repair needs. If no hierarchical object delineation or FEC blocking is 860 used, the TPDU is a simple linear representation of the data symbols 861 transmitted by the sender. When the TPDU represents a hierarchy for 862 purposes of object/stream delineation and/or FEC blocking, the NACK 863 content unit may require flags to indicate which portion of the TPDU is 864 applicable. For example, if an entire "object" (or range of objects) is 865 missing in the received data, the receiver will not necessarily know the 866 appropriate range of or for 867 which to request repair and thus requires some mechanism to request 868 repair (or retransmission) of the entire unit represented by an 869 . The same is true if entire FEC coding blocks represented by 870 one or a range of have been lost. 872 NACK Content Interface Description 874 Inputs: 876 1) Sender identification. 877 2) Sender data identification. 878 3) Sender FEC Object Transmission Information. 879 4) Recorded sender transmission sequence position. 880 5) Current sender transmission sequence position. History of 881 repair needs for this sender. 882 Outputs: 884 1) NACK message with repair requests. 886 3.2.4. Sender Repair Response 888 Upon reception of a repair request from a receiver in the group, the 889 sender will initiate a repair response procedure. The sender may wish 890 to delay transmission of repair content until it has had sufficient 891 time to accumulate potentially multiple NACKs from the receiver set. 892 This allows the sender to determine the most efficient repair strategy 893 for a given transport stream/object or FEC coding block. Depending upon 894 the approach used, some protocols may find it beneficial for the sender 895 to provide an indicator of pending repair transmissions as part of the 896 its current transmitted message content. This can aid some NACK 897 suppression mechanisms. The amount of time to perform this NACK 898 aggregation should be sufficient to allow for the maximum receiver NACK 899 backoff window ("T_maxBackoff" from Section 3.2.2) and propagation of 900 NACK messages from the receivers to the sender. Note the maximum 901 transmission delay of a message from a receiver to the sender may be 902 approximately (1*GRTT) in the case of very asymmetric network topology 903 with respect to transmission delay. Thus, if the maximum receiver NACK 904 backoff time is T_maxBackoff = K*GRTT, the sender NACK aggregation 905 period should be equal to at least: 907 T_sndrAggregate = T_maxBackoff + 1*GRTT = (K+1)*GRTT 909 Immediately after the sender NACK aggregation period, the sender will 910 begin transmitting repair content determined from the aggregate NACK 911 state and continue with any new transmission. Also, at this time, the 912 sender should observe a "holdoff" period where it constrains itself from 913 initiating a new NACK aggregation period to allow propagation of the new 914 transmission sequence position due to the repair response to the 915 receiver group. To allow for worst case asymmetry, this "holdoff" time 916 should be: 918 T_sndrHoldoff = 1*GRTT 920 Recall that the receivers will also employ a "holdoff" timeout after 921 generating a NACK message to allow time for the sender's response. 922 Given a sender plus time of 923 (K+1)*GRTT, the receivers should use holdoff timeouts of: 925 T_rcvrHoldoff = T_sndrAggregate + T_sndrHoldoff = (K+2)*GRTT 927 This allows for a worst-case propagation time of the receiver's NACK to 928 the sender, the sender's aggregation time and propagation of the 929 sender's response back to the receiver. Additionally, in the case of 930 unicast feedback from the receiver set, it may be useful for the sender 931 to forward (via multicast) a representation of its aggregated NACK 932 content to the group to allow for NACK suppression when there is not 933 multicast connectivity among the receiver set. 935 At the expiration of the timeout, the sender will 936 begin transmitting repair messages according to the accumulated content 937 of NACKs received. There are some guidelines with regards to FEC-based 938 repair and the ordering of the repair response from the sender that can 939 improve reliable multicast efficiency: 941 1) When FEC is used, it is beneficial that the sender transmit 942 previously untransmitted parity content as repair messages whenever 943 possible. This maximizes the receiving nodes' ability to 944 reconstruct the entire transmitted content from their individual 945 subsets of received messages. 947 2) The transmitted object and/or stream data and repair content should 948 be indexed with monotonically increasing sequence numbers (within 949 a reasonably large ordinal space). If the sender observes the 950 discipline of transmitting repair for the earliest content (e.g., 951 ordinally lowest FEC blocks) first, the receivers can use a 952 strategy of withholding repair requests for later content until the 953 sender once again returns to that point in the object/stream 954 transmission sequence. This can increase overall message 955 efficiency among the group and help work to keep repair cycles 956 relatively synchronized without dependence upon strict time 957 synchronization among the sender and receivers. This also helps 958 minimize the buffering requirements of receivers and senders and 959 reduces redundant transmission of data to the group at large. 961 Sender Repair Response Interface Description 963 Inputs: 965 1) Receiver NACK messages 966 2) Group timing information 968 Outputs 970 1) Repair messages (FEC and/or Data content retransmission) 971 2) Advertisement of current pending repair transmissions when 972 unicast receiver feedback is detected. 974 3.3. NORM Receiver Join Policies and Procedures 976 Consideration should be given to the policies and procedures by which 977 new receivers join a group (perhaps where reliable transmission is 978 already in progress) and begin requesting repair. If receiver joins are 979 unconstrained, the dynamics of group membership may impede the 980 application's ability to meet its goals for forward progression of data 981 transmission. Policies limiting the opportunities when receivers begin 982 participating in the NACK process may be used to achieve the desired 983 behavior. For example, it may be beneficial for receivers to attempt 984 reliable reception from a newly-heard sender only upon non-repair 985 transmissions of data in the first FEC block of an object or logical 986 portion of a stream. The sender may also implement policies limiting 987 the receivers from which it will accept NACK requests, but this may be 988 prohibitive for scalability reasons in some situations. Alternatively, 989 it may be desirable to have a looser transport synchronization policy 990 and rely upon session management mechanisms to limit group dynamics that 991 can cause poor performance , in some types of bulk transfer applications 992 (or for potential interactive reliable multicast applications). 994 Group Join Policy Interface Description 996 Inputs: 998 1) Current object/stream data/repair content and sequencing 999 identifiers from sender transmissions. 1001 Outputs: 1003 1) Receiver yes/no decision to begin receiving and NACKing for 1004 reliable reception of data 1006 3.4. Reliable Multicast Member Identification 1008 In a NORM protocol (or other multicast protocols) where there is the 1009 potential for multiple sources of data, it is necessary to provide some 1010 mechanism to uniquely identify the sources (and possibly some or all 1011 receivers in some cases) within the group. Identity based on arriving 1012 packet source addresses is insufficient for several reasons. These 1013 reasons include routing changes for hosts with multiple interfaces that 1014 result in different packet source addresses for a given host over time, 1015 network address translation (NAT) or firewall devices, or other 1016 transport/network bridging approaches. As a result, some type of unique 1017 source identifier field should be present in packets 1018 transmitted by reliable multicast session members. 1020 3.5. Data Content Identification 1022 The data and repair content transmitted by a NORM sender requires some 1023 form of identification in the protocol header fields. This 1024 identification is required to facilitate the reliable NACK-oriented 1025 repair process. These identifiers will also be used in NACK messages 1026 generated. This building block document assumes two very general types 1027 of data that may comprise bulk transfer session content. One type is 1028 static, discrete objects of finite size and the other is continuous 1029 non-finite streams. A given application may wish to reliably multicast 1030 data content using either one or both of these paradigms. While it may 1031 be possible for some applications to further generalize this model and 1032 provide mechanisms to encapsulate static objects as content embedded 1033 within a stream, there are advantages in many applications to provide 1034 distinct support for static bulk objects and messages with the context 1035 of a reliable multicast session. These applications may include content 1036 caching servers, file transfer, or collaborative tools with bulk 1037 content. Applications with requirements for these static object types 1038 can then take advantage of transport layer mechanisms (i.e., 1039 segmentation/reassembly, caching, integrated forward error correction 1040 coding, etc) rather than being required to provide their own mechanisms 1041 for these functions at the application layer. 1043 As noted, some applications may alternatively desire to transmit bulk 1044 content in the form of one or more streams of non-finite size. Example 1045 streams include continuous quasi-real-time message broadcasts (e.g., 1046 stock ticker) or some content types that are part of collaborative tools 1047 or other applications. And, as indicated above, some applications may 1048 wish to encapsulate other bulk content (e.g., files) into one or more 1049 streams within a multicast session. 1051 The components described within this building block draft document are 1052 envisioned to be applicable to both of these models with the potential 1053 for a mix of both types within a single multicast session. To support 1054 this requirement, the normal data content identification should include 1055 a field to uniquely identify the object or stream within some 1056 reasonable temporal or ordinal interval. Note that it is _not_ expected 1057 that this data content identification will be globally unique. It is 1058 expected that the object/stream identifier will be unique with respect 1059 to a given sender within the reliable multicast session and during the 1060 time that sender is supporting a specific transport instance of that 1061 object or stream. 1063 Since the "bulk" object/stream content usually requires segmentation, 1064 some form of segment identification must also be provided. This 1065 segment identifier will be relative to any object or stream identifier 1066 that has been provided. Thus, in some cases, NORM protocol 1067 instantiations may be able to receive transmissions and request repair 1068 for multiple streams and one or more sets of static objects in parallel. 1069 For protocol instantiations employing FEC the segment identification 1070 portion of the data content identifier may consist of a logical 1071 concatenation of a coding block identifier and an 1072 identifier for the specific data or parity symbol of 1073 the code block. The FEC Building Block document [9] provides a 1074 standard message format for identifying FEC transmission content. NORM 1075 protocol instantiations using FEC SHOULD follow that document's 1076 guidelines. 1078 Additionally, flags to determine the usage of the content identifier 1079 fields (e.g., stream vs. object) may be applicable. Flags may also 1080 serve other purposes in data content identification. It is expected 1081 that any flags defined will be dependent upon individual protocol 1082 instantiations. 1084 In summary, the following data content identification fields may be 1085 required for NORM protocol data content messages: 1087 1) Source node identifier () 1089 2) Object/Stream identifier (), if applicable. 1091 3) FEC Block identifier (), if applicable. 1093 4) FEC Symbol identifier () 1095 5) Flags to differentiate interpretation of identifier fields or 1096 identifier structure that implicitly indicates usage. 1098 6) Additional FEC transmission content fields per FEC Building Block 1100 These fields have been identified because any generated NACK messages 1101 will use these identifiers in requesting repair or retransmission of 1102 data. NORM protocols that use these data content fields should also be 1103 compatible with support for intermediate system assistance to reliable 1104 multicast transport operation when available. 1106 3.6. Forward Error Correction (FEC) 1108 Multiple forward error correction (FEC) approaches have been identified 1109 that can provide great performance enhancements to the repair process of 1110 NACK-oriented and other reliable multicast protocols [11], [12], [13]. 1111 NORM protocols can reap additional benefits since FEC-based repair does 1112 not _generally_ require explicit knowledge of repair content within the 1113 bounds of its coding block size (in symbols). In NORM, parity repair 1114 packets generated will generally be transmitted only in response to NACK 1115 repair requests from receiving nodes. However, there are benefits in 1116 some network environments for transmitting some predetermined quantity 1117 of FEC repair packets multiplexed with the regular data symbol 1118 transmissions [14]. This can reduce the amount of NACK traffic 1119 generated with relatively little overhead cost when group sizes are 1120 very large or the network connectivity has a large delay*bandwidth 1121 product with some nominal level of expected packet loss. While the 1122 application of FEC is not unique to NORM, these sorts of requirements 1123 may dictate the types of algorithms and protocol approaches that are 1124 applicable. 1126 A specific issue related to the use of FEC with NORM is the mechanism 1127 used to identify which portion(s) of transmitted data content to which 1128 specific FEC packets are applicable. It is expected that FEC algorithms 1129 will be based on generating a set of parity repair packets for a 1130 corresponding block of transmitted data packets. Since data content 1131 packets are uniquely identified by the concatenation of 1132 during 1133 transport, it is expected that FEC packets will be identified in a 1134 similar manner. The FEC Building Block document [9] provides detailed 1135 recommendations concerning application of FEC and standard formats for 1136 related reliable multicast protocol messages. 1138 3.7. Round-trip Timing Collection 1140 The measurement of packet propagation round-trip time (RTT) among 1141 members of the group is required to support timer-based NACK suppression 1142 algorithms, timing of sender commands or certain repair functions, and 1143 congestion control operation. The nature of the round-trip information 1144 collected is dependent upon the type of interaction among the members of 1145 the group. In the case where only "one-to-many" transmission is 1146 required, it may be that only the sender require RTT knowledge of the 1147 greatest RTT (GRTT) among the receiver set and/or RTT knowledge of only 1148 a portion of the group. Here, the GRTT information might be collected 1149 in a reasonably scalable manner. For congestion control operation, it 1150 is possible that RTT information may be required by each receiver in the 1151 group. In this case, an alternative RTT collection scheme may be 1152 utilized where receivers collect individual RTT measurements with 1153 respect to the sender and advertise them to the group or sender. Where 1154 it is likely that exchange of reliable multicast data will occur among 1155 the group on a "many-to-many" basis, there are alternative measurement 1156 techniques that might be employed for increased efficiency [15]. And 1157 in some cases, there might be absolute time synchronization among hosts 1158 that may simplify RTT measurement. There are trade-offs in multicast 1159 congestion control design that require further consideration before a 1160 universal recommendation on RTT (or GRTT) measurement can be specified. 1161 Regardless of how the RTT information is collected (and more 1162 specifically GRTT) with respect to congestion control or other 1163 requirements, the sender will need to advertise its current GRTT 1164 estimate to the group for various timeouts used by receivers. 1166 3.7.1. One-to-Many Sender GRTT Measurement 1168 The goal of this form of RTT measurement is for the sender to learn the 1169 GRTT among the receivers who are actively participating in NORM 1170 operation. The set of receivers participating in this process may be 1171 the entire group or some subset of the group determined from another 1172 mechanism within the protocol instantiation. An approach to collect 1173 this GRTT information follows. 1175 The sender periodically polls the group with a message (independent or 1176 "piggy-backed" with other transmissions) containing a 1177 timestamp relative to an internal clock at the sender. Upon reception 1178 of this message, the receivers will record this timestamp and 1179 the time (referenced to their own clocks) at which it was received 1180 . When the receiver provides feedback to the sender (either 1181 explicitly or as part of other feedback messages depending upon protocol 1182 instantiation specification), it will construct a "response" using the 1183 formula: 1185 grttResponse = sendTime + (currentTime - recvTime) 1187 where the is the timestamp from the last probe message 1188 received from the source and the ( - ) is the 1189 amount of time differential since that request was received until the 1190 receiver generated the response. 1192 The sender processes each receiver response by calculating a current RTT 1193 measurement for the receiver from whom the response was received using 1194 the following formula: 1196 RTT_rcvr = currentTime - grttResponse 1198 During the each periodic GRTT probing interval, the source keeps the 1199 peak round trip timing measurement (RTT_peak) from the set of responses 1200 it has received. A conservative estimate of GRTT is kept to maximize 1201 the efficiency of redundant NACK suppression and repair aggregation. 1202 The update to the source's ongoing estimate of GRTT is done observing 1203 the following rules: 1205 1) If a receiver's response round trip time (RTT_rcvr) is greater than 1206 the current GRTT estimate, the GRTT is immediately updated to this 1207 new peak value: 1209 GRTT = RTT_rcvr 1211 2) At the end of the response collection period (i.e., the GRTT probe 1212 interval), if the recorded "peak" response RTT_peak) is less than 1213 the current GRTT estimate, the GRTT is updated to: 1215 GRTT = MAX(0.9*GRTT, RTT_peak) 1217 3) If no feedback is received, the sender GRTT estimate remains 1218 unchanged. 1220 4) At the end of the response collection period, the peak tracking 1221 value (RTT_peak) is reset to ZERO for subsequent peak detection. 1223 The GRTT collection period (i.e., period of probe transmission) could be 1224 fixed at a value on the order of that expected for group membership 1225 and/or network topology dynamics. For robustness, more rapid probing 1226 could be used at protocol startup before settling to a less frequent, 1227 steady-state interval. Optionally, an algorithm may be developed to 1228 adjust the GRTT collection period dynamically in response to the current 1229 GRTT estimate (or variations in it) and to an estimation of packet loss. 1230 The overhead of probing messages could then be reduced when the GRTT 1231 estimate is stable and unchanging, but be adjusted to track more 1232 dynamically during periods of variation with correspondingly shorter 1233 GRTT collection periods. GRTT collection may also be coupled with 1234 collection of other information for congestion control purposes. 1236 In summary, although NORM repair cycle timeouts are based on GRTT, it 1237 should be noted that convergent operation of the protocol does not 1238 _strictly_ depend on highly accurate GRTT estimation. The current 1239 mechanism has proved sufficient in simulations and in the environments 1240 where NORM-like protocols have been deployed to date. The estimate 1241 provided by the algorithm tracks the peak envelope of actual GRTT 1242 (including operating system effect as well as network delays) even in 1243 relatively high loss connectivity. The steady-state probing/update 1244 interval may potentially be varied to accommodate different levels of 1245 expected network dynamics in different environments. 1247 3.7.2. One-to-Many Receiver RTT Measurement 1249 In this approach, receivers send messages with timestamps to the sender. 1250 To control the volume of these receiver-generated messages, a 1251 suppression mechanism similar to that described for NACK suppression my 1252 be used. The "age" of receivers' RTT measurement should be kept by 1253 receivers and used as a metric in competing for feedback opportunities 1254 in the suppression scheme. For example, receiver who have not made any 1255 RTT measurement or whose RTT measurement has aged most should have 1256 precedence over other receivers. In turn the sender may have limited 1257 capacity to provide an "echo" of the receiver timestamps back to the 1258 group, and it could use this RTT "age" metric to determine which 1259 receivers get precedence. The sender can determine the GRTT as 1260 described in 3.7.1 if it provides sender timestamps to the group. 1261 Alternatively, receivers who note their RTT is greater than the sender 1262 GRTT can compete in the feedback opportunity/suppression scheme to 1263 provide the sender and group with this information. 1265 3.7.3. Many-to-Many RTT Measurement 1267 For reliable multicast sessions that involve multiple senders, it may be 1268 useful to have RTT measurements occur on a true "many-to-many" basis 1269 rather than have each sender independently tracking RTT. Some protocol 1270 efficiency can be gained when receivers can infer an approximation of 1271 their RTT with respect to a sender based on RTT information they have on 1272 another sender and that other sender's RTT with respect to the new 1273 sender of interest. For example, for receiver "a" and sender's "b" and 1274 "c", it is likely that: 1276 RTT(a<->b) <= RTT(a<->c)) + RTT(b<->c) 1278 Further refinement of this estimate can be conducted if RTT information 1279 is available to a node concerning its own RTT to a small subset of other 1280 group members and RTT information among those other group members it 1281 learns during protocol operation. 1283 3.7.4. Sender GRTT Advertisement 1285 To facilitate deterministic NORM protocol operation, the sender should 1286 robustly advertise its current estimation of GRTT to the receiver set. 1287 Common, robust knowledge of the sender's current operating GRTT estimate 1288 among the group will allow the protocol to progress in its most 1289 efficient manner. The sender's GRTT estimate can be robustly advertised 1290 to the group by simply embedding the estimate into all pertinent 1291 messages transmitted by the sender. The overhead of this can be made 1292 quite small by quantizing (compressing) the GRTT estimate to a single 1293 byte of information. The following C-language functions allows this to 1294 be done over a wide range (RTT_MIN through RTT_MAX) of GRTT values while 1295 maintaining a greater range of precision for small GRTT values and less 1296 precision for large values. Values of 1.0e-06 seconds and 1000 seconds 1297 are RECOMMENDED for RTT_MIN and RTT_MAX respectively. NORM applications 1298 may wish to place an additional, smaller upper limit on the GRTT 1299 advertised by senders to meet application data delivery latency 1300 constraints at the expense of greater feedback volume in some network 1301 environments. 1303 unsigned char QuantizeGrtt(double grtt) 1304 { 1305 if (grtt > RTT_MAX) 1306 grtt = RTT_MAX; 1307 else if (grtt < 1.0e-06) 1308 grtt = RTT_MIN; 1309 if (grtt < (33*RTT_MIN)) 1310 return ((unsigned char)(grtt * RTT_MIN) - 1); 1311 else 1312 return ((unsigned char)(ceil(255.0.- 1313 (13.0 * log(RTT_MAX/grtt))))); 1314 } 1316 double UnquantizeRtt(unsigned char qrtt) 1317 { 1318 return ((qrtt < 31) ? 1319 (((double)(qrtt+1))/(double)RTT_MIN) : 1320 (RTT_MAX/exp(((double)(255-qrtt))/(double)13.0))); 1321 } 1323 Note that this function is useful for quantizing GRTT times in the range 1324 of 1 microsecond to 1000 seconds. Of course, NORM protocol 1325 implementations may wish to further constrain advertised GRTT estimates 1326 (e.g., limit the maximum value) for practical reasons. 1328 3.8. Group Size Determination/Estimation 1330 When NORM protocol operation includes mechanisms that excite feedback 1331 from the group at large (e.g., congestion control), it may be possible 1332 to roughly estimate the group size based on the number of feedback 1333 messages received with respect to the distribution of the probabilistic 1334 suppression mechanism used. Note the timer-based suppression mechanism 1335 described in this document does not require a very accurate estimate of 1336 group size to perform adequately. Thus, a rough estimate, particularly 1337 if conservatively managed, may suffice. Group size may also be 1338 determined administratively. In absence of a group size determination 1339 mechanism a default group size value of 10,000 is RECOMMENDED for 1340 reasonable management of feedback given the scalability of expected NORM 1341 usage. 1343 3.9. Congestion Control Operation 1345 Congestion control that fairly shares available network capacity with 1346 other reliable multicast and TCP instantiations is REQUIRED for general 1347 Internet operation. The TCP-Friendly Multicast Congestion Control 1348 (TFMCC) [16] or Pragmatic General Multicast Congestion Control (PGMCC) 1349 techniques [17] may be applied to NORM operation to meet this 1350 requirement. 1352 3.10. Router/Intermediate System Assistance 1354 NACK-oriented protocols may benefit from general purpose router 1355 assistance. In particular, additional NACK suppression where routers or 1356 intermediate systems can aggregate NACK content (or filter duplicate 1357 NACK content) from receivers as it is relayed toward the sender could 1358 enhance NORM group size scalability. For NORM protocols using FEC, it 1359 is possible that intermediate systems may be able to filter FEC repair 1360 messages to provide an intelligent "subcast" of repair content to 1361 different legs of the multicast topology depending on the repair needs 1362 learned from previous receiver NACKs. Both of these types of assist 1363 functions would require router interpretation of transport data unit 1364 content identifiers and flags. 1366 3.11. NORM Applicability 1368 The NORM building block applies to protocols wishing to employ negative 1369 acknowledgement to achieve reliable data transfer. Properly designed 1370 negative-acknowledgement (NACK)-oriented reliable multicast (NORM) 1371 protocols offer scalability advantages for applications and/or network 1372 topologies where, for various reasons, it is prohibitive to construct a 1373 higher order delivery infrastructure above the basic Layer 3 IP 1374 multicast service (e.g., unicast or hybrid unicast/multicast data 1375 distribution trees). Additionally, the scalability property of NACK- 1376 oriented protocols [18], [19] is applicable where broad "fan-out" is 1377 expected for a single network hop (e.g., cable-TV data delivery, 1378 satellite, or other broadcast communication services). Furthermore, the 1379 simplicity of a protocol based on "flat" group-wide multicast 1380 distribution may offer advantages for a broad range of distributed 1381 services or dynamic networks and applications. NORM protocols can make 1382 use of reciprocal (among senders and receivers) multicast communication 1383 under the Any-Source Multicast (ASM) model defined in RFC 1112 [2], and 1384 are capable of scalable operation in asymmetric topologies such as 1385 Single-Source Multicast (SSM) [8] where there may only be unicast 1386 routing service from the receivers to the sender(s). 1388 NORM operation is compatible with transport layer forward error 1389 correction coding techniques as described in [13] and congestion control 1390 mechanisms such as those described in [16] and [17]. A principle 1391 limitation of NORM operation involves group size scalability when 1392 network capacity for receiver feedback is very limited. NORM operation 1393 is also governed by implementation buffering constraints. Buffering 1394 greater than that required for typical point-to-point reliable transport 1395 (e.g., TCP) is recommended to allow for disparity in the receiver group 1396 connectivity and to allow for the feedback delays required to attain 1397 group size scalability. 1399 4. Security Considerations 1401 NORM protocols are expected to be subject to same sort of security 1402 vulnerabilities as other IP and IP multicast protocols. NORM is 1403 compatible with IP security (IPsec) authentication mechanisms [20] that 1404 are RECOMMENDED for protection against session intrusion and denial of 1405 service attacks. A particular threat for NACK based protocols is that 1406 of NACK replay attacks that would prevent a NORM sender from making 1407 forward progress in transmission. Any standard IPsec mechanisms that 1408 can provide protection against such replay attacks are RECOMMENDED for 1409 use. Additionally, NORM protocol instantiations SHOULD consider 1410 providing support for their own NACK replay attack protection when 1411 network layer mechanisms are not available. The IETF Multicast Security 1412 (msec) Working Group is also developing solutions which may be 1413 applicable to NORM in the future. 1415 5. IANA Considerations 1417 This document introduces no additional considerations for IANA. 1419 6. Acknowledgements (and these are not Negative) 1421 The authors would like to thank Rick Jones, and Joerg Widmer for their 1422 valuable comments on this document. The authors would also like to 1423 thank the RMT working group chairs, Roger Kermode and Lorenzo Vicisano, 1424 for their support in development of this specification, and Sally Floyd 1425 for her early inputs into this document. 1427 7. References 1429 7.1. Normative References 1431 [1] S. Bradner, "Key words for use in RFCs to Indicate Requirement 1432 Levels", BCP 14, RFC 2119, March 1997. 1434 [2] S. Deering, "Host Extensions for IP Multicasting". Internet 1435 RFC1112, August 1989. 1437 7.2. Informative References 1439 [3] A. Mankin, A. Romanow, S. Bradner, V. Paxson, "IETF Criteria for 1440 Evaluating Reliable Multicast Transport and Application Protocols", RFC 1441 2357, June 1998. 1443 [4] D. Clark, D. Tennenhouse, "Architectural Considerations for a New 1444 Generation of Protocols". In Proc. ACM SIGCOMM, pages 201--208, 1445 September 1990. 1447 [5] R. Kermode, L. Vicisano, "Author Guidelines for Reliable Multicast 1448 Transport (RMT) Building Blocks and Protocol Instantiation documents", 1449 RFC 3269, April 2002. 1451 [6] J. Nonnenmacher and E. W. Biersack, "Optimal Multicast Feedback," in 1452 IEEE Infocom , (San Francisco, California), p. 964, March/April 1998. 1454 [7] J. Macker, R. Adamson, "Quantitative Prediction of Nack Oriented 1455 Reliable Multicast (NORM) Feedback", Proc. IEEE MILCOM 2002, October 1456 2002. 1458 [8] Holbrook, H. W., "A Channel Model for Multicast", Ph.D. 1459 Dissertation, Stanford University, Department of Computer Science, 1460 Stanford, California, August 2001. 1462 [9] M. Luby, L. Vicisano, J. Gemmell, L. Rizzo, M. Handley, and J. 1463 Crowcroft, "Forward Error Correction (FEC) Building Block", RFC 3452, 1464 December 2002. 1466 [10] S. Floyd, V. Jacobson, S. McCanne, C. Liu, and L. Zhang. "A 1467 Reliable Multicast Framework for Light-weight Sessions and Application 1468 Level Framing", Proc. ACM SIGCOMM, August 1995. 1470 [11] J. Metzner, "An Improved Broadcast Retransmission Protocol", IEEE 1471 Transactions on Communications, Vol. Com-32, No.6, June 1984. 1473 [12] J. Macker, "Reliable Multicast Transport and Integrated Erasure- 1474 based Forward Error Correction", Proc. IEEE MILCOM 97, October 1997. 1476 [13] M. Luby, L. Vicisano, J. Gemmell, L. Rizzo, M. Handley, and J. 1477 Crowcroft, "The Use of Forward Error Correction (FEC) in Reliable 1478 Multicast", RFC 3453, December 2002. 1480 [14] D. Gossink, J. Macker, "Reliable Multicast and Integrated Parity 1481 Retransmission with Channel Estimation", IEEE GLOBECOM 98'. 1483 [15] V. Ozdemir, S. Muthukrishnan, I. Rhee, "Scalable, Low-Overhead 1484 Network Delay Estimation", NCSU/AT&T White Paper, February 1999. 1486 [16] J. Widmer and M. Handley, "Extending Equation-Based Congestion 1487 Control to Multicast Applications", Proc ACM SIGCOMM 2001, San Diego, 1488 August 2001. 1490 [17] L. Rizzo, "pgmcc: A TCP-Friendly Single-Rate Multicast Congestion 1491 Control Scheme", Proc ACM SIGCOMM 2000, Stockholm, August 2000. 1493 [18] S. Pingali, D. Towsley, J. Kurose, "A Comparison of Sender- 1494 Initiated and Receiver-Initiated Reliable Multicast Protocols". In 1495 Proc. INFOCOM, San Francisco, CA, October 1993. 1497 [19] B.N. Levine, J.J. Garcia-Luna-Aceves, "A Comparison of Known 1498 Classes of Reliable Multicast Protocols", Proc. International 1499 Conference on Network Protocols (ICNP-96), Columbus, Ohio, Oct 29--Nov 1500 1, 1996. 1502 [20] S. Kent and R. Atkinson, "Security Architecture for the Internet 1503 Protocol", RFC 2401, November 1998. 1505 8. Authors' Addresses 1507 Brian Adamson 1508 adamson@itd.nrl.navy.mil 1509 Naval Research Laboratory 1510 Washington, DC 20375 1512 Carsten Bormann 1513 cabo@tzi.org 1514 Universitaet Bremen TZI 1515 Postfach 330440 1516 D-28334 Bremen, Germany 1518 Mark Handley 1519 mjh@aciri.org 1520 1947 Center Street, Suite 600 1521 Berkeley, CA 94704 1523 Joe Macker 1524 macker@itd.nrl.navy.mil 1525 Naval Research Laboratory 1526 Washington, DC 20375 1528 9. Full Copyright Statement 1530 Copyright (C) The Internet Society (2004). This document is subject to 1531 the rights, licenses and restrictions contained in BCP 78 and except as 1532 set forth therein, the authors retain all their rights. 1534 This document and the information contained herein are provided on an 1535 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR 1536 IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 1537 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 1538 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 1539 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 1540 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.