INTERNET-DRAFT                                            Stephan Wenger
draft-wenger-avt-rtcp-feedback-02.txt                          TU Berlin
                                                               Joerg Ott
                                                 Universitaet Bremen TZI

                                                           2 March, 2001
                                                  Expires September 2001

         RTCP-based Feedback: Concepts and Message Timing Rules

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC 2026. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

Real-time media streams are not resilient against packet losses. RTP
[1] provides all the necessary mechanisms to restore ordering and
timing to properly reproduce a media stream at the recipient.
RTP also provides continuous feedback about the overall reception
quality from all receivers -- thereby allowing the sender(s) in the
mid-term (on the order of several seconds to minutes) to adapt their
coding scheme and transmission behavior to the observed network QoS.
However, except for a few payload-specific mechanisms [2], RTP makes
no provision for timely feedback that would allow a sender to repair
the media stream immediately: through retransmissions, retro-active
FEC, or media-specific mechanisms such as reference picture
selection.

This document specifies a modification to the algorithm for
scheduling RTCP packets in order to allow occasional timely feedback
to events observed by a receiver (such as lost packets). The message
format for RTCP-based feedback is defined in a companion document
[7].

1. Introduction

Real-time media streams are not resilient against packet losses. RTP
[1] provides all the necessary mechanisms to restore ordering and
timing present at the sender to properly reproduce a media stream at
a recipient. RTP also provides continuous feedback about the overall
reception quality from all receivers -- thereby allowing the
sender(s) in the mid-term (on the order of several seconds to
minutes) to adapt their coding scheme and transmission behavior to
the observed network QoS. However, except for a few payload-specific
mechanisms [2], RTP makes no provision for timely feedback that would
allow a sender to repair the media stream immediately: through
retransmissions, retro-active FEC, or media-specific mechanisms such
as reference picture selection.

Current mechanisms available with RTP to improve error resilience
include audio redundancy coding [3], video redundancy coding [4],
RTP-level FEC [5], and general considerations on more robust media
stream transmission [6].
Particularly in small groups, however, virtually all types of
real-time media streams could benefit from a mechanism that would
enable a sender to perform media stream repair -- including but not
limited to audio, video, DTMF, and text chat streams. In some cases,
such as networks with acceptable round-trip times but scarce
bandwidth, occasional retransmissions may be much preferred over
continuous transmission of redundant information.

For example, predictive video coding is not loss resilient. Any loss
of coded data leads to annoying artifacts not only in the reproduced
picture in which the loss occurred, but also in subsequent pictures.
Error resilience can be achieved by spending bits to convey redundant
information using source coding based mechanisms or transport based
mechanisms. This can be done without the use of any feedback between
the decoder(s) and the encoder. Similar considerations apply to
protecting e.g. DTMF (and other tones) carried in an RTP stream [9].

Alternatively, where applicable, receivers can inform the sender
through a feedback channel about a loss situation, and the sender can
react accordingly. This approach provides better media quality and
is more efficient with respect to the bandwidth used by the sender to
achieve a given media quality. However, using feedback mechanisms is
limited to certain application scenarios identified by encoder
characteristics, delay constraints, and/or the number of recipients.

This memo specifies a profile based upon [1] and [10] with enhanced
rules for sending receiver reports to support feedback transmission,
reflecting the need for very low delay in conveying feedback, which
is necessary to make feedback mechanisms efficient (or workable at
all).
Immediate Feedback messages (FB messages), Early Receiver Reports
(Early RRs), and algorithms are specified that allow for low delay in
small multicast groups but prevent network flooding in larger ones.
Special consideration is given to point-to-point scenarios.

In addition, this memo gives some consideration to specific
application scenarios and the respective feedback requirements, at
the moment focusing on predictive video coding.

A companion document [7] discusses various types of general purpose
feedback information (also allowing for extensions specific to
certain media payloads) and defines an RTCP packet format to transmit
FB messages in an RTP environment. It can be used in conjunction
with all payload specifications for predictive video coding schemes
currently available for RTP.

2. Motivation

2.1 Example: Predictive Video Coding

2.1.1 Video Encoder-decoder synchronicity

Most current video coding schemes for compressed video, such as
ITU-T H.261 and H.263 and ISO/IEC MPEG-1/2/4, employ a mechanism
known as Inter Picture Prediction. Each picture is divided into
macroblocks of uniform size. For each macroblock, one or more
motion vectors may be identified and transmitted. The residual
signal after motion compensation is DCT-transformed, quantized,
entropy coded, and transmitted as well. The encoder reconstructs,
based on this information, a so-called reference picture, which is
used to perform the motion compensation and residual signal coding
steps for the subsequent picture. Since the reference picture is
generated using only such information that is also available at the
decoder, the reference picture is identical to the reconstructed
picture at the decoder. Having identical reference pictures at the
encoder and decoder is referred to as encoder-decoder synchronicity.

Whenever data is damaged or lost on the way between the encoder and
the decoder, the reconstructed picture at the decoder is no longer
identical to the encoder's reference picture -- the encoder-decoder
synchronicity is lost.

Any loss of the encoder-decoder synchronicity results in annoying
artifacts at the decoder. Because the prediction of subsequent
pictures in the decoder is based on a damaged reference picture, the
annoying artifacts are present not only in the picture in which the
loss occurred; they propagate to all subsequent pictures, until,
through source coding based mechanisms, the encoder-decoder
synchronicity is restored. Therefore, the goal of systems employing
predictive video coding in a lossy environment must be to keep the
encoder-decoder synchronicity, or, if this is not possible, to regain
that synchronicity as quickly as possible.

2.1.2 Non-feedback based mechanisms

Avoiding the loss of the encoder-decoder synchronicity corresponds to
avoiding the loss of coded picture data. Such a task can be
performed on the transport layer. In RTP environments, the use of
packet-based FEC is a good example of such a technique. (The use of
TCP or reliable multicast as the transport for media streams would be
an even better one but is inappropriate for low-delay (interactive)
real-time systems.) FEC schemes, interleaving, and other means for
repairing real-time media streams may also add delay and significant
bit rate overhead without being able to guarantee compensation of
virtually all packet losses.

Once the encoder-decoder synchronicity is lost, only source coding
oriented mechanisms can help to regain it. One common way is to send
a non-predictively coded picture (known as an Intra picture). Intra
pictures have the disadvantage of being several times bigger than
predictively coded pictures (Inter pictures).
Therefore, sending Intra pictures has negative implications both on
bandwidth and (in bandwidth-limited environments) on delay. Another
way is to use Intra macroblock refresh. Here, certain parts of the
picture (those affected by a packet loss) are coded non-predictively
in order to resynchronize the encoder and decoder over time. Intra
macroblock refresh has better delay characteristics than full Intra
pictures because the picture size can be kept constant, but is less
efficient in terms of bit rate/distortion than full Intra pictures.
More sophisticated means such as Reference Picture Selection (RPS)
are also available in modern video coding standards.

Systems not employing feedback channels may use any combination of
the mechanisms described above to add error resilience -- at the cost
of added bit rate and, sometimes, added delay. The number of
additional bits spent for error resilience can be adapted using the
long-term packet loss rate information in the RTCP receiver reports.
But, even when using such adaptive means, it is still likely that
systems spend many more bits than theoretically necessary to achieve
error resilience in order to be on the safe side. Plus, as regular
RTCP feedback is aimed at longer terms, reactivity to sudden losses
is limited. In all practical applications today this means that
fewer bits are available for non-redundant picture data, and hence
the overall picture quality suffers.

2.1.3 Feedback based systems

Feedback-based systems try to avoid spending too many bits for
redundant information by informing the encoder about a loss situation
at the decoder(s). The encoder can then react accordingly and spend
redundant bits only when needed, possibly only for the part of the
picture that was affected by the loss -- thereby reducing the number
of redundant bits and leaving more bits for useful information.
As a result, a higher reproduced picture quality can generally be
expected when feedback channels are available.

Similar to the observations of section 2.1.2, transport and source
coding based mechanisms can be distinguished that react to loss
situations reported by feedback.

Transport based systems employing feedback react in a media-unaware
manner, by re-transmitting lost packets. TCP is a good example of a
protocol following such a scheme. Transport-based feedback in
real-time and/or multicast environments is a complex matter and the
subject of a lot of engineering and research in and outside of the
IETF. This specification is not concerned with pure transport-based
feedback.

Source coding based mechanisms may react upon the arrival of a
feedback message indicating a loss situation by adding bits that
restore, or at least make an effort to restore, the encoder-decoder
synchronicity. This process has to be performed by a real-time
encoder. However, schemes have been reported that allow the use of
feedback also for non-real-time encoders by storing multiple
representations of the same data (e.g. Inter and Intra coded), and
dynamically switching between those representations.

Several types of feedback messages, called Feedback Messages or FB
messages, can be defined for such a case. An FB message can be as
simple as a Boolean condition, indicating for example the loss of a
full picture (and, therefore, the need for a full Intra picture
transmission). Other feedback messages may contain more complex
information such as information about the damage of a spatial region
of the picture. A special form consists of a message whose format
and semantics are not known at the transport level, because they
are defined in the video codec standards.

2.2 Feedback Messages

Most FB messages contain negative acknowledgement information,
indicating an erroneous situation at the decoder. In others, the
nature of the acknowledgement (positive, negative, or both) is part
of the feedback message itself. When used in multicast environments,
positive acknowledgements must not be used.

This document assumes that feedback messages are transmitted using
RTCP packets. RTCP messages from the receivers to the sender cannot
be sent at arbitrary times, in order to prevent traffic explosion
in case of large multicast groups. Instead, the bit rate for all
RTCP messages of all receivers together has to obey a maximum
fraction of the total RTP session bit rate, yielding a very limited
bit rate budget for a single receiver in a large multicast group.
This, in turn, leads to an increased average delay when the size of
the receiving multicast group grows. (See section 6 of [1] for
details.)

This specification defines an algorithm that adheres to the bit rate
limitations for the feedback channel in the long term, but allows
short-term overdrafting for any receiver (but not all of them
simultaneously). Thus, the algorithm allows for better real-time
performance than the one specified in [1]. Traffic explosion in
cases where many receivers detect picture damage simultaneously is
prevented by dithering.

As this specification assumes a sender that has full control over its
transmission bit rate (e.g. a real-time encoder), there is no scaling
problem on the forward channel. Any reaction to negative feedback
generates additional bits, which have to be conveyed, but these are
taken from the sender's total bit rate budget. The encoder can take
this into account by, for example, changing the encoding mode, packet
size, and so forth. The sender is also free to simply ignore
feedback messages.
Adjusting the tradeoff between the reproduced media quality of all
receivers of a multicast group and the amount of additional repair
traffic is a media-dependent, very complex task and is not covered in
this specification.

Finally, frequent RTCP-based feedback messages may provide additional
input to the sender's congestion control algorithms and thus improve
its reactivity towards network congestion.

Feedback messages as well as sender and receiver behavior are to be
specified in separate documents (such as [7]). Such specifications
need to consider that, frequently, packet loss is an indication of
network congestion and thus define mechanisms for media-specific
congestion control in the presence of feedback as defined in this
memo.

2.3 Applications and Relationships to other Standards

This specification is based on RTCP, which implies its use in an RTP
environment. RTP itself is used in a variety of systems such as
SIP- or H.323-based multimedia conferencing/telephony, SAP-announced
Mbone conferences, and RTSP-based media streaming.

As for the video codecs, there is currently a small set of standards
that are, for the purpose of this discussion, roughly comparable.
Many mechanisms for regaining encoder-decoder synchronicity are
applicable to all video codecs. Others require certain tools (such
as Reference Picture Selection, aka NEWPRED) that are available only
in certain versions of the standards, and/or optional tools whose use
must be negotiated prior to being used.

A few RTP payload specifications such as RFC 2032 [2] already define
a feedback mechanism for some of the coding algorithms considered in
this specification.
An application capable of performing both schemes MUST use the
feedback mechanism defined in this specification, although, for
backward compatibility reasons, it MUST also be capable of conforming
to the feedback scheme defined in the respective RTP payload format,
if this is required by that payload format.

Also, audio, DTMF, and text streams could benefit from more immediate
feedback even though the redundancy payload formats work well for
these media.

All kinds of non-interactive media streams (such as RTSP-controlled
media streaming applications) could benefit significantly, as without
interactivity there is more time available for media repair.

2.4 Remarks on the size of the multicast group

This specification prevents traffic explosion on the feedback channel
in much the same way as RTP does, with the exception of allowing
individual receivers to overdraft their bit rate budget from time to
time. This is necessary in order to allow for low delay, which is
needed by the algorithms reacting to Feedback messages.

This scaling, however, limits the usefulness of this mechanism in
multicast groups from a certain size upwards (where the size
threshold depends on a number of parameters including loss rate,
frame rate, number of packets per frame, and session bandwidth). The
maximum size of the multicast group is soft and also depends on
application requirements and is therefore not specified here.
Considerations on the multicast group sizes are presented in
section 3.5.

2.5 Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [8].

3. Low delay RTCP Feedback

Two components constitute RTCP-based feedback as described in this
memo:

. Status reports are contained in SR/RR messages and are transmitted
  at regular intervals as part of compound RTCP packets (which also
  include SDES and possibly other messages); these status reports
  provide an overall indication of the recent reception quality of a
  media stream. RTP [1] defines rules for the transmission of these
  status reports.

. Feedback messages as defined in a companion document [7] indicate
  loss or reception of particular pieces of a media stream (or
  provide some other form of rather immediate feedback on the data
  received). Rules for the transmission of feedback messages are
  newly introduced in this memo.

As discussed in [7], RTCP Feedback (FB) messages are just another
RTCP message type. Thus multiple FB messages may be combined in a
single RTCP packet. FB messages may be sent in full compound RTCP
packets along with SR/RR, SDES, and other RTCP messages. Or they may
be transmitted in minimal compound RTCP FB packets (which, besides
the FB messages, contain only the RR/SR and, if necessary, an
encryption prefix, in order to reduce the message size). RTCP
packets that do not contain FB messages are referred to as non-FB
RTCP packets.

3.1 Algorithm Outline

FB messages are part of the RTCP control streams and are thus subject
to the same bandwidth constraints as other RTCP traffic. This means
in particular that it may not be possible to report a packet loss at
a receiver immediately back to the sender. However, the value of
feedback given to a sender typically decreases over time -- in terms
of the media quality as perceived by the user at the receiving end
and/or the cost required to achieve media stream repair.

RTP [1] specifies rules for when compound RTCP packets should be
sent.
This specification modifies those rules in order to allow
applications to report media loss or reception events in a timely
manner, since most algorithms that use FB messages are very sensitive
to feedback timing. See section 5 and following for a discussion of
FB messages and the impact of delay on the performance of these FB
types.

The modified algorithm can be outlined as follows: Normally, when no
FB messages have to be conveyed, compound RTCP packets are sent
following the rules of RTP [1]. If a receiver detects the need for
an FB message, the receiver first checks whether it has already seen
a corresponding FB message from any other receiver (which it can do
with all FB messages that are transmitted via multicast; for unicast
sessions, there is no such delay). If this is the case then the
receiver refrains from sending the FB message, and continues to
follow the regular RTCP sending schedule. If the receiver has not
yet seen a similar FB message from any other receiver, it checks
whether it has recently exceeded its RTCP bit rate budget to transmit
another FB message (without waiting for its regularly scheduled RTCP
transmission time). Only if this is not the case does it send the FB
message, after waiting for a short, random dithering interval (in
the case of multicast).

FB messages are sent as part of minimal compound RTCP packets. Full
compound RTCP packets are interspersed as per [1] at regular
intervals of at least five seconds.

3.2 Modes of Operation

RTCP-based feedback may operate in one of three modes (figure 1):

a) Immediate feedback mode: the group size is below a certain
   threshold (the FB threshold) which gives each receiving party
   sufficient bandwidth to transmit the feedback traffic for the
   intended purpose.
   This means, for each receiver there is enough bandwidth to report
   each event it is supposed/expected to report by means of a
   virtually "immediate" Early RTCP packet.

   The group size threshold is a function of a number of parameters
   including (but not necessarily limited to) the type of feedback
   used (e.g. ACK vs. NACK), bandwidth, packet rate, packet loss
   probability, media type, codec, and -- again depending on the type
   of FB used -- the (worst case or observed) frequency of events to
   report (e.g. frame received, packet lost).

   A special case of this is the ACK mode (where positive
   acknowledgements are used to confirm reception of data) which is
   restricted to point-to-point communications.

b) In Early RTCP mode, the group size and other parameters no longer
   allow each receiver to react to each event that would be worth
   reporting (or that would need to be reported). But feedback can
   still be given sufficiently often so that it allows the sender to
   adapt the media stream and thereby increase the overall reproduced
   media quality.

c) Regular RTCP mode: from some group size upwards, it is no longer
   useful to provide feedback from individual receivers at all --
   because of the time scale in which the feedback could be provided
   and/or because in large groups the sender(s) have no chance to
   react to individual feedback anymore.

As the feedback algorithm described in this memo scales, there is no
need for an agreement on the precise values of the respective
"thresholds" within the group. Hence the borders between all these
modes are fluid.

          ACK
        feedback
          V
          :<- - - -  NACK feedback - - - ->//
          :
          :   Immediate   ||
          : Feedback mode ||Early RTCP mode   Regular RTCP mode
          :<=============>||<=============>//<=================>
          :               ||
         -+---------------||---------------//------------------> group size
          2               ||
           Application-specific FB Threshold
              = f(rate,loss,codec,...)

Figure 1: Modes of operation

The respective thresholds depend on a number of technical parameters
(of the codec, the transport, the feedback used, etc.) but also on
the respective application scenarios. Section 3.5 provides some
useful hints (but no exact calculations) on estimating these
thresholds.

3.3 Definitions

a) Let the media stream be transmitted at a (roughly) constant packet
   rate f (in packets per second). This results in an average
   inter-packet interval of tau=1/f.

b) Let T_rtt be the maximum round trip time as measured by RTCP
   (if available to the receiver). Note that this may be asymmetric.

c) Let t_rr and t_(rr-1) be the times of the next and the last
   scheduled RTCP RR transmission, respectively, calculated prior to
   reconsideration. Let T_rr = t_rr - t_(rr-1). (In RTP [1],
   t_(rr-1) and t_rr are termed tp and tn, respectively.)

d) Let t_e be the time for which a feedback packet is scheduled.

e) Let T_dither_max be the maximum interval for which an RTCP
   feedback packet may be additionally delayed (to prevent
   implosions).

f) Let T_fd be the delay, measured from the reception of a packet P,
   after which the feedback message caused by P is returned to the
   sender.

g) Let S be the number of active senders in the RTP session.

h) Let N be the current estimate of the number of receivers in the
   RTP session.

The feedback situation for an event to report at a receiver is
depicted in figure 2 below. At time t0, such an event (e.g. a packet
loss) is detected at the receiver. The receiver decides -- based upon
current T_rtt, group size, and other (application-specific)
parameters -- that a feedback message shall be sent back to the
sender.

To avoid an implosion of immediate feedback packets, the receiver
delays transmission of the compound feedback packet by a random
amount T_fd (with the random number evenly distributed in the
interval [0, T_dither_max]).
Transmission of the compound RTCP packet is then scheduled for
t_e = t0 + T_fd.

The T_dither_max parameter is chosen based upon the group size, the
RTCP bandwidth constraints, and, if available, the round-trip time.
In addition, the receiver may take into account a number of other
parameters (such as the estimated round-trip time, the type of
feedback to be provided) to possibly extend the upper bound for the
feedback while ensuring that the feedback information will still make
sense when it reaches the sender.

If a compound RTCP feedback packet is scheduled, the time slot for
the next scheduled compound RTCP packet is updated accordingly to a
new t_rr.

            event to
             report
            detected
                |
                |  RTCP feedback
                vXXXXXXXXXXXXXXXXXXXX            ) )
   |---+--------+-------------+-----+------------| |--------+--------->
       |        |             |     |            ( (        |
       |       t0            te                             |
   t_(rr-1)                                               t_rr
                \_______  ________/
                        \/
                   T_dither_max

Figure 2: Event report and parameters for Early RTCP scheduling

3.4 Early RTCP Algorithm

Assume an active sender S0 (out of S senders) and a number N of
receivers with R being one of these receivers.

Assume further that R has verified that using feedback mechanisms is
reasonable in the current constellation (which is highly application
specific and hence not specified in this memo).

Then, the following rules apply to transmitting a Feedback Message
as a minimal compound RTCP packet:

Initially, R sets allow_early := TRUE.

At a point in time t0, R has transmitted the last RTCP RR packet at
t_(rr-1) and has scheduled the next transmission (prior to
reconsideration) for t_rr.

Now R detects the need to transmit a feedback message (e.g. because a
media "unit" needs to be ACKed or NACKed) at time t0.

R first checks whether there is still a feedback packet waiting for
transmission.
         If so, the new feedback message is appended to the
557      packet and the increased RTCP packet size is reflected in the RTCP
558      bandwidth calculation (which may later lead to an adjustment of
559      t_rr); the schedule for the waiting RTCP feedback packet remains
560      unchanged.

562      If no feedback message is already awaiting transmission, a new
563      (minimal) compound RTCP feedback message is created and the interval
564      T_dither_max is chosen as follows:

566      i)   If the session is a unicast session (group size = 2) then
567           T_dither_max := 0.

569      ii)  If the receiver has an RTT estimate to the originator of the
570           media unit to provide feedback about, then

572                            /  T_rtt/2   if T_rtt/2 > 10ms
573           T_dither_max := <
574                            \  10ms      otherwise.

576      iii) If the receiver does not have an RTT estimate to the originator,
577           then

579                            /  T_rr/2    if T_rr/2 < 100ms
580           T_dither_max := <
581                            \  100ms     otherwise.

583      (Note: These values are *still* open to discussion.)

585      (Note that application-specific feedback considerations may make it
586      worthwhile to increase T_dither_max beyond this value.)

588      Then, R checks whether its next regularly scheduled RTCP packet is
589      within the time bounds for the RTCP FB (t_e + T_dither_max > t_rr).
590      If so, no Early RTCP is scheduled; instead the FB message is appended
591      to the regular RTCP packet and the RTCP bandwidth calculation is
592      updated to reflect the additional RTCP size.  The updated bandwidth
593      calculation may result in a slightly increased t_rr (=t_rr') but,
594      even if t_rr' > t_e + T_dither_max, this does not change the updated
595      transmission time t_rr'.

597      (Q: if the FB is piggybacked onto a regularly scheduled RTCP RR
598      message but the same or a superset of the feedback information is
599      received from another receiver, should the FB then be removed from
600      the compound RR/FB and its transmission time be revised again from
601      t_rr' to t_rr as calculated before?)
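The selection of T_dither_max in rules i) to iii) and the piggyback check that follows them can be sketched in Python as shown here.  This is a minimal sketch and not part of the specification: the function and parameter names are hypothetical, all times are in seconds, and the piggyback test assumes that t0 is meant where the text writes t_e (t_e is only assigned later in the algorithm, and the test for the allow_early == FALSE case uses t_rr - t0).

```python
from typing import Optional


def choose_t_dither_max(group_size: int,
                        t_rtt: Optional[float] = None,
                        t_rr_interval: Optional[float] = None) -> float:
    """Pick T_dither_max (in seconds) following rules i) to iii)."""
    if group_size == 2:
        # Rule i): unicast session, no dithering.
        return 0.0
    if t_rtt is not None:
        # Rule ii): half the round-trip time, but at least 10 ms.
        return max(t_rtt / 2.0, 0.010)
    if t_rr_interval is None:
        raise ValueError("need either an RTT estimate or T_rr")
    # Rule iii): half the regular reporting interval, but at most 100 ms.
    return min(t_rr_interval / 2.0, 0.100)


def piggyback_on_regular_rtcp(t0: float, t_dither_max: float,
                              t_rr: float) -> bool:
    """Return True if the FB message should ride on the regular RTCP packet."""
    # Assumption: t0 is used where the text writes t_e (see lead-in).
    return t0 + t_dither_max > t_rr
```

For instance, a receiver in a group of ten with a 200 ms RTT estimate would obtain a T_dither_max of 100 ms from choose_t_dither_max(10, t_rtt=0.2).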
603      Otherwise, R MUST check whether it is allowed to transmit an Early
604      RTCP packet (allow_early == TRUE).

606         If so, R schedules an Early RTCP packet for t_e = t0 + RND *
607         T_dither_max with the RND function evenly distributed between 0
608         and 1.

610         If R receives an RTCP feedback packet (indicating the same or a
611         superset of the feedback information R wanted to transmit) before
612         t_e is reached, the FB information is discarded and the
613         transmission schedule for the next RR packet is reset to t_rr as
614         calculated before.

616         Otherwise, when t_e is reached, R creates an RR, appends the FB
617         information, and transmits the RTCP packet.  R then sets
618         allow_early := FALSE and recalculates t_rr := t_e + 2*T_rr.  As
619         soon as R sends its next regularly scheduled RTCP RR
620         (at the new t_rr), it sets allow_early := TRUE again.

622      If allow_early == FALSE then R checks the time for the next scheduled
623      RR: if t_rr - t0 < T_dither_max then R creates an FB message for
624      transmission along with the RTCP packet at a then slightly modified
625      t_rr' (see above).  Otherwise, R does not send an RTCP feedback
626      message at all.

628      In regular RTCP intervals as specified by [1] (i.e. at most every
629      five seconds), a full compound RTCP packet is sent (which may also
630      contain a feedback message if one has been created according to the
631      above rules and scheduled for transmission along with the full
632      compound RTCP message).

634      The E bit in the message header [7] is used upon reception to detect
635      whether this RTCP feedback message was sent as Early RTCP or not.
636      Hence, a feedback message that is sent as an Early RTCP packet MUST
637      set the E bit in the message header to "1".  Feedback messages
638      piggybacked on regularly scheduled RTCP packets MUST set the E bit to
639      "0".

641   3.5 Considerations on the Group Size

643      This section intends to give some brief guidelines on the group sizes
644      at which the various feedback modes may be used.
646   3.5.1 ACK mode

648      The group size MUST be exactly two participants, i.e. point-to-point
649      communications.  Unicast addresses SHOULD be used in the session
650      description.

652      For unidirectional as well as bi-directional communication between
653      two parties, 2.5% of the RTP session bandwidth is available for
654      feedback.  Assuming a ratio of 1:10 for minimal to full compound RTCP
655      packets, at 64kbit/s, a receiver can report 2.5 events per second
656      back to the sender, at 256kbit/s 10 events and so forth.

658      From 768kbit/s upwards, a receiver would be able to acknowledge each
659      individual frame (not packet!) in a 30 fps video stream.

661      ACK strategies have to be defined accordingly to work with these
662      bandwidth limitations.

664   3.5.2 NACK mode

666      Negative acknowledgements (or similar types of feedback) have to be
667      used for all groups larger than two.

669      Whether or not the use of Immediate or Early RTCP packets should be
670      considered depends upon a number of parameters including session
671      bandwidth, codec, special type of feedback, number of senders and
672      receivers, among many others.

674      The crucial parameters -- to which all of the above can be reduced --
675      are the allowed minimal interval between two RTCP reports and the
676      number of events that presumably need reporting per time interval.
677      The minimum interval is derived from the available RTCP bandwidth and
678      the expected average size of an RTCP packet.  The number of events to
679      report, e.g. per second, may be derived from the packet loss rate and
680      the sender's rate of transmitting packets.  From these two values, the
681      allowable group size for the Immediate feedback mode can be
682      calculated.

684      The upper bound for the Early RTCP mode then solely depends on the
685      acceptable quality degradation, i.e. how many events per time
686      interval may go unreported.
688      Example: If a 256kbit/s video with 30 fps is transmitted through a
689      network with an MTU size of some 1500 bytes, then, in most cases,
690      each frame would fit in its own packet leading to a packet rate of 30
691      packets per second.  If 5% packet loss occurs in the network (equally
692      distributed, no inter-dependence between receivers), then each
693      receiver will have to report 3 packets lost every two seconds.
694      Assuming a single sender and more than three receivers yields 3.75%
695      of the session bandwidth allocated to the receivers' RTCP traffic and
         thus 9.6kbit/s.
696      Assuming further a size of 100 bytes for the average compound RTCP
697      packet allows 12 RTCP packets to be sent per second or 24 in two
698      seconds.  If every receiver needs to report three packets, this
699      yields a maximum group size of 8 receivers if all loss events shall
700      be reported.  The rules for transmission of immediate RTCP packets
701      should provide sufficient flexibility for most of this reporting to
702      occur in a timely fashion.

704      Extending this example to determine the upper bound for Early RTCP
705      mode leads to the following considerations: assume that the
706      underlying coding scheme and the application (as well as the tolerant
707      users) allow in the order of one loss without repair per two seconds.
708      Thus the number of packets to be reported by each receiver decreases
709      to two per two seconds and increases the group size to 12.
710      Assuming further that some number of packet losses are correlated,
711      feedback traffic is further reduced and group sizes of some 15 to 20
712      can be reasonably well supported using Early RTCP mode.
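The arithmetic of the example above can be reproduced with a short Python sketch.  The function names are hypothetical, and the split into a 5% RTCP share of which 75% goes to the receivers is an assumption taken from the usual RTP defaults; it is used here because its product matches the 3.75% figure in the text.  Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def receiver_rtcp_bps(session_bps, rtcp_share=Fraction(5, 100),
                      receiver_share=Fraction(3, 4)):
    """RTCP bit rate available to all receivers together (assumed shares)."""
    return session_bps * rtcp_share * receiver_share


def max_group_size(session_bps, rtcp_packet_bytes, reports_per_receiver,
                   interval_s):
    """Largest number of receivers whose reports fit into the interval."""
    packets = (receiver_rtcp_bps(session_bps) * interval_s
               / (rtcp_packet_bytes * 8))
    return int(packets // reports_per_receiver)


# 30 packets/s at 5% loss over two seconds: losses per receiver.
losses = 30 * Fraction(5, 100) * 2
print(losses)                                # 3
print(receiver_rtcp_bps(256_000))            # 9600
print(max_group_size(256_000, 100, 3, 2))    # 8  (all losses reported)
print(max_group_size(256_000, 100, 2, 2))    # 12 (one loss left unreported)
```

The last two lines correspond to the group sizes of 8 and 12 derived in the text; the further extension to 15 to 20 receivers rests on correlated losses and is not modelled here.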
714   3.6 Summary of decision steps

716   3.6.1 General Hints

718      Before even considering whether or not to send RTCP feedback
719      information, an application has to determine whether this mechanism is
720      applicable:

722      1) An application has to decide whether -- for the current ratio of
723         packet rate with the associated (application-specific) maximum
724         feedback delay and the currently observed round-trip time (if
725         available) -- feedback mechanisms can be applied at all.

727         This decision may obviously be based upon (and dynamically revised
728         following) regular RTCP reception statistics.

730      2) The application has to decide whether -- for a certain observed
731         error rate, assigned bandwidth, frame rate, and group size -- (and
732         which) feedback mechanisms can be applied.

734         Regular RTCP provides valuable input to this step, too.

736      3) If these tests pass, the application has to follow the rules for
737         transmitting Early RTCP packets or regularly scheduled RTCP
738         packets with piggybacked feedback.

740   3.6.2 Session Description Attributes

742      A number of additional SDP parameters may be used to describe a
743      session.  These are defined as session level and/or media level
744      attributes:

746   3.6.2.1 RTCP Feedback

748      a=rtcp-fb: {"ack"|"nack"|extension} params

750      This attribute is used to indicate the feedback (to be) supported by
751      the sender.  "ack" MUST only be used if the media session is allowed
752      to operate in ACK mode as defined in 3.6.2.2.

754      It is up to the recipients whether or not they send feedback
755      information and up to the sender(s) to make use of feedback provided.

757   3.6.2.2 Unicasting

759      If an m= line in the SDP describing a session indicates unicast
760      addresses for a particular media type (and does not operate in multi-
761      unicast mode with all recipients listed explicitly but still
762      addressed via unicast), the RTCP feedback MAY operate in ACK feedback
763      mode.

765   4.
Format of RTCP Feedback messages

767      The general format of the FB messages is defined in [7].

769   5. Security Considerations

771      RTP packets transporting information with the proposed payload
772      format are subject to the security considerations discussed in the RTP
773      specification [1].  This implies that confidentiality of the media
774      streams is achieved by encryption.

776      If the entire stream (extension data and AU data) is to be secured
777      and all the participants are expected to have the keys to decode the
778      entire stream, then the encryption is performed in the usual manner,
779      and there is no conflict between the two operations (encapsulation
780      and encryption).

782      The need for a portion of the stream (e.g. extension data) to be
783      encrypted with a different key, or not to be encrypted, would require
784      application level signaling protocols to be aware of the usage of
785      the XT field, and to exchange keys and negotiate their usage on the
786      media and extension data separately.

788   6. Acknowledgements

790      Large parts of the syntax and the text concerned with RPS and NEWPRED
791      were borrowed from an early I-D from Fukunaga et al. that was
792      concerned with MPEG-4 ES packetization.

794   7. Full Copyright Statement

796      Copyright (C) The Internet Society (2001).  All Rights Reserved.

798      This document and translations of it may be copied and furnished to
799      others, and derivative works that comment on or otherwise explain it
800      or assist in its implementation may be prepared, copied, published
801      and distributed, in whole or in part, without restriction of any
802      kind, provided that the above copyright notice and this paragraph are
803      included on all such copies and derivative works.
805      However, this document itself may not be modified in any way, such as
806      by removing the copyright notice or references to the Internet
807      Society or other Internet organizations, except as needed for the
         purpose
808      of developing Internet standards in which case the procedures for
809      copyrights defined in the Internet Standards process must be
810      followed, or as required to translate it into languages other than
811      English.

813      The limited permissions granted above are perpetual and will not be
814      revoked by the Internet Society or its successors or assigns.

816      This document and the information contained herein is provided on an
817      "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
818      TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
819      BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
820      HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
821      MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

823   8. Authors' Addresses

825      Stephan Wenger (stewe@cs.tu-berlin.de)
826      TU Berlin
827      Sekr. FR 6-3
828      Franklinstr. 28-29
829      D-10587 Berlin
830      Germany

832      Joerg Ott (jo@tzi.uni-bremen.de)
833      Universitaet Bremen TZI
834      MZH 5180
835      Bibliothekstr. 1
836      D-28359 Bremen
837      Germany

839   9. Bibliography

841      [1] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP -
842          A Transport Protocol for Real-time Applications," Internet
843          Draft, draft-ietf-avt-rtp-new-08.txt, Work in Progress, July
844          2000.

846      [2] T. Turletti and C. Huitema, "RTP Payload Format for H.261 Video
847          Streams," RFC 2032, October 1996.

849      [3] C. Perkins, I. Kouvelas, O. Hodson, V. Hardman, M. Handley, J.C.
850          Bolot, A. Vega-Garcia, and S. Fosse-Parisis, "RTP Payload for
851          Redundant Audio Data," RFC 2198, September 1997.

853      [4] C. Bormann, L. Cline, G. Deisher, T. Gardos, C. Maciocco, D.
854          Newell, J. Ott, G. Sullivan, S. Wenger, and C. Zhu, "RTP Payload
855          Format for the 1998 Version of ITU-T Rec.
             H.263 Video (H.263+),"
856          RFC 2429, October 1998.

858      [5] C. Perkins and O. Hodson, "Options for Repair of Streaming
859          Media," RFC 2354, June 1998.

861      [6] J. Rosenberg and H. Schulzrinne, "An RTP Payload Format for
862          Generic Forward Error Correction," RFC 2733, December 1999.

864      [7] S. Fukunaga, N. Sato, K. Yano, A. Miyazaki, K. Hata, R.
865          Hakenberg, C. Burmeister, "Low Delay RTCP Feedback Format,"
866          Internet Draft draft-fukunaga-low-delay-rtcp-02.txt, Work in
867          Progress, February 2001.

869      [8] S. Bradner, "Key words for use in RFCs to Indicate Requirement
870          Levels," RFC 2119, March 1997.

872      [9] H. Schulzrinne and S. Petrack, "RTP Payload for DTMF Digits,
873          Telephony Tones and Telephony Signals," RFC 2833, May 2000.

875      [10] H. Schulzrinne and S. Casner, "RTP Profile for Audio and Video
876           Conferences with Minimal Control," Internet Draft
877           draft-ietf-avt-profile-new-09.txt, July 2000.

879   Appendix A: Considerations On Video

881      This section of this memo covers feedback messages for a Picture Loss
882      Indication (PLI), Slice Loss Indication (SLI), and Reference Picture
883      Selection Indication (RPSI).  PLI indicates the loss of a full
884      picture and roughly corresponds to the Fast Intra Request known from
885      H.320 systems and from RFC 2032 (H.261 packetization).  Algorithms
886      using SLI can be found under the acronym Automatic Repeat Request
887      (ARQ) in the signal processing literature.  Reference Picture
888      Selection, aka NEWPRED, is available in certain profiles of MPEG-4
889      (version 2 and later) and as an optional mode in H.263 (version 2 and
890      later).  The packet format specified in this document is open to
891      extensions so that future feedback mechanisms can easily be
892      integrated.

894      All these messages use the payload specific feedback format as
895      defined in [7], using PT=PSFB and the FMT field to further
896      distinguish between the three subtypes.
         These messages are defined
897      for payload types indicating H.263 and MPEG-4.

899      Note that Bit 00 of the first (counting from 1) 32-bit word in
900      the messages described below is placed in Bit 08 of the fourth
901      (counting from 1) 32-bit word of the payload type specific feedback
902      message.

904   A.1 Message Type 1: Picture Loss Indication (PLI)

906   A.1.1 Semantics

908      With the Picture Loss Indication message a decoder informs the
909      encoder about the loss of one or more full pictures.

911   A.1.2 Format

913      PLI does not require parameters.  Therefore, the length field MUST be
914      0, and there MUST NOT be Feedback Control Information.

916   A.1.3 Timing Rules

918      The timing follows the rules outlined in section 3.  In systems that
919      employ both PLI and other FB types it may be advisable to follow the
920      regular RTCP RR timing rules, since PLI is not as delay critical as
921      other FB types.

923   A.1.4 Remarks

925      PLI messages typically trigger the sending of full Intra pictures.
926      Intra pictures are several times larger than predicted (Inter)
927      pictures.  Their size is independent of the time they are generated.
928      In most environments, especially when employing bandwidth-limited
929      links, the use of an Intra picture implies an allowed delay that is a
930      significant multiple of the typical frame duration.  An example: If
931      the sending frame rate is 10 fps, and an Intra picture is assumed to
932      be 10 times as big as an Inter picture (not an unrealistic
933      assumption, see [] for details), then a full second of latency has to
934      be accepted.  In such an environment there is no need for a
935      particularly short delay in sending the feedback message.  Hence
936      waiting for the next possible time slot allowed by RFC1889bis RTCP
937      timing rules does not negatively influence system performance.
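The latency figure in the remarks above follows from a one-line calculation, sketched here in Python.  The function name is hypothetical, and the sketch assumes a link that is just fast enough to carry one Inter picture per frame interval, which is one reading of the "bandwidth-limited links" mentioned in the text.

```python
def intra_picture_latency(frame_rate_fps: float,
                          intra_to_inter_ratio: float) -> float:
    """Seconds needed to send one Intra picture on a link that carries
    exactly one Inter picture per frame interval (assumed link model)."""
    inter_picture_time = 1.0 / frame_rate_fps
    return intra_to_inter_ratio * inter_picture_time


# 10 fps and an Intra picture 10 times the size of an Inter picture.
latency = intra_picture_latency(10, 10)
```

With the numbers from the text, latency evaluates to one second, which is large compared with the delay added by waiting for the next regular RTCP slot only if that slot is close; the text argues from this that PLI tolerates the regular timing rules.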
939   A.2 Message Type 2: Slice Loss Indication

941   A.2.1 Semantics

943      With the Slice Loss Indication a decoder can inform an encoder that
944      it was unable to decode one, or several consecutive, macroblocks.
945      The encoder can take appropriate action in order to re-synchronize
946      encoder and decoder by means of its choice, typically by sending the
947      lost macroblocks in Intra mode.  This feedback message SHALL NOT be
948      used for video codecs with non-uniform, dynamically changeable
949      macroblock sizes such as H.263 with enabled Annex Q.  In such a case,
950      an encoder cannot always identify the corrupted spatial region.

952   A.2.2 Format

954      When FBT indicates a Slice Loss Indication, then there is one
955      additional UCI field the content of which is in the following format:

957       0                   1                   2                   3
958       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
959      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
960      |          First          |         Number          |    TR     |
961      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

963      First: 13 bits
964         The macroblock (MB) address of the first lost macroblock.  The MB
965         numbering is done such that the macroblock in the upper left corner
966         of the picture is considered macroblock number 1 and the number for
967         each macroblock increases from left to right and then from top to
968         bottom in raster-scan order (such that if there is a total of N
969         macroblocks in a picture, the bottom right macroblock is considered
970         macroblock number N).

972      Number: 13 bits
973         The number of lost macroblocks, in scan order as discussed above.

975      TR: 6 bits
976         The six least significant bits of the Temporal Reference of the
977         picture.

979   A.2.3 Timing Rules

981      The efficiency of algorithms using the Slice Loss Indication is
982      reduced greatly when the Indication is not transmitted in a timely
983      fashion.  Motion compensation propagates corrupted pixels that are
984      not reported as being corrupted.
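The 32-bit field layout given in section A.2.2 (First: 13 bits, Number: 13 bits, TR: 6 bits) can be packed and unpacked as sketched below in Python.  The function names are hypothetical and the sketch covers only this one field, not the surrounding feedback message header defined in [7].  It assumes network byte order with bit 0 as the most significant bit, as in the bit diagram, and it rejects a First value of 0 because macroblock numbering starts at 1.

```python
import struct


def pack_sli(first: int, number: int, tr: int) -> bytes:
    """Pack First (13 bits), Number (13 bits) and TR (6 bits) into 4 bytes."""
    if not 1 <= first < (1 << 13):
        raise ValueError("First must be in 1..8191")
    if not 0 <= number < (1 << 13):
        raise ValueError("Number must fit in 13 bits")
    if not 0 <= tr < (1 << 6):
        raise ValueError("TR must fit in 6 bits")
    # First occupies bits 0-12, Number bits 13-25, TR bits 26-31.
    word = (first << 19) | (number << 6) | tr
    return struct.pack("!I", word)


def unpack_sli(data: bytes):
    """Return (first, number, tr) from a 4-byte field."""
    (word,) = struct.unpack("!I", data)
    return (word >> 19) & 0x1FFF, (word >> 6) & 0x1FFF, word & 0x3F
```

A round trip such as unpack_sli(pack_sli(99, 11, 37)) returns the original triple, which is a quick way to check that an implementation agrees with the bit positions in the diagram.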
         Therefore, the use of the algorithm
985      discussed in section 3 is highly recommended.

987      Constraints on T_dither_max to be discussed.

989   A.2.4 Remarks

991      The First field of the UCI defines the first macroblock of a picture
992      as 1 and not, as one might expect, as 0.  This was done to align
993      this specification with the comparable mechanism available in H.245.
994      The maximum number of macroblocks in a picture (2**13 or 8192)
995      corresponds to the maximum picture sizes of the ITU-T and ISO/IEC
996      video codecs.  If future video codecs offer larger picture sizes
997      and/or smaller macroblock sizes, then an additional feedback message
998      has to be defined.  The six least significant bits of the Temporal
999      Reference field are deemed to be sufficient to indicate the picture
1000     in which the loss occurred.

1002     Algorithms were reported that keep track of the regions affected by
1003     motion compensation, in order to allow for a transmission of Intra
1004     macroblocks to all those areas, regardless of the timing of the FB
1005     [TBP.].  While, when those algorithms are used, the timing of the FB
1006     is less critical than without, it has to be observed that those
1007     algorithms correct large parts of the picture and, therefore, have to
1008     transmit many more bits in case of delayed FBs.

1010  A.3 Message Type 3: Reference Picture Selection Indication

1012  A.3.1 Semantics

1014     Modern video coding standards such as MPEG-4 visual version 2 or
1015     H.263 version 2 allow the use of older reference pictures than the
1016     most recent one.  Typically, a first-in-first-out queue of reference
1017     pictures is maintained.  If an encoder has learned about a loss of
1018     encoder-decoder synchronicity, a known-as-correct reference picture
1019     can be used.  As this reference picture is temporally further away
1020     than usual, the resulting predictively coded picture will use more
1021     bits.
1023     Both MPEG-4 and H.263 define a binary format for the payload of an
1024     RPSI message that includes information such as the temporal ID of the
1025     damaged picture and the size of the damaged region.  This bit string
1026     is typically small -- a couple of dozen bits --, of variable length,
1027     and self-contained, i.e. contains all information that is necessary
1028     to perform reference picture selection.

1030     Note that both MPEG-4 and H.263 allow the use of RPSI with positive
1031     feedback information as well.  That is, all correctly decoded pictures
1032     are reported.  Any form of positive feedback MUST NOT be used when in a
1033     multicast environment (reporting positive feedback about individual
1034     reference pictures at RTCP intervals is not expected to be of much
1035     use anyway).  For point-to-point communication, positive feedback MAY
1036     be used but, again, the bit rate budget of RTCP feedback will prevent
1037     the use in most scenarios anyway.

1039  A.3.2 Format

1040     When FB indicates an RPSI, then the length field is set to the number
1041     of bits of the following bit string that contains the RPS
1042     information.  This bit string follows byte aligned in the UCI field.
1043     Bit padding is used to achieve 32-bit word alignment of the UCI
1044     message (and the whole packet).

1046  A.3.3 Timing Rules

1048     RPS is even more delay critical than algorithms using SLI.  This
1049     is due to the fact that the older the RPS message is, the more bits
1050     the encoder has to spend to achieve encoder-decoder synchronicity.
1051     See [TBP.] for some information about the overhead of RPS for certain
1052     bit rate/frame rate/loss rate scenarios.

1054     Therefore, RPS messages should typically be sent as soon as possible,
1055     employing the algorithm of section 3.

1057     Constraints on T_dither_max to be discussed.

1059  A.3.4 Remarks

1061     TBD.